PhishCCA - Classifying phishing sites through content and certificate analysis

Phishing has been one of the most commonly used attack vectors over the last decade. Although various countermeasures have been proposed, many of them have serious flaws, and approaches based on machine learning models in particular tend to fall short. Two such flaws are the lack of appropriate training data for the model and a high false positive rate, caused by the similarity in structure and content between a target site and the phishing site that imitates it.

In this paper, some of these flaws are tackled through the proposed system, PhishCCA: a two-layer classification system that first classifies a site by its target (e.g., PayPal) and then classifies it as phishing or benign. This is achieved by developing an HTML content-based classifier, which maps HTML pages to target sites, and a TLS certificate-based classifier, which then labels the website as phishing or benign. Although the integrated system has not yet been fully built, the HTML classifier achieves an accuracy of 77% and the TLS certificate classifier (random forest) achieves an accuracy of 98%, which shows the promise of this technique.
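The two-layer flow described above can be sketched as a simple cascade. This is a minimal sketch, not the repository's code: the names `classify_site`, `toy_content_clf` and `toy_cert_clf` are hypothetical, the toy classifiers only stand in for the trained HTML LSTM and certificate random forest, and returning "unknown" when no target is recognised is an assumption about how pages that match no known target would be handled.

```python
from typing import Callable, Optional

# Hypothetical stage-1 interface: maps raw HTML to a target brand (e.g. "paypal"),
# or None when the page does not resemble any known target.
ContentClassifier = Callable[[str], Optional[str]]
# Hypothetical stage-2 interface: given certificate data and the predicted
# target, returns True if the site should be labelled phishing.
CertClassifier = Callable[[dict, str], bool]


def classify_site(html: str, cert: dict,
                  content_clf: ContentClassifier,
                  cert_clf: CertClassifier) -> str:
    """Two-layer cascade: identify the target first, then phishing vs. benign."""
    target = content_clf(html)
    if target is None:
        # Assumption: pages imitating no known target are not judged further.
        return "unknown"
    return "phishing" if cert_clf(cert, target) else "benign"


def toy_content_clf(html: str) -> Optional[str]:
    # Stand-in for the HTML LSTM: a plain keyword match on the page source.
    return "paypal" if "paypal" in html.lower() else None


def toy_cert_clf(cert: dict, target: str) -> bool:
    # Stand-in for the certificate random forest: flag the site when the
    # certificate's common name is not the target's domain or a subdomain of it.
    domain = f"{target}.com"
    cn = cert.get("subject_cn", "")
    return not (cn == domain or cn.endswith("." + domain))


page = "<title>PayPal login</title>"
print(classify_site(page, {"subject_cn": "www.paypal.com"}, toy_content_clf, toy_cert_clf))
print(classify_site(page, {"subject_cn": "paypal-secure-login.xyz"}, toy_content_clf, toy_cert_clf))
```

One reason to order the layers this way is that the second stage can condition on the predicted target, for example by checking whether the certificate belongs to that target's domain, rather than judging the certificate in isolation.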

Model

Files

Certificate Classifier
- Data
- Cert_LSTM.ipynb - LSTM model for certificate-based classification
- Cert_RF.ipynb - Random forest model for certificate-based classification
Content Classifier
- Dataset
- HTML_Classification.ipynb - LSTM model for HTML content-based classification
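The certificate classifiers work on features derived from a site's TLS certificate. The README does not list which features Cert_RF.ipynb or Cert_LSTM.ipynb use, so the sketch below is illustrative only: the helper `cert_features` and its four features (validity period in days, number of DNS subjectAltNames, whether any name covers the target domain, whether the subject carries an organizationName) are assumptions. It takes a dict in the shape returned by Python's `ssl.SSLSocket.getpeercert()` and parses the validity dates with `ssl.cert_time_to_seconds`.

```python
import ssl
from typing import List


def _name_map(rdns) -> dict:
    # getpeercert() encodes subject/issuer as a tuple of RDNs,
    # each RDN being a tuple of (attribute, value) pairs.
    return {key: value for rdn in rdns for key, value in rdn}


def cert_features(cert: dict, target_domain: str) -> List[float]:
    """Hypothetical feature vector for one certificate (illustrative features only)."""
    # Validity period in days, from the notBefore/notAfter strings.
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    validity_days = (not_after - not_before) / 86400

    subject = _name_map(cert.get("subject", ()))
    sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]

    # Does any name on the certificate cover the legitimate target domain
    # (exact match, or a subdomain/wildcard such as *.paypal.com)?
    names = sans + [subject.get("commonName", "")]
    covers_target = any(
        name == target_domain or name.endswith("." + target_domain) for name in names
    )

    # Domain-validated certificates usually carry no organizationName in the subject.
    has_subject_org = "organizationName" in subject

    return [validity_days, float(len(sans)), float(covers_target), float(has_subject_org)]


legit = {
    "subject": ((("organizationName", "PayPal, Inc."),), (("commonName", "www.paypal.com"),)),
    "notBefore": "Jan  1 00:00:00 2024 GMT",
    "notAfter": "Jan  1 00:00:00 2025 GMT",
    "subjectAltName": (("DNS", "www.paypal.com"), ("DNS", "paypal.com")),
}
print(cert_features(legit, "paypal.com"))  # [366.0, 2.0, 1.0, 1.0]
```

Vectors like these could be stacked into a matrix and fitted with a random forest, for example scikit-learn's `RandomForestClassifier`, as one possible stand-in for the notebook's model.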

Results

Model	Accuracy	Precision	Recall
Cert-Random Forest	0.98	0.98	0.998
Cert-LSTM	0.76	0.78	0.72
Content-LSTM	0.77	0.81	0.72
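The three columns follow the standard definitions: accuracy is the fraction of correct predictions, precision is TP / (TP + FP) and recall is TP / (TP + FN), so the false positives highlighted in the introduction show up directly as lower precision. The sketch below computes them for the binary phishing-vs-benign case with "phishing" as the positive class. The helper `binary_metrics` is hypothetical, and the choice of positive class is an assumption, as is the binary setting: the README does not state how per-class scores were averaged for the multi-class content classifier.

```python
from typing import Sequence, Tuple


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "phishing") -> Tuple[float, float, float]:
    """Accuracy, precision and recall, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Precision: of the sites flagged as phishing, how many really were.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the real phishing sites, how many were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = ["phishing", "phishing", "phishing", "benign", "benign"]
y_pred = ["phishing", "phishing", "benign", "benign", "benign"]
print(binary_metrics(y_true, y_pred))  # (0.8, 1.0, 0.6666666666666666)
```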

arulthileeban/PhishCCA