Analyzing and predicting spear-phishing using machine learning methods


  • Dadvandipour Samad, Miskolci Egyetem
  • Ganie Aadil Gani, Miskolci Egyetem



Phishing, Machine learning, Hashing, Email Phishing, Topic modeling, RandomForest


Phishing is an attack in which the attacker misleads a victim by posing as a trustworthy party in order to steal sensitive information such as bank account or credit card numbers. Spear phishing, one of the most widely used forms today, is particularly effective because it exploits social and psychological factors. In this paper, we aim to mitigate the impact of spear phishing with a multi-layer approach, which is an effective way of countering intrusions because the intruder has to pass through several levels of defense. Most researchers work only with the content of the email; this paper instead takes a novel approach to countering phishing messages by using both the content and the attachment of an email. We applied sentiment analysis to emails, covering both the body and the attachment, and classified them as spam or not using an SVM classifier and a RandomForest classifier; the former achieved 96 percent accuracy and the latter 97.66 percent. SVM showed a false-positive rate of 0 percent and a false-negative rate of 4 percent, while RandomForest showed 0 percent false positives and 2.33 percent false negatives. We also performed topic modeling using LDA (Latent Dirichlet Allocation) from the Gensim package to identify the dominant topics in our dataset, and visualized the results of the topic model with pyLDAvis. The perplexity and coherence score of our topic model are -12.897670565510511 and 0.44700287476452394, respectively.
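The reported accuracy and error figures add up to roughly 100 percent for each classifier, which suggests that the false-positive and false-negative values are shares of all test emails; this reading is an assumption, since the abstract does not define them. A minimal sketch of that arithmetic follows. The function name `error_rates`, the label strings, and the 100-email test set are hypothetical and chosen only to reproduce the SVM figures; they are not the paper's data or code.

```python
from collections import Counter


def error_rates(y_true, y_pred, positive="phishing"):
    """Return accuracy plus false-positive and false-negative shares.

    Assumption: both error figures are fractions of ALL samples,
    so accuracy + false_positive + false_negative == 1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            counts["correct"] += 1
        elif pred == positive:
            # Legitimate email flagged as phishing.
            counts["fp"] += 1
        else:
            # Phishing email that slipped through.
            counts["fn"] += 1

    n = len(y_true)
    return {
        "accuracy": counts["correct"] / n,
        "false_positive": counts["fp"] / n,
        "false_negative": counts["fn"] / n,
    }


# Hypothetical test set: 50 phishing and 50 legitimate emails.
y_true = ["phishing"] * 50 + ["legitimate"] * 50
# Hypothetical predictions: 4 phishing emails missed, no false alarms.
y_pred = ["phishing"] * 46 + ["legitimate"] * 4 + ["legitimate"] * 50

rates = error_rates(y_true, y_pred)
print(rates)
```

Under this reading, a classifier with no false positives can only lose accuracy through missed phishing emails, which matches the pattern reported for both SVM and RandomForest.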