Phishing Detection

The research project examines the effectiveness and robustness of visual similarity-based phishing detection models using a dataset of 450,000 real-world phishing websites.

The study identifies performance discrepancies among the models and investigates their vulnerabilities to adversarial attacks, particularly attacks that manipulate visual elements such as logos. By examining these models comprehensively, the research underscores the importance of real-world evaluation for improving the reliability of phishing detection systems.

For the experiments, we first collect a real-world phishing dataset (i.e., screenshots and HTML files) together with the corresponding URLs from the APWG eCrime Exchange (eCX). We then refine the dataset by filtering out pages that contain errors (e.g., HTTP 404) or CAPTCHAs. Note that the APWG dataset is used only for testing the models. Second, we prepare two additional datasets: a baseline that combines PhishIntention (Liu et al., 2022) with VisualPhishNet (Abdelnabi et al., 2020), and an extended dataset with more reference brands. We then select six state-of-the-art visual similarity-based anti-phishing techniques and re-train them on these datasets. Third, we systematically evaluate the models on the APWG dataset. Finally, we manipulate visual components (e.g., logo images) with varying transformations to assess the robustness of these models.
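
The filtering step can be pictured as a simple predicate over crawled records. The sketch below is a minimal illustration under stated assumptions, not the pipeline used in the study: the `PageRecord` fields, the status-code cutoff, and the keyword-based CAPTCHA check are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PageRecord:
    # Hypothetical record for one crawled page; field names are assumptions.
    url: str
    status_code: int
    html: str


# Illustrative substrings that often appear in CAPTCHA widgets (assumed heuristic, not the study's rule).
CAPTCHA_MARKERS = ("captcha", "cf-turnstile")


def is_error_page(page: PageRecord) -> bool:
    # Treat any 4xx/5xx response (e.g., HTTP 404) as an error page.
    return page.status_code >= 400


def is_captcha_page(page: PageRecord) -> bool:
    # Case-insensitive substring search over the raw HTML.
    html = page.html.lower()
    return any(marker in html for marker in CAPTCHA_MARKERS)


def refine(pages: Iterable[PageRecord]) -> List[PageRecord]:
    # Keep only pages that are neither error pages nor CAPTCHA pages.
    return [p for p in pages if not is_error_page(p) and not is_captcha_page(p)]


if __name__ == "__main__":
    pages = [
        PageRecord("http://a.example/login", 200, "<form><input type='password'></form>"),
        PageRecord("http://b.example/", 404, "Not Found"),
        PageRecord("http://c.example/", 200, '<div class="g-recaptcha"></div>'),
    ]
    print([p.url for p in refine(pages)])  # ['http://a.example/login']
```

A keyword check like this is cheap but coarse: it can also drop a page that merely mentions the word "captcha", so a real pipeline would likely combine it with screenshot inspection.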

This figure gives an overview of the experimental workflow.

The table below summarizes how we process the dataset with a series of visible manipulation techniques and perturbation-based adversarial attacks. These manipulations mirror real-world phishing tactics and allow us to generate a diverse set of adversarial samples that challenge the detection models. Because they alter visual components such as logos and layout, they provide a comprehensive assessment of how well the models withstand sophisticated evasion strategies.

This table shows examples and descriptions of the visible manipulation methods.
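
To make the idea of visible manipulations more concrete, the sketch below applies a few simple transformations to a logo. It is a hedged illustration only: the logo is modeled as a small grid of grayscale values rather than a real image, and the operations (flip, brightness shift, crop, bounded random noise) are generic examples that may not match the methods listed in the table. The `bounded_noise` function merely adds random values within an L-infinity budget; it is not a gradient-based adversarial attack.

```python
import random
from typing import List

# Simplifying assumption: a logo is a 2D grid of grayscale values in [0, 255].
Image = List[List[int]]


def clip(value: int) -> int:
    # Keep pixel values inside the valid 8-bit range.
    return max(0, min(255, value))


def flip_horizontal(img: Image) -> Image:
    # Mirror each row left to right.
    return [list(reversed(row)) for row in img]


def adjust_brightness(img: Image, delta: int) -> Image:
    # Shift every pixel by `delta`, clipping at 0 and 255.
    return [[clip(p + delta) for p in row] for row in img]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    # Keep a rectangular region of the logo.
    return [row[left:left + width] for row in img[top:top + height]]


def bounded_noise(img: Image, epsilon: int, seed: int = 0) -> Image:
    # Add uniform random noise in [-epsilon, epsilon] to each pixel (an L-infinity budget).
    # This is random noise, NOT a gradient-based adversarial attack.
    rng = random.Random(seed)
    return [[clip(p + rng.randint(-epsilon, epsilon)) for p in row] for row in img]


if __name__ == "__main__":
    logo = [[0, 100, 200], [50, 150, 250]]
    print(flip_horizontal(logo))        # [[200, 100, 0], [250, 150, 50]]
    print(adjust_brightness(logo, 10))  # [[10, 110, 210], [60, 160, 255]]
    print(crop(logo, 0, 1, 2, 2))       # [[100, 200], [150, 250]]
```

Visible manipulations like these change the logo in ways a human may still recognize, while the perturbation budget `epsilon` controls how far each pixel may drift from the original.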

I also re-implemented several cutting-edge phishing detection models, including Phishpedia (Lin et al., 2021), PhishIntention (Liu et al., 2022), and PhishZoo (Afroz & Greenstadt, 2011). The results indicate that although these models performed well on curated datasets, some of them performed notably worse on real-world phishing samples. These findings reveal critical areas for improvement and highlight the need to update models continuously as phishing tactics evolve.
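
Reference-based detectors such as Phishpedia broadly work by recognizing which protected brand a page imitates and then checking whether the page is hosted on one of that brand's legitimate domains. The sketch below captures only that decision rule, under loud assumptions: the embeddings are hand-written vectors standing in for the output of a learned model (e.g., a Siamese network), and the reference list, the 0.8 threshold, and the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical reference list: brand -> (reference logo embedding, legitimate domains).
References = Dict[str, Tuple[Sequence[float], Set[str]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_brand(logo_embedding: Sequence[float], references: References,
                threshold: float) -> Optional[str]:
    # Return the most similar brand if its similarity reaches the threshold, else None.
    best_brand, best_sim = None, -1.0
    for brand, (ref_embedding, _domains) in references.items():
        sim = cosine_similarity(logo_embedding, ref_embedding)
        if sim > best_sim:
            best_brand, best_sim = brand, sim
    return best_brand if best_sim >= threshold else None


def classify(logo_embedding: Sequence[float], page_domain: str,
             references: References, threshold: float = 0.8) -> str:
    brand = match_brand(logo_embedding, references, threshold)
    if brand is None:
        # No protected brand recognized: this rule cannot flag the page.
        return "benign"
    _ref, legit_domains = references[brand]
    # A recognized brand logo on a domain the brand does not own is treated as phishing.
    return "benign" if page_domain in legit_domains else "phishing"


if __name__ == "__main__":
    refs: References = {
        "ExampleBank": ([1.0, 0.0, 0.0], {"examplebank.com"}),
        "ShopCo": ([0.0, 1.0, 0.0], {"shopco.com"}),
    }
    print(classify([0.9, 0.1, 0.0], "examplebank.com", refs))            # benign
    print(classify([0.9, 0.1, 0.0], "examplebank-login.example", refs))  # phishing
```

Under this rule, a visible or perturbation-based change that pushes a phishing logo's embedding below the threshold makes `match_brand` return `None`, so the page is reported as benign; this is one way to see why logo manipulations can evade such detectors.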

To cite our work, please use the following BibTeX entry; the full paper (Ji et al., 2024) provides further details:

@misc{ji2024evaluatingeffectivenessrobustnessvisual,
      title={Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models},
      author={Fujiao Ji and Kiho Lee and Hyungjoon Koo and Wenhao You and Euijin Choo and Hyoungshick Kim and Doowon Kim},
      year={2024},
      eprint={2405.19598},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2405.19598},
}

References

2024

arXiv
Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models

Fujiao Ji, Kiho Lee, Hyungjoon Koo, Wenhao You, Euijin Choo, Hyoungshick Kim, and Doowon Kim

Phishing attacks pose a significant threat to Internet users, with cybercriminals elaborately replicating the visual appearance of legitimate websites to deceive victims. Visual similarity-based detection systems have emerged as an effective countermeasure, but their effectiveness and robustness in real-world scenarios have been unexplored. In this paper, we comprehensively scrutinize and evaluate state-of-the-art visual similarity-based anti-phishing models using a large-scale dataset of 450K real-world phishing websites. Our analysis reveals that while certain models maintain high accuracy, others exhibit notably lower performance than results on curated datasets, highlighting the importance of real-world evaluation. In addition, we observe the real-world tactic of manipulating visual components that phishing attackers employ to circumvent the detection systems. To assess the resilience of existing models against adversarial attacks and robustness, we apply visible and perturbation-based manipulations to website logos, which adversaries typically target. We then evaluate the models’ robustness in handling these adversarial samples. Our findings reveal vulnerabilities in several models, emphasizing the need for more robust visual similarity techniques capable of withstanding sophisticated evasion attempts. We provide actionable insights for enhancing the security of phishing defense systems, encouraging proactive actions. To the best of our knowledge, this work represents the first large-scale, systematic evaluation of visual similarity-based models for phishing detection in real-world settings, necessitating the development of more effective and robust defenses.
@misc{ji2024evaluatingeffectivenessrobustnessvisual,
  title = {Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models},
  author = {Ji, Fujiao and Lee, Kiho and Koo, Hyungjoon and You, Wenhao and Choo, Euijin and Kim, Hyoungshick and Kim, Doowon},
  year = {2024},
  eprint = {2405.19598},
  archivePrefix = {arXiv},
  primaryClass = {cs.CR},
  url = {https://arxiv.org/abs/2405.19598},
}

2022

USENIX Security

Inferring phishing intention via webpage appearance and dynamics: A deep vision based approach

Ruofan Liu, Yun Lin, Xianglin Yang, Siang Hwee Ng, Dinil Mon Divakaran, and Jin Song Dong

@inproceedings{liu2022inferring,
  title = {Inferring phishing intention via webpage appearance and dynamics: A deep vision based approach},
  author = {Liu, Ruofan and Lin, Yun and Yang, Xianglin and Ng, Siang Hwee and Divakaran, Dinil Mon and Dong, Jin Song},
  booktitle = {31st USENIX Security Symposium (USENIX Security 22)},
  pages = {1633--1650},
  year = {2022},
}

2021

USENIX Security

Phishpedia: A hybrid deep learning based approach to visually identify phishing webpages

Yun Lin, Ruofan Liu, Dinil Mon Divakaran, Jun Yang Ng, Qing Zhou Chan, Yiwen Lu, Yuxuan Si, Fan Zhang, and Jin Song Dong

@inproceedings{lin2021phishpedia,
  title = {Phishpedia: A hybrid deep learning based approach to visually identify phishing webpages},
  author = {Lin, Yun and Liu, Ruofan and Divakaran, Dinil Mon and Ng, Jun Yang and Chan, Qing Zhou and Lu, Yiwen and Si, Yuxuan and Zhang, Fan and Dong, Jin Song},
  booktitle = {30th USENIX Security Symposium (USENIX Security 21)},
  pages = {3793--3810},
  year = {2021},
}

2020

CCS

VisualPhishNet: Zero-day phishing website detection by visual similarity

Sahar Abdelnabi, Katharina Krombholz, and Mario Fritz

@inproceedings{abdelnabi2020visualphishnet,
  title = {{VisualPhishNet}: Zero-day phishing website detection by visual similarity},
  author = {Abdelnabi, Sahar and Krombholz, Katharina and Fritz, Mario},
  booktitle = {Proceedings of the 2020 ACM SIGSAC conference on computer and communications security},
  pages = {1681--1698},
  year = {2020},
}

2011

IEEE ICSC

PhishZoo: Detecting phishing websites by looking at them

Sadia Afroz and Rachel Greenstadt

@inproceedings{afroz2011phishzoo,
  title = {{PhishZoo}: Detecting phishing websites by looking at them},
  author = {Afroz, Sadia and Greenstadt, Rachel},
  booktitle = {2011 IEEE fifth international conference on semantic computing},
  pages = {368--375},
  year = {2011},
  organization = {IEEE},
}