Phishing Email Detection with ML

AI Evaluation & Robustness · Oct 2, 2025 · Published project

Security text classification and model comparison

This project builds and evaluates a phishing-email detection workflow using labeled email text. It compares classical machine-learning models with attention to phishing recall, false positives, feature interpretation, and practical monitoring needs.

Python · Scikit-learn · TF-IDF · Random Forest · SVM · Logistic Regression

Challenge

Security classification needs more than headline accuracy; phishing recall and false positives matter.
Text features can capture useful cues but may also learn dataset-specific artifacts.
A useful model should balance performance, simplicity, and interpretability.
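To make the first point concrete, here is a small worked example of how accuracy can look healthy while phishing recall is poor. The counts are made up for illustration and are not results from this project; the helper name `summarize` is also hypothetical.

```python
def summarize(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, phishing recall, and false-positive rate from raw counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "phishing_recall": tp / (tp + fn),       # share of phishing emails caught
        "false_positive_rate": fp / (fp + tn),   # share of safe emails flagged
    }

# Hypothetical test set: 1,000 emails, 50 of them phishing.
# This made-up model catches only 20 of the 50 phishing emails
# and never flags a safe email.
metrics = summarize(tp=20, fp=0, tn=950, fn=30)
print(metrics)
```

In this invented case accuracy is 0.97 while recall is 0.40, which is why the evaluation here reports phishing recall and false positives alongside accuracy.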

System architecture

Email data: cleaning and labels

Text features: TF-IDF representation

Model comparison: RF · SVM · LR

Evaluation: recall, errors, terms

Data and inputs

18,650 raw rows, reduced to 17,538 rows after cleaning (1,112 rows removed).
Safe Email and Phishing Email classes with an 80/20 train-test split.
5,000 TF-IDF features used to represent email text.
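The data preparation described above can be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the project's actual code: the column names `text` and `label`, the stratified split, the fixed `random_state`, and fitting the vectorizer on the training split only are all assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

TEXT_COL, LABEL_COL = "text", "label"  # hypothetical column names

def clean_emails(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing text or label, then drop duplicate texts."""
    df = df.dropna(subset=[TEXT_COL, LABEL_COL])
    df = df.drop_duplicates(subset=[TEXT_COL])
    return df.reset_index(drop=True)

# Toy stand-in for the raw data: 10 usable rows, 1 missing text, 1 duplicate.
safe = [f"meeting notes for project {i}" for i in range(5)]
phish = [f"verify your account now code {i}" for i in range(5)]
raw = pd.DataFrame({
    TEXT_COL: safe + phish + [None, safe[0]],
    LABEL_COL: ["Safe Email"] * 5 + ["Phishing Email"] * 5 + ["Safe Email"] * 2,
})
clean = clean_emails(raw)

# 80/20 split; stratifying by label is an assumption, not stated in the write-up.
train_text, test_text, y_train, y_test = train_test_split(
    clean[TEXT_COL], clean[LABEL_COL],
    test_size=0.2, stratify=clean[LABEL_COL], random_state=42,
)

# Cap the vocabulary at 5,000 terms; fit on training text only to avoid leakage.
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(train_text)
X_test = vectorizer.transform(test_text)
print(len(raw), len(clean), X_train.shape, X_test.shape)
```

Fitting the vectorizer before splitting would let test-set vocabulary and document frequencies leak into training, so the sketch fits it on the training portion and only transforms the test portion.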

Technical approach

Clean missing and duplicate email records before modeling.
Represent email text with TF-IDF features.
Compare Random Forest, Support Vector Machine, and Logistic Regression models.
Review confusion matrices, phishing recall, and top phishing-related terms.

Evaluation and results

Key indicators

17,538 cleaned email rows

5,000 TF-IDF features

Selected linear SVM accuracy 0.9763

All evaluated models exceeded 97% accuracy on the test set.
Linear SVM achieved 0.9763 accuracy and a strong balance of phishing recall and operational simplicity.
RBF SVM reached a marginally higher 0.9772 accuracy (a difference of 0.0009), but the linear SVM remained simpler to interpret.
The selected model detected about 98% of phishing emails in the test set.
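One concrete reason a linear SVM is easier to interpret than an RBF SVM shows up directly in scikit-learn: a linear-kernel `SVC` exposes one weight per TF-IDF term through `coef_`, while an RBF-kernel `SVC` does not provide that attribute. The sketch below illustrates this on a made-up corpus; it is not the project's code, and the texts and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Made-up examples. Label 1 = phishing, 0 = safe.
texts = [
    "verify your account password now",
    "click the link to claim your prize",
    "urgent confirm your bank login",
    "notes from the monday team meeting",
    "quarterly report attached for review",
    "lunch on friday with the project team",
]
labels = [1, 1, 1, 0, 0, 0]

X = TfidfVectorizer().fit_transform(texts)

linear_svm = SVC(kernel="linear").fit(X, labels)
rbf_svm = SVC(kernel="rbf").fit(X, labels)

# The linear model has one weight per vocabulary term.
print(linear_svm.coef_.shape, X.shape[1])

# The RBF model's decision function is defined through support vectors in
# kernel space, so scikit-learn raises AttributeError when coef_ is accessed.
print(hasattr(rbf_svm, "coef_"))
```

With per-term weights available, the linear model's decisions can be checked against domain knowledge, which is harder to do for the RBF model despite its slightly higher accuracy.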

Implementation and code

Implementation focus

The implementation runs data preparation, TF-IDF feature extraction, model training, evaluation, and term-level interpretation as one structured workflow, so each technical decision can be traced to a specific step.

Source code

The code is available for exploring the implementation details and extending the experiment when needed.

Scope and responsible use

The project demonstrates detection modeling on available data. Operational security use would require continuous data refresh, monitoring, adversarial testing, and privacy-aware logging.

Future development

Evaluate on newer and more diverse phishing datasets.
Add calibration, threshold tuning, and cost-sensitive error analysis.
Test transformer-based encoders against the classical TF-IDF baseline.
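As a rough idea of what threshold tuning with cost-sensitive error analysis might involve, the sketch below picks the decision threshold that minimizes a weighted count of missed phishing emails and false alarms. The function name `pick_threshold`, the scores, and the cost values are all hypothetical; real costs would have to come from the operational setting.

```python
def pick_threshold(scores, labels, cost_fn=10.0, cost_fp=1.0):
    """Return (threshold, total_cost) minimizing cost_fn*FN + cost_fp*FP.

    An email is flagged as phishing when its score >= threshold.
    labels: 1 = phishing, 0 = safe. Ties keep the lowest threshold.
    """
    best_t, best_cost = None, float("inf")
    for t in sorted(set(scores)):  # each observed score is a candidate
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

# Made-up model scores (higher = more phishing-like) and true labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]

# Missing a phishing email is treated as 10x worse than a false alarm.
recall_first = pick_threshold(scores, labels, cost_fn=10.0, cost_fp=1.0)
# False alarms are treated as 5x worse than a miss.
precision_first = pick_threshold(scores, labels, cost_fn=1.0, cost_fp=5.0)
print(recall_first, precision_first)
```

On this toy data the chosen threshold moves from 0.35 to 0.7 once false alarms become the more expensive error, which is the kind of trade-off such an analysis would make explicit. Scores would need to be calibrated before the thresholds could be read as probabilities.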

Technical contribution

The project demonstrates careful security-oriented model evaluation: comparing baselines, prioritizing recall, analyzing false positives, and interpreting text features with domain awareness.