AI Evaluation & Robustness | Mar 24, 2025 | Published project
SMS Spam Traditional NLP Classification

Traditional text-classification workflow

This project builds a spam-detection workflow for SMS messages using classical NLP representations and machine-learning models. It emphasizes model comparison, class-level metrics, and evaluation beyond headline accuracy.

Python, Scikit-learn, Bag of Words, TF-IDF, Naive Bayes, Logistic Regression, SVM

Challenge

  • Spam detection is an imbalanced text-classification task where overall accuracy can hide weak spam recall.
  • Different text representations can change how classifiers separate ham and spam messages.
  • Practical evaluation requires confusion matrices, spam recall, F1-score, and ranking quality as measured by ROC-AUC.
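
The first point can be checked with simple arithmetic. The sketch below uses the class counts reported for this dataset (4,827 ham and 747 spam messages) and a hypothetical classifier that labels every message as ham. The function name is illustrative and is not taken from the project code.

```python
# Class counts as reported for the dataset in this project.
N_HAM = 4827
N_SPAM = 747


def majority_baseline_metrics(n_ham: int, n_spam: int) -> dict:
    """Metrics for a trivial classifier that labels every message as ham."""
    total = n_ham + n_spam
    true_pos = 0          # no spam message is ever flagged
    false_neg = n_spam    # every spam message is missed
    true_neg = n_ham      # every ham message counts as correct
    accuracy = (true_pos + true_neg) / total
    spam_recall = true_pos / (true_pos + false_neg)
    return {"accuracy": accuracy, "spam_recall": spam_recall}


baseline = majority_baseline_metrics(N_HAM, N_SPAM)
print(f"accuracy    = {baseline['accuracy']:.4f}")
print(f"spam recall = {baseline['spam_recall']:.4f}")
```

This do-nothing baseline reaches roughly 86.6% accuracy with a spam recall of zero, which is why accuracy alone is a weak indicator on this class balance.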

System architecture

SMS messages: ham + spam
Text vectors: BoW + TF-IDF
Model comparison: NB + LR + SVM
Evaluation: F1 + ROC-AUC

Data and inputs

  • 5,574 SMS messages with 4,827 ham and 747 spam messages.
  • Binary text-classification task using Bag of Words and TF-IDF representations.
  • The reported vocabulary size is 6,879 features with high sparsity.
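
As a rough illustration of what vocabulary size and sparsity mean here, the sketch below builds both representations with scikit-learn's CountVectorizer and TfidfVectorizer. The four messages are invented for the example, default vectorizer settings are assumed, and the project's actual preprocessing may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy messages, invented for illustration only.
messages = [
    "Free entry to win a prize, text WIN now",
    "Are we still meeting for lunch today",
    "Urgent! Claim your free prize now",
    "I will call you later tonight",
]


def sparsity(matrix) -> float:
    """Fraction of entries in a SciPy sparse matrix that are zero."""
    n_rows, n_cols = matrix.shape
    return 1.0 - matrix.nnz / (n_rows * n_cols)


bow = CountVectorizer()
X_bow = bow.fit_transform(messages)        # raw term counts per message

tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(messages)    # counts reweighted by inverse document frequency

print("vocabulary size:", len(bow.vocabulary_))
print("BoW shape:", X_bow.shape, "sparsity:", round(sparsity(X_bow), 3))
print("TF-IDF shape:", X_tfidf.shape, "sparsity:", round(sparsity(X_tfidf), 3))
```

With default settings both vectorizers share the same vocabulary and the same non-zero positions; only the stored values differ. Each message uses a small part of the vocabulary, so most entries are zero, and that effect grows as the vocabulary gets larger.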

Technical approach

  • Preprocess SMS text and build sparse vector representations.
  • Train Naive Bayes, Logistic Regression, and Support Vector Machine models.
  • Compare Bag of Words and TF-IDF across multiple classifiers.
  • Review confusion matrices, ROC curves, spam recall, and spam F1-score.
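
The comparison described in the steps above could be organized roughly as follows. This is a minimal sketch, not the project's code: the corpus is a hypothetical toy set, and MultinomialNB, LogisticRegression, and LinearSVC are assumed stand-ins for the three model families, since the exact estimators and hyperparameters are not stated here.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (0 = ham, 1 = spam); not the project's data.
ham_messages = [
    "are we still meeting for lunch today",
    "i will call you later tonight",
    "can you pick up milk on the way home",
    "the meeting moved to three pm",
    "happy birthday hope you have a great day",
    "see you at the gym tomorrow morning",
    "thanks for dinner last night",
    "running late be there in ten minutes",
    "did you finish the report for class",
    "let me know when you get home",
    "what time does the movie start",
    "call me when you are free",
]
spam_messages = [
    "free entry win a cash prize text win now",
    "urgent claim your free prize today",
    "you have won a free holiday call now",
    "congratulations you won cash claim now",
    "win free tickets text now to claim",
    "urgent your account won a cash prize call now",
]
texts = ham_messages + spam_messages
labels = [0] * len(ham_messages) + [1] * len(spam_messages)

# Stratify so the held-out split keeps the ham/spam ratio.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=6, stratify=labels, random_state=0
)

# Factories, so every combination gets fresh, unfitted objects.
vectorizers = {"BoW": CountVectorizer, "TF-IDF": TfidfVectorizer}
models = {
    "NB": MultinomialNB,
    "LR": lambda: LogisticRegression(max_iter=1000),
    "SVM": LinearSVC,
}

results = {}
for vec_name, make_vec in vectorizers.items():
    for model_name, make_model in models.items():
        pipeline = make_pipeline(make_vec(), make_model())
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        results[(vec_name, model_name)] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "spam_recall": recall_score(y_test, y_pred, pos_label=1, zero_division=0),
            "spam_f1": f1_score(y_test, y_pred, pos_label=1, zero_division=0),
        }

for (vec_name, model_name), metrics in results.items():
    print(f"{model_name:>3} + {vec_name:<6} {metrics}")
```

Wrapping each vectorizer and model in a pipeline means the vocabulary is fitted on the training split only, so the held-out messages do not leak into the features. Scores on a corpus this small say nothing about the real dataset; the sketch only shows how the six combinations are laid out.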

Evaluation and results

Key indicators

5,574 SMS messages
6 model/representation combinations
SVM + TF-IDF accuracy: 97.67%
Spam F1-score: 0.90

  • SVM with TF-IDF achieved the best reported overall accuracy at 97.67%.
  • The same configuration reached a 0.90 spam F1-score.
  • The comparison showed why spam recall and F1-score should be reviewed alongside accuracy.
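
To make the last point concrete, the sketch below computes accuracy, spam recall, spam F1-score, and ROC-AUC from one set of predictions. The labels and scores are hypothetical values chosen for illustration; they are not outputs of the project's models.

```python
from sklearn.metrics import confusion_matrix, f1_score, recall_score, roc_auc_score

# Hypothetical labels (1 = spam) and classifier scores, invented for illustration.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [-2.1, -1.7, -1.5, -1.2, -0.9, -0.6, -0.3, 0.4, -0.2, 1.3]

# Turn scores into hard predictions with a decision threshold of 0.
y_pred = [1 if s > 0 else 0 for s in scores]

# Rows are true classes, columns are predicted classes, in the order [ham, spam].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / len(y_true)
spam_recall = recall_score(y_true, y_pred, pos_label=1)
spam_f1 = f1_score(y_true, y_pred, pos_label=1)
auc = roc_auc_score(y_true, scores)  # threshold-free ranking quality

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} spam_recall={spam_recall:.2f} "
      f"spam_f1={spam_f1:.2f} roc_auc={auc:.4f}")
```

In this toy case accuracy is 0.80 while spam recall and spam F1-score are both 0.50. The ROC-AUC of 0.9375 suggests the scores rank spam above ham better than the fixed threshold makes it appear.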

Implementation and code

Implementation focus

The implementation connects data preparation, modeling, evaluation, and interpretation in one structured workflow, so each technical decision can be traced to the step where it is made.

Source code

The code is available for exploring the implementation details and extending the experiment when needed.

Scope and responsible use

The project focuses on language-data modeling and evaluation. Broader use would require domain-specific validation, edge-case assessment, monitoring, and testing on fresh data.

Future development

  • Evaluate transformer-based spam models against classical baselines.
  • Add calibration and threshold tuning for recall-sensitive use cases.
  • Test robustness on newer SMS, messaging-app, and multilingual datasets.
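
The threshold tuning listed above could start from something like the sketch below. It assumes a classifier that outputs a spam score per message and uses scikit-learn's precision_recall_curve to find the highest threshold that still meets a target spam recall. The labels, scores, target value, and the helper name pick_threshold are all hypothetical.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical labels (1 = spam) and spam scores, invented for illustration.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])
spam_scores = np.array([0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.70, 0.90])


def pick_threshold(y_true, scores, target_recall):
    """Return (threshold, precision, recall) for the highest threshold
    whose spam recall still meets the target."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision and recall carry one extra final point (1, 0) with no threshold.
    meets_target = recall[:-1] >= target_recall
    if not meets_target.any():
        raise ValueError("no threshold reaches the requested recall")
    # thresholds are sorted ascending, so the last qualifying index is the highest.
    idx = np.flatnonzero(meets_target).max()
    return thresholds[idx], precision[idx], recall[idx]


threshold, precision_at, recall_at = pick_threshold(y_true, spam_scores, target_recall=0.95)
print(f"threshold={threshold:.2f} precision={precision_at:.2f} recall={recall_at:.2f}")
```

On these toy scores a 0.5 cut-off would miss one of the three spam messages. Lowering the threshold to 0.40 recovers all three at a precision of 0.75, which is the kind of tradeoff a recall-sensitive deployment would need to weigh on held-out data.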

Technical contribution

The project demonstrates careful evaluation for imbalanced text classification: comparing representations, reading class-level tradeoffs, and identifying a strong baseline workflow for spam detection.