AI Evaluation & Robustness · Mar 7, 2025 · Published project
Breast Cancer Naive Bayes Classification

Health-data classification workflow

This project builds a supervised classification workflow for diagnostic tabular data. It focuses on exploratory analysis, feature scaling, probabilistic modeling, and evaluation that distinguishes overall accuracy from class-level recall and precision.

Python · Scikit-learn · Gaussian Naive Bayes · Pandas · EDA · Feature scaling

Challenge

  • Clinical-style tabular datasets require careful handling: a missed malignant case is usually costlier than a false alarm, so class-level recall can matter more than a single accuracy value.
  • Numerical diagnostic features need exploration, scaling, and interpretation before model results can be trusted.
  • A lightweight baseline is useful for understanding whether meaningful class separation exists before introducing more complex models.
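To make the accuracy-versus-recall point concrete, here is a small illustrative sketch. The labels are hypothetical (10 malignant and 90 benign cases, not the project's data), and the "classifier" simply predicts benign for everyone.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels for illustration only: 1 = malignant, 0 = benign.
y_true = [1] * 10 + [0] * 90

# A degenerate classifier that predicts "benign" for every sample.
y_pred = [0] * 100

accuracy = accuracy_score(y_true, y_pred)
# Recall for the malignant class: found malignant cases / all malignant cases.
malignant_recall = recall_score(y_true, y_pred, pos_label=1)

print(f"accuracy={accuracy:.2f}, malignant recall={malignant_recall:.2f}")
# prints: accuracy=0.90, malignant recall=0.00
```

The accuracy looks respectable even though every malignant case is missed, which is why class-level metrics are read alongside accuracy.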

System architecture

Diagnostic features: numerical measurements
EDA and scaling: distribution review
Probabilistic model: Gaussian NB
Evaluation: accuracy + recall

Data and inputs

  • Breast Cancer Wisconsin dataset from scikit-learn.
  • 569 samples with 30 numerical diagnostic features.
  • Binary target with benign and malignant classes.
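As a minimal sketch, the dataset can be loaded and inspected as follows. The choice of `as_frame=True` and the two summary views are assumptions for illustration, not necessarily what the project does. Note that scikit-learn encodes malignant as label 0 and benign as label 1.

```python
from sklearn.datasets import load_breast_cancer

# Load features and target as pandas objects.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

print(X.shape)                   # (569, 30)
print(list(data.target_names))   # ['malignant', 'benign'] -> label 0 is malignant

# Class balance, using readable names instead of 0/1.
class_counts = y.map({0: "malignant", 1: "benign"}).value_counts()
print(class_counts)

# A simple EDA view: features most strongly correlated with the target.
top_corr = X.corrwith(y).abs().sort_values(ascending=False).head(5)
print(top_corr)
```

The class counts show a moderate imbalance (357 benign, 212 malignant), which is one reason to stratify any train/test split.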

Technical approach

  • Review feature distributions, correlations, and class patterns.
  • Scale numerical features before model training.
  • Train Gaussian Naive Bayes as a fast probabilistic baseline.
  • Evaluate accuracy, precision, recall, F1-score, and confusion matrix behavior.
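The steps above can be sketched as a single scikit-learn pipeline. This is a minimal sketch, not the project's exact code: the 80/20 stratified split and `random_state=42` are assumptions, since the original split is not stated, so the printed numbers may differ from the reported results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Assumed split: 80/20, stratified by class, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The pipeline fits the scaler on training data only, avoiding test-set leakage.
model = make_pipeline(StandardScaler(), GaussianNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes (0 = malignant, 1 = benign).
print(confusion_matrix(y_test, y_pred, labels=[0, 1]))
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```

Wrapping the scaler and the classifier in one pipeline keeps the preprocessing tied to the model, so the same object can be reused later for cross-validation.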

Evaluation and results

Key indicators

  • 569 samples
  • 30 diagnostic features
  • 96% test accuracy
  • 0.99 malignant recall

  • The model achieved 96% test accuracy.
  • The malignant class reached 0.99 recall in the reported evaluation.
  • Train and test accuracy were close, suggesting no severe overfitting in this baseline workflow.
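A hedged sketch of how these checks can be computed: the train/test accuracy gap, and malignant recall read directly from the confusion matrix. The split settings are assumed (80/20, stratified, `random_state=42`), so the values this prints are not expected to match the reported 96% and 0.99 exactly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # assumed split settings
)
model = make_pipeline(StandardScaler(), GaussianNB()).fit(X_train, y_train)
y_pred = model.predict(X_test)

# A small gap between these two suggests the baseline is not badly overfitting.
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")

# In scikit-learn's encoding malignant is label 0, so it is row 0 of the matrix.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tp_malignant, fn_malignant = cm[0, 0], cm[0, 1]
malignant_recall = tp_malignant / (tp_malignant + fn_malignant)
print(f"malignant recall={malignant_recall:.3f}")

# The same value via the library helper, with the positive class set explicitly.
assert abs(malignant_recall - recall_score(y_test, y_pred, pos_label=0)) < 1e-12
```

Setting `pos_label=0` matters here: the default positive label is 1, which in this dataset would report benign recall instead.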

Implementation and code

Implementation focus

The implementation connects data preparation, modeling, evaluation, and interpretation in a structured workflow that makes the technical decisions clear.

Source code

The code is available for exploring the implementation details and extending the experiment when needed.


Scope and responsible use

This project demonstrates modeling and evaluation on health-related data and is not intended for clinical decision-making. Any clinical use would require external validation, expert review, calibration, and regulatory oversight.

Future development

  • Compare additional models and calibrated probability outputs.
  • Add explainability views for influential diagnostic features.
  • Evaluate robustness across external datasets and different train/test splits.
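One possible way to approach the split-robustness item is repeated stratified cross-validation, sketched below. The fold count, repeat count, and seed are illustrative assumptions, and this checks variation across splits of the same dataset only, not external datasets.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), GaussianNB())

# 5 folds repeated 10 times gives 50 different train/test partitions.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

scoring = {
    "accuracy": "accuracy",
    # Malignant is label 0 in scikit-learn's encoding of this dataset.
    "malignant_recall": make_scorer(recall_score, pos_label=0),
}
scores = cross_validate(model, X, y, cv=cv, scoring=scoring)

for name in scoring:
    values = scores[f"test_{name}"]
    print(f"{name}: mean={values.mean():.3f}, std={values.std():.3f}")
```

The standard deviation across folds gives a rough sense of how much a single reported test score could move with a different split.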

Technical contribution

The project demonstrates disciplined model evaluation for sensitive tabular classification: exploring the data, building an interpretable baseline, and reading class-level metrics instead of relying on accuracy alone.