Clinical tabular modeling and diagnostic evaluation
This project builds a neural-network classifier for heart-disease prediction using structured clinical attributes. It emphasizes preprocessing, tuning, ROC/AUC evaluation, and error analysis rather than treating accuracy as the only success signal.
Challenge
- Tabular clinical data requires careful encoding, scaling, and split discipline.
- A useful classifier needs balanced class-level evaluation, not only overall accuracy.
- Health-related modeling should keep evaluation results clearly separated from clinical decision-making.
System architecture
Data and inputs
- Kaggle Heart Disease dataset based on UCI-style clinical attributes.
- 1,025 records, 14 original attributes, and a binary disease/no-disease target.
- Final encoded/scaled feature matrix with 27 features and a 70/10/20 train-validation-test split.
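A minimal sketch of a 70/10/20 split. It assumes plain shuffled row indices; the helper name `split_indices`, the seed, and the integer rounding rule are illustrative choices, not the project's code. With integer percentages, 1,025 rows give a 205-row test set, which matches the 205 test samples reported in the results. Stratifying by the disease label is a common refinement that is not shown here.

```python
import random

def split_indices(n_rows, train_pct=70, val_pct=10, seed=0):
    """Shuffle row indices once and cut them into train/validation/test lists.

    Hypothetical helper: validation and test sizes are rounded down with
    integer arithmetic, and any remainder goes to the training split.
    """
    test_pct = 100 - train_pct - val_pct
    n_test = n_rows * test_pct // 100
    n_val = n_rows * val_pct // 100
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1025)
print(len(train_idx), len(val_idx), len(test_idx))  # → 718 102 205
```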
Technical approach
- One-hot encode categorical variables and standardize numerical features.
- Train DNN variants with early stopping and validation monitoring.
- Compare baseline, improved, dropout, L2, and batch-normalized variants.
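The encoding and scaling step can be sketched in plain Python to show what split discipline means in practice. Category vocabularies and the mean and standard deviation used for z-scoring are learned from the training rows only and then reused unchanged for validation and test rows. The helper names and toy records are hypothetical. The column names `age`, `chol`, and `cp` follow the usual UCI heart-disease naming but are assumptions here. scikit-learn offers the same behavior through `OneHotEncoder(handle_unknown="ignore")` and `StandardScaler`; the text does not state which tools the project used.

```python
import statistics

def fit_preprocessor(train_rows, categorical, numerical):
    """Learn one-hot vocabularies and z-score parameters from training rows only."""
    vocab = {col: sorted({row[col] for row in train_rows}) for col in categorical}
    scale = {}
    for col in numerical:
        values = [row[col] for row in train_rows]
        # Population std (ddof=0); fall back to 1.0 for a constant column.
        scale[col] = (statistics.mean(values), statistics.pstdev(values) or 1.0)
    return vocab, scale

def transform(row, vocab, scale):
    """Turn one raw record into a flat feature vector using the fitted parameters."""
    features = [(row[col] - mean) / std for col, (mean, std) in scale.items()]
    for col, levels in vocab.items():
        # A category never seen in training maps to an all-zero block.
        features.extend(1.0 if row[col] == level else 0.0 for level in levels)
    return features

# Hypothetical toy records; the real dataset has more columns and rows.
train_rows = [
    {"age": 50, "chol": 200, "cp": 0},
    {"age": 60, "chol": 240, "cp": 2},
    {"age": 40, "chol": 220, "cp": 1},
]
vocab, scale = fit_preprocessor(train_rows, categorical=["cp"], numerical=["age", "chol"])
test_row = {"age": 55, "chol": 230, "cp": 2}
print(transform(test_row, vocab, scale))  # two scaled numbers, then one-hot [0.0, 0.0, 1.0]
```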
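Early stopping with validation monitoring reduces to a patience counter over per-epoch validation losses. The sketch below replays a hypothetical loss curve instead of training a network. Its `patience` and `min_delta` parameters mirror those of Keras's `tf.keras.callbacks.EarlyStopping`, although the text does not say which framework or settings the project used.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve the best
    loss by more than `min_delta`; the weights from best_epoch are the ones kept.
    Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    # Ran out of epochs without triggering the stop condition.
    return best_epoch, len(val_losses) - 1

# Hypothetical validation-loss curve, not the project's training log.
losses = [0.69, 0.52, 0.41, 0.37, 0.36, 0.37, 0.38, 0.36, 0.39]
print(early_stopping_epoch(losses, patience=3))  # → (4, 7)
```

Restoring the weights from `best_epoch`, rather than keeping the last epoch's weights, is what prevents the extra patience epochs from degrading the final model.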
Evaluation and results
Key figures: 1,025 clinical-style records, test accuracy 0.9659, AUC 0.9813.
- Best model reached 0.9659 test accuracy, 0.9658 weighted F1, and 0.9813 AUC.
- Disease-class recall reached 0.9905, and only 7 of the 205 test samples were misclassified.
- The analysis reviews both aggregate metrics and the small set of errors.
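A short calculation confirms that the reported figures are internally consistent. Treat disease as the positive class. Then 205 test samples with 7 errors and a disease recall of 0.9905 (to four decimals) allow only one confusion matrix: 104 true positives, 1 false negative, 6 false positives, and 94 true negatives. These counts are inferred from the text rather than taken from the project's output, and the helper `binary_report` is hypothetical.

```python
def binary_report(tp, fp, fn, tn):
    """Accuracy, disease-class recall, and support-weighted F1 from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # F1 for the disease (positive) class and for the no-disease (negative) class;
    # for the negative class, fn acts as its false positives and fp as its false negatives.
    f1_pos = 2 * tp / (2 * tp + fp + fn)
    f1_neg = 2 * tn / (2 * tn + fn + fp)
    support_pos, support_neg = tp + fn, tn + fp
    weighted_f1 = (support_pos * f1_pos + support_neg * f1_neg) / total
    recall_disease = tp / support_pos
    return {"accuracy": accuracy, "weighted_f1": weighted_f1, "recall_disease": recall_disease}

# Counts inferred from the reported metrics (see lead-in), not read from project logs.
report = binary_report(tp=104, fp=6, fn=1, tn=94)
print({k: round(v, 4) for k, v in report.items()})
# → {'accuracy': 0.9659, 'weighted_f1': 0.9658, 'recall_disease': 0.9905}
```

All three rounded values match the figures reported above, which makes it likely that the 7 errors are 6 false alarms and a single missed disease case.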
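AUC has a direct probabilistic reading. It is the chance that a randomly chosen disease case receives a higher predicted score than a randomly chosen no-disease case, with ties counting as half. The pairwise sketch below computes exactly that on toy scores, not the project's predictions. It is quadratic in the number of samples, which is fine at 205 test rows, and `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win, which equals the area under the empirical ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = disease) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.10]
print(roc_auc(labels, scores))  # 8 of 9 positive-negative pairs ranked correctly, about 0.889
```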
Implementation and code
Implementation focus
The implementation connects data preparation, modeling, evaluation, and interpretation in one structured workflow, so that each technical decision (encoding, scaling, model variant, and evaluation metric) is explicit and easy to trace.
Source code
The source code is available for exploring the implementation details and extending the experiments.
Scope and responsible use
This project demonstrates modeling and evaluation on health-related data and is not intended for clinical decision-making. Any clinical use would require external validation, expert review, calibration, and regulatory oversight.
Future development
- Add external validation on another heart-disease dataset.
- Compare tree-based models and calibrated probabilities.
- Expand error analysis with feature-level interpretation.
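One starting point for the probability-calibration comparison is to report the Brier score together with a binned calibration error. Predicted disease probabilities are grouped into equal-width bins, and each bin's mean prediction is compared with its observed disease rate, which is the quantity a reliability diagram plots. The helper `calibration_summary` and its five-bin default are illustrative assumptions. scikit-learn provides `brier_score_loss` and `calibration_curve` for the same measurements.

```python
def calibration_summary(labels, probs, n_bins=5):
    """Brier score and a binned calibration error for positive-class probabilities."""
    n = len(labels)
    brier = sum((p - y) ** 2 for y, p in zip(labels, probs)) / n
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    calib_error = 0.0
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed_rate = sum(y for y, _ in members) / len(members)
            # Weight each bin's gap by the share of samples it holds.
            calib_error += len(members) / n * abs(mean_pred - observed_rate)
    return brier, calib_error

# Hypothetical labels (1 = disease) and predicted probabilities.
labels = [1, 0, 1, 1, 0, 0]
probs = [0.92, 0.15, 0.78, 0.55, 0.35, 0.08]
print(calibration_summary(labels, probs))
```

Lower is better for both numbers. A model can have a high AUC and still be poorly calibrated, because AUC only depends on how the scores rank the samples.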
Technical contribution
The project demonstrates disciplined supervised modeling on sensitive tabular data: preprocessing, tuning, diagnostic metrics, and responsible interpretation.