Evidence-grounded diagnostic reasoning experiment
This project studies how retrieval and prompt design affect asthma-related reasoning in a controlled setting. It compares zero-shot, chain-of-thought, and neutral chain-of-thought prompts over synthetic patient cases, using local retrieval and automatic evaluation metrics.
Challenge
- Clinical-style reasoning can be biased by leading prompts.
- Asthma-like symptoms need comparison against alternative explanations such as COPD-like cases.
- A controlled experiment needs synthetic cases, retrieved context, and metrics that reveal both fluency and repetition.
System architecture
Data and inputs
The inputs are a custom asthma knowledge base, split into recursive chunks and embedded with a sentence-transformer model; a FAISS vector store built over those embeddings; and 10 synthetic patient cases, both positive and negative.
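As a rough illustration of the chunk-then-retrieve flow, the sketch below splits a tiny placeholder knowledge base recursively and ranks the chunks against a query. It is a stand-in under stated assumptions, not the project's code: the separator list, `max_chars`, and the sample text are hypothetical, and a bag-of-words cosine score replaces the sentence-transformer embeddings and FAISS index so that the example runs with the standard library only.

```python
import math
import re
from collections import Counter

# Coarse-to-fine split points (an assumed order, not taken from the project).
SEPARATORS = ["\n\n", "\n", " "]

def recursive_chunks(text, max_chars=200, separators=SEPARATORS):
    """Split text recursively so that every returned chunk fits in max_chars."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    if not separators:
        # Nothing left to split on: fall back to a hard cut.
        return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        pieces.extend(recursive_chunks(part, max_chars, finer))
    # Greedily merge neighbouring pieces back together while they still fit.
    merged = []
    for piece in pieces:
        if merged and len(merged[-1]) + 1 + len(piece) <= max_chars:
            merged[-1] = merged[-1] + " " + piece
        else:
            merged.append(piece)
    return merged

def bow(text):
    """Bag-of-words vector; a toy replacement for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force search)."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]

# Placeholder text for illustration only; not the project's knowledge base.
KB = (
    "Asthma often presents with wheeze and cough that vary over time.\n\n"
    "Spirometry with bronchodilator reversibility supports an asthma diagnosis.\n\n"
    "COPD is more likely with a long smoking history and persistent airflow limitation."
)

chunks = recursive_chunks(KB, max_chars=90)
print(retrieve("wheeze and cough at night", chunks, k=1))
```

In a setup like the one described, `bow` and `cosine` would be replaced by dense embeddings and a vector index lookup; the chunking and top-k interface would stay much the same.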
Technical approach
- Build a local retrieval index for asthma-related context.
- Compare zero-shot, chain-of-thought, and neutral chain-of-thought prompting.
- Evaluate responses with BLEU, ROUGE-L, METEOR, Distinct-2, perplexity, and Self-BLEU.
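The three prompting styles above might be templated roughly as follows. The wording of each template is hypothetical, as are the names `PROMPT_TEMPLATES` and `build_prompt`; the point is only that all three styles share the same retrieved context and case text, so the instruction is the single variable that changes.

```python
# Hypothetical prompt wording for illustration; the project's actual prompts may differ.
PROMPT_TEMPLATES = {
    "zero_shot": (
        "Context:\n{context}\n\nPatient case:\n{case}\n\n"
        "Is this presentation consistent with asthma? Answer briefly."
    ),
    "chain_of_thought": (
        "Context:\n{context}\n\nPatient case:\n{case}\n\n"
        "Think step by step about whether this patient has asthma, "
        "then give your conclusion."
    ),
    "neutral_chain_of_thought": (
        "Context:\n{context}\n\nPatient case:\n{case}\n\n"
        "List the key findings, weigh asthma against alternative explanations "
        "such as COPD, and only then state the most likely explanation."
    ),
}

def build_prompt(style, case, context_chunks):
    """Fill the chosen template with the case text and the retrieved chunks."""
    if style not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown prompting style: {style!r}")
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return PROMPT_TEMPLATES[style].format(context=context, case=case)

# Usage with a made-up case and a single made-up context chunk.
example = build_prompt(
    "neutral_chain_of_thought",
    case="58-year-old with chronic cough and a long smoking history.",
    context_chunks=["Asthma symptoms typically vary over time."],
)
print(example)
```

The neutral variant avoids naming asthma as the expected answer, which is one plausible way to reduce the leading-prompt bias noted under Challenge.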
Evaluation and results
Experiment at a glance: 10 synthetic patient cases, 3 prompting styles, scored with BLEU, ROUGE-L, METEOR, and Self-BLEU.
- Neutral chain-of-thought improved objectivity in a COPD-like negative case.
- Perplexity, Self-BLEU, and Distinct-2 added useful signals beyond lexical overlap metrics.
- The project highlights that lexical metrics alone are not enough for judging reasoning quality.
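To make the non-overlap signals mentioned above more concrete, here is a minimal sketch of Distinct-2, a simplified Self-BLEU proxy, and perplexity. These are assumptions for illustration, not the project's implementation: tokenization is plain whitespace splitting, `self_bleu2_proxy` uses only clipped bigram precision (full Self-BLEU normally combines several n-gram orders with a brevity penalty), and `perplexity` expects natural-log token probabilities that a language model would have to supply.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def distinct_n(texts, n=2):
    """Share of unique n-grams among all n-grams; higher means less repetition."""
    all_ngrams = [g for t in texts for g in ngrams(t.lower().split(), n)]
    return len(set(all_ngrams)) / len(all_ngrams) if all_ngrams else 0.0

def clipped_bigram_precision(hypothesis, references):
    """Fraction of hypothesis bigrams that also occur in a reference (clipped counts)."""
    hyp_counts = Counter(ngrams(hypothesis, 2))
    if not hyp_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, 2)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

def self_bleu2_proxy(texts):
    """Mean bigram precision of each text against all the others.

    A simplified stand-in for Self-BLEU: higher means the responses resemble each other more.
    """
    tokenized = [t.lower().split() for t in texts]
    if len(tokenized) < 2:
        return 0.0
    scores = [
        clipped_bigram_precision(hyp, tokenized[:i] + tokenized[i + 1:])
        for i, hyp in enumerate(tokenized)
    ]
    return sum(scores) / len(scores)

def perplexity(token_logprobs):
    """exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Made-up responses, used only to exercise the functions.
responses = [
    "the findings suggest asthma is likely",
    "the findings suggest copd is likely",
]
print(distinct_n(responses), self_bleu2_proxy(responses))
```

Read together, a low Distinct-2 with a high self-overlap score would point to templated, repetitive answers even when BLEU or ROUGE-L against a reference looks acceptable.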
Implementation and code
Implementation focus
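The implementation runs as one structured workflow: prepare and chunk the knowledge base, build the retrieval index, generate responses under each prompting style, score them with the evaluation metrics, and interpret the results. Each stage is kept separate so the technical decisions behind it stay visible.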
Source code
The source code is available for exploring the implementation and extending the experiment.
Scope and responsible use
This project demonstrates modeling and evaluation on synthetic, health-related cases and is not intended for clinical decision-making. Any clinical use would require external validation, expert review, calibration, and regulatory oversight.
Future development
- Add stronger clinical reasoning rubrics for evaluation.
- Compare more retrieval strategies and larger local models.
- Separate citation support from final-answer fluency.
Technical contribution
The project connects RAG, prompt design, diagnostic-style reasoning, and evaluation discipline in a safety-sensitive setting.