Evidence-grounded regulatory RAG system
GovRAG Copilot is a bilingual (Arabic and English) Retrieval-Augmented Generation system built around Saudi Arabia's Personal Data Protection Law (PDPL) and guidance from the Saudi Data and AI Authority (SDAIA). It focuses on traceable answers, structured compliance drafting, gap detection, and visual outputs, all grounded in source evidence.
Challenge
- PDPL and related guidance are spread across Arabic and English regulatory documents.
- Users need answers that point back to source passages rather than unsupported model text.
- Compliance workflows often require structured outputs such as privacy notices, ROPA entries, breach drafts, and transfer assessments.
System architecture
Data and inputs
- Arabic and English regulatory documents split into article-aware chunks.
- Page, article, and citation metadata attached to retrieved passages.
- A controlled query flow that normalizes Arabic text and expands compliance terminology.
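The input handling above can be sketched as follows. This is a minimal illustration and not the project's actual code: the heading pattern, the `Chunk` fields, the normalization rules, and the `_EXPANSIONS` table are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass

# Arabic diacritics (tashkeel), superscript alef, and the tatweel elongation mark.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")

def normalize_arabic(text: str) -> str:
    """Lossy normalization commonly used for Arabic search (assumed rules)."""
    text = _DIACRITICS.sub("", text)
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> yaa
    text = text.replace("\u0629", "\u0647")  # taa marbuta -> haa
    return re.sub(r"\s+", " ", text).strip()

@dataclass
class Chunk:
    article: str  # article number as it appears in the heading
    page: int     # page the chunk was extracted from
    text: str     # body text under the heading

# Hypothetical heading pattern: "Article 12" or "المادة 12" at the start of a line.
_ARTICLE_HEADING = re.compile(r"^(?:Article|المادة)\s+(\d+)\b.*$", re.MULTILINE)

def split_by_article(page_text: str, page: int) -> list[Chunk]:
    """Cut a page into one chunk per article heading, keeping metadata."""
    matches = list(_ARTICLE_HEADING.finditer(page_text))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(page_text)
        chunks.append(Chunk(article=m.group(1), page=page,
                            text=page_text[m.end():end].strip()))
    return chunks

# Hypothetical expansion table; a real one would be curated by domain experts.
_EXPANSIONS = {
    "ropa": ["records of processing activities"],
    "dpo": ["data protection officer"],
}

def expand_query(query: str) -> str:
    """Append known long forms for compliance acronyms found in the query."""
    terms = [query]
    for token in query.lower().split():
        terms.extend(_EXPANSIONS.get(token, []))
    return " ".join(terms)
```

In this sketch, chunking runs before normalization, because the taa marbuta rule would otherwise rewrite the Arabic heading word and the pattern would no longer match.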
Technical approach
- Ingestion pipeline prepares source documents and stores article-level chunks.
- Hybrid retrieval combines BM25 and TF-IDF to balance exact legal terms with broader topical matches.
- Generation layer supports three kinds of backend: extractive, local Ollama, and Hugging Face (Qwen-style) models.
- Gradio interface separates question answering, drafting, gap detection, search inspection, visuals, and project information.
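One way the hybrid retrieval step could work is shown below in pure Python. The source does not say how the BM25 and TF-IDF scores are combined, so the min-max normalization, the weighted sum, the `alpha` parameter, and the whitespace tokenizer are assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplistic whitespace tokenizer; a real system would normalize Arabic first.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def tfidf_cosine_scores(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for d in tokenized for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}  # smoothed idf

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, c):
        dot = sum(w * c.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nc = math.sqrt(sum(w * w for w in c.values()))
        return dot / (na * nc) if na and nc else 0.0

    q = vec(tokenize(query))
    return [cosine(q, vec(d)) for d in tokenized]

def min_max(scores):
    # Rescale to [0, 1] so the two score families are comparable before fusion.
    lo, hi = min(scores), max(scores)
    return [0.0] * len(scores) if hi == lo else [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(query, docs, alpha=0.5):
    """Return document indices, best first; alpha weights BM25 against TF-IDF."""
    bm = min_max(bm25_scores(query, docs))
    tv = min_max(tfidf_cosine_scores(query, docs))
    fused = [alpha * x + (1 - alpha) * y for x, y in zip(bm, tv)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
```

Setting `alpha` closer to 1 favors BM25's length-normalized term matching, while values closer to 0 favor the TF-IDF cosine score. Reciprocal rank fusion would be a reasonable alternative to the weighted sum.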
Evaluation and results
Key figures: 238 article-level chunks, 190 passing tests, 5 Gradio workflow tabs.
- Used automated tests to check core behaviors and the outputs of the drafting templates.
- Tracked document coverage through 238 article-level chunks.
- Focused evaluation on retrieval quality, citation discipline, faithfulness, completeness, and bilingual consistency.
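Citation discipline lends itself to a simple automated check: every article cited in an answer should be among the passages that were actually retrieved. The sketch below assumes a hypothetical `[Article N]` citation format and string article identifiers; the project's real citation format is not described in the source.

```python
import re

# Hypothetical inline citation format: "[Article 12]".
_CITATION = re.compile(r"\[Article\s+(\d+)\]")

def unsupported_citations(answer: str, retrieved_articles: set[str]) -> list[str]:
    """Return cited article numbers that are missing from the retrieved set."""
    cited = _CITATION.findall(answer)
    return sorted({a for a in cited if a not in retrieved_articles}, key=int)

answer = "Claim one [Article 6]. Claim two [Article 20]."
missing = unsupported_citations(answer, retrieved_articles={"6", "10"})
```

An empty result means every citation points at retrieved evidence. It does not show that the cited passage supports the claim, which would need a separate faithfulness check.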
Implementation and code
Implementation focus
The implementation connects ingestion, retrieval, generation, and evaluation in a structured workflow that makes the technical decisions at each stage clear.
Source code
The source code is available for exploring implementation details and extending the system when needed.
Scope and responsible use
The project is a focused retrieval, generation, and evaluation study. Broader use should be supported by validation on additional documents, robustness checks, monitoring, and domain-specific evaluation.
Future development
- Add a formal retrieval benchmark with labeled question-answer-citation pairs.
- Add freshness checks for updated PDPL/SDAIA documents.
- Improve answer evaluation with stronger faithfulness and citation-support scoring.
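A labeled retrieval benchmark of the kind proposed above could be scored with recall@k and mean reciprocal rank. The sketch below assumes a hypothetical benchmark layout (a `question` string plus a set of `relevant` chunk ids) and a `retrieve` callable that returns ranked chunk ids. None of these names come from the project.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant chunks that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(benchmark, retrieve, k=5):
    """Average both metrics over a non-empty list of labeled questions."""
    recalls, rrs = [], []
    for item in benchmark:
        ranked = retrieve(item["question"])
        recalls.append(recall_at_k(ranked, item["relevant"], k))
        rrs.append(reciprocal_rank(ranked, item["relevant"]))
    return {"recall@k": sum(recalls) / len(recalls), "mrr": sum(rrs) / len(rrs)}

# Toy benchmark and a stand-in retriever, for illustration only.
benchmark = [
    {"question": "q1", "relevant": {"a"}},
    {"question": "q2", "relevant": {"b", "c"}},
]
fake_results = {"q1": ["x", "a", "y"], "q2": ["b", "z", "w"]}
report = evaluate(benchmark, lambda q: fake_results[q], k=2)
```

Tracking these numbers across changes to chunking or score fusion would turn retrieval quality into a regression signal alongside the existing automated tests.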
Technical contribution
GovRAG Copilot brings together practical RAG engineering that goes beyond a general chat assistant: ingestion, retrieval design, multilingual handling, citation discipline, workflow UI design, visual outputs, and evaluation.