GovRAG Copilot

Generative AI & RAG | May 9, 2026 | Published project

Evidence-grounded regulatory RAG system

GovRAG Copilot is a bilingual (Arabic and English) Retrieval-Augmented Generation system built around Saudi Arabia's Personal Data Protection Law (PDPL) and guidance from the Saudi Data and AI Authority (SDAIA). It focuses on traceable answers, structured compliance drafting, gap detection, and visual outputs, all grounded in source evidence.

Python, RAG, BM25, TF-IDF, Qwen2.5, Gradio, Matplotlib

Challenge

PDPL provisions and related guidance are often spread across separate Arabic and English regulatory documents.
Users need answers that point back to source passages rather than unsupported model text.
Compliance workflows often require structured outputs such as privacy notices, ROPA entries, breach drafts, and transfer assessments.

System architecture

Regulatory sources: PDPL and SDAIA documents
Article chunks: Arabic normalization and metadata
Hybrid retrieval: BM25 + TF-IDF
Grounded output: Q&A, drafting, visuals, citations

Data and inputs

Arabic and English regulatory documents split into article-aware chunks.
Page, article, and citation metadata attached to retrieved passages.
A controlled query flow that normalizes Arabic text and expands compliance terminology.
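The inputs described above can be sketched in a few lines. This is a minimal illustration only: the project does not state its exact normalization rules, glossary, or chunk schema, so the folding table, the `EXPANSIONS` entries, and the `Chunk` field names are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Arabic diacritics (U+064B..U+0652), superscript alef (U+0670), tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Common letter-variant folding. Assumed rules, not taken from the project.
_FOLD = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics, fold letter variants, and collapse whitespace."""
    text = _DIACRITICS.sub("", text)
    text = text.translate(_FOLD)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical compliance glossary: short query terms mapped to fuller phrases.
EXPANSIONS = {
    "ropa": ["record of processing activities"],
    "breach": ["personal data breach", "incident notification"],
    "transfer": ["cross-border transfer"],
}


def expand_query(query: str) -> str:
    """Normalize the query, then append glossary phrases for known terms."""
    normalized = normalize_arabic(query).lower()
    extra = []
    for token in normalized.split():
        for phrase in EXPANSIONS.get(token, []):
            if phrase not in normalized and phrase not in extra:
                extra.append(phrase)
    return " ".join([normalized] + extra)


@dataclass(frozen=True)
class Chunk:
    """One article-level chunk with the metadata needed to cite it (assumed schema)."""
    text: str
    source: str
    article: str
    page: int
    language: str

    def citation(self) -> str:
        return f"{self.source}, Article {self.article}, p. {self.page}"
```

Normalizing both the indexed chunks and the incoming query with the same function keeps spelling variants from splitting matches, and carrying the metadata on each chunk lets every retrieved passage produce its own citation string.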

Technical approach

Ingestion pipeline prepares source documents and stores article-level chunks.
Hybrid retrieval combines BM25 and TF-IDF to balance exact legal terms with broader topical matches.
Generation layer supports extractive, local Ollama, and HuggingFace/Qwen-style backends.
Gradio interface separates question answering, drafting, gap detection, search inspection, visuals, and project information.
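The hybrid retrieval step above can be illustrated with a small standard-library sketch. It is not the project's code: the BM25 parameters `k1` and `b`, the min-max normalization, the `alpha` fusion weight, and all function names are assumptions chosen for the example, and the project may fuse the two scores differently or use library implementations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # \w is Unicode-aware in Python 3, so Arabic and English tokens both match.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def tfidf_scores(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vec(toks):
        tf = Counter(t for t in toks if t in idf)  # drop out-of-vocabulary terms
        return {t: c * idf[t] for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(tokenize(query))
    return [cosine(q, vec(toks)) for toks in tokenized]


def minmax(xs):
    """Rescale scores to [0, 1] so the two signals are comparable before fusion."""
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_search(query, docs, alpha=0.5, top_k=3):
    """Weighted sum of normalized BM25 and TF-IDF scores, best first."""
    bm = minmax(bm25_scores(query, docs))
    tv = minmax(tfidf_scores(query, docs))
    fused = [alpha * s_bm + (1 - alpha) * s_tv for s_bm, s_tv in zip(bm, tv)]
    ranked = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return [(i, fused[i]) for i in ranked[:top_k]]


# Invented placeholder passages, not quotations from PDPL or SDAIA documents.
docs = [
    "The controller shall notify the competent authority of a personal data breach.",
    "Personal data may be transferred outside the Kingdom under specific conditions.",
    "The data subject has the right to access personal data.",
]
results = hybrid_search("breach notification", docs)
```

Raising `alpha` favors BM25, which rewards exact and rare legal terms with length normalization; lowering it favors the TF-IDF cosine signal, which reflects overall term overlap with the passage.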

Evaluation and results

Key indicators

238 article-level chunks
190 passing tests
5 Gradio workflow tabs

Used 190 automated tests to check important behaviors and template outputs.
Tracked document coverage through 238 article-level chunks.
Focused evaluation on retrieval quality, citation discipline, faithfulness, completeness, and bilingual consistency.
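Citation discipline, one of the evaluation criteria above, lends itself to a simple automated check. The sketch below assumes a hypothetical inline marker format such as `[C1]` and a hypothetical `citation_report` helper; the project's actual citation format and test logic are not described here.

```python
import re

# Assumed marker format: [C1], [C2], ... Each ID refers to a retrieved chunk.
CITE = re.compile(r"\[(C\d+)\]")

# Split after sentence-ending punctuation, including the Arabic question mark.
SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def citation_report(answer, retrieved_ids):
    """Flag sentences with no citation and citations that match no retrieved chunk."""
    sentences = [s.strip() for s in SENTENCE_END.split(answer) if s.strip()]
    uncited = [s for s in sentences if not CITE.search(s)]
    unknown = sorted(set(CITE.findall(answer)) - set(retrieved_ids))
    return {
        "sentences": len(sentences),
        "uncited_sentences": uncited,
        "unknown_citations": unknown,
        "passes": not uncited and not unknown,
    }


# Invented example answer: one sentence cites a chunk that was never retrieved,
# and one sentence carries no citation at all.
report = citation_report(
    "Controllers must keep records [C1]. Transfers need a legal basis [C3]. "
    "This is unsupported.",
    retrieved_ids={"C1", "C2"},
)
```

A check like this only confirms that citations are present and point at retrieved passages. Whether the cited passage actually supports the sentence is a separate faithfulness question.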

Implementation and code

Implementation focus

The implementation connects ingestion, hybrid retrieval, grounded generation, and evaluation in one structured workflow that keeps the technical decisions visible.

Source code

The code is available for exploring the implementation details and extending the experiment when needed.

Scope and responsible use

The project is a focused modeling and evaluation study. Broader use should be supported by validation on additional data, robustness checks, monitoring, and domain-specific evaluation.

Future development

Add a formal retrieval benchmark with labeled question-answer-citation pairs.
Add freshness checks for updated PDPL/SDAIA documents.
Improve answer evaluation with stronger faithfulness and citation-support scoring.
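The retrieval benchmark proposed in the first item could be scored with standard ranking metrics. The sketch below is one possible shape, assuming each labeled question lists the chunk IDs that should be retrieved; `LabeledQuery`, `evaluate`, and the stand-in `fake_retrieve` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledQuery:
    """One benchmark item: a question and the chunk IDs a correct answer cites."""
    question: str
    relevant_ids: frozenset


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant chunk, or 0.0 if none is retrieved."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(benchmark, retrieve, k=5):
    """Average recall@k and MRR over the benchmark; retrieve(question) returns ranked IDs."""
    if not benchmark:
        return {"recall@k": 0.0, "mrr": 0.0}
    recalls, rrs = [], []
    for item in benchmark:
        ranked = retrieve(item.question)
        recalls.append(recall_at_k(ranked, item.relevant_ids, k))
        rrs.append(reciprocal_rank(ranked, item.relevant_ids))
    n = len(benchmark)
    return {"recall@k": sum(recalls) / n, "mrr": sum(rrs) / n}


# Toy data with placeholder IDs; a real benchmark would use labeled PDPL/SDAIA chunks.
benchmark = [
    LabeledQuery("q1", frozenset({"a1"})),
    LabeledQuery("q2", frozenset({"b1", "b2"})),
]
fake_rankings = {"q1": ["a2", "a1", "a3"], "q2": ["b1", "x", "b2"]}


def fake_retrieve(question):
    return fake_rankings[question]


metrics = evaluate(benchmark, fake_retrieve, k=2)
```

Recall@k shows whether the right articles reach the generation step at all, while MRR shows how high the first correct article ranks, so the two together separate coverage problems from ordering problems.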

Technical contribution

GovRAG brings together practical RAG engineering that goes beyond a general chat assistant: ingestion, retrieval design, multilingual handling, citation discipline, workflow UI design, visual outputs, and evaluation.