Generative AI & RAG | May 9, 2026 | Published project
GovRAG Copilot

Evidence-grounded regulatory RAG system

GovRAG Copilot is a bilingual (Arabic and English) Retrieval-Augmented Generation (RAG) system designed around Saudi Arabia's Personal Data Protection Law (PDPL) and guidance from the Saudi Data and AI Authority (SDAIA). It focuses on traceable answers, structured compliance drafting, gap detection, and visual outputs grounded in source evidence.

Python, RAG, BM25, TF-IDF, Qwen2.5, Gradio, Matplotlib

Challenge

  • PDPL and related guidance can be distributed across Arabic and English regulatory documents.
  • Users need answers that point back to source passages rather than unsupported model text.
  • Compliance workflows often require structured outputs such as privacy notices, ROPA entries, breach drafts, and transfer assessments.

System architecture

Regulatory sources: PDPL and SDAIA documents
Article chunks: Arabic normalization and metadata
Hybrid retrieval: BM25 + TF-IDF
Grounded output: Q&A, drafting, visuals, citations
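
As a rough illustration of the article-chunk stage, the sketch below splits page texts at article headings and records the page on which each article starts. The heading pattern, the `Chunk` fields, and the `split_by_article` name are assumptions made for this example, not the project's actual code. Real documents may also write article numbers as ordinal words, which this simple pattern would miss.

```python
import re
from dataclasses import dataclass

# Assumed heading pattern: "Article 12" or "المادة 12" at the start of a line.
# Headings that spell the number as a word are not handled in this sketch.
ARTICLE_HEADING = re.compile(r"^(?:Article|المادة)\s+(\d+)\b.*$", re.MULTILINE)


@dataclass
class Chunk:
    source: str   # document name (illustrative field)
    article: str  # article number as captured from the heading
    page: int     # 1-based page on which the article starts
    text: str     # heading line plus article body


def split_by_article(pages: list[str], source: str) -> list[Chunk]:
    """Split a list of page texts into one chunk per article heading."""
    # Join the pages, remembering the character offset where each page starts.
    offsets, pos = [], 0
    for page_text in pages:
        offsets.append(pos)
        pos += len(page_text) + 1  # +1 for the joining newline
    full = "\n".join(pages)

    matches = list(ARTICLE_HEADING.finditer(full))
    chunks = []
    for i, m in enumerate(matches):
        # An article runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(full)
        # The start page is the last page whose offset is <= the heading offset.
        page = max(n for n, off in enumerate(offsets, start=1) if off <= m.start())
        chunks.append(Chunk(source, m.group(1), page, full[m.start():end].strip()))
    return chunks
```

An article that continues onto the next page stays in one chunk and keeps the page number where it began, which keeps citations pointing at the start of the article.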

Data and inputs

  • Arabic and English regulatory documents split into article-aware chunks.
  • Page, article, and citation metadata attached to retrieved passages.
  • A controlled query flow that normalizes Arabic text and expands compliance terminology.
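
The query flow described above (Arabic normalization followed by terminology expansion) might look roughly like the following. This is a minimal sketch under stated assumptions: the specific normalization rules are common Arabic information-retrieval conventions rather than rules confirmed by the project, and the `EXPANSIONS` glossary and function names are hypothetical.

```python
import re

# Arabic diacritics (U+064B..U+0652), superscript alef (U+0670), tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")

# Common orthographic folding; these rules are an assumption and fairly aggressive.
_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize(text: str) -> str:
    """Strip diacritics, fold letter variants, collapse whitespace, lowercase."""
    text = _DIACRITICS.sub("", text)
    text = text.translate(_CHAR_MAP)
    return re.sub(r"\s+", " ", text).strip().lower()


# Hypothetical glossary mapping short compliance terms to longer forms.
EXPANSIONS = {
    "pdpl": ["personal data protection law", "نظام حماية البيانات الشخصية"],
    "ropa": ["record of processing activities"],
    "dpo": ["data protection officer"],
}


def expand_query(query: str) -> list[str]:
    """Return the normalized query followed by any glossary expansions."""
    base = normalize(query)
    variants = [base]
    for term, extras in EXPANSIONS.items():
        # Whole-word match so "dpo" does not fire inside another word.
        if re.search(rf"\b{re.escape(term)}\b", base):
            variants.extend(normalize(extra) for extra in extras)
    return variants
```

Applying the same `normalize` function to chunks at indexing time and to queries at search time is what keeps the two sides comparable.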

Technical approach

  • Ingestion pipeline prepares source documents and stores article-level chunks.
  • Hybrid retrieval combines BM25 and TF-IDF to balance exact legal terms with broader topical matches.
  • Generation layer supports extractive, local Ollama, and HuggingFace/Qwen-style backends.
  • Gradio interface separates question answering, drafting, gap detection, search inspection, visuals, and project information.
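
To make the hybrid retrieval idea concrete, here is a small dependency-free sketch that scores chunks with BM25 and with TF-IDF cosine similarity, then fuses the two. The source does not say how the project combines the scores, so the min-max normalization, the `alpha` weight, the whitespace tokenizer, and all function names are assumptions for illustration only.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Whitespace tokenization keeps the sketch short; a real system would do more.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc for the tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t in tf:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def tfidf_cosine_scores(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each doc."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}  # smoothed IDF

    def vec(tokens):
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    return [cosine(q, vec(d)) for d in docs]


def minmax(xs):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def hybrid_search(query, texts, alpha=0.5, top_k=3):
    """Rank non-empty list of texts by alpha * BM25 + (1 - alpha) * TF-IDF cosine."""
    docs = [tokenize(t) for t in texts]
    q = tokenize(query)
    bm = minmax(bm25_scores(q, docs))
    tv = minmax(tfidf_cosine_scores(q, docs))
    fused = [alpha * x + (1 - alpha) * y for x, y in zip(bm, tv)]
    ranked = sorted(range(len(texts)), key=lambda i: fused[i], reverse=True)
    return [(i, fused[i]) for i in ranked[:top_k]]
```

Raising `alpha` favors BM25, which rewards exact legal terms, while lowering it leans on the TF-IDF cosine score, which reflects overall vocabulary overlap.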

Evaluation and results

Key indicators

  • 238 article-level chunks
  • 190 passing tests
  • 5 Gradio workflow tabs

  • Ran 190 automated tests covering core behaviors and template outputs.
  • Tracked document coverage through 238 article-level chunks.
  • Focused evaluation on retrieval quality, citation discipline, faithfulness, completeness, and bilingual consistency.
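
Two of the evaluation dimensions above, retrieval quality and citation discipline, lend themselves to simple automated checks. The sketch below shows one possible form: recall@k against labeled relevant chunks, and a check that every article cited in an answer was actually retrieved. The `[Article N]` citation format, the chunk identifiers, and the function names are hypothetical, not taken from the project.

```python
import re


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the labeled relevant chunks found in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = set(retrieved_ids[:k]) & relevant
    return len(hits) / len(relevant)


# Assumed citation format in generated answers: "[Article 12]".
CITATION = re.compile(r"\[Article (\d+)\]")


def unsupported_citations(answer, retrieved_articles):
    """Return cited article numbers that are absent from the retrieved set."""
    retrieved = set(retrieved_articles)
    cited = CITATION.findall(answer)
    return sorted({c for c in cited if c not in retrieved}, key=int)
```

An empty list from `unsupported_citations` only shows that each cited article was retrieved. It does not show that the cited passage supports the claim, which is the harder faithfulness question.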

Implementation and code

Implementation focus

The implementation connects ingestion, hybrid retrieval, grounded generation, and evaluation in one structured workflow, so that each technical decision can be traced.

Source code

The source code is available for exploring implementation details and extending the system.

Scope and responsible use

The project is a focused modeling and evaluation study. Broader use should be supported by validation on additional data, robustness checks, monitoring, and domain-specific evaluation.

Future development

  • Add a formal retrieval benchmark with labeled question-answer-citation pairs.
  • Add freshness checks for updated PDPL/SDAIA documents.
  • Improve answer evaluation with stronger faithfulness and citation-support scoring.

Technical contribution

GovRAG Copilot brings together practical RAG engineering that goes beyond a general chat assistant: ingestion, retrieval design, multilingual handling, citation discipline, workflow UI design, visual outputs, and evaluation.