Architecture
RAG Pipeline
User Query — a natural-language student question
Retrieve — semantic chunks across the 8 source documents
Generate — Claude 4.5 Sonnet with the safety system prompt
Cited Answer — a grounded response with source citations
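In code, this flow reduces to retrieve-then-generate. Below is a minimal sketch using the Anthropic Python SDK; `retrieve`, `SAFETY_PROMPT`, and the chunk fields are illustrative stand-ins rather than the project's actual code, and the model id should be verified against current API docs.

```python
# Minimal RAG loop (illustrative sketch, not the project's actual code).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SAFETY_PROMPT = "Answer only from the provided context. Cite sources. Refuse if unsure."

def answer(query: str, retrieve) -> str:
    # Retrieve: semantic chunks across the 8 source documents.
    chunks = retrieve(query, k=5)  # hypothetical retriever
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    # Generate: Claude 4.5 Sonnet + safety prompt, low temperature.
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # model id assumed; verify before use
        max_tokens=512,
        temperature=0.4,            # factual accuracy over creativity
        system=SAFETY_PROMPT,
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    )
    # Cited answer: grounding comes from the [source] tags in the context.
    return msg.content[0].text
```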
Model: Claude 4.5 Sonnet — strong instruction-following and safe scope
Temperature: 0.4 — factual accuracy over creativity
Memory: window = 10 — short context, no drift
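A 10-item sliding window can be as simple as a bounded deque. This is an assumed implementation (including whether the unit is turns or messages), not the project's:

```python
# Sliding-window conversation memory (assumed implementation).
from collections import deque

WINDOW = 10  # window size from the config; unit (turns vs. messages) assumed

history: deque = deque(maxlen=WINDOW)  # old entries fall off automatically

def remember(role: str, text: str) -> None:
    history.append({"role": role, "content": text})

def context_messages() -> list[dict]:
    # What gets sent alongside the new query; short context, no drift.
    return list(history)
```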
Retrieval: semantic chunks — relevance over keyword match
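Semantic retrieval here means ranking chunks by embedding similarity rather than keyword overlap. A sketch, with `embed` standing in for whatever embedding model the project actually uses:

```python
# Cosine-similarity ranking over pre-embedded chunks (illustrative).
import numpy as np

def top_k(query: str, chunks: list[dict], embed, k: int = 5) -> list[dict]:
    q = embed(query)  # `embed` is a stand-in, not a named library call
    def score(c: dict) -> float:
        v = c["embedding"]
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    return sorted(chunks, key=score, reverse=True)[:k]
```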
Citations: always on — full source transparency
Safety: 5 prompt rules — anti-hallucination + scope control
Prompt Engineering
Safety System Prompt
Safety was engineered at the prompt level: five explicit behavioural rules that prevent hallucination and enforce scope. This maps directly to F5's approach to AI defensive measures.
System Prompt Rules (designed by Shraddha Kadam)
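The five rules themselves are not reproduced here; the sketch below shows the general shape such a prompt takes and is illustrative wording, not the actual rule set:

```python
# Illustrative 5-rule safety prompt (NOT the project's actual wording).
SAFETY_PROMPT = """You are a course assistant. Follow these rules strictly:
1. Answer ONLY from the retrieved context; never use outside knowledge.
2. If the context does not contain the answer, say so; do not guess.
3. Cite the source document for every claim, e.g. [handbook.pdf].
4. Stay within course scope; decline off-topic or unsafe requests.
5. Be concise and factual; no speculation, no filler.
"""
```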
Evaluation Results
24-Test Evaluation Framework
Category 1 — Factual (8 questions): direct answers in the documents; clear retrieval, accurate grounding
Category 2 — Synthesis (8 questions): multi-document queries; retrieval occasionally misses a chunk
Category 3 — Edge-Case (8 questions): out-of-scope queries; high variance, hallucination is the main failure
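A harness for this 8/8/8 split can be a short loop. The sketch below assumes a numeric grading rubric plus hypothetical `answer` and `grade` callables; none of this is the project's actual framework:

```python
# 24-case evaluation loop (assumed harness).
CATEGORIES = ("factual", "synthesis", "edge_case")  # 8 questions each

def run_eval(cases: list[dict], answer, grade) -> dict[str, float]:
    """cases: [{'category': ..., 'question': ..., 'expected': ...}, ...]
    grade(response, expected) returns a numeric score (rubric assumed)."""
    scores: dict[str, list[float]] = {c: [] for c in CATEGORIES}
    for case in cases:
        response = answer(case["question"])
        scores[case["category"]].append(grade(response, case["expected"]))
    # Mean score per category; feeds the score-distribution chart.
    return {c: sum(s) / len(s) for c, s in scores.items() if s}
```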
Score Distribution — All 24 Test Cases
Failure Analysis
Failure Modes & Fixes
Critical — Edge Cases
Confident Hallucination on Missing Info
When documents lacked information, the model generated fluent but unsupported answers rather than refusing.
↳ Fix: Stronger uncertainty prompts + multi-hop retrieval verification
Moderate — Synthesis
Incomplete Multi-Document Retrieval
Synthesis queries sometimes missed a critical document chunk, producing a fluent but incomplete answer that was presented as complete.
↳ Fix: Improved recall + multi-hop retrieval + better chunking structure
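Both fixes above lean on multi-hop retrieval, which can be prototyped as a second retrieval pass seeded by the draft answer. Illustrative only; `retrieve` and `generate` are the hypothetical helpers from the pipeline sketch:

```python
# Two-hop retrieval sketch: the draft answer seeds a second retrieval pass
# that can surface chunks the original query's wording missed.
def multi_hop_answer(query: str, retrieve, generate) -> str:
    first_pass = retrieve(query, k=5)
    draft = generate(query, first_pass)
    second_pass = retrieve(draft, k=5)
    merged = {c["id"]: c for c in first_pass + second_pass}  # dedupe by id
    return generate(query, list(merged.values()))
```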
Minor — All Categories
Over-Verbose Factual Answers
Correct information was buried in unnecessary context, reducing clarity without improving accuracy.
↳ Fix: Query-type detection + length guidance per category in system prompt
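Query-type detection could start as a crude heuristic that attaches a per-category length hint to the system prompt. A hypothetical sketch:

```python
# Per-category length guidance via a crude query-type heuristic (hypothetical).
LENGTH_HINTS = {
    "factual": "Answer in 1-2 sentences.",
    "synthesis": "Answer in a short paragraph, citing each document used.",
    "edge_case": "If the documents do not cover this, say so in one sentence.",
}

def length_hint(query: str, context_found: bool) -> str:
    if not context_found:
        return LENGTH_HINTS["edge_case"]
    # Multi-part phrasing suggests a synthesis query (very rough signal).
    multi = any(w in query.lower() for w in ("compare", "versus", "both", "across"))
    return LENGTH_HINTS["synthesis" if multi else "factual"]
```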