How do you evaluate a RAG system’s performance when the answer is not present in the retrieved documents? (Does it correctly say "I don't know"?)

Accepted Answer

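Deliberately query the system with questions whose answers are not in your knowledge base, so that nothing it retrieves can support a correct answer. Success is measured by how often the model admits it doesn't know rather than hallucinating a guess. It also helps to run a set of answerable questions alongside, to confirm the model is not simply refusing everything.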
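
As a rough illustration, the check can be sketched as an abstention rate over a set of unanswerable questions. Everything named here is an assumption for the example: `fake_rag` is a stand-in for a real pipeline, `REFUSAL_MARKERS` is a hypothetical list of refusal phrases, and the sample questions are made up. Simple phrase matching is used only to keep the sketch short; a real evaluation might use human review or an LLM judge instead.

```python
from typing import Callable, Iterable

# Hypothetical refusal phrases; adjust to how your system actually words a refusal.
REFUSAL_MARKERS = (
    "i don't know",
    "i do not know",
    "cannot find",
    "not enough information",
)


def is_abstention(answer: str) -> bool:
    """Return True if the answer contains any refusal marker (case-insensitive)."""
    text = answer.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def abstention_rate(ask: Callable[[str], str], questions: Iterable[str]) -> float:
    """Fraction of questions for which the system abstains."""
    answers = [ask(q) for q in questions]
    if not answers:
        raise ValueError("need at least one question")
    return sum(is_abstention(a) for a in answers) / len(answers)


def fake_rag(question: str) -> str:
    """Stand-in for a real RAG pipeline (assumption, for illustration only)."""
    known = {"What is the refund window?": "Refunds are accepted within 30 days."}
    return known.get(question, "I don't know based on the available documents.")


# Made-up test sets: questions outside and inside the toy knowledge base.
unanswerable = ["Who designed the office lobby?", "What is the CEO's shoe size?"]
answerable = ["What is the refund window?"]

correct_abstention = abstention_rate(fake_rag, unanswerable)  # want this high
false_abstention = abstention_rate(fake_rag, answerable)      # want this low

print(f"abstained on unanswerable: {correct_abstention:.0%}")
print(f"abstained on answerable:   {false_abstention:.0%}")
```

Reporting both numbers together guards against a system that scores well on unanswerable questions only because it refuses to answer anything.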
