Reliability & Evaluation · Medium

How do you evaluate a RAG system’s performance when the answer is not present in the retrieved documents? (Does it correctly say "I don't know"?)
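
One way to make this question concrete is to treat abstention as a binary classification problem over a test set that deliberately mixes answerable and unanswerable queries. The sketch below is a minimal illustration under stated assumptions, not a definitive method: the `Example` type, the `ABSTAIN_MARKERS` phrase list, and the metric names are all hypothetical, and the keyword-based refusal check stands in for what would more likely be an LLM judge or a trained classifier in practice.

```python
from dataclasses import dataclass

# Hypothetical refusal phrases (an assumption for illustration only).
ABSTAIN_MARKERS = (
    "i don't know",
    "i do not know",
    "not in the provided",
    "cannot find",
)


@dataclass
class Example:
    answerable: bool  # True if the retrieved documents contain the answer
    response: str     # the RAG system's output for this query


def is_abstention(response: str) -> bool:
    """Crude keyword check for a refusal; normalizes curly apostrophes."""
    text = response.lower().replace("\u2019", "'")
    return any(marker in text for marker in ABSTAIN_MARKERS)


def abstention_metrics(examples):
    """Score abstention, with 'unanswerable' as the positive class."""
    tp = fp = fn = tn = 0
    for ex in examples:
        abstained = is_abstention(ex.response)
        if not ex.answerable and abstained:
            tp += 1  # correctly said "I don't know"
        elif ex.answerable and abstained:
            fp += 1  # refused although the answer was retrievable
        elif not ex.answerable and not abstained:
            fn += 1  # answered anyway: likely hallucination
        else:
            tn += 1  # attempted an answer when one was available
    return {
        "abstention_precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "abstention_recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of unanswerable queries that still received an answer.
        "false_answer_rate": fn / (tp + fn) if (tp + fn) else 0.0,
        # Share of answerable queries that were refused.
        "over_refusal_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

Reporting `false_answer_rate` and `over_refusal_rate` together matters because a system can trivially drive either one to zero by always refusing or never refusing. Note that this sketch only scores whether the system abstained; the correctness of answers to answerable queries would need a separate check.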

Similar Questions in Reliability & Evaluation

Medium

You’ve updated your system prompt to fix a specific bug. How do you ensure this "fix" didn't break 10 other things the model was previously doing correctly?

Medium

How do you calculate the ROI of a prompt change? If a new prompt is 5% more accurate but 50% more expensive in tokens, how do you decide if it’s worth it?

Easy

Define Exact Match (EM) vs. F1 Score in the context of an extraction task (e.g., extracting dates from a PDF). When should you use EM?

