Similar Questions in Reliability & Evaluation
Medium
If your retriever returns 5 documents but only 1 was actually related to answering the question, how do you penalize the retriever for the "noise"?
View
Easy
Define Exact Match (EM) vs. F1 Score in the context of an extraction task (e.g., extracting dates from a PDF). When should you use EM?
View
Medium
You’ve updated your system prompt to fix a specific bug. How do you ensure this "fix" didn't break 10 other things the model was previously doing correctly?
View