Reliability & Evaluation
Easy

Define Exact Match (EM) vs. F1 Score in the context of an extraction task (e.g., extracting dates from a PDF). When should you use EM?
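As a starting point for an answer, the two metrics can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (`normalize`, `exact_match`, `token_f1`) and the normalization rules (lowercasing, stripping punctuation, collapsing whitespace) are assumptions in the style of common QA evaluation scripts, and a real date extractor would more likely parse both strings into a canonical date before comparing.

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed rules)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(pred).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "March 15, 2024"
    for pred in ["march 15 2024", "March 16, 2024"]:
        print(pred, exact_match(pred, gold), round(token_f1(pred, gold), 2))
```

With these assumed rules, the prediction "March 16, 2024" scores 0.0 on EM but about 0.67 on F1 against "March 15, 2024", because two of its three tokens overlap. That partial credit is misleading for fields where a near miss is simply wrong (dates, IDs, amounts), which is the usual argument for EM there; F1 fits better when partial overlap is meaningful, such as longer free-text spans.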


Similar Questions in Reliability & Evaluation

Medium

How do you measure "Time to First Token" (TTFT) vs. "Total Runtime"? Which one matters more for user experience in a chatbot?

Medium

Guardrails add an extra check. How do you evaluate if the safety benefit of a guardrail outweighs the 200ms latency penalty it adds?

Medium

You’ve updated your system prompt to fix a specific bug. How do you ensure this "fix" didn't break 10 other things the model was previously doing correctly?


