Reliability & Evaluation
Medium

When would you evaluate a model without a "correct" answer to compare it against (e.g., when checking for tone or politeness)?


Similar Questions in Reliability & Evaluation

Medium

If your retriever returns 5 documents but only 1 was actually related to answering the question, how do you penalize the retriever for the "noise"?

Medium

You’ve updated your system prompt to fix a specific bug. How do you ensure this "fix" didn't break 10 other things the model was previously doing correctly?

Medium

How would you automate the process of trying to make your model "break" or "hallucinate"?


