Reliability & Evaluation · Medium

When would you evaluate a model without having a "correct" answer to compare it against? (e.g., checking for tone or politeness).

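One common answer is a reference-free check: score the output against a rubric (heuristic rules or an LLM-as-judge) rather than a gold answer. A minimal illustrative sketch, where the marker lists and weights are invented for demonstration and not a real rubric:

```python
# Reference-free evaluator: scores a response for politeness signals
# without comparing it to any "correct" answer.
# The word lists and weights below are illustrative only.

RUDE_MARKERS = {"obviously", "just google it", "wrong", "stupid"}
POLITE_MARKERS = {"please", "thanks", "thank you", "happy to help"}

def politeness_score(text: str) -> float:
    """Return a score in [0, 1]; 0.5 is neutral."""
    lowered = text.lower()
    score = 0.5
    score += 0.1 * sum(marker in lowered for marker in POLITE_MARKERS)
    score -= 0.2 * sum(marker in lowered for marker in RUDE_MARKERS)
    return max(0.0, min(1.0, score))

print(politeness_score("Thanks for asking! Happy to help."))  # scores polite
print(politeness_score("Obviously wrong. Just google it."))   # scores rude
```

In practice the scoring function would usually be an LLM judge or a trained classifier rather than keyword rules, but the evaluation shape is the same: output in, scalar quality score out, no reference answer required.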

Similar Questions in Reliability & Evaluation

Medium

Instead of checking for exact words, how would you use BERTScore or Cosine Similarity of embeddings to evaluate if an LLM's summary is accurate?

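The core mechanic behind both BERTScore and embedding-based comparison is cosine similarity between vectors. A self-contained sketch with toy 4-dimensional vectors standing in for real embeddings (which a model such as a sentence encoder would produce, typically with hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings"; in practice these come from an embedding model.
reference_summary = [0.9, 0.1, 0.3, 0.2]
model_summary     = [0.8, 0.2, 0.3, 0.1]  # different words, same meaning
unrelated_text    = [0.0, 0.9, 0.0, 0.8]

sim_good = cosine_similarity(reference_summary, model_summary)
sim_bad  = cosine_similarity(reference_summary, unrelated_text)
assert sim_good > sim_bad  # paraphrase scores higher than unrelated text
print(round(sim_good, 3), round(sim_bad, 3))
```

BERTScore refines this idea by matching token-level embeddings between candidate and reference and aggregating into precision/recall/F1, so a summary can score well even when it shares almost no exact words with the reference.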
Easy

Why is a standard unit test (asserting that output == "expected") often a bad way to test an LLM? How do you handle a model that gives three different, but correct, answers to the same prompt?

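One way to handle this is property-based assertions: instead of asserting what the output must literally say, assert what must be true of any correct answer. The question and checks below are invented for illustration:

```python
# Property-based test: three differently-worded answers all pass
# because we assert properties every correct answer shares,
# not an exact string match.

def check_capital_answer(answer: str) -> bool:
    """Q: 'What is the capital of France?' Any phrasing passes
    as long as it names Paris and stays reasonably concise."""
    return "paris" in answer.lower() and len(answer) < 200

# Three different, but equally correct, model outputs:
answers = [
    "Paris.",
    "The capital of France is Paris.",
    "That would be Paris, France's capital city.",
]
assert all(check_capital_answer(a) for a in answers)
assert not check_capital_answer("I think it might be Lyon.")
```

The same pattern extends to structural properties (valid JSON, required fields present, length bounds) and, for open-ended outputs, to scoring with an LLM judge instead of a boolean check.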
Medium

If your model’s accuracy suddenly drops by 10% on Tuesday, how do you determine if the Model changed (API update), the Data changed (new documents in RAG), or User Behavior changed?

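A standard triage tool here is a frozen "golden" evaluation set: if accuracy drops on inputs that have not changed, the model itself is the suspect. The decision rules and thresholds below are a hypothetical sketch, not a prescribed procedure:

```python
# Triage sketch: a frozen golden set separates the three suspects.
# All thresholds are illustrative placeholders.

def triage(golden_acc_today: float, golden_acc_baseline: float,
           retrieval_overlap: float, traffic_drift: float) -> str:
    """Return the most likely cause of a sudden accuracy drop."""
    if golden_acc_today < golden_acc_baseline - 0.05:
        # Same inputs, same docs, worse output -> the model changed.
        return "model"
    if retrieval_overlap < 0.8:
        # Model holds on the golden set, but retrieved docs differ
        # from last week's -> the RAG data changed.
        return "data"
    if traffic_drift > 0.3:
        # Model and data look stable; incoming queries shifted.
        return "user_behavior"
    return "inconclusive"

print(triage(0.82, 0.90, 0.95, 0.1))  # golden-set regression -> "model"
```

The key design choice is ordering: check the model first on inputs you control, then the retrieval layer, and only then attribute the drop to shifting user behavior.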
