Reliability & Evaluation
Medium

Instead of checking for exact word matches, how would you use BERTScore or the cosine similarity of embeddings to evaluate whether an LLM's summary is accurate?
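
As a starting point for an answer, here is a minimal sketch of the two ideas the question names. It makes several assumptions: the vectors are hand-written toy values standing in for real embeddings (in practice they would come from an embedding model), the 0.8 threshold is arbitrary, and the names `cosine_similarity`, `greedy_match_f1`, and `is_accurate` are hypothetical. The second function only imitates the greedy token-matching idea behind BERTScore; the real metric uses contextual token embeddings from a BERT-style model and optional IDF weighting.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def greedy_match_f1(candidate_tokens, reference_tokens):
    """BERTScore-style score over per-token vectors.

    Precision: each candidate token is matched to its most similar
    reference token. Recall: each reference token is matched to its
    most similar candidate token. The result is their harmonic mean.
    """
    if not candidate_tokens or not reference_tokens:
        return 0.0
    precision = sum(
        max(cosine_similarity(c, r) for r in reference_tokens)
        for c in candidate_tokens
    ) / len(candidate_tokens)
    recall = sum(
        max(cosine_similarity(r, c) for c in candidate_tokens)
        for r in reference_tokens
    ) / len(reference_tokens)
    if precision + recall <= 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy sentence-level "embeddings" (assumed values, not model output).
reference_vec = [0.9, 0.1, 0.3]      # the source text or a reference summary
good_summary_vec = [0.8, 0.2, 0.4]   # a paraphrase pointing in a similar direction
bad_summary_vec = [0.1, 0.9, 0.0]    # an off-topic summary

THRESHOLD = 0.8  # assumed cut-off; would need tuning on labeled examples


def is_accurate(summary_vec):
    """Pass the summary if it is close enough to the reference in embedding space."""
    return cosine_similarity(summary_vec, reference_vec) >= THRESHOLD


print(is_accurate(good_summary_vec))
print(is_accurate(bad_summary_vec))
```

Sentence-level cosine similarity gives one coarse score per summary, while the token-level matching rewards summaries that cover each part of the reference. Neither guarantees factual accuracy: a summary that negates a claim can still sit close to the reference in embedding space, so such scores are usually one signal among several.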

Similar Questions in Reliability & Evaluation

Easy

Why is a standard unit test (asserting that output == "expected") often a bad way to test an LLM? How do you handle a model that gives three different, but correct, answers to the same prompt?

Medium

How do you programmatically check if an LLM is making things up that aren't in the provided search results?

Medium

If you are using an LLM to grade another LLM, why is it critical to provide a "multi-point rubric" rather than just asking "Is this answer good?"

