QuestionsLeaderboardAppendixBlogPracticeProfile
Back to Repository
Reliability & EvaluationMedium

You’ve updated your system prompt to fix a specific bug. How do you ensure this "fix" didn't break 10 other things the model was previously doing correctly?

Practice Your Response

Similar Questions in Reliability & Evaluation

Medium

If your model’s accuracy suddenly drops by 10% on Tuesday, how do you determine if the Model changed (API update), the Data changed (new documents in RAG), or User Behavior changed?

View
Medium

Instead of checking for exact words, how would you use BERTScore or Cosine Similarity of embeddings to evaluate if an LLM's summary is accurate?

View
Hard

At what stage of the evaluation pipeline is a human absolutely necessary, and where can they be replaced by an automated "Judge LLM"?

View

Built for the AI Engineering community.

BlogPrivacyTermsContact