Similar Questions in Reliability & Evaluation
- (Medium) How do you programmatically check if an LLM is making things up that aren't in the provided search results?
- (Medium) How would you automate the process of trying to make your model "break" or "hallucinate"?
- (Medium) You’ve updated your system prompt to fix a specific bug. How do you ensure this "fix" didn't break 10 other things the model was previously doing correctly?