Similar Questions in Reliability & Evaluation
Easy
Why is a standard unit test (asserting that output == "expected") often a bad way to test an LLM? How do you handle a model that gives three different, but correct, answers to the same prompt?
Medium
How do you programmatically check if an LLM is making things up that aren't in the provided search results?
Medium
How would you automate the process of trying to make your model "break" or "hallucinate"?