How do you programmatically check if an LLM is making things up that aren't in the provided search results?

Question

Accepted Answer

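Check whether every statement in the answer is supported by the retrieved context, a property usually called faithfulness or groundedness. To evaluate it, break the answer into individual claims and check whether each claim is supported by at least one sentence in the source documents. Any claim with no supporting sentence is flagged as unsupported, and the fraction of supported claims gives a score for the answer.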
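
A minimal sketch of that claim-by-claim check, under loud assumptions: claims are approximated by naive sentence splitting, and "supported" is approximated by token overlap with a single source sentence. In practice both steps would more likely be done by an LLM or an entailment (NLI) model. The function names and the 0.8 threshold are hypothetical and chosen only for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric word set; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(claim: str, source_sentences: list[str], threshold: float = 0.8) -> bool:
    # Stand-in for an entailment check (assumption): a claim counts as supported
    # if enough of its tokens appear in at least one single source sentence.
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return True
    for sentence in source_sentences:
        overlap = len(claim_tokens & tokens(sentence)) / len(claim_tokens)
        if overlap >= threshold:
            return True
    return False


def faithfulness_score(answer: str, documents: list[str]) -> tuple[float, list[str]]:
    # Returns (fraction of supported claims, list of unsupported claims).
    source_sentences = [s for doc in documents for s in split_sentences(doc)]
    claims = split_sentences(answer)
    unsupported = [c for c in claims if not is_supported(c, source_sentences)]
    score = 1.0 - len(unsupported) / len(claims) if claims else 1.0
    return score, unsupported


if __name__ == "__main__":
    docs = ["The Eiffel Tower is in Paris. It was completed in 1889."]
    answer = "The Eiffel Tower is in Paris. It was designed by aliens."
    score, unsupported = faithfulness_score(answer, docs)
    print(score)        # 0.5
    print(unsupported)  # ['It was designed by aliens.']
```

Token overlap misses paraphrases and can be fooled by negation, so it is best treated as a cheap baseline. Swapping `is_supported` for a model-based entailment check keeps the rest of the pipeline unchanged.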
