How do you measure if the LLM actually answered the user’s question, even if the facts it provided were technically true?

Accepted Answer

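This is measured with an answer relevance metric (sometimes called response relevancy), which checks whether the answer addresses the user's intent rather than whether its facts are correct. The two properties are independent: a model can give a perfectly "faithful" answer, fully supported by its sources, about the wrong topic (e.g., the user asks about "Revenue" and the model gives a factual answer about "Hiring"). A faithfulness check would score that answer highly, while answer relevance would score it low. A common way to compute answer relevance is to have an LLM generate the questions the answer would plausibly respond to and then measure how similar those questions are to the user's original question. Another option is an LLM-as-judge that rates relevance directly.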
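The reverse-question idea can be sketched in a few lines: generate the questions an answer would respond to, then average their similarity to the user's question. An answer about hiring yields hiring questions, which score low against a revenue question. This is a minimal sketch under loud assumptions: the generated questions are hard-coded (in a real pipeline an LLM would produce them from the answer), `embed` is a toy bag-of-words stand-in for a real sentence-embedding model, and `embed`, `cosine`, and `answer_relevance` are hypothetical helper names, not any library's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer_relevance(question: str, generated_questions: list[str]) -> float:
    """Mean similarity between the user's question and the questions
    reverse-generated from the answer (by an LLM, in a real pipeline)."""
    if not generated_questions:
        return 0.0
    q_vec = embed(question)
    scores = [cosine(q_vec, embed(g)) for g in generated_questions]
    return sum(scores) / len(scores)


question = "What was our revenue in Q3?"

# Hypothetical questions an LLM might generate from each answer.
# Answer A talks about revenue; answer B is factually correct but about hiring.
from_revenue_answer = [
    "What was the company's revenue in Q3?",
    "How much revenue did we make in Q3?",
]
from_hiring_answer = [
    "How many engineers did we hire in Q3?",
    "What was the hiring plan for Q3?",
]

on_topic = answer_relevance(question, from_revenue_answer)
off_topic = answer_relevance(question, from_hiring_answer)
print(f"revenue answer: {on_topic:.2f}, hiring answer: {off_topic:.2f}")
```

Bag-of-words overlap still gives the off-topic answer partial credit through shared words like "what", "was", and "Q3"; a real embedding model should separate topics more sharply. Averaging over several generated questions keeps one odd generation from dominating the score.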
