When would you evaluate a model without having a "correct" answer to compare it against (e.g., when checking for tone or politeness)?

Question

Accepted Answer

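Reference-based evaluation compares the model's output against a known "correct" answer. Reference-free evaluation judges the output on its own merits, so it is the right choice when no single correct answer exists or when the property you care about does not depend on one: for example, checking for toxic language, verifying that the tone is professional, or checking whether generated code is syntactically valid.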
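To make the distinction concrete, here is a minimal Python sketch that contrasts one reference-based check with two reference-free ones. The function names and the `BLOCKLIST` word set are hypothetical and chosen only for illustration; a real toxicity or tone check would more likely use a trained classifier or an LLM judge than a word list.

```python
import ast

# Hypothetical word list, for illustration only.
BLOCKLIST = {"idiot", "stupid"}


def exact_match(output: str, reference: str) -> bool:
    """Reference-based: requires a known correct answer to compare against."""
    return output.strip() == reference.strip()


def is_valid_python(code: str) -> bool:
    """Reference-free: checks only that the generated code parses."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def contains_blocked_word(text: str) -> bool:
    """Reference-free: flags output containing any word from BLOCKLIST."""
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    return not BLOCKLIST.isdisjoint(words)
```

Note that only `exact_match` takes a `reference` argument. The other two checks can run on any output, including in production where no labeled answer is available.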
