Instead of checking for exact words, how would you use BERTScore or Cosine Similarity of embeddings to evaluate if an LLM's summary is accurate?

Question

Accepted Answer

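Embed both the reference ("correct") summary and the model's summary as vectors, then compare them. Cosine similarity is the cosine of the angle between the two vectors: a score near 1 means they point in nearly the same direction. A high score (e.g., >0.9, though a sensible threshold depends on the embedding model) therefore suggests the two answers are semantically close even if the word choice differs. BERTScore applies the same idea at the token level: each token's contextual embedding is matched to its most similar token in the other text, and the matches are aggregated into precision, recall, and F1.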
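As a rough illustration of the comparison described above, here is a minimal pure-Python sketch. The vectors are made-up toy values standing in for real embeddings (in practice they would come from an embedding model, which is not shown), and the names `is_semantic_match` and `greedy_bertscore` are hypothetical helpers, not part of any library. The second function only imitates the greedy token matching behind BERTScore; it leaves out extras such as IDF weighting and baseline rescaling.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def is_semantic_match(ref_vec, cand_vec, threshold=0.9):
    """Hypothetical helper: treat a summary as matching above a threshold.

    The 0.9 default mirrors the example in the answer; a real threshold
    should be tuned for the embedding model in use.
    """
    return cosine_similarity(ref_vec, cand_vec) >= threshold


def greedy_bertscore(ref_tokens, cand_tokens):
    """BERTScore-style precision, recall, and F1 from per-token vectors.

    Each token is greedily paired with its most similar token on the other side.
    """
    if not ref_tokens or not cand_tokens:
        raise ValueError("both token lists must be non-empty")
    recall = sum(
        max(cosine_similarity(r, c) for c in cand_tokens) for r in ref_tokens
    ) / len(ref_tokens)
    precision = sum(
        max(cosine_similarity(c, r) for r in ref_tokens) for c in cand_tokens
    ) / len(cand_tokens)
    total = precision + recall
    f1 = 0.0 if total == 0 else 2 * precision * recall / total
    return precision, recall, f1


# Toy sentence-level "embeddings" (assumed values, not model output).
reference = [0.9, 0.1, 0.3]
paraphrase = [0.8, 0.2, 0.35]
off_topic = [0.1, 0.9, -0.2]

print(is_semantic_match(reference, paraphrase))
print(is_semantic_match(reference, off_topic))
```

A single sentence-level cosine score is cheap and easy to threshold, while the token-level variant separates precision (is what the summary says supported by the reference?) from recall (how much of the reference does the summary cover?).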
