PaLM 2 Evaluation: Automatic Summarization
Here we go again — Struggling with Contaminated Training Data
Evaluating a large language model such as PaLM 2 is extremely challenging for one main reason: the evaluation data may have been part of the training data.
In other words, there is a risk of data contamination, also known as data leakage.
Similarly to OpenAI with GPT-4, Google tried to minimize the …