Do Bigger Evaluation Datasets Make Your Results More Significant?
A very common belief among AI practitioners
I ran a Twitter poll asking the following questions:
The poll results indicate that many machine translation researchers believe increasing the size of the evaluation data makes the results more significant…
The size of the test set shouldn’t have any impact on the evaluation, provided that the test set has been …