TLDR:

> Giving few-shot examples to the generator didn't work.

> Using an LLM-judge (with no training) didn't work.

> Using an embedding + KNN-classifier (lots of training data) worked.

I don't know why they didn't try fine-tuning the LLM-judge, or at least giving it some few-shot examples.

But it shows that embeddings can make very simple classifiers work well.
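
To make the "embedding + KNN" idea concrete, here is a minimal sketch of what such a classifier might look like. It is not the authors' code: the 2-D vectors stand in for real embeddings (which would come from an embedding model and have hundreds of dimensions), and the labels `"A"` and `"B"`, the helper names, and `k=3` are all made up for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(query, train_vectors, train_labels, k=3):
    """Return the majority label among the k training vectors most similar to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy "embeddings": in practice these would be produced by an
# embedding model applied to the labelled training texts.
train_vectors = [(1.0, 0.1), (0.9, 0.2), (0.8, 0.0),
                 (0.1, 1.0), (0.2, 0.9), (0.0, 0.8)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict((0.9, 0.1), train_vectors, train_labels))   # A
print(knn_predict((0.1, 0.95), train_vectors, train_labels))  # B
```

The classifier itself has no trained parameters. All the work is done by the embedding space plus the labelled examples, which is presumably why it needs lots of training data to do well.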