Seems interesting to let LLMs design their own reasoning traces instead of being constrained by human labelers. I could imagine self-consistency approaches to surface common high-quality reasoning traces.
Seems like a bitter lesson moment for reasoning traces.
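A rough sketch of what that self-consistency filter could look like, assuming traces have already been sampled from a model (the `sampled` pairs here are made-up placeholders, not real model output; a real setup would draw them from an LLM sampler):

```python
from collections import Counter

def select_consistent_traces(traces):
    """Self-consistency: group sampled (trace, answer) pairs by their
    final answer and keep the traces agreeing with the majority answer."""
    answers = [answer for _, answer in traces]
    majority, _ = Counter(answers).most_common(1)[0]
    return [trace for trace, answer in traces if answer == majority]

# Hypothetical sampled (reasoning trace, final answer) pairs for one problem.
sampled = [
    ("add the digits, carry the 1", "42"),
    ("estimate then refine", "41"),
    ("decompose into tens and ones", "42"),
]
print(select_consistent_traces(sampled))
# keeps the two traces whose answer is "42"
```

The surviving traces could then serve as self-generated training data, with no human labeler in the loop.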