Hacker News

Yeah, the cost figures need more scrutiny: they started with Llama 3, which they got for free; had they had to build it from scratch, it would have cost more than $6M.

But as for your first paragraph: even if the "big AI players" have some secret sauce that will make their products better (and that they can actually keep secret), it seems unlikely it would be enough to command higher prices durably.

A model would have to be incredibly superior to justify paying for it, when there are so many free (or dirt cheap) alternatives that are simply good enough.



I don't know where you're getting your information from. Maybe you're confusing DeepSeek V3/R1 with the distilled R1 models.

The DeepSeek V3/R1 architecture isn't anything like Llama 3's. Llama 3 isn't even a mixture of experts, not to mention the various other differences like attention compression (multi-head latent attention), etc.
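To make the contrast concrete: in a dense model like Llama 3, every token passes through every feed-forward parameter, while a mixture-of-experts layer routes each token through only a few of many expert networks. Here's a minimal toy sketch of top-k MoE routing in NumPy; the dimensions, expert count, and weights are all made up for illustration and this is not DeepSeek's actual implementation.

```python
# Toy sketch of mixture-of-experts (MoE) routing -- the architectural
# feature contrasted with Llama 3's dense feed-forward layers above.
# All sizes and weights here are invented for the example.
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 8, 4, 2  # hidden dim, total experts, experts used per token

# Each "expert" is its own small feed-forward weight matrix.
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]
router_w = rng.normal(size=(D, N_EXPERTS))  # learned gating weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector x through its top-k experts."""
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]  # indices of the k highest-scoring experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    # Only k of the N experts actually run: per-token compute scales with k,
    # while total parameter count scales with N. That gap is the point of MoE.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=D)
out = moe_forward(token)
print(out.shape)  # (8,)
```

In a dense model the equivalent layer would multiply every token by one large weight matrix, so compute and parameter count grow together; MoE decouples them, which is part of why DeepSeek's per-token training cost can be low relative to its total parameter count.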


Indeed I got confused. DeepSeek V3 is not based on Llama 3. Sorry about that.


You make a good point: maybe the models won't perform much better with those improvements, or at least not enough to get people to pay more.

I’m curious about the Llama 3 bit - do you have a source for that? I’ve been hearing they trained using OpenAI outputs (not sure how that would work).




