Hacker News

I’m really struggling to find a use case for these local models when even ChatGPT 3.5 can do the job as well as any of them so far.


The article shows (fine tuned) Mistral 7B outperforming GPT-4, never mind GPT-3.5.


This model is not close to even 3.5 from when I used it. First of all, it does not follow instructions properly, and it just runs on and on.


What you're describing is the behavior you get from any base model that has not been instruction-tuned. The article is clear that this model is not for "direct use". It needs tuning for a specific application.


How does one fine-tune it to follow instructions? I would have thought there are open-source training sets for these instruction-following fine-tunes?
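There are open instruction datasets (Alpaca and OpenAssistant are commonly used ones), and the usual recipe is supervised fine-tuning: each instruction/response pair is rendered into a fixed prompt template, and the base model is trained to complete it. A minimal sketch of the formatting step, assuming the Alpaca-style template (the exact tags are a convention, not something the model mandates):

```python
def format_alpaca(example: dict) -> str:
    """Render one instruction/response pair into a single training string.

    Uses the common Alpaca-style template; any consistent template
    works, as long as inference-time prompts use the same one.
    """
    if example.get("input"):
        # Variant with extra context ("input") attached to the instruction.
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    # Variant without extra context.
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )

sample = {"instruction": "Summarize the text.", "input": "", "output": "A short summary."}
print(format_alpaca(sample))
```

The formatted strings are then fed to an ordinary causal-LM training loop (libraries like Hugging Face TRL wrap this step for you); the key point is that "following instructions" is learned entirely from this kind of data, which is why an untuned base model rambles instead.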


Not everyone wants to send all their data to OpenAI or Microsoft. Sometimes it isn't legally possible even if you want to. And not every use-case is blessed with a permanent internet connection.

And for some use-cases, the "alignment" work on GPT 3.5 and 4 gets in the way more than it helps (even OpenAI admits that alignment makes the model perform worse, even on generic benchmarks).



