Hacker News

Moshi never released the base model, only two conversationally finetuned models. They also never released training code except for the codec. That said, I don't see any training code for Hertz either, just 3 inference notebooks and model code full of no_grad. There's no paper either to help me understand how this was trained and what the architecture is like. So I'm not too sure about researcher-friendliness, unless I'm missing something.


We're working on a HuggingFace release that will help with finetuning. We'd like to do a paper, after a larger release - we're a team of 4.


Very impressive for just 4 people. What's the team background and how long have you been working on this?


I'm not part of their team, but I lived with them for a couple of months. They've been working on it for ~5 months, and they're 16-to-20-year-olds who are too smart for university.


For a rag-tag group of transcendental audiophiles operating electronic circuitry, it ionizes and atomizes well.



