Great stuff, and training was cheap too: it would have cost less than $200 on RunPod, and half that on spot instances.
I guess it's time languages other than Python, especially niche ones, started collating their own language-specific datasets.
Personally, I daydream about an Elixir-specific LLM I could run locally, trained or fine-tuned to respond in an idiomatic fashion, and plug into a tool like Cursor.so.
Are there any examples from the internal dataset of 80K instruction/answer pairs Phind used to tune this?