Interesting - do you think it's viable to package an LLM like that with an existing game and run it locally? I assume it will be intensive to run, but wouldn't that eliminate inference costs?
It would be intensive, but it's very doable. You could run something like koboldcpp with an endpoint exposed only on the local machine and have the game talk to that. You'll likely run into issues with GPU vendors and making sure the right software versions are installed, but with some checking it should be viable. Maybe include a fallback in case the system can't produce results in a timely manner.
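A minimal sketch of that local-endpoint-plus-fallback idea, assuming koboldcpp's default KoboldAI-style API on port 5001 (the URL, payload fields, timeout, and fallback text here are illustrative, not a definitive integration):

```python
import json
import urllib.request
import urllib.error

# Assumed default koboldcpp endpoint on the local machine only.
KOBOLD_URL = "http://127.0.0.1:5001/api/v1/generate"

def generate(prompt: str, fallback: str = "...", timeout: float = 5.0) -> str:
    """Ask the local model for text; return canned fallback if it's down or slow."""
    payload = json.dumps({"prompt": prompt, "max_length": 80}).encode()
    req = urllib.request.Request(
        KOBOLD_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            data = json.loads(resp.read())
        return data["results"][0]["text"]
    except (urllib.error.URLError, TimeoutError, KeyError, ValueError):
        # Server not running, too slow, or returned something unexpected:
        # fall back to pre-written dialogue so the game never stalls.
        return fallback
```

The key design point is that the game never blocks on the model: any failure mode (server missing, wrong GPU drivers, slow generation) degrades to scripted text instead of a hang.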
Yeah, that's what I'm saying - it would eliminate inference costs. What I was asking is how feasible it is to package these local LLMs with another standalone app, e.g. a game.