Hacker News

Looks great! Are there other benchmarks? How does its speed compare to other LLM engines like llama.cpp or vLLM (on GPUs)? Can it do continuous batching of incoming requests the way vLLM does?
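For context on the last question: continuous batching means the serving loop admits new requests into the running batch as soon as a slot frees up, instead of waiting for the whole batch to drain. A minimal toy sketch of that scheduling idea (the request fields and `step` function here are hypothetical, not vLLM's actual API):

```python
from collections import deque

def continuous_batching(incoming, step_fn, max_batch=4):
    """Toy scheduler: admits waiting requests into the active batch whenever
    a slot frees up, rather than running fixed batches to completion."""
    queue = deque(incoming)   # requests waiting for a slot
    active = []               # requests currently being decoded
    finished = []
    while queue or active:
        # admit waiting requests into any free slots (the "continuous" part)
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        # run one decode step for every active request
        for req in active:
            step_fn(req)
        # retire completed requests mid-flight, freeing their slots
        finished += [r for r in active if r["remaining"] == 0]
        active = [r for r in active if r["remaining"] > 0]
    return finished

# Hypothetical requests: each needs a different number of decode steps,
# so short requests finish early and yield their slot to queued ones.
def step(req):
    req["remaining"] -= 1

reqs = [{"id": i, "remaining": n} for i, n in enumerate([3, 1, 5, 2, 4])]
done = continuous_batching(reqs, step, max_batch=2)
```

The payoff is throughput: a short request never waits for the longest request in its batch to finish before its slot is reused.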

