
nailk on Sept 2, 2023 | on: Llama 2 70B on M2 Max at 7 tokens/sec

Looks great! Are there other benchmarks? How does the speed compare to other LLM engines like llama.cpp / vLLM (on GPUs)? Is it able to do continuous batching of incoming requests like vLLM?