I checked some logs from my past experiments: prompt processing ran at about 400 tokens/s over a ~3k token query, so roughly 7.5 seconds to process it, and then generation ran at about 28 tokens/s.
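For context, a minimal sketch of the arithmetic (the 500-token output length is a hypothetical I picked for illustration; the other numbers come from the logs above):

```python
# Rough back-of-the-envelope latency estimate from the measured throughputs.
prompt_tokens = 3000   # ~3k token query
prompt_tps = 400       # prompt-processing (prefill) speed, tokens/s
gen_tps = 28           # generation (decode) speed, tokens/s
output_tokens = 500    # assumed response length, not from the logs

prefill_s = prompt_tokens / prompt_tps  # ~7.5 s before the first token
decode_s = output_tokens / gen_tps      # ~17.9 s to stream the answer
print(f"time to first token: ~{prefill_s:.1f}s, "
      f"total: ~{prefill_s + decode_s:.1f}s")
```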
In theory it’s feasible with some types of models and hard or impossible with others, but only if both the model and the data processing around it are disclosed.
The bigger issue here is that seemingly unrelated factors, and combinations of them (postal code, the times of day a user is active, even the vocabulary used in social communication), can be predictive of a user’s economic status.