I checked some logs from my past experiments: prompt processing ran at about 400 tokens/s over a ~3k token query, so roughly 7.5 seconds to process it, and then generation ran at about 28 tokens/s.
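For context, a minimal sketch of the arithmetic (the 500-token output length is a hypothetical I picked for illustration; the other numbers come from the logs above):

```python
# Rough back-of-the-envelope latency estimate from the measured throughputs.
prompt_tokens = 3000   # ~3k token query
prompt_tps = 400       # prompt-processing (prefill) speed, tokens/s
gen_tps = 28           # generation (decode) speed, tokens/s
output_tokens = 500    # assumed response length, not from the logs

prefill_s = prompt_tokens / prompt_tps  # ~7.5 s before the first token
decode_s = output_tokens / gen_tps      # ~17.9 s to stream the answer
print(f"time to first token: ~{prefill_s:.1f}s, "
      f"total: ~{prefill_s + decode_s:.1f}s")
```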
In theory it’s feasible with some types of models and hard or impossible with others, but only if both the model and the data processing around it are disclosed.
The bigger issue here is that seemingly unrelated factors, and combinations of them (postal code, the times of day a user is active, even the vocabulary used in social communication), can be predictive of a user’s economic status.