
Not an AI researcher here, so this is probably common knowledge for people in this field, but I saw a video about quantization recently and wondered exactly about that: whether it's possible to compress a net by using more precision where it counts and less precision where it's not important. I also wondered how one would go about deciding which parts count and which don't.

Great to know that this is already a thing. I assume model "compression" is going to be the next hot topic.



Yes, you're thinking about it exactly right! We shouldn't quantize a model naively to 2-bit or 4-bit; we should do it smartly!
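(Not the actual Unsloth method, which isn't published; just a toy sketch of the general idea. Layer names are made up, and plain quantization MSE stands in for "sensitivity". Real schemes typically use calibration data and activation-aware importance measures.)

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits):
    # symmetric uniform quantizer: snap weights to a grid of
    # 2^(bits-1) - 1 positive levels (plus mirror and zero)
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

def sensitivity(w, bits):
    # mean squared error introduced by quantizing w to `bits` bits
    return float(np.mean((w - quantize(w, bits)) ** 2))

# toy "model": a few layers with different weight distributions
layers = {
    "attn.q_proj": rng.normal(0, 1.0, 4096),   # wide spread -> big 2-bit error
    "mlp.down":    rng.normal(0, 0.02, 4096),  # narrow spread -> small error
    "embed":       rng.normal(0, 0.5, 4096),
}

# measure how much each layer suffers at 2-bit, then keep the single
# most-damaged layer at 4-bit and push the rest down to 2-bit
errs = {name: sensitivity(w, 2) for name, w in layers.items()}
keep_hi = set(sorted(errs, key=errs.get, reverse=True)[:1])
plan = {name: (4 if name in keep_hi else 2) for name in layers}
print(plan)
```

The point is just that the bit width becomes a per-layer decision driven by a measured error, not a single global setting.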


How do you pick which ones should be 2-bit, which 4-bit, etc.? Is this secret sauce, or something open?


Oh I wrote about it here: https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs We might provide some scripts for them in the future!


Thanks! But I can't find any details on how you "intelligently adjust quantization for every possible layer" on that page. I assume this is a secret?

I am wondering about the possibility that different use cases might require different "intelligent quantization", i.e., quantization of an LLM for financial analysis might differ from quantization of an LLM for code generation. I am currently doing a postdoc in this area. Interested in doing research together?


Oh, we haven't published about it yet! I talk about it in bits and pieces; we might do a larger blog post on it!

Yes, different use cases will be different. Oh, interesting! Sorry, I doubt I can be of much help in your research; I'm mainly an engineering guy, so less research focused!



