GP could be referring to quantization-aware training, during which the weights and gradients are still computed in fp16/fp32.
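For context, a minimal sketch of that idea in PyTorch (the class name, shapes, and scale choice are illustrative, not from the thread): a "fake quantize" op rounds the weights to an int8 grid in the forward pass, while a straight-through estimator passes the gradient back at full precision, so the master weights and their gradients stay fp32.

    import torch

    class FakeQuantize(torch.autograd.Function):
        """Round to an int8 grid in the forward pass; pass the
        gradient through unchanged (straight-through estimator)."""
        @staticmethod
        def forward(ctx, x, scale):
            return torch.clamp(torch.round(x / scale), -128, 127) * scale

        @staticmethod
        def backward(ctx, grad_output):
            # Gradient w.r.t. x flows through at full precision;
            # scale gets no gradient.
            return grad_output, None

    w = torch.randn(4, 4, requires_grad=True)   # fp32 master weight
    x = torch.randn(4, 4)
    scale = w.detach().abs().max() / 127
    y = x @ FakeQuantize.apply(w, scale)        # forward sees quantized w
    y.sum().backward()
    print(w.grad.dtype)                         # torch.float32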


It can go further than that; the weight gradients seem to be the main place where precision is a bottleneck (see https://arxiv.org/abs/1805.11046).
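As a rough illustration of that split (the shapes and learning rate below are made up), the common mixed-precision pattern runs the forward/backward math in fp16 but keeps an fp32 master copy of the weights, so the weight-gradient/update path retains the extra precision:

    import torch

    master_w = torch.randn(4, 4)                 # fp32 master copy
    x = torch.randn(4, 4, dtype=torch.float16)

    # The forward/backward math runs on an fp16 working copy...
    w16 = master_w.half().requires_grad_()
    loss = (x * w16).sum()
    loss.backward()

    # ...but the gradient is upcast and applied to the fp32 master
    # weight, so small updates aren't rounded away in fp16.
    master_w -= 0.01 * w16.grad.float()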



