
Still waiting for the day that medium-term memory (token average pooling, like in sentence transformers) gets used for this. It's staring all of these companies in the face, and apparently no one thinks to implement it.
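
A minimal sketch of what I mean, assuming a MiniLM-class sentence-transformer encoder and the (illustrative) idea of pooling each older conversation turn into a single vector; this is not any vendor's actual implementation:

    # Mean-pool the token embeddings of an older chunk into one "memory" vector,
    # the same pooling sentence-transformer models use. The model name and the
    # one-vector-per-turn scheme are assumptions for illustration only.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "sentence-transformers/all-MiniLM-L6-v2"
    tok = AutoTokenizer.from_pretrained(name)
    enc = AutoModel.from_pretrained(name)

    def pool_chunk(text: str) -> torch.Tensor:
        batch = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = enc(**batch).last_hidden_state      # (1, seq_len, dim)
        mask = batch["attention_mask"].unsqueeze(-1)     # ignore padding
        return (hidden * mask).sum(1) / mask.sum(1)      # (1, dim)

    old_turns = ["User asked about pooling.", "Assistant explained mean pooling."]
    memory = [pool_chunk(t) for t in old_turns]          # one 384-dim vector each

Only the recent turns stay in the context as raw tokens; everything older collapses to one pooled vector per chunk.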


I've been thinking along the same lines. The token window IMO should be a conceptual inverted pyramid, where the most recent tokens are retained verbatim but previous iterations are compressed/pooled more and more as the context grows. I'm sure there's some effort/research in this direction; it seems pretty obvious.
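
A rough sketch of that shape, operating on token embeddings (the segment size and pooling factors are made up for illustration; how the model then attends over the pooled vectors is the actual hard part):

    # "Inverted pyramid" context: the last `keep_last` positions are kept
    # verbatim, and each earlier tier of `tier` positions is mean-pooled with
    # a window that doubles the further back it sits (2x, 4x, 8x, ...).
    # All numbers are illustrative.
    import torch

    def pyramid_compress(token_embeds, keep_last=512, tier=256):
        recent = token_embeds[-keep_last:]
        older = token_embeds[:-keep_last]
        tiers, window, end = [], 2, older.shape[0]
        while end > 0:
            start = max(0, end - tier)
            chunk = older[start:end]
            # average every `window` consecutive embeddings in this tier
            pooled = torch.stack([g.mean(dim=0) for g in chunk.split(window, dim=0)])
            tiers.append(pooled)
            end, window = start, window * 2
        tiers.reverse()                        # oldest (most compressed) first
        return torch.cat(tiers + [recent], dim=0) if tiers else recent

    x = torch.randn(4096, 768)                 # 4096 token embeddings
    print(pyramid_compress(x).shape)           # far shorter, last 512 untouched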


But some of the earlier tokens are also the most important ones, right? Like the instructions and rules you want it to follow.


Phrase embeddings could bring a 32x reduction in sequence length because:

> Text Embeddings Reveal (Almost) As Much As Text. ... We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.

https://arxiv.org/abs/2310.06816
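
The 32x figure is just the bookkeeping of swapping each 32-token phrase for one embedding vector; a sketch under illustrative choices of model and chunk size:

    # Replace each 32-token phrase with a single embedding vector, giving a
    # ~32x shorter sequence of "phrase tokens". The model and chunk size are
    # assumptions; the paper above is what suggests little information is
    # lost at this granularity.
    from sentence_transformers import SentenceTransformer
    from transformers import AutoTokenizer

    name = "sentence-transformers/all-MiniLM-L6-v2"
    model = SentenceTransformer(name)
    tok = AutoTokenizer.from_pretrained(name)

    def phrase_embed(text, chunk_tokens=32):
        ids = tok(text, add_special_tokens=False)["input_ids"]
        phrases = [tok.decode(ids[i:i + chunk_tokens])
                   for i in range(0, len(ids), chunk_tokens)]
        return model.encode(phrases)           # one vector per 32-token phrase

    vecs = phrase_embed("some long document " * 500)
    print(vecs.shape)                          # roughly seq_len / 32 rows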


They are. Moreover, the idea that AI companies are missing and/or not implementing this “obvious” tactic is hilarious. Folks, these approaches have profound consequences for training and inference performance. Y’all aren’t pointing out some low hanging fruit here, lol


Actually, yes I am pointing out low hanging fruit here. These approaches do not have "profound consequences" for inference or training performance. In fact, sentence transformer models run orders of magnitude more quickly. Performance penalties will be small.

Also, I have several top NLP conference publications, so I'm not some charlatan when I say these things. I've used these techniques myself and seen them improve LLM recall. It really works.
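
To put rough numbers on "orders of magnitude" (parameter counts are approximate, and forward-pass cost only roughly tracks them):

    # Back-of-the-envelope: a MiniLM-class sentence encoder vs. a 7B decoder LLM.
    encoder_params = 22e6          # ~22M parameters
    llm_params = 7e9               # ~7B parameters
    print(f"~{llm_params / encoder_params:.0f}x cheaper per token")   # ~318x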

Here are more examples of low hanging fruit. The proof that they work is in the implementations I provide. You can run them; they work!: https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...

Check yourself before you try to check others.


> In fact, sentence transformer models run orders of magnitude more quickly. Performance penalties will be small.

They do not. Sentence transformers aren't new, and they have well-known trade-offs. What source or line of reasoning led you to believe otherwise?

> Here are more examples of low hanging fruit. The proof that they work is in the implementations I provide. You can run them; they work!: https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...

This...is your blog about prompt engineering. What do you believe this "proves"? How have you blown away current production encoding or attention mechanisms?


Concur. LLMs are still very young; we're barely a year out from the ChatGPT launch, and everyone is iterating like mad. Several stealth companies are working on new approaches with the potential to deliver performance leaps.

You ain’t seen nuthin’ yet…


Out of curiosity, why do you think the answer would be so simple and also completely untested?


Too much money is being thrown around on BS in the LLM space, and hardly any of it is going to places where it matters. Ignorance on the part of investors.

For example, the researchers working hard on better text sampling techniques (e.g. https://arxiv.org/abs/2202.00666), on better constraint techniques (e.g. https://arxiv.org/abs/2306.03081), or on actual negative prompting/CFG in LLMs (e.g. https://github.com/huggingface/transformers/issues/24536) are doing far FAR more to advance the state of AI than dozens of VC-backed LLM companies operating today. They are all laboring in relative obscurity.
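
For the curious, the first and third of those are already exposed through the Hugging Face transformers generate API: typical sampling via typical_p, and CFG/negative prompting via guidance_scale, assuming a transformers version recent enough to include the work tracked in that issue (the model choice is illustrative):

    # Typical sampling (arXiv:2202.00666) plus classifier-free guidance with a
    # negative prompt, via stock transformers generation knobs. Requires a
    # recent transformers release that supports guidance_scale/negative_prompt_ids.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The best way to use a long context is", return_tensors="pt")
    neg = tok("Repeat the prompt verbatim.", return_tensors="pt")

    out = model.generate(
        **inputs,
        do_sample=True,
        typical_p=0.9,                         # typical sampling
        guidance_scale=1.5,                    # >1 enables CFG
        negative_prompt_ids=neg["input_ids"],  # steer away from this
        max_new_tokens=40,
    )
    print(tok.decode(out[0], skip_special_tokens=True))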

HN and the NLP community have some serious blind spots when it comes to exploiting their own technology. At least someone at Andreessen Horowitz got a clue and gave some funding to Oobabooga; still waiting for Automatic1111 to get any funding.


Another curiosity: what would we estimate (if it's even possible) the context window of a human to be? Obviously an extremely broad question, and of course it must have some sort of decay factor... but it would be interesting to get a rule-of-thumb number in terms of token count. I imagine it's massive!


Human memory, in my limited understanding, doesn’t have the bifurcation of weights and context that LLMs do. It’s all a bit blurrier than that.

Something interesting that I heard from people trying to memorize things better is that memory “storage space” limits for people are essentially irrelevant. We’re limited by our learning and forgetting speeds. There’s no evidence of brains getting “full”.

Think of it like a giant warehouse of plants, with one employee. He can accept shipments (learning). He can take care of plants (remembering). Too long without care and they die (forgetting). The warehouse is big enough that it is not a limiting factor in how many plants he can keep alive. If it was 10x bigger it wouldn’t make a bit of difference.


I don't think it's massive. In fact, since it's roughly equivalent to working memory, I suspect it's on the order of 100 tokens at most.

It's just that, unlike these AIs, we're capable of online learning.



