More

gsam · 2025-02-07T01:50:23 1738893023

I don't like wading into this debate when semantics are very personal/subjective. But to me, it seems like almost a sleight of hand to add the stochastic part, when actually they're possibly weighted more on the parrot part. Parrots are much more concrete, whereas the term LLM could refer to the general architecture.

The question to me seems: If we expand on this architecture (in some direction, compute, size etc.), will we get something much more powerful? Whereas if you give nature more time to iterate on the parrot, you'd probably still end up with a parrot.

There's a giant impedance mismatch here (time scaling being one). Unless people want to think of parrots being a subset of all animals, and so 'stochastic animal' is what they mean. But then it's really the difference of 'stochastic human' and 'human'. And I don't think people really want to face that particular distinction.

ggm · 2025-02-07T02:29:15 1738895355

"Expand the architecture" .. "get something much more powerful" .. "more dilithium crystals, captain"

Like I said elsewhere in this overall thread, we've been here before. Yes, you do see improvements in larger datasets, weighted models over more inputs. I suggest, I guess I believe (to be more honest) that no amount of "bigger" here will magically produce AGI simply because of the scale effect.

There is no theory behind "more" and that means there is no constructed sense of why, and the absence of abstract inductive reasoning continues to say to me, this stuff isn't making a qualitative leap into emergent anything.

It's just better at being an LLM. Even "show your working " is pointing to complex causal chains, not actual inductive reasoning as I see it.

gsam · 2025-02-07T02:40:31 1738896031

And that's actually a really honest answer. Whereas someone of the opposite opinion might be like parroting in the general copying-template sense actually generalizes to all observable behaviours because templating systems can be turing-complete or something like that. It's templates-all-the-way-down, including complex induction as long as there is a meta-template to match on its symptoms it can be chained on.

Induction is a hard problem, but humans can skip infinite compute time (I don't think we have any reason to believe humans have infinite compute) and still give valid answers. Because there's some (meta)-structure to be exploited.

Architecturally if machines / NN can exploit this same structure is a truer question.

visarga · 2025-02-07T04:40:37 1738903237

> this stuff isn't making a qualitative leap into emergent anything.

The magical missing ingredient here is search. AlphaZero used search to surpass humans, and the whole Alpha family from DeepMind is surprisingly strong, but narrowly targeted. The AlphaProof model uses LLMs and LEAN to solve hard math problems. The same problem solving CoT data is being used by current reasoning models and they have much better results. The missing piece was search.

UniverseHacker · 2025-02-07T01:58:46 1738893526

I'm sure both of you know this, but "stochastic parrot" refers to the title of a research article that contained a particular argument about LLM limitations that had very little to do with parrots.

danielmarkbruce · 2025-02-07T02:20:26 1738894826

The term is much more broadly known than the content of that (rather silly) paper.... I'm not even certain that it's the first use of the term.

ggm · 2025-02-07T02:39:24 1738895964

https://books.google.com/ngrams/graph?content=Stochastic%2C+...

ggm · 2025-02-07T02:40:00 1738896000

And the word "hallucination" ... has very little to do with...

HappMacDonald · 2025-02-08T23:26:18 1739057178

But it's far easier for human parrots to parrot the soundbyte "stochastic parrot" as a thought-terminating cliche.

gsam · 2025-02-06T22:48:52 1738882132

In my mind, the pure reinforcement learning approach of DeepSeek is the most practical way to do this. Essentially it needs to continually refine and find more sound(?) subspaces of the latent (embedding) space. Now this could be the subspace which is just Python code (or some other human-invented subspace), but I don't think that would be optimal for the overall architecture.

The reason why it seems the most reasonable path is because when you create restrictions like this you hamper search viability (and in a high multi-dimensional subspace, that's a massive loss because you can arrive at a result from many directions). It's like regular genetic programming vs typed-genetic programming. When you discard all your useful results, you can't go anywhere near as fast. There will be a threshold where constructivist, generative schemes (e.g. reasoning with automata and all kinds of fun we've neglected) will be the way forward, but I don't think we've hit that point yet. It seems to me that such a point does exist because if you have fast heuristics on when types unify, you no longer hamper the search speed but gain many benefits in soundness.

One of the greatest human achievements of all time is probably this latent embedding space -- one that we can actually interface with. It's a new lingua franca.

These are just my cloudy current thoughts.

HarHarVeryFunny · 2025-02-07T00:41:05 1738888865

DeepSeek's approach with R1 wasn't pure RL - they used RL only to develop R0 from their V3 base model, but then went though two iterations of using current model to generate synthetic reasoning data, SFT on that, then RL fine-tuning, and repeat.

danielmarkbruce · 2025-02-06T23:30:29 1738884629

fwiw, most people don't really grok the power of latent space wrt language models. Like, you say it, I believe it, but most people don't really grasp it.

ttul · 2025-02-07T02:53:37 1738896817

Image generation models also have an insanely rich latent space. People will be squeezing value out of SDXL for many years to come.

gsam · 2025-01-31T02:17:07 1738289827

> Neural networks are notoriously bad at graphs.

AlphaFold is based on graph neural networks. The biggest issue is that we still do not know how to best encode graph problems in ways neural networks can exploit. Current graph neural network techniques exploit certain invariants but cannot distinguish between various similar graphs. And yet, they're still generating meaningful insights.

thesz · 2025-01-31T08:50:45 1738313445

> AlphaFold is based on graph neural networks.

And what is the size of graphs processed by AlphaFold? How does it compare to the size of program dependence graph?

Here's evaluation of boolean satisfiability encoding of protein folding problem: https://pmc.ncbi.nlm.nih.gov/articles/PMC7197060/

Boolean satisfiability solvers are used in satisfiability modulo theories solvers that, in turn, are used to prove various things about programs.

gsam · 2025-01-29T21:31:43 1738186303

In my view there's two modes of creativity:

1. That two distant topics or ideas are actually much more closely related. The creative sees one example of an idea and applies it to a discipline that nobody expects. In theory, reduction of the maximally distant can probably be measured with a tangible metric.

2. Discovery of ideas that are even more maximally distant. Pushing the edge, and this can be done by pure search and randomness actually. But it's no good if it's garbage. The trick is, what is garbage? That is very context dependent.

(Also, a creative might be measured on the efficiency of these metrics rather than absolute output)

docfort · 2025-01-30T12:20:09 1738239609

Terry Tao has referred to this classification system as foxes vs hedgehogs. https://en.m.wikipedia.org/wiki/The_Hedgehog_and_the_Fox

gsam · on July 10, 2023

Anyone else think that there is already primitive image data encoded in biological data? Essentially basic shapes and patterns which are passed down semi-generationally.

dghughes · on July 10, 2023

Anton Petrov has a recent video on YouTube I never watched it yet but it's title is "Could Life Be Transmitted Via Radio Waves? Information Panspermia". Just a bit of fun I'm sure Anton isn't too wild he puts out some interesting videos but not in a way to push quackery.

https://www.youtube.com/watch?v=K4Zghdqvxt4

nico · on July 10, 2023

Fascinating, thank you

Recently here on HN someone posted a quote saying something like “if you shine light at something for a long enough time, don’t be surprised if you end up getting a plant”

It was about how the environment seems to reorganize in certain ways to use up energy (the latest Veritasium video about entropy also talks about this)

dekhn · on July 10, 2023

I do not know of any true "image" data. Most complex patterns in nature are created by generative processes rather than direct encoding.

stevezsa8 · on July 10, 2023

I guess it's possible if this conferred some survival advantage.

It can be useful to work from the evidence to a conclusion instead of the other way round.

But wondering and philosophising can be fun :]

It would be cool if humans could pass knowledge via their offspring. But I always get worried thinking if I'm the asshole, I wouldn't want my kid to be one too.

anigbrowl · on July 10, 2023

I always get worried thinking if I'm the asshole, I wouldn't want my kid to be one too.

If you were, you would.

yieldcrv · on July 10, 2023

Just has to not be a disadvantage.

Plenty of mutations have no purpose whatsoever and were unrelated to survival or manifest after reproduction so are not selected for or against.

f6v · on July 10, 2023

I think it would have very high energy requirements. For this trait to survive over generations there would need to be a tremendous evolutionary benefit. What would that be for a “primitive image data”?

yetihehe · on July 10, 2023

Maybe things like "long green shape" (cats' fear of cucumbers because they resemble snakes), or "a series of black and yellow stripes", or even "a black blob with many appendages" to watch out for spiders? Encoding some primitive image data so that further generations know what to avoid or pursue seems like a very tremendous evolutionary benefit.

techdragon · on July 10, 2023

Yeah, I expect this isn’t going to be how that sort of mechanism works, but it’s always been an interesting concept for me, that while “genetic memory” as presented in much fiction is extremely unlikely just from the sheer entropic hill such mechanisms would have to evolutionarily climb to be able to pass on so much information (on top of the baseline necessary information for reproduction, the majority of memory won’t on average confer a lot of reproductive advantages, so it’s statistically more likely to get optimised out by the random mistakes of evolution, hence entropically “uphill”) …

Yet while this fictional form is unlikely we have quite a lot of good examples and evidence for “inherited information”. You have to be careful with it since it’s too easy to accidentally include side channels for organisms to learn the information and thus break the test. Such as insects being genetically driven towards food by smell at a molecular chemical interaction level, and the smell becoming associated with the information you wish to test. A bee colony can’t be reliably tested unless you raise it from a new queen in an odourless environment if you wish to see if bees genetically know that the shape of a flower is associated with food. It’s tough to subtract the potential that a colony will have learned and “programmed” later generations of bees with things like the classic waggle dancing in order to more efficiently gather food.

We do have good ones though like cats and snake shaped objects, it’s surprisingly consistent, and pops up in some other animal species. It’s wired into our brains a bit to watch out for such threats. There’s a significant bias towards pareidolia in human brains and it’s telling how deeply wired we have some of these things, but it is there and study shows it seems to form well before our cognitive abilities do… these all have some obvious reproductive advantages however so it makes sense that the “instinct” would be preserved over generations as it confers an advantage. But it’s still impressive that it can encode moderately complex information like “looks like the face of my species” or “cylindrical looking objects on the ground might be dangerous”… even if it’s encoded in a lossy subconscious instinctual level.

TeMPOraL · on July 10, 2023

> But it’s still impressive that it can encode moderately complex information like “looks like the face of my species” or “cylindrical looking objects on the ground might be dangerous”… even if it’s encoded in a lossy subconscious instinctual level.

I think it helps that the encoding does not have to be transferable in any way. This kind of "memory" has no need for portability between individuals or species - it doesn't even need to be factored out as a thing in any meaningful sense. I.e. we may not be able to isolate where exactly the "snake-shaped object" bit of instinct is stored, and even if we could, copy-pasting it from a cat to a dog wouldn't likely lead the (offspring of the) latter to develop the same instinct. The instinct encoding has to only ever be compatible with one's direct offspring, which is a nearly-identical copy, and so the encoding can be optimized down to some minimum tweaks - instructions that wouldn't work in another species, or even if copy-pasted down couple generations of one's offspring.

(In a way, it's similar to natural language, which rapidly (but not instantly) loses meaning with distance, both spatial/social and temporal.)

In discussing this topic, one has to also remember the insight from "Reflections on Trusting Trust" - the data/behavior you're looking for may not even be in the source code. DNA, after all, isn't universal, abstract descriptor of life. It's code executed by a complex machine that, as part of its function, copies itself along with the code. There is lots of "hidden" information capacity in organisms' reproduction machinery, being silently passed on and subject to evolutionary pressures as much as DNA itself is.

techdragon · on July 10, 2023

Oh absolutely... and that's a great analogy for the more computer oriented, "Reflections on Trusting Trust" highlights how it can be the supporting infrastructure of replication that passes on the relevant information... a compiler attack like that is equivalent to things like epigenetic information transfer... and for fun bonus measure since it came to mind... the short story Coding Machines goes well for really helping to never forget the idea behind "Reflections on Trusting Trust" https://www.teamten.com/lawrence/writings/coding-machines/

It definitely would be minimised data transfer, be it via an epigenetic nudge that just happens to work by sheer dumb luck because of some other existing mechanism or a sophisticated DNA driven growth of some very specific part of the mammalian connectome that we do not yet understand because we've barely got the full connectome maps of worms and insects, mammals are a mile away at the moment... no matter the mechanism evolution will have optimised it pretty heavily for simply information robustness reasons, fragile genetic/reproductive information transfer mistakes that work, break and get optimised out in favour of the more robust ones that don't break and more reliably pass on their advantage.

f6v · on July 10, 2023

You need to compare that with an alternative solution where this information is learned by each generation and then asses the survival advantage of having it encoded in DNA. This is outside my field and I don’t have a strong opinion.

gsam · on Oct 31, 2021

Category theory isn't the only way to solve this. Arguably a purely continuous dynamical system like differential equations with certain boundary conditions would similarly work. Discrete dynamical systems however, are much better at representing finite, discrete relationships, particularly with recursion. I'm only just learning about these topics, but simple rules lead to modelling indeterminately complex behaviour (Rule 30, logistic map). These can be viewed quite clearly through the lens of category theory as functors and fixed points. However, the gaps between automata theory, discrete (and non-linear) dynamical systems and finally category theory are still very wide at the moment.

malux85 · on Oct 31, 2021

This is one of the most interesting comments I have read in months, can I contact you?

gsam · on Nov 2, 2021

Do you have somewhere to send you some details? Otherwise, you can probably find a mail on my Github with the same name.

gsam · on Sept 15, 2021

CIM/WBEM goes all the way back to 1996. They essentially wanted a management infrastructure on all kinds of devices (including different architectures, so actually C made sense then), but that also notably included remote access. At the time, SOAP was still popular, so here we are with a rather silly transport protocol and all kinds of overhead reinventing things like SSH. However, the overall goal still makes sense, it was essentially a way of 'object'-ifying everything from logs to other metrics. This fit in with the overall mode of thinking in MS with DCOM and COM (and registry), and structured configuration/management. I'm sure it's paid massive dividends on Azure Linux infrastructure. For highly structured objects, SOAP and XML aren't a terrible fit, but I doubt many people would do the same thing again today.

Honestly, they just needed to rewrite it in a safer stack. However, that still may not have saved them from all these vulnerabilities, given the scope of what they're implementing as remote management protocols. The relative scrutiny, fuzzing and manpower just hasn't been there, especially when it's obfuscated by various layers.

onionisafruit · on Sept 15, 2021

Not to take away from the rest of what you said, but I don’t think SOAP was _still_ popular in 1996. I don’t think it had become popular yet. I don’t think I even heard of SOAP before 1999 or 2000. I’m not a trend setter or anything, but if it was popular, I probably would have at least heard of it.

homarp · on Sept 15, 2021

https://www.xml.com/pub/a/ws/2001/04/04/soap.html says SOAP was started in 98

gsam · on Sept 16, 2021

That's fair, I was more speaking about XML and its use as a form of binary transport. Things like WS-Management and explicit SOAP obviously came a little bit later, and SOAP-like technologies were popularized for more general use in the 2000s. I think it's fair to say my experiences in general lean more towards observing standards groups.

gsam · on Aug 16, 2021

Implementing it with a hylomorphism is actually quite clean, however, understanding the plumbing to get there is not. The principles are generally quite simple, however, there's no non-mathematical wordings to describe it.

gsam · on March 11, 2020

Have you seen any signs of which compression algorithm might be the one affected? Presumably it's one of their more snowflake ones. If it is in a library, surely SMB isn't the only affected resource? Perhaps it's not the library, but the plumbing or the headers and such.

gsam · on Jan 16, 2020

It's not actually clear that they prioritize their own products all that much. Certainly in regards to Project Zero, it seems the point is that they are detached from the rest of the product teams. (Correct me if I'm wrong)

Outside of the security teams, I think it's actually that Chromium is much better fuzzed and scrutinized. They just have so many more resources, including those for security.

tjpnz · on Jan 16, 2020

>It's not actually clear that they prioritize their own products all that much. Certainly in regards to Project Zero, it seems the point is that they are detached from the rest of the product teams. (Correct me if I'm wrong)

Most of what I've read relates to Microsoft products - and I'm not saying that Microsoft is better/worse than the rest when it comes to security.