It seems like an AI system needs to be able to create subsystems that act like they're composable, that have the properties of fully composable parts without actually being fully composable. Full composability, as the author alludes, requires a messy reality to be fully condensed into a compact model, and by now we know the world is going to be messier than anything a reasonably compact model can represent exactly.
An alternative to an AI using only compact models is an AI that searches for regions approximable by a compact model, keeps an approximation of the boundary/limits of each model's validity, and constantly updates that boundary.
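To make that idea concrete, here is a toy sketch (all names hypothetical) of the "find a region a compact model fits, and track where it stops fitting" loop, using the simplest possible compact model, a least-squares line, and a residual tolerance as the validity test:

```python
def fit_line(pts):
    """Least-squares line y = a*x + b through (x, y) points."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def validity_boundary(pts, tol=0.1):
    """Find the largest left-aligned region where the compact (linear)
    model fits every point within `tol`; return the model and the x
    where the approximation stops being valid."""
    pts = sorted(pts)
    for k in range(len(pts), 1, -1):
        a, b = fit_line(pts[:k])
        if all(abs(a * x + b - y) <= tol for x, y in pts[:k]):
            return (a, b), pts[k - 1][0]  # model valid up to this x
    return None, pts[0][0]
```

Rerunning `validity_boundary` as data arrives is the "constantly updating that boundary" part; the returned x is the model's known limit.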
At this point I think just having the composability without the compactness would be a major breakthrough. Heck, even just figuring out how to efficiently harness multimodal inputs, or how to better handle inputs that vary over time (e.g. video), would be a breakthrough as well. Case in point: nearly all academic models for object detection are borderline useless in practice because they work on individual frames rather than on video; as a result they aren't very robust, and they lose objects from frame to frame for no good reason. There are tracking models, but they are far less researched than single-frame ones, and much more complex. In essence, it's like that drunk who was looking for his car keys under a street lamp rather than where he lost them, because it's easier to search there.
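To make the frames-vs-video point concrete, here is a toy sketch (hypothetical, not from any real tracking library) of the bare-minimum fix: greedily carrying track IDs across frames by IoU overlap, so a detector's boxes don't become brand-new objects every frame:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def track(frames, iou_thresh=0.3):
    """Greedy frame-to-frame association: a detection inherits the ID of
    the previous frame's box it overlaps most (above a threshold);
    otherwise it starts a new track."""
    tracks, next_id, prev = [], 0, {}  # prev maps track id -> last box
    for dets in frames:
        cur = {}
        for box in dets:
            best = max(prev.items(), key=lambda kv: iou(kv[1], box),
                       default=None)
            if best and iou(best[1], box) >= iou_thresh:
                tid = best[0]
                del prev[tid]  # each old box matches at most once
            else:
                tid, next_id = next_id, next_id + 1
            cur[tid] = box
        tracks.append(dict(cur))
        prev = cur
    return tracks
```

Even this trivial association gives you object permanence that a per-frame detector lacks; real trackers add motion prediction and appearance cues on top.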
People don't realize how little of this surface we've scratched so far, and how awesome this stuff will get once we properly scratch it all over.
On the topic of handling multimodal detections over time: this is often termed "object tracking" in the academic literature. If you wanna dig deeper, check out this youtube channel: https://www.youtube.com/channel/UCa2-fpj6AV8T6JK1uTRuFpw/fea... There are also a couple of universities with classes on this topic and lecture notes online. Often the solutions in this space are not ML-based, focusing instead on probabilistic approaches for associating and segmenting the ML-based single-frame detections over time. This sort of pipeline is a standard feature of many autonomous vehicle stacks.
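As a flavor of the non-ML, filtering side of those pipelines, here is a minimal alpha-beta filter sketch (a tiny cousin of the Kalman filters typically used for smoothing and predicting tracks; parameter values are illustrative only):

```python
def alpha_beta_track(measurements, dt=1.0, alpha=0.85, beta=0.005):
    """Classic alpha-beta filter: predict with a constant-velocity
    model, then correct position and velocity toward each new
    measurement. Returns the smoothed positions."""
    x, v = measurements[0], 0.0
    smoothed = [x]
    for z in measurements[1:]:
        x_pred = x + v * dt        # predict forward one step
        r = z - x_pred             # residual (innovation)
        x = x_pred + alpha * r     # correct position estimate
        v = v + (beta / dt) * r    # correct velocity estimate
        smoothed.append(x)
    return smoothed
```

The prediction step is also what lets a tracker "gate" which detections can plausibly belong to which track, which is the association problem in a nutshell.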
I work in this field, I could create my own "youtube channel" about it, if I didn't have better things to do. This doesn't change the fact that the field is relatively nascent, and a lot of people are researching stuff that doesn't really matter all that much, at the expense of the stuff that does matter.
At this point I think just having the composability without the compactness would be a major breakthrough.
I'd claim that practical composability without compactness is impossible. Being huge, like a deep neural net, makes composability intractable: if your "elements" are akin to deep nets, composition will be very, very hard to compute, and therefore impossible to train.
By "composability" I mean you'd be able to train models separately and then perhaps slightly refine the composed model as a whole on the final task(s). And models would be using each other's outputs, sort of like today you can take an ImageNet backbone and put an object detection head on top, except at a higher level. E.g. an optical (+IR) multi-camera object tracking model, fused with lidar and radar, feeding into a path planning model. As of today, I have just described at least a billion dollars' worth of modeling. That needs to be simpler and cheaper. Currently stuff like this has to be trained end to end, which, as you've pointed out, is very difficult to do, to the point where there are only a handful of companies that can convincingly pull it off (Tesla, Google, OpenAI, _maybe_ Facebook), and even they are arguably stuck at the local minimum this approach allows. And this local minimum won't get us to "real" AI, which I think was the point of this rather rambling article.
Category theory isn't the only way to solve this. Arguably a purely continuous dynamical system, like differential equations with certain boundary conditions, would work similarly. Discrete dynamical systems, however, are much better at representing finite, discrete relationships, particularly with recursion. I'm only just learning about these topics, but simple rules can model indeterminately complex behaviour (Rule 30, the logistic map). These can be viewed quite clearly through the lens of category theory as functors and fixed points. However, the gaps between automata theory, discrete (and non-linear) dynamical systems, and finally category theory are still very wide at the moment.
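Both of the named examples fit in a few lines, which is rather the point about simple rules yielding complex behaviour:

```python
def rule30(cells):
    """One step of Wolfram's Rule 30 on a circular row of 0/1 cells:
    new cell = left XOR (center OR right)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

def logistic(r, x):
    """One step of the logistic map x -> r * x * (1 - x);
    chaotic for r = 4 on most starting points in (0, 1)."""
    return r * x * (1 - x)
```

Iterating `rule30` from a single live cell produces the famous chaotic triangle; iterating `logistic` with r near 4 gives trajectories that are deterministic yet practically unpredictable.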
Any system capable of homeostasis--maintaining its own existence in the face of a range of environmental conditions--is using models. The models can be hard-coded in genetic material (which was "computed" over aeons of selection pressure) or dynamically learned and encoded in the span of the organism's life. Still models.
It's like bootstrapping certain operating systems from source--some of the dependencies exist only in machine code; the original source code has been lost to the sands of time. A human being is modeled by 23 pairs of chromosomes' worth of DNA, and when that model is planted in a properly functioning adult female reproductive system, it turns itself into a new human being.
I don't see why the world of CPUs and nonvolatile memory is any less capable of being a substrate for intelligent, self-replicating entities as the world of particles and atoms.
I’m not sure what exactly you mean by aesthetics, but photons going into your eyeball and ending up as “images” constructed by your brain is much closer to a model fed by data than to some objective-reality ingestor.
The irony is that there’s an iceberg of robustly composed and functional models that the “brittle” Sphex wasp behavior is resting on top of: everything that allows the wasp to even know where its nest is, how to stun prey, and how to bring it back.
Disrupting the behavioral program causes the wasp to run a loop, not to curl up and die. AI has a very, very long way to go.
Nobody should believe this kind of article even if someone really clever wrote it