Interesting approach. I just finished some work for a similar task in a different domain.
One thing that surprised me: tantivy's BM25 search is faster, more expressive, and more scalable than SQLite. If you're just building a local search (or want to optimize for local FTS), I would strongly recommend looking into tantivy.
If you have the resources, it would be very interesting to throw some models (especially smart-but-context-constrained cheaper ones) at some of the benchmark programming problems and see if this approach shows a measurable improvement.
On Tantivy: Agree it's the better search engine, but context-mode is session-scoped — DB is a temp file that dies when the process exits. At that scale (50-200 chunks), FTS5 is zero-config, single-file, <1ms startup, and good enough. If we ever add persistent cross-session indexing, Tantivy would be the move.
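For the curious, here's a minimal sketch of that session-scoped setup in Python: an in-memory FTS5 index that dies with the process, ranked with the built-in bm25() function. Table and column names are illustrative, not from the project.

```python
import sqlite3

# Session-scoped index: an in-memory DB that vanishes when the process
# exits, standing in for the temp-file setup described above.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(body)")
db.executemany(
    "INSERT INTO chunks(body) VALUES (?)",
    [("tantivy is a rust search engine",),
     ("sqlite fts5 offers bm25 ranking",),
     ("zero-config single-file storage",)],
)
# FTS5's bm25() returns smaller (more negative) values for better
# matches, so ORDER BY ascending puts the best hit first.
rows = db.execute(
    "SELECT body, bm25(chunks) FROM chunks "
    "WHERE chunks MATCH ? ORDER BY bm25(chunks)",
    ("bm25",),
).fetchall()
```

At 50-200 chunks this really is near-instant; the whole index lives in one connection with no setup beyond the CREATE VIRTUAL TABLE.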
On benchmarking: This is the experiment I most want to see. The hypothesis: context-mode benefits smaller models disproportionately — a 32K model with clean context could outperform a 200K model drowning in raw tool output. Would love to see SWE-bench results with context-mode on vs. off across model tiers.
You bet I do. That's an hour of rubber-ducky time working through new architectures with someone who won't get tired of my endless blathering. I've worked through a bunch of bad ideas that way, without embarrassing myself in front of my colleagues.
I also use it to explore topics that I wouldn't spend desktop time on, but that I was curious about. It's like having a buddy who's smarter than me on their special interest, but their special interest is "everything you don't know." And your buddy's name is Gell-Mann. :-)
...this is the saddest thing I've read in a while, if true. Whiteboard sessions with coworkers, designing architectures and playing out/bouncing ideas, are one of my favorite things to do at work.
Just don’t invite the folks with unearned arrogance.
You know what's pretty good at cleaning up data that's a total trash fire? _More_ LLM. :-)
I run a web service whose primary purpose is cleaning up messy, SEO-enshittified data from Google, eBay, etc. After years of fine-tuning my own heuristics, I threw a super-cheap LLM at it and it massively out-performed my custom code. It's slower, but the results are well worth it.
The description of the algorithm notes that each irregular pentagon is divided into four sub-pentagons. Eyeballing the maps, I don't see any group of 4 pentagons forming a similar larger pentagon.
I noticed that your landing page has an analog to the H3 landing page, allowing zooming in. If you could also steal the next-higher/next-smaller overlay they have there, it would make the relationship between the larger and smaller pentagons clearer.
I've used H3 extensively, and one of the things that always bugged me about it was that each large hexagon was _mostly_ covered by a group of the next smaller ones, but because geometry, the edges have some overlap with the neighbor large hexagons. So I can't just truncate an integer mapping, for example, to get the ID of the next-largest.
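For contrast, here's a toy sketch (purely illustrative, not H3's actual index layout) of the property exact containment would buy you: in a quadtree-style scheme where every parent is tiled exactly by its four children, "next-largest cell" is literally a bit shift.

```python
# Toy quadtree-style cell IDs: each level appends 2 bits (child 0-3),
# with a leading 1 bit as a length/resolution marker. Because a parent
# is covered exactly by its children, parent lookup is pure truncation.
def child(cell: int, quadrant: int) -> int:
    return (cell << 2) | quadrant

def parent(cell: int) -> int:
    return cell >> 2

root = 0b1
c = child(child(root, 3), 1)  # two levels down
assert parent(parent(c)) == root
```

H3 can't offer this geometrically because, as noted above, the children don't tile the parent exactly, so containment questions need the library rather than bit math.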
Are the examples all actual outputs of the program? It's entirely possible that my understanding of the grammar is off, but it looks like these examples are wrong:
We got one, and we love it. My daughter dropped hers and bent the case. We ordered a new case (which, admittedly, was a significant percentage of the price of a new laptop) and had it repaired within an hour of delivery.
That was my first thought also. OTOH, Framework's business model is "we're going to charge you more to get that thing you actually wanted without locking you into stupid business models."
I would absolutely pay a premium for a decent TV without all the advertising crap that pops up every time I turn on the TV.
I'm not sure that is the business model. Framework laptops are more expensive, but not by much. They compete well with Macs and the high end laptops they are aiming to replace.
On the other hand, TVs can bring in a significant amount of revenue post-purchase, like $100/year.
Dragon's Egg by Robert Forward has always been one of my favorites. It asks the question "what would life be like if it evolved on a neutron star," and has 20 pages of his notes on working out the physics at the end of the book.
I read Dragon's Egg and the sequel, Starquake, as a teenager. I remembered them being deeply engrossing.
I revisited the first one 20-ish years later in an attempt to get my partner interested. I only made it a few chapters before I decided to abandon the attempt. There are some really good parts (the science stuff), but I just could not get past the way the human characters' interactions were described.
Hansen's Second Law: Clever is the opposite of maintainable
There is nothing that will stop a code review quicker than "I discovered a clever way to..." A dozen engineers are going to have to try to reverse engineer your cleverness in order to safely make any change to your code, so you'd better make double-sure that the performance you're buying with your cleverness is worth the total maintenance cost going forward.
There are some times it's worth it (the Fast InvSqrt hack and many others), but most of the time, it's just tickling our intellectual curiosity for our own benefit, and that's a bad trade.
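For anyone who hasn't seen the Fast InvSqrt trick: the original is C, but here's a Python port for illustration, using struct to reinterpret the float's bits the way the C code's pointer cast does.

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    # Reinterpret the 32-bit float's bits as a signed int
    # (the famous 0x5f3759df magic-constant trick).
    i = struct.unpack(">i", struct.pack(">f", x))[0]
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack(">f", struct.pack(">i", i))[0]
    # One Newton-Raphson iteration refines the initial estimate.
    y = y * (1.5 - 0.5 * x * y * y)
    return y
```

It's a great example of the trade-off above: a handful of inscrutable lines that bought real performance on 1990s hardware, and a classic case where the cleverness genuinely paid its maintenance cost.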