I've also had great results with using LLMs to pry into Apple's private and undocumented APIs. I've been impressed with the lack of hallucinations for C/C++ and Obj-C functions.
I can attest that the quality in this domain has greatly improved over the years too. I am not always a fan of the quality of the Swift code that my LLM produces, but I am impressed that what it produces often works in one shot. The quality also is not that important to me because I can just refactor the logic myself, and I often prefer to do that anyway. I cannot hold an LLM to idiosyncrasies that I have not shared with it.
The other day I asked Claude Opus 4.6 one of my favorite trivia pieces:
What plural English word for an animal shares no letters with its singular form? Collective nouns (flock, herd, school, etc.) don't count.
Claude responded with:
"The answer is geese -- the plural of cow."
Though, to be fair, in the next paragraph of the response, Claude stated the correct answer. So, it went off the rails a bit, but self-corrected at least. Nevertheless, I got a bit of a chuckle out of its confidence in its first answer.
I asked GPT 5.2 the same question and it nailed the answer flawlessly. I wouldn't extrapolate much about model quality from this one answer, but I thought it was interesting nonetheless.
(For those curious, the answer is 'kine', the archaic plural of cow.)
Of course it’s important to remember that the ability of an LLM to answer an obscure riddle like that has nothing to do with its reasoning abilities, but rather depends on whether the answer was included in its training dataset.
Yes, yes, a thousand times yes! As recently as yesterday I was forced to abandon a conversation with a normie because I couldn't convince her of this fundamental limitation. ChatGPT was "damn near magical" in her opinion. Sigh.
The word is in most online dictionaries, for what it's worth. It's also used in Biblical texts, albeit only a handful of times. I do agree it's not a true assessment of an LLM's overall reasoning. No one I have ever asked that riddle has gotten it correct. Then again, that is probably partly the point of the riddle.
I would like to reiterate that both Claude and GPT answered correctly. It was just bizarre how Claude got an initial, minor detail incorrect, yet reasoned well enough to get the more difficult answer correct.
I think the point here is that an LLM might not get the correct answer because it hasn't found it yet by scraping Twitter, Facebook, Wikipedia, etc. Artificially limit the training set and it will never, ever get it right.
Whereas, a human of average intelligence, in possession of a dictionary or perhaps just a list of animals, could reason their way through getting the correct answer in finite time. They could probably get it without the list if they really wanted to.
I can replicate a vaguely similar result (gpt-5.2 produces the correct answer immediately, Opus 4.6 "thinks aloud" in the output for two lines and then produces the correct answer), but I worry that 5.2 might be thinking under the hood here.
I once read the dating life in Iceland can be kind of difficult. The total population is around 400k after being settled for almost 1150 years. Thus, it's quite common on first dates for both individuals to go through their family trees. Not to see if they are related, but to make sure they aren't too related.
It turns out that Icelandic people even have an app for it now.
> I don't get the step relative thing, but if you do, but don't want to do anything to your step child.
I do not understand such desires either. Though, you reminded me of an 'Ask Me Anything (AMA)' on Reddit I read many years ago. The subject of the AMA was a person that worked in the adult entertainment industry for some significant amount of time. Not as an actor or actress, but in the 'behind the scenes' business/production operations.
I still remember reading a question someone asked which could be paraphrased as, "Why does the industry push so much taboo-titled and -themed content like: just turned 18, barely legal, incest, step-siblings, step-parents, etc.?"
The person answering questions responded that it wasn't the industry creating this themed content in hopes that people would expand their desires. Rather, the industry was reacting directly to what people already seemed to want, based on the analytics it could capture. Producers noticed that content with such themes really outperformed. In other words, if you want your content to be competitive, then it's essentially obligatory.
I find AI for coding to be a boon. However, I am very conservative in my use cases. I only use the AI web apps, thus I can essentially only ask questions and read responses. AI cannot write code all that well for me because it will never have access to the full project I am working on. But even then, I still feel a lot more productive. It's basically just a faster and more enhanced Google/StackOverflow search for me.
I concur. I don't want to install libraries on my host machine that I won't use for anything other than development, e.g., Node.js.
On macOS, Lima has been a godsend. I have Claude Code in an image, and I just mount the directory I want the VM to have access to. It works flawlessly and has been a replacement for Vagrant for me for some time. Though, I owe a lot to Vagrant. It was a lifesaver for me back in the day.
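For anyone curious what that looks like, here is a rough sketch of a Lima template along those lines. The file name, mount path, and provisioning commands are illustrative assumptions, not my exact config; the key idea is that the guest only sees what you list under `mounts`:

```yaml
# Hypothetical Lima template, e.g. saved as claude-dev.yaml.
# The VM can only see host directories listed under mounts.
mounts:
  - location: "~/src/my-project"  # host directory exposed inside the guest
    writable: true                # Lima mounts are read-only by default
provision:
  - mode: system
    script: |
      #!/bin/sh
      # Assumed setup: Node.js from the distro, Claude Code from npm.
      apt-get update && apt-get install -y nodejs npm
      npm install -g @anthropic-ai/claude-code
```

Then `limactl start claude-dev.yaml` brings the VM up and `limactl shell claude-dev` drops you into it, with everything outside the mounted directory invisible to the agent.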