>I'm puzzled that more people aren't loudly exploring this space (LLM+CLI) - it's really fun.
70% of the front page of Hackernews and Twitter for the past 9 months is about everybody and their mother's new LLM CLI. It's the loudest exploration I've ever witnessed in my tech life so far. We need to be hearing far less about LLM CLIs, not more.
Has anyone written a shell script before that uses a local LLM as a composable tool? I know there's plenty of stuff like https://github.com/ggerganov/llama.cpp/blob/master/examples/... where the shell script is being used to supply all the llama.cpp arguments you need to get a chatbot UI. But I haven't seen anything yet that treats the LLM as though it were a traditional UNIX utility like sed, awk, cat, etc. I wouldn't be surprised if no one's done it, because I had to invent the --silent-prompt flag to make it possible. I also had to remove all the code from llava-cli that logged stuff to stdout. Anyway, here's the script I wrote: https://gist.github.com/jart/bd2f603aefe6ac8004e6b709223881c...
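For anyone who wants to try the idea, here's a minimal sketch of an LLM-as-filter shell function. It assumes the `--silent-prompt` flag described above keeps the prompt out of stdout, uses `--temp 0` for determinism, and treats `model.gguf` and the binary path as placeholders (the `LLAMAFILE` variable is just my addition so the binary location is swappable):

```shell
#!/bin/sh
# Sketch: use a local model as a pipe-friendly UNIX filter.
# Assumes --silent-prompt (from the comment above) and --temp 0;
# model.gguf and ./llamafile are placeholders for your own paths.
: "${LLAMAFILE:=./llamafile}"

llm_summarize() {
    # Read the entire pipeline input from stdin, then emit only the completion.
    input=$(cat)
    "$LLAMAFILE" -m model.gguf --temp 0 --silent-prompt \
        -p "Summarize in one sentence: ${input}" 2>/dev/null
}
```

With that sourced, something like `git log -5 | llm_summarize` composes the model into a pipeline the same way you'd compose sed or awk.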
Justine may have addressed unreliable output by using `--temp 0` [0]. I'd agree that while that makes it deterministic, there are other axes of reliability on which it may still be poorly suited for pipes.
[0]
> Notice how I'm using the --temp 0 flag again? That's so output is deterministic and reproducible. If you don't use that flag, then llamafile will use a randomness level of 0.8 so you're certain to receive unique answers each time. I personally don't like that, since I'd rather have clean reproducible insights into training knowledge.
`--temp 0` makes it deterministic. What can make output reliable is `--grammar` which the blog post discusses in detail. It's really cool. For example, the BNF expression `root ::= "yes" | "no"` forces the LLM to only give you a yes/no answer.
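As a sketch of what that looks like, here is a hypothetical grammar file in llama.cpp's GBNF syntax that extends the yes/no idea to a fixed label set (the filename and labels are made up for illustration):

```
# sentiment.gbnf -- constrain decoding to exactly one of three labels
root ::= "positive" | "negative" | "neutral"
```

llama.cpp accepts grammars via `--grammar` (inline) or `--grammar-file`; since llamafile wraps llama.cpp, the same flags should apply, but check the docs for your build.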
That only works up to a point. If you're trying to transform text-based CLI output into a JSON object, you can still get variation in the output even with a grammar. A simple example is field or list ordering; omission is the really problematic one.
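One way to see (and partially paper over) the ordering problem: two grammar-valid JSON outputs can carry identical data yet differ byte-for-byte. A minimal Python sketch, with made-up model outputs, canonicalizes key order before comparing; note that this does nothing for list ordering or omitted fields, which need schema-level validation:

```python
import json

# Two grammar-valid outputs for the same prompt: same data, different field order.
a = '{"name": "alice", "age": 30}'
b = '{"age": 30, "name": "alice"}'

def canonical(s: str) -> str:
    """Re-serialize JSON with sorted keys so semantically equal outputs compare equal."""
    return json.dumps(json.loads(s), sort_keys=True, separators=(",", ":"))

assert a != b                        # byte-level comparison fails
assert canonical(a) == canonical(b)  # canonical forms match
```

Canonicalization makes pipe output stable across key-order variation, but a missing field still silently changes the canonical form, which is why omission is the hard case.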