They *can* be distinguished. They are just becoming more difficult to. Its sligh...

duskwuff · on Sept 18, 2023

> Some examples are easy to point at, like the case of the recipe book that lists Zelda food items as ingredients

As an aside, the case you're thinking of was a novel, not a recipe book. Still embarrassing, but at least it was just a bit of set dressing, not instructions to the reader.

https://www.cnet.com/culture/zelda-breath-of-the-wild-recipe...

> I saw someone giving programming advice on discord a few weeks ago. Advice that was blatantly copy/pasted from chat GPT in response to a very specific technical question.

This, on the other hand, is a very real and a very serious problem. I've also seen users try to get ChatGPT to teach them a new programming language or environment (e.g. learning to use a game development framework) and ending up with some seriously incorrect ideas. Several patterns of failure I've seen are:

1) As you describe, language models will frequently hallucinate features. In some cases, they'll even fabricate excuses for why those features fail to work, or will apologize when called out on their error, then make up a different nonexistent feature.

2) Language models often confuse syntax or features from different programming languages, libraries, or paradigms. One example I've heard of recently is language models trying to use features from the C++ standard library or Boost when writing code targeted at Unreal Engine; this doesn't work, as UE has its own standard library.

3) The language model's body of "knowledge" tends to fall off outside of functionality commonly covered in tutorials. Writing a "hello world" program is no problem; proposing a design for (or, worse, an addition to) a large application is hopeless.

SargeZT · on Sept 19, 2023

> The language model's body of "knowledge" tends to fall off outside of functionality commonly covered in tutorials. Writing a "hello world" program is no problem; proposing a design for (or, worse, an addition to) a large application is hopeless.

Hard disagree. I've used GPT-4 to write full optimizers from papers that were published long after the cutoff date that use concepts that simply didn't exist in the training corpus. Trivial modifications were done after to help with memory usage and whatnot, but more often than not if I provide it the appropriate text from a paper it'll spit something out that more or less works. I have enough knowledge in the field to verify the corectness.

Most recently I used GPT-4 to implement the paper Bayesian Flow Networks, a completely new concept that I recall from the comment section on HN people said "this is way too complicated for people who don't intimately know the field" to make any use of.

I don't mind it when people don't find use with LLMs for their particular problems, but I simply don't run into the vast majority of uselessness that people find, and it really makes me wonder how people are prompting to manage to find such difficulty with them.

rmbyrro · on Sept 18, 2023

They can indeed distinguish them, I agree. So why the fuss?

I think the concern is that bad authors would game the reviews and lure audiences into bad books.

But aren't they already able to do so? Is it sustainable long term? If you spit out programming books with code that doesn't even run, people will post bad reviews, ask for refunds. These authors will burn their names.

It's not sustainable.

snailmailman · on Sept 18, 2023

It doesn't need to be sustainable as one author or one book. These aren't real authors. Its people using AI to make a quick buck. By the time the fraud is found out, they've already made a profit.

They make up an authors name. Publish a bunch of books on a subject. Publish a bunch of fake reviews. Dominate the search results for a specific popular search. They get people to buy their book.

Its not even book specific, its been happening with actual products all over amazon for years. People make up a company, sell cheap garbage, and make a profit. But with books, they can now make the cheap garbage look slightly convincing. And the cheap garbage is so cheap to produce in mass amounts that nobody can really sort through and easily figure out "which of these 10k books published today are real and which are made up by ai".

It takes time and money to produce cheap products at a factory. But once these scammers have the AI generation setup, they can just publish books on loop until someone ends up buying one. They might get found out eventually, and they will have to pretend to be a different author, and they just repeat the process.

failuser · on Sept 18, 2023

What’s the fuss about spam? You can distinguish it from useful mail? What’s the fuss about traffic jams? You’ll get there eventually.

The LLM allow DDoS attack by increasing the threshold needed to check the books for gibberish.

It’s not like this stream of low quality did not exist before, but the topic is hot and many grifters try LLMs to get a quick buck at the same time.

geraldwhen · on Sept 18, 2023

It’s sustainable if you can automate the creation of amazon seller accounts. Based on the number of fraudulent Chinese seller accounts, I’d say it’s very likely automated or otherwise near 0 cost.