Here’s a recent example (from yesterday) of a benefit, though.
I tried unsuccessfully to search for an ECG analysis term (EAR or EA Run) using Google, DDG, etc. There was no magic set of quoting, search terms, etc. that could surface what those terms meant. "Ear" is just too common a word.
ChatGPT, however, was able to take the context of my question (an ECG analysis) and lead me right away to what EAR meant.
I wasn’t seeking medical advice though, just a better search engine with context. So there are clearly benefits here too.
Yeah, I took the person's comment, snipped "ECG analysis term EAR meaning" from it, and popped that into Google. I found a page titled "ECG Glossary, Terms, & Terminology List", and under the E section it has "Ectopic Atrial Rhythm".
That said, if it's correct, LLMs are less work for this kind of thing.
But often when people explain to another person what they are looking for, as they did in their comment, they do a better job of describing what they need than when they are in their own head trying to google it. Which is why I just snipped their words to search with.
I used to work at a healthcare AI chatbot startup, before transformer language models like BERT existed. We were definitely worried about the accuracy and reliability of the medical advice even then, and we had clinicians working closely with us to make sure the dialog trees were trustworthy. I work in aerospace medicine and aviation safety now, and I constantly encounter inadvisable use of LLMs and a lack of effective evaluation methods (especially for domain-specific LLMs).
I appreciate the advisory notice in the README and the recommendation against using this in settings that may impact people. I sincerely hope that it's used ethically and responsibly.
Maybe it's OK to worry about both? Not trusting "arbitrary thing A" does not logically make "arbitrary thing B" more trustworthy. I do realise that these models intend to (incrementally) represent collective knowledge and may get there in the future. But if you worry about A, why not worry about B, which is based on A?
You seem to be assuming, without any evidence at all, that LLMs giving medical advice are likely to be roughly equivalent in accuracy to doctors who are actually examining the patient and not just processing language, just because you are aware that medical mistakes are common.
"Six patients 65 years or older (2 women and 4 men) were included in the analysis. The accuracy of the primary diagnoses made by GPT-4, clinicians, and Isabel DDx Companion was 4 of 6 patients (66.7%), 2 of 6 patients (33.3%), and 0 patients, respectively. If including differential diagnoses, the accuracy was 5 of 6 (83.3%) for GPT-4, 3 of 6 (50.0%) for clinicians, and 2 of 6 (33.3%) for Isabel DDx Companion"
Six patients is a long way from persuasive evidence, because with so few patients randomness is going to be a large factor. And it appears that the six were chosen from the set of patients that doctors were having trouble diagnosing, which may put a thumb on the scale against doctors. But yes, it certainly suggests that a larger study might be worth doing (also including patients diagnosed correctly by doctors, to catch cases where GPT-4 doesn't do as well).
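To put a number on that randomness: here's a quick back-of-the-envelope sketch (my own illustration, not from the study) of a 95% Wilson score interval for GPT-4's 4-of-6 primary-diagnosis result. With n = 6, the plausible range is enormous.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    halfwidth = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - halfwidth, center + halfwidth

# GPT-4's 4-of-6 accuracy from the quoted study:
lo, hi = wilson_interval(4, 6)
print(f"{lo:.0%} to {hi:.0%}")  # roughly 30% to 90%
```

An interval from roughly 30% to 90% is consistent with GPT-4 being anywhere from mediocre to excellent, which is exactly the small-sample problem.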
It's not whataboutism at its best, no. Just as with self-driving cars, medical AIs don't have to be perfect, or even to cause zero deaths. They just have to improve the current situation.
It depends who the end user is. As an aid for a trained physician, who is in a better position to spot the hallucinations, it may be fine, whereas a self-medicating patient could be at risk.
We absolutely need more resources in healthcare throughout the world, and it may be that these models, or even AGI, have great potential as a companion for e.g. Doctors Without Borders or even at the local hospital in the future. But there’s quite a bit more nuance to giving medical advice compared to perfecting a self driving car.
A self driving car can cause incredible damage straight away. I don't think you should underestimate that. But we also don't have enough healthcare access, so the need is more urgent than that for automated drivers, the health benefit of which is often only about reducing risk of driving while tired or intoxicated.
Yes a patient could be at risk - they're at risk from everything, including a poorly trained/outdated doctor. And even more at risk from just not having access to a doctor. That's the point: it's a risk on both sides; weighing competing risks is not whataboutism.
I am personally excited about the possibilities. Nobody should be using an LLM without verifying its output. Will some people do it anyway? Of course; I remember that court case where the lawyer used ChatGPT and it made up cases.
If someone is going to make that mistake, there were other mistakes happening too, not just using an LLM.
On a positive note, LLMs offer the chance to do much better diagnosis on hard-to-figure-out cases.
On the other hand, your MD is going to look for the obvious, or statistically relevant, or currently prominent disease.
But they could be presented with a 99% probability for flu and 1% for wazalla, plus the note that testing for wazalla just means pinching your ear, so that it may actually be correctly diagnosed sometimes.
It is not that MDs are incompetent, it is just that when wazalla was briefly mentioned during their studies, they happened to be in the toilets and missed it. Flu was mentioned 76 times because it is common.
Disclaimer: I know medicine from "House, M.D.", but I also witnessed a miraculous diagnosis of my father just because his MD happened to read an obscure article.
(For the story: he was diagnosed with a worm-induced illness that happened once or twice a year in France in the 80s. The worm was from a beach in Brazil, and my dad had never travelled to the Americas. He was kindly asked to provide a blood sample to help research in France, which he did. Finally, the drug to heal him was available in one pharmacy in Paris and one in Lyon. We expected a hefty cost (though it is all covered in France); it cost 5 francs or so. But my brother and I were told to keep an eye on him, as he might become delusional and try to jump through the window. The poor man could hardly blink before we were on him. :)
Ah, and the pills were 2 cm wide and looked like they were for an elephant. He had 5 or so to swallow.)
You've heard of the bell curve concept? Chances are about half the doctors you see are in the lower half of the curve, which means they are borderline or completely incompetent at what they do.
I'd take my chances with a "properly trained" AI any day. Problem is, most medical corpus is full of bogus studies that have never been replicated, so it might be close to junk at this stage.
> It will lead to deaths,
Regular doctors kill people every day and get away with it because you accept the risks. What's different?
It's true. Only people like me should be allowed access to LLMs. Folks like you should be protected. Analogous to accredited investors, there should be a tier of "knowledgeable normal person" who is allowed to do whatever.
That's a fair point. Most programmers are terrible. As a lead programmer, I've dealt with countless "programmers" who cannot do their job without me spoon-feeding them code.
The terminology seems hopelessly confused, because there are no open source models, only some that allow free redistribution of the model weights. Perhaps they should be called "open" and the word "source" should be dropped.
This will be debated until the end of time. "Open weights", or whatever you want to call it. Essentially, people and companies can use Llama 2 commercially as long as they have fewer than 700M MAUs, if I recall correctly.
Very brief summary of the paper: there aren't any new technical ideas here, just fine-tuning a 70B model on curated medical papers and using self-consistency chain-of-thought (CoT) sampling at inference time (a rough sketch of that decoding trick is below).
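For anyone unfamiliar with self-consistency: you sample several independent CoT completions at nonzero temperature and majority-vote on the final answer. A minimal sketch, where `generate_cot_answer` is a hypothetical stub standing in for a real call to the fine-tuned model:

```python
import random
from collections import Counter

def generate_cot_answer(question: str) -> str:
    """Stub for one sampled chain-of-thought completion.

    A real implementation would query the model at temperature > 0 so each
    sample can follow a different reasoning path, then parse out the final
    multiple-choice letter. Hard-coded here so the sketch runs standalone.
    """
    return random.choice(["B", "B", "C", "B", "A"])

def self_consistency(question: str, n_samples: int = 9) -> str:
    """Sample n_samples CoT completions and majority-vote the final answer."""
    answers = [generate_cot_answer(question) for _ in range(n_samples)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

print(self_consistency("A 65-year-old presents with ... (MedQA-style question)"))
```

The intuition is that wrong reasoning paths tend to disagree with each other, while correct ones tend to converge on the same answer, so voting filters out a chunk of the noise.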
Results @70B: better than GPT-3.5, better than non-fine-tuned Llama, worse than GPT-4.
The 70B model gets a human passing score on MedQA (passing: 60, Meditron: 64.4, GPT-3.5: 47, GPT-4: 78.6).
TLDR: Interesting, not crazy revolutionary, almost certainly needs more training, stick with GPT-4 for your free unlicensed dangerous AI doctor needs
I've run into people on this very site who use LLMs as a doctor, asking them medical questions and following their advice.
The same LLMs that hallucinate court cases when asked about law.
The same LLMs that can’t perform basic arithmetic in a reliable fashion.
The same LLMs that can’t process internally consistent logic.
People are following the medical “advice” that comes out of these things. It will lead to deaths, no questions asked.