Meditron: A suite of open-source medical Large Language Models (github.com/epfllm)
126 points by birriel on Nov 28, 2023 | hide | past | favorite | 42 comments


This is the only part of AI that actually terrifies me.

I’ve run into people on this very site who use LLMs as doctors, asking them medical questions and following their advice.

The same LLMs that hallucinate court cases when asked about law.

The same LLMs that can’t perform basic arithmetic in a reliable fashion.

The same LLMs that can’t process internally consistent logic.

People are following the medical “advice” that comes out of these things. It will lead to deaths, no questions asked.


Here’s a recent (yesterday) example of a benefit though.

I tried unsuccessfully to search for an ECG analysis term (EAR or EA Run) using Google, DDG, etc. There was no magic set of quoting, search terms, etc. that could explain what those terms were. Ear is just too common a word.

ChatGPT, however, was able to take the context of my question (an ECG analysis) and led me right away to what EAR meant.

I wasn’t seeking medical advice though, just a better search engine with context. So there are clearly benefits here too.


Ectopic Atrial Rhythm?


Yeah, I took the person's comment, snipped "ECG analysis term EAR meaning" from it, and popped that into Google; it found a page "ECG Glossary, Terms, & Terminology List", and under the E section it has "Ectopic Atrial Rhythm".

That said, if it's correct, LLMs are less work for this kind of thing.

But often when people explain to another person what they are looking for, as done in their comment, they will do a better job explaining what they need than when they are in their own head trying to google it. Which is why I just snipped their words to search for it.


Following the advice of ChatGPT without double-checking? Bad idea.

Using ChatGPT as a starting point? Sounds really good to me, been there, done that.


Yea I think this is the most reasonable take.

You can always check information before believing or acting on it.

However it’s often super difficult to even get started and know what it is that you should be reading more about.


I used to work at a healthcare AI chatbot startup before modern LLMs, back when models like BERT were state of the art. We were definitely worried about the accuracy and reliability of the medical advice even then, and we had clinicians working closely with us to make sure the dialog trees were trustworthy. I work in aerospace medicine and aviation safety now, and I constantly encounter inadvisable use of LLMs and a lack of effective evaluation methods (especially for domain-specific LLMs).

I appreciate the advisory notice in the README and the recommendation against using this in settings that may impact people. I sincerely hope that it's used ethically and responsibly.


Sure, but we already have 250,000 medical deaths PER YEAR in the US due to medical errors (https://pubmed.ncbi.nlm.nih.gov/28186008/).

I don't think people should trust LLMs completely, but let's be real, they shouldn't trust humans completely either.


Isn’t that whataboutism at its best? Those two things are completely unrelated.


No, it's showing that the risk of errors exists even without AI.

AI doesn't necessarily make that risk higher or lower a priori.

Plus if you knew how much of current medical practice exists without evidence you wouldn't be worrying about AI.


Maybe it’s ok to worry about both? Not trusting “arbitrary thing A” does not logically make “arbitrary thing B” more trustworthy. I do realise that these models intend to (incrementally) represent collective knowledge and may get there in the future. But if you worry about A, why not worry about B, which is based on A?


You seem to be assuming, without any evidence, that LLMs giving medical advice are likely to be roughly as accurate as doctors who are actually examining the patient rather than just processing language, simply because you're aware that medical mistakes are common.


https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10425828/ Use of GPT-4 to Analyze Medical Records of Patients With Extensive Investigations and Delayed Diagnosis

"Six patients 65 years or older (2 women and 4 men) were included in the analysis. The accuracy of the primary diagnoses made by GPT-4, clinicians, and Isabel DDx Companion was 4 of 6 patients (66.7%), 2 of 6 patients (33.3%), and 0 patients, respectively. If including differential diagnoses, the accuracy was 5 of 6 (83.3%) for GPT-4, 3 of 6 (50.0%) for clinicians, and 2 of 6 (33.3%) for Isabel DDx Companion"


Six patients is a long way from persuasive evidence, because with so few patients randomness is going to be a large factor. And it appears that the six were chosen from the set of patients that doctors were having trouble diagnosing, which may put a thumb on the scale against doctors. But yes, it certainly suggests that a larger study might be worth doing (also including patients diagnosed correctly by doctors, to catch cases where GPT-4 doesn't do as well).


It's not whataboutism at its best, no. Just as with self-driving cars, medical AIs don't have to be perfect, or even to cause zero deaths. They just have to improve the current situation.


It depends who the end user is. As an aid for a trained physician, who is in a better position to spot the hallucinations, it may be fine, whereas a self-medicating patient could be at risk. We absolutely need more resources in healthcare throughout the world, and it may be that these models, or even AGI, have great potential as a companion for e.g. Doctors Without Borders or even at the local hospital in the future. But there’s quite a bit more nuance to giving medical advice compared to perfecting a self driving car.


A self driving car can cause incredible damage straight away. I don't think you should underestimate that. But we also don't have enough healthcare access, so the need is more urgent than that for automated drivers, the health benefit of which is often only about reducing risk of driving while tired or intoxicated.

Yes a patient could be at risk - they're at risk from everything, including a poorly trained/outdated doctor. And even more at risk from just not having access to a doctor. That's the point: it's a risk on both sides; weighing competing risks is not whataboutism.


I am personally excited about the possibilities. Nobody should be using an LLM without verifying its output. Will some people do it anyway? Of course; I remember that court case where the lawyer used ChatGPT and it made up cases.

If someone is going to make that mistake, there were other mistakes happening too, not just using an LLM.

On a positive note: LLMs offer the chance to do much better diagnosis of hard-to-figure-out cases.


The reality is that the majority of things people want to go to the doctor for are not serious.

If this can help with that, I am all for it.


On the contrary modern medicine terrifies me. Something like this might be our only hope.


It should. Most medicine is just extracting plant chemicals, modifying them, and concentrating them, so that companies can patent what nature has provided.


ChatGPT and that Amazon Healthcare thing will be more efficient than the US healthcare system. Which is kind of crazy.


On the other hand, your MD is going to look for the obvious, or statistically relevant, or currently prominent disease.

But if they were presented with a 99% probability of flu and 1% of wazalla, and testing for wazalla just means pinching your ear, then it may actually get correctly diagnosed sometimes.

It is not that MDs are incompetent, it is just that when wazalla was briefly mentioned during their studies, they happened to be in the toilets and missed it. Flu was mentioned 76 times because it is common.

Disclaimer: I know medicine from "House, MD" but also witnessed a miraculous diagnosis on my father just because his MD happened to read an obscure article

(For the story: he was diagnosed with a worm-induced illness that happened once or twice a year in France in the 80's. The worm was from a beach in Brazil, and my dad never travelled to the Americas. He was kindly asked to provide a blood sample to help research in France, which he did. Finally, the drug to heal him was available in one pharmacy in Paris and one in Lyon. We expected a hefty cost (though it is all covered in France); it cost 5 francs or so. But my brother and I were told to keep an eye on him, as he might become delusional and try to jump out the window. The poor man could hardly blink before we were on him :) Ah, and the pills were 2cm wide and looked like they were for an elephant. And he had 5 or so to swallow.)


You heard about the bell curve concept? Chances are about half the doctors you see are at the lower part of the curve. Which means they are borderline or completely incompetent at what they do.

I'd take my chances with a "properly trained" AI any day. Problem is, most medical corpus is full of bogus studies that have never been replicated, so it might be close to junk at this stage.

> It will lead to deaths,

Regular doctors kill people every day and get away with it because you accept the risks. What's different?


It's true. Only people like me should be allowed access to LLMs. Folks like you should be protected. Equivalent to accredited investor, there should be a tier of "knowledgeable normal person" who is allowed to do whatever.


Most deaths are already caused by the medical establishment.


In fairness, many doctors have terrible advice and make mistakes often. This is why malpractice insurance exists.


What's to be terrified about? Humans also hallucinate. Doctors are terrible at their jobs.


If doctors are terrible at their jobs so are programmers. And we should be terrified just as well at their AI creations then.


That's a fair point. Most programmers are terrible. As a lead programmer I've dealt with countless "programmers" who cannot do their job without me spoon-feeding them code.


Wait until you hear about search engines …


Is this open source? It says the model is under the Llama license, which is NOT open source.


This is a bit confusing. It appears the model license https://ai.meta.com/llama/license/ is different from the code license (Apache 2.0).

Seems like... there are lots of opportunities these days to clear up what open source means?


It seems hopelessly confused, because there are no open source models, only some that allow free redistribution of the model weights. Perhaps they should be called "open" and the word "source" should be dropped.


This will be debated until the end of time. Open weights, or whatever you want to call it. Essentially, people and companies can use Llama 2 commercially as long as they have fewer than 700M MAUs, if I recall correctly.


"Weights available" seems apt.


Very brief summary of the paper: there aren't any new technical ideas here, just finetuning a 70B model on curated medical papers, using self-consistency CoT sampling.

Results: @70B: Better than GPT3.5, better than non-fine tuned Llama, worse than GPT-4.

70B gets a human passing score on MedQA (passing: 60, Meditron: 64.4, GPT-3.5: 47, GPT-4: 78.6).

TLDR: Interesting, not crazy revolutionary, almost certainly needs more training, stick with GPT-4 for your free unlicensed dangerous AI doctor needs
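For readers unfamiliar with the paper's decoding setup: self-consistency CoT just means sampling several chain-of-thought completions at nonzero temperature and majority-voting their final answers. A minimal sketch of the voting step (the function name and answer format are illustrative, not taken from the Meditron code):

```python
from collections import Counter

def self_consistency_vote(final_answers):
    """Majority-vote over the final answers extracted from several
    independently sampled chain-of-thought completions."""
    counts = Counter(a.strip().upper() for a in final_answers if a)
    answer, votes = counts.most_common(1)[0]
    return answer, votes

# Suppose five sampled reasoning chains ended with these multiple-choice answers:
print(self_consistency_vote(["B", "b", "A", "B", "C"]))  # -> ('B', 3)
```

The sampling step (several temperature>0 generations per question) is where the real cost is; the vote itself is trivial.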



I like that this is a pun on Metatron.



Fair enough.


oh god



