OpenAI's o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors

(theguardian.com)

105 points | by donsupreme 19 hours ago

18 comments

gpm 45 minutes ago
I'd be very very hesitant to trust studies like this. It's very easy to mess up these benchmarks.
See for example this recent paper where AI managed to beat radiologists on interpreting x-rays... when the AI didn't even have access to the x-rays: https://arxiv.org/pdf/2603.21687 (on a pre existing "large scale visual question answering benchmark for generalist chest x-ray understanding" that wasn't intentionally messed up).
And in interpreting x-ray's human radiologists actually do just look at the x-rays. In the context the article is discussing the human doctors don't just look at the notes to diagnose the ER patient. You're asking them to perform a task that isn't necessary, that they aren't experienced in, or trained in, and then saying "the AI outperforms them". Even if the notes aren't accidentally giving away the answer through some weird side channel, that's not that surprising.
Which isn't to say that I think the study is either definitely wrong, or intentionally deceptive. Just that I wouldn't draw strong conclusions from a single study here.
[-]
- pixel_popping 21 minutes ago
  I agree with you on this specific study, however, I can't really wrap my head about the fact that doctors will be better than AI models on the long-run. After all, medicine is all about knowledge, experience and intelligence (maybe "pattern recognition"), all those, we must assume that the best AI models (especially ones focusing solely in the medical field) would largely beat large majority of humans (aka doctors), if we already have this assumption for software engineers, we should have it for this field as well, and let's be realistic, each time I've seen a doc the last few months (and ER twice), each time they were using ChatGPT btw (not kidding, it chocked me).
  So I’m genuinely curious:
  What is the specific capability (or combination of capabilities) that people believe will remain permanently (or at least for decades) where a top medical AI cannot match or exceed the performance of a good human doctor? Let's put liability and ethics aside, let's be purely objective about it.
  [-]
  - largbae 0 minutes ago
    But liability and ethics cannot be put aside. If treatments were free of cost and perfectly address problems, then a correct diagnosis would always lead to the optimal patient outcome. In that scenario, AI diagnosis will be like code generation and go asymptotic to perfection as models improve.
    But a doctor's job in the real world today is to navigate a total mess of uncertainty: about the expected outcome of treatments given a patient's age and other peoblems. About the psychological effect of knowing about a problem that they cannot effectively treat. Even about what the signals in the chart and x-ray mean with any certainty.
    We are very far from having unit test suites for medical problems.
  - nkrisc 12 minutes ago
    > What is the specific capability (or combination of capabilities) that people believe will remain permanently (or at least for decades) where a top medical AI cannot match or exceed the performance of a good human doctor? Let's put liability and ethics aside, let's be purely objective about it.
    Being a human when a patient is experiencing what is potentially one of the worst moments of their life. AI could be a tool doctors use, but let’s not dehumanize health care further, it is one of the most human professions that crosses about every division you can think of.
    I would not want to receive a cancer diagnosis from a fucking AI doctor.
  - gherkinnn 8 minutes ago
    To answer your question: talking to a human.
    Medicine is so much more than "knowledge, experience, and pattern matching", as any patient ever can attest to. Why is it so hard for some people to understand that humans need other humans and human problems can't be solved with technology?
  - fc417fc802 9 minutes ago
    > I can't really wrap my head about the fact that doctors will be better than AI models on the long-run.
    Nobody said that though?
    If the current trajectory continues and if advancements are made regarding automated data collection about patients and if those advancements are adopted in the clinic then presumably specialized medical models will exceed human performance at the task of diagnosis at some point in the future. Clearly that hasn't happened yet.
creativeSlumber 0 minutes ago
> "An AI and a pair of human doctors were each given the same standard electronic health record to read"
This is handicapping the human doctors abilities. There is a lot more information a human doctor can gather even with a brief observation of the patient.
droidjj 16 minutes ago
The paper: https://www.science.org/doi/10.1126/science.adz4433 (April 30, 2026)
jmpman 5 hours ago
Besides for myself and wife, I've also used LLMs to diagnose my dogs. Convinced there's a huge opportunity for AI based veterinary, especially one which then performs bidding across the local veterinary clinics to perform the care/surgeries. I've noticed that local vets vary in price by more than an order of magnitude. My 80 year old mother and mother inlaw have been regularly scammed by over charging vets, and with their dogs being a major part of their lives, they extremely susceptible to pressure.
theshrike79 17 minutes ago
I'll repeat my idea on how this MUST be done:
1. AI gets data about the patient and makes a diagnosis. This is NOT shown to doctor yet.
2. Doctor does their stuff, writes down their diagnosis. This diagnosis is locked down and versioned.
3. Doctor sees AI's diagnosis
4. Doctor can adjust their diagnosis, BUT the original stays in the system.
This way the AI stays as the assistant and won't affect the doctor's decision, but they can change their mind after getting the extra data.
SkiFreeWin3 42 minutes ago
Yes, but what was the overlap
colechristensen 9 minutes ago
I think this is more a commentary on how bad ER diagnosis is.
LeCompteSftware 40 minutes ago
It is easy to overinterpret this based on the headline, the doctors were actually at a slight disadvantage. This isn't how they normally work, this is a little more like a med school pop quiz:
```
  An AI and a pair of human doctors were each given the same standard electronic health record to read – typically including vital sign data, demographic information and a few sentences from a nurse about why the patient was there. The AI identified the exact or very close diagnosis in 67% of cases, beating the human doctors, who were right only 50%-55% of the time.... The study only tested humans against AIs looking at patient data that can be communicated via text. The AI’s reading of signals, such as the patient’s level of distress and their visual appearance, were not tested. That means the AI was performing more like a clinician producing a second opinion based on paperwork.
```
"I don't know, let's run more tests" is also a very important ability of doctors that was apparently not tested here. In addition to all the normal methodological problems with overinterpreting results in AI/LLMs/ML/etc. Sadly I do think part of the problem here is cynical (even maniacal) careerist doctors who really shouldn't be working at hospitals. This means that even though I am generally quite anti-LLM, and really don't like the idea of patients interacting with them directly, I am a little optimistic about these being sanity/laziness checkers for health professionals.
SpyCoder77 32 minutes ago
This is a rather new article about an old model...
[-]
- sigmar 24 minutes ago
  Study design, data collection, analysis, and peer review take time. O1 came out a little over 1.5 years ago
beering 15 hours ago
o1 is several generations old and was released in 2024. Is this some quite old research that took a long time to get published?
[-]
- nhinck2 7 hours ago
  It's also important to note that it beat doctors in diagnosing in a way doctors do not diagnose.
- SpicyLemonZest 14 hours ago
  Yes, the preprint of the same paper (https://arxiv.org/abs/2412.10849) was first written in December 2024.
journal 24 minutes ago
would it ever diagnose incorrectly to save more lives? kinda weird an ai would decide who die so others may survive, but i guess whatever.
[-]
- HWR_14 13 minutes ago
  Not only should AI misdiagnose to save lives, but a human should too. You walk in with symptoms that most likely is a harmless virus that clears up on its own or 5% of the time is a deadly bacteria. The correct course of action is to try to test if it is the 5% case (most often the wrong diagnosis), not send people home because they are most likely fine. Many cases have a similar low but not 0 risky diagnosis.
SilverElfin 38 minutes ago
I’ve had much better luck with diagnosis of my own family’s issues than with doctors. Usually now, I’m feeding them more information to begin with, so that their 30 minute office visits are not wasted, requiring another expensive follow up appointment.
While I’m sure there can be ways in which such studies are wrong, it’s very obvious that AI can accelerate work in many of these areas where we seek out professional help - doctors, lawyers, etc.
[-]
- kakacik 19 minutes ago
  It can speed up some aspects of work, but please don't trust some llm with variable quality of output more than professional. If you don't like current doctor try another, most are in the business of helping other people.
  If you have string of issues with 10 last doctors though, then issue is, most probably, you...
  My wife is a GP, and easily 1/3 of her patients have also some minor-but-visible mental issue. 1-2 out of 10 scale. Makes them still functional in society but... often very hard to be around with.
  That doesn't mean I don't trust your words, there are tons of people with either rare issues or even fairly common ones but manifesting in non-standard way (or mixed with some other issue). These folks suffer a lot to find a doctor who doesn't bunch them up in some general state with generic treatment. There are those, but not that often.
  It helps both sides tremendously if patient is not above or arrogant know-it-all waving with chatgpt into doctor's face and basically just coming for prescription after self-diagnosis. Then, help is sometimes proportional to situation and lawful obligations.
Aboutplants 1 hour ago
Now show me the result of Triage Doctors with aided AI help
bluefirebrand 51 minutes ago
Unfortunately, from my understanding Doctors don't necessarily diagnose for accuracy, they often diagnose to limit liability.
They aren't going to take a stab at an uncommon diagnosis even if it occurs to them, if they might get sued if they're wrong.
Edit: I'm not trying to say Doctors deliberately diagnose wrong. Just that if there are two possible diagnoses, one common that matches some of the symptoms and one rare that matches all symptoms, doctors are still much more likely to diagnose the common one. Hoofbeats, horses, zebras, etc
Noahxel 18 minutes ago
[flagged]
Bender 6 hours ago
Humans could not diagnose and treat me correctly. They almost killed me. Curious where I could feed my symptoms and the same data I gave to an ER to an AI to test it.
[-]
- jacekm 36 minutes ago
  https://aistudio.google.com/
- causal 48 minutes ago
  Chatgpt.com?
  [-]
wg0 41 minutes ago
The Guardian needs to raise their bar on what to report and how to give readers full context on the ongoing NFT AI trust me bro crypto scam and that context would be that it is a mathematical model of human language and not medical expert or replacement for one.
[-]
- pixel_popping 26 minutes ago
  So we can eventually classify AI models as Software experts, but not as Medical experts, why so?
  [-]
  - wg0 0 minutes ago
    I don't classify them as software experts either. Anyone doing so is probably not an expert themselves.
    I take them as those code generation command line tools like create react app and such.
  - jval43 12 minutes ago
    We can't. It's just that everyone and their dog has an interest in selling you that lie because money.
    Stochastic parrots can code yes, but that does not make them experts. Don't trust them with your life.
- sigmar 34 minutes ago
  >The Guardian needs to raise their bar on what to report and how to give readers full context
  Should they not report on peer reviewed articles published in Science? or only report published articles that fit your priors?
- tene80i 34 minutes ago
  It’s a peer reviewed study in one of the world’s top science journals. It’s not some random person on a podcast.
taurath 51 minutes ago
I’d love to see a follow to that radiologist evaluation, where it failed so miserably on the thing it was supposed to be the best at that now there’s a shortage of radiologists.
[-]
- pasiaj 41 minutes ago
  Not an expert but what I’ve heard is that AI-based radiology analysis has brought down prices so much that there’s been a huge increase in demand, which has led to employee shortages.