ChatGPT gets confused by healthcare evidence, says CSIRO

But large language models are ‘diamonds in the rough’ with potential to empower clinicians, say experts.

The reliability of ChatGPT’s answers to health questions drops as low as 28% when the model is given supporting evidence but remains as high as 80% when it is asked for a yes or no answer, Australian research finds.

“If you compare the current consensus in Australia versus internationally, there’s probably more pessimism locally around the use of these AI technologies,” CSIRO principal research scientist and associate professor at the University of Queensland Bevan Koopman told Health Services Daily.

“These large language models, which are the underlying technology behind ChatGPT, are diamonds in the rough currently.

“They show amazing potential in terms of how they might help people access information or make health decisions based on the latest evidence.

“But we’re still at the stage where we’re trying to understand how these models work, how they should be deployed, and how they behave under different scenarios.”

The study, published in ACL Anthology, looked at the concurrence between ChatGPT responses to health-related questions and the correct response based on medical knowledge, depending on how the questions were asked.

The researchers found that ChatGPT was 80% accurate when asked for a yes or no answer to a health-based question such as “Can folic acid help improve cognition and treat dementia?”

However, when the model was given evidence either in support of or contrary to the question, accuracy fell, even when the evidence supported the correct answer.

“We’re not sure why this happens,” said Dr Koopman, who co-authored the study.

“But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy.”

Accuracy fell to a measly 28% when an “unsure” answer was allowed.

Dr Koopman said that while the data showed the accuracy of answers could be significantly degraded depending on how the question was asked, “overall, accuracy of 80% is really quite high for a dataset that has some pretty tricky questions”.

Dr Koopman said he hoped the research would motivate the development of healthcare-specific LLMs that were trained using the latest medical literature and could provide source attribution.

Chair of the RACGP expert committee on practice technology and management Dr Rob Hosking said that while AI was already proving useful to clinicians, particularly as an administrative or transcription tool for GPs, it seemed LLMs remained too immature to reliably answer health questions.

“It’s potentially very positive, but we’ve got to proceed with caution at the moment,” he said.

“As the CSIRO research has pointed out, currently, it’s making mistakes. And they’re pretty serious mistakes by the sound of it.”

But, if the technology was able to amalgamate information from the many guidelines and databases doctors currently referred to, it could save valuable time, added Dr Hosking.

“It’s got the potential, if we can get it right, to empower GPs to manage patients that they might otherwise have had to refer to specialists.”

Dr Koopman “100%” agreed that health information provided through LLMs would need to be interpreted by a health professional.

“A lot of the work that my team does at CSIRO is with doctors at Queensland Health and across the country, trying to empower their evidence-based decision making by providing them with access to the latest medical evidence.

“That’s really where LLMs can play a big role.

“The ever-growing amount of medical knowledge that any one clinician has to learn, retain and retrieve at critical points in time, if we can ease that burden and make it easy to access the latest evidence and understand how that relates to their particular patient, that’s really what our aim is in terms of applying this generative AI technology in healthcare settings.”
