AI in oncology: from ChatGPT in the clinic to mapping the care continuum

A four-paper arc from one of the first peer-reviewed evaluations of ChatGPT for clinical Q&A, through a framework for thinking about LLMs in healthcare, to a comprehensive review of AI across the entire cancer care continuum - and the structured-data corollary underneath all of it.

The moment

ChatGPT was released to the public on November 30, 2022. By the spring of 2023 it was already in clinical conversations everywhere - patients asking it about their treatments, clinicians using it to draft notes, hospital administrators wondering whether it would replace human work. Most of the enthusiasm was unmoored from evidence. The field needed actual evaluation of what these models did and did not do well.

The first evaluation

In February 2023 Dr. Osterman and collaborators at Vanderbilt circulated a preprint evaluating ChatGPT's accuracy and reliability on physician-posed medical questions. The peer-reviewed version, with expanded methodology, was published in JAMA Network Open in October 2023 as Accuracy and Reliability of Chatbot Responses to Physician Questions (Goodman, Patrinely, Stone, Zimmerman, et al., 2023).

The work was deliberately narrow: ChatGPT, evaluated against physician-authored answers, on real questions a clinician might ask. The findings were sobering for the hype cycle. The model produced fluent, confident, frequently correct answers, but it also produced confidently stated errors, with no consistent self-flagging of uncertainty. The evaluation was one of the early data points cited in the policy debates that followed.

The framework

A month after the original preprint - and well before the peer-reviewed version - Dr. Osterman co-authored On the cusp: Considering the impact of artificial intelligence language models in healthcare (Goodman, Patrinely, Osterman, Wheless & Johnson, Med, March 2023). This was the framing piece. Not "does ChatGPT work" but "where should LLMs be allowed to operate, who validates them, and what safety guardrails need to exist before they touch a patient." The paper has been cited across the clinical-AI literature as one of the early articulations of what responsible deployment would look like.

The full picture

By 2025 the field had matured enough to attempt a synthesis. Dr. Osterman co-authored Artificial intelligence across the cancer care continuum (Riaz, Khan & Osterman, Cancer, August 2025) - a comprehensive review mapping AI applications across the entire cancer journey: screening, diagnosis, treatment selection, toxicity prediction, survivorship, and quality of life. The structure is intentional. AI is not one thing in oncology; it is many overlapping things, each at a different stage of validation and adoption. The review documents that.

The structured-data corollary

Through every one of these papers a single argument keeps surfacing: AI in healthcare is most useful when the data underneath is structured. A model trained on unstructured PDFs tends to generalize poorly; a model that consumes standardized, FHIR-shaped data tends to generalize better. This is the connection between the AI work and the standards work.

Dr. Osterman and colleagues published mCODEGPT in Communications Medicine in October 2025 (Zhang, Huang, Malin, Osterman & Long, 2025) - a zero-shot information extraction approach that uses large language models to lift mCODE-conformant elements out of clinical free text. The point is the target. When an LLM has a structured schema to aim at (mCODE), the outputs become trustworthy and reusable. When it doesn't, the outputs are eloquent guesses.
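To make the "structured target" point concrete, here is a minimal sketch of what checking a model's output against a schema can look like. It is not the mCODEGPT method and it does not use the real mCODE profiles: the schema, its field names, its allowed values, and the validate_extraction helper are all hypothetical, chosen only to illustrate the idea. The input strings stand in for whatever JSON an LLM might return.

```python
import json

# Hypothetical, simplified target schema. These field names and allowed
# values are illustrative assumptions; they are NOT the real mCODE profiles.
SCHEMA = {
    "primary_site": {"type": str, "required": True},
    "stage_group": {"type": str, "required": True,
                    "allowed": {"I", "II", "III", "IV"}},
    "ecog_status": {"type": int, "required": False,
                    "allowed": {0, 1, 2, 3, 4, 5}},
}


def validate_extraction(raw_output):
    """Parse a model's JSON output and check it against SCHEMA.

    Returns (record, problems). `record` holds only the fields that passed
    every check; `problems` lists everything that was rejected.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return {}, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return {}, ["top-level value is not a JSON object"]

    record, problems = {}, []
    for name, rule in SCHEMA.items():
        if name not in data or data[name] is None:
            if rule["required"]:
                problems.append(f"missing required field: {name}")
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, rule["type"]):
            problems.append(f"wrong type for {name}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"value not allowed for {name}: {value!r}")
            continue
        record[name] = value

    # Fields the schema does not define are reported, never silently kept.
    for name in sorted(set(data) - set(SCHEMA)):
        problems.append(f"unexpected field: {name}")
    return record, problems


if __name__ == "__main__":
    # Stand-ins for LLM output; no model is called in this sketch.
    good = '{"primary_site": "lung", "stage_group": "III", "ecog_status": 1}'
    bad = '{"primary_site": "lung", "stage_group": "advanced", "prognosis": "guarded"}'
    print(validate_extraction(good))
    print(validate_extraction(bad))
```

With a schema in place, an out-of-vocabulary stage or an invented field is caught mechanically instead of being passed downstream as fluent text. Passing a check like this shows only that the output has the right shape; whether the extracted values are clinically correct still needs separate validation.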

The same logic applies to the GE HealthCare Digital Precision Oncology work (case study): the reason ML on EHR data is tractable for immunotherapy outcome prediction is that the underlying clinical data was first organized, curated, and structured. AI is the visible layer; the data work underneath is what makes it possible.

The lesson

Dr. Osterman's AI work isn't a story of one model or one paper. It's a position. AI in oncology should be validated narrowly before it's deployed broadly; it should be described honestly to clinicians and patients, including what it can and cannot do; and it should be built on top of structured data standards, not as a workaround for the lack of them. The next decade of cancer AI depends on getting all three right.

Cited works

Goodman RS, Patrinely JR, Stone CA Jr, Zimmerman E, Donald RR, Chang SS, Berkowitz ST, Finn AP, Jahangir E, Scoville EA, Reese TS, Friedman DL, Bastarache JA, van der Heijden YF, Wright JJ, Ye F, Carter N, Alexander MR, Choe JH, Chastain CA, Zic JA, Horst SN, Turker I, Agarwal R, Osmundson E, Idrees K, Kiernan CM, Padmanabhan C, Bailey CE, Schlegel CE, Chambless LB, Gibson MK, Osterman TJ, Wheless L, Johnson DB. Accuracy and Reliability of Chatbot Responses to Physician Questions. JAMA Network Open 2023;6(10):e2336483.
Goodman RS, Patrinely JR, Osterman T, Wheless L, Johnson DB. On the cusp: Considering the impact of artificial intelligence language models in healthcare. Med 2023;4(3):139-140.
Riaz IB, Khan MA, Osterman TJ. Artificial intelligence across the cancer care continuum. Cancer 2025.
Zhang K, Huang T, Malin BA, Osterman T, Long Q. Introducing mCODEGPT as a zero-shot information extraction from clinical free text data tool for cancer research. Communications Medicine 2025.