mCODE: from a data standard to regulatory infrastructure

How an open-source oncology data model launched in 2019 became the only method of submitting data to a federal value-based care program - and the parallel work of getting structured genomic data into the EHR at scale.

The problem

Cancer care generates enormous quantities of data and almost none of it travels well. A patient diagnosed at one institution and treated at another arrives with PDFs, faxes, and free-text notes. Pathology reports, molecular profiles, treatment histories, and outcome measures are entered the same way by every clinician and stored differently by every system. The result is what Dr. Osterman, May Terry, and Robert Miller described in JCO Clinical Cancer Informatics in 2020: a field where every institution rebuilds the same custom data mappings, and where research, quality reporting, and clinical trial matching all pay the cost (Osterman, Terry & Miller, 2020).

The standard

mCODE wordmark

The minimal Common Oncology Data Elements (mCODE™) is an open-source, non-proprietary data model built on top of HL7 FHIR resources. It defines the minimum interoperable record for cancer care: patient demographics, cancer diagnosis, disease characterization, treatments, clinical findings, tumor genomics, and outcomes. It was released at the 2019 American Society of Clinical Oncology Annual Meeting and featured in the ASCO Presidential Address that year.

Diagram showing mCODE positioned alongside other HL7 FHIR Accelerator projects (Da Vinci, CodeX, Gravity, CARIN, HL7 Ontoserver), all serving as semantic-interoperability layers under FHIR.
mCODE in the FHIR ecosystem: one of several HL7 Accelerator projects that define semantic-interoperability profiles on top of base FHIR resources. Source: 2026 HL7 International Working Group Meeting deck.
The full mCODE STU 4 schema showing six top-level domains - Disease, Treatment, Outcome, Patient, Genomics, Assessment - with detailed sub-profiles and FHIR element references under each.
mCODE STU 4: six top-level domains (Disease, Treatment, Outcome, Patient, Genomics, Assessment) and the full profile graph beneath. Source: 2026 HL7 International Working Group Meeting deck.

Governance is shared. The mCODE Executive Committee includes representation from the American Society of Clinical Oncology, the American Society for Radiation Oncology, the Food and Drug Administration, the National Cancer Institute, and the Alliance for Clinical Trials. Dr. Osterman was appointed Chair of the mCODE Technology Review Group in January 2021 - the body that oversees additions and changes to the standard - and now serves as Chair of the Executive Committee.

The scale

As of this writing mCODE is implemented at more than seventy institutions across six countries, including Duke, Dana-Farber Cancer Institute, MD Anderson, The Ohio State University, the University of Michigan, the University of Pennsylvania, and the Mayo Clinic, with international implementations in Taiwan, Brazil, and Canada. The community has contributed more than two hundred public comments through the HL7 process. That trajectory matters less as a count and more as a signal: mCODE has graduated from "promising standard" to "the assumed data model" for a significant fraction of US cancer-data infrastructure.

The genomics gap

One area resisted standardization the longest: clinical genomics. Most academic medical centers continue to receive genomic reports as unstructured PDFs or faxed paper - which means molecular results sit outside the EHR's structured data, invisible to decision support, invisible to clinical trial matching, and invisible to outcomes research. By 2019 only one US institution had integrated genomic results from a reference laboratory directly into Epic.

Vanderbilt was the eighth. Under Dr. Osterman's leadership of the Clinical Genomics Workstream, Vanderbilt-Ingram Cancer Center integrated structured genomic results into the electronic health record and grew the corpus rapidly. By the end of 2021 there were 12,000 tumor genomic reports in the Vanderbilt EHR. Today Vanderbilt holds more structured genomic data in its electronic health record than any other institution in the United States. The implementation work was profiled in Healthcare Innovation in 2021 and again in Discoveries in Medicine in early 2023.

Implementation was a teaching exercise as much as an engineering one. A team of undergraduate Vanderbilt computer science students built the mCODE-on-Azure integration starting in the summer of 2022, surfacing the structured genomic data through a FHIR API that downstream applications - trial matching, targeted-therapy alerting, integration with MyCancerGenome - can call. The full pilot is documented in Minimal Common Oncology Data Elements Genomics Pilot Project: Enhancing Oncology Research Through Electronic Health Record Interoperability at Vanderbilt University Medical Center (Li et al., JCO Clinical Cancer Informatics, 2024), with Dr. Osterman as senior author and the undergraduate engineering lead Yanwei Li as first author. An earlier NCCN abstract (Vento and Osterman, BIO23-019) documents the upstream workflow for getting reference-laboratory genomics into the EHR in the first place.

The CMS hook

mCODE's significance changed shape in 2023 when the Centers for Medicare and Medicaid Services launched the Enhancing Oncology Model (EOM), a voluntary value-based care program for medical oncology practices. CMS specified that data submissions to the EOM would happen via mCODE - and only via mCODE. Overnight the standard moved from "useful interoperability layer" to "regulatory infrastructure." Any practice participating in EOM must produce mCODE-shaped data; any vendor serving those practices must emit it. That regulatory hook is the reason mCODE adoption is now self-reinforcing.

What's on top

The structured-data foundation makes downstream AI tractable. In late 2025 Dr. Osterman and collaborators published mCODEGPT in Communications Medicine - a zero-shot information-extraction approach that uses large language models to lift mCODE-conformant elements out of clinical free text (Zhang et al., 2025). The framing is deliberate. AI doesn't replace the standard; it sits on top of it. When the target schema is mCODE, an LLM has something concrete to aim at, and downstream applications can trust the output.

The lesson

mCODE is the systems improvement Dr. Osterman wanted from the beginning - fewer custom mappings between institutions, fewer one-off integrations, less friction between the data clinicians enter and the data researchers need. A standard is a force multiplier. CMS noticed; the international community noticed; the AI work that now sits on top of the standard noticed. The next decade of cancer informatics depends on building similar standards in the places mCODE doesn't yet reach.

Cited works

  1. Osterman TJ, Terry M, Miller RS. Improving Cancer Data Interoperability: The Promise of the Minimal Common Oncology Data Elements (mCODE) Initiative. JCO Clinical Cancer Informatics 2020;4:993-1001.
  2. Li Y, Ye J, Huang Y, Wu J, Liu X, Ahmed S, Osterman T. Minimal Common Oncology Data Elements Genomics Pilot Project: Enhancing Oncology Research Through Electronic Health Record Interoperability at Vanderbilt University Medical Center. JCO Clinical Cancer Informatics 2024.
  3. Vento J, Osterman TJ. BIO23-019: Precision Oncology: Integrating Structured Genomic Data Into the Electronic Health Record. Journal of the National Comprehensive Cancer Network 2023.
  4. Zhang K, Huang T, Malin BA, Osterman T, Long Q. Introducing mCODEGPT as a zero-shot information extraction from clinical free text data tool for cancer research. Communications Medicine 2025.

Related: Cancer data standards (mCODE) · Clinical genomics in the EHR · Leadership and governance roles · AI in oncology.

Back to top