
Leveraging Electronic Medical Records for Early Lung Cancer Diagnosis.

An evaluation of the C the Signs AI cancer prediction platform using the Mayo Data Platform.

June 11, 2025

Authors:

C the Signs

Charlotte Beames

Clare O’Neill

INTRODUCTION

Background

In the US, only 27.4% of lung cancer cases are diagnosed at an early stage, and 5-year survival is 63% for localized disease compared with 27% for late-stage disease (1). Although low-dose CT screening of high-risk individuals has been recommended since 2012, uptake has been limited, and most lung cancers are diagnosed only after symptoms appear (2). Chest X-rays, commonly used as the initial test for patients with symptoms suggestive of lung cancer, have shown limited sensitivity of 50–70%, with specificity above 80% (3).

Lung cancer symptoms often overlap with those of common conditions, which makes detection challenging, and studies have reported a median delay of 187 days from symptom onset to diagnosis. This prolonged interval presents an opportunity for earlier diagnosis.

This study examines the use of the AI cancer prediction platform C the Signs to passively screen for lung cancer by analyzing electronic medical records (EMRs), with the aim of earlier detection.

METHODS

Study Design

This retrospective study aimed to evaluate the performance of the C the Signs AI Cancer Prediction Model in identifying individuals at high risk of lung cancer earlier than standard physician-led diagnostic pathways. Using the Mayo Data Platform, we accessed electronic medical records (EMRs) for a cohort of 894,409 patients, spanning a 20-year period from January 1, 2002 to December 31, 2021. A total of 7,395 patients were diagnosed with lung cancer during this timeframe.

The study included adults aged ≥18 years with at least one recorded primary care interaction. Patients with a diagnosis of lung cancer prior to the study period were excluded. The AI model was retrospectively applied to the EMR dataset to identify individuals at risk of lung cancer and assess its ability to flag these cases earlier than traditional clinical pathways.
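The inclusion and exclusion criteria above amount to a simple per-patient filter. The sketch below is a hypothetical illustration only: the `PatientRecord` fields (age at study entry, number of primary care interactions, optional lung cancer diagnosis date) are assumptions and do not reflect the actual Mayo Data Platform schema or the study's extraction logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STUDY_START = date(2002, 1, 1)
STUDY_END = date(2021, 12, 31)


@dataclass
class PatientRecord:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    age_at_entry: int                    # age in years at study entry (assumption)
    primary_care_visits: int             # number of recorded primary care interactions
    lung_cancer_dx_date: Optional[date]  # None if never diagnosed


def is_eligible(p: PatientRecord) -> bool:
    """Adult, at least one primary care interaction, no lung cancer before the study period."""
    if p.age_at_entry < 18 or p.primary_care_visits < 1:
        return False
    return p.lung_cancer_dx_date is None or p.lung_cancer_dx_date >= STUDY_START


def is_case(p: PatientRecord) -> bool:
    """A case is a lung cancer diagnosis dated within the study period."""
    d = p.lung_cancer_dx_date
    return d is not None and STUDY_START <= d <= STUDY_END


patients = [
    PatientRecord("a", 54, 3, None),
    PatientRecord("b", 16, 2, None),              # excluded: under 18
    PatientRecord("c", 67, 5, date(1999, 5, 1)),  # excluded: diagnosed before 2002
    PatientRecord("d", 71, 0, None),              # excluded: no primary care contact
    PatientRecord("e", 62, 4, date(2015, 3, 9)),  # eligible and a case
]
cohort = [p for p in patients if is_eligible(p)]
print([p.patient_id for p in cohort])   # ['a', 'e']
print(sum(is_case(p) for p in cohort))  # 1
```

Keeping eligibility and case status as separate functions mirrors the distinction in the text between the eligible cohort and the subset diagnosed during the study window.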

Model Application

The C the Signs AI Prediction Model is a pan-cancer expert system developed in the United Kingdom to help healthcare professionals identify patients at risk of cancer, including lung cancer, at the point of care. The model integrates risk factors, symptoms, and clinical signs through a dual-layered algorithm that combines structured data, such as coded clinical information (e.g., ICD-10 codes), with unstructured data extracted from EMRs using Natural Language Processing (NLP). This approach is intended to help the model identify subtle risk indicators that might be overlooked in traditional diagnostic pathways.
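The dual-layered idea can be illustrated with a toy rule-based flag: a structured layer that matches coded diagnoses against a short list of ICD-10 codes, and an unstructured layer that looks for symptom phrases in free-text notes with a crude negation check. This is a hypothetical sketch of the general technique, not the C the Signs algorithm; the code list, phrases, negation cues and two-signal threshold are all assumptions, and real clinical NLP needs far more robust negation and context handling.

```python
import re

# Illustrative ICD-10 symptom codes mapped to canonical labels (assumption, not the model's list).
# Prefix matching means subcodes such as R05.1 also count as "cough".
SYMPTOM_CODES = {"R04.2": "hemoptysis", "R05": "cough", "R63.4": "weight loss"}

# Illustrative free-text phrases mapped to the same labels (assumption).
NOTE_TERMS = {
    "hemoptysis": "hemoptysis",
    "haemoptysis": "hemoptysis",
    "persistent cough": "cough",
    "weight loss": "weight loss",
}

NEGATION_CUE = re.compile(r"\b(no|denies|without)\b")


def structured_signals(codes):
    """Structured layer: labels for any coded diagnoses matching the symptom list."""
    return {label for code in codes
            for prefix, label in SYMPTOM_CODES.items() if code.startswith(prefix)}


def note_signals(note):
    """Unstructured layer: crude keyword matching that skips a term when a
    negation cue appears earlier in the same sentence."""
    found = set()
    for sentence in re.split(r"[.!?]", note.lower()):
        for term, label in NOTE_TERMS.items():
            idx = sentence.find(term)
            if idx != -1 and not NEGATION_CUE.search(sentence[:idx]):
                found.add(label)
    return found


def flag_patient(codes, notes, min_signals=2):
    """Flag when at least `min_signals` distinct labels appear across both layers."""
    signals = structured_signals(codes)
    for note in notes:
        signals |= note_signals(note)
    return len(signals) >= min_signals, signals


flagged, signals = flag_patient(
    ["R05", "I10"],
    ["Patient reports weight loss over 3 months. Denies haemoptysis."],
)
print(flagged, sorted(signals))  # True ['cough', 'weight loss']
```

Mapping both layers onto shared labels keeps a symptom that is both coded and written in a note from being counted twice.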

Analysis

We assessed the model’s diagnostic performance using standard metrics: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). We also compared the time of AI-predicted risk identification to the date of clinical diagnosis made by primary care physicians to evaluate whether the AI platform facilitated earlier detection. Although lung cancer staging data was not available in this dataset, the timing comparison provided a proxy for assessing the model’s potential impact on accelerating diagnosis and pathway entry.
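The timing comparison comes down to computing, for each patient with lung cancer whom the model flagged, the interval between the first flag and the clinical diagnosis date, then grouping cases by whole years of lead time, similar to what Fig. 1 shows. A minimal sketch, assuming hypothetical (first_flag_date, diagnosis_date) pairs, a 365.25-day year and a five-year cap; the study's own definitions may differ.

```python
from collections import Counter
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: average year length
MAX_YEARS = 5           # look-back window matching the "up to five years" reported


def lead_time_years(flag_date: date, dx_date: date) -> float:
    """Years from the first AI risk flag to clinical diagnosis (negative if flagged later)."""
    return (dx_date - flag_date).days / DAYS_PER_YEAR


def lead_time_distribution(pairs, total_cases):
    """Percentage of all diagnosed cases whose first flag fell 0-1, 1-2, ... years
    before diagnosis. Flags on or after the diagnosis date are not counted, and
    lead times of MAX_YEARS or more go into the last bucket (assumption)."""
    counts = Counter()
    for flag_date, dx_date in pairs:
        years = lead_time_years(flag_date, dx_date)
        if years > 0:
            counts[min(int(years), MAX_YEARS - 1)] += 1
    return {k: 100 * counts[k] / total_cases for k in range(MAX_YEARS)}


# Hypothetical example: four diagnosed cases, three of which were ever flagged.
pairs = [
    (date(2015, 1, 10), date(2015, 6, 1)),  # ~0.4 years ahead -> bucket 0
    (date(2012, 6, 1), date(2015, 1, 1)),   # ~2.6 years ahead -> bucket 2
    (date(2016, 1, 1), date(2015, 12, 1)),  # flagged after diagnosis -> not counted
]
print(lead_time_distribution(pairs, total_cases=4))
# {0: 25.0, 1: 0.0, 2: 25.0, 3: 0.0, 4: 0.0}
```

Dividing by all diagnosed cases rather than only flagged ones makes each bucket a share of the whole case population, which is how a figure like "26.6% of patients" would typically be expressed.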

RESULTS

Model Performance

The C the Signs AI cancer prediction platform demonstrated a sensitivity of 91.5% and a specificity of 52.3% in identifying patients at risk of lung cancer. Among the 7,395 patients diagnosed with lung cancer in the dataset, the model correctly flagged 6,749 cases.
Notably, the platform flagged 26.6% of patients with lung cancer as at risk up to five years before their diagnosis by a primary care physician, underscoring its potential to support earlier detection in lung cancer pathways.

Among the 887,014 patients without lung cancer (of 894,409 eligible patients in total), the model recorded 423,249 false positives, which yields the specificity of 52.3%.
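The headline metrics follow from a 2×2 confusion matrix. The sketch below rebuilds that matrix from the counts reported in this section, assuming they are complete: true positives = 6,749, false negatives = 7,395 − 6,749 = 646, false positives = 423,249 and true negatives = 887,014 − 423,249 = 463,765. PPV and NPV are named in the Analysis section but not reported, so the values this sketch produces are derived rather than reported. These counts give a sensitivity of about 91.3%, slightly below the stated 91.5%; the gap may reflect how the headline figure was calculated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # patients with lung cancer flagged by the model
    fn: int  # patients with lung cancer not flagged
    fp: int  # patients without lung cancer flagged
    tn: int  # patients without lung cancer not flagged

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


cases, flagged_cases = 7_395, 6_749
non_cases, false_positives = 887_014, 423_249

cm = ConfusionMatrix(
    tp=flagged_cases,
    fn=cases - flagged_cases,
    fp=false_positives,
    tn=non_cases - false_positives,
)

for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(cm, name):.1%}")
# sensitivity: 91.3%
# specificity: 52.3%
# ppv: 1.6%
# npv: 99.9%
```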

Fig. 1. Percentage of patients diagnosed with lung cancer, by the number of years before diagnosis at which C the Signs identified them as at risk.

This study analyzes de-identified Electronic Health Record (EHR) data via Mayo Clinic Platform Discover. The data presented here were extracted following a privacy-preserving protocol and deemed de-identified by an expert in accordance with the HIPAA Privacy Rule. Additional data, including raw EHR data, cannot be shared under the terms of the expert determination. Contact the corresponding authors for more information on Mayo Clinic Platform Discover.


Fig. 2. Exploring detection of cancer types.

INTERPRETING THE RESULTS

Discussion

This study demonstrates the potential of the C the Signs AI Prediction Model to improve early detection of lung cancer. With a sensitivity of 91.5%, the model performs comparably to or better than current clinical tools such as the chest X-ray, which is often used reactively and has limited sensitivity for early-stage disease.

The platform's ability to flag 26.6% of lung cancer cases up to five years before diagnosis through traditional physician-led pathways is notable, suggesting value in identifying patients before significant clinical progression occurs.

While the specificity of 52.3% implies a false-positive rate of 47.7%, this is consistent with the trade-offs seen in other early detection models that prioritise high sensitivity. Importantly, the platform's high NPV indicates strong potential to safely rule out low-risk patients, potentially reducing unnecessary imaging, referrals, or anxiety.
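The high NPV and low PPV are largely a consequence of disease prevalence. By Bayes' theorem, both predictive values can be written in terms of sensitivity, specificity and prevalence. The sketch below plugs in the reported sensitivity (91.5%) and specificity (52.3%) with the cohort prevalence of 7,395 / 894,409 ≈ 0.83%; the other two prevalence values are hypothetical, chosen only to show the trend.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sens * prevalence              # expected true-positive fraction
    fn = (1 - sens) * prevalence        # expected false-negative fraction
    fp = (1 - spec) * (1 - prevalence)  # expected false-positive fraction
    tn = spec * (1 - prevalence)        # expected true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


SENS, SPEC = 0.915, 0.523            # figures reported in this study
cohort_prevalence = 7_395 / 894_409  # about 0.83% of the cohort had lung cancer

# 2% and 5% are hypothetical prevalences, included only to show the trend.
for prev in (cohort_prevalence, 0.02, 0.05):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.2%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

At the cohort prevalence this gives a PPV of about 1.6% and an NPV of about 99.9%, consistent with the confusion-matrix counts in the Results. Because so few patients have the disease, most negatives are true negatives almost regardless of specificity; at a hypothetical 5% prevalence the PPV rises to roughly 9% while the NPV falls slightly.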

One of the key advantages of this AI model is its capacity to flag individuals who might not yet meet established clinical referral thresholds or who present with non-specific symptoms. This capability is particularly relevant in lung cancer, where early symptoms can be vague or absent, and diagnosis often occurs at an advanced stage.

By identifying risk earlier, the model supports more timely investigation and intervention, with potential implications for improved survival and reduced morbidity. These benefits may outweigh the cost of additional diagnostics prompted by early risk identification.

Conclusion

This study highlights the potential of AI prediction models, such as C the Signs, to address the challenge of early lung cancer detection.

By harnessing routinely collected EMR data, the platform can identify high-risk individuals earlier than traditional diagnostic approaches, offering an opportunity to improve clinical outcomes and reduce system-level burdens. These findings support the integration of AI-driven tools as complementary aids within diagnostic pathways, especially for cancers where early detection remains elusive with current strategies.

“Our position in the NHS ecosystem enables us to direct innovation towards tackling the system's most pressing challenges, such as reducing waiting lists, tackling health inequalities, and improving the uptake of medicines... We've brought 6 innovators who are currently delivering a huge impact across England here tonight.”

RICHARD STUBBS

CHAIR OF THE HEALTH INNOVATION NETWORK


Affiliations

  1. Consultant Clinical Oncologist, Department of Radiotherapy, Charing Cross Hospital, Imperial College Healthcare NHS Trust.
  2. Honorary Clinical Senior Lecturer, Department of Cancer & Surgery, Imperial College London.
  3. C the Signs Inc, Boston, MA, USA.
  4. Harvard Medical School.
  5. Resident Physician, Department of Radiation Oncology, Memorial Sloan Kettering, NY, USA.
  6. Professor of Oncology, Mayo Clinic, Department of Oncology, Rochester, MN, USA.
  7. Mayo Clinic Comprehensive Cancer Center, Jacksonville, FL, USA.
  8. Senior Associate Consultant of AI, Informatics and Oncology, Mayo Clinic, AZ, USA.

References

  1. SEER Cancer Stat Facts: Lung and Bronchus Cancer. National Cancer Institute. https://seer.cancer.gov/statfacts/html/lungb.html
  2. Sorscher S. Inadequate Uptake of USPSTF-Recommended Low Dose CT Lung Cancer Screening. J Prim Care Community Health. 2024 Jan-Dec;15:21501319241235011. doi: 10.1177/21501319241235011. PMID: 38400557; PMCID: PMC10894545.
  3. Bradley SH, Abraham S, Callister ME, Grice A, Hamilton WT, Lopez RR, Shinkins B, Neal RD. Sensitivity of chest X-ray for detecting lung cancer in people presenting with symptoms: a systematic review. Br J Gen Pract. 2019 Nov 28;69(689):e827-e835. doi: 10.3399/bjgp19X706853. PMID: 31636130; PMCID: PMC6805164.
