Cancer Screening: What It Actually Catches – and What It Misses

How Doctors Think — with Dmitry Sokolov, MD

How Doctors Think explores health, performance, and longevity through clear, evidence-based conversations with clinicians, researchers, and other domain experts.

Hosted by Dmitry Sokolov, MD, the podcast examines how physiology, habits, and judgement shape real-world outcomes — especially in high-stakes areas such as productivity, surgery, recovery, metabolic health, and long-term performance.

It also explores uncertainty and the real-life problems faced by highly successful professionals in a rapidly changing world, shaped by accelerating AI and wider social and economic instability.

Cancer Screening: What It Actually Catches – and What It Misses

March 15, 2026 • Dmitry Sokolov MD • Season 1 • Episode 5

Around 70% of cancer deaths come from cancers with no routine screening programme. The NHS-Galleri trial - 142,000 participants, the largest randomised trial of multi-cancer early detection ever conducted - recently reported its results. The headlines said it failed.

This video walks through the screening architecture we already have, the technology behind multi-cancer blood testing, what the trial actually found, and why the honest clinical position is more nuanced than any headline can accommodate.

Topics covered:
– Mammography, cervical screening, colonoscopy, PSA — established trade-offs
– Cell-free DNA methylation and tissue-of-origin prediction
– Sensitivity by stage: what 51.5% overall and 17–20% Stage I actually mean
– False positives vs false negatives — and which is more dangerous
– Lead-time bias and length bias in screening
– The NHS-Galleri primary endpoint, why it was not met, and what the secondary findings suggest
– Population-level guidelines vs individual-level decisions

Studies on GRAIL Galleri test referenced in this video:

1. CCGA Clinical Validation (test performance: sensitivity, specificity, stage breakdown)
Klein EA et al. Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set. Annals of Oncology. 2021;32(9):1167–1177.
https://www.annalsofoncology.org/article/s0923-7534(21)02046-9/fulltext

2. PATHFINDER (real-world diagnostic pathway after a positive result)
Schrag D, Beer TM, McDonnell CH et al. Blood-based tests for multicancer early detection (PATHFINDER): a prospective cohort study. Lancet. 2023;402(10409):1251–1260.
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(23)01700-2/abstract

3. NHS-Galleri (142,000-participant population-scale RCT — press release only; full data expected ASCO late May/early June 2026)
GRAIL press release, 19 February 2026.
https://grail.com/press-releases/landmark-nhs-galleri-trial-demonstrates-a-substantial-reduction-in-stage-iv-cancer-diagnoses-increased-stage-i-and-ii-detection-of-deadly-cancers-and-four-fold-higher-cancer-detection-rate/

Dmitry Sokolov MD
Consultant Anaesthetist | Lifestyle Medicine Physician
dmitrysokolovmd.com

SPEAKER_00 0:00

Around 70% of cancer deaths arise from cancers for which no routine screening exists: pancreatic, ovarian, liver, esophageal, stomach, kidney, bladder. We don't screen for them. We wait. Across every major healthcare system, the UK, the US, Europe, Australia, population screening is established for a handful of cancers: breast, cervical, and bowel. Lung screening is available in some systems for high-risk smokers, and some targeted surveillance programs exist for specific high-risk groups, such as people with Barrett's esophagus, cirrhosis, or certain inherited syndromes. But for the average-risk adult, the pathway for most lethal cancers is the same. You develop a symptom, you present, and by that point you are very likely stage 3 or 4. And the 5-year survival rate for pancreatic cancer, for example, is around 3%.

A few weeks ago, headlines announced that a blood test designed to detect over 50 cancer types from a single blood draw had failed in a major trial. The NHS-Galleri trial, with 142,000 participants, is the largest randomized controlled trial of multi-cancer early detection ever conducted. Those headlines were based on a corporate press release, not a peer-reviewed publication and not a formal scientific presentation. The full dataset has not yet been made public, and it is expected at the ASCO annual meeting in late May or early June. So everything you have read about this trial, every headline, every expert, every opinion, including what I am about to say, is an interpretation of a summary. That context matters.

But before we look at the Galleri test's performance, let's walk through what we already have: the screening architecture that exists to date. The current screening infrastructure is built around single-cancer tests, each with its own protocol, frequency, evidence base, and trade-offs.

Mammography for breast cancer. In the UK, NHS breast screening is every three years from age 50.
In the US, biennial screening, that is, every two years, is recommended from 40 to 74 years of age. The details vary by country, but the principle is the same: regular imaging to detect breast cancer before any symptoms appear. Sensitivity varies considerably, with breast tissue density being the major modifier. In multiple randomized trials, mammography has been shown to reduce breast cancer mortality by roughly 20%. But mammography also produces false positives. Approximately 1 in 10 women screened over a decade will have at least one. And importantly, it may overdiagnose: it may find cancers that, had they not been revealed by screening, would never have caused symptoms or death. The Marmot review, the UK's independent assessment, estimated overdiagnosis at around 19% of screen-detected cancers. That is not a reason to stop mammography, but it is a reminder that finding more cancer does not automatically translate into helping more patients. And this tension applies to every screening tool.

Cervical screening is arguably the most successful screening program in medicine. The combination of cytology (cell analysis) and HPV testing, with a shift towards primary HPV testing in recent years, has dramatically reduced cervical cancer incidence. With HPV vaccination alongside it, the WHO has set a global strategy for cervical cancer elimination, and several countries, including the UK, have committed to elimination targets within the next two decades. A cancer with a long preclinical phase, a detectable precursor lesion (what comes up in reports as abnormal cells), and an effective preventive intervention, the vaccine: this is what screening looks like when the conditions are ideal.

Colonoscopy and fecal immunochemical testing (FIT) for colorectal cancer. Colonoscopy is both diagnostic and therapeutic. You can detect the cancer and remove the precursor lesion in the same procedure.
Randomized trials of fecal occult blood testing showed 15 to 33% reductions in colorectal cancer mortality. Colonoscopy is highly sensitive, 75 to 93% for adenomas 6 mm or larger. But the main constraint with this screening modality is compliance. Across the UK, bowel cancer screening uptake sits at roughly two-thirds. In the US, colonoscopy compliance is similar, around 66%. And for lung cancer screening with low-dose CT, offered only to high-risk smokers, uptake is under 20%. Even the most sensitive screening test in the world is useless if people don't take it.

Now, PSA for prostate cancer. This is the most instructive example of where screening becomes complicated. The US Preventive Services Task Force recommends shared decision-making for men aged 55 to 69. This is not routine screening, but a conversation about whether to test. The reason is that PSA detects prostate cancer effectively, but a large proportion of the cancers it detects would never have harmed the patient. There's even a saying among doctors: most men die with prostate cancer, but some men die from it. Long-term follow-up from the European Randomized Study of Screening for Prostate Cancer, the ERSPC, suggests that for every man whose life was extended by early detection, somewhere between 10 and 40 men are diagnosed with a cancer that would never have killed them. Many undergo surgery or radiation with significant side effects, such as incontinence and erectile dysfunction, for a disease that, left alone, would have remained clinically silent. PSA is the clearest demonstration that the question in cancer screening is not only "can we find it?" but also "does finding it improve outcomes?"

That is the architecture we have to date: a handful of established population screening programs (breast, cervical, bowel, and in some systems targeted lung screening) and a shared decision-making conversation about whether or not to test PSA, each with decades of data and known trade-offs. And the fundamental limitation of this architecture is simple.
It is organ by organ, one cancer at a time, and it covers a fraction of the cancers that kill people.

This is where multi-cancer early detection enters the conversation. Tumors shed fragments of DNA into the bloodstream, the so-called cell-free DNA. These fragments carry methylation patterns, the specific way methyl groups tag the DNA, and these patterns differ between cancerous and non-cancerous tissue. The idea of multi-cancer screening is that if you can read those patterns accurately, you can detect a cancer signal from a blood draw and predict where in the body it is coming from. GRAIL's Galleri test is the most advanced commercial implementation of this method. It analyzes methylation patterns on cell-free DNA using targeted sequencing and machine learning, and it screens for signals associated with more than 50 cancer types. When it detects a signal, it can predict the tissue of origin with nearly 90% accuracy, which means the diagnostic workup can be directed to the right organ rather than leaving the physician searching blindly. The test has been commercially available in the US since 2021 at roughly $950. It is not yet FDA approved, yet nearly half a million tests were sold by early 2026.

The performance characteristics here require careful interpretation, though. In the published clinical validation study, Galleri showed a specificity of 99.5%. This sounds reassuring, and at an individual level it is high. It means that of every thousand people who do not have cancer, 995 will correctly be told that no signal was detected, and 5 will be told a signal was found when none was there. But in a screening population of millions, that half a percent generates thousands of false positives, and people are sent for PET scans, MRIs, and biopsies to investigate a cancer that doesn't exist in their body. Overall sensitivity, that is, across all cancers and all stages, was 51.5%.
And that means that of 100 people who have cancer, the test correctly identifies it in about 52 and misses it in about 48, the so-called false negatives. Sensitivity further breaks down by stage in a pattern that is biologically predictable: roughly 17 to 20% for stage 1 across all cancers, about 40 to 45% for stage 2, approximately 77 to 81% for stage 3, and around 90% for stage 4. The reason is simple. Early-stage tumors shed less DNA into the bloodstream, so the signal we can pick up from blood is fainter.

And here you may argue: what's the whole point of this test if it doesn't pick up cancers as early as possible? Because at first look, it doesn't. But there is an important distinction. The stage-by-stage sensitivities I've just mentioned are all-cancer averages. For the pre-specified group of 12 high-mortality cancers, the cancers this technology most needs to detect, stage 1 and 2 sensitivity is materially better than the all-cancer average. And this matters because the argument for multi-cancer detection rests specifically on cancers like pancreatic, ovarian, and liver, where the current population screening infrastructure is nothing.

And this is where the clinical reasoning needs to be precise. If we are evaluating the Galleri test as a replacement for mammography or colonoscopy, then it is inadequate. 20% stage 1 sensitivity certainly doesn't beat colonoscopy's 75 to 93%. But that is the wrong comparison. For cancers with no existing population screening, the baseline is zero. There is no test, no program, and no protocol. The relevant question is whether a test that catches some of these cancers at an earlier stage ultimately produces a net benefit compared to the current alternative of waiting for symptoms. And unsurprisingly, that question cannot be answered with a simple "20% is better than nothing". It requires accounting for the downstream consequences of testing.
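
To make the specificity and sensitivity figures above concrete, here is a minimal Python sketch of the arithmetic. The 99.5% specificity, the 51.5% overall sensitivity, and the stage ranges are the figures quoted in the talk. The population of one million and the 1% cancer prevalence are illustrative assumptions, not trial data, and `screening_counts` and `miss_rate_range` are hypothetical helper names.

```python
def screening_counts(population, prevalence, sensitivity, specificity):
    """Split a screened population into the four possible test outcomes."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer
    true_pos = with_cancer * sensitivity            # cancers the test flags
    false_neg = with_cancer - true_pos              # cancers the test misses
    false_pos = without_cancer * (1 - specificity)  # healthy people flagged
    true_neg = without_cancer - false_pos
    positives = true_pos + false_pos
    # Positive predictive value: share of positive results that are real cancers.
    ppv = true_pos / positives if positives > 0 else None
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
        "ppv": ppv,
    }


def miss_rate_range(sensitivity_low, sensitivity_high):
    """Miss rate is 1 - sensitivity, so the range flips: (lowest, highest)."""
    return (1 - sensitivity_high, 1 - sensitivity_low)


# Figures quoted in the talk (published clinical validation study).
SENSITIVITY = 0.515
SPECIFICITY = 0.995
STAGE_SENSITIVITY = {
    "stage 1": (0.17, 0.20),
    "stage 2": (0.40, 0.45),
    "stage 3": (0.77, 0.81),
    "stage 4": (0.90, 0.90),
}

# ASSUMPTION: one million people screened, 1% of whom have cancer.
result = screening_counts(1_000_000, 0.01, SENSITIVITY, SPECIFICITY)
print(f"cancers detected: {result['true_pos']:,.0f}")
print(f"cancers missed:   {result['false_neg']:,.0f}")
print(f"false positives:  {result['false_pos']:,.0f}")
print(f"PPV:              {result['ppv']:.1%}")

for stage, (low, high) in STAGE_SENSITIVITY.items():
    miss_low, miss_high = miss_rate_range(low, high)
    print(f"{stage}: misses {miss_low:.0%} to {miss_high:.0%} of cancers")
```

Under these assumed inputs, roughly half of all positive results would be true cancers. That share depends heavily on the prevalence you plug in, which is why the same test looks different in a high-risk group than in the general population.
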
And this is where I want to be honest about the limitations. A test like this carries two kinds of error, and they're not equally dangerous. The conventional criticism usually focuses on false positives. As I mentioned before, a specificity of 99.5% sounds excellent at an individual level, but at population scale, when we screen millions of people, a 0.5% false positive rate generates a large absolute number of people with a positive result who don't have cancer. In the PATHFINDER study, among 92 participants with a cancer signal detected, only 35 were confirmed true positives and 57 were false positives. The median time to diagnostic resolution for those false positives was 162 days, which means about 5 months of investigations and anxiety for a cancer that wasn't there. That is a real cost: financial, psychological, and systemic.

But in my clinical judgment, it is a manageable cost, provided the patient is educated correctly before the test, not after it. False positives are not unique to multi-cancer early detection. They're inherent in any screening program. Mammography produces them, and PSA produces them at large scale. The key is that the patient understands, before the blood is drawn, that a positive result may lead to further investigations that will ultimately find nothing, that it may cost money, and that it will be stressful. This is informed consent, the same framework I use when discussing any test whose result may be psychologically significant. APOE4 genotyping, for example: a result that may tell you something serious about your Alzheimer's risk but gives you very little leverage to change it. Or a test for Lp(a), lipoprotein little a, a lipid particle that is genetically determined, poorly modifiable with currently available drugs, and can fundamentally alter how you think about your cardiovascular trajectory. All these tests require careful framing before the result arrives, and so does a multi-cancer blood test.
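
The PATHFINDER numbers quoted above can be turned into a positive predictive value with a few lines of arithmetic. This is a minimal sketch using only the counts stated in the talk. The average month length used to convert days to months is an assumption for illustration.

```python
# Counts reported for PATHFINDER, as quoted in the talk.
signal_detected = 92
true_positives = 35
false_positives = 57
median_days_to_resolution_fp = 162  # false positives only

# Sanity check: the two groups should add up to everyone with a signal.
assert true_positives + false_positives == signal_detected

# Positive predictive value: share of "signal detected" results that were cancer.
ppv = true_positives / signal_detected
false_share = false_positives / signal_detected

# ASSUMPTION: an average month of 365.25 / 12 days, for a rough conversion.
months = median_days_to_resolution_fp / (365.25 / 12)

print(f"PPV: {ppv:.0%}")
print(f"Share of positives that were false: {false_share:.0%}")
print(f"Median time to resolve a false positive: {months:.1f} months")
```

In other words, about 38% of the positive results in that cohort were confirmed cancers, and a false positive took a median of a little over five months to resolve.
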
What concerns me more, and I think this is underappreciated in the public discussion, is the false negatives. A test with 51.5% overall sensitivity means that roughly half of the cancers present at the time of testing will return a "no signal detected" result. For stage 1 cancers across all types, with sensitivity in the region of 20%, the miss rate is approximately 80%. The patient receives a clean result, and the danger is not that the test failed mechanically; it performed within its validated parameters. The danger is what happens next in the patient's mind. A person who takes a multi-cancer blood test and receives a negative result is at risk of believing that they have been screened, that they are clear, that the diffuse worry they carried into the consultation has been fundamentally addressed. And that belief, that false reassurance, is clinically more dangerous than never having taken any test at all. A patient who has not been tested remains vigilant, whereas a patient who believes they have been tested and cleared may dismiss the very symptoms, the unexplained weight loss, the new abdominal discomfort, the change in bowel habit, that would otherwise prompt them to seek investigation.

This is the real clinical risk of a screening test with moderate sensitivity. Not that it finds things that are not there, although it does, and that must be accounted for and framed correctly, but that it misses things that are there. And in doing so, it replaces appropriate uncertainty with inappropriate confidence. Any physician recommending this test has an obligation to make it explicit that the test is additive rather than comprehensive. A negative result does not mean you don't have cancer and can relax forever. It means that the test did not detect a cancer signal at the moment of the blood draw. Those are different statements, and the gap between them is where the real harm lives. There is also a deeper methodological concern.
Finding cancer earlier does not automatically mean that the patient lives longer. Two biases haunt all screening programs. Lead-time bias is when the test detects a cancer that would have been found later anyway. The patient appears to survive longer from diagnosis, but the actual date of death has not changed. They have simply lived longer knowing that they had cancer. Length bias is when screening disproportionately detects slower-growing cancers because they remain in a detectable preclinical phase for longer. At the same time, faster and more aggressive cancers can arise and present clinically between screening rounds. In the context of Galleri, length bias shows up in the fact that slow-growing tumors shed more DNA over a longer period, while fast, aggressive cancers, the ones that kill most quickly, may shed less or shed for a shorter window, making them harder to catch. These biases don't mean early detection is futile, but they mean that stage shift, that is, finding cancers at earlier stages, is an imperfect surrogate for what actually matters, which is whether patients live longer and better. The strength of the association between stage shift and mortality varies by cancer type. It is stronger in lung and ovarian, weaker in colorectal, and poor for prostate. Stage shift is encouraging, but it is not proof.

The NHS-Galleri trial was designed to test whether this technology, deployed at population scale, could shift diagnosis towards earlier stages. 142,000 participants, three annual screening rounds, randomized, controlled, embedded within the NHS. The primary endpoint was a statistically significant reduction in combined stage 3 and stage 4 cancer diagnoses. The trial did not meet that endpoint. However, within a pre-specified group of 12 deadly cancers, there was a greater than 20% reduction in stage 4 diagnoses in the second and third screening rounds. Stage 1 and 2 detection increased substantially.
The overall cancer detection rate with screening was four times higher than with standard care alone, and fewer cancers were detected through emergency presentation. Interestingly, stage 4 reduction was planned as a key secondary endpoint, so this is not post-hoc cherry-picking. But once the primary endpoint fails, secondary findings become hypothesis-supporting rather than practice-settling. There was also a higher than anticipated incidence of stage 3 cancers in both arms of the trial. This matters because the primary endpoint bundled stage 3 and stage 4 together. Think about what that means. Every time the test succeeds, every time it catches a cancer early enough to downstage it from 4 to 3, that cancer is still counted in the composite endpoint. It moved from one side of the endpoint to the other, but the total didn't change. As a result, the test's own success becomes invisible to the metric designed to measure it.

I want to be direct about what we still do not know. We don't have the actual numbers. We don't have confidence intervals, cancer type breakdowns, or the false positive rate in this trial population. Until the full dataset is presented, and as of today it has not been, strong conclusions in either direction are premature. At present, no professional society or guideline body recommends multi-cancer early detection for routine population screening. No definitive trial has yet shown that these tests reduce overall cancer mortality. Those are the central facts. But the technology is first generation. And as with the first mammograms, which were far less capable than the ones we use now, the sensitivity for early-stage cancers will improve as machine learning algorithms improve with more data. There is also a distinction between population-level guidelines and individual-level decisions that I think matters.
A screening advisory body responsible for recommendations affecting millions of people must weigh aggregate costs, aggregate harms, and aggregate benefits. That is legitimate and necessary. But the individual sitting across from me, the person with a strong family history of pancreatic cancer for whom no population screening exists, is making a different calculation. Their risk tolerance, their values, their capacity to manage the uncertainty of a positive result that may be false: these enter the equation in ways that population-level guidelines cannot accommodate. For cancers with no existing screening, even a modestly effective test represents a category change for certain individuals. It does not replace clinical judgment, it does not replace established screening, it is additive. And the question of whether to use it is a clinical decision made between a physician and an informed patient, not some binary determined by a headline.

The honest position is this: the technology is promising and additive. The trial data are incomplete. The skeptics are asking legitimate questions. And people are continuing to be diagnosed with cancers, pancreatic, ovarian, liver, at stages where treatment options are limited. Not because the science does not exist, but because the evidence base is still being built. That is where we are. Not a failure, but an incomplete dataset from a first-generation technology, with signals that warrant serious attention and a first publication that the field, and any informed individual, should read carefully when it arrives.
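
The composite-endpoint argument from the trial discussion (a cancer moved from stage 4 to stage 3 still counts in a combined stage 3 plus stage 4 total) can be illustrated with a small sketch. All counts below are hypothetical and chosen only to show the bookkeeping. They are not NHS-Galleri data, which had not been published at the time of recording, and `composite_late_stage` and `downstage` are hypothetical helper names.

```python
def composite_late_stage(counts):
    """Combined stage 3 + stage 4 diagnoses, mirroring the composite endpoint."""
    return counts["stage 3"] + counts["stage 4"]


def downstage(counts, n, src="stage 4", dst="stage 3"):
    """Return a copy of counts with n diagnoses moved from src to dst."""
    if n > counts[src]:
        raise ValueError("cannot move more cancers than exist in the source stage")
    shifted = dict(counts)
    shifted[src] -= n
    shifted[dst] += n
    return shifted


# HYPOTHETICAL stage distribution in a control arm (not trial data).
control = {"stage 1": 100, "stage 2": 100, "stage 3": 150, "stage 4": 200}

# Scenario A: screening moves 40 cancers from stage 4 to stage 3.
shift_4_to_3 = downstage(control, 40)

# Scenario B: screening moves the same 40 cancers from stage 4 to stage 2.
shift_4_to_2 = downstage(control, 40, dst="stage 2")

for label, arm in [("control", control),
                   ("4 -> 3", shift_4_to_3),
                   ("4 -> 2", shift_4_to_2)]:
    print(f"{label:8s} stage 4 = {arm['stage 4']:3d}, "
          f"stage 3+4 = {composite_late_stage(arm)}")
```

In scenario A, stage 4 diagnoses fall by 20% while the composite total is unchanged, which is the pattern the talk describes. Scenario B shows that the composite does move when cancers are shifted all the way to stage 1 or 2, so the metric hides only part of a test's effect.
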