About HistoAtlas
Dataset provenance, preprocessing decisions, missingness, and citation information.
1. What Is an Atlas Point?
Each point in the atlas represents one case-representative diagnostic slide: each patient case contributes exactly one slide.
Selection rule. For each case, we prefer slides of the Primary Tumor sample type. When multiple slides exist for the same case, we select, among those that pass quality control, the slide with the largest tissue area. See Methods §2 for the full cohort definition and exclusion criteria.
Rationale. Survival outcomes and molecular labels (mutations, expression, copy number) are annotated at the case level, not the slide level. Using one slide per case avoids pseudo-replication and ensures that every statistical association reflects independent patient-level observations.
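The selection rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the HistoAtlas pipeline: the `Slide` record, its field names, and the functions `select_representative` and `build_atlas_points` are hypothetical. It also makes three assumptions that this page does not state: quality control is applied before the sample-type preference, a case with no passing Primary Tumor slide falls back to its other passing slides, and ties in tissue area go to the first slide encountered. Methods §2 remains the authoritative definition.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Slide:
    """Hypothetical slide record; field names are illustrative only."""
    slide_id: str
    case_id: str
    sample_type: str    # e.g. "Primary Tumor"
    tissue_area: float  # any consistent unit
    passes_qc: bool


def select_representative(slides: Iterable[Slide]) -> Optional[Slide]:
    """Pick one slide for a single case, or None if nothing passes QC."""
    # Assumption: QC filtering happens before the sample-type preference.
    candidates = [s for s in slides if s.passes_qc]
    if not candidates:
        return None
    # Prefer Primary Tumor slides; the fallback to other sample types
    # is an assumption, not something this page specifies.
    primary = [s for s in candidates if s.sample_type == "Primary Tumor"]
    pool = primary or candidates
    # Largest tissue area wins; max() keeps the first slide on ties.
    return max(pool, key=lambda s: s.tissue_area)


def build_atlas_points(slides: Iterable[Slide]) -> Dict[str, Slide]:
    """Group slides by case and keep one representative per case."""
    by_case: Dict[str, List[Slide]] = {}
    for slide in slides:
        by_case.setdefault(slide.case_id, []).append(slide)
    points: Dict[str, Slide] = {}
    for case_id, group in by_case.items():
        chosen = select_representative(group)
        if chosen is not None:
            points[case_id] = chosen
    return points
```

Because the result is keyed by case identifier, each case contributes at most one atlas point, which is the property that keeps case-level survival and molecular labels from being counted more than once.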
2. Data Governance
No protected health information (PHI) is stored or served by HistoAtlas. TCGA sample identifiers are research identifiers only and are not linked to patient identity.
All source data are subject to GDC data use policies.
- For terms of use, see Terms of Service
- For data collection practices, see Privacy Policy
- To report concerns or request takedowns, see GitHub issues
3. Citation
@misc{histoatlas2026,
title = {HistoAtlas: A Pan-Cancer Morphology Atlas Linking Histomics to Molecular Programs and Clinical Outcomes},
author = {Bannier, Pierre-Antoine},
year = {2026},
url = {https://histoatlas.org},
note = {Version 1.0}
}