Preprint · CC-BY
via bioRxiv
PSAP-genomic-regions: a method leveraging population data to prioritize coding and non-coding variants in whole genome sequencing for rare disease diagnosis
Ogloblinsky, M.-S., Bocher, O., Aloui, C., Leutenegger, A.-L., Ozisik, O., Baudot, A., Tournier-Lasserve, E., Castillo-Madeen, H., Lewinsohn, D., Conrad, D. F., Genin, E., Marenne, G.
bioRxiv · 2024
Abstract
The introduction of next-generation sequencing technologies in the clinic has improved rare disease diagnosis. Nonetheless, for very heterogeneous or very rare diseases, more than half of cases still lack a molecular diagnosis. Novel strategies are needed to prioritize variants within a single individual. The PSAP (Population Sampling Probability) method was developed for this purpose, but only for coding variants in exome data. To address the challenge of analyzing non-coding variants in whole genome sequencing data, we propose an extension of the PSAP method to the non-coding genome, called PSAP-genomic-regions. In this extension, instead of using genes as testing units (the PSAP-genes strategy), we use genomic regions defined over the whole genome that pinpoint potential functional constraints.
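To make the idea of a population sampling probability computed per testing unit more concrete, here is a minimal Python sketch. It is a simplified illustration under stated assumptions, not the PSAP-genomic-regions implementation: it assumes a dominant model, Hardy-Weinberg equilibrium, independent variants within a unit, and a hypothetical input format (sorted, non-overlapping region intervals per chromosome, plus a population catalog of variants carrying a pathogenicity score and an allele frequency). The names `Variant`, `assign_region`, `sampling_probability` and `score_individual` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    score: float  # predicted pathogenicity (e.g. a CADD-like score); assumption
    af: float     # population allele frequency; assumption


def assign_region(variant, regions):
    """Return (chrom, index) of the half-open region [start, end) containing
    the variant, or None. `regions` maps each chromosome to a sorted list of
    non-overlapping (start, end) tuples (hypothetical input format)."""
    intervals = regions.get(variant.chrom, [])
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, variant.pos) - 1
    if i >= 0 and variant.pos < intervals[i][1]:
        return (variant.chrom, i)
    return None


def sampling_probability(observed_score, unit_variants):
    """Simplified dominant-model null: probability that a randomly sampled
    individual carries at least one allele scoring >= observed_score in this
    testing unit, assuming Hardy-Weinberg equilibrium and independent variants."""
    p_no_carrier = 1.0
    for v in unit_variants:
        if v.score >= observed_score:
            # Probability of carrying zero copies of this allele.
            p_no_carrier *= (1.0 - v.af) ** 2
    return 1.0 - p_no_carrier


def score_individual(individual_variants, population_variants, regions):
    """Score each testing unit hit by the individual using its highest-scoring
    variant; lower values mean the observation is rarer in the population."""
    by_unit = defaultdict(list)
    for v in population_variants:
        unit = assign_region(v, regions)
        if unit is not None:
            by_unit[unit].append(v)

    best = {}
    for v in individual_variants:
        unit = assign_region(v, regions)
        if unit is not None and (unit not in best or v.score > best[unit].score):
            best[unit] = v

    # Units with no population variant at or above the observed score get 0.0.
    return {unit: sampling_probability(v.score, by_unit[unit])
            for unit, v in best.items()}


regions = {"chr1": [(100, 200), (300, 400)]}
population = [Variant("chr1", 150, 20.0, 0.01),
              Variant("chr1", 160, 30.0, 0.001),
              Variant("chr1", 350, 10.0, 0.2)]
patient = [Variant("chr1", 170, 25.0, 0.0), Variant("chr1", 360, 8.0, 0.0)]
print(score_individual(patient, population, regions))
# roughly {('chr1', 0): 0.002, ('chr1', 1): 0.36}
```

Taking the maximum score per unit and a closed-form carrier probability keeps the sketch short; the actual null model, inheritance models and region definitions used by PSAP-genomic-regions are described in the preprint and are not reproduced here.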
We designed an evaluation protocol for our method using artificially generated disease exomes and genomes, created by inserting coding and non-coding pathogenic ClinVar variants into large datasets of exomes and genomes from the general population.
We found that PSAP-genomic-regions significantly improves the ranking of these variants compared with using a pathogenicity score alone. With PSAP-genomic-regions, more than fifty percent of non-coding ClinVar variants, especially those involved in splicing, ranked among the top 10 variants of the genome. In addition, our approach performed similarly to PSAP-genes in scoring coding variants. On real sequencing data from 6 patients with Cerebral Small Vessel Disease and 9 patients with male infertility, all causal variants were ranked in the top 100 variants with PSAP-genomic-regions.
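The ranking metric behind these results (the share of inserted ClinVar variants that land in the top 10 of an artificial disease genome) can be sketched as follows. These are hypothetical helpers, assuming each simulated genome is summarized as a mapping from a variant identifier to a PSAP-like probability where lower values are prioritized, and counting ties against the inserted variant; the tie handling used in the paper is not stated here.

```python
def rank_of_inserted(scores, inserted_id):
    """Rank (1 = best) of the inserted variant when lower scores are better.
    Ties are counted against the inserted variant (conservative rank)."""
    target = scores[inserted_id]
    return sum(1 for s in scores.values() if s <= target)


def top_k_rate(simulations, k=10):
    """Fraction of simulated genomes whose inserted variant ranks within the top k.
    `simulations` is a list of (scores, inserted_id) pairs."""
    hits = sum(1 for scores, vid in simulations if rank_of_inserted(scores, vid) <= k)
    return hits / len(simulations)


sim1 = ({"a": 0.5, "b": 0.01, "clinvar": 0.001}, "clinvar")  # inserted variant ranks 1st
sim2 = ({"a": 1e-4, "b": 1e-5, "clinvar": 0.01}, "clinvar")  # inserted variant ranks 3rd
print(top_k_rate([sim1, sim2], k=1))  # 0.5
```

To compare against a raw pathogenicity score, where higher values are prioritized, the comparison in `rank_of_inserted` would flip from `<=` to `>=`.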
By revisiting the testing units used in the PSAP method to include non-coding variants, we have developed PSAP-genomic-regions, an efficient whole-genome prioritization tool that offers promising results for the diagnosis of unresolved rare diseases. PSAP-genomic-regions is implemented as a user-friendly Snakemake workflow that is accessible to both researchers and clinicians and can easily integrate up-to-date annotations from large databases.
Author summary
In recent years, improvements in DNA sequencing technologies have allowed the identification of many genes involved in rare diseases. Nonetheless, the molecular diagnosis is still unknown for more than half of rare disease cases. This is partly due to the large heterogeneity of molecular causes in rare diseases. It also highlights the need for new methods that prioritize pathogenic variants from DNA sequencing data at the scale of the whole genome, not only coding regions. With PSAP-genomic-regions, we offer a strategy to prioritize coding and non-coding variants in whole-genome data from a single individual in need of a diagnosis. PSAP-genomic-regions combines information on the predicted pathogenicity and frequency of variants in the context of functional regions of the genome. In this work, we compare the PSAP-genomic-regions strategy with other variant prioritization strategies on simulated and real data. We show that PSAP-genomic-regions performs better than a classical approach based on variant pathogenicity scores alone. PSAP-genomic-regions provides a straightforward approach to prioritize causal pathogenic variants, especially non-coding ones, that are often missed by other strategies and could explain undiagnosed rare diseases.
Citation only
Full text is not openly licensed for redistribution here. Read it at the source via the DOI below.
Provenance
- Source: bioRxiv
- DOI: 10.1101/2024.02.13.580050
- Fetched: 2026-05-31 MST
Cite this
APA
Ogloblinsky, M.-S., Bocher, O., Aloui, C., Leutenegger, A.-L., Ozisik, O., Baudot, A., Tournier-Lasserve, E., Castillo-Madeen, H., Lewinsohn, D., Conrad, D. F., Genin, E., & Marenne, G. (2024). PSAP-genomic-regions: A method leveraging population data to prioritize coding and non-coding variants in whole genome sequencing for rare disease diagnosis. bioRxiv. https://doi.org/10.1101/2024.02.13.580050
Vancouver
Ogloblinsky MS, Bocher O, Aloui C, Leutenegger AL, Ozisik O, Baudot A, et al. PSAP-genomic-regions: a method leveraging population data to prioritize coding and non-coding variants in whole genome sequencing for rare disease diagnosis. bioRxiv. 2024. doi:10.1101/2024.02.13.580050.
BibTeX
@article{ogloblinsky2024PSAPge,
title = {PSAP-genomic-regions: a method leveraging population data to prioritize coding and non-coding variants in whole genome sequencing for rare disease diagnosis},
author = {Ogloblinsky, M.-S. and Bocher, O. and Aloui, C. and Leutenegger, A.-L. and Ozisik, O. and Baudot, A. and Tournier-Lasserve, E. and Castillo-Madeen, H. and Lewinsohn, D. and Conrad, D. F. and Genin, E. and Marenne, G.},
journal = {bioRxiv},
year = {2024},
doi = {10.1101/2024.02.13.580050},
}