Preprint · CC-BY
via bioRxiv
Through the lens of causal inference: Decisions and pitfalls of covariate selection
Chen, G., Cai, Z., Taylor, P. A.
biorxiv · 2024
Abstract
The critical importance of justifying the inclusion of covariates is a facet often overlooked in data analysis. While the incorporation of covariates typically follows informal guidelines, we argue for a comprehensive exploration of underlying principles to avoid significant statistical and interpretational challenges. Our focus is on addressing three common yet problematic practices: the indiscriminate lumping of covariates, the lack of rationale for covariate inclusion, and the oversight of potential issues in result reporting. These challenges, prevalent in neuroimaging models involving covariates such as reaction time, demographics, and morphometric measures, can introduce biases, including overestimation, underestimation, masking, sign flipping, or spurious effects.
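Two of the biases listed above can be illustrated with a small simulation. This is a minimal sketch, not the authors' analysis: the variable names, effect sizes, and the helper `ols_slope` are assumptions chosen for illustration. Scenario A omits a confounder and overestimates the effect; scenario B adjusts for a collider and produces a spurious effect where none exists.

```python
import numpy as np

def ols_slope(y, *predictors):
    """Least-squares coefficient on the first predictor (intercept included)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(0)
n = 100_000

# Scenario A (hypothetical): Z is a confounder that drives both X and Y.
# The true effect of X on Y is 0.5.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 0.5 * x + z + rng.normal(size=n)
confounder_omitted = ols_slope(y, x)      # overestimates: close to 1.0
confounder_adjusted = ols_slope(y, x, z)  # recovers roughly 0.5

# Scenario B (hypothetical): C is a collider caused by both X and Y.
# The true effect of X on Y is 0.
x2 = rng.normal(size=n)
y2 = rng.normal(size=n)
c = x2 + y2 + rng.normal(size=n)
collider_omitted = ols_slope(y2, x2)      # close to 0, as it should be
collider_adjusted = ols_slope(y2, x2, c)  # spurious effect near -0.5

print(confounder_omitted, confounder_adjusted)
print(collider_omitted, collider_adjusted)
```

In both scenarios the data alone cannot say whether the third variable should be included; that depends on the assumed causal structure, which is the role the abstract assigns to domain knowledge.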
Our exploration of causal inference principles underscores the pivotal role of domain knowledge in guiding covariate selection, challenging the common reliance on statistical measures. This understanding carries implications for experimental design, model-building, and result interpretation. We draw connections between these insights and reproducibility concerns, specifically addressing the selection bias resulting from the widespread practice of strict thresholding, akin to the logical pitfall associated with "double dipping." Recommendations for robust data analysis involving covariates encompass explicit research question statements, justified covariate inclusions/exclusions, centering quantitative variables for interpretability, appropriate reporting of effect estimates, and advocating a "highlight, don't hide" approach in result reporting. These suggestions are intended to enhance the robustness, transparency, and reproducibility of covariate-driven analyses, encompassing investigations involving consortium datasets such as ABCD and UK Biobank. We discuss how researchers can use a transparent depiction of the covariate relationships to enhance the ethos of open science and promote research reproducibility.
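The recommendation to center quantitative variables can be sketched as follows. This is an illustrative simulation under assumed values (a hypothetical `group` factor, an `age` covariate, and made-up coefficients), not a reproduction of the paper's models. When a model includes a group-by-covariate interaction, the group coefficient is the group difference at covariate value zero; centering moves that reference point to the sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data: the group difference grows with age.
age = rng.uniform(20, 60, size=n)
group = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + 0.05 * age + group * (-2.0 + 0.1 * age) + rng.normal(scale=0.5, size=n)

def group_coefficient(age_values):
    """Fit y ~ 1 + group + age + group:age and return the group coefficient."""
    X = np.column_stack([np.ones(n), group, age_values, group * age_values])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Uncentered: group difference at age 0, an extrapolation far outside the data.
raw = group_coefficient(age)

# Centered: group difference at the mean age, within the observed range.
centered = group_coefficient(age - age.mean())

print(raw, centered)
```

With these assumed values the uncentered coefficient is close to -2 while the centered one is close to +2, so the same fitted model reports opposite signs depending only on where the covariate's zero point sits.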
Provenance
- Source: bioRxiv
- DOI: 10.1101/2024.01.11.575211
- Fetched: 2026-05-31 MST
Cite this
APA
Chen, G., Cai, Z., & Taylor, P. A. (2024). Through the lens of causal inference: Decisions and pitfalls of covariate selection. bioRxiv. https://doi.org/10.1101/2024.01.11.575211
Vancouver
Chen G, Cai Z, Taylor PA. Through the lens of causal inference: Decisions and pitfalls of covariate selection. bioRxiv. 2024. doi:10.1101/2024.01.11.575211.
BibTeX
@unpublished{chen2024Throug,
title = {Through the lens of causal inference: Decisions and pitfalls of covariate selection},
author = {Chen, G. and Cai, Z. and Taylor, P. A.},
journal = {biorxiv},
year = {2024},
doi = {10.1101/2024.01.11.575211},
}