Objectives

Electronic health records (EHRs) provide substantial resources for observational studies, yet present significant challenges in safeguarding patient privacy while maintaining research quality. Differential privacy (DP) offers a quantifiable privacy guarantee; however, its impact on observational studies remains underexplored. We empirically evaluated the effects of DP across varying values of its privacy parameter, epsilon, on case-control analysis outcomes using EHR data. This study aims to inform DP parameter selection and examines the influence of study characteristics on differentially private observational studies.

Materials and methods

We assessed the effects of DP on a case-control study of 1-year asthma exacerbations, including 22 165 participants with a history of asthma from UK Biobank linked to EHR data. Odds ratios (ORs) for sociodemographic factors and comorbidities were analyzed using adjusted and propensity score-matched models across epsilon values.

Results

DP influenced the magnitude, direction, and statistical significance of ORs, occasionally resembling patterns of misclassification, residual confounding, and false-positive bias. Rare and imbalanced covariates showed greater OR variability, especially in matched studies. Epsilons smaller than ln(2) led to noticeable OR fluctuations.

Discussion

The impact of DP on ORs and selection of an optimal epsilon depends on sample size, covariate prevalence, confounders, case-to-control ratios in propensity score matching, mitigation of random seed p-hacking, and trust models.

Conclusion

The effects of DP on ORs are highly context-dependent. In this study, epsilon values below ln(2) led to unstable ORs across random seeds. Averaging results or using predetermined seeds may help reduce variability and mitigate p-hacking.
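Illustrative sketch

The abstract does not say which DP mechanism was applied, so the Python sketch below assumes binary randomized response purely for illustration; it is not the authors' method. Under that assumption a reported bit is truthful with probability e^epsilon / (1 + e^epsilon), which equals 2/3 at epsilon = ln(2). The cohort counts, function names, and seed range are all hypothetical. The sketch perturbs one binary covariate, recomputes an unadjusted OR for a list of predetermined seeds, and summarizes the spread of the log OR, in the spirit of the seed-averaging suggestion in the conclusion.

```python
import math
import random
import statistics


def keep_probability(epsilon):
    """Probability of reporting the true bit under binary randomized response."""
    # Equivalent to e^eps / (1 + e^eps), written to avoid overflow for large eps.
    return 1.0 / (1.0 + math.exp(-epsilon))


def randomized_response(bits, epsilon, rng):
    """Keep each bit with probability keep_probability(epsilon), else flip it."""
    p_keep = keep_probability(epsilon)
    return [b if rng.random() < p_keep else 1 - b for b in bits]


def odds_ratio(a, b, c, d):
    """Unadjusted OR for a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    if 0 in (a, b, c, d):
        # Continuity correction so the ratio stays finite with an empty cell.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def build_cohort(a, b, c, d):
    """Expand 2x2 counts into per-participant exposure and outcome lists."""
    exposure = [1] * a + [0] * b + [1] * c + [0] * d
    outcome = [1] * (a + b) + [0] * (c + d)
    return exposure, outcome


def private_odds_ratio(exposure, outcome, epsilon, seed):
    """OR computed after perturbing the exposure with a fixed, declared seed."""
    rng = random.Random(seed)
    noisy = randomized_response(exposure, epsilon, rng)
    pairs = list(zip(noisy, outcome))
    a = sum(1 for x, y in pairs if x == 1 and y == 1)
    b = sum(1 for x, y in pairs if x == 0 and y == 1)
    c = sum(1 for x, y in pairs if x == 1 and y == 0)
    d = sum(1 for x, y in pairs if x == 0 and y == 0)
    return odds_ratio(a, b, c, d)


# Hypothetical counts, not taken from the study.
exposure, outcome = build_cohort(120, 380, 180, 1320)
seeds = range(20)  # predetermined seeds, fixed before looking at any result

for eps in (0.1, math.log(2), 2.0, 5.0):
    log_ors = [math.log(private_odds_ratio(exposure, outcome, eps, s)) for s in seeds]
    print(
        f"epsilon={eps:.3f}  "
        f"mean log OR={statistics.mean(log_ors):.3f}  "
        f"sd across seeds={statistics.stdev(log_ors):.3f}"
    )
```

Fixing the seed list in advance and reporting a summary over all of them, rather than a single favourable run, is one simple way to avoid selecting a seed after seeing the results. Note that flipping bits in this way also biases the OR toward 1, so the mean over seeds is not an estimate of the non-private OR without a further correction.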

Original publication

DOI

10.1093/jamia/ocaf090

Type

Journal article

Journal

Journal of the American Medical Informatics Association : JAMIA

Publication Date

06/2025

Addresses

Institute of Health Informatics, University College London, London NW1 2DA, United Kingdom.