Search results
ACES: AUTOMATIC COHORT EXTRACTION SYSTEM FOR EVENT-STREAM DATASETS
Reproducibility remains a significant challenge in machine learning (ML) for healthcare. Datasets, model pipelines, and even task or cohort definitions are often private in this field, leading to a significant barrier in sharing, iterating, and understanding ML results on electronic health record (EHR) datasets. We address a significant part of this problem by introducing the Automatic Cohort Extraction System (ACES) for event-stream data. This library is designed to simultaneously simplify the development of tasks and cohorts for ML in healthcare and also enable their reproduction, both at an exact level for single datasets and at a conceptual level across datasets. To accomplish this, ACES provides: (1) a highly intuitive and expressive domain-specific configuration language for defining both dataset-specific concepts and dataset-agnostic inclusion or exclusion criteria, and (2) a pipeline to automatically extract patient records that meet these defined criteria from real-world data. ACES can be automatically applied to any dataset in either the Medical Event Data Standard (MEDS) or Event Stream GPT (ESGPT) formats, or to any dataset in which the necessary task-specific predicates can be extracted in an event-stream form. ACES has the potential to significantly lower the barrier to entry for defining ML tasks in representation learning, redefine the way researchers interact with EHR datasets, and significantly improve the state of reproducibility for ML studies using this modality. ACES is available at: https://github.com/justin13601/aces.
Large-scale epidemiology of opisthorchiasis in 21 provinces in Thailand based on diagnosis by fecal egg examination and urine antigen assay and analysis of risk factors for infection
Introduction Infection with the carcinogenic fish-borne trematode Opisthorchis viverrini, known as opisthorchiasis, is a major cause of biliary cancer (cholangiocarcinoma). Despite decades of disease prevention and control in Thailand, the parasite remains endemic. Here we apply a novel antigen assay for mass screening of opisthorchiasis and compare the prevalence against the conventional examination and analyze risk factors associated with current O. viverrini infection. Materials and methods We conducted a large-scale cross-sectional survey to assess transmission of O. viverrini in the North, Northeast, and Eastern regions of Thailand. We screened randomly selected people (age 15 years and over) in 23 sub-districts, within 21 provinces, with a target sample size of 1,000 per sub-district. Each participant was screened for multiple helminth infection by fecal examination (quantitative formalin-ethyl acetate concentration technique; FECT), and the antigen assay by monoclonal antibody-based enzyme-linked immunosorbent assay (ELISA) was applied to urine samples to detect O. viverrini. We collected risk factors for O. viverrini infection using standardized questionnaire surveys. The data were analyzed with regression models which correlated individual-level explanatory variables against i) infection status with O. viverrini and ii) the intensity of infection, as measured by the antigen assay or FECT. Findings Of the 20,322 individuals enrolled, 19,465 provided urine samples for antigen detection by ELISA and 18,929 provided fecal samples for examination by FECT. The urine antigen assay revealed an overall opisthorchiasis prevalence of 50.3%, a fourfold increase over the 12.2% prevalence detected by FECT. Marked spatial heterogeneity was observed, with antigen‐based prevalence estimates ranging from 22.2% to 71.4% and several localities exceeding 60%. 
When assessed against a composite reference standard (combined ELISA and FECT), the urine ELISA yielded a diagnostic sensitivity of 91.6%, compared with 21.9% for FECT. We found a positive correlation between fecal egg counts and the concentration of worm antigen in urine across study sites. The ratio between the prevalence of O. viverrini observed by the antigen assay and FECT was high in provinces with a low mean number of O. viverrini eggs, and the ratio approached unity as the mean eggs per gram of stool (EPG) increased. Similar aggregate distribution patterns of fecal egg counts (EPG) and urine antigen concentrations suggest that the urine assay has potential for quantitative diagnostic evaluations. When analyzing individual-level risk factors, we further identified age, sex, occupation, a history of prior treatment with praziquantel, history of O. viverrini examination, and raw fish consumption as predictive of infection with O. viverrini, while a higher education level and certain occupations emerged as protective factors. Conclusions and recommendations Application of the antigen assay to diagnose O. viverrini infection yielded a four-fold higher prevalence than the fecal egg examination, with the highest difference in low endemicity regions, which suggests that previous surveys may have underestimated the extent of opisthorchiasis in Thailand. Given the ease of urine sample collection, our study highlights the potential for application of the antigen assay as a new tool in the control of opisthorchiasis.
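The composite reference standard mentioned above can be illustrated with a short sketch. It assumes the composite counts a participant as infected when either test is positive, and it uses made-up toy results, not the study's data. The function name and data layout are hypothetical.

```python
def sensitivity_vs_composite(results):
    """Sensitivity of each test against an 'either test positive' composite.

    results: list of (elisa_positive, fect_positive) booleans, one pair per
    participant who provided both samples (hypothetical layout).
    """
    # Composite reference: infected if at least one test is positive.
    composite_pos = [(e, f) for e, f in results if e or f]
    if not composite_pos:
        raise ValueError("no composite-positive participants")
    n = len(composite_pos)
    elisa_sens = sum(e for e, _ in composite_pos) / n
    fect_sens = sum(f for _, f in composite_pos) / n
    return elisa_sens, fect_sens


# Toy data for illustration only: 10 participants.
toy = (
    [(True, True)] * 2      # positive on both tests
    + [(True, False)] * 5   # urine ELISA only
    + [(False, True)] * 1   # FECT only
    + [(False, False)] * 2  # negative on both
)
elisa_sens, fect_sens = sensitivity_vs_composite(toy)
print(f"ELISA sensitivity: {elisa_sens:.3f}, FECT sensitivity: {fect_sens:.3f}")
```

With a composite defined this way, neither test can produce a false positive against the reference, so only sensitivity is informative.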
Extensive transmission and variation in a functional receptor for praziquantel resistance in endemic Schistosoma mansoni.
Mass-drug administration (MDA) of human populations using praziquantel monotherapy has become the primary strategy for controlling and potentially eliminating the major neglected tropical disease schistosomiasis. To understand how long-term MDA impacts schistosome populations, we analysed whole-genome sequence data of 570 Schistosoma mansoni samples (and the closely related outgroup species, S. rodhaini) from eight countries incorporating both publicly-available sequence data and new parasite material. This revealed broad-scale genetic structure across countries but with extensive transmission over hundreds of kilometres. We characterised variation across the transient receptor potential melastatin ion channel, TRPMPZQ, a target of praziquantel, which has recently been found to influence praziquantel susceptibility. Functional profiling of TRPMPZQ variants found in endemic populations identified four mutations that reduced channel sensitivity to praziquantel, indicating standing variation for resistance. Analysis of parasite infrapopulations sampled from individuals pre- and post-treatment identified instances of treatment failure, further indicative of potential praziquantel resistance. As schistosomiasis is targeted for elimination as a public health problem by 2030 in all currently endemic countries, and even interruption of transmission in selected African regions, we provide an in-depth genomic characterisation of endemic populations and an approach to identify emerging praziquantel resistance alleles.
Pathfinder studies: a novel tool for process mapping data-driven health research to build global research capacity.
Background: There is vast global inequality regarding where health research happens, who leads the research, and who benefits from the evidence. Globally, wealthier nations drive and influence data-driven research and how it is structured institutionally. Key barriers to high-quality research being undertaken in and led by low-resource settings are well reported. These barriers persist, thereby perpetuating a lack of locally generated data and/or evidence to tackle diseases that bring the greatest burden. Our aim was to design a tool to capture best practices in the production of data-driven health research, to advance both quality and quantity of research being conducted where it is needed most. Methods: An expert group of senior global health researchers from Asia, Africa, Europe, and Latin America and the Caribbean (LAC) convened to discuss potential solutions to addressing this imbalance in both quality and quantity of global health research. This study documents how a novel approach was developed, informed by this discussion, to support research teams in low-resource settings. The new approach, called "Pathfinder", is a process-mapping tool wherein teams document key steps of their research project's flow to produce quality data and subsequent studies. Results: The Pathfinder methodology is a novel tool to be used alongside planned studies to guide teams through each step of their research, from setting their research question, to identifying the best methods needed to complete each step, to translating research outputs into impactful policy and practice. It is a standardized framework, which can be applied or adapted to specific settings, for research teams to track key steps, challenges, solutions, and tools throughout their planned study's process. Pathfinders can also be applied to studies that have already been completed, retroactively documenting their key components.
Several global research institutes are piloting the Pathfinder methodology. Conclusions: Pathfinders can help inform future studies by capturing best practices, thereby removing barriers to research, and addressing global inequality in this domain. Specifically, Pathfinders can help identify the methods and skills needed for teams to produce safe, ethical, and accurate data-driven health research.
Heart Rate Profiles During Exercise and Incident Parkinson's Disease.
Objective: To determine whether established heart rate parameters of exercise, related to cardiac autonomic function, are associated with incident Parkinson's disease, independent of both clinical and autonomic prodromal features. Methods: A study of UK Biobank participants who performed a standardized bicycle exercise test (2009-2013), followed until November 2022, and analyzed in January 2024, was carried out. Heart rate increase from rest to exercise, and heart rate decrease from peak exercise to recovery were associated with incident Parkinson's disease. Multivariable adjustment was performed both for clinical characteristics and for prodromal non-cardiac autonomic features. Results: A total of 69,288 eligible participants (men 48%, mean age 56.8 years [SD 8.2 years]) were followed for 12.5 years: among the 319 (0.5%) who developed Parkinson's disease, recognized prodromal markers (constipation, bladder dysfunction) were more common at baseline. The median lag time to diagnosis was 9.3 years (interquartile range 4.4). Both heart rate increase (37.5 [SD 11.5] vs 40.8 [SD 12.4] b.p.m., p […] Interpretation: Collectively, this suggests that cardiac autonomic involvement precedes clinically manifest Parkinson's disease, and that heart rate recovery might serve as a quantitative prodromal marker. ANN NEUROL 2025.
Characterising ongoing brain aging and baseline effects from cross-sectional data
“Brain age delta” is the difference between age estimated from brain imaging data and actual age. Positive delta in adults is normally interpreted as implying that an individual is aging (or has aged) faster than the population norm, an indicator of unhealthy aging. Unfortunately, from cross-sectional (single timepoint) imaging data, it is impossible to know whether a single individual’s positive delta reflects a state of faster ongoing aging, or an unvarying trait (in other words, a “historical baseline effect” in the context of the population being studied). However, for a cross-sectional dataset comprising many individuals, one could attempt to disambiguate varying aging rates from fixed baseline effects. We present a method for doing this, and show that for the common approach of estimating a single delta per subject, baseline effects are likely to dominate. If instead one estimates multiple biologically distinct modes of brain aging, we find that some modes do reflect aging rates varying strongly across subjects. We demonstrate this, and verify our modelling, using longitudinal (two timepoint) data from 4,400 participants in UK Biobank. In addition, whereas previous work found incompatibility between cross-sectional and longitudinal brain aging, we show that careful data processing does show consistency between cross-sectional and longitudinal results.
A Primer on Inference and Prediction With Epidemic Renewal Models and Sequential Monte Carlo.
Renewal models are widely used in statistical epidemiology as semi-mechanistic models of disease transmission. While primarily used for estimating the instantaneous reproduction number, they can also be used for generating projections, estimating elimination probabilities, modeling the effect of interventions, and more. We demonstrate how simple sequential Monte Carlo methods (also known as particle filters) can be used to perform inference on these models. Our goal is to acquaint a reader who has a working knowledge of statistical inference with these methods and models and to provide a practical guide to their implementation. We focus on these methods' flexibility and their ability to handle multiple statistical and other biases simultaneously. We leverage this flexibility to unify existing methods for estimating the instantaneous reproduction number and generating projections. A companion website, "SMC and epidemic renewal models", provides additional worked examples, self-contained code to reproduce the examples presented here, and additional materials.
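As a rough illustration of the idea in this abstract, the sketch below runs a bootstrap particle filter on a basic renewal model. It is not the authors' code. The Poisson observation model, the Gaussian random walk on log R_t, the uniform prior, and all parameter values are assumptions chosen for illustration, and the case counts are synthetic.

```python
import math
import random


def poisson_logpmf(k, lam):
    """Log-probability of observing k under Poisson(lam)."""
    if lam <= 0:
        return 0.0 if k == 0 else float("-inf")
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def force_of_infection(cases, t, gen_pmf):
    """Lambda_t = sum_s w_s * I_{t-s}, with w the generation-interval pmf."""
    return sum(w * cases[t - s]
               for s, w in enumerate(gen_pmf, start=1) if t - s >= 0)


def bootstrap_filter(cases, gen_pmf, n_particles=500, sigma=0.1, seed=0):
    """Filtered posterior mean of R_t for t = 1..T-1 (assumed model)."""
    rng = random.Random(seed)
    # Assumed prior: R_0 uniform on [0.5, 3].
    particles = [rng.uniform(0.5, 3.0) for _ in range(n_particles)]
    means = []
    for t in range(1, len(cases)):
        lam = force_of_infection(cases, t, gen_pmf)
        # Propagate: Gaussian random walk on log R_t.
        particles = [r * math.exp(rng.gauss(0.0, sigma)) for r in particles]
        # Weight: I_t ~ Poisson(R_t * Lambda_t).
        logw = [poisson_logpmf(cases[t], r * lam) for r in particles]
        top = max(logw)
        if top == float("-inf"):
            weights = [1.0] * n_particles  # no particle explains the data
        else:
            weights = [math.exp(lw - top) for lw in logw]
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        means.append(sum(particles) / n_particles)
    return means


# Synthetic counts growing about 1.5-fold per step, with a one-step
# generation interval, so the filter should settle near R = 1.5.
cases = [10, 15, 23, 34, 51, 77, 115, 173]
estimates = bootstrap_filter(cases, gen_pmf=[1.0])
print([round(r, 2) for r in estimates])
```

Resampling at every step keeps the sketch short. A practical implementation would more likely resample only when the effective sample size drops, and would also report credible intervals.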
Perturbations in the blood metabolome up to a decade before prostate cancer diagnosis in 4387 matched case-control sets from the European Prospective Investigation into Cancer and Nutrition.
Measuring pre-diagnostic blood metabolites may help identify novel risk factors for prostate cancer. Using data from 4387 matched case-control pairs from the European Prospective Investigation into Cancer and Nutrition (EPIC) study, we investigated the associations of 148 individual metabolites and three previously defined metabolite patterns with prostate cancer risk. Metabolites were measured by liquid chromatography-mass spectrometry. Multivariable-adjusted conditional logistic regression was used to estimate the odds ratio per standard deviation increase in log metabolite concentration and metabolite patterns (OR1SD) for prostate cancer overall, and for advanced, high-grade, and aggressive disease. We corrected for multiple testing using the Benjamini-Hochberg method. Overall, there were no associations between specific metabolites or metabolite patterns and overall, aggressive, or high-grade prostate cancer that passed the multiple testing threshold (padj <0.05). Six phosphatidylcholines (PCs) were inversely associated with advanced prostate cancer diagnosed at or within 10 years of blood collection. Metabolite patterns 1 (64 PCs and three hydroxysphingomyelins) and 2 (two acylcarnitines, glutamate, ornithine, and taurine) were also inversely associated with advanced prostate cancer; when stratified by follow-up time, these associations were observed for diagnoses at or within 10 years of recruitment (OR1SD 0.80, 95% CI 0.66-0.96 and 0.76, 0.59-0.97, respectively) but were weaker after longer follow-up (0.95, 0.82-1.10 and 0.85, 0.67-1.06). Pattern 3 (8 lyso PCs) was associated with prostate cancer death (0.82, 0.68-0.98). Our results suggest that the plasma metabolite profile changes in response to the presence of prostate cancer up to a decade before detection of advanced-stage disease.
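The Benjamini-Hochberg adjustment named in this abstract can be sketched in a few lines. This is a generic implementation of the standard step-up procedure, not the study's analysis code, and the p-values below are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for five hypothetical metabolites.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
padj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in padj]
print(padj, significant)
```

Here the third and fourth tests have raw p-values below 0.05 but adjusted values just above it, which is the kind of result that "did not pass the multiple testing threshold" describes.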
Genomic risk prediction for depression in a large prospective study of older adults of European descent.
The extent to which genetic predisposition contributes to late-life depression risk, particularly after age 70, remains unclear, despite the high prevalence of depression in this age group and the variability in risk factors by age. This study investigated the association between a polygenic score (PGS) and depression outcomes, including severity, trajectories of depression, and antidepressant medication use, in a longitudinal cohort of 12,029 genotyped older adults of European descent aged ≥70 years, with no history of diagnosed cardiovascular disease events, dementia, or permanent physical disability at baseline. Participants were followed for a median of 4.7 years. The PGS was derived using the latest Psychiatric Genomics Consortium data for major depression. Depression was defined by the CES-D-10 score thresholds of ≥8 (primary outcome), ≥10, and ≥12 (secondary outcomes), alongside antidepressant medication use and four previously established longitudinal trajectories of depressive symptoms: low (non-depressed), moderate (subthreshold), high (persistent), and initially low but increasing (emerging). Multivariable models were used to examine associations between the PGS (per standard deviation, SD) and outcomes, adjusting for covariates. At baseline, mean participant age was 75.1 years, 54.9% were female, and 9.1% had depression (CES-D-10 ≥ 8). The PGS was significantly associated with baseline depression (OR = 1.23 [1.15-1.31]), incident depression (HR = 1.18 [1.14-1.23]) and antidepressant medication use (OR = 1.39 [1.31-1.47]). Compared with non-depressed participants, the PGS was associated with increasing severity of depression trajectory classes (subthreshold depression OR = 1.15 [1.11-1.20], emerging depression OR = 1.22 [1.13-1.31], persistent depression OR = 1.40 [1.31-1.49]). These findings suggest that the PGS may play an important role in risk stratification for late-life depression.
Multi-ancestry genome-wide association analyses incorporating SNP-by-psychosocial interactions identify novel loci for serum lipids.
Serum lipid levels, which are influenced by both genetic and environmental factors, are key determinants of cardiometabolic health. Improving our understanding of their underlying biological mechanisms can have important public health and therapeutic implications. Although psychosocial factors, including depression, anxiety, and perceived social support, are associated with serum lipid levels, it is unknown if they modify the effect of genetic loci that influence lipids. We conducted a genome-wide gene-by-psychosocial factor interaction (G×Psy) study in up to 133,157 individuals to evaluate if G×Psy influences serum lipid levels. We conducted a two-stage meta-analysis of G×Psy using both a one-degree of freedom (1df) interaction test and a joint 2df test of the main and interaction effects. In Stage 1, we performed G×Psy analyses on up to 77,413 individuals and promising associations (P -5) were evaluated in up to 55,744 independent samples in Stage 2. Significant findings (P -8) were identified based on meta-analyses of the two stages. There were 10,230 variants from 120 loci significantly associated with serum lipids. We identified novel associations for variants in four loci using the 1df test of interaction, and five additional loci using the 2df joint test that were independent of known lipid loci. Of these 9 loci, 7 could not have been detected without modeling the interaction as there was no evidence of association in a standard GWAS model. The genetic diversity of included samples was key in identifying these novel loci: four of the lead variants displayed very low frequency in European ancestry populations. Functional annotation highlighted promising loci for further experimental follow-up, particularly rs73597733 (MACROD2), rs59808825 (GRAMD1B), and rs11702544 (RRP1B).
Notably, one of the genes in identified loci (RRP1B) was found to be a target of the approved drug atenolol, suggesting potential for drug repurposing. Overall, our findings suggest that taking interaction between genetic variants and psychosocial factors into account and including genetically diverse populations can lead to novel discoveries for serum lipids.
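The 1df interaction test and the joint 2df test can be sketched as Wald tests on a single variant's regression output. This is a minimal illustration and not the consortium's pipeline. It assumes a model with a SNP main effect and a SNP-by-exposure interaction term, and the function names and summary statistics are hypothetical.

```python
import math


def interaction_test_1df(beta_ge, se_ge):
    """Wald test of the interaction effect alone (chi-square, 1 df)."""
    chi2 = (beta_ge / se_ge) ** 2
    # Survival function of chi-square with 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def joint_test_2df(beta_g, beta_ge, var_g, var_ge, cov):
    """Joint Wald test of main and interaction effects (chi-square, 2 df).

    var_g, var_ge and cov are the entries of the 2x2 covariance matrix of
    the two estimates.
    """
    det = var_g * var_ge - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # b' * inverse(Sigma) * b, written out for the 2x2 case.
    chi2 = (beta_g ** 2 * var_ge
            - 2.0 * beta_g * beta_ge * cov
            + beta_ge ** 2 * var_g) / det
    # Survival function of chi-square with 2 df.
    return chi2, math.exp(-chi2 / 2.0)


# Hypothetical estimates for one variant (not from the study).
chi2_1df, p_1df = interaction_test_1df(beta_ge=1.96, se_ge=1.0)
chi2_2df, p_2df = joint_test_2df(beta_g=3.0, beta_ge=4.0,
                                 var_g=1.0, var_ge=1.0, cov=0.0)
print(f"1df: chi2={chi2_1df:.2f}, p={p_1df:.4f}")
print(f"2df: chi2={chi2_2df:.2f}, p={p_2df:.2e}")
```

In this toy case the joint statistic combines a modest main effect with a modest interaction, so it is far more significant than the interaction alone. Combining the two sources of signal is how a 2df test can detect loci that a 1df test or a standard GWAS model would miss.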
Imaging Neuroscience opening editorial
In this editorial we introduce a new non-profit open access journal, Imaging Neuroscience. In April 2023, editors of the journals NeuroImage and NeuroImage:Reports resigned, and a month later launched Imaging Neuroscience. NeuroImage had long been the leading journal in the field of neuroimaging. While the move to fully open access in 2020 represented a positive step toward modern academic practices, the publication fee was set to a level that the editors found unethical and unsustainable. The publisher of NeuroImage, Elsevier, was unwilling to reduce the fee after much discussion. This led us to launch Imaging Neuroscience with MIT Press, intended to replace NeuroImage as our field’s leading journal, but with greater control by the neuroimaging academic community over publication fees and adoption of modern and ethical publishing practices.
SARS-CoV-2 genomic diversity and within-host evolution in individuals with persistent infection in the UK: an observational, longitudinal, population-based surveillance study.
BACKGROUND: Persistent SARS-CoV-2 infections in hospitalised immunocompromised individuals are known to facilitate accelerated within-host viral evolution, potentially contributing to the emergence of highly divergent variants. However, little is known about the evolutionary dynamics and transmission risks of persistent infections in the general population. We aimed to characterise the within-host evolution of SARS-CoV-2 during persistent infections identified through a large community surveillance study. METHODS: We used data from the Office for National Statistics COVID-19 Infection Survey (ONS-CIS), a large-scale, longitudinal, population-based surveillance study conducted in the UK from April, 2020, to March, 2023. For this analysis, we focused on infections with high viral load (cycle threshold ≤30) and available genome sequences, from seven major SARS-CoV-2 lineages (alpha, delta, BA.1, BA.2, BA.4, BA.5, and XBB). ONS-CIS participants were randomly selected from the general population and tested regularly by RT-PCR, regardless of symptoms. We defined persistent infections as those with sustained or rebounding high viral RNA titres for 26 days or longer. We examined associated host characteristics and used raw sequence data to identify de novo mutations and estimate within-host synonymous and non-synonymous evolutionary rates across the SARS-CoV-2 genome. FINDINGS: Between Nov 2, 2020, and March 21, 2023, we identified 576 persistent infections with at least two sequences, including 11 alpha, 106 delta, 102 BA.1, 204 BA.2, 16 BA.4, 133 BA.5, and 4 XBB. Persistent infections were more common in males than females (p<0·0001) and individuals older than 60 years (p=0·0027). The median within-host genome-wide evolutionary rate was 7·9 × 10⁻⁴ substitutions per site per year (IQR 7·0-9·0 × 10⁻⁴), with high inter-individual variability driven largely by non-synonymous mutations, particularly in the N-terminal and receptor-binding domains of the spike protein.
Longer infection duration was associated with higher evolutionary rates, while no associations were found with age, sex, vaccination status, previous infection, or virus lineage. We found no clear evidence of transmission beyond the first month of infection in any of the 84 persistent infections lasting 56 days or longer. In total, we identified 379 recurrent mutations, including many with known or predicted negative fitness effects and low prevalence at the population level, as well as de novo reversions to the Wuhan-Hu-1 reference sequence, which were likely under positive selection within those individuals. INTERPRETATION: This study highlights the heterogeneous nature of within-host SARS-CoV-2 evolution in individuals with persistent infection in the community. Notably, a small subset of persistent infections with high viral loads underwent accelerated viral evolution or recurrently acquired hallmark mutations found in novel variants. In addition, onward transmission from a persistent infection during the later stages of infection is likely to be rare. These insights have important implications for prioritising genomic surveillance and managing patients with persistent infections. FUNDING: Department of Health and Social Care.
How improvements to drug effectiveness impact mass drug administration for control and elimination of schistosomiasis.
Schistosomiasis affects more than 230 million people worldwide. Control and elimination of this parasitic infection is based on mass drug administration of praziquantel (PZQ), which has been in use for several decades. Because of the limitations of the efficacy of PZQ especially against juvenile worms, and the threat of the emergence of resistance, there is a need to consider alternative formulations or delivery methods, or new drugs that could be more efficacious. We use an individual-based stochastic model of parasite transmission to investigate the effects of possible improvements to drug efficacy. We consider an increase in efficacy compared to PZQ, as well as additional efficacy against the juvenile life stage of schistosome parasites in the human host, and a slow-release formulation that would provide long-lasting efficacy for a period of time following treatment. Analyses suggest a drug with a high efficacy of 99%, or with efficacy lasting 24 weeks after treatment, are the two most effective individual improvements to the drug profile of PZQ. A drug with long lasting efficacy is most beneficial when MDA coverage is low. However, when prevalence of infection has already been reduced to a low level, a high efficacy is the most important factor to accelerate interruption of transmission. Our results indicate that increased efficacy against juvenile worms can only result in modest benefits, but the development of a new drug formulation with higher efficacy against adult worms or long-lasting efficacy would create an improvement to the community impact over the currently used formulation.
Community-acquired pneumonia identification from electronic health records in the absence of a gold standard: A Bayesian latent class analysis.
Community-acquired pneumonia (CAP) is common and a significant cause of mortality. However, CAP surveillance commonly relies on diagnostic codes from electronic health records (EHRs), with imperfect accuracy. We used Bayesian latent class models with multiple imputation to assess the accuracy of CAP diagnostic codes in the absence of a gold standard and to explore the contribution of various EHR data sources in improving CAP identification. Using 491,681 hospital admissions in Oxfordshire, UK, from 2016 to 2023, we investigated four EHR-based algorithms for CAP detection based on 1) primary diagnostic codes, 2) clinician-documented indications for antibiotic prescriptions, 3) radiology free-text reports, and 4) vital signs and blood tests. The estimated prevalence of CAP as the reason for emergency hospital admission was 13.6% (95% credible interval 13.3-14.0%). Primary diagnostic codes had low sensitivity but a high specificity (best fitting model, 0.275 and 0.997 respectively), as did vital signs with blood tests (0.348 and 0.963). Antibiotic indication text had a higher sensitivity (0.590) but a lower specificity (0.982), with radiology reports intermediate (0.485 and 0.960). Defining CAP as present when detected by any algorithm produced sensitivity and specificity of 0.873 and 0.905 respectively. Results remained consistent using alternative priors and in sensitivity analyses. Relying solely on diagnostic codes for CAP surveillance leads to substantial under-detection; combining EHR data across multiple algorithms enhances identification accuracy. Bayesian latent class analysis-based approaches could improve CAP surveillance and epidemiological estimates by integrating multiple EHR sources, even without a gold standard for CAP diagnosis.
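The "any algorithm positive" rule can be explored with a small calculation. The sketch below uses the per-algorithm sensitivities and specificities quoted in this abstract and assumes the four algorithms err independently given true CAP status. That independence is an assumption of the sketch, not necessarily of the paper's latent class models.

```python
import math

# (sensitivity, specificity) as quoted in the abstract.
algorithms = {
    "diagnostic codes": (0.275, 0.997),
    "vital signs and blood tests": (0.348, 0.963),
    "antibiotic indication text": (0.590, 0.982),
    "radiology reports": (0.485, 0.960),
}


def any_positive_rule(sens_spec):
    """Accuracy of 'positive if any test is positive'.

    Assumes conditional independence of the tests given disease status.
    """
    # A true case is missed only if every algorithm misses it.
    sens = 1.0 - math.prod(1.0 - se for se, _ in sens_spec)
    # A non-case is cleared only if every algorithm clears it.
    spec = math.prod(sp for _, sp in sens_spec)
    return sens, spec


combined_sens, combined_spec = any_positive_rule(algorithms.values())
print(f"sensitivity={combined_sens:.3f}, specificity={combined_spec:.3f}")
```

Under this assumption the combined specificity comes out at about 0.905, matching the reported value, while the combined sensitivity comes out at about 0.900, above the reported 0.873. One plausible reading is that the algorithms tend to miss some of the same true cases, but the abstract does not say so.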
Multimorbidity in dementia: Current perspectives and future challenges
Multimorbidity, the co‐occurrence of two or more chronic health conditions, affects > 86% of people with dementia. It is associated with cognitive and functional decline, reduced health‐related quality of life, increased health‐care use, and higher mortality. The relationship between multimorbidity and dementia is potentially bidirectional; conditions such as hypertension and diabetes increase the risk of developing dementia, and cognitive impairment can complicate their management. This complexity presents challenges in health care and research, affecting treatment decisions and often leading to the exclusion of these individuals from clinical trials. Understanding multimorbidity through long‐term prospective studies is crucial to clarify its relationship with dementia. Investigating specific disease combinations, environmental and genetic factors, and their impacts on cognitive health will guide the development of effective prediction models and inclusive intervention strategies for diverse global populations across the life course. Highlights: Multimorbidity affects > 86% of individuals with dementia, worsening outcomes. The relationship between multimorbidity and dementia is potentially bidirectional. Chronic conditions hinder dementia management and clinical trial inclusion. Life‐course multimorbidity research is key to dementia risk reduction strategies. Prospective studies are needed to improve prediction models and interventions.
Neuroimaging meta regression for coordinate based meta analysis data with a spatial model.
Coordinate-based meta-analysis combines evidence from a collection of neuroimaging studies to estimate brain activation. In such analyses, a key practical challenge is to find a computationally efficient approach with good statistical interpretability to model the locations of activation foci. In this article, we propose a generative coordinate-based meta-regression (CBMR) framework to approximate a smooth activation intensity function and investigate the effect of study-level covariates (e.g. year of publication, sample size). We employ a spline parameterization to model the spatial structure of brain activation and consider four stochastic models for modeling the random variation in foci. To examine the validity of CBMR, we estimate brain activation on 20 meta-analytic datasets, conduct spatial homogeneity tests at the voxel level, and compare the results to those generated by existing kernel-based and model-based approaches. Keywords: generalized linear models; meta-analysis; spatial statistics; statistical modeling.