  • Mining faces from biomedical literature using deep learning.

    29 March 2018

    Gaining access to large, labelled sets of relevant images is crucial for the development and testing of biomedical imaging algorithms. Using images found in biomedical research articles would go some way towards solving this problem. However, this approach critically depends on being able to identify the most relevant images from very large sets of potentially useful figures. In this paper, a deep convolutional neural network (CNN) classifier is trained using only synthetic data to rapidly and accurately label raw images taken from biomedical articles. We apply this method to detecting faces in biomedical images and show that the classifier retrieves figures containing faces with an average precision of 94.8% from a dataset of over 31,000 images taken from articles held in the PubMed database. The utility of the classifier is then demonstrated through a case study in which it aids the mining of photographs of patients with rare genetic disorders from targeted articles. This approach is readily adaptable to the retrieval of other categories of biomedical images.
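
    The average precision quoted above summarises how well a ranked list of figures places face images ahead of non-face images. The sketch below shows one standard way to compute that metric from classifier scores. It is an illustration only, not the authors' evaluation code: the function name, the scores, and the labels are all hypothetical.

```python
def average_precision(scores, labels):
    """Average precision of a ranking.

    scores: classifier confidence per image (hypothetical values).
    labels: 1 if the image truly contains a face, else 0.
    Returns the mean of precision@k over the ranks k at which a
    relevant (face) image appears, or 0.0 if there are no face images.
    """
    # Rank images from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_face) in enumerate(ranked, start=1):
        if is_face:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


# Hypothetical example: four images, two of which contain faces.
example_ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(round(example_ap, 4))
```

    In this toy ranking the face images sit at ranks 1 and 3, so the precisions averaged are 1/1 and 2/3.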

  • A comparative analysis of whole genome sequencing of esophageal adenocarcinoma pre- and post-chemotherapy.

    14 March 2018

    The scientific community has avoided using tissue samples from patients who have been exposed to systemic chemotherapy to infer the genomic landscape of a given cancer. Esophageal adenocarcinoma is a heterogeneous, chemoresistant tumor for which the availability and size of pretreatment endoscopic samples are limiting. This study compares whole-genome sequencing data obtained from chemo-naive and chemo-treated samples. The quality of whole-genome sequencing data is comparable across all samples regardless of chemotherapy status. Inclusion of samples collected post-chemotherapy increased the proportion of late-stage tumors. When comparing matched pre- and post-chemotherapy samples from 10 cases, the mutational signatures, copy number, and SNV mutational profiles reflect the expected heterogeneity in this disease. Analysis of SNVs in relation to allele-specific copy-number changes pinpoints the common ancestor to a point prior to chemotherapy. For cases in which pre- and post-chemotherapy samples do show substantial differences, the timing of the divergence is near-synchronous with endoreduplication. Comparison across a large prospective cohort (62 treatment-naive, 58 chemotherapy-treated samples) reveals no significant differences in the overall mutation rate, mutation signatures, specific recurrent point mutations, or copy-number events with respect to chemotherapy status. In conclusion, whole-genome sequencing of samples obtained following neoadjuvant chemotherapy is representative of the genomic landscape of esophageal adenocarcinoma. Excluding these samples reduces the material available for cataloging and introduces a bias toward the earlier stages of cancer.

  • Ranking and characterization of established BMI and lipid associated loci as candidates for gene-environment interactions.

    5 April 2018

    Phenotypic variance heterogeneity across genotypes at a single nucleotide polymorphism (SNP) may reflect underlying gene-environment (G×E) or gene-gene interactions. We modeled variance heterogeneity for blood lipids and BMI in up to 44,211 participants and investigated relationships between variance effects (Pv), G×E interaction effects (with smoking and physical activity), and marginal genetic effects (Pm). Correlations between Pv and Pm were stronger for SNPs with established marginal effects (Spearman's ρ = 0.401 for triglycerides, and ρ = 0.236 for BMI) compared to all SNPs. When Pv and Pm were compared for all pruned SNPs, only the correlation for BMI was statistically significant (Spearman's ρ = 0.010). Overall, SNPs with established marginal effects were overrepresented in the nominally significant part of the Pv distribution (Pbinomial < 0.05). SNPs from the top 1% of the Pm distribution for BMI had more significant Pv values (PMann-Whitney = 1.46 × 10⁻⁵), and the odds ratio of SNPs with nominally significant (< 0.05) Pm and Pv was 1.33 (95% CI: 1.12, 1.57) for BMI. Moreover, BMI SNPs with nominally significant G×E interaction P-values (Pint < 0.05) were enriched with nominally significant Pv values (Pbinomial = 8.63 × 10⁻⁹ and 8.52 × 10⁻⁷ for SNP × smoking and SNP × physical activity, respectively). We conclude that some loci with strong marginal effects may be good candidates for G×E, and variance-based prioritization can be used to identify them.
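
    To make the idea of variance heterogeneity across genotypes concrete, the sketch below computes the Brown-Forsythe statistic (Levene's test with the median as centre), one common test for unequal variances between groups. This is an assumption chosen for illustration: the abstract does not state which variance model the study used. The function name and the phenotype values are hypothetical.

```python
from statistics import fmean, median


def brown_forsythe_statistic(groups):
    """Brown-Forsythe W statistic for equality of variances.

    groups: one list of phenotype values per genotype class
            (e.g. 0, 1, or 2 copies of the effect allele).
    Larger W indicates stronger evidence that the spread of the
    phenotype differs between genotype classes.
    """
    # Absolute deviation of each value from its own group's median.
    deviations = [[abs(y - median(g)) for y in g] for g in groups]
    n_total = sum(len(z) for z in deviations)
    k = len(deviations)
    group_means = [fmean(z) for z in deviations]
    grand_mean = fmean([d for z in deviations for d in z])
    # One-way ANOVA on the deviations: between- vs within-group variation.
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((d - m) ** 2
                 for z, m in zip(deviations, group_means) for d in z)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical phenotype values for three genotype classes whose
# spread grows with allele count.
w = brown_forsythe_statistic([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(w > 0)
```

    Under the null hypothesis of equal variances, W is compared against an F distribution with (k - 1, N - k) degrees of freedom to obtain a P-value. That step is omitted here to keep the sketch dependency-free.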

  • An Expanded Genome-Wide Association Study of Type 2 Diabetes in Europeans.

    3 April 2018

    To characterize type 2 diabetes (T2D)-associated variation across the allele frequency spectrum, we conducted a meta-analysis of genome-wide association data from 26,676 T2D case and 132,532 control subjects of European ancestry after imputation using the 1000 Genomes multiethnic reference panel. Promising association signals were followed up in additional data sets (of 14,545 or 7,397 T2D case and 38,994 or 71,604 control subjects). We identified 13 novel T2D-associated loci (P < 5 × 10⁻⁸), including variants near the GLP2R, GIP, and HLA-DQA1 genes. Our analysis brought the total number of independent T2D associations to 128 distinct signals at 113 loci. Despite substantially increased sample size and more complete coverage of low-frequency variation, all novel associations were driven by common single nucleotide variants. Credible sets of potentially causal variants were generally larger than those based on imputation with earlier reference panels, consistent with resolution of causal signals to common risk haplotypes. Stratification of T2D-associated loci based on T2D-related quantitative trait associations revealed tissue-specific enrichment of regulatory annotations in pancreatic islet enhancers for loci influencing insulin secretion and in adipocytes, monocytes, and hepatocytes for insulin action-associated loci. These findings highlight the predominant role played by common variants of modest effect and the diversity of biological mechanisms influencing T2D pathophysiology.
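
    As a rough illustration of how per-study association results can be combined and checked against the genome-wide significance threshold of 5 × 10⁻⁸, the sketch below applies a fixed-effect, inverse-variance-weighted meta-analysis to a single variant. The abstract does not specify the meta-analysis model, so this choice is an assumption, and the function name, effect sizes, and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas: per-study effect estimates (e.g. log odds ratios).
    standard_errors: per-study standard errors of those estimates.
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Hypothetical per-study results for one variant.
beta, se, p = fixed_effect_meta([0.10, 0.10], [0.02, 0.02])
GENOME_WIDE_THRESHOLD = 5e-8
print(p < GENOME_WIDE_THRESHOLD)
```

    Pooling two studies with equal standard errors shrinks the combined standard error by a factor of √2, which is why larger combined samples can push modest effects past the threshold.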