Novel EDGE encoding method enhances ability to identify genetic interactions
Molly A. Hall, John Wallace, Anastasia M. Lucas, Yuki Bradford, Shefali S. Verma, Bertram Müller-Myhsok, Kristin Passero, Jiayan Zhou, John McGuigan, Beibei Jiang, Sarah A. Pendergrass, Yanfei Zhang, Peggy Peissig, Murray Brilliant, Patrick Sleiman, Hakon Hakonarson, John B. Harley, Krzysztof Kiryluk, Kristel Van Steen, Jason H. Moore, Marylyn D. Ritchie
Abstract:
When choosing a traditional genetic encoding for single nucleotide polymorphisms (SNPs), assumptions are made about their genetic models, such as additive, dominant, or recessive. However, SNPs across the genome often exhibit different genetic models, which poses a challenge for running SNP-SNP interaction analyses with every combination of encodings. In this study, we propose a novel encoding called elastic data-driven genetic encoding (EDGE), where SNPs are assigned a heterozygous value based on their observed genetic model in a dataset before interaction testing. We evaluated the power of EDGE in detecting genetic interactions using simulated genetic models and found it to outperform traditional encoding methods across different minor allele frequencies (MAFs).
Introduction: In association studies involving SNPs, choosing an encoding method, such as additive, dominant, or recessive, requires making assumptions about the behavior of the risk allele. Each encoding fixes the risk assumed for the heterozygous genotype relative to the two homozygous genotypes, and how well that assumption captures the true risk differs from one encoding to another.
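To make the heterozygous assumption concrete, the sketch below lists the heterozygous value implied by each traditional encoding. It assumes a scale on which the homozygous-reference genotype is 0 and the homozygous-alternate genotype is 1, so the usual additive coding 0/1/2 becomes 0/0.5/1. The names `TRADITIONAL_HET_VALUE` and `encode_traditional` are hypothetical and chosen for illustration only.

```python
# Heterozygous value implied by each traditional encoding, on an assumed
# scale where homozygous-reference = 0 and homozygous-alternate = 1.
TRADITIONAL_HET_VALUE = {
    "additive": 0.5,   # heterozygote carries half the homozygous-alternate risk
    "dominant": 1.0,   # one risk allele is enough for the full effect
    "recessive": 0.0,  # two risk allele copies are needed for any effect
}


def encode_traditional(genotype: int, model: str) -> float:
    """Encode a genotype given as a risk-allele count (0, 1, or 2)."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    if genotype == 0:
        return 0.0
    if genotype == 2:
        return 1.0
    # Only the heterozygous value depends on the chosen model.
    return TRADITIONAL_HET_VALUE[model]
```

Under this scaling the three encodings differ only in the value given to the heterozygote, which is the single quantity a data-driven encoding would need to estimate.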
Materials and methods:
Simulated datasets: To assess the performance of EDGE in accurately assigning heterozygous genotype values and identifying SNP-SNP interactions under various genetic models, we developed the Biallelic Model Simulator. This tool generates two independent biallelic SNPs in Hardy-Weinberg equilibrium based on specified minor allele frequencies.
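The genotype-generation step can be sketched as follows. This is not the Biallelic Model Simulator itself, only a minimal illustration of drawing two independent biallelic SNPs from Hardy-Weinberg proportions for given minor allele frequencies. The function names, the seed parameter, and the use of Python's standard `random` module are assumptions made for this example.

```python
import random


def hwe_genotype_probs(maf: float) -> tuple:
    """Return Hardy-Weinberg probabilities for 0, 1, and 2 minor alleles."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("minor allele frequency must be in (0, 0.5]")
    p = 1.0 - maf  # major allele frequency
    q = maf        # minor allele frequency
    return (p * p, 2.0 * p * q, q * q)


def simulate_snp(maf: float, n: int, rng: random.Random) -> list:
    """Draw n genotypes (minor-allele counts) under Hardy-Weinberg equilibrium."""
    return rng.choices([0, 1, 2], weights=hwe_genotype_probs(maf), k=n)


def simulate_snp_pair(maf1: float, maf2: float, n: int, seed: int = 0):
    """Draw two SNPs independently of each other for n samples."""
    rng = random.Random(seed)
    return simulate_snp(maf1, n, rng), simulate_snp(maf2, n, rng)
```

For example, `simulate_snp_pair(0.2, 0.4, 1000)` returns two lists of 1,000 genotypes each; a phenotype model would then be layered on top of these genotypes.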
eMERGE datasets: We conducted genome-wide genotyping on approximately 55,000 samples across different ancestral backgrounds using Illumina BeadChips at the eMERGE II study sites. These datasets were used to assess the performance of EDGE in detecting SNP-SNP interactions.
Replication datasets: We performed candidate replication SNP-SNP interaction analyses using data from the UK Biobank, which contains genetic and phenotypic information from around 500,000 individuals. Poor quality samples and related individuals were removed from the dataset.
Statistical analyses: Regression modeling was performed using PLATO software for all simulated and eMERGE datasets. We compared the performance of EDGE with traditional encodings (additive, dominant, recessive, and codominant) in GWAS analyses. Replication analyses were conducted on significant SNP-SNP interaction models identified in the eMERGE dataset.
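As a rough illustration of how a data-driven heterozygous value could feed into an interaction term, consider the sketch below. It assumes that the heterozygous value is the ratio of the heterozygous effect to the homozygous-alternate effect estimated from a codominant fit. This summary does not state the formula, so treat the ratio, the function names, and the coefficient inputs as assumptions. The sketch does not use PLATO and does not fit a regression.

```python
def edge_alpha(beta_het: float, beta_hom_alt: float) -> float:
    """Heterozygous value as a ratio of codominant effect estimates (assumed form)."""
    if beta_hom_alt == 0:
        raise ValueError("homozygous-alternate effect must be nonzero")
    return beta_het / beta_hom_alt


def edge_encode(genotype: int, alpha: float) -> float:
    """Encode a risk-allele count as 0, alpha, or 1."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    return (0.0, alpha, 1.0)[genotype]


def interaction_term(g1: int, g2: int, alpha1: float, alpha2: float) -> float:
    """Product of the two encoded SNPs, i.e. the SNP-SNP interaction covariate."""
    return edge_encode(g1, alpha1) * edge_encode(g2, alpha2)
```

In this form, an alpha near 0.5 behaves like the additive encoding, near 1 like dominant, and near 0 like recessive, so each SNP in a pair can follow its own observed model without testing every combination of fixed encodings.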
Discussion: For many years, the additive model has been widely used for encoding SNPs in regression-based epistasis studies. In this study, we introduced a novel encoding, EDGE, which offers flexibility in detecting SNPs with nonadditive allelic architecture. We evaluated different encoding methods in the context of epistasis, identified nonadditive genetic models, and discovered novel SNP-SNP interactions associated with complex diseases.
Citation: Hall MA, Wallace J, Lucas AM, Bradford Y, Verma SS, Müller-Myhsok B, et al. (2021) Novel EDGE encoding method enhances ability to identify genetic interactions. PLoS Genet 17(6): e1009534. https://doi.org/10.1371/journal.pgen.1009534
Editor: Heather J. Cordell, Newcastle University, UNITED KINGDOM
Received: May 7, 2020; Accepted: April 6, 2021; Published: June 4, 2021.
Copyright: © 2021 Hall et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The result files for the eMERGE data are within the manuscript and its Supporting Information files.
Funding: The project described was partially supported by NIH grants LM010098 and AI116794 to JHM.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: MDR is on the scientific advisory board for Cipherome and Goldfinch Bio. The other co-authors have declared that no competing interests exist.