Jointly representing long-range genetic similarity and spatially heterogeneous isolation-by-distance
Vivaswat Shastry, Marco Musiani, John Novembre
Abstract
Isolation-by-distance patterns are a widespread feature of the geographic structure of genetic variation in many species, and many methods have been developed to illuminate such patterns in genetic data. However, long-range genetic similarities also exist, often as a result of rare or episodic long-range gene flow. Jointly characterizing isolation-by-distance and long-range genetic similarity is an open data analysis challenge that, if resolved, could help produce more complete representations of the geographic structure of genetic variation in any given species.
Introduction
A key first step in understanding the genetics of a species is to characterize its variation across the geographic range it inhabits (i.e., the geographic structure of genetic variation, or the “landscape genetics” of the species [1–4]). In many, if not most, species, isolation-by-distance patterns are common: genetic similarity is highest among the most geographically proximal individuals (e.g., [5]).
Materials and methods
Analysis of a representative simulated dataset
Fig 1 shows a schematic workflow of the FEEMSmix methodology, using a representative simulation of a simple scenario of spatial population structure with a long-range gene flow event.
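To make this kind of scenario concrete, the following is a minimal toy sketch, not the simulation design used in the paper. It assumes a one-dimensional stepping-stone chain of demes with Wright-Fisher drift, followed by a single instantaneous pulse from one end of the chain to the other. All function names, parameter values, and the choice of mean squared allele-frequency difference as the genetic distance are illustrative assumptions.

```python
import numpy as np


def simulate_stepping_stone(n_demes=10, n_loci=500, two_n=200, m=0.1,
                            n_gens=200, seed=0):
    """Toy 1D stepping-stone model (assumed setup, not the paper's simulator).

    Each generation, demes exchange a fraction m of alleles with each
    neighbor, then allele frequencies are resampled binomially (drift).
    Returns an array of allele frequencies with shape (n_demes, n_loci).
    """
    rng = np.random.default_rng(seed)
    # Row-stochastic migration matrix: m to each neighbor, the rest stays put.
    M = np.zeros((n_demes, n_demes))
    for i in range(n_demes):
        for j in (i - 1, i + 1):
            if 0 <= j < n_demes:
                M[i, j] = m
        M[i, i] = 1.0 - M[i].sum()
    p = np.full((n_demes, n_loci), 0.5)
    for _ in range(n_gens):
        # Clip guards against floating-point values slightly outside [0, 1].
        p_mig = np.clip(M @ p, 0.0, 1.0)
        p = rng.binomial(two_n, p_mig) / two_n
    return p


def pairwise_genetic_distance(p):
    """Mean squared allele-frequency difference between every pair of demes."""
    diff = p[:, None, :] - p[None, :, :]
    return (diff ** 2).mean(axis=2)


def ibd_correlation(dist_gen):
    """Pearson correlation between geographic (|i - j|) and genetic distance."""
    n = dist_gen.shape[0]
    i, j = np.triu_indices(n, k=1)
    return np.corrcoef(np.abs(i - j), dist_gen[i, j])[0, 1]


# Baseline: isolation-by-distance only.
p = simulate_stepping_stone()
d_before = pairwise_genetic_distance(p)

# Hypothetical long-range event: the last deme receives a fraction c of its
# ancestry from the first deme in a single instantaneous pulse.
c = 0.5
source, dest = 0, p.shape[0] - 1
p_mix = p.copy()
p_mix[dest] = (1 - c) * p[dest] + c * p[source]
d_after = pairwise_genetic_distance(p_mix)

print("IBD correlation (baseline):", ibd_correlation(d_before))
print("source-dest distance before pulse:", d_before[dest, source])
print("source-dest distance after pulse: ", d_after[dest, source])
```

In this toy setting the baseline shows a positive correlation between geographic and genetic distance, while the pulse shrinks the distance between the two most distant demes by a factor of (1 - c)^2. A purely isolation-by-distance model would not predict that excess similarity.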
Results
We begin by describing a birth–death process with stochastic mutation accumulation, before deriving expected distributions for various summary statistics of interest.
Discussion
In this paper, we present a method called FEEMSmix that represents the geographic structure of genetic variation by simultaneously using a landscape of spatially heterogeneous gene flow and long-range gene flow events. It builds on a previous method called FEEMS (Fast Estimation of Effective Migration Surfaces, [10]) and follows the naming tradition of methods that model residuals to baseline fits of the observed genetic data with a parameter specifying the strength of an instantaneous admixture pulse (e.g., TreeMix, [16]; MixMapper, [17]; SpaceMix, [18]).
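The idea of an admixture pulse explaining residuals to a baseline fit can be illustrated with a small generic sketch. It assumes that a pulse of strength c replaces the destination deme's allele frequencies with a linear mixture of its own and the source's, so that a baseline covariance matrix Sigma becomes A Sigma A^T. This is a textbook linear-mixture view, not the FEEMSmix likelihood; the function name, the exponential baseline covariance, and the parameter values are hypothetical.

```python
import numpy as np


def apply_admixture_pulse(sigma, source, dest, c):
    """Covariance among demes after dest receives a fraction c from source.

    Assumes frequencies transform as f_dest' = (1 - c) f_dest + c f_source,
    a linear map A, so the covariance becomes A @ sigma @ A.T.
    """
    n = sigma.shape[0]
    A = np.eye(n)
    A[dest, dest] = 1.0 - c
    A[dest, source] = c
    return A @ sigma @ A.T


# Hypothetical baseline: covariance decays with distance along a chain of demes.
n = 6
idx = np.arange(n)
sigma = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 2.0)

# Pulse of strength c = 0.3 from deme 0 into the most distant deme.
sigma_mix = apply_admixture_pulse(sigma, source=0, dest=5, c=0.3)

# Residual relative to the baseline: nonzero only in the destination's
# row and column, with excess covariance toward the source.
residual = sigma_mix - sigma
print(np.round(residual, 3))
```

Under these assumptions the covariance between destination and source becomes (1 - c) Sigma[dest, source] + c Sigma[source, source], which exceeds the baseline value whenever the source's variance is larger than its baseline covariance with the destination. Entries not involving the destination deme are unchanged.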
Acknowledgments
We would like to thank members of the Berg, Novembre, and Steinrücken labs, as well as members of the University of Chicago Program in Computational Biology (PCB) community for helpful discussions and feedback during the development of this project.
Citation: Shastry V, Musiani M, Novembre J (2025) Jointly representing long-range genetic similarity and spatially heterogeneous isolation-by-distance. PLoS Genet 21(9): e1011612. https://doi.org/10.1371/journal.pgen.1011612
Editor: Gideon S. Bradburd, University of Michigan, UNITED STATES OF AMERICA
Received: February 6, 2025; Accepted: September 2, 2025; Published: September 16, 2025
Copyright: © 2025 Shastry et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The wolves data set is provided as part of the FEEMS package (https://doi.org/10.7554/eLife.61927) and is also publicly available from the original publication (https://doi.org/10.1111/mec.13364). This data set can be found at https://doi.org/10.5061/dryad.c9b25. The corrected wolves data set and the human data set used in this study can be found at https://doi.org/10.5061/dryad.p8cz8wb18 and https://zenodo.org/records/15007585. All simulated data can be reproduced using code in https://github.com/VivaswatS/feems/tree/admixture_edge. Finally, FEEMSmix is available as a complete Python package from https://github.com/NovembreLab/feems.
Funding: Funding to JN was provided by NIH NIGMS grants R35 GM149521 and R01 GM132383. MM was supported by European Union - NextGenerationEU, under the National Recovery and Resilience Plan (NRRP), Project title “National Biodiversity Future Center -NBFC” (project code CN 00000033). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
