Unknotting RNA: A method to resolve computational artifacts

Simón Poblete, Mikolaj Mlynarczyk, Marta Szachniuk

Abstract

RNA 3D structure prediction often encounters entanglements, computational artifacts that complicate structural models, resulting in their exclusion from further studies despite the potentially accurate prediction of regions outside the entanglement. This study presents a protocol aimed at resolving such issues in RNA models while preserving the overall 3D fold and structural integrity. By employing the SPQR coarse-grained model and short Molecular Dynamics simulations, the protocol imposes energy terms that enable selective modifications to disentangle structures without causing significant distortions. 

Introduction

In recent years, computational modeling has emerged as a leading technique for elucidating the secondary and tertiary structures of biological molecules. Deep learning-driven modeling has already successfully replaced experimental methods in protein research [1–3]. For nucleic acids, predictive algorithms complement wet-lab experiments. Current computational tools effectively generate secondary structures of RNA and DNA from sequences, primarily reflecting short-range canonical interactions. However, non-canonical pairings and long-range contacts are often absent in output models [4–6].

Materials and method

Testing dataset

To test the disentanglement protocol, we downloaded RNA 3D models predicted in the CASP15 and RNA-Puzzles competitions, available in their online repositories as of January 2024. The CASP15 dataset (https://predictioncenter.org/download_area/CASP15/predictions/RNA/) included 1,660 models for 12 RNA targets. The RNA-Puzzles dataset (https://github.com/RNA-Puzzles) contained 1,028 models targeting 22 RNA sequences in rounds I-IV of RNA-Puzzles. From this collection, we discarded redundant structures and blobs, and analyzed the remaining models for entanglements.

Results and discussion

We applied the disentanglement protocol to each of the 195 entangled structures from the benchmark set (the resulting structures are available at doi: 10.5281/zenodo.13840004). Table 1 presents the aggregate results of this experiment. The protocol successfully resolved approximately half (49%) of the entanglements in eligible RNA structures, with 72% of successful cases coming from CASP15 predictions and 28% from RNA-Puzzles. It was notably more effective for interlaces, resolving 77% of cases, compared to 40% for lassos across both datasets. However, lassos are more prevalent (occurring twice as often as interlaces in the CASP15 dataset and 13 times more frequently in the RNA-Puzzles dataset), are generally harder to remove and, importantly, may not be artifacts. Definite artifacts comprise all types of interlaces and D(*) lassos. Among the 99 such conformations identified in the dataset, 80 (81%) were successfully untangled.
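As a quick arithmetic check of the success rate quoted for definite artifacts, the following minimal Python sketch recomputes the percentage from the two counts given in the text (80 untangled out of 99). The helper name `percent` and the variable names are illustrative assumptions and are not part of the published protocol or its software.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)


# Counts quoted in the text for definite artifacts
# (all types of interlaces plus D(*) lassos).
artifacts_total = 99
artifacts_untangled = 80

artifact_rate = percent(artifacts_untangled, artifacts_total)
artifacts_remaining = artifacts_total - artifacts_untangled

print(f"{artifact_rate}% untangled, {artifacts_remaining} remaining")
```

The same helper can be applied to any other pair of counts from Table 1 to confirm that a reported percentage matches its numerator and denominator.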

Conclusion

In this work, we have developed a systematic procedure for detecting entanglements and proposing refined untangled structures. Following the identification of entanglements and the specification of the core nucleotides that compose them, our pipeline utilizes a multiscale approach that enables rapid energy minimization and manipulation of the nucleotides. The resulting structures are backmapped to a full-atom representation consistent with the original model. Applying the disentanglement protocol to RNA models predicted in CASP15 and RNA-Puzzles revealed substantial differences in the behavior of interlaces and lassos. For interlaces, which are clearly topological artifacts, the results are generally favorable and indicate that the protocol can be applied reliably in most cases. In contrast, the scenario for lassos is more complex. Untangling lassos often disrupts the base pairs that stabilize them, or has little to no effect on the overall structure of the RNA. This indicates that the entanglement may be non-removable or that it poses no significant harm to the integrity of the molecule.

Citation: Poblete S, Mlynarczyk M, Szachniuk M (2025) Unknotting RNA: A method to resolve computational artifacts. PLoS Comput Biol 21(3): e1012843. https://doi.org/10.1371/journal.pcbi.1012843

Editor: Dina Schneidman, Hebrew University of Jerusalem, ISRAEL

Received: October 5, 2024; Accepted: February 2, 2025; Published: March 20, 2025

Copyright: © 2025 Poblete et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: RNA models predicted in CASP15 are available at https://predictioncenter.org/download_area/CASP15/predictions/RNA/, while those of RNA-Puzzles at http://www.rnapuzzles.org/results/. RNA 3D models with entanglements resolved using the SPQR-based protocol are available at doi: 10.5281/zenodo.13840004. RNAspider is accessible at https://rnaspider.cs.put.poznan.pl/ and SPQR at doi: 10.5281/zenodo.14658435 and https://github.com/srnas/spqr.

Funding: SP was supported by the Fondecyt Regular project No. 1231071 and Centro Ciencia & Vida, FB210008, Financiamiento Basal para Centros Científicos y Tecnológicos de Excelencia de ANID (https://anid.cl). MM and MS were supported by the statutory funds of Poznan University of Technology (https://www.put.poznan.pl/en) and the Institute of Bioorganic Chemistry, Polish Academy of Sciences (https://www.ibch.poznan.pl/en.html). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.