Improving Prediction of Drug-target Interactions Based on Fusing Multiple Features With Data Balancing and Feature Selection Techniques

Hakimeh Khojasteh, Jamshid Pirgazi, Ali Ghanbari Sorkhi

Abstract

Drug discovery relies on predicting drug-target interactions (DTIs), an important and challenging task. The purpose of DTI prediction is to identify interactions between drug chemical compounds and protein targets. Traditional wet-lab experiments are time-consuming and expensive, which is why the use of computational methods based on machine learning has attracted the attention of many researchers in recent years. A dry-lab environment that focuses on computational interaction prediction can help limit the search space for wet-lab experiments. In this paper, a novel multi-stage approach for DTI prediction, called SRX-DTI, is proposed. In the first stage, a combination of various descriptors extracted from protein sequences and an FP2 fingerprint encoded from the drug are used as feature vectors. A major challenge in this application is data imbalance caused by the small number of known interactions; to deal with this problem, the One-SVM-US technique is proposed in the second stage.

Introduction

The main phase in the drug discovery process is to identify interactions between drugs and targets (or proteins), which can be performed by in vitro experiments. Identifying drug-target interactions plays a vital role in drug development, which aims to identify new drug compounds for known targets and find new targets for current drugs [1,2]. The expansion of the human genome project has enabled better diagnosis of disease, early detection of certain diseases, and the identification of drug-target interactions (DTIs) [3]. Although significant efforts have been made in previous years, only a limited number of drug candidates have been approved for the market by the Food and Drug Administration (FDA), whereas most drug candidates have been rejected during clinical verification due to side effects or low efficacy [4]. Moreover, developing a new chemistry-based drug often costs 2.6 billion dollars, and the development and approval procedure typically takes 15 years. This has become a bottleneck in identifying the targets of candidate drug molecules [2,5]. Experiment-based methods are costly, time-consuming, and limited to small scales, which motivates researchers to keep developing computational methods for the discovery of new drugs [2,6,7].

Materials and methods

In this study, we propose a novel method for drug-target interaction prediction, called SRX-DTI. In the first step, drug chemical structures (SMILES format) and protein sequences (FASTA format) are collected from the DrugBank and KEGG databases using their specific access IDs. In the next step, different feature extraction methods are applied to the drug compounds and protein sequences to create a variety of features. Drug-target pair vectors are then built from the known interactions and the extracted features. Afterward, a balancing technique is applied to the DTI vectors to deal with the imbalanced datasets, and drug-target features are selected through FFS-RF to boost prediction performance. Finally, the XGBoost classifier is applied to the balanced datasets with the optimal features to predict DTIs. A schematic diagram of the proposed SRX-DTI model is shown in Fig 1.
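The step that turns drugs, proteins, and known interactions into labeled pair vectors can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes amino acid composition as a stand-in for the protein descriptors, uses toy 8-bit vectors in place of real FP2 fingerprints, and assumes that every drug-protein pair without a known interaction is treated as a negative example. All identifiers and function names (`D1`, `P1`, `pair_vector`, `build_dataset`) are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_vector(drug_bits, protein_sequence):
    """Concatenate a drug fingerprint (list of 0/1) with the protein descriptor."""
    return [float(b) for b in drug_bits] + amino_acid_composition(protein_sequence)


def build_dataset(drugs, proteins, known_interactions):
    """Label each drug-protein pair 1 if it is a known interaction, else 0.

    Assumption: pairs without a known interaction are used as negatives.
    """
    X, y = [], []
    for drug_id, bits in drugs.items():
        for protein_id, seq in proteins.items():
            X.append(pair_vector(bits, seq))
            y.append(1 if (drug_id, protein_id) in known_interactions else 0)
    return X, y


# Toy, made-up inputs: hypothetical IDs and 8-bit vectors, not real FP2 bits.
drugs = {"D1": [1, 0, 0, 1, 0, 1, 0, 0], "D2": [0, 1, 1, 0, 0, 0, 1, 0]}
proteins = {"P1": "MKTAYIAKQR", "P2": "GAVLIMFWPS", "P3": "MKKLLPTAAG"}
known_interactions = {("D1", "P1"), ("D2", "P3")}

X, y = build_dataset(drugs, proteins, known_interactions)
print(len(X), len(X[0]), sum(y))  # 6 pairs, 28 features each, 2 positives
```

Even in this toy case only 2 of the 6 pairs are positive, which illustrates why a balancing step is applied before classification.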

Results

In this section, we present the experimental results of our proposed method for DTI prediction. We implemented all phases of the proposed model, i.e., feature extraction, data balancing, and classification, in Python (version 3.10) using the Scikit-learn library. Some of the target descriptors were calculated with the iFeature package [61], and the rest were implemented in Python. OpenBabel software was used to extract fingerprint descriptors from the drugs. All implementations were run on a computer with a 2.50 GHz Intel Xeon Gold 5–2670 CPU and 64 GB of RAM.
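To make the data balancing and classification phases more concrete, the following is a minimal Scikit-learn sketch under stated assumptions. The text does not spell out how One-SVM-US selects samples, so the rule used here (fit a one-class SVM on the majority class and keep the rows it scores as most typical) is only one plausible reading. Scikit-learn's `GradientBoostingClassifier` is used as a stand-in for XGBoost to avoid an extra dependency. The function name `one_class_svm_undersample`, its parameters, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import OneClassSVM


def one_class_svm_undersample(X, y, ratio=1.0, nu=0.1):
    """Under-sample the majority class (label 0) to about ratio * n_minority rows.

    Assumption: majority rows that the one-class SVM scores as most typical
    are kept; the remaining majority rows are dropped.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    neg_idx = np.flatnonzero(y == 0)
    pos_idx = np.flatnonzero(y == 1)
    n_keep = min(len(neg_idx), int(round(ratio * len(pos_idx))))
    svm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X[neg_idx])
    scores = svm.decision_function(X[neg_idx])  # larger means more typical
    keep_neg = neg_idx[np.argsort(scores)[::-1][:n_keep]]
    keep = np.sort(np.concatenate([pos_idx, keep_neg]))
    return X[keep], y[keep]


# Synthetic imbalanced data: 300 negatives and 30 positives, 10 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 10)),
               rng.normal(1.5, 1.0, size=(30, 10))])
y = np.array([0] * 300 + [1] * 30)

X_bal, y_bal = one_class_svm_undersample(X, y)

# Stand-in for the XGBoost classifier used in the paper.
clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
print(X_bal.shape, int(y_bal.sum()))
```

With `ratio=1.0` the balanced set contains as many negatives as positives; other ratios keep proportionally more of the majority class.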

Discussion

Comparison with other methods

During the last decade, different machine learning frameworks have been proposed to predict DTIs. Some of these methods use feature selection techniques and some do not. Most of the studies (as well as our approach) have used the dataset proposed by Yamanishi et al. [38] to assess prediction ability. To evaluate the effectiveness of our method, we compare it with six drug-target prediction methods in terms of AUROC on the same dataset under 5-fold cross-validation (CV). In the following, we compare the AUROC of the SRX-DTI model with the other state-of-the-art methods proposed by Mousavian et al. [26], Li et al. [70], Meng et al. [71], Wang et al. [29], Mahmud et al. [64], Wang et al. [28], and Mahmud et al. [6]. The AUROCs obtained by these models are listed in Table 15. As the table shows, the proposed model achieves a higher AUROC than the other methods on all the datasets.
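For readers unfamiliar with the metric, the sketch below shows how an AUROC averaged over 5 folds can be computed. It is an illustration only: `scores[i]` is assumed to be the held-out prediction for pair `i` (produced by a model trained without that pair's fold), the fold assignment is a simple stratified split, and the function names and toy data are hypothetical rather than taken from the paper.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative, with ties counted as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def mean_cv_auroc(labels, scores, k=5, seed=0):
    """Average the per-fold AUROC of precomputed held-out scores."""
    folds = stratified_folds(labels, k, seed)
    per_fold = [auroc([labels[i] for i in f], [scores[i] for i in f])
                for f in folds]
    return sum(per_fold) / k, per_fold


# Toy data: 10 positives and 20 negatives with perfectly separated scores.
labels = [1] * 10 + [0] * 20
scores = [0.9] * 10 + [0.1] * 20
mean_auc, per_fold = mean_cv_auroc(labels, scores)
print(mean_auc, per_fold)
```

The pairwise definition is quadratic in the number of samples, which is fine for illustration; library routines use a rank-based computation for large datasets.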

Conclusion

The identification of drug-target interactions through experimentation is a costly and time-consuming process. Therefore, the development of computational methods for identifying interactions between drugs and target proteins has become a critical step in reducing the search space for laboratory experiments. In this work, we proposed a novel framework for predicting drug-target interactions. Our approach is unique in that we use a variety of descriptors for target proteins. We implemented the One-SVM-US technique to address imbalanced data. The most important advantage of the proposed method is the FFS-RF algorithm, developed to find an optimal subset of features that reduces computational cost and improves prediction performance. We also compared the performance of four classifiers on balanced datasets with optimal features and ultimately selected the XGBoost classifier to predict DTIs in our model. We then employed the XGBoost classifier to predict DTIs on five benchmark datasets. Our SRX-DTI model achieved good prediction results, showing that the proposed method outperforms other methods in predicting DTIs.

Citation: Khojasteh H, Pirgazi J, Ghanbari Sorkhi A (2023) Improving prediction of drug-target interactions based on fusing multiple features with data balancing and feature selection techniques. PLoS ONE 18(8): e0288173. https://doi.org/10.1371/journal.pone.0288173

Editor: Prabina Kumar Meher, ICAR Indian Agricultural Statistics Research Institute, INDIA

Received: February 8, 2023; Accepted: June 21, 2023; Published: August 3, 2023

Copyright: © 2023 Khojasteh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The source code of the method along with datasets is freely available at https://github.com/Khojasteh-hb/SRX-DTI.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

 

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0288173#abstract0