Volume 19, No. 10 (2024)
- Year: 2024
- Articles: 8
- URL: https://hum-ecol.ru/1574-8936/issue/view/9934
Life Sciences
Advances in Deep Learning Assisted Drug Discovery Methods: A Self-review
Abstract
Artificial intelligence is a field within computer science that endeavors to replicate the intricate structures and operational mechanisms of the human brain. Machine learning is a subfield of artificial intelligence that focuses on developing models by analyzing training data. Deep learning is a distinct subfield within artificial intelligence, characterized by models that represent geometric transformations across multiple layers. Deep learning has shown significant promise in various domains, including health and life sciences, and has recently been applied successfully to drug discovery. In this self-review, we present recent methods developed with the aid of deep learning. The objective is to give a brief overview of recent advances in drug discovery from our group. We systematically discuss experimental evidence and proof-of-concept examples for the deep learning-based models developed, such as DeepBindBC, DeepPep, and DeepBindRG. These developments not only shed light on existing challenges but also highlight the achievements and prospects for future progress in drug discovery and development.
891-907
A-RFP: An Adaptive Residue Flexibility Prediction Method Improving Protein-ligand Docking Based on Homologous Proteins
Abstract
Background: Computational molecular docking plays an important role in determining the precise receptor-ligand conformation, making it a powerful tool for drug discovery. Over the past 30 years, most computational docking methods have treated the receptor structure as a rigid body, although flexible docking often yields higher accuracy. The main disadvantage of flexible docking is its significantly higher computational cost. Because different protein pocket residues exhibit different degrees of flexibility, semi-flexible docking methods, which balance rigid and flexible docking, have succeeded in predicting highly accurate conformations at a relatively low computational cost.
Methods: In our study, the number of flexible pocket residues was assessed by quantitative analysis, and a novel adaptive residue flexibility prediction method, named A-RFP, was proposed to improve docking performance. Based on homologous information, a joint strategy predicts pocket residue flexibility by combining RMSD, the distance between the residue sidechain and the ligand, and the sidechain orientation. For each receptor-ligand pair, A-RFP provides the docking conformation with the optimal affinity.
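The final selection step described above can be illustrated with a minimal sketch. It does not reproduce the A-RFP flexibility prediction itself (RMSD, sidechain distance, and orientation are not modeled here); it only shows how one might pick, for a single receptor-ligand pair, the docking run with the best affinity among runs that used different numbers of flexible residues. The function name, the residue counts, and the scores are hypothetical, and the sketch assumes affinities in kcal/mol where a more negative value is better.

```python
from typing import Dict, Tuple


def pick_best_setting(affinity_by_n_flex: Dict[int, float]) -> Tuple[int, float]:
    """Return (number of flexible residues, affinity) for the best docking run.

    Assumes affinities are in kcal/mol, so the lowest value is the best.
    Ties are broken in favour of fewer flexible residues (cheaper docking).
    """
    if not affinity_by_n_flex:
        raise ValueError("no docking results supplied")
    n_best = min(affinity_by_n_flex, key=lambda n: (affinity_by_n_flex[n], n))
    return n_best, affinity_by_n_flex[n_best]


# Hypothetical scores for one receptor-ligand pair, keyed by the number
# of pocket residues treated as flexible in that run.
scores = {0: -7.1, 2: -7.9, 4: -8.4, 6: -8.1, 10: -8.4}
best_n, best_affinity = pick_best_setting(scores)
print(best_n, best_affinity)  # 4 -8.4
```

The tie-break toward fewer flexible residues is a design choice of this sketch, motivated by the higher computational cost of flexible docking noted in the abstract.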
Results: By analyzing the docking affinities of 3507 target-ligand pairs at 5 different numbers of flexible residues ranging from 0 to 10, we found a general trend that a larger number of flexible residues improves the docking results obtained with AutoDock Vina, although a certain number of counterexamples exist. To validate the effectiveness of A-RFP, we ran a small-scale virtual screening on 5 proteins, which confirmed that A-RFP could enhance docking performance. In addition, flexible-receptor virtual screening on a low-similarity dataset of 85 receptors validated the accuracy of the comprehensive residue flexibility evaluation. Moreover, we studied three receptors with FDA-approved drugs, which further showed that A-RFP can play a useful role in ligand discovery.
Conclusion: Our analysis confirms that screening performance for different numbers of flexible residues varies widely across receptors, which suggests that a fine-grained docking method would offset this deficiency. We therefore presented A-RFP, an adaptive pocket residue flexibility prediction method based on homologous information. When computational resources and time costs are not a concern, A-RFP provides the optimal docking result.
908-918
STNMDA: A Novel Model for Predicting Potential Microbe-Drug Associations with Structure-Aware Transformer
Abstract
Introduction: Microbes are intimately involved in the physiological and pathological processes of numerous diseases. There is a critical need for new drugs to combat microbe-induced diseases in clinical settings. Predicting potential microbe-drug associations is therefore essential for both disease treatment and novel drug discovery. However, verifying these relationships through traditional wet-lab approaches is costly and time-consuming.
Methods: We proposed an efficient computational model, STNMDA, that integrated a Structure-Aware Transformer (SAT) with a Deep Neural Network (DNN) classifier to infer latent microbe-drug associations. STNMDA began with a "random walk with restart" approach to construct a heterogeneous network using Gaussian kernel similarity and functional similarity measures for microorganisms and drugs. This heterogeneous network was then fed into the SAT to extract attribute features and graph structures for each drug and microbe node. Finally, the DNN classifier calculated the probability of associations between microbes and drugs.
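For readers unfamiliar with random walk with restart, the following is a generic, minimal sketch of the technique on a toy graph. It is not the STNMDA implementation: the column normalization, the restart probability of 0.3, the convergence tolerance, and the four-node path graph are all assumptions chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def column_normalize(adj: Matrix) -> Matrix:
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(
    adj: Matrix,
    seed: int,
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing."""
    n = len(adj)
    w = column_normalize(adj)
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy undirected path graph 0 - 1 - 2 - 3 (hypothetical data).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
proximity = random_walk_with_restart(adjacency, seed=0)
print([round(x, 4) for x in proximity])
```

The resulting vector scores every node by its proximity to the seed node; nodes further along the path receive lower scores.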
Results: Extensive experiments showed that STNMDA outperformed existing state-of-the-art models on the MDAD and aBiofilm databases. In addition, case validations demonstrated the feasibility of STNMDA in confirming associations between microbes and drugs.
Conclusion: STNMDA therefore showed promise as a valuable tool for future prediction of microbe-drug associations.
919-932
Genotype and Phenotype Association Analysis Based on Multi-omics Statistical Data
Abstract
Background: Multi-omics analysis of clinical data faces several problems: too few omics data types and relatively small sample sizes, caused by patient privacy protection and institutional data management requirements, together with the relatively large number of features in each omics data type. This paper describes the analysis of multi-omics pathway relationships using statistical data in the absence of clinical data.
Methods: We proposed a novel approach that exploits easily accessible statistics in public databases. The approach introduces phenotypic associations that are not included in the clinical data and uses them to build a three-layer heterogeneous network. To simplify the analysis, we decomposed the three-layer network into two two-layer networks to predict the weights of the inter-layer associations. The weights of the two networks were merged using a hyperparameter β, and k-fold cross-validation was then used to evaluate the accuracy of the method. To calculate the weights of the two-layer networks, random walk with restart (RWR) with a fixed restart probability was combined with PBMDA and CIPHER to produce PCRWR, which has biased weights and improved accuracy.
Results: The area under the receiver operating characteristic curve increased by approximately 7% in the case of the RWR with initial weights.
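As a reminder of what the reported metric measures, here is a minimal sketch of the area under the ROC curve computed through its pairwise-ranking interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name and the labels and scores are hypothetical and unrelated to the paper's data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for five candidate associations.
example_labels = [1, 1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(round(roc_auc(example_labels, example_scores), 4))  # 0.8333
```

This quadratic-time form is intended for clarity only; sorting-based implementations scale better on large datasets.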
Conclusion: Multi-omics statistical data were used to establish genotype and phenotype correlation networks for analysis, with an effect similar to that of clinical multi-omics analysis.
933-942
Enhancing Drug-Target Binding Affinity Prediction through Deep Learning and Protein Secondary Structure Integration
Abstract
Background: Conventional approaches to drug discovery are often lengthy and costly. To expedite the discovery of new drugs, the use of artificial intelligence (AI) to predict drug-target binding affinity (DTA) has emerged as a crucial approach. Despite the proliferation of deep learning methods for DTA prediction, many of these methods concentrate primarily on the amino acid sequence of proteins. Yet the interactions between drug compounds and targets occur within distinct segments of the protein structure, whereas the primary sequence mainly captures global protein features. Consequently, the sequence alone falls short of fully elucidating the intricate relationship between drugs and their respective targets.
Objective: This study aims to employ advanced deep learning techniques to predict DTA while incorporating information about the secondary structure of proteins.
Methods: Both the primary sequence and the secondary structure of the protein were used for protein representation: the primary sequence served as the global feature, and the secondary structure as the localized feature. Convolutional neural networks and graph neural networks were used to independently model the features of target proteins and drug compounds. This approach allowed drug-target interactions to be captured more effectively.
Results: We introduced a novel method for predicting DTA. Compared with DeepDTA, our approach achieves a 3.9% increase in the Concordance Index (CI) and a 34% reduction in Mean Squared Error (MSE) when evaluated on the KIBA dataset.
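The two metrics named in these results can be sketched as follows. This is a plain reference implementation of the usual definitions, not the evaluation code used in the paper: the Concordance Index is taken here as the fraction of pairs with different true affinities whose predictions are ordered the same way (prediction ties count one half), and the affinity values are hypothetical.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true affinity are not comparable
            comparable += 1
            d_true = y_true[i] - y_true[j]
            d_pred = y_pred[i] - y_pred[j]
            if d_pred == 0:
                concordant += 0.5  # tied prediction counts as half
            elif (d_true > 0) == (d_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical true and predicted binding affinities.
true_affinity = [5.0, 6.2, 7.1, 8.4]
pred_affinity = [5.3, 6.0, 7.5, 7.2]
print(round(concordance_index(true_affinity, pred_affinity), 4))  # 0.8333
print(round(mean_squared_error(true_affinity, pred_affinity), 4))  # 0.4325
```

A higher CI and a lower MSE both indicate better predictions, which is why the abstract reports an increase in one and a reduction in the other.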
Conclusion: Our results demonstrate that augmenting DTA prediction with the protein's secondary structure as a localized feature yields significantly improved accuracy compared with relying solely on the primary structure.
943-952
Sia-m7G: Predicting m7G Sites through the Siamese Neural Network with an Attention Mechanism
Abstract
Background: The chemical modification of RNA plays a crucial role in many biological processes. N7-methylguanosine (m7G), one of the most important epigenetic modifications, plays an important role in gene expression, processing metabolism, and protein synthesis. Detecting the exact location of m7G sites in the transcriptome is key to understanding their mechanism in gene expression. On the basis of experimentally validated data, several machine learning and deep learning tools have been designed to identify internal m7G sites and have shown advantages over traditional experimental methods in terms of speed, cost-effectiveness, and robustness.
Aims: In this study, we aim to develop a computational model to help predict the exact location of m7G sites in humans.
Objective: Simple and advanced encoding methods and deep learning networks are designed to achieve excellent m7G prediction efficiently.
Methods: Three types of feature extraction and six classification algorithms were tested to identify m7G sites. Our final model, named Sia-m7G, adopts one-hot encoding and a carefully designed Siamese neural network with an attention mechanism. In addition, multiple 10-fold cross-validation tests were conducted to evaluate our predictor.
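One-hot encoding of a nucleotide sequence, the input representation mentioned above, can be sketched in a few lines. The alphabet order (A, C, G, U), the mapping of T to U, and the all-zero row for unknown symbols are assumptions of this sketch and may differ from the encoding used by Sia-m7G.

```python
from typing import List

NUCLEOTIDES = "ACGU"  # assumed column order of the encoding


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode an RNA sequence as one row of four 0/1 values per nucleotide.

    DNA-style 'T' is treated as 'U'; any other symbol (for example 'N')
    becomes an all-zero row.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper().replace("T", "U"):
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


# Hypothetical four-nucleotide fragment.
for row in one_hot_encode("GAUN"):
    print(row)
```

Each sequence of length L becomes an L x 4 matrix, which is a shape that convolutional or attention-based networks can consume directly.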
Results: Sia-m7G achieved the highest sensitivity, specificity, and accuracy on 10-fold cross-validation tests compared with the other six m7G predictors. Nucleotide preference and model visualization analyses were conducted to strengthen the interpretability of Sia-m7G and provide a further understanding of m7G site fragments in genomic sequences.
Conclusion: Sia-m7G has significant advantages over other classifiers and predictors, which demonstrates the strength of the Siamese neural network algorithm in identifying m7G sites.
953-962
Integrated Machine Learning Algorithms for Stratification of Patients with Bladder Cancer
Abstract
Background: Bladder cancer is a prevalent malignancy globally, characterized by rising incidence and mortality rates. Stratifying bladder cancer patients into different subtypes is crucial for effective treatment. Therefore, there is a need to develop a stratification model specific to bladder cancer.
Purpose: This study aims to establish a prognostic prediction model for bladder cancer, with the primary goal of accurately predicting prognosis and treatment outcomes.
Methods: We collected datasets from 10 bladder cancer samples sourced from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases and the IMvigor210 dataset. Machine learning-based algorithms were used to generate 96 models for establishing a risk score for each patient. Based on the risk score, all patients were classified into two risk groups.
Results: The two groups of bladder cancer patients exhibited significant differences in prognosis, biological functions, and drug sensitivity. A nomogram model demonstrated that the risk score had a robust predictive effect with good clinical utility.
Conclusion: The risk score constructed in this study can be used to predict the prognosis of bladder cancer patients and their response to drug treatment and immunotherapy, supporting personalized clinical treatment of bladder cancer.
963-976
CFCN: An HLA-peptide Prediction Model based on Taylor Extension Theory and Multi-view Learning
Abstract
Background: With the rapid development of biotechnology, many approaches to cancer treatment have been proposed. In recent years, neo-peptide-based methods have made significant contributions; an essential prerequisite for them is binding between peptides and HLA molecules. However, this binding is hard to predict, and prediction accuracy still needs to improve.
Methods: We therefore propose the Crossed Feature Correction Network (CFCN), a deep learning method that automatically extracts and adaptively learns the discriminative features in HLA-peptide binding in order to make more accurate predictions on HLA-peptide binding tasks. Its encoding and feature extraction process for peptides, together with feature fusion between the fine-grained and coarse-grained levels, gives it several advantages on the given tasks.
Results: Experiments show that CFCN achieves better overall performance than other advanced models in many aspects.
Conclusion: We also consider using multi-view learning methods in the feature fusion process to uncover further relations among binding features. Finally, we packaged our model as a tool for further research on binding tasks.
977-990