Relational subgraphs fused with complete subgraphs based on the knowledge graph for mining protein complexes

Bibliographic Details
Published in: Scientific Reports, Vol. 15, no. 1, p. 37767
Main Authors: Zhao, Ruixue; Zhang, Dandan; Kou, Yuantao; Xian, Guojian; Yang, Xiao
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 29.10.2025
ISSN: 2045-2322
DOI: 10.1038/s41598-025-18281-7

Summary: The potential discovery of protein complexes can elucidate the structure of protein-protein interaction networks and identify downstream regulatory genes. Given the complexity of protein-protein interactions, interpretable domain knowledge discovery has gained significant attention. In this study, we constructed a knowledge graph for interacting proteins by gathering data from the UniProt and PlaPPISite databases related to the model plant Arabidopsis thaliana. We developed a relational subgraph-driven protein-protein interaction prediction model based on this knowledge graph to predict interactions within connected subgraphs. Subsequently, complete subgraphs of interacting proteins were extracted, enabling the potential discovery of protein complex structures. The knowledge graph consisted of 68,713 nodes and 109,496 semantic relationships. A total of 1,232 protein-protein interactions were predicted. Comparison with experimentally validated interactions recorded in the STRING and BioGRID databases revealed that 682 of these interactions were confirmed. Based on the predicted interactions, 336 protein complexes were identified by mining the complete subgraphs. The proposed knowledge mining method, which integrates relational subgraphs and complete subgraphs, facilitates the discovery of protein complexes and provides a novel approach for analyzing their structures and identifying downstream genes.
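
The complete-subgraph step described in the summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "complete subgraphs" can be approximated by maximal cliques of at least three proteins, uses a textbook Bron-Kerbosch enumeration with pivoting, and runs on made-up protein identifiers (`P1` to `P6`). The function names and the `min_size` parameter are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def candidate_complexes(edges, min_size=3):
    """Return maximal cliques with at least min_size members (assumed threshold)."""
    adj = build_adjacency(edges)
    return sorted(sorted(c) for c in maximal_cliques(adj) if len(c) >= min_size)


# Hypothetical predicted interactions, for illustration only
predicted_edges = [
    ("P1", "P2"), ("P2", "P3"), ("P1", "P3"),
    ("P3", "P4"), ("P4", "P5"), ("P3", "P5"),
    ("P5", "P6"),
]

for members in candidate_complexes(predicted_edges):
    print(members)
```

On this toy input, the two triangles are reported as candidate complexes, while the lone edge `P5`-`P6` falls below the assumed size threshold. The record does not state whether the published method uses maximal cliques, a size cutoff, or a different enumeration strategy.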