Comprehensive Repertoire of Foldable Regions within Whole Genomes

Bibliographic Details
Published in PLoS computational biology Vol. 9; no. 10; p. e1003280
Main Authors Faure, Guilhem; Callebaut, Isabelle
Format Journal Article
Language English
Published United States: Public Library of Science (PLoS), 01.10.2013
ISSN 1553-7358, 1553-734X
DOI 10.1371/journal.pcbi.1003280

Summary: In order to get a comprehensive repertoire of foldable domains within whole proteomes, including orphan domains, we developed a novel procedure, called SEG-HCA. From only the information of a single amino acid sequence, SEG-HCA automatically delineates segments possessing high densities in hydrophobic clusters, as defined by Hydrophobic Cluster Analysis (HCA). These hydrophobic clusters mainly correspond to regular secondary structures, which together form structured or foldable regions. Genome-wide analyses revealed that SEG-HCA is opposite of disorder predictors, both addressing distinct structural states. Interestingly, there is however an overlap between the two predictions, including small segments of disordered sequences, which undergo coupled folding and binding. SEG-HCA thus gives access to these specific domains, which are generally poorly represented in domain databases. Comparison of the whole set of SEG-HCA predictions with the Conserved Domain Database (CDD) also highlighted a wide proportion of predicted large (length >50 amino acids) segments, which are CDD orphan. These orphan sequences may either correspond to highly divergent members of already known families or belong to new families of domains. Their comprehensive description thus opens new avenues to investigate new functional and/or structural features, which remained so far uncovered. Altogether, the data described here provide new insights into the protein architecture and organization throughout the three kingdoms of life.
PMCID: PMC3812050
Conceived and designed the experiments: GF IC. Performed the experiments: GF. Analyzed the data: GF IC. Contributed reagents/materials/analysis tools: GF. Wrote the paper: GF IC.
The authors have declared that no competing interests exist.