Fast protein homology and fold detection with sparse spatial sample kernels


Bibliographic Details
Published in: 2008 19th International Conference on Pattern Recognition, pp. 1-4
Main Authors: Kuksa, P.; Pai-Hsi Huang; Pavlovic, V.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2008
ISBN: 9781424421749, 1424421748
ISSN: 1051-4651
DOI: 10.1109/ICPR.2008.4761450

More Information
Summary: In this work we present a new string similarity feature, the sparse spatial sample (SSS). An SSS is a set of short substrings at specific spatial displacements contained in the original string. Using this feature we induce the SSS kernel (SSSK), which measures the agreement in SSS content between pairs of strings. The SSSK yields better prediction performance than existing algorithms for sequence classification tasks, at substantially reduced computational cost. We show that on the task of predicting the functional and structural classes of proteins, the SSSK achieves state-of-the-art performance across several benchmark sets in both supervised and semi-supervised learning settings. The results have immediate practical value for accurate protein superfamily and fold classification and may be similarly extended to other sequence modeling domains.
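As a rough illustration of the idea in the summary, the sketch below counts pairs of short substrings that occur at a fixed displacement from each other and compares two strings by the dot product of those counts. This is one plausible reading of "a set of short substrings at specific spatial displacements", not the authors' exact definition: the function names `sss_features` and `sss_kernel`, the restriction to pairs of substrings, and the parameters `k` (substring length) and `d_max` (largest displacement between start positions) are all assumptions made for the example.

```python
from collections import Counter


def sss_features(seq, k=1, d_max=3):
    """Count (a, d, b) samples: k-mers a and b whose start positions are d apart.

    Hypothetical feature map for illustration only; for d < k the two
    substrings overlap, which this sketch does not treat specially.
    """
    counts = Counter()
    n = len(seq)
    for i in range(n - k + 1):
        a = seq[i:i + k]
        for d in range(1, d_max + 1):
            j = i + d
            if j + k > n:
                break  # the second substring would run past the end
            counts[(a, d, seq[j:j + k])] += 1
    return counts


def sss_kernel(x, y, k=1, d_max=3):
    """Dot product of the two strings' sample-count vectors."""
    fx = sss_features(x, k, d_max)
    fy = sss_features(y, k, d_max)
    if len(fx) > len(fy):
        fx, fy = fy, fx  # iterate over the smaller feature set
    return sum(count * fy[feature] for feature, count in fx.items())
```

Because only features that actually occur in a string are stored, the cost of a comparison in this sketch grows with the string lengths and `d_max` rather than with the size of the full feature space. A matrix of such kernel values could then be passed to any classifier that accepts a precomputed kernel.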