Maximum-Parsimony Haplotype Inference Based on Sparse Representations of Genotypes

Bibliographic Details
Published in	IEEE Transactions on Signal Processing, Vol. 60, no. 4, pp. 2013-2023
Main Authors	Jajamovich, G. H.; Wang, Xiaodong
Format	Journal Article
Language	English
Published	New York, NY: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.04.2012
Subjects	Applied sciences; Biological cells; Clustering algorithms; Detection, estimation, filtering, equalization, prediction; Dictionaries; Dimensionality; Disease; Entropy; Exact sciences and technology; Exact solutions; Formulations; Haplotype inference; Haplotypes; Inference; Inference algorithms; Information, signal and communications theory; Mathematical analysis; Maximization; Maximum parsimony principle; NP-hard problem; Parsimony analysis; Performance evaluation; Representations; Selection problem; Signal and communications theory; Signal, noise; Signal processing; Sparse dictionary; Sparse representations; State of the art; Statistical analysis; Studies; Telecommunications and information theory; Vectors; Vectors (mathematics)
ISSN	1053-587X 1941-0476
DOI	10.1109/TSP.2011.2179542

Summary:	The haplotypes of an individual can be used to predict diseases and help design drugs. However, experimentally determining haplotypes is expensive and time-consuming, so genotypes are usually measured instead. Given the set of genotypes for a group of unrelated individuals, it is possible to infer the haplotype pair for each subject based on the maximum parsimony principle. Finding the exact solution to this problem is NP-hard. We propose two related formulations of the haplotype inference problem that translate the maximum parsimony principle into the sparse representation of genotypes. In the first formulation, we look for the set of haplotypes that explains the genotypes such that the resulting frequency vector of haplotypes is as sparse as possible. The sparseness condition is achieved by minimizing the Tsallis entropy of the frequency vector, which is still an NP-hard problem. We propose a method that enumerates all local minima with high probability by solving a set of integer linear programs of low dimensionality. The minimizer is then found by identifying the local minimum that achieves the lowest Tsallis entropy. In the second formulation, we state haplotype inference as a sparse dictionary selection problem. Each genotype is reconstructed by a haplotype pair selected from a set of available haplotypes that needs to be sparse. This leads to an approximately submodular maximization problem, which can therefore be solved with a fast greedy method. We test the proposed solutions on different data sets and compare their performance with state-of-the-art methods, achieving similar or better results.
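
The first formulation in the summary can be illustrated on a toy scale. The sketch below is not the authors' method: it replaces their integer-linear-program enumeration with a brute-force search that is only feasible for a handful of short genotypes. It assumes a common encoding that the record itself does not state (haplotypes as 0/1 tuples, genotypes as their site-wise sum with values 0, 1, 2) and an arbitrary entropy index q = 0.5. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def tsallis_entropy(freqs, q=0.5):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i**q) / (q - 1), defined for q != 1."""
    if q == 1:
        raise ValueError("q must differ from 1")
    return (1.0 - sum(p ** q for p in freqs if p > 0)) / (q - 1.0)


def compatible_pairs(genotype):
    """List the unordered haplotype pairs (h1, h2) whose site-wise sum is `genotype`.

    Assumed encoding: 0 and 2 are homozygous sites, 1 is a heterozygous site.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites; heterozygous ones are set below
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, bit in zip(het_sites, bits):
            h1[site], h2[site] = bit, 1 - bit
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def haplotype_frequencies(resolution):
    """Relative frequency of each distinct haplotype used across all chosen pairs."""
    counts = Counter(h for pair in resolution for h in pair)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def min_entropy_resolution(genotypes, q=0.5):
    """Exhaustive search (toy sizes only) for the resolution with the lowest Tsallis entropy."""
    candidates = product(*(compatible_pairs(g) for g in genotypes))
    return min(candidates, key=lambda r: tsallis_entropy(haplotype_frequencies(r), q))


if __name__ == "__main__":
    # Three individuals, two sites each; only the first genotype is ambiguous.
    genotypes = [(1, 1), (0, 0), (2, 2)]
    for genotype, pair in zip(genotypes, min_entropy_resolution(genotypes)):
        print(genotype, "->", pair)
```

With q below 1, the entropy grows with the number of distinct haplotypes in use, so the search prefers resolutions that reuse haplotypes already needed by other individuals, which is the parsimony intuition the summary describes.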