n-Gram Models

Bibliographic Details
Published in: Markov Models for Pattern Recognition, pp. 107–127
Main Author: Fink, Gernot A.
Format: Book Chapter
Language: English
Published: London: Springer London, 2014
Series: Advances in Computer Vision and Pattern Recognition
ISBN: 1447163079, 9781447163077
ISSN: 2191-6586, 2191-6594
DOI: 10.1007/978-1-4471-6308-4_6

Summary: A statistical language model defines a probability distribution over sequences of symbols drawn from some finite inventory. An especially simple yet powerful formalism for statistical language models is their representation as Markov chains, the so-called n-gram models. A statistical n-gram model corresponds to a Markov chain of order n−1: the probability of a symbol sequence is decomposed into a product of conditional probabilities, with the context limited to the n−1 preceding symbols. In principle, such a model can be computed directly from the relative frequencies of events estimated on some sample data. In practice, however, the majority of the theoretically possible events will not occur in the sample at all, so their probability estimates vanish. The empirically determined probability distributions of n-gram models are therefore always subjected to a post-processing or smoothing operation, which aims to deliver robust estimates, especially for very small conditional probabilities.
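The ideas in the summary can be illustrated with a minimal sketch (not taken from the chapter itself): a bigram model (n = 2, i.e. a Markov chain of order 1) estimated from relative frequencies, with add-one (Laplace) smoothing standing in for the more refined smoothing methods the chapter discusses. All function and variable names here are illustrative assumptions.

```python
from collections import Counter

def train_bigram_model(corpus):
    """Count unigrams and bigrams in a list of tokenized sentences.

    A bigram model (n = 2) is a Markov chain of order n - 1 = 1:
    each symbol is conditioned on exactly one predecessor.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]  # sentence boundary symbols
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Conditional probability P(w | w_prev) with add-one (Laplace)
    smoothing, so that bigrams unseen in the sample do not receive
    a vanishing probability estimate."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def sequence_prob(sentence, unigrams, bigrams, vocab_size):
    """Decompose P(sentence) into a product of conditional
    probabilities, limiting the context to the single predecessor."""
    tokens = ["<s>"] + sentence + ["</s>"]
    p = 1.0
    for w_prev, w in zip(tokens, tokens[1:]):
        p *= bigram_prob(w_prev, w, unigrams, bigrams, vocab_size)
    return p
```

With add-one smoothing, a bigram such as ("cat", "dog") that never occurs in the training sample still receives a small positive probability, at the cost of slightly discounting the events that were observed; practical systems replace this crude scheme with the smoothing techniques treated in the chapter.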