n-Gram Models
| Published in | Markov Models for Pattern Recognition, pp. 107–127 |
|---|---|
| Format | Book Chapter |
| Language | English |
| Published | London: Springer London, 2014 |
| Series | Advances in Computer Vision and Pattern Recognition |
| ISBN | 1447163079; 9781447163077 |
| ISSN | 2191-6586; 2191-6594 |
| DOI | 10.1007/978-1-4471-6308-4_6 |
Summary: A statistical language model defines a probability distribution over symbol sequences drawn from a finite inventory. An especially simple yet powerful formalism for describing statistical language models is their representation as Markov chains, so-called n-gram models. A statistical n-gram model corresponds to a Markov chain of order n−1: the probability of a symbol sequence is decomposed into a product of conditional probabilities, with the context limited to the n−1 predecessor symbols.

For the construction of statistical language models, one could compute a model directly from relative frequencies of events estimated from sample data. In practice, however, it must be assumed that the majority of the theoretically possible events do not occur in the sample set considered, so their estimated probabilities vanish. The empirically determined probability distributions of n-gram models are therefore always subjected to a post-processing or smoothing step, which aims to deliver robust estimates, especially for very small conditional probabilities.
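The chain-rule decomposition described in the summary can be sketched for the bigram case (n = 2, i.e. a first-order Markov chain), where the probability of a sequence factors into conditional probabilities that each condition on a single predecessor symbol. The probability tables and the `<s>` start symbol below are illustrative assumptions, not values from the chapter:

```python
from math import prod

# Illustrative bigram probabilities P(w | v); any pair not listed is
# treated as having probability zero (the sparse-data problem the
# chapter's smoothing discussion addresses).
bigram = {
    ("<s>", "the"): 0.6,
    ("the", "cat"): 0.3,
    ("cat", "sat"): 0.4,
}

def sequence_probability(words):
    """P(w_1 .. w_T) = prod_i P(w_i | w_{i-1}), padded with a start symbol."""
    padded = ["<s>"] + list(words)
    return prod(bigram.get((padded[i - 1], padded[i]), 0.0)
                for i in range(1, len(padded)))

p = sequence_probability(["the", "cat", "sat"])
# 0.6 * 0.3 * 0.4 = 0.072
```

Limiting the context to n−1 predecessors keeps the number of parameters manageable; a trigram model would simply condition each factor on the two preceding symbols instead of one.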
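The smoothing step mentioned in the summary can be illustrated with additive (Laplace) smoothing, one common way to keep unseen events from receiving zero probability. The chapter covers smoothing in general; the corpus, `delta`, and function names here are assumptions for the sketch:

```python
from collections import Counter

def laplace_bigram(corpus, vocab, delta=1.0):
    """Bigram probabilities with additive smoothing:
    P(w | v) = (count(v, w) + delta) / (count(v) + delta * |V|).
    Unseen pairs receive a small but nonzero probability."""
    pair_counts = Counter(zip(corpus, corpus[1:]))
    ctx_counts = Counter(corpus[:-1])
    vocab_size = len(vocab)

    def prob(v, w):
        return (pair_counts[(v, w)] + delta) / (ctx_counts[v] + delta * vocab_size)

    return prob

corpus = ["a", "b", "a", "b", "a", "c"]
prob = laplace_bigram(corpus, vocab={"a", "b", "c"})
# Seen pair ("a", "b"):   (2 + 1) / (3 + 3) = 0.5
# Unseen pair ("b", "c"): (0 + 1) / (2 + 3) = 0.2, no longer zero
```

Adding the same pseudo-count `delta` to every event redistributes probability mass from observed to unobserved n-grams, which is exactly the robustness for small conditional probabilities that the post-processing step is meant to provide.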