Maximum entropy techniques for exploiting syntactic, semantic and collocational dependencies in language modeling
| Published in | Computer speech & language Vol. 14; no. 4; pp. 355 - 372 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Oxford; Elsevier Ltd; 01.10.2000; Elsevier Academic Press |
| Subjects | |
| ISSN | 0885-2308 1095-8363 |
| DOI | 10.1006/csla.2000.0149 |
| Summary: | A new statistical language model is presented which combines collocational dependencies with two important sources of long-range statistical dependence: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy technique. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. A detailed analysis of the performance of this language model is provided in order to characterize the manner in which it performs better than a standard N-gram model. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies, on the other hand, are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods individually enhance an N-gram model in complementary ways and that the overall improvement from their combination is nearly additive. |
|---|---|
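
The summary describes combining N-gram, topic, and syntactic constraints in a single maximum entropy model and evaluating it by perplexity. The sketch below illustrates only the general form of a conditional maximum entropy (log-linear) model with those three kinds of features; it is not the authors' implementation. The vocabulary, the feature templates, the function names (`features`, `maxent_prob`, `perplexity`), and the hand-set weights are all hypothetical, and a real model would estimate the weights from training data rather than set them by hand.

```python
import math

# Hypothetical toy vocabulary, chosen for illustration only.
VOCAB = ["stocks", "fell", "rose", "the"]


def features(word, prev_word, topic, head_word):
    """Return the binary features active for one (history, word) pair.

    The three templates are assumptions that mirror the three kinds of
    dependency named in the summary.
    """
    return [
        ("bigram", prev_word, word),  # collocational (N-gram) constraint
        ("topic", topic, word),       # topic constraint
        ("head", head_word, word),    # syntactic constraint (a preceding head word)
    ]


def maxent_prob(word, prev_word, topic, head_word, weights):
    """P(word | history) = exp(sum of active feature weights) / Z(history)."""
    def score(w):
        active = features(w, prev_word, topic, head_word)
        return sum(weights.get(f, 0.0) for f in active)

    scores = {w: score(w) for w in VOCAB}
    # Z(history) normalizes over the whole vocabulary.
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return math.exp(scores[word] - log_z)


def perplexity(events, weights):
    """Perplexity over (word, prev_word, topic, head_word) events."""
    log_sum = sum(math.log(maxent_prob(*e, weights)) for e in events)
    return math.exp(-log_sum / len(events))


# Hand-set weights, purely illustrative (not trained, not from the paper).
WEIGHTS = {
    ("bigram", "the", "stocks"): 1.0,
    ("topic", "finance", "stocks"): 0.5,
    ("head", "stocks", "fell"): 1.2,
}

if __name__ == "__main__":
    # The previous word "yesterday" gives no bigram evidence, but the
    # syntactic head "stocks" still raises the probability of "fell".
    print(maxent_prob("fell", "yesterday", "finance", "stocks", WEIGHTS))
```

With an empty weight table every word is equally likely, so the perplexity equals the vocabulary size. Each non-zero weight then shifts probability toward words supported by the corresponding dependency, which is how the separate information sources are combined in one model.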