Latent Variable Grammars for Natural Language Parsing

Bibliographic Details
Published in: Coarse-to-Fine Natural Language Processing, pp. 7-46
Main Author: Petrov, Slav
Format: Book Chapter
Language: English
Published: Berlin, Heidelberg: Springer Berlin Heidelberg, 08.09.2011
Series: Theory and Applications of Natural Language Processing
ISBN: 3642227422, 9783642227424
ISSN: 2192-032X, 2192-0338
DOI: 10.1007/978-3-642-22743-1_2

Summary: As described in Chap. 1, parsing is the process of analyzing the syntactic structure of natural language sentences and will be fundamental for building systems that can understand natural languages. Probabilistic context-free grammars (PCFGs) underlie most high-performance parsers in one way or another (Charniak 2000; Collins 1999; Charniak and Johnson 2005; Huang 2008). However, as demonstrated by Charniak (1996) and Klein and Manning (2003a), a PCFG which simply takes the empirical rules and probabilities off of a treebank does not perform well. This naive grammar is a poor one because its context-freedom assumptions are too strong in some places (e.g., it assumes that subject and object NPs share the same distribution) and too weak in others (e.g., it assumes that long rewrites are not decomposable into smaller steps). Therefore, a variety of techniques have been developed to both enrich and generalize the naive grammar, ranging from simple tree annotation and category splitting (Johnson 1998; Klein and Manning 2003a) to full lexicalization and intricate smoothing (Collins 1999; Charniak 2000).
Bibliography: The material in this chapter was originally presented in Petrov et al. (2006) and Petrov and Klein (2007).