Syntax-Based Collocation Extraction

Collocation is a key language phenomenon that crucially affects text production tasks and can be exploited in many text analysis tasks. This book offers a comprehensive and up-to-date review of the theoretical and practical work on this topic.

Bibliographic Details
Main Author	Seretan, Violeta
Format	eBook
Language	English
Published	Dordrecht: Springer Netherlands (Springer Nature), 2011
Edition	1
Series	Text, Speech and Language Technology
Subjects	Collocation (Linguistics); Computational Linguistics; Computer programming, programs, data; Computer Science; Grammar, Comparative and general; Grammar, Comparative and general > Syntax; Natural language processing (Computer science); Natural Language Processing (NLP)
ISBN	9400701349; 9789400701342; 9400701330; 9789400701335
ISSN	1386-291X
DOI	10.1007/978-94-007-0134-2

Table of Contents:

Preface
Contents
1 Introduction
  1.1 Collocations and Their Relevance for NLP
  1.2 The Need for Syntax-Based Collocation Extraction
  1.3 Aims
  1.4 Chapters Outline
2 On Collocations
  2.1 Introduction
  2.2 A Survey of Definitions
    2.2.1 Statistical Approaches
    2.2.2 Linguistic Approaches
    2.2.3 Collocation vs. Co-occurrence
  2.3 Towards a Core Collocation Concept
  2.4 Theoretical Perspectives on Collocations
    2.4.1 Contextualism
    2.4.2 Text Cohesion
    2.4.3 Meaning-Text Theory
    2.4.4 Semantics and Metaphoricity
    2.4.5 Lexis-Grammar Interface
  2.5 Linguistic Descriptions
    2.5.1 Semantic Compositionality
    2.5.2 Morpho-Syntactic Characterisation
  2.6 What Collocation Means in This Book
  2.7 Summary
3 Survey of Extraction Methods
  3.1 Introduction
  3.2 Extraction Techniques
    3.2.1 Collocation Features Modelled
    3.2.2 General Extraction Architecture
    3.2.3 Contingency Tables
    3.2.4 Association Measures
    3.2.5 Criteria for the Application of Association Measures
  3.3 Linguistic Preprocessing
    3.3.1 Lemmatization
    3.3.2 POS Tagging
    3.3.3 Shallow and Deep Parsing
    3.3.4 Beyond Parsing
  3.4 Survey of the State of the Art
    3.4.1 English
    3.4.2 German
    3.4.3 French
    3.4.4 Other Languages
  3.5 Summary
4 Syntax-Based Extraction
  4.1 Introduction
  4.2 The Fips Multilingual Parser
  4.3 Extraction Method
    4.3.1 Candidate Identification
    4.3.2 Candidate Ranking
  4.4 Evaluation
    4.4.1 On Collocation Extraction Evaluation
    4.4.2 Evaluation Method
    4.4.3 Experiment 1: Monolingual Evaluation
    4.4.4 Results of Experiment 1
    4.4.5 Experiment 2: Cross-Lingual Evaluation
    4.4.6 Results of Experiment 2
  4.5 Qualitative Analysis
    4.5.1 Error Analysis
    4.5.2 Intersection and Rank Correlation
    4.5.3 Instance-Level Analysis
  4.6 Discussion
  4.7 Summary
5 Extensions
  5.1 Identification of Complex Collocations
    5.1.1 The Method
    5.1.2 Experimental Results
    5.1.3 Related Work
  5.2 Data-Driven Induction of Syntactic Patterns
    5.2.1 The Method
    5.2.2 Experimental Results
    5.2.3 Related Work
  5.3 Corpus-Based Collocation Translation
    5.3.1 The Method
    5.3.2 Experimental Results
    5.3.3 Related Work
  5.4 Summary
6 Conclusion
  6.1 Main Contributions
  6.2 Future Directions
A List of Collocation Dictionaries
  English
  French
  Italian
  Polish
  Portuguese
  Russian
  Spanish
B List of Collocation Definitions
C Association Measures: Mathematical Notes
  C.1 χ²
  C.2 Log-Likelihood Ratio
D Monolingual Evaluation (Experiment 1)
  D.1 Test Data and Annotations
  D.2 Results
E Cross-Lingual Evaluation (Experiment 2)
  E.1 Test Data and Annotations
  E.2 Results
F Output Comparison
References
Index