Computational Linguistics and Intelligent Text Processing : 8th International Conference, CICLing 2007, Mexico City, Mexico, February 18-24, 2007 : proceedings

This book constitutes the refereed proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2007, held in Mexico City, Mexico in February 2007. The 53 revised full papers presented together with 3 invited papers were carefully reviewed and...

Bibliographic Details
Main Author	Gelbukh, Alexander
Format	eBook Book
Language	English
Published	Berlin : Springer Berlin / Heidelberg, 2007
Edition	1
Subjects	Computational linguistics; Computational linguistics-Congresses; Congresses; Natural language processing (Computer science); Text processing (Computer science)-Congresses
ISBN	9783540709381 354070938X

Table of Contents:

Part-of-Speech Tagging Using Word Probability Based on Category Patterns -- Introduction -- N-Gram POS-Tagging Models and Korean Language Characteristics -- Word N-Gram Is Not of Practical Use for Korean -- Morphotactic Constraints Within a Korean Word and Previous Alternatives -- Korean POS-Tagging Using Word Probability Based on Category Patterns -- Application of Category-Pattern-Based Model to Bayesian Models for POS-Tagging -- Parameter Training and POS Assigning -- Experimentation and Application -- Conclusions and Further Work -- References -- Handling Conjunctions in Named Entities -- Introduction -- Problem Description -- Related Work -- Experimental Setup -- Corpus and Data Preparation -- The Tag Set -- Encoding -- The Algorithms -- Baseline -- Classifiers -- Results -- Evaluation Scheme -- Classification Results -- Analysis -- Conjunction Category Indicators -- Error Analysis -- Conclusions and Future Work -- ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy -- Introduction -- Named Entity Recognition in Arabic -- The Maximum Entropy Approach -- The Developed Resources -- ANERcorp: Two Corpora for Training and Test -- ANERgazet: Integrating Web-Based Gazetteers -- Experiments and Results -- Conclusions and Future Works -- References -- Applying Machine Learning to Chinese Entity Detection and Tracking -- Introduction -- Related Work -- Mention Boundary Detection -- Character-Based Model Combined with Word Knowledge -- Boundary Detection with Conditional Random Fields -- Character-Based N-Gram Features and Wordlist-Based Features -- Head and Extent Combination -- Entity Attribute Identification -- Coreference Resolution -- Experimental Results -- Mention Boundary Detection Results -- Entity Attribute Identification Results -- Entity Tracking Results -- Conclusion -- References
Text Categorization for Improved Priors of Word Meaning -- Introduction -- Finding Predominant Senses -- Creating the Domain Corpora -- The GigaWord Corpus -- The Classifier -- The Domain Corpora -- Domain Rankings -- Experiments and Evaluation -- Hand-Labelled Versus Automatically Classified -- Senseval -- Domain Salient Words -- Discussion and Future Research -- Case-Sensitivity of Classifiers for WSD: Complex Systems Disambiguate Tough Words Better -- Introduction -- Prediction Factors -- Case Factors -- System Factors -- Optimal Ensembling Method -- Evaluation -- Test Setting -- Base System Complexity vs Tough / Easy Words -- Base System Complexity vs Best Optimal Ensembles -- Discussion -- Conclusions and Future Work -- References -- Word Clustering for Collocation-Based Word Sense Disambiguation -- Introduction -- Related Work -- The Yarowsky Algorithm -- Word Clustering -- Extending the Collocation List -- Experiment -- Data Set -- Experimental Setup -- Experiment Results -- Discussion of Results -- Relationship Between F-Measure with Word-Class and Corpus -- Error Analysis -- Conclusion and the Future Work -- References -- Lexical Constellations and the Structure of Meaning: A Prototype Application to WSD -- Introduction -- Meaning by Constellation and WSD -- DFA and WSD -- The Algorithm in Action -- Some Considerations and Conclusions -- References -- Rule-Based Protein Term Identification with Help from Automatic Species Tagging -- Introduction -- Related Work -- Data and Ontology -- Hybrid Approachs to TI -- Assigning Potential CM Identifiers to Protein Mentions -- Term Disambiguation -- Results -- Conclusions -- Unsupervised Discrimination of Person Names in Web Contexts -- Introduction -- Lexical Features -- Second Order Context Representation -- Cluster Stopping -- Experimental Data -- Evaluation -- Experimental Results
Discussion and Conclusions
Intro -- Title -- Preface -- Organization -- Table of Contents -- Integration of Linguistic Resources for Verb Classification: FrameNet Frame, WordNet Verb and Suggested Upper Merged Ontology -- Introduction -- Background and Motivation -- Extension of FrameNet Verb Coverage -- Direct Retrieval - WordNet Synset -- WordNet Relation Links and Frame as Domain -- Affinity of Candidate Synsets with Domain Frame -- Linking FrameNet Frame with SUMO Concept -- Data Evaluation -- Evaluation Result -- Conclusion -- References -- French EuroWordNet Lexical Database Improvements -- Introduction -- EuroWordNet: Presentation and Limits -- Improvement Made to the Relationships -- An Usable Database -- Update of the Semantic Relationships -- Inserting Definitions into EuroWordNet Thesaurus -- Wikipedia -- Definition Extraction in French Language -- General Process -- Results Analysis -- Conclusion -- Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language -- Introduction -- The Method -- Translating the English OpenMind Using a Commercial MT Software -- Translating the English ConceptNet with Heuristic Translation Rules -- Combining Two Translation Results -- Manual Evaluations of E-K Translated Concepts -- Concluding Remarks and Future Work -- References -- Conquering Language: Using NLP on a Massive Scale to Build High Dimensional Language Models from the Web -- Introduction -- Estimating Language Presence -- Extracting Language Modeling Information -- Fetching Web Pages and Character Encoding -- Extracting Text from Web Pages -- Language Identification and Web Crawling -- Natural Language Processing -- Processing Output and Example -- Possible Applications -- Related Research and Conclusion -- References -- On Heads and Coordination in Valence Acquisition -- Introduction -- Motivation and Outline
The IPI PAN Corpus -- Poliqarp -- Distinguishing Syntactic and Semantic Heads -- Coordination -- XML Representation -- Extending the Poliqarp Query Language -- Simple Constructions -- Coordination -- Conclusion -- Chinese Terminology Extraction Using Window-Based Contextual Information -- Introduction -- Related Work -- Algorithm Design -- The Preprocessing Module -- Automatic Term Extraction -- Terminology Verification -- Experiment and Discussion -- Performance of the Two Approaches -- The Hybrid Approach -- Conclusion -- References -- Baby-Steps Towards Building a Spanglish Language Model -- What Is Spanglish? -- Linguistic Features of Spanglish -- Code-Switching -- Borrowing -- Code-Mixing -- Examples of Shallow Phenomena -- Language Models -- Data Collection -- Tools of the Trade -- Test Phase and Results -- SML Test -- UTI Test -- Final Remarks -- Current and Future Work -- Latent Variable Models for Causal Knowledge Acquisition -- Introduction -- Related Work -- Statistical Models for Causal Knowledge Acquisition -- Model Structures -- Model Estimation -- Causality Detection -- Experiments -- Settings -- Results(1): The Effectiveness of Incorporating Dependencies Between Two Events into Causal Models -- Results(2): The Effectiveness of Class Labels -- Examples -- Conclusion -- Finite-State Technology as a Programming Environment -- Introduction -- A Motivating Example -- An Alternative Implementation -- Comparison and Evaluation -- Discussion -- Morphological Disambiguation of Turkish Text with Perceptron Algorithm -- Introduction -- Morphological Disambiguation -- Representation -- Problem Definition -- Methodology -- Baseline Trigram-Based Model -- Perceptron Algorithm -- Experiments -- Data Set -- Features -- Optimal Parameter and Feature Selection -- Results -- Conclusions
Evaluation of an Automatic Extension of Temporal Expression Treatment to Catalan -- Introduction -- History of TERSEO Extensions -- Extension to Catalan -- Evaluation of the New Extension -- Corpus Development -- Results -- Conclusions -- A Generalized Approach to Word Segmentation Using Maximum Length Descending Frequency and Entropy Rate -- Introduction -- Related Work -- Proposed Method -- A Walk-Through Example -- Evaluation and Experimental Results -- Conclusion and Future Work -- References -- Tagging Sentence Boundaries in Biomedical Literature -- Introduction -- Methods -- Special SBD Issues in Biomedical Literature -- Rule-Based Approach for Biomedical Literature SBD -- Results -- Discussion -- References -- Probabilistic Classifications with TBL -- Introduction -- Transformation Based Learning -- Probability Estimation with TBL Classifiers -- The Proposed Method -- Equivalence Class Partitioning -- Smoothing -- Related Work -- Experiments -- Cross Entropy and Perplexity -- Rejection Curve -- Active Learning -- Conclusions -- The Non-associativity of Polarized Tree-Based Grammars -- Introduction -- Existing Polarity Systems -- XMG Colors -- PUGs -- General Polarity Systems -- Conclusion -- Dependency Analysis of Clauses Using Parse Tree Kernels -- Introduction -- Problem Setting -- Clausal Dependency Identification -- Parse Tree Kernels -- Clause Dependency Analysis with Parse Tree Kernels -- Dependency Relation in Korean Clauses -- Clause Representation -- Multi-class Classification -- Experiments -- Conclusions -- Unsupervised Method for Parsing Coordinated Base Noun Phrases -- Introduction -- Motivation -- Previous Work -- Syntactic Parsing -- Web Counts -- Approach and Statistical Modeling -- The Models -- Experimental Setup and Results -- Web Counts -- Evaluation Measures -- Conclusion