Expanding a multilingual media monitoring and information extraction tool to a new language: Swahili
Published in | Language Resources and Evaluation Vol. 45; no. 3; pp. 311 - 330 |
---|---|
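Main Authors | Steinberger, Ralf; Ombuya, Sylvia; Kabadjov, Mijail; Pouliquen, Bruno; Rocca, Leo Della; Belyaeva, Jenya; de Paola, Monica; Ignat, Camelia; van der Goot, Erik |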
Format | Journal Article |
Language | English |
Published | Dordrecht; Springer; 01.09.2011; Springer Netherlands; Springer Nature B.V |
ISSN | 1574-020X 1572-8412 1574-0218 |
DOI | 10.1007/s10579-011-9155-y |
Abstract | The Europe Media Monitor (EMM) family of applications is a set of multilingual tools that gather, cluster and classify news in currently fifty languages and that extract named entities and quotations (reported speech) from twenty languages. In this paper, we describe the recent effort of adding the African Bantu language Swahili to EMM. EMM is designed in an entirely modular way, allowing plugging in a new language by providing the language-specific resources for that language. We thus describe the type of language-specific resources needed, the effort involved, and ways of boot-strapping the generation of these resources in order to keep the effort of adding a new language to a minimum. The text analysis applications pursued in our efforts include clustering, classification, recognition and disambiguation of named entities (persons, organisations and locations), recognition and normalisation of date expressions, as well as the identification of reported speech quotations by and about people. |
---|---|
Issue Title | Special Issue on African Language Technology |
Author | Steinberger, Ralf; Ombuya, Sylvia; Kabadjov, Mijail; Pouliquen, Bruno; Rocca, Leo Della; Belyaeva, Jenya; de Paola, Monica; Ignat, Camelia; van der Goot, Erik |
CODEN | COHUAD |
CitedBy_id | crossref_primary_10_1007_s10579_011_9165_9 crossref_primary_10_3390_app14104320 |
Cites_doi | 10.1162/089120103321337421 10.4314/lex.v19i1.49134 10.1017/S1351324902002930 10.3233/978-1-58603-898-4-295 10.3115/1072133.1072187 10.1109/MIS.2007.11 10.3233/978-1-58603-898-4-217 |
ContentType | Journal Article |
Copyright | 2011 Springer; Springer Science+Business Media B.V. 2011 |
Discipline | Library & Information Science Computer Science |
EISSN | 1572-8412 1574-0218 |
EndPage | 330 |
ExternalDocumentID | 2420210541 10_1007_s10579_011_9155_y 41486045 |
GeographicLocations | Italy; Netherlands; United States--US; Europe |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Keywords | Date recognition; Subject domain classification; Geo-tagging; Quotation recognition; News analysis; Swahili; Information extraction; Multilinguality; Named entity recognition and classification; Media monitoring |
Language | English |
License | http://www.springer.com/tdm |
PageCount | 20 |
PublicationTitle | Language Resources and Evaluation |
PublicationTitleAbbrev | Lang Resources & Evaluation |
StartPage | 311 |
SubjectTerms | African languages; Bantu languages; Classification; Clustering; Computational Linguistics; Computer Applications; Computer Generated Language Analysis; Computer Science; Computer software; Dictionaries; Information management; Information retrieval; Language and Literature; Languages; Legal entities; Linguistics; Media Monitoring; Multilingualism; Names; Nonnative languages; Nouns; Original Paper; Recognition; Reported Speech; Social Sciences; Speech; Speech recognition; Strapping; Swahili; Swahili language; Text Analysis; Text analytics; Texts; Verbs; Words |
URI | https://www.jstor.org/stable/41486045 https://link.springer.com/article/10.1007/s10579-011-9155-y https://www.proquest.com/docview/881698216 https://www.proquest.com/docview/902094322 https://www.proquest.com/docview/914625241 |
Volume | 45 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
openUrl | Steinberger, Ralf; Ombuya, Sylvia; Kabadjov, Mijail; Pouliquen, Bruno. Expanding a multilingual media monitoring and information extraction tool to a new language: Swahili. Language Resources and Evaluation, vol. 45, no. 3, pp. 311-330, 2011-09-01. ISSN 1574-020X; eISSN 1574-0218. DOI 10.1007/s10579-011-9155-y |