Expanding a multilingual media monitoring and information extraction tool to a new language: Swahili

Bibliographic Details
Published in Language Resources and Evaluation Vol. 45; no. 3; pp. 311 - 330
Main Authors Steinberger, Ralf, Ombuya, Sylvia, Kabadjov, Mijail, Pouliquen, Bruno, Rocca, Leo Della, Belyaeva, Jenya, de Paola, Monica, Ignat, Camelia, van der Goot, Erik
Format Journal Article
Language English
Published Dordrecht Springer 01.09.2011
Springer Netherlands
Springer Nature B.V
ISSN 1574-020X
1572-8412
1574-0218
DOI 10.1007/s10579-011-9155-y

Abstract The Europe Media Monitor (EMM) family of applications is a set of multilingual tools that gather, cluster and classify news in currently fifty languages and that extract named entities and quotations (reported speech) from twenty languages. In this paper, we describe the recent effort of adding the African Bantu language Swahili to EMM. EMM is designed in an entirely modular way, allowing plugging in a new language by providing the language-specific resources for that language. We thus describe the type of language-specific resources needed, the effort involved, and ways of boot-strapping the generation of these resources in order to keep the effort of adding a new language to a minimum. The text analysis applications pursued in our efforts include clustering, classification, recognition and disambiguation of named entities (persons, organisations and locations), recognition and normalisation of date expressions, as well as the identification of reported speech quotations by and about people.
Issue Title: Special Issue on African Language Technology
Author_xml – sequence: 1
  givenname: Ralf
  surname: Steinberger
  fullname: Steinberger, Ralf
– sequence: 2
  givenname: Sylvia
  surname: Ombuya
  fullname: Ombuya, Sylvia
– sequence: 3
  givenname: Mijail
  surname: Kabadjov
  fullname: Kabadjov, Mijail
– sequence: 4
  givenname: Bruno
  surname: Pouliquen
  fullname: Pouliquen, Bruno
– sequence: 5
  givenname: Leo Della
  surname: Rocca
  fullname: Rocca, Leo Della
– sequence: 6
  givenname: Jenya
  surname: Belyaeva
  fullname: Belyaeva, Jenya
– sequence: 7
  givenname: Monica
  surname: de Paola
  fullname: de Paola, Monica
– sequence: 8
  givenname: Camelia
  surname: Ignat
  fullname: Ignat, Camelia
– sequence: 9
  givenname: Erik
  surname: van der Goot
  fullname: van der Goot, Erik
CODEN COHUAD
CitedBy_id crossref_primary_10_1007_s10579_011_9165_9
crossref_primary_10_3390_app14104320
Cites_doi 10.1162/089120103321337421
10.4314/lex.v19i1.49134
10.1017/S1351324902002930
10.3233/978-1-58603-898-4-295
10.3115/1072133.1072187
10.1109/MIS.2007.11
10.3233/978-1-58603-898-4-217
ContentType Journal Article
Copyright 2011 Springer
Springer Science+Business Media B.V. 2011
Discipline Library & Information Science
Computer Science
EISSN 1572-8412
1574-0218
EndPage 330
ExternalDocumentID 2420210541
10_1007_s10579_011_9155_y
41486045
GeographicLocations Italy
Netherlands
United States--US
Europe
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords Date recognition
Subject domain classification
Geo-tagging
Quotation recognition
News analysis
Swahili
Information extraction
Multilinguality
Named entity recognition and classification
Media monitoring
Language English
License http://www.springer.com/tdm
PublicationCentury 2000
PublicationDate 2011-09-01
PublicationDecade 2010
PublicationPlace Dordrecht
PublicationTitle Language Resources and Evaluation
PublicationTitleAbbrev Lang Resources & Evaluation
PublicationYear 2011
Publisher Springer
Springer Netherlands
Springer Nature B.V
StartPage 311
SubjectTerms African languages
Bantu languages
Classification
Clustering
Computational Linguistics
Computer Applications
Computer Generated Language Analysis
Computer Science
Computer software
Dictionaries
Information management
Information retrieval
Language and Literature
Languages
Legal entities
Linguistics
Media
Monitoring
Multilingualism
Names
Nonnative languages
Nouns
Original Paper
Recognition
Reported Speech
Social Sciences
Speech
Speech recognition
Strapping
Swahili
Swahili language
Text Analysis
Text analytics
Texts
Verbs
Words
URI https://www.jstor.org/stable/41486045
https://link.springer.com/article/10.1007/s10579-011-9155-y
https://www.proquest.com/docview/881698216
https://www.proquest.com/docview/902094322
https://www.proquest.com/docview/914625241
Volume 45
hasFullText 1
inHoldings 1
isFullTextHit
isPrint