MCut: A Thresholding Strategy for Multi-label Classification

Bibliographic Details
Published in Advances in Intelligent Data Analysis XI, pp. 172-183
Main Authors Largeron, Christine; Moulin, Christophe; Géry, Mathias
Format Book Chapter
Language English
Published Berlin, Heidelberg: Springer Berlin Heidelberg, 2012
Series Lecture Notes in Computer Science
ISBN 9783642341557, 3642341551
ISSN 0302-9743, 1611-3349
DOI 10.1007/978-3-642-34156-4_17

Summary: Multi-label classification is a frequent task in machine learning, notably in text categorization. When binary classifiers are not suited, an alternative consists of using a multiclass classifier that provides, for each document, a score per category, and then applying a thresholding strategy to select the set of categories to assign to the document. Common thresholding strategies, such as the RCut, PCut and SCut methods, need a training step to determine the value of the threshold. To overcome this limitation, we propose a new strategy, called MCut, which automatically estimates a value for the threshold. This method does not have to be trained and does not need any parametrization. Experiments performed on two textual corpora, the XML Mining 2009 and RCV1 collections, show that MCut achieves results on par with the state of the art while being easy to implement and parameter-free.
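
To make the idea of score thresholding concrete, here is a minimal Python sketch. The summary does not say how MCut estimates its threshold, so the second function is only an assumption: it places the threshold at the midpoint of the largest gap between consecutive sorted scores, one plausible training-free rule. The function names, category labels and scores are hypothetical and are not taken from this record.

```python
from typing import Dict, List


def rank_cut(scores: Dict[str, float], k: int) -> List[str]:
    """Rank-based cut (RCut-style): keep the k highest-scoring categories.

    The value of k must be chosen beforehand, e.g. tuned on training data.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def largest_gap_cut(scores: Dict[str, float]) -> List[str]:
    """Training-free cut (ASSUMED rule, not confirmed by the summary).

    Sort the scores in decreasing order, find the largest difference
    between two consecutive scores, and keep every category whose score
    lies above the midpoint of that gap.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        # With zero or one category there is no gap to cut at.
        return [category for category, _ in ranked]
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    i_max = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = (ranked[i_max][1] + ranked[i_max + 1][1]) / 2
    return [category for category, score in ranked if score > threshold]


# Hypothetical classifier scores for a single document.
doc_scores = {"sports": 0.90, "politics": 0.85, "science": 0.30, "arts": 0.10}
selected_by_rank = rank_cut(doc_scores, k=1)
selected_by_gap = largest_gap_cut(doc_scores)
```

With these hypothetical scores, the rank-based cut with k=1 keeps only the top category, while the gap-based cut keeps both high-scoring categories without any tuned parameter.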