A Fast New Algorithm for Multi-Source Cross-Domain Data Classification
| Published in | Acta Automatica Sinica (自动化学报) Vol. 40; no. 3; pp. 531 - 547 |
|---|---|
| Main Author | |
| Format | Journal Article |
| Language | Chinese |
| Published | 2014. Wuxi Institute of Technology, Wuxi 214000; Jiangsu North Huguang Opto-Electronics Co., Ltd., Wuxi 214035; School of Digital Media, Jiangnan University, Wuxi 214122 |
| Subjects | |
| ISSN | 0254-4156 1874-1029 |
| DOI | 10.3724/SP.J.1004.2014.00531 |
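| Summary: | Cross-domain learning and classification aims to transfer what is learned under supervision on multiple source domains to a target domain, so that unlabeled target-domain data can be classified. Current cross-domain learning methods generally focus on transfer from a single source domain to the target domain and typically work with small samples. Such methods adapt poorly across domains and struggle even more with large-sample data, which directly degrades both the accuracy and the efficiency of cross-domain classification. To exploit as much useful data from related domains as possible, this paper proposes a multi-source cross-domain classification algorithm (Multiple sources cross-domain classification, MSCC). The algorithm builds multiple source-domain classifiers from the logistic regression model and a consensus method, both of which have been shown effective in many experiments, and combines them to guide classification of the target-domain data. To use large-sample source-domain data fully and efficiently and to support fast computation on large samples, the paper combines MSCC with the recent CDdual (Dual coordinate descent method) algorithm to obtain a fast version, MSCC-CDdual, and gives a related theoretical analysis. Experiments on synthetic, text, and image datasets show that the algorithm achieves high classification accuracy, fast running speed, and high domain adaptability on large-sample datasets. The main contributions are threefold: 1) a new consensus method for multi-source cross-domain classification, which facilitates developing MSCC into the fast algorithm MSCC-CDdual; 2) the fast algorithm MSCC-CDdual, which suits both small-sample and large-sample datasets; 3) MSCC-CDdual shows distinct advantages over other algorithms on high-dimensional datasets. |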
|---|---|
| Bibliography: | Keywords: cross-domain, multi-source, logistic regression, posterior probability, classification. Abstract: Cross-domain learning and classification, as studied in this paper, attempts to effectively transfer the classification results obtained from supervised multi-source domains to an unsupervised target domain. Although current cross-domain learning methods have been very successful on single-source cross-domain learning problems, they run into serious difficulties in classification accuracy and running speed when applied to large multi-source cross-domain datasets. In this paper, based on the logistic regression model and the proposed consensus measure, a multi-source cross-domain classification (MSCC) algorithm is proposed to realize effective cross-domain classification for the target domain. In order to enable the MSCC to work well for large datasets, based on the algorithm CDdual (Dual coordinate descent method) as the recent advance about large-scale logistic regression, an MSCC's fast v |
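
The summary describes training one logistic regression classifier per source domain and combining them to label target-domain data. The following is a minimal sketch of that general idea only, not the paper's MSCC or MSCC-CDdual: it assumes binary labels, fits each source model by plain gradient descent rather than dual coordinate descent, and stands in for the consensus method with a simple average of posterior probabilities. All function names and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=200, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent.

    Returns a weight list whose last entry is the bias term.
    """
    d = len(X[0])
    n = len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            # zip stops at the d features, so the bias w[d] is added separately.
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[d])
            err = p - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        for j in range(d):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
        w[d] -= lr * grad[d] / n
    return w


def posterior(w, x):
    # Estimated P(y = 1 | x) under one source-domain model.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def predict_multi_source(models, x):
    # Assumption: average the source posteriors as a stand-in for consensus.
    avg = sum(posterior(w, x) for w in models) / len(models)
    return (1 if avg >= 0.5 else 0), avg


def make_domain(rng, n, shift):
    # Hypothetical synthetic domain: two Gaussian classes, first feature shifted.
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else -2.0
        X.append([rng.gauss(center + shift, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
sources = [make_domain(rng, 200, shift) for shift in (-0.5, 0.0, 0.5)]
models = [train_logistic(X, y) for X, y in sources]

# Target labels are used only to score the predictions, never for training.
X_t, y_t = make_domain(rng, 200, 0.2)
preds = [predict_multi_source(models, x)[0] for x in X_t]
accuracy = sum(p == t for p, t in zip(preds, y_t)) / len(y_t)
print(f"target accuracy: {accuracy:.3f}")
```

Averaging posteriors treats every source domain as equally relevant to the target. A method like the one summarized above would instead couple the source classifiers through its consensus term, and a large-scale variant would replace the gradient descent loop with a dual coordinate descent solver.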