Automatic acquisition and classification system for agricultural network information based on Web data

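To obtain agricultural Web information quickly and efficiently and to address the problems of information silos and information asymmetry, this study focuses on techniques for automatic acquisition and extraction of agricultural Web data, text classification based on the support vector machine (SVM), and heterogeneous data collection for the Internet of things, and uses the unified modeling language (UML) to describe an automatic acquisition and classification system for agricultural network information. The system implements automatic crawling and sharing of data from agricultural websites and the Internet of things, and provides users with online querying of agricultural news, agricultural product market prices, and supply and demand information, real-time monitoring of environmental data, and personalized information services. Application results show that the system's information crawling accuracy on the sample set of websites is 98.2% and its news classification accuracy is 92.5%; it features strong real-time data acquisition, good user participation, high generality...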

Bibliographic Details
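Published in Transactions of the Chinese Society of Agricultural Engineering (农业工程学报) Vol. 32; no. 12; pp. 172-178
Main Authors Duan Qingling, Wei Fangfang, Zhang Lei, Xiao Xiaoyan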
Format Journal Article
Language Chinese
Published College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; Beijing Agricultural Networking Engineering Technology Research Center, Beijing 100083, China 2016
ISSN 1002-6819
DOI 10.11975/j.issn.1002-6819.2016.12.025

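Abstract To obtain agricultural Web information quickly and efficiently and to address the problems of information silos and information asymmetry, this study focuses on techniques for automatic acquisition and extraction of agricultural Web data, text classification based on the support vector machine (SVM), and heterogeneous data collection for the Internet of things, and uses the unified modeling language (UML) to describe an automatic acquisition and classification system for agricultural network information. The system implements automatic crawling and sharing of data from agricultural websites and the Internet of things, and provides users with online querying of agricultural news, agricultural product market prices, and supply and demand information, real-time monitoring of environmental data, and personalized information services. Application results show that the system's information crawling accuracy on the sample set of websites is 98.2% and its news classification accuracy is 92.5%. The system features strong real-time data acquisition, good user participation, and high generality, and provides a reference for agricultural information integration and services.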
Abstract_FL This study aims to obtain agricultural web information efficiently and to provide users with personalized services by integrating agricultural resources scattered across different sites and fusing heterogeneous environmental data. The research improves several key information technologies: agricultural web data acquisition and extraction, text classification based on the support vector machine (SVM), and heterogeneous data collection based on the Internet of things (IOT). First, high-quality target seed sites are added to the system, and each site's URL (uniform resource locator) and category information are obtained. The web crawler saves the original pages. De-noised pages are obtained using an HTML parser and regular expressions, which are used to create custom Node Filter objects. The system then builds a document object model (DOM) tree before mining the data areas. According to filtering rules, the target data area is identified among multiple data regions with repeated patterns, and structured data are extracted after attribute segmentation. Second, a linear SVM classification model is constructed to classify agricultural text automatically. The model involves 4 steps: word segmentation and part-of-speech (POS) tagging with the segmentation tool ICTCLAS; selection of feature words by combining an agricultural keyword dictionary with a document frequency adjustment rule; construction of a feature vector and calculation of the inverse document frequency (IDF) weight of each feature word; and design of an adaptive SVM classifier. Finally, sensor data in different formats are transmitted over the wireless sensor network to a designated server as source data. Through data conversion and data filtering, these data are loaded into a relational database at the specified acquisition frequency.
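The feature selection and IDF weighting steps described above can be illustrated with a minimal sketch. It assumes the documents are already segmented into tokens (ICTCLAS itself is not used here), and the abstract does not give the actual document frequency adjustment rule or the IDF formula, so the `min_df` threshold, the plain `log(N / df)` weighting, the toy documents, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def select_features(docs, key_terms, min_df=2):
    """Pick feature words: keep a term if it is in the key dictionary
    or if it occurs in at least min_df documents (assumed rule)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    vocab = sorted(t for t, n in df.items() if t in key_terms or n >= min_df)
    return vocab, df


def idf_weights(vocab, df, n_docs):
    """Plain IDF, log(N / df); the paper's exact variant is not stated."""
    return {t: math.log(n_docs / df[t]) for t in vocab}


def vectorize(tokens, vocab, idf):
    """Build a TF-IDF feature vector ordered by vocab."""
    tf = Counter(tokens)
    return [tf[t] * idf[t] for t in vocab]


# Toy, already-segmented documents (hypothetical data).
docs = [
    ["wheat", "price", "market"],
    ["wheat", "disease", "control"],
    ["pig", "price", "market"],
]
key_terms = {"disease"}  # stand-in for the agricultural keyword dictionary

vocab, df = select_features(docs, key_terms, min_df=2)
idf = idf_weights(vocab, df, len(docs))
vec = vectorize(docs[0], vocab, idf)
print(vocab)
print(vec)
```

Vectors built this way could then be passed to any linear SVM implementation; that training step is left out because the abstract does not describe the classifier's parameters.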
The key step, data conversion, is implemented on the basis of mapping rules between source data and target data. There are 3 kinds of mapping rules: first, source data that correspond directly to target data; second, a temporary table is created that corresponds to the target table where the two share field names; third, sensor data in XML (extensible markup language) format are converted into relational form. In addition, data filtering is required to handle abnormal measured values that fall outside the sensor range. The unified modeling language (UML) is used to describe the agricultural network information automatic acquisition and classification system: a use case diagram describes the user requirement analysis, and an activity diagram describes the Web data extraction process. These support the implementation of the system's key function, automatic information acquisition from the Internet. The IOT data sharing module is implemented based on the proposed data conversion and filtering rules. The system provides timely agricultural news, agricultural product prices, browsing and querying of supply and demand information, real-time agricultural environment monitoring, and personalized information statistics. Preliminary application shows that the system improves the accuracy of information extraction and text classification: the information acquisition accuracy for the sample set of websites is 98.2%, and the accuracy of text classification with rules is 92.5%. Compared with sequential minimal optimization (SMO), Bayesian, C4.5 decision tree, and radial basis function (RBF) based SVM algorithms, the linear SVM is better suited to agricultural news classification.
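As a rough illustration of the third mapping rule (XML sensor data to a relational table) combined with range filtering, the sketch below uses only the Python standard library. The XML layout, the sensor names, the ranges in `SENSOR_RANGE`, and the `env_data` table are all hypothetical, and dropping out-of-range readings is an assumed policy: the abstract says only that abnormal values are processed, not how.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical valid ranges (min, max) per sensor type.
SENSOR_RANGE = {"air_temp": (-40.0, 80.0), "humidity": (0.0, 100.0)}

# Hypothetical XML source data as received from the sensor network.
XML_SOURCE = """
<readings>
  <reading sensor="air_temp" time="2016-06-01T08:00:00" value="23.5"/>
  <reading sensor="humidity" time="2016-06-01T08:00:00" value="135.0"/>
  <reading sensor="air_temp" time="2016-06-01T08:10:00" value="-99.0"/>
  <reading sensor="humidity" time="2016-06-01T08:10:00" value="61.2"/>
</readings>
"""


def parse_readings(xml_text):
    """Map each <reading> element to a (sensor, time, value) row."""
    root = ET.fromstring(xml_text.strip())
    for node in root.iter("reading"):
        yield node.get("sensor"), node.get("time"), float(node.get("value"))


def in_range(sensor, value):
    """Data filtering: accept only values inside the sensor's range."""
    low, high = SENSOR_RANGE[sensor]
    return low <= value <= high


def load(conn, xml_text):
    """Convert XML readings to relational rows, skipping abnormal values."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS env_data (sensor TEXT, time TEXT, value REAL)"
    )
    rows = [r for r in parse_readings(xml_text) if in_range(r[0], r[2])]
    conn.executemany("INSERT INTO env_data VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
kept = load(conn, XML_SOURCE)
stored = conn.execute(
    "SELECT sensor, value FROM env_data ORDER BY time, sensor"
).fetchall()
print(kept, stored)
```

An in-memory SQLite database keeps the sketch self-contained; a deployed system would target its own relational schema and might flag or correct abnormal values rather than discard them.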
The system offers high real-time performance and good user participation for IOT applications, and is expected to be applied to agricultural information integration and intelligent processing.
Author_FL Duan Qingling; Wei Fangfang; Zhang Lei; Xiao Xiaoyan
ClassificationCodes TP274+.2
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
Discipline Agriculture
DocumentTitleAlternate Automatic acquisition and classification system for agricultural network information based on Web data
EndPage 178
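GrantInformation National High Technology Research and Development Program of China (863 Program) (2013AA102306); Shandong Province Independent Innovation Project (2014XGA13054); Fundamental Research Funds for the Central Universities (2015XD001)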
IsPeerReviewed false
IsScholarly true
Issue 12
Keywords agriculture; text processing; information; acquisition system; information systems; the Internet of things
Notes Duan Qingling, Wei Fangfang, Zhang Lei, Xiao Xiaoyan (1. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; 2. Beijing Agricultural Networking Engineering Technology Research Center, Beijing 100083, China)
11-2047/S
PageCount 7
PublicationDate 2016
PublicationTitle 农业工程学报
PublicationTitleAlternate Transactions of the Chinese Society of Agricultural Engineering
Publisher College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; Beijing Agricultural Networking Engineering Technology Research Center, Beijing 100083, China
StartPage 172
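SubjectTerms information; agriculture; text processing; the Internet of things; acquisition system
Title Automatic acquisition and classification system for agricultural network information based on Web data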
URI http://lib.cqvip.com/qk/90712X/201612/669017347.html
https://d.wanfangdata.com.cn/periodical/nygcxb201612025
Volume 32