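Automatic acquisition and classification system for agricultural network information based on Web data (基于Web数据的农业网络信息自动采集与分类系统)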
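To acquire agricultural Web information quickly and efficiently and to address information silos and information asymmetry, this work focuses on automatic acquisition and extraction of agricultural Web data, text classification based on SVM (support vector machine), and heterogeneous data collection for the Internet of Things, and uses the unified modeling language (UML) to describe an automatic acquisition and classification system for agricultural network information. The system automatically crawls and shares data from agricultural websites and the Internet of Things, and offers users online queries of agricultural news, agricultural product market prices, and supply and demand information, together with real-time environmental monitoring and personalized information services. Application results show that the system achieves an information-crawling accuracy of 98.2% on the sample set of websites and a news classification accuracy of 92.5%, and features strong real-time data acquisition, good user participation, high generality...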
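
The SVM-based text classification mentioned above relies, according to this record's English abstract, on feature words weighted by inverse document frequency (IDF). The following is a minimal sketch of that weighting step only, not the authors' implementation. It assumes the documents are already segmented into tokens (the paper uses ICTCLAS for this), uses the common textbook form idf = log(N / df), which the record does not confirm, and the function names `idf_weights` and `tfidf_vector` as well as the toy documents are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Return {term: idf} for pre-tokenized documents.

    Assumption: idf = log(N / df), where N is the number of documents and
    df is the number of documents containing the term.
    """
    n_docs = len(docs)
    document_frequency = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        document_frequency.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in document_frequency.items()}


def tfidf_vector(tokens, idf, vocabulary):
    """Build a feature vector (term frequency times IDF) over a fixed vocabulary."""
    term_frequency = Counter(tokens)
    return [term_frequency[term] * idf.get(term, 0.0) for term in vocabulary]


# Hypothetical, already-segmented toy documents.
docs = [
    ["wheat", "price", "market"],
    ["wheat", "pest", "control"],
    ["pork", "price", "market"],
]
idf = idf_weights(docs)
vocabulary = sorted(idf)
vector = tfidf_vector(docs[0], idf, vocabulary)
```

Vectors built this way could then be passed to any linear SVM trainer. Terms that occur in fewer documents receive larger weights, which is the property the feature-selection step depends on.
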
| Published in | 农业工程学报 Vol. 32; no. 12; pp. 172 - 178 |
|---|---|
| Main Author | Duan Qingling (段青玲) |
| Format | Journal Article |
| Language | Chinese |
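| Published | College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; Beijing Agricultural Networking Engineering Technology Research Center, Beijing 100083, China; 2016 |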
| Subjects | |
| ISSN | 1002-6819 |
| DOI | 10.11975/j.issn.1002-6819.2016.12.025 |
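| Abstract | To acquire agricultural Web information quickly and efficiently and to address information silos and information asymmetry, this work focuses on automatic acquisition and extraction of agricultural Web data, text classification based on SVM (support vector machine), and heterogeneous data collection for the Internet of Things, and uses the unified modeling language (UML) to describe an automatic acquisition and classification system for agricultural network information. The system automatically crawls and shares data from agricultural websites and the Internet of Things, and offers users online queries of agricultural news, agricultural product market prices, and supply and demand information, together with real-time environmental monitoring and personalized information services. Application results show that the system achieves an information-crawling accuracy of 98.2% on the sample set of websites and a news classification accuracy of 92.5%, and features strong real-time data acquisition, good user participation, and high generality. The system provides a reference for agricultural information integration and services. |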
|---|---|
| Abstract_FL | The purpose of this study is to obtain agricultural web information efficiently and to provide users with personalized services by integrating agricultural resources scattered across different sites and fusing heterogeneous environmental data. The paper improves several key information technologies: agricultural web data acquisition and extraction, text classification based on the support vector machine (SVM), and heterogeneous data collection based on the Internet of Things (IoT). First, high-quality target seed sites are added to the system, and each website's URL (uniform resource locator) and category information are obtained. The web crawler saves the original pages. De-noised pages are obtained through an HTML parser and regular expressions, which create custom Node Filter objects. The system then builds a document object model (DOM) tree before mining the data area. According to filtering rules, the target data area is identified among multiple data regions with repeated patterns, and the structured data are extracted after property segmentation. Second, a linear SVM classification model is constructed to classify agricultural text automatically. The procedure has 4 steps: the segmentation tool ICTCLAS performs word segmentation and part-of-speech (POS) tagging; feature words are chosen by combining an agricultural keyword dictionary with a document frequency adjustment rule; a feature vector is built and an inverse document frequency (IDF) weight is calculated for each feature word; and finally an adaptive SVM classifier is designed. Third, perception data in different formats collected by the sensors are transmitted through the wireless sensor network to the designated server as source data. Through data conversion and data filtering, the data are loaded into a relational database in accordance with the specified acquisition frequency. The key step of data conversion is implemented on the basis of mapping rules between source data and target data. There are 3 kinds of mapping rules: in the first, the source data correspond directly to the target data; in the second, a temporary table is created that corresponds to the target table when the field names are the same; in the third, perception data of XML (extensible markup language) type are converted to relational form. In addition, data filtering is required to handle abnormal measured values that fall outside the sensor range. Unified modeling language (UML) is used to describe the agricultural network information automatic acquisition and classification system: the use case diagram describes the user requirement analysis, and the activity diagram describes the web data extraction process. These diagrams support the implementation of the system's key function, automatic information acquisition from the Internet. The IoT data sharing module is implemented on the basis of the proposed data conversion and filtering rules. The system provides timely agricultural news, agricultural product prices, browsing and querying of supply and demand information, real-time agricultural environment monitoring, and personalized information statistics. The preliminary application shows that the system improves the accuracy of information extraction and text classification. The information acquisition accuracy for the sample web sets is 98.2%, and the accuracy of text classification with rules is 92.5%. Compared with sequential minimal optimization (SMO), Bayesian, C4.5 decision tree, and radial basis function (RBF) based SVM algorithms, linear SVM is more suitable for agricultural news classification. The system has high real-time performance and good user participation for IoT applications, and it is expected to be applied to agricultural information integration and intelligent processing. |
| Author | Duan Qingling (段青玲); Wei Fangfang (魏芳芳); Zhang Lei (张磊); Xiao Xiaoyan (肖晓琰) |
| AuthorAffiliation | College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; Beijing Agricultural Networking Engineering Technology Research Center, Beijing 100083, China |
| Author_FL | Duan Qingling; Wei Fangfang; Zhang Lei; Xiao Xiaoyan |
| Author_FL_xml | – sequence: 1 fullname: Duan Qingling – sequence: 2 fullname: Wei Fangfang – sequence: 3 fullname: Zhang Lei – sequence: 4 fullname: Xiao Xiaoyan |
| ClassificationCodes | TP274+.2 |
| ContentType | Journal Article |
| Copyright | Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
| DatabaseName | 维普期刊资源整合服务平台 中文科技期刊数据库-CALIS站点 维普中文期刊数据库 中文科技期刊数据库-农业科学 中文科技期刊数据库- 镜像站点 Wanfang Data Journals - Hong Kong WANFANG Data Centre Wanfang Data Journals 万方数据期刊 - 香港版 China Online Journals (COJ) China Online Journals (COJ) |
| Discipline | Agriculture |
| DocumentTitleAlternate | Automatic acquisition and classification system for agricultural network information based on Web data |
| EndPage | 178 |
| ExternalDocumentID | nygcxb201612025 669017347 |
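| GrantInformation_xml | National High Technology Research and Development Program of China (863 Program) (2013AA102306); Shandong Province Independent Innovation Project (2014XGA13054); Fundamental Research Funds for the Central Universities (2015XD001) |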
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | 12 |
| Keywords | agriculture; text processing; information; information systems; acquisition systems; the Internet of things |
| Notes | Duan Qingling, Wei Fangfang, Zhang Lei, Xiao Xiaoyan (1. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; 2. Beijing Agricultural Networking Engineering Technology Research Center, Beijing 100083, China). 11-2047/S. Keywords: agriculture; text processing; information systems; information; the Internet of things. |
| PageCount | 7 |
| PublicationDate | 2016 |
| PublicationTitle | 农业工程学报 |
| PublicationTitleAlternate | Transactions of the Chinese Society of Agricultural Engineering |
| Publisher | College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; Beijing Agricultural Networking Engineering Technology Research Center, Beijing 100083, China |
| StartPage | 172 |
| SubjectTerms | information; agriculture; text processing; Internet of things; acquisition systems |
| Title | 基于Web数据的农业网络信息自动采集与分类系统 |
| URI | http://lib.cqvip.com/qk/90712X/201612/669017347.html https://d.wanfangdata.com.cn/periodical/nygcxb201612025 |
| Volume | 32 |
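
The English abstract in this record describes converting XML-type perception data into relational form and filtering out measured values that fall outside the sensor range. The following is a minimal sketch of those two steps under stated assumptions, not the authors' implementation: the XML layout, the table name `reading`, the sensor names, and the numeric ranges in `SENSOR_RANGE` are all hypothetical, and out-of-range values are simply dropped, which is only one possible way of handling them.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical valid measurement ranges per sensor type (assumption).
SENSOR_RANGE = {
    "air_temperature": (-40.0, 80.0),
    "air_humidity": (0.0, 100.0),
}

# Hypothetical XML source data as it might arrive from a sensor gateway.
SOURCE_XML = """
<readings>
  <reading sensor="air_temperature" time="2016-06-01T08:00:00" value="24.5"/>
  <reading sensor="air_humidity" time="2016-06-01T08:00:00" value="135.0"/>
  <reading sensor="air_temperature" time="2016-06-01T08:10:00" value="25.1"/>
</readings>
"""


def parse_readings(xml_text):
    """Map each XML <reading> element to a (sensor, time, value) row."""
    root = ET.fromstring(xml_text.strip())
    for node in root.iter("reading"):
        yield node.get("sensor"), node.get("time"), float(node.get("value"))


def in_range(sensor, value):
    """Return True when the value lies inside the sensor's declared range."""
    low, high = SENSOR_RANGE[sensor]
    return low <= value <= high


def load(xml_text, conn):
    """Convert XML readings to relational rows, skipping out-of-range values."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reading "
        "(sensor TEXT, observed_at TEXT, value REAL)"
    )
    rows = [row for row in parse_readings(xml_text) if in_range(row[0], row[2])]
    conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
stored = load(SOURCE_XML, conn)
```

In this toy input the humidity reading of 135.0 lies outside the assumed 0 to 100 range and is not stored, while both temperature readings are inserted into the relational table.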