Research and Design of Theme Image Crawler Based on Difference Hash Algorithm

Bibliographic Details
Published in: IOP conference series. Materials Science and Engineering Vol. 563; no. 4; pp. 42080 - 42086
Main Authors: Wang, De-zhi; Liang, Jun-yan
Format: Journal Article
Language: English
Published: Bristol, IOP Publishing, 01.07.2019
ISSN: 1757-8981, 1757-899X
DOI: 10.1088/1757-899X/563/4/042080

Summary: To address the high duplication rate of image resources collected by general theme crawlers, a theme image crawler system is designed to reduce image similarity. The design covers the main function modules of the crawler, the workflow of the system, and the implementation of the key modules. The difference hash algorithm is used to detect similar images effectively. A Web text cosine correlation algorithm and a link PageRank algorithm are combined to evaluate the relevance between Web resources and topics. The experimental results show that the theme image crawler can effectively reduce the similarity of the collected images and improve the efficiency of image resource acquisition.
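
The difference hash idea named in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each image has already been converted to grayscale and shrunk to a small grid (9 x 8 pixels is a common choice), and the function names and the 5-bit threshold are hypothetical.

```python
def dhash(gray_rows):
    """Difference hash of a small grayscale grid.

    gray_rows: list of equal-length rows of pixel intensities.
    Each bit records whether a pixel is brighter than its right neighbour,
    so a 9 x 8 grid yields a 64-bit integer.
    """
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(hash_a, hash_b, threshold=5):
    """Treat two images as similar when few bits differ.

    The threshold is an assumed tuning value, not taken from the paper.
    """
    return hamming_distance(hash_a, hash_b) <= threshold
```

Because only the sign of each neighbouring-pixel difference is kept, a uniform brightness shift leaves the hash unchanged, which is one reason this kind of hash is used for spotting re-encoded or lightly edited copies of the same picture.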