Duplicated record detection based on improved RBF neural network

Bibliographic Details
Published in 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), pp. 2034-2037
Main Authors Liu, Xinting; Cai, Xiaodong; Li, Bo; Chen, Mingyao
Format Conference Proceeding
Language English
Published IEEE, 1 March 2017
DOI 10.1109/IAEAC.2017.8054373

Abstract This paper presents a method based on a modified Radial Basis Function (RBF) neural network to improve the accuracy and recall rate of duplicated-record detection. First, the key fields of the records are clustered with Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which groups all records into several classes that contain duplicated records. Second, the similarity of corresponding fields of the records in each class is computed with the Jaro algorithm, and duplicated records are labeled manually. Finally, the Subtractive Clustering Method (SCM) and Particle Swarm Optimization (PSO) are used to optimize the parameters of the RBF neural network, yielding a monitoring model for duplicated records. The method is tested on different datasets. The experimental results show that both the accuracy and the recall rate of duplicated-record detection improve significantly.
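
The second step of the abstract, field-wise similarity with the Jaro algorithm, can be illustrated with a minimal sketch. This record does not include the authors' implementation, so the code below is the textbook Jaro similarity; the helper `field_similarities` and the sample field names are hypothetical and only show how a per-field similarity vector could be formed for a pair of records in the same cluster.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2  # transpositions

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def field_similarities(rec_a: dict, rec_b: dict, fields: list) -> list:
    """Hypothetical helper: one Jaro score per key field of two records."""
    return [jaro(str(rec_a[f]), str(rec_b[f])) for f in fields]


if __name__ == "__main__":
    # Illustrative records only; the field names are assumptions.
    a = {"name": "MARTHA", "city": "Guilin"}
    b = {"name": "MARHTA", "city": "Guilin"}
    print(field_similarities(a, b, ["name", "city"]))
```

In a pipeline like the one the abstract describes, such a vector would serve as the input features of the classifier, with the manually assigned duplicate/non-duplicate label as the target.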
Author Affiliations
Liu, Xinting: School of Computer and Information Security, Guilin University of Electronic Technology, China
Cai, Xiaodong (caixiaodong@guet.edu.cn): School of Computer and Information Security, Guilin University of Electronic Technology, China
Li, Bo: School of Computer and Information Security, Guilin University of Electronic Technology, China
Chen, Mingyao: Guilin Topintelligent Communication Technology Co., Ltd, China
EISBN 146738979X; 9781467389792
Subject Terms Classification algorithms; Clustering algorithms; Clustering methods; complex system; Duplicated records; Computational modeling; Conferences; Neural networks; parameter optimizing; Particle swarm optimization; PSO; RBF neural network; SCM
URI https://ieeexplore.ieee.org/document/8054373