Approximate data instance matching: a survey
| Published in | Knowledge and information systems Vol. 27; no. 1; pp. 1 - 21 |
|---|---|
| Main Authors | , , | 
| Format | Journal Article | 
| Language | English | 
| Published | London Springer-Verlag 01.04.2011 Springer Springer Nature B.V |
| Subjects | |
| ISSN | 0219-1377 0219-3116  | 
| DOI | 10.1007/s10115-010-0285-0 | 
| Summary: | Approximate data matching is a central problem in several data management processes, such as data integration, data cleaning, approximate queries, similarity search and so on. An approximate matching process aims at defining whether two data represent the same real-world object. For atomic values (strings, dates, etc), similarity functions have been defined for several value domains (person names, addresses, and so on). For matching aggregated values, such as relational tuples and XML trees, approaches alternate from the definition of simple functions that combine values of similarity of record attributes to sophisticated techniques based on machine learning, for example. For complex data comparison, including structured and semistructured documents, existing approaches use both structure and data for the comparison, by either considering or not considering data semantics. This survey presents terminology and concepts that base approximated data matching, as well as discusses related work on the use of similarity functions in such a subject. | 
|---|---|