Systematic review of data-centric approaches in artificial intelligence and machine learning

Bibliographic Details
Published in Data science and management Vol. 6; no. 3; pp. 144-157
Main Author Singh, Prerna
Format Journal Article
Language English
Published Elsevier B.V 01.09.2023
KeAi Communications Co. Ltd
ISSN 2666-7649
DOI 10.1016/j.dsm.2023.06.001

Abstract Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) AI algorithms have been developed to improve the performance of AI-oriented systems. However, model-centric approaches are limited by the absence of high-quality data. Data-centric AI is an emerging approach for solving machine learning (ML) problems. It is a collection of data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented, and researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers are already using, intentionally or unintentionally, to improve the quality of AI systems: big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. In addition, it highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI. Technical debt builds up when software design and implementation decisions run into, or outright collide with, business goals and timelines. This survey lays the groundwork for future data-centric AI discussions by summarizing various data-centric approaches.
Author affiliation Wellington, New Zealand
Author email prernasingh7990@gmail.com
Copyright 2023 Xi’an Jiaotong University
Discipline Engineering
Open access Yes
Peer reviewed Yes
Keywords MLOps
Data management
Semi-supervised learning
Data preprocessing
Machine learning
Technical debt
Data-centric
License This is an open access article under the CC BY-NC-ND license.
ORCID 0000-0003-2770-9493
OpenAccessLink https://doaj.org/article/8e693534bf4448dc99d2e0f5465995c4
PageCount 14
ParticipantIDs doaj_primary_oai_doaj_org_article_8e693534bf4448dc99d2e0f5465995c4
crossref_primary_10_1016_j_dsm_2023_06_001
crossref_citationtrail_10_1016_j_dsm_2023_06_001
elsevier_sciencedirect_doi_10_1016_j_dsm_2023_06_001
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2023-09-01
PublicationDateYYYYMMDD 2023-09-01
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-09-01
  day: 01
PublicationDecade 2020
PublicationTitle Data science and management
PublicationYear 2023
Publisher Elsevier B.V
KeAi Communications Co. Ltd
Publisher_xml – name: Elsevier B.V
– name: KeAi Communications Co. Ltd
References Caruccio, Cirillo, Deufemia (bib10) 2021
Punmiya, Choe (bib51) 2019; 10
Yang, Ke, Cui (bib75) 2022; 37
Nakkiran, Kaplun, Bansal (bib45) 2021; 2021
Chen, Chow, Davidson (bib13) 2020
Zhang, Zhao, Pfoser (bib80) 2022
Chen, Su, Chuang (bib15) 2022
Nguyen, Chang (bib46) 2021
Kumar, Dabas, Hooda (bib33) 2020; 12 (Feb.)
Ruder (bib57) 2017
Roh, Heo, Whang (bib56) 2019; 33
Murphy (bib44) 2019
Granlund, Kopponen, Stirbu (bib25) 2021
Sanjeeva (bib59) 2018; 3 (Sep.)
Schiermeier (bib62) 2018; 555
Yoon, Yoo, Seo (bib76) 2022
Meng, Wu, Liu (bib40) 2020
Chakraborty, Krishna (bib11) 2014
Miranda (bib43) 2021
Saggi, Jain (bib58) 2018; 54
Shao, Chen, Zeng (bib64) 2019; 16
Sundarraj (bib68) 2022
Panimalar, Shree, Kathrine (bib49) 2017; 4
Polyzotis, Roy, Whang (bib50) 2018; 47
Ramponi, Plank (bib53) 2020
Zhang, Wang, Liu (bib79) 2020; 31
Ahmad, Maabreh, Ghaly (bib2) 2022; 43 (Feb.)
Cooney, Wan, O’Donncha (bib16) 2021; 23 (3)
Min, Y., Chen, L., Karbasi, A., 2021. The curious case of adversarially robust models: more data can help, double descend, or hurt generalization. In: Uncertainty in Artificial Intell. PMLR, pp. 129–139.
Eberendu (bib21) 2016; 38
Gandomi, Haider (bib23) 2015; 35
Li, Xie, Wang (bib36) 2020
Mäkinen, Skogström, Laaksonen (bib38) 2021
Wadekar, Schwartz, Kannan (bib73) 2021
Ben-David, Rabinovitz, Reichart (bib5) 2020; 8 (Jul.)
Taleb, Serhani (bib70) 2017
Abhishek, ud din Tahir (bib1) 2023; 75
Taleb, Serhani, Dssouli (bib71) 2018
Chen, Gao, Wang (bib14) 2020
Sarker, DeRoos, Perrone (bib60) 2020; 27
Gordon, Fennessy, Varma (bib24) 2022; 12
Jiang, Crooks, Kavak (bib30) 2022; 2
Oussous, Benjelloun, Ait (bib48) 2017; 30
Sculley (bib63) 2022
Meng, Jing, Yan (bib41) 2020; 57
Bérard, Kim, Nikoulina (bib6) 2020
Lee, Alzamil, Doskenov (bib35) 2021
Alzahrani, Al-Nuaimy, Al-Bander (bib3) 2019
Sidiropoulos, Voskarides, Kanoulas (bib67) 2020
Huang, Wang, Yong (bib29) 2019
Fursin, Guillou, Essayan (bib22) 2020
Wang, Derr, Ma (bib74) 2020
Bossér, Sörstadius, Chehreghani (bib9) 2020
Tian, Wang, Zhou (bib28) 2018
Kim, Jin, Kavak (bib32) 2020
Bogner, Verdecchia, Gerostathopoulos (bib8) 2021
Zhang, Cao, Wu (bib78) 2020
Zhang, Yan (bib77) 2020
Rekatsinas, Chu, Ilyas (bib54) 2017
Dou, Hu, Anastasopoulos (bib20) 2019
Anik, Bunt (bib4) 2021
Bifulco, Cirillo, Esposito (bib7) 2021; 184 (1)
Noorbehbahani, Saberi (bib47) 2020
Mansourifar, Chen, Shi (bib39) 2019
Lwakatare, Raj, Crnkovic (bib37) 2020; 127 (Nov.)
Schelter, Boese, Kirschnick (bib61) 2017
Sharma, Liu (bib65) 2020; 8
Tabesh, Mousavidin, Hasani (bib69) 2019; 62
Trivedi, Patel, Faruqui (bib72) 2023
Czakon (bib18) 2020
Han, Eisenstein (bib27) 2019
Siddiqa, Hashem, Yaqoob (bib66) 2016; 71 (Aug.)
Lee, Im, Shim (bib34) 2019
Chao, Yue, Guanghan (bib12) 2021
Crawshaw (bib17) 2020
Dilmegani (bib19) 2021
Juneja, Das (bib31) 2019
Renggli, Rimanic, Gürel (bib55) 2021
Gururangan, Marasović, Swayamdipta (bib26) 2020
Quan, Wang, Yan (bib52) 2020; 35
Zhou, Yu, Ding (bib81) 2020
Granlund (10.1016/j.dsm.2023.06.001_bib25) 2021
Nakkiran (10.1016/j.dsm.2023.06.001_bib45) 2021; 2021
Meng (10.1016/j.dsm.2023.06.001_bib40) 2020
10.1016/j.dsm.2023.06.001_bib42
Tian (10.1016/j.dsm.2023.06.001_bib28) 2018
Chen (10.1016/j.dsm.2023.06.001_bib14) 2020
Yoon (10.1016/j.dsm.2023.06.001_bib76) 2022
Chao (10.1016/j.dsm.2023.06.001_bib12) 2021
Cooney (10.1016/j.dsm.2023.06.001_bib16) 2021; 23 (3)
Dou (10.1016/j.dsm.2023.06.001_bib20) 2019
Sharma (10.1016/j.dsm.2023.06.001_bib65) 2020; 8
Chakraborty (10.1016/j.dsm.2023.06.001_bib11) 2014
Eberendu (10.1016/j.dsm.2023.06.001_bib21) 2016; 38
Punmiya (10.1016/j.dsm.2023.06.001_bib51) 2019; 10
Kim (10.1016/j.dsm.2023.06.001_bib32) 2020
Wang (10.1016/j.dsm.2023.06.001_bib74) 2020
Gordon (10.1016/j.dsm.2023.06.001_bib24) 2022; 12
Shao (10.1016/j.dsm.2023.06.001_bib64) 2019; 16
Yang (10.1016/j.dsm.2023.06.001_bib75) 2022; 37
Alzahrani (10.1016/j.dsm.2023.06.001_bib3) 2019
Sculley (10.1016/j.dsm.2023.06.001_bib63)
Ramponi (10.1016/j.dsm.2023.06.001_bib53) 2020
Zhang (10.1016/j.dsm.2023.06.001_bib78) 2020
Huang (10.1016/j.dsm.2023.06.001_bib29) 2019
Anik (10.1016/j.dsm.2023.06.001_bib4) 2021
Lee (10.1016/j.dsm.2023.06.001_bib35) 2021
Ahmad (10.1016/j.dsm.2023.06.001_bib2) 2022; 43 (Feb.)
Caruccio (10.1016/j.dsm.2023.06.001_bib10) 2021
Fursin (10.1016/j.dsm.2023.06.001_bib22) 2020
Gururangan (10.1016/j.dsm.2023.06.001_bib26) 2020
Schelter (10.1016/j.dsm.2023.06.001_bib61)
Zhang (10.1016/j.dsm.2023.06.001_bib79) 2020; 31
Chen (10.1016/j.dsm.2023.06.001_bib15) 2022
Crawshaw (10.1016/j.dsm.2023.06.001_bib17) 2020
Schiermeier (10.1016/j.dsm.2023.06.001_bib62) 2018; 555
Li (10.1016/j.dsm.2023.06.001_bib36) 2020
Quan (10.1016/j.dsm.2023.06.001_bib52) 2020; 35
Sidiropoulos (10.1016/j.dsm.2023.06.001_bib67) 2020
Noorbehbahani (10.1016/j.dsm.2023.06.001_bib47) 2020
Saggi (10.1016/j.dsm.2023.06.001_bib58) 2018; 54
Taleb (10.1016/j.dsm.2023.06.001_bib71) 2018
Sanjeeva (10.1016/j.dsm.2023.06.001_bib59) 2018; 3 (Sep.)
Roh (10.1016/j.dsm.2023.06.001_bib56) 2019; 33
Zhou (10.1016/j.dsm.2023.06.001_bib81) 2020
Panimalar (10.1016/j.dsm.2023.06.001_bib49) 2017; 4
Bogner (10.1016/j.dsm.2023.06.001_bib8) 2021
Bossér (10.1016/j.dsm.2023.06.001_bib9) 2020
Tabesh (10.1016/j.dsm.2023.06.001_bib69) 2019; 62
Meng (10.1016/j.dsm.2023.06.001_bib41) 2020; 57
Kumar (10.1016/j.dsm.2023.06.001_bib33) 2020; 12 (Feb.)
Renggli (10.1016/j.dsm.2023.06.001_bib55) 2021
Zhang (10.1016/j.dsm.2023.06.001_bib80) 2022
Bifulco (10.1016/j.dsm.2023.06.001_bib7) 2021; 184 (1)
Polyzotis (10.1016/j.dsm.2023.06.001_bib50) 2018; 47
Ruder (10.1016/j.dsm.2023.06.001_bib57) 2017
Gandomi (10.1016/j.dsm.2023.06.001_bib23) 2015; 35
Chen (10.1016/j.dsm.2023.06.001_bib13) 2020
Juneja (10.1016/j.dsm.2023.06.001_bib31) 2019
Abhishek (10.1016/j.dsm.2023.06.001_bib1) 2023; 75
Miranda (10.1016/j.dsm.2023.06.001_bib43)
Nguyen (10.1016/j.dsm.2023.06.001_bib46) 2021
Trivedi (10.1016/j.dsm.2023.06.001_bib72) 2023
Wadekar (10.1016/j.dsm.2023.06.001_bib73) 2021
Mansourifar (10.1016/j.dsm.2023.06.001_bib39) 2019
Lee (10.1016/j.dsm.2023.06.001_bib34) 2019
Taleb (10.1016/j.dsm.2023.06.001_bib70) 2017
Rekatsinas (10.1016/j.dsm.2023.06.001_bib54) 2017
Jiang (10.1016/j.dsm.2023.06.001_bib30) 2022; 2
Ben-David (10.1016/j.dsm.2023.06.001_bib5) 2020; 8 (Jul.)
Dilmegani (10.1016/j.dsm.2023.06.001_bib19)
Mäkinen (10.1016/j.dsm.2023.06.001_bib38) 2021
Siddiqa (10.1016/j.dsm.2023.06.001_bib66) 2016; 71 (Aug.)
Oussous (10.1016/j.dsm.2023.06.001_bib48) 2017; 30
Bérard (10.1016/j.dsm.2023.06.001_bib6) 2020
Han (10.1016/j.dsm.2023.06.001_bib27) 2019
Zhang (10.1016/j.dsm.2023.06.001_bib77) 2020
Lwakatare (10.1016/j.dsm.2023.06.001_bib37) 2020; 127 (Nov.)
Murphy (10.1016/j.dsm.2023.06.001_bib44)
Sarker (10.1016/j.dsm.2023.06.001_bib60) 2020; 27
Sundarraj (10.1016/j.dsm.2023.06.001_bib68)
Czakon (10.1016/j.dsm.2023.06.001_bib18)
References_xml – start-page: 204
  year: 2022
  end-page: 216
  ident: bib76
  publication-title: Data-centric and model-centric approaches for biomedical question answering. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction
– volume: 127 (Nov.)
  year: 2020
  ident: bib37
  article-title: Large-scale machine learning systems in real-world industrial settings: a review of challenges and solutions
  publication-title: Info. and soft. tech.
– year: 2021
  ident: bib35
  article-title: A Survey on Data Cleaning Methods for Improved Machine Learning Model Performance.
– year: 2022
  ident: bib63
  article-title: Data in deployment. Data-centric AI resource hub
– volume: 71 (Aug.)
  start-page: 151
  year: 2016
  end-page: 166
  ident: bib66
  article-title: A survey of big data management: taxonomy and state-of-the-art
  publication-title: J. Netw. Comput. Appl.
– start-page: 109
  year: 2021
  end-page: 112
  ident: bib38
  article-title: Who needs MLOps: what data scientists seek to accomplish and how can MLOps help?
  publication-title: 2021 IEEE/ACM 1st Workshop on AI Engineering-Software Engineering for AI (WAIN)
– volume: 35
  start-page: 137
  year: 2015
  end-page: 144
  ident: bib23
  article-title: Beyond the hype: big data concepts, methods, and analytics
  publication-title: Int. J. Info Manage.
– year: 2019
  ident: bib44
  article-title: Big data vs good data [internet]. PLANERGY software
– start-page: 1316
  year: 2020
  end-page: 1321
  ident: bib74
  article-title: Learning from incomplete labeled data via adversarial data generation
  publication-title: 2020 IEEE International Conference on Data Mining (ICDM)
– volume: 33
  start-page: 1328
  year: 2019
  end-page: 1347
  ident: bib56
  article-title: A survey on data collection for machine learning: a big data-ai integration perspective
  publication-title: IEEE Trans. Knowl. Data Eng.
– year: 2020
  ident: bib36
  article-title: Provable more data hurt in high dimensional least squares estimator.
– start-page: 563
  year: 2019
  end-page: 566
  ident: bib29
  article-title: A feature enginering framework for short-term earthquake prediction based on AETA data
  publication-title: 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC)
– start-page: 206
  year: 2020
  end-page: 210
  ident: bib77
  article-title: Semi-supervised active learning image classification method based on Tri-Training algorithm
  publication-title: 2020 IEEE International Conference on Artificial Intelligence and Information Systems (ICAIIS)
– reference: Min, Y., Chen, L., Karbasi, A., 2021. The curious case of adversarially robust models: more data can help, double descend, or hurt generalization. In: Uncertainty in Artificial Intell. PMLR, pp. 129–139.
– start-page: 248
  year: 2018
  end-page: 252
  ident: bib28
  article-title: Data quality assessment for on-line monitoring and measuring system of power quality based on big data and data provenance theory
  publication-title: 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis
– volume: 38
  start-page: 46
  year: 2016
  end-page: 50
  ident: bib21
  article-title: Unstructured data: an overview of the data of big data
  publication-title: Int. J. Comput. Trends Technol.
– volume: 62
  start-page: 347
  year: 2019
  end-page: 358
  ident: bib69
  article-title: Implementing big data strategies: a managerial perspective
  publication-title: Bus. Horiz.
– volume: 43 (Feb.)
  year: 2022
  ident: bib2
  article-title: Developing future human-centered smart cities: critical analysis of smart city security, data management, and ethical challenges
  publication-title: Comp. Sci. Review
– volume: 16
  start-page: 183
  year: 2019
  end-page: 200
  ident: bib64
  article-title: Labeling malicious communication samples based on semi-supervised deep neural network
  publication-title: China Commun.
– volume: 10
  start-page: 2326
  year: 2019
  end-page: 2329
  ident: bib51
  article-title: Energy theft detection using gradient boosting theft detector with feature engineering-based preprocessing
  publication-title: IEEE Trans. Smart Grid
– start-page: 400
  year: 2021
  end-page: 409
  ident: bib10
  publication-title: Efficient discovery of functional dependencies from incremental databases
– year: 2017
  ident: bib61
  article-title: Automatically tracking metadata and provenance of machine learning experiments
– start-page: 158
  year: 2020
  end-page: 167
  ident: bib32
  article-title: Location-based social network data generation based on patterns of life
  publication-title: 2020 21st IEEE International Conference on Mobile Data Management (MDM)
– year: 2020
  ident: bib9
  article-title: Model-centric and data-centric aspects of active learning for neural network models.
– start-page: 1288
  year: 2014
  end-page: 2014
  ident: bib11
  article-title: Analysis of unstructured data: applications of text analytics and sentiment mining
  publication-title: SAS Global Forum
– volume: 35
  year: 2020
  ident: bib52
  article-title: Learn with diversity and from harder samples: improving the generalization of CNN-Based detection of computer-generated images
  publication-title: Forensic Sci. Int.: Digit. Invest.
– start-page: 498
  year: 2017
  end-page: 501
  ident: bib70
  article-title: Big data pre-processing: closing the data quality enforcement loop
  publication-title: 2017 IEEE International Congress on Big Data
– volume: 23 (3)
  year: 2021
  ident: bib16
  article-title: Designing environmentally efficient aquafeeds through the use of multicriteria decision support tools
  publication-title: Curr. Opinion Environ. Sci. Health
– volume: 2
  start-page: 7
  year: 2022
  ident: bib30
  article-title: A method to create a synthetic population with social networks for geographically-explicit agent-based models
  publication-title: Comp. Urban Sci.
– volume: 37
  start-page: 4437
  year: 2022
  end-page: 4470
  ident: bib75
  article-title: Toward a real-time Smart Parking Data Management and Prediction (SPDMP) system by attributes representation learning
  publication-title: Int. J. Intell. Syst.
– year: 2019
  ident: bib27
  article-title: Unsupervised domain adaptation of contextualized embeddings for sequence labeling.
– start-page: 366
  year: 2023
  end-page: 378
  ident: bib72
  publication-title: Human interaction and classification via K-ary tree hashing over body pose attributes using sports data
– start-page: 101
  year: 2022
  end-page: 108
  ident: bib15
  publication-title: Apache submarine: a unified machine learning platform made simple
– volume: 12
  year: 2022
  ident: bib24
  article-title: Evaluation of freely available data profiling tools for health data research application: a functional evaluation review
  publication-title: BMJ Open
– volume: 184 (1)
  year: 2021
  ident: bib7
  article-title: An intelligent system for focused crawling from Big Data sources
  publication-title: Expert Syst. Appl.
– year: 2019
  ident: bib20
  article-title: Unsupervised domain adaptation for neural machine translation with domain-aware feature embeddings.
– year: 2017
  ident: bib57
  article-title: An overview of multi-task learning in deep neural networks.
– volume: 4
  start-page: 329
  year: 2017
  end-page: 333
  ident: bib49
  article-title: The 17 V’s of big data
  publication-title: Inter. Res. J. Eng. Tech.
– volume: 31
  start-page: 15
  year: 2020
  end-page: 28
  ident: bib79
  article-title: Deep adversarial data augmentation for extremely low data regimes
  publication-title: IEEE Trans. Circ. Syst. Video Technol.
– start-page: 1478
  year: 2019
  end-page: 1487
  ident: bib39
  article-title: Virtual big data for GAN based data augmentation
  publication-title: 2019 IEEE International Conference on Big Data
– year: 2022
  ident: bib68
  article-title: Data management: how to stay on top of your customer’s mind?
– volume: 8 (Jul.)
  start-page: 504
  year: 2020
  end-page: 521
  ident: bib5
  article-title: PERL: pivot-based domain adaptation for pre-trained deep contextualized embedding models
  publication-title: Trans. Assoc. Comp. Linguistics
– volume: 57
  start-page: 115
  year: 2020
  end-page: 129
  ident: bib41
  article-title: A survey on machine learning for data fusion
  publication-title: Inf. Fusion
– start-page: 1676
  year: 2020
  end-page: 1680
  ident: bib78
  article-title: Circular shift: an effective data augmentation method for convolutional neural network on image classification
  publication-title: 2020 IEEE International Conference on Image Processing (ICIP)
– volume: 75
  start-page: 1391
  year: 2023
  end-page: 1409
  ident: bib1
  article-title: Human verification over activity analysis via deep data mining
  publication-title: Comput. Mater. Continua (CMC)
– volume: 30
  start-page: 431
  year: 2017
  end-page: 448
  ident: bib48
  article-title: Big data technologies: a survey. Journal of King Saud University–Comput
  publication-title: Info. Sci.
– year: 2021
  ident: bib19
  article-title: MLOps tools & platforms landscape: in-depth guide for 2022
– volume: 8
  start-page: 4991
  year: 2020
  end-page: 4999
  ident: bib65
  article-title: A machine-learning-based data-centric misbehavior detection model for internet of vehicles
  publication-title: IEEE Internet Things J.
– start-page: 1393
  year: 2020
  end-page: 1396
  ident: bib40
  article-title: Semi-supervised deep learning seismic impedance inversion using generative adversarial networks
  publication-title: IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium
– year: 2021
  ident: bib43
  article-title: Towards data-centric machine learning: a short review
– start-page: 64
  year: 2021
  end-page: 73
  ident: bib8
  article-title: Characterizing technical debt and antipatterns in AI-based systems: a systematic mapping study
  publication-title: 2021 IEEE/ACM International Conference on Technical Debt (TechDebt)
– volume: 27
  start-page: 315
  year: 2020
  end-page: 329
  ident: bib60
  article-title: Mining social media for prescription medication abuse monitoring: a review and proposal for a data-centric framework
  publication-title: J. Am. Med. Inf. Assoc.
– year: 2017
  ident: bib54
  article-title: Holoclean: Holistic data repairs with probabilistic inference.
– year: 2020
  ident: bib22
  article-title: CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking.
– start-page: 559
  year: 2019
  end-page: 563
  ident: bib31
  article-title: Big data quality framework: pre-processing data in weather monitoring application
  publication-title: 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing
– year: 2021
  ident: bib46
  article-title: COVID-19 pneumonia severity prediction using hybrid convolution-attention neural architectures.
– volume: 2021
  year: 2021
  ident: bib45
  article-title: Deep double descent: where bigger models and more data hurt
  publication-title: J. Stat. Mech. Theor. Exp.
– start-page: 69
  year: 2018
  end-page: 74
  ident: bib71
  article-title: Big data quality assessment model for unstructured data
  publication-title: 2018 International Conference on Innovations in Information Technology (IIT)
– start-page: 237
  year: 2020
  end-page: 241
  ident: bib14
  article-title: Cervical cancer single cell image data augmentation using residual condition generative adversarial networks
  publication-title: 2020 3rd International Conference on Artificial Intelligence and Big Data
– start-page: 82
  year: 2021
  end-page: 88
  ident: bib25
  article-title: MLOps challenges in multi-organization setup: experiences from two real-world cases
  publication-title: 2021 IEEE/ACM 1st Workshop on AI Engineering-Software Engineering for AI
– start-page: 24
  year: 2020
  end-page: 29
  ident: bib47
  article-title: Ransomware detection with semi-supervised learning
  publication-title: 2020 10th International Conference on Computer and Knowledge Engineering
– volume: 47
  start-page: 17
  year: 2018
  end-page: 28
  ident: bib50
  article-title: Data lifecycle challenges in production machine learning: a survey
  publication-title: ACM SIGMOD Rec
– volume: 54
  start-page: 758
  year: 2018
  end-page: 790
  ident: bib58
  article-title: A survey towards an integration of big data analytics to big insights for value-creation
  publication-title: Inf. Process. Manag.
– year: 2020
  ident: bib18
  article-title: ML experiment tracking: what it is, why it matters, and how to implement it
– start-page: 1
  year: 2020
  end-page: 4
  ident: bib13
  article-title: Developments in mlflow: a system to accelerate the machine learning lifecycle
  publication-title: Proceedings of the Fourth International Workshop on Data Management for End-To-End Machine Learning
– year: 2020
  ident: bib6
  article-title: A multilingual neural machine translation model for biomedical data.
– year: 2020
  ident: bib26
  article-title: Don’t stop pretraining: adapt language models to domains and tasks.
– start-page: 1
  year: 2019
  end-page: 4
  ident: bib34
  article-title: Data labeling research for deep learning based fire detection system
  publication-title: 2019 International Conference on Systems of Collaboration Big Data
– start-page: 1
  year: 2019
  end-page: 4
  ident: bib3
  article-title: Hybrid feature learning and engineering based approach for face shape classification
  publication-title: 2019 International Conference on Intelligent Systems and Advanced Computing Sciences (ISACS)
– year: 2020
  ident: bib17
  article-title: Multi-task learning with deep neural networks: a Survey.
– volume: 12 (Feb.)
  start-page: 1159
  year: 2020
  end-page: 1169
  ident: bib33
  article-title: Text classification algorithms for mining unstructured data: a SWOT analysis
  publication-title: Int. J. Inf. Technol.
– volume: 555
  start-page: 403
  year: 2018
  end-page: 406
  ident: bib62
  article-title: Data management made simple
  publication-title: Nature
– start-page: 494
  year: 2020
  end-page: 500
  ident: bib81
  article-title: Towards mlops: a case study of ml pipeline platform
  publication-title: 2020 International Conference on Artificial Intelligence and Computer Engineering
– year: 2021
  ident: bib73
  article-title: Towards end-to-end deep learning for autonomous racing: on data collection and a unified architecture for steering and throttle prediction.
– year: 2020
  ident: bib53
  article-title: Neural unsupervised domain adaptation in NLP–a survey.
– start-page: 1
  year: 2021
  end-page: 13
  ident: bib4
  publication-title: Data-centric explanations: explaining training data of machine learning systems to promote transparency
– year: 2020
  ident: bib67
  article-title: Knowledge graph simple question answering for unseen domains.
– year: 2021
  ident: bib55
  article-title: A data quality-driven view of mlops.
– start-page: 90
  year: 2021
  end-page: 94
  ident: bib12
  article-title: Pseudo-label generation method based on wind turbine SCADA data
  publication-title: 2021 7th International Conference on Condition Monitoring of Machinery in Non-stationary Operations
– start-page: 1
  year: 2022
  end-page: 12
  ident: bib80
  publication-title: Factorized deep generative models for end-to-end trajectory generation with spatiotemporal validity constraints
– volume: 3 (Sep.)
  start-page: 5314
  year: 2018
  ident: bib59
  article-title: Research data management: a new role for academic/research librarians
  publication-title: Inter. Res. J.
– start-page: 559
  year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib31
  article-title: Big data quality framework: pre-processing data in weather monitoring application
– volume: 4
  start-page: 329
  issue: 9
  year: 2017
  ident: 10.1016/j.dsm.2023.06.001_bib49
  article-title: The 17 V’s of big data
  publication-title: Inter. Res. J. Eng. Tech.
– volume: 33
  start-page: 1328
  issue: 4
  year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib56
  article-title: A survey on data collection for machine learning: a big data-ai integration perspective
  publication-title: IEEE Trans. Knowl. Data Eng.
  doi: 10.1109/TKDE.2019.2946162
– start-page: 204
  year: 2022
  ident: 10.1016/j.dsm.2023.06.001_bib76
– volume: 2021
  issue: 12
  year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib45
  article-title: Deep double descent: where bigger models and more data hurt
  publication-title: J. Stat. Mech. Theor. Exp.
– volume: 31
  start-page: 15
  issue: 1
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib79
  article-title: Deep adversarial data augmentation for extremely low data regimes
  publication-title: IEEE Trans. Circ. Syst. Video Technol.
  doi: 10.1109/TCSVT.2020.2967419
– start-page: 82
  year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib25
  article-title: MLOps challenges in multi-organization setup: experiences from two real-world cases
– volume: 30
  start-page: 431
  issue: 4
  year: 2017
  ident: 10.1016/j.dsm.2023.06.001_bib48
  article-title: Big data technologies: a survey
  publication-title: J. King Saud Univ. Comput. Inf. Sci.
– year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib73
– year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib20
– volume: 12 (Feb.)
  start-page: 1159
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib33
  article-title: Text classification algorithms for mining unstructured data: a SWOT analysis
  publication-title: Int. J. Inf. Technol.
– start-page: 494
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib81
  article-title: Towards MLOps: a case study of ML pipeline platform
– year: 2017
  ident: 10.1016/j.dsm.2023.06.001_bib54
– ident: 10.1016/j.dsm.2023.06.001_bib43
– volume: 555
  start-page: 403
  issue: 7696
  year: 2018
  ident: 10.1016/j.dsm.2023.06.001_bib62
  article-title: Data management made simple
  publication-title: Nature
  doi: 10.1038/d41586-018-03071-1
– volume: 43 (Feb.)
  year: 2022
  ident: 10.1016/j.dsm.2023.06.001_bib2
  article-title: Developing future human-centered smart cities: critical analysis of smart city security, data management, and ethical challenges
  publication-title: Comp. Sci. Review
– volume: 8 (Jul.)
  start-page: 504
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib5
  article-title: PERL: pivot-based domain adaptation for pre-trained deep contextualized embedding models
  publication-title: Trans. Assoc. Comp. Linguistics
  doi: 10.1162/tacl_a_00328
– start-page: 1478
  year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib39
  article-title: Virtual big data for GAN based data augmentation
– year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib46
– start-page: 90
  year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib12
  article-title: Pseudo-label generation method based on wind turbine SCADA data
– year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib55
– start-page: 1316
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib74
  article-title: Learning from incomplete labeled data via adversarial data generation
– start-page: 109
  year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib38
  article-title: Who needs MLOps: what data scientists seek to accomplish and how can MLOps help?
– year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib35
– year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib22
– start-page: 24
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib47
  article-title: Ransomware detection with semi-supervised learning
– start-page: 366
  year: 2023
  ident: 10.1016/j.dsm.2023.06.001_bib72
– start-page: 248
  year: 2018
  ident: 10.1016/j.dsm.2023.06.001_bib28
  article-title: Data quality assessment for on-line monitoring and measuring system of power quality based on big data and data provenance theory
– year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib9
– volume: 71 (Aug.)
  start-page: 151
  year: 2016
  ident: 10.1016/j.dsm.2023.06.001_bib66
  article-title: A survey of big data management: taxonomy and state-of-the-art
  publication-title: J. Netw. Comput. Appl.
  doi: 10.1016/j.jnca.2016.04.008
– year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib67
– year: 2017
  ident: 10.1016/j.dsm.2023.06.001_bib57
– volume: 184 (1)
  year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib7
  article-title: An intelligent system for focused crawling from Big Data sources
  publication-title: Expert Syst. Appl.
– ident: 10.1016/j.dsm.2023.06.001_bib18
– year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib53
– ident: 10.1016/j.dsm.2023.06.001_bib42
– volume: 62
  start-page: 347
  issue: 3
  year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib69
  article-title: Implementing big data strategies: a managerial perspective
  publication-title: Bus. Horiz.
  doi: 10.1016/j.bushor.2019.02.001
– volume: 75
  start-page: 1391
  issue: 1
  year: 2023
  ident: 10.1016/j.dsm.2023.06.001_bib1
  article-title: Human verification over activity analysis via deep data mining
  publication-title: Comput. Mater. Continua (CMC)
  doi: 10.32604/cmc.2023.035894
– volume: 38
  start-page: 46
  issue: 1
  year: 2016
  ident: 10.1016/j.dsm.2023.06.001_bib21
  article-title: Unstructured data: an overview of the data of big data
  publication-title: Int. J. Comput. Trends Technol.
  doi: 10.14445/22312803/IJCTT-V38P109
– start-page: 400
  year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib10
– start-page: 1
  year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib4
– year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib6
– start-page: 1288
  year: 2014
  ident: 10.1016/j.dsm.2023.06.001_bib11
  article-title: Analysis of unstructured data: applications of text analytics and sentiment mining
– volume: 2
  start-page: 7
  issue: 1
  year: 2022
  ident: 10.1016/j.dsm.2023.06.001_bib30
  article-title: A method to create a synthetic population with social networks for geographically-explicit agent-based models
  publication-title: Comp. Urban Sci.
  doi: 10.1007/s43762-022-00034-1
– volume: 35
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib52
  article-title: Learn with diversity and from harder samples: improving the generalization of CNN-Based detection of computer-generated images
  publication-title: Forensic Sci. Int.: Digit. Invest.
– volume: 127 (Nov.)
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib37
  article-title: Large-scale machine learning systems in real-world industrial settings: a review of challenges and solutions
  publication-title: Inf. Softw. Technol.
– start-page: 206
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib77
  article-title: Semi-supervised active learning image classification method based on Tri-Training algorithm
– start-page: 1676
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib78
  article-title: Circular shift: an effective data augmentation method for convolutional neural network on image classification
– start-page: 158
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib32
  article-title: Location-based social network data generation based on patterns of life
– volume: 16
  start-page: 183
  issue: 11
  year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib64
  article-title: Labeling malicious communication samples based on semi-supervised deep neural network
  publication-title: China Commun.
  doi: 10.23919/JCC.2019.11.015
– volume: 57
  start-page: 115
  issue: May
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib41
  article-title: A survey on machine learning for data fusion
  publication-title: Inf. Fusion
  doi: 10.1016/j.inffus.2019.12.001
– start-page: 1
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib13
  article-title: Developments in MLflow: a system to accelerate the machine learning lifecycle
– year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib36
– start-page: 237
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib14
  article-title: Cervical cancer single cell image data augmentation using residual condition generative adversarial networks
– ident: 10.1016/j.dsm.2023.06.001_bib63
– volume: 23 (3)
  year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib16
  article-title: Designing environmentally efficient aquafeeds through the use of multicriteria decision support tools
  publication-title: Curr. Opinion Environ. Sci. Health
– year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib26
– volume: 10
  start-page: 2326
  issue: 2
  year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib51
  article-title: Energy theft detection using gradient boosting theft detector with feature engineering-based preprocessing
  publication-title: IEEE Trans. Smart Grid
  doi: 10.1109/TSG.2019.2892595
– ident: 10.1016/j.dsm.2023.06.001_bib44
– year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib17
– ident: 10.1016/j.dsm.2023.06.001_bib19
– volume: 35
  start-page: 137
  issue: 2
  year: 2015
  ident: 10.1016/j.dsm.2023.06.001_bib23
  article-title: Beyond the hype: big data concepts, methods, and analytics
  publication-title: Int. J. Info Manage.
  doi: 10.1016/j.ijinfomgt.2014.10.007
– start-page: 64
  year: 2021
  ident: 10.1016/j.dsm.2023.06.001_bib8
  article-title: Characterizing technical debt and antipatterns in AI-based systems: a systematic mapping study
– volume: 8
  start-page: 4991
  issue: 6
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib65
  article-title: A machine-learning-based data-centric misbehavior detection model for internet of vehicles
  publication-title: IEEE Internet Things J.
  doi: 10.1109/JIOT.2020.3035035
– volume: 12
  issue: 5
  year: 2022
  ident: 10.1016/j.dsm.2023.06.001_bib24
  article-title: Evaluation of freely available data profiling tools for health data research application: a functional evaluation review
  publication-title: BMJ Open
  doi: 10.1136/bmjopen-2021-054186
– volume: 3 (Sep.)
  start-page: 5314
  year: 2018
  ident: 10.1016/j.dsm.2023.06.001_bib59
  article-title: Research data management: a new role for academic/research librarians
  publication-title: Inter. Res. J.
– year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib27
– start-page: 1
  year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib3
  article-title: Hybrid feature learning and engineering based approach for face shape classification
– ident: 10.1016/j.dsm.2023.06.001_bib68
– ident: 10.1016/j.dsm.2023.06.001_bib61
– volume: 27
  start-page: 315
  issue: 2
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib60
  article-title: Mining social media for prescription medication abuse monitoring: a review and proposal for a data-centric framework
  publication-title: J. Am. Med. Inf. Assoc.
  doi: 10.1093/jamia/ocz162
– volume: 37
  start-page: 4437
  issue: 8
  year: 2022
  ident: 10.1016/j.dsm.2023.06.001_bib75
  article-title: Toward a real-time Smart Parking Data Management and Prediction (SPDMP) system by attributes representation learning
  publication-title: Int. J. Intell. Syst.
  doi: 10.1002/int.22725
– start-page: 1
  year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib34
  article-title: Data labeling research for deep learning based fire detection system
– start-page: 1393
  year: 2020
  ident: 10.1016/j.dsm.2023.06.001_bib40
  article-title: Semi-supervised deep learning seismic impedance inversion using generative adversarial networks
– start-page: 1
  year: 2022
  ident: 10.1016/j.dsm.2023.06.001_bib80
– volume: 47
  start-page: 17
  issue: 2
  year: 2018
  ident: 10.1016/j.dsm.2023.06.001_bib50
  article-title: Data lifecycle challenges in production machine learning: a survey
  publication-title: ACM SIGMOD Rec.
  doi: 10.1145/3299887.3299891
– start-page: 101
  year: 2022
  ident: 10.1016/j.dsm.2023.06.001_bib15
– start-page: 563
  year: 2019
  ident: 10.1016/j.dsm.2023.06.001_bib29
  article-title: A feature engineering framework for short-term earthquake prediction based on AETA data
– start-page: 69
  year: 2018
  ident: 10.1016/j.dsm.2023.06.001_bib71
  article-title: Big data quality assessment model for unstructured data
– volume: 54
  start-page: 758
  issue: 5
  year: 2018
  ident: 10.1016/j.dsm.2023.06.001_bib58
  article-title: A survey towards an integration of big data analytics to big insights for value-creation
  publication-title: Inf. Process. Manag.
  doi: 10.1016/j.ipm.2018.01.010
– start-page: 498
  year: 2017
  ident: 10.1016/j.dsm.2023.06.001_bib70
  article-title: Big data pre-processing: closing the data quality enforcement loop
StartPage 144
SubjectTerms Data management
Data preprocessing
Data-centric
Machine learning
MLOps
Semi-supervised learning
Technical debt
Title Systematic review of data-centric approaches in artificial intelligence and machine learning
URI https://dx.doi.org/10.1016/j.dsm.2023.06.001
https://doaj.org/article/8e693534bf4448dc99d2e0f5465995c4
Volume 6