Towards the Specification and Generation of Time Series Datasets from Data Lakes
| Published in | IEEE International Requirements Engineering Conference Workshops (Online), pp. 302-306 |
|---|---|
| Main Authors | , , , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.09.2023 |
| Subjects | |
| ISSN | 2770-6834 |
| DOI | 10.1109/REW57809.2023.00057 |
| Summary: | More and more organizations are building data lakes to store the information they generate. This information is considered a valuable asset that, if properly analyzed, can support better-informed decisions. However, since the analyses to be performed are often not known in advance, the data are stored in a raw format. Any application built on top of a data lake must therefore carefully elicit which data will be used for a particular analysis and how those data will be transformed so that they fit together into a dataset. This data selection and preparation task is typically performed by data scientists, who write large and complicated scripts in data management languages to extract and transform the required data. This reduces their productivity, since they must write large pieces of highly similar code. It also makes it difficult for domain experts to participate in the process, because they have little understanding of these scripts. To alleviate this problem, this work introduces a work-in-progress version of a high-level declarative language for specifying the requirements that a dataset coming from a data lake must satisfy. A specification written in this language is then processed to automatically generate the specified dataset, allowing data scientists and domain experts to remain agnostic about how exactly the data are retrieved and transformed. |
|---|---|