With that said, there are generally three principles to follow when building and using a data lake:
- to store all data in its original format, and maintain a curated layer in an open format.
- to have a foundational compute layer that supports all of the core lakehouse use cases, such as ETL with or without stream processing, data science and machine learning, and SQL analytics on the data lake.
- to be able to integrate new or additional use cases that are not part of the core lakehouse use cases.
On the last point, the curated data lake, the foundational compute layer, and the surrounding services and tools become key requirements for supporting easy integration.
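The first principle can be sketched with a minimal, stdlib-only Python example (the zone paths, schema, and helper names are hypothetical, for illustration only): the raw zone lands the payload byte-for-byte in its original format, while the curated zone enforces a schema and rewrites into an open format. CSV stands in here for brevity; in practice the curated layer would use an open columnar format such as Parquet, Delta, or Iceberg on object storage.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical zone layout for illustration; a real lake sits on object storage.
BASE = Path(tempfile.mkdtemp())
RAW = BASE / "raw" / "events"
CURATED = BASE / "curated" / "events"

def ingest_raw(payload: str, name: str) -> Path:
    """Land the source payload exactly as received (original format)."""
    RAW.mkdir(parents=True, exist_ok=True)
    path = RAW / name
    path.write_text(payload)
    return path

def curate(raw_path: Path) -> Path:
    """Parse the raw file, enforce a fixed schema, and rewrite it into an
    open format (CSV here; Parquet/Delta/Iceberg in practice)."""
    records = [json.loads(line) for line in raw_path.read_text().splitlines()]
    CURATED.mkdir(parents=True, exist_ok=True)
    out = CURATED / (raw_path.stem + ".csv")
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["user_id", "event", "ts"])
        writer.writeheader()
        for r in records:
            # Keep only curated-schema fields; the raw copy stays intact upstream.
            writer.writerow({k: r.get(k) for k in writer.fieldnames})
    return out

raw = ingest_raw(
    '{"user_id": 1, "event": "click", "ts": "2021-08-02", "extra": true}\n'
    '{"user_id": 2, "event": "view", "ts": "2021-08-02"}\n',
    "batch1.json",
)
curated = curate(raw)
print(curated.read_text())
```

The point of the two-zone split is that the raw copy is never mutated, so the curated layer can always be rebuilt with a new schema without re-ingesting from the source systems.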
August 2, 2021