D

Data Lake

A data lake is a centralized repository storing raw data in native format at any scale for analytics.

What is a Data Lake?

A data lake is a centralized repository that stores structured, semi-structured, and unstructured data in its raw, native format at any scale. Because no schema is imposed when the data is written, the same stored data can serve diverse analytics and machine learning workloads.

Data lake characteristics

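  • Schema-on-read: structure is applied when data is read, not when it is written
  • Raw data storage: data is kept in its native format, as it arrived
  • Multiple data types: structured, semi-structured, and unstructured data live side by side
  • Scalable storage: capacity grows with data volume
  • Analytics-ready: the stored data can feed analytics and machine learning workloads directly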
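
Schema-on-read is the least intuitive of these characteristics, so a small illustration may help. The following is a minimal Python sketch, not a reference implementation: the sample records, the field names, and the helper read_with_schema are all hypothetical and stand in for raw files in a lake and for a query engine. It shows two consumers applying different schemas to the same untouched raw data.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
# Nothing was validated or reshaped at write time.
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.90", "ts": "2024-01-05"}',
    '{"user": "bo", "amount": 5, "channel": "web"}',
    '{"user": "cy"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    schema maps field name -> cast function. Fields missing from a
    record become None; fields not named in the schema are ignored.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers read the same raw data with different schemas.
billing_schema = {"user": str, "amount": float}
marketing_schema = {"user": str, "channel": str}

billing_rows = read_with_schema(RAW_EVENTS, billing_schema)
marketing_rows = read_with_schema(RAW_EVENTS, marketing_schema)

for row in billing_rows:
    print(row)
```

In this sketch the raw records are never rewritten: the billing view casts amount to a float, the marketing view ignores it entirely, and records missing a field still load with None in that position. A schema-on-write system would instead have to settle those questions before the data was stored.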

Common misconceptions

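  • "Data lake equals data warehouse": The two serve different purposes. A warehouse holds curated data modeled to a schema defined before loading, while a lake keeps raw data and applies structure when it is read.
  • "Data lakes are just cheap storage": Low-cost storage is only part of the picture. A lake needs governance, such as cataloging, access control, and data quality checks, to be useful.
  • "Dump everything in": Without metadata and ongoing management, a lake becomes a data swamp, where data exists but cannot be found or trusted.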