D

Data Pipeline

A data pipeline automates the flow of data from source systems through processing stages to destination systems.

What is a Data Pipeline?

A data pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next, automating data movement and transformation.
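As a minimal sketch of this definition, stages connected in series can be modeled as functions applied in order, each receiving the previous stage's output. The stage names (parse, drop_invalid, to_cents), the run_pipeline helper, and the sample data are hypothetical and chosen only for illustration; they do not come from any particular library.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[list], list]

def parse(rows: List[str]) -> List[dict]:
    """Hypothetical stage: split 'name,amount' strings into records."""
    records = []
    for row in rows:
        name, amount = row.split(",")
        records.append({"name": name.strip(), "amount": amount.strip()})
    return records

def drop_invalid(records: List[dict]) -> List[dict]:
    """Hypothetical stage: keep only records with a non-negative numeric amount."""
    return [r for r in records if r["amount"].replace(".", "", 1).isdigit()]

def to_cents(records: List[dict]) -> List[dict]:
    """Hypothetical stage: add an integer amount in cents."""
    return [{**r, "amount_cents": round(float(r["amount"]) * 100)} for r in records]

def run_pipeline(data: list, stages: List[Stage]) -> list:
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, data)

raw = ["alice, 12.50", "bob, n/a", "carol, 3"]
result = run_pipeline(raw, [parse, drop_invalid, to_cents])
print(result)
```

Because each stage only depends on the shape of its input, stages can be reordered, replaced, or tested in isolation, which is the main appeal of the series arrangement.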

Pipeline components

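A pipeline typically has five components: sources that supply the data, transformations that process it, destinations that receive the results, orchestration that schedules and sequences the stages, and monitoring that tracks the health of each run.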
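The sketch below shows one way these five components could fit together in a single process. Everything in it is an assumption made for illustration: the in-memory source, the ListDestination class, the orchestrate function, and the metric names are hypothetical, and a production pipeline would usually delegate orchestration and monitoring to dedicated tools.

```python
import logging
from typing import Callable, Iterable, Iterator, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def source() -> Iterator[dict]:
    """Source: an in-memory stand-in for a database, file, or API."""
    yield from [
        {"user": "a", "ms": 120},
        {"user": "b", "ms": -5},
        {"user": "c", "ms": 340},
    ]

def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Transformation: drop negative durations, convert ms to seconds."""
    for r in records:
        if r["ms"] >= 0:
            yield {"user": r["user"], "seconds": r["ms"] / 1000}

class ListDestination:
    """Destination: collects rows in a list instead of a real warehouse."""
    def __init__(self) -> None:
        self.rows: List[dict] = []

    def write(self, records: Iterable[dict]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before

def orchestrate(src: Callable[[], Iterable[dict]],
                transforms: List[Callable[[Iterable[dict]], Iterable[dict]]],
                dest: ListDestination) -> dict:
    """Orchestration: wire the stages in order and run them once.
    Monitoring: count records read and written, and log failures."""
    metrics = {"read": 0, "written": 0}

    def counted(records: Iterable[dict]) -> Iterator[dict]:
        for r in records:
            metrics["read"] += 1
            yield r

    try:
        stream: Iterable[dict] = counted(src())
        for t in transforms:
            stream = t(stream)
        metrics["written"] = dest.write(stream)
    except Exception:
        log.exception("pipeline run failed")
        raise
    log.info("pipeline run finished: %s", metrics)
    return metrics

dest = ListDestination()
metrics = orchestrate(source, [transform], dest)
```

Comparing the read and written counts after each run is a simple monitoring signal: a sudden gap between them suggests that a transformation is rejecting more records than expected.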

Common misconceptions

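  • "Pipeline equals ETL": ETL (extract, transform, load) is only one type of pipeline; ELT and streaming pipelines are others.
  • "Pipelines are set and forget": Pipelines require ongoing monitoring and maintenance as sources, schemas, and data volumes change.
  • "Real-time is always better": Batch processing remains a valid choice when results are not needed immediately, for example in nightly reporting.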