Data pipeline design anti-patterns
- A loose “bunch of scripts” with no shared structure
- A single run-everything script
- Hacky “homemade” dependency control (a minimal orchestration sketch follows this list)
- Hard-coded values, e.g. data sources, queries… (see the configuration sketch below)
- Non-idempotent ETL jobs that create duplicate data if run twice, or leave holes in the data if not run at all (see the idempotent-load sketch below)
- No data quality checks or data ingestion monitoring (see the validation sketch below)
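Homemade dependency control usually means cron offsets and `sleep` calls standing in for real ordering. A dedicated orchestrator makes the dependencies explicit instead. Below is a minimal sketch of what that could look like, assuming Apache Airflow 2.4+; the DAG id, task names, and callables are hypothetical placeholders, not part of the original post.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical step implementations; in a real pipeline these would
# live in their own modules.
def extract():
    print("pulling source data")

def transform():
    print("cleaning and joining")

def load():
    print("writing to the warehouse")

with DAG(
    dag_id="example_pipeline",      # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # The DAG encodes the ordering, not ad-hoc sleeps or cron offsets.
    t_extract >> t_transform >> t_load
```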
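Hard-coded data sources and queries can be lifted into configuration so the same code runs against dev and prod. Here is a minimal sketch using only the standard library and environment variables; the variable names (`PIPELINE_DB_URL`, etc.) are invented for illustration.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineConfig:
    """All externally tunable values in one place, none hard-coded."""
    db_url: str
    source_path: str
    batch_size: int

def load_config() -> PipelineConfig:
    # Hypothetical variable names; a missing required value fails fast
    # with a KeyError instead of silently running against the wrong source.
    return PipelineConfig(
        db_url=os.environ["PIPELINE_DB_URL"],
        source_path=os.environ["PIPELINE_SOURCE_PATH"],
        batch_size=int(os.environ.get("PIPELINE_BATCH_SIZE", "1000")),
    )
```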
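The duplicate-data/holes problem disappears when each job (re)writes exactly the partition it owns: running it twice yields the same state, and backfilling a missed day fills the hole. A sketch of the delete-then-insert pattern follows, using stdlib `sqlite3` as a stand-in for the warehouse; the table and column names are made up.

```python
import sqlite3

def load_day(conn: sqlite3.Connection, day: str, rows: list[tuple]) -> None:
    """Idempotently load one day's data: rerunning replaces, never duplicates."""
    with conn:  # one transaction: delete + insert commit (or roll back) together
        # Remove anything a previous, possibly partial, run wrote for this day.
        conn.execute("DELETE FROM events WHERE event_date = ?", (day,))
        conn.executemany(
            "INSERT INTO events (event_date, user_id, value) VALUES (?, ?, ?)",
            rows,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, value REAL)")
rows = [("2024-01-01", 1, 9.5), ("2024-01-01", 2, 3.2)]
load_day(conn, "2024-01-01", rows)
load_day(conn, "2024-01-01", rows)  # second run: still exactly 2 rows
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # -> 2
```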
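Even a handful of assertions after each load catches silent failures before downstream jobs consume bad data. A minimal validation sketch; the row shape, column name, and thresholds are illustrative assumptions.

```python
def check_batch(rows: list[dict], min_rows: int = 1) -> None:
    """Cheap post-ingestion checks: fail loudly instead of loading bad data."""
    # Guard against an empty or suspiciously small batch.
    if len(rows) < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, got {len(rows)}")
    # Guard against a key column going null (hypothetical column name).
    null_ids = sum(1 for r in rows if r.get("user_id") is None)
    if null_ids:
        raise ValueError(f"{null_ids} rows are missing user_id")

check_batch([{"user_id": 1}, {"user_id": 2}])  # passes silently
```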