Data pipeline design anti-patterns
- A loose “bunch of scripts” with no shared structure
- A single run-everything script
- Hacky “homemade” dependency control (a minimal orchestration sketch follows this list)
- Hard-coded values, e.g. data sources, queries… (see the configuration sketch below)
- Non-idempotent ETL jobs that create duplicate data if run twice, or leave holes in the data if not run at all (see the idempotent-load sketch below)
- No data quality checks or data ingestion monitoring (see the validation sketch below)
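Homemade dependency control usually means cron offsets and `sleep` calls standing in for real ordering. A dedicated orchestrator makes the dependencies explicit instead. Below is a minimal sketch of what that could look like, assuming Apache Airflow 2.4+; the DAG id, task names, and callables are hypothetical placeholders, not part of the original post.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical step implementations; in a real pipeline these would
# live in their own modules.
def extract():
    print("pulling source data")

def transform():
    print("cleaning and joining")

def load():
    print("writing to the warehouse")

with DAG(
    dag_id="example_pipeline",      # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # The DAG encodes the ordering, not ad-hoc sleeps or cron offsets.
    t_extract >> t_transform >> t_load
```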
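Hard-coded data sources and queries can be lifted into configuration so the same code runs against dev and prod. Here is a minimal sketch using only the standard library and environment variables; the variable names (`PIPELINE_DB_URL`, etc.) are invented for illustration.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineConfig:
    """All externally tunable values in one place, none hard-coded."""
    db_url: str
    source_path: str
    batch_size: int

def load_config() -> PipelineConfig:
    # Hypothetical variable names; a missing required value fails fast
    # with a KeyError instead of silently running against the wrong source.
    return PipelineConfig(
        db_url=os.environ["PIPELINE_DB_URL"],
        source_path=os.environ["PIPELINE_SOURCE_PATH"],
        batch_size=int(os.environ.get("PIPELINE_BATCH_SIZE", "1000")),
    )
```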
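The duplicate-data/holes problem disappears when each job (re)writes exactly the partition it owns: running it twice yields the same state, and backfilling a missed day fills the hole. A sketch of the delete-then-insert pattern follows, using stdlib `sqlite3` as a stand-in for the warehouse; the table and column names are made up.

```python
import sqlite3

def load_day(conn: sqlite3.Connection, day: str, rows: list[tuple]) -> None:
    """Idempotently load one day's data: rerunning replaces, never duplicates."""
    with conn:  # one transaction: delete + insert commit (or roll back) together
        # Remove anything a previous, possibly partial, run wrote for this day.
        conn.execute("DELETE FROM events WHERE event_date = ?", (day,))
        conn.executemany(
            "INSERT INTO events (event_date, user_id, value) VALUES (?, ?, ?)",
            rows,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, value REAL)")
rows = [("2024-01-01", 1, 9.5), ("2024-01-01", 2, 3.2)]
load_day(conn, "2024-01-01", rows)
load_day(conn, "2024-01-01", rows)  # second run: still exactly 2 rows
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # -> 2
```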
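Even a handful of assertions after each load catches silent failures before downstream jobs consume bad data. A minimal validation sketch; the row shape, column name, and thresholds are illustrative assumptions.

```python
def check_batch(rows: list[dict], min_rows: int = 1) -> None:
    """Cheap post-ingestion checks: fail loudly instead of loading bad data."""
    # Guard against an empty or suspiciously small batch.
    if len(rows) < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, got {len(rows)}")
    # Guard against a key column going null (hypothetical column name).
    null_ids = sum(1 for r in rows if r.get("user_id") is None)
    if null_ids:
        raise ValueError(f"{null_ids} rows are missing user_id")

check_batch([{"user_id": 1}, {"user_id": 2}])  # passes silently
```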