data-engineering 25
- How I halved the runtime of my PostgreSQL dbt model using DuckDB
- Modern data engineering stack
- Important skills for data engineers
- General guidelines for design of batch jobs
- Data pipeline design anti-patterns
- Data extraction and transformation design patterns
- My view on responsibilities of a modern data engineer
- Apache Spark Presentation
- My articles for Sonra Intelligence
- Loading Data into Snowflake Data Warehouse and performance of joins
- My favorite features of Snowflake Data Warehouse
- Using Spark Structured Streaming to upsert Kafka messages into a database
- Clustering keys in Snowflake
- Advanced Spark Structured Streaming - Aggregations, Joins, Checkpointing
- Writing UDAFs on Snowflake
- Apache Airflow for data pipelines and ETL management
- Ingesting realtime tweets using Apache Kafka, Tweepy and Python
- Implementing the Speed Layer of Lambda Architecture using Spark Structured Streaming
- Implementing the Serving Layer of Lambda Architecture using Redshift
- Implementing the Batch Layer of Lambda Architecture using S3, Redshift and Apache Kafka
- Introduction to Lambda Architecture
- Window functions in PostgreSQL
- T-SQL Window functions syntax
- Spark vs Pandas benchmark: Why you should use Spark 2.1 only for really big data
- How to fix 'Task not serializable' issues in Apache Spark