Archives
- 03 Mar How I halved the runtime of my PostgreSQL dbt model using DuckDB
- 10 Feb Python project setup best practices
- 20 Jan Load balancers feature comparison: Envoy vs HAProxy vs Nginx
- 20 Jan Python tips from "Advanced Python Mastery" course
- 18 Mar My favourite papers on columnar databases
- 18 Mar Modern data engineering stack
- 18 Mar Important skills for data engineers
- 18 Mar General guidelines for design of batch jobs
- 18 Mar Data pipeline design anti-patterns
- 18 Mar Data extraction and transformation design patterns
- 02 Aug Kubernetes auto-scaling on relative resource usage
- 25 Jan My view on responsibilities of a modern data engineer
- 02 Nov Introduction to PostgreSQL High availability with pg_auto_failover
- 02 Nov Envoy proxy and modern load balancing
- 21 May How to get into the top 30% of the House Prices: Advanced Regression Kaggle competition with 50 lines of code
- 22 Jun Tips for easier development on AWS
- 13 May Apache Spark Presentation
- 11 May My articles for Sonra Intelligence
- 16 Mar Loading Data into Snowflake Data Warehouse and performance of joins
- 14 Mar My favorite features of Snowflake Data Warehouse
- 14 Mar Caching in Snowflake Data Warehouse
- 11 Feb Using Spark Structured Streaming to upsert Kafka messages into a database
- 11 Feb Clustering keys in Snowflake
- 11 Feb Advanced Spark Structured Streaming - Aggregations, Joins, Checkpointing
- 07 Feb Writing UDAFs on Snowflake
- 01 Feb Window Functions on Snowflake
- 01 Feb Apache Airflow for data pipelines and ETL management
- 11 Nov Ingesting realtime tweets using Apache Kafka, Tweepy and Python
- 11 Nov Implementing the Speed Layer of Lambda Architecture using Spark Structured Streaming
- 11 Nov Implementing the Serving Layer of Lambda Architecture using Redshift
- 11 Nov Implementing the Batch Layer of Lambda Architecture using S3, Redshift and Apache Kafka
- 10 Nov Introduction to Lambda Architecture
- 03 Nov Window functions in PostgreSQL
- 30 Sep T-SQL Window functions syntax
- 24 Sep Advanced data analysis for cBioPortal
- 04 Sep SQL Server Security Basics - Logins and Users
- 27 Aug Spark vs Pandas benchmark: Why you should use Spark 2.1 only for really big data
- 27 Aug GSoC Weekly posts Summary
- 26 Aug Google Summer of Code 2017 Summary
- 25 Aug GSOC: Week 13
- 18 Aug GSOC: Week 12
- 10 Aug GSOC: Week 11
- 03 Aug GSOC: Week 10
- 26 Jul GSOC: Week 9
- 20 Jul GSOC: Week 8
- 13 Jul GSOC: Week 7
- 06 Jul GSOC: Week 6
- 29 Jun GSOC: Week 5
- 22 Jun GSOC: Week 4
- 18 Jun Get Spark Classifier metrics using the Confusion Matrix
- 15 Jun GSOC: Week 3
- 12 Jun How to fix 'Task not serializable' issues in Apache Spark
- 08 Jun GSOC: Week 2
- 01 Jun GSOC: Week 1
- 31 May Detailed plans for my GSoC project
- 26 May GSOC: Community Bonding week 3
- 21 May GSOC: Community Bonding week 2
- 21 May GSOC: Community Bonding week 1