
Advanced Spark Structured Streaming - Aggregations, Joins, Checkpointing

I wrote a blog post demonstrating advanced Spark Structured Streaming topics.

The post covers:

  • setting up a Kafka server
  • producing messages with Kafka
  • consuming tweets with Spark Structured Streaming
  • watermarking messages
  • parsing JSON data
  • performing aggregation queries on the stream of data
  • analyzing execution plans of queries
  • upserting data to Snowflake
  • checkpointing a structured stream
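
To give a feel for the watermarking and aggregation items in the list, here is a minimal plain-Python sketch of the idea. It is not Spark code and is not taken from the full post: the 10-minute window, the 5-minute allowed delay, the function names, and the sample events are all assumptions made for illustration. It also checks the watermark per event, whereas Spark advances its watermark between micro-batches, so treat it as a simplified model only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)    # tumbling window size (assumed value)
MAX_DELAY = timedelta(minutes=5)  # how late an event may arrive (assumed value)
EPOCH = datetime(1970, 1, 1)


def window_start(ts):
    """Floor an event time to the start of its tumbling window."""
    return ts - ((ts - EPOCH) % WINDOW)


def count_per_window(events):
    """Count events per (window start, key) in arrival order.

    The watermark is the latest event time seen so far minus MAX_DELAY.
    An event is dropped when its whole window already lies at or before
    the watermark; otherwise it is counted, even if it arrived out of order.
    """
    counts = defaultdict(int)
    dropped = []
    max_seen = None
    for ts, key in events:
        start = window_start(ts)
        if max_seen is not None and start + WINDOW <= max_seen - MAX_DELAY:
            dropped.append((ts, key))
            continue
        counts[(start, key)] += 1
        max_seen = ts if max_seen is None else max(max_seen, ts)
    return dict(counts), dropped


# Made-up sample data: (event time, hashtag) in the order the events arrive.
def at(hour, minute):
    return datetime(2018, 2, 11, hour, minute)


sample = [
    (at(12, 1), "spark"),
    (at(12, 4), "spark"),
    (at(12, 21), "kafka"),
    (at(12, 3), "spark"),   # window 12:00-12:10 is behind the watermark (12:16)
    (at(12, 18), "spark"),  # late, but window 12:10-12:20 is still open
]
counts, dropped = count_per_window(sample)
for (start, key), n in sorted(counts.items()):
    print(start.strftime("%H:%M"), key, n)
print("dropped:", len(dropped))
```

In Spark itself the same intent is expressed with `withWatermark` on the event-time column followed by a windowed `groupBy`, and the engine decides when state for old windows can be discarded.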

You can find the full blog post here.

A small preview:

[Screenshot: preview of the blog post]

This post is licensed under CC BY 4.0 by the author.
