In the ever-changing world of data and analytics, it can be challenging to assess how your organization is doing compared to the rest of the market and how to frame your data strategy. To help organizations see how they compare, Gartner defined “The Analytics Continuum,” which lays out seven high-level tasks and plots them on a scale of analytics maturity and competitive advantage.
THE TRADITIONAL ETL CHALLENGE
The transition from “What happened?” to “Why did it happen?” on “The Analytics Continuum” is the most difficult step to take because answering the second question requires an entirely new approach to data management. This step requires access to data in real time, as the events happen, and a traditional extract, transform, load (ETL) process runs too infrequently to support actionable feedback. For example, let’s say you would like to send real-time targeted ads via push notifications to customers in your store. A traditional ETL process might only be kicked off once a day when the store is closing, well after the customer has left the store. So how do you solve this problem in today’s world?
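The timing gap in the store example can be made concrete with a small sketch. The closing hour, the visit times, and the two-second streaming latency below are illustrative assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def batch_processing_time(event_time: datetime, closing_hour: int = 21) -> datetime:
    """Return when a once-a-day batch job, run at store closing, first sees the event."""
    run = event_time.replace(hour=closing_hour, minute=0, second=0, microsecond=0)
    if run < event_time:
        # The event arrived after today's run, so it waits for tomorrow's.
        run += timedelta(days=1)
    return run


def ad_reaches_customer(processed_at: datetime, left_store_at: datetime) -> bool:
    """A push notification is only useful if it is sent before the customer leaves."""
    return processed_at <= left_store_at


# Assumed visit: the customer enters at 14:05 and leaves at 14:35.
entered = datetime(2024, 1, 15, 14, 5)
left = datetime(2024, 1, 15, 14, 35)

batch_seen = batch_processing_time(entered)    # processed at closing time
stream_seen = entered + timedelta(seconds=2)   # assumed streaming latency

batch_in_time = ad_reaches_customer(batch_seen, left)
stream_in_time = ad_reaches_customer(stream_seen, left)
```

Under these assumptions the batch job sees the visit hours after the customer has gone, while the streaming path reacts while they are still in the store.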
THE MODERN DATA ARCHITECTURE SOLUTION
A modern data architecture (MDA) allows you to process real-time streaming events in addition to more traditional data pipelines. There are two primary approaches when building an MDA for your organization, each with its own strengths and weaknesses. The first approach is called a Lambda architecture and has two different components: batch processing and stream processing. The second approach is called a Kappa architecture, where all data in your environment is treated as a stream.
LAMBDA ARCHITECTURE OVERVIEW
The main advantage of implementing a Lambda-based MDA is that you can typically continue to use your existing batch ETL processes as the batch component. The only time this wouldn’t be true is if your existing systems are unable to handle the throughput of data your organization is seeing. A well-known weakness of Lambda is that you now have to manage and maintain two separate systems to acquire data.
Lambda architecture example
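To illustrate how the batch and stream components can work side by side, here is a minimal in-memory sketch. The class name `LambdaPipeline`, the event fields `store` and `amount`, and the idea of summing sales per store are assumptions made for the example; a real deployment would use separate batch and streaming systems rather than one Python object.

```python
from collections import Counter


class LambdaPipeline:
    """Toy Lambda-style pipeline: a batch view plus a real-time (speed) view."""

    def __init__(self):
        self.master_log = []         # append-only record of every event
        self.batch_view = Counter()  # totals as of the last batch run
        self.speed_view = Counter()  # totals for events since the last batch run

    def ingest(self, event):
        # Every event is stored for the batch job and also applied immediately
        # by the stream component.
        self.master_log.append(event)
        self.speed_view[event["store"]] += event["amount"]

    def run_batch(self):
        # The batch component recomputes totals from the full history,
        # after which the real-time view can be reset.
        view = Counter()
        for event in self.master_log:
            view[event["store"]] += event["amount"]
        self.batch_view = view
        self.speed_view = Counter()

    def query(self, store):
        # Queries merge both views so results are complete and current.
        return self.batch_view[store] + self.speed_view[store]


pipeline = LambdaPipeline()
pipeline.ingest({"store": "A", "amount": 20})
pipeline.ingest({"store": "A", "amount": 5})
total_before_batch = pipeline.query("A")   # served by the speed view only

pipeline.run_batch()
pipeline.ingest({"store": "A", "amount": 10})
total_after_batch = pipeline.query("A")    # batch view plus one new event
```

Note that the summing logic appears twice, once in `ingest` and once in `run_batch`. That duplication is the maintenance cost of running two separate systems.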
KAPPA ARCHITECTURE OVERVIEW
The biggest advantage of Kappa architecture is that it simplifies the Lambda architecture by allowing you to use only streaming services as the main source of data. This reduces the number of services and the amount of code your organization has to maintain. Treating every data point in your organization as a streaming event also provides the ability to ‘time travel’ to any point in time and see the state of all of your organization’s data at that moment. One downside of Kappa is the need to re-process events in the case of errors; however, access to affordable, elastic compute makes this a minor issue.
Kappa architecture example
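The ‘time travel’ and re-processing ideas can be sketched with a single ordered event log from which state is always derived by replay. The `Event` fields, the inventory scenario, and the `replay` helper are hypothetical names chosen for this example, and the log is assumed to be ordered by timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int      # logical timestamp; the log is assumed to be ordered by it
    sku: str
    delta: int   # change in stock level


def apply_event(state, event):
    """Default processing logic: accumulate stock changes per SKU."""
    state[event.sku] = state.get(event.sku, 0) + event.delta


def replay(log, up_to_ts=None, apply=apply_event):
    """Rebuild state by replaying the log, optionally stopping at a point in time."""
    state = {}
    for event in log:
        if up_to_ts is not None and event.ts > up_to_ts:
            break
        apply(state, event)
    return state


log = [
    Event(1, "widget", 10),
    Event(2, "widget", -3),
    Event(3, "gadget", 5),
    Event(4, "widget", -2),
]

current = replay(log)              # {"widget": 5, "gadget": 5}
as_of_2 = replay(log, up_to_ts=2)  # {"widget": 7}


def apply_sales_only(state, event):
    """Alternative logic, standing in for a corrected version of the processing code."""
    if event.delta < 0:
        state[event.sku] = state.get(event.sku, 0) - event.delta


# Re-processing after a logic change is just another replay of the same log.
units_sold = replay(log, apply=apply_sales_only)  # {"widget": 5}
```

Because the log is the only source of truth, fixing an error means replaying events through corrected logic rather than patching a separate batch system.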
Choosing the correct modern data architecture is an important step in crafting your organization’s data strategy. This involves carefully assessing your organization’s current-state architecture and planning for maximum flexibility to best serve the consumers of this data. Both Kappa and Lambda architectures will provide a strong foundation when constructing a broader data-oriented business.
Author: Chandra Bhaskar