- 15 Apr 2017 » An example of Lambda Architecture to analyse Twitter's tweets with Spark, Spark-streaming, Cassandra, Kafka, Twitter4j, Akka and Akka-http by Narayan Kumar
- 25 Mar 2017 » Applying the Lambda Architecture on Microsoft Azure cloud by Vladimir Dorokhov
- 16 Jul 2016 » An example Lambda Architecture for analytics of IoT data with spark, cassandra, Kafka and Akka by Achim Nierbeck
- 27 Aug 2014 » A RAD Stack: Kafka, Storm, Hadoop, and Druid by Druid Committers
- 24 Jul 2014 » Deploop: A Lambda Architecture Provisioning Tool by Javi Roman
- 01 Jul 2014 » Nathan Marz's Big Data book by Michael Hausenblas
- 30 Jun 2014 » Speed Components by Michael Hausenblas
- 30 Jun 2014 » Serving Components by Michael Hausenblas
- 30 Jun 2014 » Batch Components by Michael Hausenblas
- 22 Jun 2014 » Buildoop: A Lambda Architecture ecosystem builder by Javi Roman
- 20 Jan 2014 » Lambda Architecture: A state-of-the-art by Pere Ferrera
- 19 Jan 2014 » An example Lambda Architecture for real-time analysis of hashtags using Trident, Hadoop and Splout SQL by Pere Ferrera
- 25 Dec 2013 » Twitter Summingbird by Michael Hausenblas
- 25 Dec 2013 » Lambdoop by Michael Hausenblas
- 25 Dec 2013 » Issues in Combined Static and Dynamic Data Management by Michael Hausenblas
- 24 Dec 2013 » Where Polyglot Persistence meets the Lambda Architecture by Michael Hausenblas
- 11 Dec 2013 » A real-time architecture using Hadoop and Storm by Nathan Bijnens
- 10 Dec 2013 » Why are we doing this and why are we doing this now? by Michael Hausenblas
What is the Lambda Architecture?
Nathan Marz came up with the term Lambda Architecture (LA) for a generic, scalable and fault-tolerant data processing architecture, based on his experience working on distributed data processing systems at BackType and Twitter.
The LA aims to satisfy the need for a robust system that is fault-tolerant against both hardware failures and human mistakes, that can serve a wide range of workloads and use cases, and that supports low-latency reads and updates. The resulting system should be linearly scalable, and it should scale out rather than up.
Here’s what it looks like from a high-level perspective:
- All data entering the system is dispatched to both the batch layer and the speed layer for processing.
- The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing the batch views.
- The serving layer indexes the batch views so that they can be queried in a low-latency, ad-hoc way.
- The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
- Any incoming query can be answered by merging results from batch views and real-time views.
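To make the data flow above concrete, here is a minimal, single-process sketch in Python. It assumes a hypothetical hashtag-count use case, and the class and method names (`LambdaSketch`, `ingest`, `run_batch`, `query`) are illustrative only, not part of any framework. A real deployment would use distributed components for each layer, such as Hadoop or Spark for batch and Storm or Spark Streaming for speed. The sketch shows incoming data being sent to both layers, batch views being recomputed from the immutable master dataset, and queries merging the batch and real-time views.

```python
from collections import defaultdict


class LambdaSketch:
    """Toy model of the Lambda Architecture layers (hypothetical names, not a real API)."""

    def __init__(self):
        self.master_dataset = []                # batch layer: append-only raw events
        self.batch_view = {}                    # serving layer: result of the last batch run
        self.realtime_view = defaultdict(int)   # speed layer: counts for data not yet batched

    def ingest(self, event):
        # All incoming data is dispatched to both the batch and the speed layer.
        self.master_dataset.append(event)       # raw data is only ever appended, never updated
        self.realtime_view[event["key"]] += 1   # incremental, low-latency update

    def run_batch(self):
        # Recompute the batch view from scratch over the whole master dataset.
        # Because the raw data is immutable, a buggy view can be fixed by rerunning this.
        view = defaultdict(int)
        for event in self.master_dataset:
            view[event["key"]] += 1
        self.batch_view = dict(view)
        # The batch view now covers everything ingested so far, so the speed
        # layer can discard its state and track only newer data.
        self.realtime_view = defaultdict(int)

    def query(self, key):
        # Answer a query by merging the batch view with the real-time view.
        return self.batch_view.get(key, 0) + self.realtime_view.get(key, 0)


if __name__ == "__main__":
    la = LambdaSketch()
    for tag in ["spark", "kafka", "spark"]:
        la.ingest({"key": tag})
    la.run_batch()                  # batch view: spark=2, kafka=1
    la.ingest({"key": "spark"})     # recent event seen only by the speed layer so far
    print(la.query("spark"))        # 3
```

Recomputing the batch view in full, rather than updating it in place, is the design choice that provides tolerance to human mistakes. The speed layer holds only the small, recent delta, so any error there is overwritten at the next batch run.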
- Big Data, book by Nathan Marz and James Warren
- Applying the Big Data Lambda Architecture, Dr. Dobb’s article by Michael Hausenblas
- The Lambda architecture: principles for architecting realtime Big Data systems, blog post by James Kinley
- Lambda Architecture: Achieving Velocity and Volume with Big Data, article by Christian Prokopp
- Lambda Architecture with Apache Spark by Michael Hausenblas
Who is behind this?
See the about us section for details.