Bistro | General-purpose data processing engine for both batch and stream analytics. It is based on a novel data model that represents data via functions and processes it via column operations, rather than relying only on set operations as conventional approaches like MapReduce or SQL do. |
IBM Streams | Platform for distributed processing and real-time analytics. Integrates with many popular technologies in the Big Data ecosystem (Kafka, HDFS, Spark, etc.). |
Apache Hadoop | Framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling), and HDFS (distributed file system). |
Tigon | High-throughput, real-time stream processing framework. |
Pachyderm | Data storage platform built on Docker and Kubernetes that provides reproducible data processing and analysis. |
Polyaxon | Platform for reproducible and scalable machine learning and deep learning. |