Apache Kafka is a distributed commit log service that functions much like a publish/subscribe messaging system, but with better throughput, built-in partitioning, replication, and fault tolerance. Kafka provides the following:
- Persistent messaging with O(1) disk structures that provide constant-time performance, even with terabytes of stored messages.
- High throughput, supporting hundreds of thousands of messages per second, even on modest hardware.
- Explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
- Support for parallel data load into Hadoop.

The following topics describe how to work with Kafka:
- Setting up Kafka: Describes how to set up Kafka after installation.
- Integrating with Kafka: Describes how to configure security, manage multiple Kafka versions, manage topics across multiple Kafka clusters, set up an end-to-end streaming pipeline, develop Kafka clients, and manage metrics.
- Administering Kafka: Describes how to administer Kafka.
- Kafka Performance Tuning: Describes best practices for Kafka performance tuning.
- Using Kafka Streams: Describes where to get information about using Kafka Streams.
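
The per-partition ordering semantics listed among the features above can be illustrated with a small, self-contained sketch. It does not use a Kafka client or a real broker: partition logs are modeled as in-memory lists, and the key is mapped to a partition with CRC32 as a stand-in hash, which is an assumption made for illustration and not Kafka's actual partitioner. The names `NUM_PARTITIONS`, `choose_partition`, `append_all`, and `consume` are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count, chosen for illustration


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(messages):
    """Append (key, value) pairs to per-partition logs in arrival order."""
    logs = defaultdict(list)
    for key, value in messages:
        logs[choose_partition(key)].append((key, value))
    return logs


def consume(logs, partition):
    """Read one partition front to back, as a single consumer would."""
    return list(logs.get(partition, []))


messages = [
    ("user-a", "login"),
    ("user-b", "login"),
    ("user-a", "add-to-cart"),
    ("user-b", "logout"),
    ("user-a", "checkout"),
]
logs = append_all(messages)

# Every event for one key lands in the same partition, so a consumer of
# that partition sees those events in the order they were produced.
partition = choose_partition("user-a")
for key, value in consume(logs, partition):
    print(partition, key, value)
```

Because a key always maps to the same partition, ordering holds for each key without any coordination between partitions. Messages in different partitions have no ordering guarantee relative to each other, which is the trade-off that lets consumption be spread across a cluster of consumer machines.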