Architecture

The Community Edition of Cloudera Streaming Analytics consists of preconfigured Docker images for ZooKeeper, Kafka and PostgreSQL to make getting started easier. Each component can be reached on its dedicated port. Storage for the Community Edition is handled by Docker volumes, while PostgreSQL is integrated for database management and for storing the Materialized Views.

PostgreSQL is used internally by the SQL Stream Builder components, and it is also the underlying database for the Materialized View Engine. The PostgreSQL database that holds the Materialized View tables (the eventador_snapper database) can be accessed as the eventador_snapper user. The default password for the database is cloudera.
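As a minimal sketch, the Materialized View database could be opened with psql from inside the PostgreSQL container. The database name and user come from the paragraph above. The service name postgresql, the presence of the psql client in the image, and the helper name mv_psql are assumptions that this page does not confirm, so check them against your docker-compose.yml. The COMPOSE variable is a hypothetical convenience for overriding the compose binary.

```shell
# Hypothetical helper: open a psql session on the Materialized View database.
# Assumption: the PostgreSQL service in docker-compose.yml is named "postgresql"
# and its image ships the psql client.
COMPOSE="${COMPOSE:-docker-compose}"

mv_psql() {
  # Database and user are both eventador_snapper; extra arguments go to psql.
  $COMPOSE exec postgresql psql -U eventador_snapper -d eventador_snapper "$@"
}
```

For example, mv_psql -c '\dt' would list the tables in the database. If psql prompts for a password, the default is cloudera.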

The containers use the following Docker volumes to provide persistent local storage between restarts. If the volumes do not exist in your local environment, they are created when you run the docker-compose up command.

flink-volume
Provides persistent storage for the Flink TaskManager and JobManager containers. It is used for storing the savepoints of the jobs. If you use the Filesystem connector, writing to a volume is also recommended.
ssb-volume
Used by the Streaming SQL Engine for persistent storage under the Streaming SQL Engine container.
pg-volume
Used by the PostgreSQL database. It stores the internal tables required for SQL Stream Builder to work, as well as the created Materialized Views.
kf-volume
Used by the Kafka container to store the topics.
zk-volume
Used by ZooKeeper.

For a fresh start, you can delete the Docker volumes either by shutting down all of the containers with the docker-compose down --volumes command, or by removing them individually with the docker volume rm <volume name> command. The containers communicate over a Docker network named ssb-net.
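Before removing anything, it can help to see which of these volumes exist locally. The following is a sketch only: list_csa_volumes and the DOCKER variable are hypothetical names, and the suffix match assumes that Docker Compose may prefix volume names with the project name.

```shell
# Hypothetical helper: list the Community Edition volumes present locally.
# Assumption: Docker Compose may prefix volume names with the project name,
# so names are matched by suffix rather than exactly.
DOCKER="${DOCKER:-docker}"

list_csa_volumes() {
  # Print one volume name per line, then keep only the five known suffixes.
  $DOCKER volume ls --format '{{.Name}}' |
    grep -E '(flink|ssb|pg|kf|zk)-volume$'
}
```

The names printed by list_csa_volumes can be passed to docker volume rm. Similarly, docker network inspect ssb-net shows which containers are attached to the network, assuming the network name is not prefixed in your environment.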

By default, the Kafka container is preconfigured in SQL Stream Builder as the Local Kafka data provider.

For local prototyping against the preconfigured Kafka container, you can create topics and produce data to them with the following commands:
docker-compose exec kafka /opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka:9092 \
          --create --topic myNewTopic
docker-compose exec -T kafka /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server \
          kafka:9092 --topic airplanes
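
To check that produced messages arrived, they can be read back with the console consumer. This is a sketch under assumptions: consume_topic and the COMPOSE variable are hypothetical names, and kafka-console-consumer.sh is assumed to sit next to the other scripts in /opt/kafka/bin.

```shell
# Hypothetical helper: read back messages from a topic on the preconfigured Kafka.
# Assumption: kafka-console-consumer.sh is located in /opt/kafka/bin like the
# kafka-topics.sh and kafka-console-producer.sh scripts.
COMPOSE="${COMPOSE:-docker-compose}"

consume_topic() {
  topic="$1"
  max="${2:-10}"   # stop after this many messages (defaults to 10)
  $COMPOSE exec -T kafka /opt/kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka:9092 --topic "$topic" \
    --from-beginning --max-messages "$max"
}
```

For example, consume_topic airplanes 5 would read up to five messages from the start of the airplanes topic and then exit.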

The Kafka container is also accessible from outside the Docker network. However, only kafka:9092 has been set as the advertised listener. To connect to Kafka from your computer (outside the network), you need to add an entry to your /etc/hosts file that resolves the kafka domain name to localhost:
127.0.0.1 kafka
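
The hosts entry can also be added from a script. The sketch below appends the line 127.0.0.1 kafka only when no active (uncommented) line already maps the name kafka; add_kafka_host is a hypothetical helper name, and the file path is a parameter so that the function can be tried on a copy first.

```shell
# Hypothetical helper: add "127.0.0.1 kafka" to a hosts file only if no
# uncommented line already maps the name kafka.
add_kafka_host() {
  hosts_file="${1:-/etc/hosts}"
  # Match "kafka" as a whole host name on a line, ignoring anything after a '#'.
  if grep -Eq '^[^#]*[[:space:]]kafka([[:space:]]|$)' "$hosts_file"; then
    echo "kafka entry already present in $hosts_file"
  else
    printf '127.0.0.1 kafka\n' >> "$hosts_file"
    echo "added kafka entry to $hosts_file"
  fi
}
```

Editing /etc/hosts normally requires root privileges, so the function would have to run in a root shell when called without an argument.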