Sampling

How to enable sampling for your SQL queries.

To enable sampling, create a secret that contains the connection parameters of the Kafka instance used for sampling before you run helm install.

Example for non-secure setup:

kubectl create secret generic ssb-sampling-kafka -n flink \
   --from-literal=SSB_SAMPLING_BOOTSTRAP_SERVERS=kafka.example.com:9092 \
   --from-literal=SSB_SAMPLING_SECURITY_PROTOCOL=PLAINTEXT

Example for secure setup:

kubectl create secret generic ssb-sampling-kafka -n flink \
   --from-literal=SSB_SAMPLING_BOOTSTRAP_SERVERS=kafka-ssl.example.com:9092 \
   --from-literal=SSB_SAMPLING_SECURITY_PROTOCOL=SSL \
   --from-file=sampling_kafka_truststore.jks=[*** YOUR PATH ***]/truststore.jks \
   --from-literal=SSB_SAMPLING_TRUSTSTORE_PASSWORD=[*** PASSWORD ***]
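
After creating the secret, it can be useful to confirm that the stored values are what you expect. The following is a minimal sketch, not part of the product tooling: the function names decode_b64 and show_sampling_key are hypothetical, and it assumes that kubectl is configured for your cluster and that the secret lives in the flink namespace, as in the examples above.

```shell
# Kubernetes stores secret values base64-encoded.
# decode_b64 (hypothetical helper) decodes standard input to plain text.
decode_b64() {
  base64 --decode
}

# show_sampling_key (hypothetical helper) prints one decoded key of the
# ssb-sampling-kafka secret. $1 is the key name.
# Only suitable for keys without dots in the name; a key such as
# sampling_kafka_truststore.jks would need the dot escaped in the jsonpath.
show_sampling_key() {
  kubectl get secret ssb-sampling-kafka -n flink \
    -o "jsonpath={.data.$1}" | decode_b64
}

# Usage (requires cluster access):
#   show_sampling_key SSB_SAMPLING_BOOTSTRAP_SERVERS
#   show_sampling_key SSB_SAMPLING_SECURITY_PROTOCOL
```

To list the keys and their sizes without printing any values, kubectl describe secret ssb-sampling-kafka -n flink can be used instead.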

In the values.yaml file, set ssb.sampling.enabled to true and set ssb.sampling.secretRef to the name of your secret.

ssb:
  sampling:
    enabled: true
    secure: true
    secretRef: ssb-sampling-kafka
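
Because the secret has to exist before helm install runs, a quick pre-install check can catch a mismatch between secretRef and the secret name. The sketch below is an illustration under stated assumptions: sampling_secret_ref and check_sampling_secret are hypothetical helpers, the parse is a simple line-based match that assumes secretRef appears once and unquoted (as in the snippet above), and the namespace defaults to flink.

```shell
# sampling_secret_ref (hypothetical helper): print the value of the first
# "secretRef:" line found in the given values file.
# This is a line-based match, not a real YAML parser.
sampling_secret_ref() {
  awk '$1 == "secretRef:" { print $2; exit }' "$1"
}

# check_sampling_secret (hypothetical helper): succeed only if the secret
# named in the values file exists in the namespace.
# $1 = path to values.yaml, $2 = namespace (assumed to be "flink" if omitted)
check_sampling_secret() {
  ref="$(sampling_secret_ref "$1")"
  if [ -z "$ref" ]; then
    echo "no secretRef found in $1" >&2
    return 1
  fi
  kubectl get secret "$ref" -n "${2:-flink}" >/dev/null
}

# Usage (requires cluster access):
#   check_sampling_secret values.yaml flink && echo "secret found"
```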

Sample results

How to view sample results from your SQL queries.

Cloudera Streaming Analytics - Kubernetes Operator does not install Kafka, so the Cloudera SQL Stream Builder UI cannot show any rows from the Flink jobs on its own. To see sampled results from your SQL queries, you need a Kafka cluster that is installed and accessible by both the Cloudera SQL Stream Builder and Flink pods. Then change ssbConfiguration so that Cloudera SQL Stream Builder uses Kafka for data sampling:

ssbConfiguration:
  application.properties: |+
    kafka.enabled=true
    spring.kafka.bootstrap-servers=example-kafka:9092
    spring.kafka.jaas.enabled=false
    spring.kafka.properties.security.protocol=PLAINTEXT