Customize container images
Updating the SSB images to use Kudu, Hive, HBase, and HDFS with SQL Stream Builder.
To be able to use Kudu, Hive, HBase, or HDFS, you need to update the images supplied to you and add the required JAR files and dependencies using Dockerfiles.
There are two images you need to update, both of which can be found under the sqlRunner.image and sse.image configurations. sqlRunner.image is the image used for the Flink deployments; it is responsible for executing the SQL commands. sse.image is the image of SSB itself.
If you want to use the updated container images, make sure to upload them to a registry that your Kubernetes cluster can access, and update the configuration in the values.yaml file to point to the new images.
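For example, the relevant entries in the values.yaml file could look like the following. This is a minimal sketch: the image names and tags are placeholders, and depending on your chart version the image reference may be split into separate repository and tag fields.
# Custom SQL Runner image used for the Flink deployments
sqlRunner:
  image: [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/ssb-sql-runner-custom:latest
# Custom SSB image
sse:
  image: [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/ssb-sse-custom:latest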
Here is an example of adding Hadoop and Hive to the SQL Runner image:
FROM [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/ssb-sql-runner:latest
ENV CLOUDERA_ARCHIVES "archive.cloudera.com"
# Hadoop
ENV HADOOP_VERSION "3.1.1.7.1.9.0-387"
ENV HADOOP_HOME "/opt/hadoop"
RUN rm -rf ${HADOOP_HOME}/ \
&& cd /opt \
&& curl -sL --retry 3 "https://${CLOUDERA_ARCHIVES}/artifacts/build/44702451/cdh/7.x/redhat8/yum/tars/hadoop/hadoop-client-${HADOOP_VERSION}.tar.gz" | tar xz \
&& chown -R root:root hadoop-client-${HADOOP_VERSION} \
&& ln -sfn hadoop-client-${HADOOP_VERSION} hadoop \
&& rm -rf ${HADOOP_HOME}/share/doc \
&& find /opt/ -name "*-sources.jar" -delete
ENV HADOOP_CONF_DIR "${HADOOP_HOME}/etc/hadoop"
ENV PATH="${HADOOP_HOME}/bin:${PATH}"
ENV HADOOP_CLASSPATH "/opt/hadoop/share/hadoop/client/lib/*"
# Hive
RUN wget https://${CLOUDERA_ARCHIVES}/maven/org/apache/flink/flink-sql-connector-hive-3.1.3_2.12/1.18.0-csaop1.0.0/flink-sql-connector-hive-3.1.3_2.12-1.18.0-csaop1.0.0.jar \
-O /opt/flink/lib/flink-sql-connector-hive-3.1.3_2.12-1.18.0-csaop1.0.0.jar
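After saving the Dockerfile, build the image and push it so that your Kubernetes cluster can pull it. A minimal sketch, assuming the Dockerfile above is saved as Dockerfile.sql-runner; the image name and tag are placeholders:
# Build the customized SQL Runner image from the Dockerfile above
docker build -f Dockerfile.sql-runner -t [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/ssb-sql-runner-custom:latest .
# Push it to a registry the Kubernetes cluster can access
docker push [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/ssb-sql-runner-custom:latest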
Here is an example of adding Hadoop and Hive to the SSB image:
FROM [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/ssb-sse:latest
ENV CLOUDERA_ARCHIVES "archive.cloudera.com"
ENV HADOOP_VERSION "3.1.1.7.1.9.0-387"
ENV HADOOP_HOME "/opt/hadoop"
RUN rm -rf ${HADOOP_HOME}/ \
&& cd /opt \
&& curl -sL --retry 3 "https://${CLOUDERA_ARCHIVES}/artifacts/build/44702451/cdh/7.x/redhat8/yum/tars/hadoop/hadoop-client-${HADOOP_VERSION}.tar.gz" | tar xz \
&& chown -R root:root hadoop-client-${HADOOP_VERSION} \
&& ln -sfn hadoop-client-${HADOOP_VERSION} hadoop \
&& rm -rf ${HADOOP_HOME}/share/doc \
&& find /opt/ -name "*-sources.jar" -delete
ENV HADOOP_CONF_DIR "${HADOOP_HOME}/etc/hadoop"
ENV PATH="${HADOOP_HOME}/bin:${PATH}"
# Only copy Hadoop jars that are required for SSB to communicate with Hive
RUN cp "${HADOOP_HOME}/share/hadoop/client/lib/hadoop-common-${HADOOP_VERSION}.jar" /opt/cloudera/ssb-sse/lib/ \
&& cp "${HADOOP_HOME}/share/hadoop/client/lib/hadoop-auth-${HADOOP_VERSION}.jar" /opt/cloudera/ssb-sse/lib/ \
&& cp "${HADOOP_HOME}/share/hadoop/client/lib/hadoop-mapreduce-client-core-${HADOOP_VERSION}.jar" /opt/cloudera/ssb-sse/lib/ \
# Hive
RUN wget https://${CLOUDERA_ARCHIVES}/maven/org/apache/flink/flink-sql-connector-hive-3.1.3_2.12/1.18.0-csaop1.0.0/flink-sql-connector-hive-3.1.3_2.12-1.18.0-csaop1.0.0.jar \
-O /opt/cloudera/ssb-sse/lib/flink-sql-connector-hive-3.1.3_2.12-1.18.0-csaop1.0.0.jar
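The SSB image is built and pushed the same way as the SQL Runner image. As a quick sanity check, you can list the SSB lib directory to confirm that the copied Hadoop JARs and the Hive connector are in place. A minimal sketch, assuming the Dockerfile above is saved as Dockerfile.sse and that the image allows overriding its entrypoint:
# Build and push the customized SSB image
docker build -f Dockerfile.sse -t [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/ssb-sse-custom:latest .
docker push [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/ssb-sse-custom:latest
# Confirm the Hadoop JARs and the Hive connector are present
docker run --rm --entrypoint ls [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/ssb-sse-custom:latest /opt/cloudera/ssb-sse/lib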