You need to add Cloudera Schema Registry, Kudu, Hive, or other services as a Catalog using the Streaming SQL
Console in SQL Stream Builder (SSB) to use them with Flink DDL.
Make sure that you have the required service on your cluster.
Make sure that you have the right permissions set in Ranger for SSB and the
services.
Navigate to the Streaming SQL Console.
Navigate to Management Console > Environments, and select the environment where you have created your
cluster.
Select the Streaming Analytics cluster from the list of
Data Hub clusters.
Select Streaming SQL Console from the list of
services.
The Streaming SQL Console opens in a new window.
Open a project from the Projects page of Streaming SQL
Console.
Select an already existing project from the list by clicking the
Open button or Switch button.
Create a new project by clicking the New Project
button.
Import a project by clicking the Import button.
You are redirected to the Explorer view of the project.
Open Data Sources from the
Explorer view.
Click next to Catalogs.
Select New Catalog.
The Add Catalog window appears.
Select the Catalog Type from the following
options:
Select Schema Registry from the
Catalog Type drop-down.
Select the Kafka cluster you registered
as Data Source.
Enable TLS, if needed for the communication.
If you enabled TLS, provide the Schema Registry
Truststore location and password to the SR
TrustStore and SR TrustStore
Password field.
Add the Schema Registry URL.
Go to the Streams Messaging cluster in your
environment.
Select Cloudera Manager from the
list of services.
Select Schema Registry from the
list of services.
Click on Instances.
Copy the Hostname of Schema Registry.
Add the default port of Schema Registry after the
hostname.
Example:
http://docs-test-1.vpc.cloudera.com:7788/api/v1
Optional: Specify a Schema key suffix and
a Schema value suffix.
By default, the
Schema Registry catalog looks for a topic in Kafka with the
same name as the schemas stored in Cloudera Schema Registry.
It uses the schema to deserialize the payload of the
messages in that topic and ignores the message key. If you
have different schemas to deserialize message key and
payload you can define suffixes to differentiate them.For
example, if you configure the catalog's schema key suffix as
-key and the schema value suffix as
-value, and you have schemas stored in
Cloudera Schema Registry named example-key
and example-value. The SSB catalog will
automatically create a table called example
and use those schemas to deserialize the key and the payload
of the messages, respectively.
Optional: Add a Prefix for key fields in
schema.
You can set a prefix for key field
names to avoid table creation failure as the column names in
the Flink schema are created based on the field names of the
Avro schema. Table creation fails if the key and value
schemas contain overlapping fields. The provided prefix will
be assigned to the key fields of the Flink table as shown in
the following
example:
CREATE TABLE … (k_a INT, k_b INT, b INT, c INT0 with (..., ‘key.fields’=’k_a;k_b’)
Select Kudu from the Catalog Type
drop-down.
Add the host URL of Kudu Masters.
Go to the Real-Time Data Mart cluster in your
environment.
Select Cloudera Manager from the
list of services.
Select Kudu from the list of
services.
Click on Instances.
Copy the Hostname of the Master Default Group.
Add the default port of Kudu after the
hostname.
Example:
docs-test-1.vpc.cloudera.com:7051
Select Hive from the Catalog Type
drop-down.
Provide a Name to the Hive catalog
created in SSB.
Provide the name of the Default Database
in Hive.
Provide the hostname of Cloudera
Manager at CM Host.
You need to provide
the Cloudera Manager hostname where the Hive service is also
located. You can check the hostname of Cloudera Manager by
clicking CM-UI from the
Services of the CDP Public Cloud
environment page, then access Hosts > All Hosts from the Cloudera Manager homepage:
Select Custom from the Catalog Type
drop-down.
Provide a Property Key.
Proivde a Property Value.
If needed,
you can specify more custom properties by using the
plus icon.
Click on Add Filter.
Provide a Database and
Table filter if you want to select specific
tables to use from the catalog.
Click on Validate.
If the validation is successful, click Create.
You are ready to use the added catalog in SSB with Flink
DDL. The already existing schemas in Schema Registry and Confluent Schema Registry,
tables in Kudu and Hive are automatically imported to SSB.After registering a catalog, you can edit, duplicate and
delete it from Streaming SQL Console:
Open Data Sources from the
Explorer view.
Click next to Catalogs.
Select Manage.
The Catalogs tab
opens where the registered catalogs are listed. You have the following
options to manage the catalog sources:
Click on one of the existing catalogs to edit its configurations.
Click to remove the catalog.
Click to duplicate the catalog with
its configurations.