Connecting Kafka clients to Cloudera Data Hub provisioned
clusters
Learn how to connect Kafka clients to clusters provisioned with Cloudera Data Hub.
Use the following steps to connect Kafka clients to clusters provisioned with Cloudera Data Hub. Configuration examples provided in this list of steps
assume that the cluster you are connecting to was provisioned with a Streams Messaging
cluster definition.
If you are connecting your clients from outside of your virtual network (VPC or Vnet)
verify that both inbound and outbound traffic is enabled on the port used by Kafka
brokers for secure communication. The default port is 9093. For more information, see
the following resources:
If you are connecting your clients over the internet, verify that your virtual network
(VPC or Vnet) is assigned a public IP address. For more information, see the following resources:
Clients connecting to Cloudera Data Hub provisioned clusters require a
Cloudera user account that provides access to the
required Cloudera resources. Verify that a Cloudera user account with the required roles and
permissions is available for use. If not, create one. Any type of Cloudera user account can be used. If you are creating a
new account to be used by Kafka clients, Cloudera recommends that you create a machine user account. For more information, see User Management in the ClouderaManagement Console documentation.
In addition to the Cloudera user account having access
to the required Cloudera resources, the user account
also needs to have the correct policies assigned to it in Ranger. Otherwise, the client
cannot perform tasks on Kafka resources. These policies are specified within the Ranger
instance that provides authorization to the Kafka service you want to connect to. For
more information, see the Cloudera Runtime documentation on Apache Ranger and the
Kafka specific Ranger documentation.
Obtain the FreeIPA certificate of your environment:
From the Cloudera Home Page navigate to Management Console > Environments.
Locate and select your environment from the list of available environments.
Go to the FreeIPA tab.
Click Get FreeIPA Certificate.
The FreeIPA certificate file, [***ENVIRONMENT
NAME***].crt, is downloaded to your computer.
Add the FreeIPA certificate to the truststore of the client.
The certificate needs to be added for all clients that you want to connect to the Cloudera Data Hub provisioned cluster. The exact steps of adding the
certificate to the truststore depends on the platform and key management software used.
For example, you can use the Java Keytool command line
tool:
A valid workload username and password has to be provided to the client, otherwise it
cannot connect to the cluster. Credentials can be obtained from Management Console.
From the Cloudera Home Page navigate to Management Console
> User Management.
Locate and select the user account you want to use from the list of available
accounts.
The user details page displays information about the user.
Find the username found in the Workload Username entry and
note it down.
Find the Workload Password entry and click Set
Workload Password.
In the dialog box that appears, enter a new workload password, confirm the password
and note it down.
Fill out the Environment text box.
Click Set Workload Password and wait for the process to
finish.
Click close.
Configure clients.
In order for clients to be able to connect to Kafka brokers, all required security
related properties have to be added to the client's properties file. The following example
configuration lists the default properties that are needed when connecting clients to a
cluster provisioned by Cloudera Data Hub with a Streams Messaging cluster
definition. If you made changes to the security configuration of the brokers, or
provisioned a custom cluster with non-default Kafka security settings, make sure to change
the appropriate parameters in the client configuration as well.
Replace [***CLIENT TRUSTSTORE.JKS***] with the path to the client's
truststore file.This is the same file that you added the FreeIPA certificate to in Step
2.
Replace [***TRUSTSTORE PASSWORD***] with the password of the
truststore file.
Replace [***USERNAME***] and [***PASSWORD***]
with the workload username and password obtained in Step 3.
Obtain Kafka broker hostnames:
You can obtain the Kafka broker hostnames from the Cloudera Manager UI.
From the Cloudera Home Page navigate to Management Console > Environments.
Locate and select your environment from the list of available environments.
Select the Cloudera Data Hub cluster you want to connect to from
the list of available clusters.
Click the link found in the Cloudera Manger Info
section.
You are redirected to the Cloudera Manager web UI.
Click Clusters and select the cluster that the Kafka service
is running on.
The default name for clusters created with a Streams Messaging Cluster definition
is streams-messaging.
Select the Kafka service.
Go to Instances.
The Kafka broker hostnames are listed in the Hostname
column.
Connect clients to brokers.
Connect the clients by supplying them with the broker hostnames obtained in step 5. The actions you need to take
differ depending on the type of client you are using.
Custom developed Kafka Applications
When producing or consuming messages with your own Kafka client application, you
have to provide the Kafka broker hostnames within the client code.
Kafka console producer and consumer
When producing or consuming messages with the
kafka-console-consumer or kafka-console-producer
command line tools, run the producer or consumer with the appropriate hostnames.
Additionally, you must also pass the client properties file containing the security
related properties with --producer.config or
--consumer.config. For
example:
If schema.registry.url is not set, the client does not try to
connect to Schema Registry.
If this property is set, the username and password must also be configured.
You can find the Schema Registry endpoint in Management Console > Data Hub Clusters > [***YOUR DATA HUB CLUSTER***] > Endpoints.
Some applications may need to connect directly to Schema Registry to interact with it.
For example, to retrieve or store a schema.
A Schema Registry API client is also provided by Cloudera for these cases. For example,
to retrieve the latest version of a schema you can use the
following: