Installing Kafka Connect connectors

Learn how to install custom developed (third party) connectors as well as the FileStream connectors in CDP.

Kafka Connect connectors are distributed in one of two forms:

- A directory of JAR files: the directory includes the JAR for the connector itself, as well as all its dependencies.
- An uber JAR (also called a fat JAR or JAR with dependencies): a single JAR file that contains the connector as well as its dependencies.

In CDP, all connectors that do not come prepackaged with the Runtime distribution, as well as the FileStream connectors (FileStreamSourceConnector and FileStreamSinkConnector), must be installed manually. This is done by making the connector JAR files available on all cluster hosts in a specific location. After installation is complete, you can deploy the connector using the Streams Messaging Manager UI (recommended), the Streams Messaging Manager API, or the Kafka Connect API.

The location of the JAR files is determined by the Kafka Connect role’s plugin.path property. Kafka Connect discovers connectors by looking at this directory path on the host machines. By default, the plugin.path property is set to /var/lib/kafka. This means that, by default, any connector placed in this directory will be discovered by Kafka Connect. Cloudera recommends that you use the default path.
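As a rough way to see which JAR files are present under the plugin directory on a host, the following sketch lists them. It is only a file listing and does not reproduce how Kafka Connect itself scans plugin.path. The function name list_plugin_jars is hypothetical, and the default argument is the default path named above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list JAR files found under a plugin directory.
# The directory defaults to /var/lib/kafka, the default plugin.path.
list_plugin_jars() {
  local plugin_dir="${1:-/var/lib/kafka}"
  # -L follows symlinks, so symlinked JARs and connector directories are included.
  find -L "$plugin_dir" -type f -name '*.jar' 2>/dev/null | sort
}

# Usage on a Kafka Connect host:
#   list_plugin_jars
#   list_plugin_jars /path/to/another/plugin/dir
```

An empty result suggests that no connector JAR is readable from the directory on that host.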

The installation steps differ for custom developed connectors and the FileStream connectors. This is because the JAR file for the FileStream connectors is available on CDP cluster hosts by default. Additionally, the FileStream connectors can be installed with an alternate method that uses an advanced configuration snippet.

Installing custom developed Kafka Connect connectors

Learn how to install custom developed (third party) connectors in CDP.

The following steps assume that plugin.path is set to /var/lib/kafka, which is the default path.

Log in to a host that is running a Kafka Connect role.
Make the connector files available in or readable from /var/lib/kafka.
How you choose to complete this step will largely depend on your cluster environment. For example:
- You can download or copy the files directly to /var/lib/kafka.
- You can place the connector files in a different location and create symlinks in /var/lib/kafka that point to them.
Regardless of the method you choose, this step is complete once the JAR files are readable from /var/lib/kafka. Because the connector files must be available on all cluster hosts, repeat this step on every host that runs a Kafka Connect role.
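The symlink option could be sketched as follows. This is a minimal example under stated assumptions: the function name link_connector and the source path in the usage comment are hypothetical, and the source should be given as an absolute path so that the link resolves from the plugin directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expose a connector (a directory of JARs or a single
# uber JAR) in the plugin directory through a symlink.
# Usage: link_connector ABSOLUTE_SOURCE_PATH [PLUGIN_DIR]
link_connector() {
  local src="$1"
  local plugin_dir="${2:-/var/lib/kafka}"   # default plugin.path
  if [ ! -r "$src" ]; then
    echo "not readable: $src" >&2
    return 1
  fi
  # -s creates a symbolic link, -f replaces an existing entry, and
  # -n avoids descending into an existing link that points to a directory.
  ln -sfn "$src" "$plugin_dir/$(basename "$src")"
}

# Usage (the source path is an example only):
#   link_connector /opt/connectors/my-connector
```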
Restart all Kafka Connect roles:
1. In Cloudera Manager, select the Kafka service.
2. Go to Instances.
3. Select all Kafka Connect instances by checking the checkbox next to each instance.
4. Click Actions for selected > Restart.
5. Click Restart to confirm.
  
  The roles are restarted once a Finished status is displayed.

The connector is installed and is available for deployment. You are now able to deploy and manage the new connector from the Streams Messaging Manager (SMM) UI.
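If you want to confirm the installation from the command line, one option is the connector-plugins endpoint of the Kafka Connect REST API, which returns the connector classes that a worker has discovered. The sketch below checks that response for a class name. The function name plugin_listed, the CONNECT_URL variable, and the class com.example.MyConnector are hypothetical; the correct host, port, and any TLS or authentication options depend on your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the JSON returned by GET /connector-plugins on
# stdin and succeed if the given connector class is listed.
plugin_listed() {
  local class_name="$1"
  # Strip whitespace so that compact and pretty-printed JSON both match,
  # then look for the literal "class":"<name>" pair.
  tr -d '[:space:]' | grep -qF "\"class\":\"${class_name}\""
}

# Usage (CONNECT_URL and the class name are placeholders):
#   curl -s "$CONNECT_URL/connector-plugins" | plugin_listed com.example.MyConnector \
#     && echo "connector is available"
```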

Deploy an instance of the connector using SMM. For more information, see Creating a connector using Kafka Connect in SMM.

Installing FileStream connectors

Learn how to install the FileStream example connectors (FileStreamSourceConnector and FileStreamSinkConnector) that are shipped with Cloudera Runtime but are not installed by default. You can choose between two installation methods.

The JAR file for the FileStream connectors is shipped with Cloudera Runtime and is readily available on the cluster hosts. However, the file is not added to the Kafka Connect plugin.path directory by default. This is because the connectors are meant to be used for demonstrating the capabilities of Kafka Connect and are not production ready. As a result, even though these connectors do come packaged with Runtime, they must be installed before they can be deployed. The JAR file is located at /opt/cloudera/parcels/CDH/jars/connect-file-[***KAFKA COMPONENT VERSION***].jar.

You have two options when installing the FileStream connectors. You can install the connectors by copying or symlinking the JAR files to the plugin.path directory. Alternatively, you can add the location of the JAR file to the Kafka Connect role's CLASSPATH environment variable using an advanced configuration snippet.

The main difference between the two installation methods is that copying or symlinking the file requires that you log in to each Kafka Connect host in your cluster. Using an advanced configuration snippet, on the other hand, enables you to install the connector on all hosts by changing a single property in Cloudera Manager.

Although using an advanced configuration snippet is more convenient than copying or symlinking, it is considered an advanced configuration practice. Cloudera therefore advises caution if you choose this method to install the FileStream connectors.

Go to Cloudera Runtime component versions and note down the component version of Apache Kafka. You need to specify the version during installation.
The version is made up of three parts: the upstream Apache Kafka version (first three digits), the Runtime version (digits four to six), and the Runtime build number (the last three digits, separated by a dash). For example: 3.1.1.7.1.8.0-801.
The following steps assume that plugin.path is set to /var/lib/kafka, which is the default path.
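To illustrate the version layout described above, the following sketch splits the example string into its parts. The function name split_kafka_version is hypothetical, and the field positions simply follow the description given here (fields one to three, fields four to six, and the number after the dash); any other field before the dash is left uninterpreted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a Kafka component version into the parts
# described in the text.
split_kafka_version() {
  local version="$1"
  local dotted="${version%-*}"   # everything before the dash
  local build="${version##*-}"   # everything after the dash
  echo "upstream=$(echo "$dotted" | cut -d. -f1-3)"
  echo "runtime=$(echo "$dotted" | cut -d. -f4-6)"
  echo "build=${build}"
}

split_kafka_version "3.1.1.7.1.8.0-801"
# prints:
#   upstream=3.1.1
#   runtime=7.1.8
#   build=801
```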

Install the FileStream connectors using one of the following methods:
- Copy or symlink
- Using an advanced configuration snippet

Copy or symlink

Complete the following actions on all Kafka Connect hosts.

Log in to a host that is running a Kafka Connect role.

Copy or symlink the JAR file of the connectors to /var/lib/kafka.

To copy the file directly to /var/lib/kafka, use the following command:
cp /opt/cloudera/parcels/CDH/jars/connect-file-[***KAFKA COMPONENT VERSION***].jar /var/lib/kafka/

To create a symlink in /var/lib/kafka that points to /opt/cloudera/parcels/CDH/jars/connect-file-[***KAFKA COMPONENT VERSION***].jar, use the following command:
ln -sf /opt/cloudera/parcels/CDH/jars/connect-file-[***KAFKA COMPONENT VERSION***].jar /var/lib/kafka/connect-file-[***KAFKA COMPONENT VERSION***].jar

Replace [***KAFKA COMPONENT VERSION***] with the component version of Apache Kafka. Regardless of what method you choose, this step is considered complete once the JAR file is readable from /var/lib/kafka.
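Because these actions are repeated on every Kafka Connect host, it can help to wrap them in a small function that takes the component version as an argument and ends with the readability check. The following is a sketch, not a supported tool: the function name install_filestream_jar and its argument order are hypothetical, and the default directories are the paths used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy or symlink the FileStream connector JAR into the
# plugin directory and confirm that it is readable there.
# Usage: install_filestream_jar VERSION [copy|symlink] [JARS_DIR] [PLUGIN_DIR]
install_filestream_jar() {
  local version="$1"
  local mode="${2:-symlink}"
  local jars_dir="${3:-/opt/cloudera/parcels/CDH/jars}"
  local plugin_dir="${4:-/var/lib/kafka}"
  local jar="connect-file-${version}.jar"

  if [ ! -r "$jars_dir/$jar" ]; then
    echo "missing or unreadable: $jars_dir/$jar" >&2
    return 1
  fi
  case "$mode" in
    copy)    cp "$jars_dir/$jar" "$plugin_dir/" || return 1 ;;
    symlink) ln -sf "$jars_dir/$jar" "$plugin_dir/$jar" || return 1 ;;
    *)       echo "mode must be copy or symlink" >&2; return 2 ;;
  esac
  # The step is complete once the JAR is readable from the plugin directory.
  [ -r "$plugin_dir/$jar" ]
}

# Usage (the version shown is the example from this page):
#   install_filestream_jar 3.1.1.7.1.8.0-801 symlink
```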

Using an advanced configuration snippet

In Cloudera Manager, select the Kafka service.

Go to Configuration.

Find the Kafka Connect Environment Advanced Configuration Snippet (Safety Valve) property.

Click to add a property and enter the following key and value pair:

Key: CLASSPATH

Value: /opt/cloudera/parcels/CDH/jars/connect-file-[***KAFKA COMPONENT VERSION***].jar

Replace [***KAFKA COMPONENT VERSION***] with the component version of Apache Kafka.

Click Save Changes.
Restart all Kafka Connect roles:
1. In Cloudera Manager, select the Kafka service.
2. Go to Instances.
3. Select all Kafka Connect instances by checking the checkbox next to each instance.
4. Click Actions for selected > Restart.
5. Click Restart to confirm.
  
  The roles are restarted once a Finished status is displayed.

The FileStream connectors are installed and available for deployment. You are now able to deploy and manage the FileStream connectors from the Streams Messaging Manager (SMM) UI.

Deploy an instance of the FileStream Source or FileStream Sink connector using SMM. For more information, see Creating a connector using Kafka Connect in SMM.
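SMM is the recommended way to deploy, but as noted earlier the Kafka Connect API can also be used. The following sketch builds a request body for a FileStream Source connector and shows, in a comment, how it might be posted to the connectors endpoint of the Kafka Connect REST API. The function name filestream_source_payload, the CONNECT_URL variable, and the connector name, file, and topic values are hypothetical; the correct host, port, and any TLS or authentication options depend on your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON request body that creates a FileStream
# Source connector. The values are inserted as-is, so they must not contain
# double quotes or backslashes.
filestream_source_payload() {
  local name="$1" file="$2" topic="$3"
  printf '{"name":"%s","config":{"connector.class":"org.apache.kafka.connect.file.FileStreamSourceConnector","tasks.max":"1","file":"%s","topic":"%s"}}\n' \
    "$name" "$file" "$topic"
}

# Usage (CONNECT_URL and all argument values are placeholders):
#   curl -s -X POST -H "Content-Type: application/json" \
#     --data "$(filestream_source_payload demo-source /tmp/input.txt demo-topic)" \
#     "$CONNECT_URL/connectors"
```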