Installing Kafka Connect connector plugins

Learn how to install third-party connectors in Kafka Connect. Third-party connectors are installed by building a new Kafka image that includes the connector artifacts. In CSM Operator, you build new images with Strimzi by configuring the KafkaConnect resource.

By default, the Strimzi Cluster Operator deploys a Kafka Connect cluster using the Kafka image shipped in CSM Operator. The Kafka image contains the connector plugins that are included by default in Apache Kafka.

Additional, third-party connectors are not included. If you want to deploy and use a third-party connector, you must build a new Kafka image that includes the connector plugins that you want to use. Your new image will be based on the default Kafka image that is shipped in CSM Operator. If the connector plugins are included in the image, you will be able to deploy instances of these connectors using KafkaConnector resources.

To build a new image, you add various properties to your KafkaConnect resource. These properties specify which connector plugin artifacts to include in the image as well as the target registry where the image is pushed.

If valid configuration is included in the resource, Strimzi automatically builds a new Kafka image that includes the specified connector plugins. The image is built when you deploy your KafkaConnect resource. Specifically, Strimzi downloads the artifacts, builds the image, uploads it to the specified container registry, and then deploys the Kafka Connect cluster.

The images built by Strimzi must be pushed to a container registry. Otherwise, they cannot be used to deploy Kafka Connect. You can use a public registry like quay.io or Docker Hub. Alternatively, you can push to your self-hosted registry. Which registry you use depends on your operational requirements and best practices.

If you are deploying multiple Kafka Connect clusters, Cloudera recommends using a unique image (different tag) for each of your clusters. The image behind a tag can change, and a changed image should not affect more than a single cluster.
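
As an illustration of this recommendation, the following sketch shows the build output of two KafkaConnect resources that push to the same repository under different tags. The cluster names and tags (connect-a, connect-b) are hypothetical and chosen only for this example.

```yaml
# Hypothetical cluster A: builds and uses its own tag
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: connect-a
spec:
  #...
  build:
    output:
      type: docker
      image: [***YOUR REGISTRY***]/[***IMAGE***]:connect-a
      pushSecret: [***SECRET NAME***]
    #...
---
# Hypothetical cluster B: same repository, different tag
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: connect-b
spec:
  #...
  build:
    output:
      type: docker
      image: [***YOUR REGISTRY***]/[***IMAGE***]:connect-b
      pushSecret: [***SECRET NAME***]
    #...
```

With this layout, rebuilding the image for one cluster overwrites only that cluster's tag.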

Building a new Kafka image automatically with Strimzi

You can configure your KafkaConnect resource so that Strimzi automatically builds a new container image that includes your third-party connector plugins. Configuration is done in spec.build.

When you specify spec.build.plugins properties in your KafkaConnect resource, Strimzi automatically builds a new Kafka image that contains the specified connector plugins. The image is pushed to the container registry specified in spec.build.output. The newly built image is automatically used in the Kafka Connect cluster that is deployed by the resource.

  1. Create a Docker configuration JSON file named docker_secret.json which contains your credentials to both the Cloudera container repository and your own repository where the images will be pushed.
    {
        "auths": {
            "container.repository.cloudera.com": {
                "username": "[***CLOUDERA USERNAME***]",
                "password": "[***CLOUDERA PASSWORD***]"
            },
            "[***YOUR REGISTRY***]": {
                "username": "[***USERNAME***]",
                "password": "[***PASSWORD***]"
            }
        }
    }
  2. In the namespace where the KafkaConnect resource will be created, create a secret with the Docker credentials.
    kubectl create secret docker-registry [***SECRET NAME***] --from-file=.dockerconfigjson=docker_secret.json --namespace [***NAMESPACE***]
  3. Configure your KafkaConnect resource.
    The resource configuration has to specify a container registry in spec.build.output. Third-party connector plugins are added to spec.build.plugins.

    The following example adds the Kafka FileStreamSource and FileStreamSink example connectors and uploads the newly built image to a secured registry of your choosing.

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaConnect
    metadata:
      name: my-connect-cluster
      annotations:
        strimzi.io/use-connector-resources: "true"
    spec:
      version: 3.7.0.1.1
      replicas: 3
      bootstrapServers: my-cluster-kafka-bootstrap.kafka:9092
      config:
        group.id: my-connect-cluster
        offset.storage.topic: my-connect-cluster-offsets
        config.storage.topic: my-connect-cluster-configs
        status.storage.topic: my-connect-cluster-status
      build:
        output:
          type: docker
          image: [***YOUR REGISTRY***]/[***IMAGE***]:[***TAG***]
          pushSecret: [***SECRET NAME***]
        plugins:
          - name: kafka-connect-file
            artifacts:
              - type: maven
                group: org.apache.kafka
                artifact: connect-file
                version: 3.7.0
  4. Deploy the resource.
    kubectl apply --filename [***YAML CONFIG***] --namespace [***NAMESPACE***]
  5. Wait until the image is built and pushed. The Kafka Connect cluster is deployed automatically afterwards.
    During this time, you can monitor the deployment process with kubectl get and kubectl logs.
    kubectl get pods --namespace [***NAMESPACE***]

    The output lists a pod called [***CONNECT CLUSTER NAME***]-connect-build. This is the pod responsible for building and pushing your image.

    NAME                               READY   STATUS    RESTARTS
    #...
    my-connect-cluster-connect-build   1/1     Running   0

    You can get additional information by checking the log of this pod.

    kubectl logs [***CONNECT CLUSTER NAME***]-connect-build --namespace [***NAMESPACE***]

    You should see various INFO entries related to building and pushing the image.

    Once the image is successfully built and pushed, the pod that built the image is deleted.

    Afterwards, your Kafka Connect cluster is deployed.

  6. Verify that the cluster is deployed.
    kubectl get kafkaconnect [***CONNECT CLUSTER NAME***] --namespace [***NAMESPACE***]
    If cluster deployment is successful, you should see output similar to the following.
    NAME                 DESIRED REPLICAS   READY
    #...
    my-connect-cluster   3                  True
    
  7. Verify that connector plugins are available.
    You can do this by listing the contents of /opt/kafka/plugins in any Kafka Connect pod.
    kubectl exec -it \
      --namespace [***NAMESPACE***] \
      [***CONNECT CLUSTER NAME***]-connect-[***ID***] \
      --container [***CONNECT CLUSTER NAME***]-connect \
      -- /bin/bash -c "ls /opt/kafka/plugins"
    
Kafka Connect is deployed with an image that contains third-party connectors. You can now deploy the third-party connectors you added using KafkaConnector resources.
Deploy a connector using a KafkaConnector resource. See Deploying connectors.
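
As a rough sketch of that next step, the following KafkaConnector resource deploys the FileStreamSource connector that the example image includes. The connector name, the file path, and the topic name are illustrative assumptions and not values taken from this procedure. The strimzi.io/cluster label must match the name of your KafkaConnect resource.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  # Hypothetical connector name
  name: my-source-connector
  labels:
    # Name of the KafkaConnect resource that runs this connector
    strimzi.io/cluster: my-connect-cluster
spec:
  class: org.apache.kafka.connect.file.FileStreamSourceConnector
  tasksMax: 1
  config:
    # Illustrative values; replace with a file and topic that exist in your environment
    file: "/opt/kafka/LICENSE"
    topic: my-topic
```

This resource is only picked up if the strimzi.io/use-connector-resources annotation is set to "true" on the KafkaConnect resource, as in the example above.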

Configuring the target registry

The Kafka image built by Strimzi is uploaded to a container registry of your choosing. The target registry where the image is uploaded is configured in your KafkaConnect resource with spec.build.output.

#...
kind: KafkaConnect
spec:
  build:
    output:
      type: docker
      image: [***YOUR REGISTRY***]/[***IMAGE***]:[***TAG***]
      pushSecret: [***SECRET NAME***]
  • type - specifies the type of image Strimzi outputs. The value you specify is decided by the type of your target registry. The property accepts docker or imagestream as valid values.
  • image - specifies the full name of the image. The name includes the registry, the image name, and the tag.
  • pushSecret - specifies the name of the secret that contains the credentials required to connect to the registry specified in image. This property is optional. It is required only if the registry requires credentials for access.
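
For comparison, the following is a minimal sketch of an output that uses the imagestream type instead of docker. It assumes that your environment runs on OpenShift and that an ImageStream named my-connect-build (a hypothetical name) already exists in the namespace of the KafkaConnect resource. Verify that this output type is supported in your deployment before relying on it.

```yaml
#...
kind: KafkaConnect
spec:
  build:
    output:
      type: imagestream
      # Assumed ImageStream name and tag; the ImageStream must be created beforehand
      image: my-connect-build:latest
```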

Configuring connector plugins to add

The Kafka image built by Strimzi includes the connector plugins that you reference in the spec.build.plugins property of your KafkaConnect resource.

Connector plugins are specified as an array. Each entry in the array defines a single plugin and its artifacts.
#...
spec:
  build:
    plugins:
      - name: kafka-connect-file
        artifacts:
          - type: maven
            group: org.apache.kafka
            artifact: connect-file
            version: 3.7.0

Each connector plugin must have a name and at least one artifact, and each artifact must have a type. The name must be unique in the Kafka Connect deployment.

Various artifact types are supported including jar, tgz, zip, maven, and other.

The type of the artifact defines what required and optional properties are supported. At minimum, for all types, you must specify a location where the artifact is downloaded from. For example, with maven type artifacts, you specify the Maven group and artifact. For jar type artifacts you specify a URL.

You can specify artifacts for other types of plugins, like data converters or transforms, not just connectors.
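
To illustrate URL-based artifact types and non-connector plugins, the following sketch defines two plugins: a connector downloaded as a tgz archive and a converter downloaded as a jar. The plugin names are hypothetical, and the URLs and checksums are placeholders that you must replace. The sha512sum property is assumed here to be optional; when it is set, the downloaded artifact is checked against it.

```yaml
#...
spec:
  build:
    plugins:
      # Hypothetical connector plugin packaged as a tgz archive
      - name: my-connector
        artifacts:
          - type: tgz
            url: [***CONNECTOR ARCHIVE URL***]
            sha512sum: [***SHA-512 CHECKSUM***]
      # Hypothetical converter plugin packaged as a single jar
      - name: my-converter
        artifacts:
          - type: jar
            url: [***CONVERTER JAR URL***]
            sha512sum: [***SHA-512 CHECKSUM***]
```

Pinning a checksum helps ensure that a rebuild fails visibly if the file behind the URL has changed.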

Rebuilding a Kafka image

The base image or the artifact behind a plugin URL can change over time. To make Strimzi rebuild the image, apply the strimzi.io/force-rebuild=true annotation to the StrimziPodSet resource of the Kafka Connect cluster.

kubectl annotate strimzipodsets.core.strimzi.io --namespace [***NAMESPACE***] \
  [***CONNECT CLUSTER NAME***]-connect \
  strimzi.io/force-rebuild=true