Setting up Atlas Kafka import tool

You can use the Atlas-Kafka import tool or an Apache Kafka import action to load Apache Atlas with Kafka topics.

You must first set up the tool and then run it manually to create or update the Kafka topic entities in Atlas. The tool reads the deployed client configuration to obtain the settings it needs, such as the Atlas endpoint and the ZooKeeper connection.
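
For reference, the relevant entries in the Atlas client configuration (atlas-application.properties) typically look like the following sketch; atlas.metadata.namespace is described below, while the other property names and all values are shown only as illustrative placeholders and depend on your deployment:

  atlas.metadata.namespace=cm
  atlas.rest.address=https://atlas-host.example.com:31443
  atlas.kafka.zookeeper.connect=zookeeper-host.example.com:2181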

The Kafka type definition in Atlas contains multiple fields. The import tool fills in the following field values (see the example after this list):

  • clusterName - Contains the value provided by the atlas.metadata.namespace property in the application configuration file. The default value is cm.

  • topic, name, description, URI - Each contains the topic name.

  • qualifiedName - Contains the topic name and the clusterName, joined by '@'; for instance, my-topic@cm. This serves as the unique identifier for topics.

  • partitionCount - Contains the number of partitions of the topic.
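
For illustration, importing a topic named my-topic with the default namespace cm would produce an entity whose attributes are filled in along these lines (a sketch based on the fields described above, with an example partition count; not literal Atlas output):

  clusterName:    cm
  topic:          my-topic
  name:           my-topic
  description:    my-topic
  uri:            my-topic
  qualifiedName:  my-topic@cm
  partitionCount: 3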

To set up the Atlas-Kafka import tool, follow these steps:

  1. Select the Atlas service in Cloudera Manager.
  2. Deploy client configs: Actions > Deploy Client Configuration.
  3. If Apache Ranger is enabled, add the user who runs the tool to a policy in the cm_atlas Ranger service so that this user can create, update, delete, and read Kafka entities:
    • Log in to the Ranger UI.
    • Navigate to cm_atlas policies.
    • Select the policy which has create_entity and delete_entity permissions.
    • Add the Kerberos user that is used with the import-kafka.sh tool.
  4. SSH into a host where the client configuration is deployed.
  5. If Kerberos is enabled, run kinit as the kafka user to obtain a Ticket Granting Ticket (TGT).
  6. Set the JAVA_HOME environment variable.
  7. Run the /opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-kafka.sh command to import Kafka topics into Atlas (see the example session below).
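
For example, assuming Kerberos is enabled and a keytab for the kafka principal is available on the host, steps 5 through 7 could look like the following session; the keytab path, principal, and JAVA_HOME location are placeholders for your environment:

  kinit -kt /path/to/kafka.keytab kafka/host.example.com@EXAMPLE.COM
  export JAVA_HOME=/usr/java/default
  /opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-kafka.sh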