Command Line Installation

Configuring Apache Sqoop Hook

Apache Sqoop provides a SqoopDataPublisher class that publishes data to Atlas after import jobs complete. The Sqoop hook currently supports only hiveImport jobs, and uses them to add entities in Atlas based on the model defined in org.apache.atlas.sqoop.model.SqoopDataModelGenerator. To add the Sqoop hook for Atlas, complete the following steps in your Sqoop setup, starting with the <sqoop-conf>/sqoop-site.xml file:

  1. Add the Sqoop job publisher class. Currently, only one publishing class is supported:

    <property>
         <name>sqoop.job.data.publish.class</name>
         <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
    </property>
  2. Add the Atlas cluster name:

    <property>
         <name>atlas.cluster.name</name>
         <value><clustername></value>
    </property>
  3. Copy the application and client properties from the Atlas configuration directory.

  4. Define the atlas.cluster.name and atlas.rest.address properties in the Sqoop configuration file, sqoop-site.xml.

  5. Add ATLAS_HOME to /usr/hdp/<version>/sqoop/bin:

    export ATLAS_HOME=${ATLAS_HOME:-/usr/hdp/2.5.5.0-1245/atlas}
  6. Add the following lines to the $SQOOP_HOME/bin/configure-sqoop file, after the line ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper}:

    # Append the Atlas Sqoop and Hive hook jars to the Sqoop classpath,
    # but only when both hook directories exist.
    if [ -e "$ATLAS_HOME/hook/sqoop" ] && [ -e "$ATLAS_HOME/hook/hive" ]; then
      for f in "$ATLAS_HOME"/hook/sqoop/*.jar; do
        SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f
      done
      for f in "$ATLAS_HOME"/hook/hive/*.jar; do
        SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f
      done
    fi
  7. Copy the Atlas <atlas-conf>/application.properties file and the <atlas-conf>/client.properties file to the <sqoop-conf>/ directory.

  8. Link the <atlas-home>/hook/sqoop/*.jar files into the Sqoop lib directory.
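
The file-handling steps above (7 and 8) and a quick check of the properties from steps 1 through 4 could be scripted roughly as follows. This is a minimal sketch, not a supported installer: the function names check_sqoop_site and install_atlas_sqoop_hook are hypothetical, the four directories are passed in as arguments rather than taken from any location this guide confirms, and "link" is assumed to mean a symbolic link.

```shell
# Sketch only: both function names and their arguments are illustrative
# assumptions, not part of Sqoop or Atlas.

# Report any hook property from steps 1-4 that is missing from sqoop-site.xml.
# $1 = path to sqoop-site.xml. Returns 0 when all three names are present.
check_sqoop_site() {
  local site="$1"
  local missing=0
  local prop
  for prop in sqoop.job.data.publish.class atlas.cluster.name atlas.rest.address; do
    # -F matches the property name literally, so the dots are not wildcards.
    if ! grep -qF "<name>$prop</name>" "$site"; then
      echo "missing property: $prop" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Steps 7 and 8. Pass absolute paths so the symbolic links resolve.
# $1 = Atlas conf dir, $2 = Atlas home, $3 = Sqoop conf dir, $4 = Sqoop lib dir
install_atlas_sqoop_hook() {
  local atlas_conf="$1"
  local atlas_home="$2"
  local sqoop_conf="$3"
  local sqoop_lib="$4"
  local jar

  # Step 7: copy the Atlas properties files into the Sqoop conf directory.
  cp "$atlas_conf/application.properties" "$atlas_conf/client.properties" "$sqoop_conf/"

  # Step 8: symlink each Atlas Sqoop hook jar into the Sqoop lib directory.
  for jar in "$atlas_home"/hook/sqoop/*.jar; do
    [ -e "$jar" ] || continue   # skip the unexpanded pattern when no jars exist
    ln -sf "$jar" "$sqoop_lib/"
  done
}
```

To use the sketch, call install_atlas_sqoop_hook with your own <atlas-conf>, <atlas-home>, and <sqoop-conf> directories plus the Sqoop lib directory, then run check_sqoop_site against <sqoop-conf>/sqoop-site.xml. The check only looks for the property names, so it does not validate their values.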

Limitations

Currently, only hiveImport jobs are published to Atlas by the Sqoop hook.
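
For reference, a Hive import is a Sqoop import job run with the --hive-import option. The sketch below wraps such a job in a small function so the shape of the command is easy to see. The function name hive_import and the SQOOP_BIN variable are hypothetical, and the connection values in the usage line are placeholders. Whether the job then appears in Atlas depends on the hook being configured as described above.

```shell
# Sketch only: hive_import and SQOOP_BIN are illustrative names.
# SQOOP_BIN lets you point at a specific sqoop launcher; it defaults to "sqoop".
SQOOP_BIN=${SQOOP_BIN:-sqoop}

# Run a Sqoop import that loads the table into Hive.
# $1 = JDBC connection URL, $2 = database user, $3 = source table
hive_import() {
  # -P prompts for the database password instead of taking it on the command line.
  "$SQOOP_BIN" import \
    --connect "$1" \
    --username "$2" \
    -P \
    --table "$3" \
    --hive-import
}

# Example call with placeholder values:
#   hive_import "jdbc:mysql://db.example.com/sales" "etl_user" "orders"
```

An import run without --hive-import falls outside what the hook currently publishes.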