Configuring Apache Sqoop Hook
Apache Sqoop has added a SqoopDataPublisher class that publishes data to Atlas after import jobs are completed. Currently, only hiveImport is supported in sqoopHook. The hook adds entities to Atlas using the model defined in org.apache.atlas.sqoop.model.SqoopDataModelGenerator. Complete the following instructions in your Sqoop setup to add the Sqoop hook for Atlas in the <sqoop-conf>/sqoop-site.xml file:
Add the Sqoop job publisher class. Currently only one publishing class is supported.
<property>
  <name>sqoop.job.data.publish.class</name>
  <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property>
Add the Atlas cluster name:
<property>
  <name>atlas.cluster.name</name>
  <value><clustername></value>
</property>
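Once the properties are added, a quick check can confirm they are present in the file. The following is a minimal sketch, not part of Sqoop or Atlas: the has_property helper and the SITE_FILE variable are hypothetical names, and the check is a plain text match rather than a real XML parse. It runs against a throwaway file; in practice SITE_FILE would point at <sqoop-conf>/sqoop-site.xml.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a property name appears in a Hadoop-style XML file.
# This is a fixed-string text match, not an XML parse.
has_property() {
    # $1 = path to the XML file, $2 = property name
    grep -qF "<name>$2</name>" "$1"
}

# Throwaway file standing in for <sqoop-conf>/sqoop-site.xml
SITE_FILE=$(mktemp)
trap 'rm -f "$SITE_FILE"' EXIT
cat > "$SITE_FILE" <<'EOF'
<configuration>
  <property>
    <name>sqoop.job.data.publish.class</name>
    <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
  </property>
</configuration>
EOF

# Report each property the hook setup expects
for name in sqoop.job.data.publish.class atlas.cluster.name; do
    if has_property "$SITE_FILE" "$name"; then
        echo "found: $name"
    else
        echo "missing: $name"
    fi
done
```

In this demo file only the publisher class is defined, so atlas.cluster.name is reported as missing until that property is added as well.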
Copy the application and client properties from the Atlas configuration directory.
Define the atlas.cluster.name and atlas.rest.address properties in the Sqoop configuration file, sqoop-site.xml.
Add ATLAS_HOME to /usr/hdp/<version>/sqoop/bin:
export ATLAS_HOME=${ATLAS_HOME:-/usr/hdp/2.5.0.0-1245/atlas}
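The export line relies on the shell's ${VAR:-default} expansion, which substitutes the fallback path only when ATLAS_HOME is unset or empty. The short sketch below illustrates both cases; /opt/atlas is a hypothetical path chosen for the demo, and the variables first and second exist only to capture the results.

```shell
#!/bin/sh
# Case 1: ATLAS_HOME is unset, so the fallback path is used.
unset ATLAS_HOME
export ATLAS_HOME=${ATLAS_HOME:-/usr/hdp/2.5.0.0-1245/atlas}
first=$ATLAS_HOME
echo "$first"     # prints /usr/hdp/2.5.0.0-1245/atlas

# Case 2: ATLAS_HOME is already set (hypothetical path), so it is kept.
ATLAS_HOME=/opt/atlas
export ATLAS_HOME=${ATLAS_HOME:-/usr/hdp/2.5.0.0-1245/atlas}
second=$ATLAS_HOME
echo "$second"    # prints /opt/atlas
```

This means an ATLAS_HOME already exported in the environment takes precedence over the version-specific default path.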
Add the following information to the $SQOOP_HOME/bin/configure-sqoop file after the line ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper}:
if [ -e "$ATLAS_HOME/hook/sqoop" -a -e "$ATLAS_HOME/hook/hive" ]; then
    for f in $ATLAS_HOME/hook/sqoop/*.jar; do
        SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
    done
    for f in $ATLAS_HOME/hook/hive/*.jar; do
        SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
    done
fi
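To see what this classpath snippet does, it can be tried outside of Sqoop. The sketch below is a standalone variant under stated assumptions: a temporary directory stands in for ATLAS_HOME, a.jar and b.jar are placeholder file names, and /existing/entry is a made-up starting classpath. It also differs from the snippet in three small ways: the paths are quoted, the two tests are joined with && instead of -a (which POSIX marks as obsolescent), and a guard skips a glob pattern that matched no files.

```shell
#!/bin/sh
# Throwaway directory standing in for ATLAS_HOME (assumption for the demo)
ATLAS_HOME=$(mktemp -d)
trap 'rm -rf "$ATLAS_HOME"' EXIT
mkdir -p "$ATLAS_HOME/hook/sqoop" "$ATLAS_HOME/hook/hive"
touch "$ATLAS_HOME/hook/sqoop/a.jar" "$ATLAS_HOME/hook/hive/b.jar"

# Made-up starting value so the appended entries are easy to spot
SQOOP_CLASSPATH=/existing/entry

# Append every hook jar only when both hook directories exist
if [ -e "$ATLAS_HOME/hook/sqoop" ] && [ -e "$ATLAS_HOME/hook/hive" ]; then
    for f in "$ATLAS_HOME"/hook/sqoop/*.jar "$ATLAS_HOME"/hook/hive/*.jar; do
        # Skip the literal pattern left behind when a glob matches nothing
        [ -e "$f" ] || continue
        SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f
    done
fi

echo "$SQOOP_CLASSPATH"
```

The Sqoop hook jars are appended first and the Hive hook jars second, each separated by a colon, after whatever the classpath already contained.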
Copy the Atlas <atlas-conf>/application.properties file and the <atlas-conf>/client.properties file to the <sqoop-conf>/ directory.
Link <atlas-home>/hook/sqoop/*.jar in the Sqoop lib directory.
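The copy and link steps above could be scripted roughly as follows. This is a sketch under assumptions: "link" is read here as creating symbolic links, the variable names ATLAS_CONF, SQOOP_CONF, and SQOOP_LIB are hypothetical stand-ins for <atlas-conf>, <sqoop-conf>, and the Sqoop lib directory, and example-hook.jar is a placeholder file name. The demo builds everything under a temporary directory so that it can run without an Atlas or Sqoop installation.

```shell
#!/bin/sh
# Stand-in directories for the demo; in a real setup these would be
# <atlas-conf>, <atlas-home>, <sqoop-conf>, and the Sqoop lib directory.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
ATLAS_HOME=$WORK/atlas
ATLAS_CONF=$WORK/atlas/conf
SQOOP_CONF=$WORK/sqoop/conf
SQOOP_LIB=$WORK/sqoop/lib      # assumed location of "sqoop lib"
mkdir -p "$ATLAS_CONF" "$ATLAS_HOME/hook/sqoop" "$SQOOP_CONF" "$SQOOP_LIB"
touch "$ATLAS_CONF/application.properties" "$ATLAS_CONF/client.properties"
touch "$ATLAS_HOME/hook/sqoop/example-hook.jar"   # placeholder jar name

# Copy both Atlas property files into the Sqoop configuration directory
cp "$ATLAS_CONF/application.properties" "$ATLAS_CONF/client.properties" "$SQOOP_CONF/"

# Symlink each hook jar into the Sqoop lib directory
for f in "$ATLAS_HOME"/hook/sqoop/*.jar; do
    [ -e "$f" ] || continue    # skip the pattern itself if no jars exist
    ln -sf "$f" "$SQOOP_LIB/"
done

ls "$SQOOP_CONF" "$SQOOP_LIB"
```

Symbolic links keep a single copy of each jar under the Atlas directory, so replacing a jar there is picked up by Sqoop without copying it again.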
Limitations
Currently, only hiveImport jobs are published to Atlas by the Sqoop hook.