Configuring a Sqoop Action to Use Tez to Load Data into a Hive Table
You can use the Tez execution engine to load data into a Hive table using the
--hive-import
option,
In the code example in each step, replace the sample text in [square brackets] with the appropriate information for your configuration.
Create a workflow directory.
hdfs dfs -mkdir -p
[/user/dummy/app]
Create a
lib
directory in the workflow directory.hdfs dfs -mkdir -p
[/user/dummy/app/lib]
Copy the database JDBC driver jar file to the
lib
directory.hadoop fs -copyFromLocal
[/usr/share/java/mysql-connector-java.jar]
[/user/dummy/app/lib]
Copy the
hive-site.xml
andtez-site.xml
files to a location accessible by the workflow. For example:hadoop fs -copyFromLocal
[/etc/oozie/conf/action-conf/hive/hive-site.xml /user/dummy/app]
hadoop fs -copyFromLocal[/etc/oozie/conf/action-conf/hive/tez-site.xml /user/dummy/app]
In the Sqoop action of the workflow, do the following:
Add
hive-site
andtez-site
resources in the<file>
element of the Sqoop action in the workflow.<file>/user/dummy/app/hive-site.xml#hive-site.xml</file> <file>/user/dummy/app/tez-site.xml#tez-site.xml</file>
Include the
--hive-import
option in the<command>
element.<command>import --connect
[jdbc:mysql://db_host:port/database]
--username[user]
--password[pwd]
--driverc[om.mysql.jdbc.Driver]
--table[table_name]
--hive-import -m 1 </command>
Add the following into the
job.properties
file.oozie.use.system.libpath=true oozie.action.sharelib.for.sqoop=sqoop,hive
More information regarding the Sqoop parameters can be found in the Apache documentation at https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hive
Example Workflow Action
Replace all sample text in [square brackets] in the example below with the appropriate workflow name, URI, paths, file names, etc. for your configuration.
<action name="sqoop-node"> <sqoop xmlns="uri:oozie:sqoop-action:0.2"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <configuration> <property> <name>mapred.job.queue.name</name> <value>${queueName}</value> </property> </configuration> <command>import --connect[jdbc:mysql://db_host:port/database]
--username[user]
--password[pwd]
--driver[com.mysql.jdbc.Driver]
--table[table_name]
--hive-import -m 1</command> <file>[/user/dummy/app/hive-site.xml#hive-site.xml]
</file> <file>[/user/dummy/app/tez-site.xml#tez-site.xml]
</file> </sqoop> <ok to="end"/> <error to="killJob"/> </action>