Enabling Oozie workflows that access Ozone storage
This section provides examples of how to enable Oozie workflows to use Ozone storage.
First, create an Ozone volume and bucket:
kinit <admin user>
ozone sh volume create /admin
ozone sh bucket create /admin/oozie
After you create the volume and bucket, ensure that your user has the appropriate privileges on the buckets or keys in Ranger. For more details, see Using Ranger with Ozone.
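The examples in this section address Ozone through ofs:// paths. As a reading aid, the sketch below splits such a path into its parts, assuming the common layout ofs://<service ID>/<volume>/<bucket>/<key>. The function name split_ofs_path and the OfsPath type are hypothetical helpers for illustration, not part of any Ozone API.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class OfsPath(NamedTuple):
    service_id: str  # Ozone Manager service ID or host, e.g. "ozone1"
    volume: str
    bucket: str
    key: str         # empty when the path names only a bucket


def split_ofs_path(path: str) -> OfsPath:
    """Split ofs://<service>/<volume>/<bucket>/<key> into its components.

    Assumption: the path follows the plain volume/bucket/key layout.
    """
    parts = urlsplit(path)
    if parts.scheme != "ofs":
        raise ValueError(f"not an ofs path: {path!r}")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"path must name a volume and a bucket: {path!r}")
    volume, bucket, *key = segments
    return OfsPath(parts.netloc, volume, bucket, "/".join(key))


if __name__ == "__main__":
    print(split_ofs_path("ofs://ozone1/admin/oozie/wf/workflow.xml"))
```

Under that assumption, a path such as ofs://ozone1/admin/oozie/wf/workflow.xml resolves to the volume admin and the bucket oozie, which are the volume and bucket created above.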
Oozie Fs action
Learn how to enable Oozie workflows to use Ozone storage through the Fs action.
- Create a workflow XML file that creates, moves, and deletes directories and adds files.
  In this example, the XML file name is fs_wf.xml.
  <workflow-app name="oozie_ozone_wf" xmlns="uri:oozie:workflow:0.5">
      <start to="create-dir"/>
      <kill name="Kill">
          <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <action name="create-dir">
          <fs>
              <mkdir path='ofs://ozone1/admin/oozie/dir1'/>
              <mkdir path='ofs://ozone1/admin/oozie/dir2'/>
              <touchz path='ofs://ozone1/admin/oozie/dir1/file1'/>
          </fs>
          <ok to="move-file"/>
          <error to="Kill"/>
      </action>
      <action name="move-file">
          <fs>
              <move source='ofs://ozone1/admin/oozie/dir1/file1' target='ofs://ozone1/admin/oozie/dir2/file1'/>
          </fs>
          <ok to="del-dir"/>
          <error to="Kill"/>
      </action>
      <action name="del-dir">
          <fs>
              <delete path='ofs://ozone1/admin/oozie/dir1'/>
          </fs>
          <ok to="End"/>
          <error to="Kill"/>
      </action>
      <end name="End"/>
  </workflow-app>
- Upload the workflow file to Ozone.
  ## Create a directory on Ozone to store the workflow.xml
  ozone fs -mkdir ofs://ozone1/admin/oozie/wf
  ozone fs -put fs_wf.xml ofs://ozone1/admin/oozie/wf/workflow.xml
- Create a properties file.
  In this example, the properties file name is fs_job.properties.
  user.name=admin
  oozie.wf.application.path=ofs://ozone1/admin/oozie/wf
  oozie.use.system.libpath=True
- Run the Oozie workflow.
  ## Run Oozie job
  oozie -Djavax.net.ssl.trustStore={trustStoreFile} \
    -Djavax.net.ssl.trustStorePassword={trustStorePassword} \
    job -oozie https://{oozieHost}:{ooziePort}/oozie \
    -config fs_job.properties -run
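Before uploading a workflow, it can help to confirm that every transition points at a node that actually exists, because a mistyped "to" target otherwise only surfaces when the job is submitted. The following is a minimal sketch of such a check, not an Oozie feature; the function name undefined_transitions is hypothetical, the embedded XML is a trimmed copy of fs_wf.xml, and only start, ok, and error transitions are inspected.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = "{uri:oozie:workflow:0.5}"  # namespace used by the workflow in this example

# Trimmed copy of the fs_wf.xml example, shortened for illustration.
WORKFLOW_XML = """
<workflow-app name="oozie_ozone_wf" xmlns="uri:oozie:workflow:0.5">
  <start to="create-dir"/>
  <kill name="Kill">
    <message>Action failed</message>
  </kill>
  <action name="create-dir">
    <fs>
      <mkdir path='ofs://ozone1/admin/oozie/dir1'/>
    </fs>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>
"""


def undefined_transitions(workflow_xml: str) -> List[str]:
    """Return transition targets that do not match any named node."""
    root = ET.fromstring(workflow_xml)
    # Every direct child carrying a name attribute is treated as a node
    # (action, kill, end, and so on).
    defined = {child.get("name") for child in root if child.get("name")}

    targets = []
    start = root.find(f"{NS}start")
    if start is not None:
        targets.append(start.get("to"))
    for action in root.findall(f"{NS}action"):
        for tag in ("ok", "error"):
            node = action.find(f"{NS}{tag}")
            if node is not None:
                targets.append(node.get("to"))

    return sorted({t for t in targets if t not in defined})


if __name__ == "__main__":
    print(undefined_transitions(WORKFLOW_XML))
```

An empty result means every inspected transition resolves to a defined node. Decision, fork, and join nodes are outside the scope of this sketch.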
Oozie Hive2 action
Learn how to test creating a Hive table on Ozone and inserting data into it.
- Create a workflow file that runs a Hive script, and modify the cluster details in the workflow file as necessary.
  In this example, the name of the workflow file is hive_ozone_wf.xml.
  <workflow-app name="ozone_hive_wf" xmlns="uri:oozie:workflow:0.5">
      <credentials>
          <credential name="hive2" type="hive2">
              <property>
                  <name>hive2.jdbc.url</name>
                  <value>jdbc:hive2://schal-ooz20-2.schal-ooz20.root.hwx.site:2181/default;serviceDiscoveryMode=zooKeeper;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=WTtrsQKjCimqLaArf5oe9TUBxvBSDODfDfZ13Tubkfh;zooKeeperNamespace=hiveserver2</value>
              </property>
              <property>
                  <name>hive2.server.principal</name>
                  <value>hive/_HOST@ROOT.HWX.SITE</value>
              </property>
          </credential>
      </credentials>
      <start to="hive2-test"/>
      <kill name="Kill">
          <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <action name="hive2-test" cred="hive2">
          <hive2 xmlns="uri:oozie:hive2-action:0.1">
              <job-tracker>${jobTracker}</job-tracker>
              <name-node>${nameNode}</name-node>
              <jdbc-url>jdbc:hive2://schal-ooz20-2.schal-ooz20.root.hwx.site:2181/default;principal=hive/_HOST@ROOT.HWX.SITE;serviceDiscoveryMode=zooKeeper;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=WTtrsQKjCimqLaArf5oe9TUBxvBSDODfDfZ13Tubkfh;zooKeeperNamespace=hiveserver2</jdbc-url>
              <script>ofs://ozone1/admin/oozie/hive_wf/hive_script.sql</script>
          </hive2>
          <ok to="End"/>
          <error to="Kill"/>
      </action>
      <end name="End"/>
  </workflow-app>
- Create a Hive script that creates a table on Ozone and inserts data into it.
  In this example, the Hive script is hive_script.sql.
  CREATE EXTERNAL TABLE `oozie_test`(
    `code` string,
    `description` string,
    `total_emp` int,
    `salary` int)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION 'ofs://ozone1/hive/warehouse/default.db/oozie_test';

  insert into table default.oozie_test values ('oh-0001','Oozie Hive Insert test',1000,110000);
- Create a directory on Ozone to store the workflow.xml file, then upload the workflow file and the Hive script.
  For example:
  ozone fs -mkdir ofs://ozone1/admin/oozie/hive_wf
  ozone fs -put hive_ozone_wf.xml ofs://ozone1/admin/oozie/hive_wf/workflow.xml
  ozone fs -put hive_script.sql ofs://ozone1/admin/oozie/hive_wf/
- Create a properties file, and modify the cluster details in the properties file as necessary.
  In this example, the properties file is hive_ozone_job.properties.
  nameNode=hdfs://schal-ooz20-2.schal-ooz20.root.hwx.site:8020
  jobTracker=schal-ooz20-2.schal-ooz20.root.hwx.site:8032
  mapreduce.job.user.name=admin
  user.name=admin
  oozie.wf.application.path=ofs://ozone1/admin/oozie/hive_wf
  oozie.use.system.libpath=True
- Run the Oozie job.
  oozie -Djavax.net.ssl.trustStore={trustStoreFile} \
    -Djavax.net.ssl.trustStorePassword={trustStorePassword} \
    job -oozie https://{oozieHost}:{ooziePort}/oozie \
    -config hive_ozone_job.properties -run
- Verify that the table is created in Hive.
  Open a Beeline shell and run the following query:
  select * from default.oozie_test where code='oh-0001';
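The Hive2 workflow refers to ${jobTracker} and ${nameNode}, which the properties file has to define. The sketch below illustrates one way to cross-check the two files before submitting a job. It is an assumption-laden helper rather than anything Oozie provides: the names parse_properties and missing_parameters are hypothetical, only simple key=value lines are parsed, and EL function calls such as ${wf:errorMessage(...)} are ignored on purpose.

```python
import re
from typing import Dict, Set

# Matches ${name} where name is a plain identifier. EL function calls such as
# ${wf:errorMessage(...)} contain ':' and are deliberately not matched.
PARAM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")

# Fragment of the Hive2 action and the matching properties from this example.
WORKFLOW_SNIPPET = """
<hive2 xmlns="uri:oozie:hive2-action:0.1">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
</hive2>
"""

PROPERTIES = """
nameNode=hdfs://schal-ooz20-2.schal-ooz20.root.hwx.site:8020
jobTracker=schal-ooz20-2.schal-ooz20.root.hwx.site:8032
oozie.wf.application.path=ofs://ozone1/admin/oozie/hive_wf
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_parameters(workflow_xml: str, properties_text: str) -> Set[str]:
    """Return ${...} parameters used in the workflow but not defined in the properties."""
    used = set(PARAM.findall(workflow_xml))
    return used - set(parse_properties(properties_text))


if __name__ == "__main__":
    print(missing_parameters(WORKFLOW_SNIPPET, PROPERTIES))
```

An empty set indicates that every plain parameter in the workflow text has a matching key in the properties text. The check looks only at the properties file, so values supplied by other means are not taken into account.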
Oozie Spark action
Learn how to test inserting into and selecting from a table created on Ozone using the Spark engine.
- Create a workflow file that runs a PySpark job, and modify the cluster details in the workflow file as necessary.
  In this example, the workflow file is spark_ozone_wf.xml.
  <workflow-app name="spark_ozone_wf" xmlns="uri:oozie:workflow:0.5">
      <credentials>
          <credential name="hcat" type="hcat">
              <property>
                  <name>hcat.metastore.uri</name>
                  <value>thrift://schal-ooz20-2.schal-ooz20.root.hwx.site:9083</value>
              </property>
              <property>
                  <name>hcat.metastore.principal</name>
                  <value>hive/_HOST@ROOT.HWX.SITE</value>
              </property>
          </credential>
      </credentials>
      <start to="spark-test"/>
      <kill name="Kill">
          <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <action name="spark-test" cred="hcat">
          <spark xmlns="uri:oozie:spark-action:0.1">
              <job-tracker>${jobTracker}</job-tracker>
              <name-node>${nameNode}</name-node>
              <master>yarn</master>
              <mode>cluster</mode>
              <name>Spark Ozone Example</name>
              <jar>spark_ozone_test.py</jar>
              <spark-opts>--num-executors 2 --executor-cores 2 --executor-memory 4g --driver-memory 2g </spark-opts>
          </spark>
          <ok to="End"/>
          <error to="Kill"/>
      </action>
      <end name="End"/>
  </workflow-app>
- Create a PySpark script to insert data into a table created on Ozone.
  In this example, the PySpark script is spark_ozone_test.py.
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("Spark Ozone Example").getOrCreate()
  spark.sql("select * from default.oozie_test where code='opi-0001'").show()
  spark.sql("insert into table default.oozie_test values ('opi-0001','Oozie PySpark Insert test',1000,110000)")
  spark.sql("select * from default.oozie_test where code='opi-0001'").show()
- Create a directory on Ozone to store the workflow.xml file, then upload the workflow file and the PySpark script.
  ozone fs -mkdir -p ofs://ozone1/admin/oozie/spark_wf/lib
  ozone fs -put spark_ozone_wf.xml ofs://ozone1/admin/oozie/spark_wf/workflow.xml
  ozone fs -put spark_ozone_test.py ofs://ozone1/admin/oozie/spark_wf/lib/
- Create a properties file, and modify the cluster details in the properties file as necessary.
  In this example, the properties file is spark_ozone_job.properties.
  nameNode=hdfs://schal-ooz20-2.schal-ooz20.root.hwx.site:8020
  jobTracker=schal-ooz20-2.schal-ooz20.root.hwx.site:8032
  mapreduce.job.user.name=admin
  user.name=admin
  oozie.wf.application.path=ofs://ozone1/admin/oozie/spark_wf
  oozie.use.system.libpath=True
- Run the Oozie job.
  oozie -Djavax.net.ssl.trustStore={trustStoreFile} \
    -Djavax.net.ssl.trustStorePassword={trustStorePassword} \
    job -oozie https://{oozieHost}:{ooziePort}/oozie \
    -config spark_ozone_job.properties -run
- Verify that the data is inserted into the table.
  Open a Beeline shell and run the following query:
  select * from default.oozie_test where code='opi-0001';
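All three examples submit the job with the same oozie command and differ only in the properties file. As an illustration, the sketch below assembles that command as an argument list so it can be reused across the Fs, Hive2, and Spark examples. The function name build_oozie_run_command is hypothetical, and the host, port, and truststore values in the demo are placeholders rather than values from a real cluster.

```python
import shlex
from typing import List


def build_oozie_run_command(
    oozie_host: str,
    oozie_port: int,
    properties_file: str,
    trust_store: str,
    trust_store_password: str,
) -> List[str]:
    """Assemble the 'oozie job ... -run' command used in the examples above."""
    return [
        "oozie",
        f"-Djavax.net.ssl.trustStore={trust_store}",
        f"-Djavax.net.ssl.trustStorePassword={trust_store_password}",
        "job",
        "-oozie", f"https://{oozie_host}:{oozie_port}/oozie",
        "-config", properties_file,
        "-run",
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmd = build_oozie_run_command(
        oozie_host="oozie.example.com",
        oozie_port=11443,
        properties_file="spark_ozone_job.properties",
        trust_store="/path/to/truststore.jks",
        trust_store_password="changeit",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The list form can also be passed to subprocess.run(cmd, check=True) on a host where the oozie client is installed. Keeping the arguments as a list avoids shell quoting problems if the truststore password contains special characters.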