Enabling Oozie workflows that access Ozone storage

This section provides examples of how to enable Oozie workflows to use Ozone storage.

First, create an Ozone volume and bucket:
kinit admin
ozone sh volume create /admin
ozone sh bucket create /admin/oozie

After you create the volume and bucket, ensure that your user has the appropriate privileges on the buckets or keys in Ranger. For more details, see Using Ranger with Ozone.
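
The workflows in the following sections address this bucket through ofs:// URIs such as ofs://ozone1/admin/oozie/dir1. As a rough illustration of how such a URI breaks down, the Python sketch below splits it into authority, volume, bucket, and key. It assumes the usual ofs layout of ofs://<authority>/<volume>/<bucket>/<key>, and that ozone1 is the Ozone Manager service ID or host of the example cluster. split_ofs_path and OfsPath are hypothetical helper names, not part of any Ozone API.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class OfsPath(NamedTuple):
    authority: str  # assumed to be the Ozone Manager service ID or host, e.g. "ozone1"
    volume: str
    bucket: str
    key: str  # path inside the bucket; empty when the URI names the bucket itself


def split_ofs_path(uri: str) -> OfsPath:
    """Split an ofs:// URI into authority, volume, bucket, and key."""
    parsed = urlparse(uri)
    if parsed.scheme != "ofs":
        raise ValueError(f"not an ofs URI: {uri!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URI must name a volume and a bucket: {uri!r}")
    volume, bucket, *rest = parts
    return OfsPath(parsed.netloc, volume, bucket, "/".join(rest))


print(split_ofs_path("ofs://ozone1/admin/oozie/dir1/file1"))
# OfsPath(authority='ozone1', volume='admin', bucket='oozie', key='dir1/file1')
```

Here the volume (admin) and bucket (oozie) are the ones created with the ozone sh commands above, and everything after the bucket is the key.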

Oozie Fs action

Learn how to enable Oozie workflows to use Ozone storage through the Fs action.

  1. Create a workflow XML file that creates directories, adds a file, moves the file, and deletes a directory.
    In this example, the XML file name is fs_wf.xml.
    <workflow-app name="oozie_ozone_wf" xmlns="uri:oozie:workflow:0.5">
        <start to="create-dir"/>
        <kill name="Kill">
            <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <action name="create-dir">
            <fs>
                  <mkdir path='ofs://ozone1/admin/oozie/dir1'/>
                  <mkdir path='ofs://ozone1/admin/oozie/dir2'/>
                  <touchz path='ofs://ozone1/admin/oozie/dir1/file1'/>
            </fs>
            <ok to="move-file"/>
            <error to="Kill"/>
        </action>
        <action name="move-file">
            <fs>
                  <move source='ofs://ozone1/admin/oozie/dir1/file1' target='ofs://ozone1/admin/oozie/dir2/file1'/>
            </fs>
            <ok to="del-dir"/>
            <error to="Kill"/>
        </action>
        <action name="del-dir">
            <fs>
                  <delete path='ofs://ozone1/admin/oozie/dir1'/>
            </fs>
            <ok to="End"/>
            <error to="Kill"/>
        </action>
        <end name="End"/>
    </workflow-app>
  2. Upload the workflow file to Ozone.
    ## Create a directory on Ozone to store the workflow.xml file
    ozone fs -mkdir ofs://ozone1/admin/oozie/wf
    ozone fs -put fs_wf.xml ofs://ozone1/admin/oozie/wf/workflow.xml
  3. Create a properties file.
    In this example the properties file name is fs_job.properties.
    user.name=admin
    oozie.wf.application.path=ofs://ozone1/admin/oozie/wf
    oozie.use.system.libpath=True
  4. Run the Oozie workflow.
    ## Run Oozie job
    oozie -Djavax.net.ssl.trustStore={trustStoreFile} -Djavax.net.ssl.trustStorePassword={trustStorePassword} job -oozie https://{oozieHost}:{ooziePort}/oozie -config fs_job.properties -run
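
Before uploading a workflow such as fs_wf.xml, it can help to confirm that every transition points at a node that exists. The Python sketch below is one possible pre-upload check, not an Oozie feature: it collects the name attribute of each direct child of workflow-app and reports any to attribute (on start, ok, or error) that matches none of them. It assumes an exact, case-sensitive match between target and node name, and dangling_transitions is a hypothetical helper name. The embedded workflow is a shortened variant of the example with a deliberate typo.

```python
import xml.etree.ElementTree as ET
from typing import List


def dangling_transitions(workflow_xml: str) -> List[str]:
    """Return transition targets that match no node name in the workflow."""
    root = ET.fromstring(workflow_xml)
    # Direct children of <workflow-app> that carry a name are workflow nodes
    nodes = {child.get("name") for child in root if child.get("name")}
    # Any element with a "to" attribute is treated as a transition
    targets = {el.get("to") for el in root.iter() if el.get("to")}
    return sorted(targets - nodes)


# Shortened workflow with a deliberate typo: "end" instead of "End"
WORKFLOW = """
<workflow-app name="oozie_ozone_wf" xmlns="uri:oozie:workflow:0.5">
    <start to="create-dir"/>
    <kill name="Kill">
        <message>Action failed</message>
    </kill>
    <action name="create-dir">
        <fs>
            <mkdir path='ofs://ozone1/admin/oozie/dir1'/>
        </fs>
        <ok to="end"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
"""

print(dangling_transitions(WORKFLOW))
# ['end']
```

An empty list means every transition resolves to a defined node. The check only looks at to attributes, so it does not cover constructs that use other attributes for their targets.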

Oozie Hive2 action

Learn how to test creating a Hive table on Ozone and inserting data into it.

  1. Create a workflow file that runs a Hive script, and modify the cluster details in the workflow file as necessary.
    In this example, the name of the workflow file is hive_ozone_wf.xml.
    <workflow-app name="ozone_hive_wf" xmlns="uri:oozie:workflow:0.5">
      <credentials>
        <credential name="hive2" type="hive2">
          <property>
            <name>hive2.jdbc.url</name>
            <value>jdbc:hive2://schal-ooz20-2.schal-ooz20.root.hwx.site:2181/default;serviceDiscoveryMode=zooKeeper;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=WTtrsQKjCimqLaArf5oe9TUBxvBSDODfDfZ13Tubkfh;zooKeeperNamespace=hiveserver2</value>
          </property>
          <property>
            <name>hive2.server.principal</name>
            <value>hive/_HOST@ROOT.HWX.SITE</value>
          </property>
        </credential>
      </credentials>
        <start to="hive2-test"/>
        <kill name="Kill">
            <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <action name="hive2-test" cred="hive2">
            <hive2 xmlns="uri:oozie:hive2-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <jdbc-url>jdbc:hive2://schal-ooz20-2.schal-ooz20.root.hwx.site:2181/default;principal=hive/_HOST@ROOT.HWX.SITE;serviceDiscoveryMode=zooKeeper;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=WTtrsQKjCimqLaArf5oe9TUBxvBSDODfDfZ13Tubkfh;zooKeeperNamespace=hiveserver2</jdbc-url>
                <script>ofs://ozone1/admin/oozie/hive_wf/hive_script.sql</script>
            </hive2>
            <ok to="End"/>
            <error to="Kill"/>
        </action>
        <end name="End"/>
    </workflow-app>
  2. Create a Hive script that creates a table on Ozone and inserts data into it.
    In this example, the Hive script is hive_script.sql.
    CREATE EXTERNAL TABLE `oozie_test`(`code` string, `description` string, `total_emp` int, `salary` int)
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE 
    LOCATION 'ofs://ozone1/hive/warehouse/default.db/oozie_test';
    insert into table default.oozie_test values ('oh-0001','Oozie Hive Insert test',1000,110000);
  3. Create a directory on Ozone and upload the workflow file and the Hive script to it.
    For example:
    ozone fs -mkdir ofs://ozone1/admin/oozie/hive_wf
    ozone fs -put hive_ozone_wf.xml ofs://ozone1/admin/oozie/hive_wf/workflow.xml
    ozone fs -put hive_script.sql ofs://ozone1/admin/oozie/hive_wf/
  4. Create a properties file, and modify the cluster details in the properties file as necessary.
    In this example, the properties file is hive_ozone_job.properties.
    nameNode=hdfs://schal-ooz20-2.schal-ooz20.root.hwx.site:8020
    jobTracker=schal-ooz20-2.schal-ooz20.root.hwx.site:8032
    mapreduce.job.user.name=admin
    user.name=admin
    oozie.wf.application.path=ofs://ozone1/admin/oozie/hive_wf
    oozie.use.system.libpath=True
  5. Run the Oozie job.
    oozie -Djavax.net.ssl.trustStore={trustStoreFile} -Djavax.net.ssl.trustStorePassword={trustStorePassword} job -oozie https://{oozieHost}:{ooziePort}/oozie -config hive_ozone_job.properties -run
  6. Verify that the table is created in Hive.
    ## Open a Beeline shell and run the following query
    select * from default.oozie_test where code='oh-0001';
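
The Hive2 workflow refers to ${jobTracker} and ${nameNode}, which are supplied by the properties file. As a rough sketch of how the two files relate, the Python snippet below lists the plain ${name} parameters in a workflow that the properties file does not define. The parsing is deliberately simplified (only key=value lines and # comments), EL function calls such as ${wf:errorMessage(...)} are skipped on purpose, and parse_properties and undefined_parameters are hypothetical helper names. The host name in the sample properties is a placeholder.

```python
import re
from typing import Dict, List

# Matches ${name} with a plain identifier; EL calls like ${wf:errorMessage(...)} do not match
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def undefined_parameters(workflow_xml: str, props: Dict[str, str]) -> List[str]:
    """Return workflow parameters that have no entry in the properties."""
    return sorted(set(PLACEHOLDER.findall(workflow_xml)) - set(props))


# Fragment of a workflow and a properties file that forgets jobTracker
WORKFLOW_FRAGMENT = """
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
"""

PROPERTIES = """
# placeholder host name
nameNode=hdfs://namenode.example.com:8020
user.name=admin
oozie.wf.application.path=ofs://ozone1/admin/oozie/hive_wf
oozie.use.system.libpath=True
"""

print(undefined_parameters(WORKFLOW_FRAGMENT, parse_properties(PROPERTIES)))
# ['jobTracker']
```

An empty result indicates that every plain parameter in the workflow has a value in the properties file.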

Oozie Spark action

Learn how to test inserting into and selecting from a table created on Ozone (the default.oozie_test table from the Hive2 action example) using the Spark engine.

  1. Create a workflow file that runs a PySpark job, and modify the cluster details in the workflow file as necessary.
    In this example, the workflow file is spark_ozone_wf.xml.
    <workflow-app name="spark_ozone_wf" xmlns="uri:oozie:workflow:0.5">
       <credentials>
        <credential name="hcat" type="hcat">
          <property>
            <name>hcat.metastore.uri</name>
            <value>thrift://schal-ooz20-2.schal-ooz20.root.hwx.site:9083</value>
          </property>
          <property>
            <name>hcat.metastore.principal</name>
            <value>hive/_HOST@ROOT.HWX.SITE</value>
          </property>
        </credential>
      </credentials>
        <start to="spark-test"/>
        <kill name="Kill">
            <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <action name="spark-test" cred="hcat">
            <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>yarn</master>
                <mode>cluster</mode>
                <name>Spark Ozone Example</name>
                <jar>spark_ozone_test.py</jar>
                <spark-opts>--num-executors 2 --executor-cores 2 --executor-memory 4g --driver-memory 2g </spark-opts>
            </spark>
            <ok to="End"/>
            <error to="Kill"/>
        </action>
        <end name="End"/>
    </workflow-app>
  2. Create a PySpark script to insert data into a table created on Ozone.
    In this example the PySpark script is spark_ozone_test.py.
    from pyspark.sql import SparkSession
    
    spark = SparkSession.builder.appName("Spark Ozone Example").getOrCreate()
    
    # Show the matching rows before the insert
    spark.sql("select * from default.oozie_test where code='opi-0001'").show()
    
    # Insert a test row into the table stored on Ozone
    spark.sql("insert into table default.oozie_test values ('opi-0001','Oozie PySpark Insert test',1000,110000)")
    
    # Show the matching rows again to confirm the insert
    spark.sql("select * from default.oozie_test where code='opi-0001'").show()
  3. Create a directory on Ozone and upload the workflow file and the PySpark script to it.
    ozone fs -mkdir -p ofs://ozone1/admin/oozie/spark_wf/lib
    ozone fs -put spark_ozone_wf.xml ofs://ozone1/admin/oozie/spark_wf/workflow.xml
    ozone fs -put spark_ozone_test.py ofs://ozone1/admin/oozie/spark_wf/lib/
  4. Create a properties file, and modify the cluster details in the properties file as necessary.
    In this example, the properties file is spark_ozone_job.properties.
    nameNode=hdfs://schal-ooz20-2.schal-ooz20.root.hwx.site:8020
    jobTracker=schal-ooz20-2.schal-ooz20.root.hwx.site:8032
    mapreduce.job.user.name=admin
    user.name=admin
    oozie.wf.application.path=ofs://ozone1/admin/oozie/spark_wf
    oozie.use.system.libpath=True
  5. Run the Oozie job.
    oozie -Djavax.net.ssl.trustStore={trustStoreFile} -Djavax.net.ssl.trustStorePassword={trustStorePassword} job -oozie https://{oozieHost}:{ooziePort}/oozie -config spark_ozone_job.properties -run
  6. Verify that the data is inserted into the table.
    ## Open a Beeline shell and run the following query
    select * from default.oozie_test where code='opi-0001';
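
The run commands in all three examples differ only in the properties file passed to -config. As an illustration of how the placeholders in those commands fit together, the Python sketch below assembles the same argument list from its parts. It only builds and prints the command and does not run it. oozie_run_command is a hypothetical helper name, and the URL, truststore path, and password are placeholder values.

```python
import shlex
from typing import List


def oozie_run_command(oozie_url: str, config_file: str,
                      trust_store: str, trust_store_password: str) -> List[str]:
    """Build the argument list for submitting and starting an Oozie job."""
    return [
        "oozie",
        f"-Djavax.net.ssl.trustStore={trust_store}",
        f"-Djavax.net.ssl.trustStorePassword={trust_store_password}",
        "job",
        "-oozie", oozie_url,
        "-config", config_file,
        "-run",
    ]


# Placeholder values; substitute the details of your own cluster
cmd = oozie_run_command(
    oozie_url="https://oozie.example.com:11443/oozie",
    config_file="spark_ozone_job.properties",
    trust_store="/path/to/truststore.jks",
    trust_store_password="changeit",
)
print(shlex.join(cmd))
```

Passing fs_job.properties or hive_ozone_job.properties as config_file yields the commands used in the Fs and Hive2 examples.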