org.apache.hadoop.hive.ql.exec.tez
Class DagUtils

java.lang.Object
  extended by org.apache.hadoop.hive.ql.exec.tez.DagUtils

public class DagUtils
extends Object

DagUtils is a collection of helper methods that convert map and reduce work into Tez vertices and edges. It handles configuration objects, file localization, and vertex/edge creation.


Method Summary
 void addCredentials(BaseWork work, org.apache.tez.dag.api.DAG dag)
          Set up credentials for the base work on secure clusters
 org.apache.hadoop.mapred.JobConf createConfiguration(HiveConf hiveConf)
          Creates and initializes a JobConf object that can be used to execute the DAG.
 org.apache.tez.dag.api.Edge createEdge(org.apache.hadoop.mapred.JobConf vConf, org.apache.tez.dag.api.Vertex v, org.apache.hadoop.mapred.JobConf wConf, org.apache.tez.dag.api.Vertex w, TezEdgeProperty edgeProp)
          Given two vertices and their respective configuration objects, createEdge creates an Edge object that connects the two.
 org.apache.tez.dag.api.GroupInputEdge createEdge(org.apache.tez.dag.api.VertexGroup group, org.apache.hadoop.mapred.JobConf wConf, org.apache.tez.dag.api.Vertex w, TezEdgeProperty edgeProp)
          Given a vertex group and a vertex, createEdge creates an edge between them.
 org.apache.tez.client.PreWarmContext createPreWarmContext(org.apache.tez.client.TezSessionConfiguration sessionConfig, int numContainers, Map<String,org.apache.hadoop.yarn.api.records.LocalResource> localResources)
           
 org.apache.hadoop.fs.Path createTezDir(org.apache.hadoop.fs.Path scratchDir, org.apache.hadoop.conf.Configuration conf)
          createTezDir creates a temporary directory in the scratchDir folder to be used with Tez.
 org.apache.tez.dag.api.Vertex createVertex(org.apache.hadoop.mapred.JobConf conf, BaseWork work, org.apache.hadoop.fs.Path scratchDir, org.apache.hadoop.yarn.api.records.LocalResource appJarLr, List<org.apache.hadoop.yarn.api.records.LocalResource> additionalLr, org.apache.hadoop.fs.FileSystem fileSystem, Context ctx, boolean hasChildren, TezWork tezWork)
          Create a vertex from a given work object.
 String getBaseName(org.apache.hadoop.yarn.api.records.LocalResource lr)
           
 org.apache.hadoop.fs.Path getDefaultDestDir(org.apache.hadoop.conf.Configuration conf)
           
 String getExecJarPathLocal()
           
 org.apache.hadoop.fs.FileStatus getHiveJarDirectory(org.apache.hadoop.conf.Configuration conf)
           
static DagUtils getInstance()
          Singleton
 String getResourceBaseName(org.apache.hadoop.fs.Path path)
           
static String[] getTempFilesFromConf(org.apache.hadoop.conf.Configuration conf)
           
 org.apache.hadoop.fs.Path getTezDir(org.apache.hadoop.fs.Path scratchDir)
          Gets the Tez directory that belongs to the Hive scratch directory.
 org.apache.hadoop.mapred.JobConf initializeVertexConf(org.apache.hadoop.mapred.JobConf conf, BaseWork work)
          Creates and initializes the JobConf object for a given BaseWork object.
 org.apache.hadoop.yarn.api.records.LocalResource localizeResource(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dest, org.apache.hadoop.conf.Configuration conf)
           
 List<org.apache.hadoop.yarn.api.records.LocalResource> localizeTempFiles(String hdfsDirPathStr, org.apache.hadoop.conf.Configuration conf, String[] inputOutputJars)
          Localizes files, archives and jars from a provided array of names.
 List<org.apache.hadoop.yarn.api.records.LocalResource> localizeTempFilesFromConf(String hdfsDirPathStr, org.apache.hadoop.conf.Configuration conf)
          Localizes files, archives and jars the user has instructed us to provide on the cluster as resources for execution.
 void updateConfigurationForEdge(org.apache.hadoop.mapred.JobConf vConf, org.apache.tez.dag.api.Vertex v, org.apache.hadoop.mapred.JobConf wConf, org.apache.tez.dag.api.Vertex w)
          Given two vertices v and w, updates their configurations for use in an edge v-w.
static org.apache.hadoop.fs.FileStatus validateTargetDir(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

createEdge

public org.apache.tez.dag.api.GroupInputEdge createEdge(org.apache.tez.dag.api.VertexGroup group,
                                                        org.apache.hadoop.mapred.JobConf wConf,
                                                        org.apache.tez.dag.api.Vertex w,
                                                        TezEdgeProperty edgeProp)
                                                 throws IOException
Given a vertex group and a vertex, createEdge creates an edge between them.

Parameters:
group - The parent VertexGroup
wConf - The job conf of the child vertex
w - The child vertex
edgeProp - The edge property of the connection between the two endpoints.
Throws:
IOException

updateConfigurationForEdge

public void updateConfigurationForEdge(org.apache.hadoop.mapred.JobConf vConf,
                                       org.apache.tez.dag.api.Vertex v,
                                       org.apache.hadoop.mapred.JobConf wConf,
                                       org.apache.tez.dag.api.Vertex w)
                                throws IOException
Given two vertices v and w, updates their configurations for use in an edge v-w.

Throws:
IOException

createEdge

public org.apache.tez.dag.api.Edge createEdge(org.apache.hadoop.mapred.JobConf vConf,
                                              org.apache.tez.dag.api.Vertex v,
                                              org.apache.hadoop.mapred.JobConf wConf,
                                              org.apache.tez.dag.api.Vertex w,
                                              TezEdgeProperty edgeProp)
                                       throws IOException
Given two vertices and their respective configuration objects, createEdge creates an Edge object that connects the two.

Parameters:
vConf - JobConf of the first vertex
v - The first vertex (source)
wConf - JobConf of the second vertex
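w - The second vertex (sink)
edgeProp - The edge property of the connection between the two endpoints.
Returns:
the Edge object that connects v and w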
Throws:
IOException

createPreWarmContext

public org.apache.tez.client.PreWarmContext createPreWarmContext(org.apache.tez.client.TezSessionConfiguration sessionConfig,
                                                                 int numContainers,
                                                                 Map<String,org.apache.hadoop.yarn.api.records.LocalResource> localResources)
                                                          throws IOException,
                                                                 org.apache.tez.dag.api.TezException
Parameters:
sessionConfig - session configuration
numContainers - number of containers to pre-warm
localResources - additional resources to pre-warm with
Returns:
prewarm context object
Throws:
IOException
org.apache.tez.dag.api.TezException

getDefaultDestDir

public org.apache.hadoop.fs.Path getDefaultDestDir(org.apache.hadoop.conf.Configuration conf)
                                            throws LoginException,
                                                   IOException
Parameters:
conf -
Returns:
path to destination directory on hdfs
Throws:
LoginException - if the user information cannot be determined.
IOException - when any dfs operation fails.

localizeTempFilesFromConf

public List<org.apache.hadoop.yarn.api.records.LocalResource> localizeTempFilesFromConf(String hdfsDirPathStr,
                                                                                        org.apache.hadoop.conf.Configuration conf)
                                                                                 throws IOException,
                                                                                        LoginException
Localizes files, archives and jars the user has instructed us to provide on the cluster as resources for execution.

Parameters:
hdfsDirPathStr - Destination directory in HDFS.
conf - Configuration.
Returns:
List of local resources to add to the execution
Throws:
IOException - when an HDFS operation fails.
LoginException - when getDefaultDestDir fails with the same exception

getTempFilesFromConf

public static String[] getTempFilesFromConf(org.apache.hadoop.conf.Configuration conf)

localizeTempFiles

public List<org.apache.hadoop.yarn.api.records.LocalResource> localizeTempFiles(String hdfsDirPathStr,
                                                                                org.apache.hadoop.conf.Configuration conf,
                                                                                String[] inputOutputJars)
                                                                         throws IOException,
                                                                                LoginException
Localizes files, archives and jars from a provided array of names.

Parameters:
hdfsDirPathStr - Destination directory in HDFS.
conf - Configuration.
inputOutputJars - The file names to localize.
Returns:
List of local resources to add to the execution
Throws:
IOException - when an HDFS operation fails.
LoginException - when getDefaultDestDir fails with the same exception

getHiveJarDirectory

public org.apache.hadoop.fs.FileStatus getHiveJarDirectory(org.apache.hadoop.conf.Configuration conf)
                                                    throws IOException,
                                                           LoginException
Throws:
IOException
LoginException

validateTargetDir

public static org.apache.hadoop.fs.FileStatus validateTargetDir(org.apache.hadoop.fs.Path path,
                                                                org.apache.hadoop.conf.Configuration conf)
                                                         throws IOException
Throws:
IOException

getExecJarPathLocal

public String getExecJarPathLocal()
                           throws URISyntaxException
Throws:
URISyntaxException

getBaseName

public String getBaseName(org.apache.hadoop.yarn.api.records.LocalResource lr)

getResourceBaseName

public String getResourceBaseName(org.apache.hadoop.fs.Path path)
Parameters:
path - the path from which the resource base name is determined
Returns:
the name of the resource for the given path.
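
As an illustration of what "base name" means here, the following is a minimal sketch that takes the last segment of a path string. It uses only the Java standard library; the class ResourceNameSketch and its baseName method are hypothetical names, and this page does not state how getResourceBaseName is actually implemented.

```java
// Hypothetical helper, not part of Hive: shows one way to derive a base name.
final class ResourceNameSketch {

    // Returns the last path segment, ignoring any trailing slashes.
    static String baseName(String path) {
        String trimmed = path;
        // Drop trailing slashes so "/tmp/dir/" is treated like "/tmp/dir".
        while (trimmed.length() > 1 && trimmed.endsWith("/")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        int slash = trimmed.lastIndexOf('/');
        // No slash at all means the whole string is already the base name.
        return slash < 0 ? trimmed : trimmed.substring(slash + 1);
    }
}
```

Under this sketch, a jar stored at hdfs://nn/user/hive/udfs.jar would be registered under the name udfs.jar.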

localizeResource

public org.apache.hadoop.yarn.api.records.LocalResource localizeResource(org.apache.hadoop.fs.Path src,
                                                                         org.apache.hadoop.fs.Path dest,
                                                                         org.apache.hadoop.conf.Configuration conf)
                                                                  throws IOException
Parameters:
src - path to the source for the resource
dest - path in hdfs for the resource
conf - Configuration.
Returns:
LocalResource produced by Tez localization.
Throws:
IOException - when any file-system-related call fails.

createConfiguration

public org.apache.hadoop.mapred.JobConf createConfiguration(HiveConf hiveConf)
                                                     throws IOException
Creates and initializes a JobConf object that can be used to execute the DAG. The configuration object contains the settings from mapred-site, overlaid with key/value pairs from the hiveConf object. It also contains some Hive-specific settings that do not change from DAG to DAG.

Parameters:
hiveConf - Current hiveConf for the execution
Returns:
JobConf base configuration for job execution
Throws:
IOException
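
The layering described for createConfiguration can be pictured with plain maps. The sketch below is an assumption-laden illustration, not the Hive implementation: ConfOverlaySketch, overlay, and the three map parameters are hypothetical names, and plain Map objects stand in for JobConf and HiveConf.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "site defaults overlaid with session values".
final class ConfOverlaySketch {

    // Later layers win: site defaults first, then session values, then fixed settings.
    static Map<String, String> overlay(Map<String, String> siteDefaults,
                                       Map<String, String> sessionSettings,
                                       Map<String, String> fixedSettings) {
        // Copy the defaults so the caller's map is left untouched.
        Map<String, String> merged = new LinkedHashMap<>(siteDefaults);
        merged.putAll(sessionSettings); // session values replace site defaults
        merged.putAll(fixedSettings);   // settings that stay the same for every DAG
        return merged;
    }
}
```

A key present in both the site defaults and the session settings ends up with the session value, which matches the "overlaid" wording above.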

initializeVertexConf

public org.apache.hadoop.mapred.JobConf initializeVertexConf(org.apache.hadoop.mapred.JobConf conf,
                                                             BaseWork work)
Creates and initializes the JobConf object for a given BaseWork object.

Parameters:
conf - Any configurations in conf will be copied to the resulting new JobConf object.
work - BaseWork will be used to populate the configuration object.
Returns:
JobConf new configuration object
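
Because the method returns a new object rather than modifying conf, the base configuration can be reused for several vertices. The following sketch shows that copy-then-populate pattern with plain maps; VertexConfSketch, the workName parameter, and the key sketch.vertex.name are hypothetical and only stand in for JobConf and BaseWork.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of deriving a per-vertex configuration from a shared base.
final class VertexConfSketch {

    static Map<String, String> initializeVertexConf(Map<String, String> base, String workName) {
        // Copy every entry of the base configuration into a fresh object.
        Map<String, String> vertexConf = new HashMap<>(base);
        // Populate work-specific settings on the copy only (hypothetical key).
        vertexConf.put("sketch.vertex.name", workName);
        return vertexConf;
    }
}
```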

createVertex

public org.apache.tez.dag.api.Vertex createVertex(org.apache.hadoop.mapred.JobConf conf,
                                                  BaseWork work,
                                                  org.apache.hadoop.fs.Path scratchDir,
                                                  org.apache.hadoop.yarn.api.records.LocalResource appJarLr,
                                                  List<org.apache.hadoop.yarn.api.records.LocalResource> additionalLr,
                                                  org.apache.hadoop.fs.FileSystem fileSystem,
                                                  Context ctx,
                                                  boolean hasChildren,
                                                  TezWork tezWork)
                                           throws Exception
Create a vertex from a given work object.

Parameters:
conf - JobConf to be used for this execution unit
work - The instance of BaseWork representing the actual work to be performed by this vertex.
scratchDir - HDFS scratch dir for this execution unit.
appJarLr - Local resource for hive-exec.
additionalLr - Additional local resources.
fileSystem - FS corresponding to scratchDir and LocalResources
ctx - This query's context
Returns:
Vertex
Throws:
Exception

addCredentials

public void addCredentials(BaseWork work,
                           org.apache.tez.dag.api.DAG dag)
Set up credentials for the base work on secure clusters


createTezDir

public org.apache.hadoop.fs.Path createTezDir(org.apache.hadoop.fs.Path scratchDir,
                                              org.apache.hadoop.conf.Configuration conf)
                                       throws IOException
createTezDir creates a temporary directory in the scratchDir folder to be used with Tez. Assumes scratchDir exists.

Throws:
IOException

getTezDir

public org.apache.hadoop.fs.Path getTezDir(org.apache.hadoop.fs.Path scratchDir)
Gets the Tez directory that belongs to the Hive scratch directory.
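
The relationship between getTezDir and createTezDir can be sketched with java.nio.file on a local file system. This is only an analogy under stated assumptions: TezDirSketch is a hypothetical class, the sub-directory name tez-work is made up (the name Hive uses is not given on this page), and the real methods operate on Hadoop Path objects rather than java.nio paths.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-file-system analogy for the scratch-dir helpers.
final class TezDirSketch {

    // Made-up sub-directory name used only for this sketch.
    static final String SUBDIR = "tez-work";

    // Pure path computation: no file system access.
    static Path getTezDir(Path scratchDir) {
        return scratchDir.resolve(SUBDIR);
    }

    // Creates the sub-directory; like the documented method, assumes scratchDir exists.
    static Path createTezDir(Path scratchDir) throws IOException {
        Path tezDir = getTezDir(scratchDir);
        if (!Files.isDirectory(tezDir)) {
            // createDirectory fails if the parent is missing, so a missing
            // scratchDir surfaces as an IOException instead of being created silently.
            Files.createDirectory(tezDir);
        }
        return tezDir;
    }
}
```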


getInstance

public static DagUtils getInstance()
Singleton

Returns:
instance of this class
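
For readers unfamiliar with the pattern, here is a minimal sketch of a singleton accessor in plain Java. HelperSingletonSketch is a hypothetical class, and the holder idiom shown is one common way to build a singleton; this page does not say which technique DagUtils uses internally.

```java
// Hypothetical stand-in for a helper class reached through a static accessor.
final class HelperSingletonSketch {

    // Holder idiom: the JVM initializes INSTANCE lazily and in a thread-safe way.
    private static final class Holder {
        static final HelperSingletonSketch INSTANCE = new HelperSingletonSketch();
    }

    private HelperSingletonSketch() {
        // Private constructor prevents instantiation from outside the class.
    }

    // Every caller receives the same shared instance.
    static HelperSingletonSketch getInstance() {
        return Holder.INSTANCE;
    }
}
```

With DagUtils, the instance methods listed on this page are reached the same way: obtain the object through DagUtils.getInstance() and call the method on it.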


Copyright © 2014 The Apache Software Foundation. All rights reserved.