Managing Hive and Impala Lineage Properties

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

For Hive and Impala, the Navigator Metadata Server does not extract query information directly. Instead, the services write query information to lineage log files. The Cloudera Manager Agent monitors the directory containing these files, periodically collects them, and forwards them to the Navigator Metadata Server.
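The collection flow described above can be pictured as a polling loop. The following Python sketch is hypothetical and is not the Cloudera Manager Agent's actual implementation: the class name LineageLogCollector, the forward callback (a stand-in for delivery to the Navigator Metadata Server), and the line-oriented log format are all assumptions. The sketch remembers how far it has read into each file in the monitored directory and forwards only new, complete lines on each poll.

```python
import os
from typing import Callable, Dict, List


class LineageLogCollector:
    """Hypothetical directory-polling collector (illustration only).

    Assumes each lineage entry is one newline-terminated line. This is not
    how the Cloudera Manager Agent is implemented; it only shows the idea of
    watching a directory and forwarding entries that have not been sent yet.
    """

    def __init__(self, log_dir: str, forward: Callable[[List[str]], None]):
        self.log_dir = log_dir
        # Stand-in for sending entries to the Navigator Metadata Server.
        self.forward = forward
        # File name -> number of bytes already forwarded from that file.
        self.offsets: Dict[str, int] = {}

    def poll_once(self) -> int:
        """Forward new complete lines from every file; return how many were sent."""
        forwarded = 0
        for name in sorted(os.listdir(self.log_dir)):
            path = os.path.join(self.log_dir, name)
            if not os.path.isfile(path):
                continue
            start = self.offsets.get(name, 0)
            with open(path, "rb") as f:
                f.seek(start)
                data = f.read()
            # Forward only complete lines; a partial trailing line waits for the next poll.
            end = data.rfind(b"\n") + 1
            if end == 0:
                continue
            lines = data[:end].decode("utf-8").splitlines()
            self.forward(lines)
            self.offsets[name] = start + end
            forwarded += len(lines)
        return forwarded
```

In a long-running process you would call poll_once() repeatedly, for example with time.sleep() between calls. Because the read offsets in a sketch like this are tied to files in a single directory, it also suggests why switching to a new directory can leave unread entries behind in the old one.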

Enabling and Disabling Hive and Impala Lineage

The Enable Lineage Collection property determines whether the Cloudera Manager Agent collects lineage logs. Hive and Impala lineage collection are both enabled by default. To control whether the Impala Daemon role writes to the lineage log and whether the Cloudera Manager Agent collects Hive and Impala lineage entries:
  1. Go to the Hive or Impala service.
  2. Click the Configuration tab.
  3. Type lineage in the Search box.
  4. Select or deselect the Enable Lineage Collection checkbox.
  5. (Impala only) Select or deselect the Enable Impala Lineage Generation checkbox.
  6. Click Save Changes to commit the changes.
  7. Restart the service.
If you deselect either Impala checkbox, Impala lineage is disabled.
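The checkbox rules above reduce to a simple condition. The following Python sketch restates them; the function names are hypothetical and only model the documented behavior: in the steps above, Hive has only the Enable Lineage Collection checkbox, while Impala lineage requires both Enable Lineage Collection and Enable Impala Lineage Generation.

```python
def hive_lineage_enabled(enable_lineage_collection: bool) -> bool:
    # Hypothetical helper: among the checkboxes in the steps above,
    # only Enable Lineage Collection applies to Hive.
    return enable_lineage_collection


def impala_lineage_enabled(enable_lineage_collection: bool,
                           enable_impala_lineage_generation: bool) -> bool:
    # Hypothetical helper: Impala Daemons must generate lineage entries AND the
    # Cloudera Manager Agent must collect them, so deselecting either checkbox
    # disables Impala lineage.
    return enable_lineage_collection and enable_impala_lineage_generation


if __name__ == "__main__":
    # Print the truth table for the two Impala checkboxes.
    for collect in (True, False):
        for generate in (True, False):
            print(collect, generate, impala_lineage_enabled(collect, generate))
```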

Configuring Hive and Impala Daemon Lineage Logs

The following properties apply to the Hive and Impala lineage log files:
  • Hive Lineage Log Directory - The directory in which Hive lineage log files are written.
  • Hive Maximum Lineage Log File Size - The maximum size in MB of the Hive lineage log file before a new file is created.
  • Enable Impala Lineage Generation - Indicates whether Impala lineage logs should be generated.
  • Impala Daemon Lineage Log Directory - The directory in which Impala lineage log files are written.
  • Impala Daemon Maximum Lineage Log File Size - The maximum size of the Impala lineage log file, measured in number of queries, before a new file is created.
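The two size limits use different units: Hive measures its lineage log file in megabytes, while the Impala Daemon counts queries. The following Python sketch models the rollover check for both. The RotationPolicy class is hypothetical; it assumes a new file starts once a limit is reached and that 1 MB is 1024 * 1024 bytes, and the limit values in the tests are illustrative, not Cloudera Manager defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RotationPolicy:
    """Hypothetical model of the two lineage log size limits.

    max_megabytes mirrors Hive Maximum Lineage Log File Size (MB);
    max_queries mirrors Impala Daemon Maximum Lineage Log File Size (queries).
    A limit left as None is not checked.
    """
    max_megabytes: Optional[float] = None
    max_queries: Optional[int] = None

    def should_rotate(self, current_bytes: int, current_queries: int) -> bool:
        # Assumption: 1 MB = 1024 * 1024 bytes, and rotation happens once the limit is reached.
        if self.max_megabytes is not None and current_bytes >= self.max_megabytes * 1024 * 1024:
            return True
        if self.max_queries is not None and current_queries >= self.max_queries:
            return True
        return False
```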
If you change the value of a log directory property and restart the service, the Cloudera Manager Agent starts monitoring the new log directory. In this case, some events in the old directory might not be published. To avoid losing lineage information when you change this property, perform the following steps:
  1. Stop the affected service.
  2. Copy the lineage log files and (for Impala only) the impalad_lineage_wal file from the old log directory to the new log directory. This needs to be done on the HiveServer2 host and all the hosts where Impala Daemon roles are running.
  3. Start the service.
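Step 2 can be scripted on each affected host. The following Python sketch is a hypothetical helper, not a Cloudera-provided tool: the function name copy_lineage_logs is an assumption, and the sketch assumes that the old directory contains only lineage log files (plus, for Impala, the impalad_lineage_wal file) and that the service has already been stopped.

```python
import os
import shutil
from typing import List


def copy_lineage_logs(old_dir: str, new_dir: str) -> List[str]:
    """Copy every regular file from old_dir into new_dir; return the names copied.

    Hypothetical helper. Assumes old_dir holds only lineage log files
    (and, for Impala, impalad_lineage_wal) and that the service is stopped.
    """
    os.makedirs(new_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(old_dir)):
        src = os.path.join(old_dir, name)
        if os.path.isfile(src):
            # copy2 copies the contents plus metadata such as modification time.
            shutil.copy2(src, os.path.join(new_dir, name))
            copied.append(name)
    return copied
```

shutil.copy2 preserves modification times but not file ownership, so you may need to run the copy as the user the service runs under, or adjust ownership afterward, so that the service can write to the new directory.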

To edit lineage log properties:

  1. Go to the service.
  2. Click the Configuration tab.
  3. Type lineage in the Search box.
  4. Edit the lineage log properties.
  5. Click Save Changes to commit the changes.
  6. Restart the service.