Using Audit Events to Understand Cluster Activity

Cloudera Navigator tracks the actions that users and services perform on the cluster. The filters on the Audits page of the Cloudera Navigator console let you quickly answer questions such as:
  • What did a specific user do on a specific day?
  • Who deleted files from the Hive warehouse directory?
  • What happened to data in the database?
  • Who ran which operation against a table?

The examples below step through these common use cases.

To work with Audit Events in the Cloudera Navigator console, you must have an account with the Auditing Viewer or Navigator Administrator user role.

To view Audit Events in the Cloudera Navigator console and apply filters:
  1. Log in to the Cloudera Navigator console.
  2. Click the Audits tab. The default view displays a list of all Audit Events for the past hour (note the date range field above the listing, below the Actions button).
  3. Configure and apply filters as needed to find specific events by username, by source, for selected time frames, and so on.
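To make the filter behavior concrete, the following Python sketch models filter rows (property, operator, value) combined with AND, plus a date range, with results listed newest first. It is not the Navigator API: the AuditEvent record, its field names, and the make_filter and apply_filters helpers are hypothetical, and treating like as a case-insensitive substring match is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Sequence

@dataclass
class AuditEvent:
    """Hypothetical, simplified audit record (not the Navigator schema)."""
    timestamp: datetime
    username: str
    service: str
    operation: str
    source: str

Predicate = Callable[[AuditEvent], bool]
OPERATORS = {"=", "!=", "like"}

def make_filter(prop: str, op: str, value: str) -> Predicate:
    """Build one filter row. Assumption: 'like' is a case-insensitive substring match."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")

    def matches(event: AuditEvent) -> bool:
        actual = str(getattr(event, prop))
        if op == "=":
            return actual == value
        if op == "!=":
            return actual != value
        return value.lower() in actual.lower()  # 'like'

    return matches

def apply_filters(events: Sequence[AuditEvent], filters: Sequence[Predicate],
                  start: datetime, end: datetime) -> List[AuditEvent]:
    """Keep events inside [start, end] that pass every filter, newest first."""
    kept = [e for e in events
            if start <= e.timestamp <= end and all(f(e) for f in filters)]
    return sorted(kept, key=lambda e: e.timestamp, reverse=True)

# Default view: no filters, the past hour. "now" is fixed so the result is repeatable.
now = datetime(2016, 11, 3, 16, 0)
events = [
    AuditEvent(datetime(2016, 11, 3, 15, 30), "cmjobuser", "hdfs", "open", "/tmp/a"),
    AuditEvent(datetime(2016, 11, 3, 14, 0), "cmjobuser", "hdfs", "delete", "/tmp/b"),
]
recent = apply_filters(events, [], now - timedelta(hours=1), now)
print([e.source for e in recent])  # ['/tmp/a']
```

Each additional filter only narrows the result, which matches the way the examples below add one filter at a time.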

What Did a Specific User Do on a Specific Day?

A data engineer reports that a job did not run as expected one afternoon and wants to find out what happened.

To narrow the audit events to these specifics, you need only the username and the time frame. This example isolates actions by the username cmjobuser for a one-hour period on November 3, 2016:
  1. Limit the list of audit events to those for a specific user by defining a username filter:
    1. Click Filters to open the filter selector.
    2. Select Username from the Select Property... drop-down selector.
    3. Set the evaluation operator (=, like, !=) to = (equals sign).
    4. Type the username (cmjobuser for this example).

    5. Click Apply. The filter is applied to the audit events and the list displays only those events for the selected username.
  2. Limit the Audit Events list for the selected username to only those events that occurred on a specific date at a specific time:
    1. Click the arrow next to the current date range field to open the date and time range options selector. Preset options range from the last 2, 6, 12, or 24 hours to the last 7 or 30 days; you can also set a custom range.

    2. Click the Custom Range link to open the date and time range selector.
    3. Use the field controls to set the start date and time and the end date and time for your custom time frame. For this example, set the range from 3:00 PM to 4:00 PM on November 3, 2016.

    4. Click Apply.
The date and time range is applied to the Audit Events already filtered by username, and the resulting list is displayed in reverse chronological order (most recent events at the top). The list now contains only the Audit Events for username cmjobuser during the one-hour period from 3:00 to 4:00 PM on November 3, 2016.
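The combination of a Username = cmjobuser filter and a custom range can be sketched in Python as follows. The AuditEvent record, the user_activity helper, and the sample events are hypothetical, and the sketch assumes both ends of the range are inclusive, which may not match the console exactly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class AuditEvent:
    """Hypothetical, simplified audit record (not the Navigator schema)."""
    timestamp: datetime
    username: str
    operation: str

def user_activity(events: List[AuditEvent], username: str,
                  start: datetime, end: datetime) -> List[AuditEvent]:
    """Username = <username> within [start, end], newest first."""
    hits = [e for e in events
            if e.username == username and start <= e.timestamp <= end]
    return sorted(hits, key=lambda e: e.timestamp, reverse=True)

start = datetime(2016, 11, 3, 15, 0)  # 3:00 PM
end = datetime(2016, 11, 3, 16, 0)    # 4:00 PM
events = [  # hypothetical sample data
    AuditEvent(datetime(2016, 11, 3, 15, 10), "cmjobuser", "open"),
    AuditEvent(datetime(2016, 11, 3, 15, 45), "cmjobuser", "create"),
    AuditEvent(datetime(2016, 11, 3, 15, 20), "hdfs", "delete"),
    AuditEvent(datetime(2016, 11, 3, 17, 5), "cmjobuser", "open"),
]
for e in user_activity(events, "cmjobuser", start, end):
    print(e.timestamp.strftime("%H:%M"), e.operation)
# 15:45 create
# 15:10 open
```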

Who Deleted Files from Hive Warehouse Directory?

A data steward is wondering what happened to some critical files in the data warehouse; they seem to have been deleted. The organization uses the Hive warehouse in its default directory:
  • /user/hive/warehouse
To find out who deleted files from this directory, combine a filter on the Source property with a filter on the Operation property for delete operations.
  1. Limit the list of Audit Events to only those for the /user/hive/warehouse directory:
    1. Click Filters.
    2. Select Source from the Select Property... drop-down selector.
    3. Set the evaluation operator to like.
    4. Type /user/hive/warehouse in the entry field for the Source property.
    5. Click Apply. The source filter is added to the list of filters and applied to Audit Events, returning an updated Audit Events list.
  2. Find all delete operations that occurred on the source (the Hive warehouse directory, in this example):
    1. Click Filters. The previously configured Source filter is displayed. Click the plus sign (+) at the end of the filter definition to add new filter setting fields.
    2. Select Operation from the Select Property... drop-down selector.
    3. Set the evaluation operator to =.
    4. Type delete in the entry field for the Operation property.

    5. Click Apply. The filter to select delete operations is added to the list of filters and applied to Audit Events. An updated Audit Events list is returned, showing only delete operations on sources that match /user/hive/warehouse.

The results show both completed and attempted delete operations (attempted but unsuccessful operations are shown in red). All of these events are associated with the username navigator_user, come from the same IP address, and use the same service (hdfs) over the course of a single day (June 9, 2016).

With this information, a Navigator Administrator can go to the Administration tab of the Cloudera Navigator console and find out who is associated with the navigator_user account.
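The following Python sketch models this investigation: a Source like /user/hive/warehouse filter, an Operation = delete filter, a split into completed and attempted deletes, and the distinct username, IP address, and service combinations behind them. The AuditEvent record and its allowed flag, the helper functions, and the sample events (using 192.0.2.x documentation addresses) are hypothetical, and like is modeled as a plain substring test.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

@dataclass
class AuditEvent:
    """Hypothetical, simplified audit record (not the Navigator schema)."""
    timestamp: datetime
    username: str
    ip_address: str
    service: str
    operation: str
    source: str
    allowed: bool  # False marks an attempted but unsuccessful operation

def deletes_under(events: List[AuditEvent], path_fragment: str) -> List[AuditEvent]:
    """Source like <path_fragment> AND Operation = delete ('like' as substring)."""
    return [e for e in events
            if path_fragment in e.source and e.operation == "delete"]

def split_by_outcome(events: List[AuditEvent]) -> Tuple[List[AuditEvent], List[AuditEvent]]:
    """Separate completed deletes from attempted (denied) ones."""
    completed = [e for e in events if e.allowed]
    attempted = [e for e in events if not e.allowed]
    return completed, attempted

events = [  # hypothetical sample data
    AuditEvent(datetime(2016, 6, 9, 9, 12), "navigator_user", "192.0.2.10", "hdfs",
               "delete", "/user/hive/warehouse/sales/part-0000", True),
    AuditEvent(datetime(2016, 6, 9, 9, 13), "navigator_user", "192.0.2.10", "hdfs",
               "delete", "/user/hive/warehouse/sales", False),
    AuditEvent(datetime(2016, 6, 9, 10, 0), "etl", "192.0.2.20", "hdfs",
               "create", "/user/hive/warehouse/orders/part-0001", True),
]
deletes = deletes_under(events, "/user/hive/warehouse")
completed, attempted = split_by_outcome(deletes)
# Distinct (username, IP address, service) combinations behind the deletes.
actors = {(e.username, e.ip_address, e.service) for e in deletes}
print(len(completed), len(attempted), actors)
# 1 1 {('navigator_user', '192.0.2.10', 'hdfs')}
```

A single entry in the actors set is what points to one account as the source of all the deletes.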

What Happened to Data in the Database?

The data files associated with a database are typically partitioned into files or directories by date, often by year. The system administrator discovers that the data from 2015 is missing from the production database. Only data from before 2010 has been archived off the system, so the 2015 data should still be there.

Files for data created in 2015 include 2015 in their filenames.

To determine what happened to the 2015 data files and directories, do the following:
  1. Filter the list of events for sources containing 2015:
    1. Click Filters.
    2. Select Source from the Select Property... drop-down selector. Sources can be paths to HDFS files or directories.
    3. Set the evaluation operator to like.
    4. Type 2015 in the entry field for the Source property.
    5. Click Apply. The source filter is added to the list of filters and applied to Audit Events, returning an updated Audit Events list.
  2. Filter the list of events for the delete operation:
    1. Click Filters. The previously configured Source filter is displayed. Click the plus sign (+) at the end of the filter definition to add new filter setting fields.
    2. Select Operation from the Select Property... drop-down selector.
    3. Set the evaluation operator to =.
    4. Type delete in the entry field for the Operation property.

    5. Click Apply. The operation filter is added to the list of filters and applied to Audit Events, returning an updated list containing Audit Events for delete operations on sources like 2015.
  3. Set the date range to one year:
    1. Click the date-time field at the top right of the Audit Events page.
    2. To set the range to the last year, click Custom Range. The Selected Range field is enabled for input.
    3. In the left date field, use the field controls to specify a date one year ago.
    4. Click Apply.

The combined filters and time frame reveal that the hdfs user deleted directories containing 2015 in their names.
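The three settings in this example (Source like 2015, Operation = delete, and a one-year range) can be sketched in Python as follows. The AuditEvent record, the deleted_sources helper, the sample paths, and the fixed "now" are hypothetical; "one year ago" is approximated as 365 days, and like is modeled as a substring test.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

@dataclass
class AuditEvent:
    """Hypothetical, simplified audit record (not the Navigator schema)."""
    timestamp: datetime
    username: str
    operation: str
    source: str

def deleted_sources(events: List[AuditEvent], fragment: str, now: datetime,
                    lookback: timedelta = timedelta(days=365)) -> List[Tuple[str, str]]:
    """Sorted (username, source) pairs for Source like <fragment>,
    Operation = delete, and a timestamp within [now - lookback, now]."""
    start = now - lookback  # "one year ago", approximated as 365 days
    hits = {(e.username, e.source) for e in events
            if fragment in e.source
            and e.operation == "delete"
            and start <= e.timestamp <= now}
    return sorted(hits)

# Hypothetical sample data; "now" is fixed so the result is repeatable.
now = datetime(2016, 12, 1)
events = [
    AuditEvent(datetime(2016, 8, 14, 2, 30), "hdfs", "delete", "/data/prod/year=2015"),
    AuditEvent(datetime(2016, 8, 14, 2, 31), "hdfs", "delete", "/data/prod/year=2014"),
    AuditEvent(datetime(2016, 8, 14, 2, 32), "analyst", "open", "/data/prod/year=2015/part-0"),
    AuditEvent(datetime(2015, 1, 10), "hdfs", "delete", "/data/prod/year=2015/tmp"),
]
print(deleted_sources(events, "2015", now))
# [('hdfs', '/data/prod/year=2015')]
```

The last sample event is excluded only because it falls outside the one-year window. A bare substring such as 2015 can also match paths that merely contain those digits, so a more specific fragment (for example, year=2015 in this hypothetical layout) reduces false matches.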

Who Ran Which Operation Against a Table?

When a data asset changes, the change can cause problems for reports or other processes that use it. You can use Audit Events to find when a change occurred and what caused it. This example shows how to use audits to determine what caused a schema change to a table, so you can identify the user or process that may be making unwanted changes.

To find all operations against a specific table in a general time range:

  1. Filter the list of events for the table.
    1. Click Filters.
    2. Select Table Name from the Select Property... drop-down selector.
    3. Set the evaluation operator to like.
    4. Type part or all of the table name in the entry field for the Table Name property.
    5. Click Apply. The table name filter is added to the list of filters and applied to Audit Events, returning an updated Audit Events list.
  2. Set the date range to the general time frame in which the change occurred:
    1. Click the date-time field at the top right of the Audit Events page.
    2. To set the range, click Custom Range. The Selected Range field is enabled for input.
    3. In the left date field, use the field controls to specify the earliest date on which the change may have occurred.

      To include the entire day, set the time to the beginning of the day, such as 12:00 AM.

    4. In the right date field, specify the latest date on which the change may have occurred.

      To include the entire day, set the time to indicate the end of the day, such as 11:59 PM.

    5. Click Apply.
  3. Review the results, looking at both the operation and the service name.
  4. If there are too many results, consider adding more filters:
    • By Service Name. For example, to show all Hive operations, use Service Name like hive.
    • By Operation. For example, to show only Hive operations that change the table metadata, use Operation like alter.
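The table investigation above, including the whole-day boundaries and the optional Service Name and Operation filters, can be sketched in Python as follows. The AuditEvent record, the helper functions, and the sample events and operation names are hypothetical, and like is assumed to be a case-insensitive substring match. The sketch starts each range at midnight and ends it at 23:59:59.999999, so events in the first or last minute of a day are not dropped.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import List, Optional, Tuple

@dataclass
class AuditEvent:
    """Hypothetical, simplified audit record (not the Navigator schema)."""
    timestamp: datetime
    username: str
    service: str
    operation: str
    table_name: str

def whole_day_range(first: date, last: date) -> Tuple[datetime, datetime]:
    """From midnight on `first` to the last representable instant of `last`."""
    return datetime.combine(first, time.min), datetime.combine(last, time.max)

def like(value: str, pattern: str) -> bool:
    """Assumed 'like' semantics: case-insensitive substring match."""
    return pattern.lower() in value.lower()

def table_operations(events: List[AuditEvent], table_fragment: str,
                     first: date, last: date,
                     service: Optional[str] = None,
                     operation: Optional[str] = None) -> List[AuditEvent]:
    """Table Name like <fragment> within whole days, optionally narrowed
    by Service Name like <service> and Operation like <operation>."""
    start, end = whole_day_range(first, last)
    return [e for e in events
            if like(e.table_name, table_fragment)
            and start <= e.timestamp <= end
            and (service is None or like(e.service, service))
            and (operation is None or like(e.operation, operation))]

events = [  # hypothetical sample data
    AuditEvent(datetime(2017, 3, 6, 0, 0, 30), "etl_svc", "hive", "alter_table", "sales_fact"),
    AuditEvent(datetime(2017, 3, 6, 9, 15), "analyst", "hive", "query", "sales_fact"),
    AuditEvent(datetime(2017, 3, 7, 23, 59, 40), "etl_svc", "impala", "alter_table", "sales_fact"),
    AuditEvent(datetime(2017, 3, 8, 0, 0, 5), "etl_svc", "hive", "alter_table", "sales_fact"),
]
first, last = date(2017, 3, 6), date(2017, 3, 7)
for e in table_operations(events, "sales", first, last, service="hive", operation="alter"):
    print(e.timestamp, e.username, e.operation)
# 2017-03-06 00:00:30 etl_svc alter_table
```

Without the service and operation arguments, the same call returns the first three events; each optional filter narrows that set further.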