Workload Slice & Dice

Slice & Dice enables you to analyze enterprise data warehouse (EDW) and Hadoop workloads by risk, query similarity, or by custom attributes specified in the query workload file that you upload to Navigator Optimizer.

To use Slice & Dice, first upload a CSV workload file. (See Formatting CSV Files for information about what to include in CSV workload files.) When Navigator Optimizer finishes analyzing the queries in the workload, it automatically displays the Dashboard with the Workload Slice & Dice tab already selected. On this tab, you can view the workload by attributes you specified in the CSV file before you uploaded it to Navigator Optimizer.

Typical Use Cases

Offloading Workloads from an Enterprise Data Warehouse to Hadoop

Slice Workloads by Risk:

When you are analyzing workloads to offload to Hadoop, Slice and Dice can identify low-risk queries that you might be able to run on Hadoop without optimization. For example, over 75% of the queries in this workload are low risk to offload to Impala:

Using Risk Attribute on the Workload Slice & Dice Tab of the Dashboard

Slice Workloads by Custom Attributes:

By toggling the Group queries by list, you can view risks associated with the custom attributes you specified in the workload file that you uploaded. In the following example, "REPORTNAME” was specified as one of the attributes, so Navigator Optimizer can filter on it:

Using Custom Attributes on the Workload Slice & Dice Tab of the Dashboard

Navigator Optimizer identifies sets of similar queries that are identical in structure and involve the same tables, but reference different columns, joins, filters, or group by clauses. This enables you to find clusters of queries and tables that are used together and might need to be offloaded as a set.

For details about slicing workloads by similar queries for offload projects, see Use Case: Identify Similar Query Sets to Offload First.

Hive and Impala Workload Optimization Analysis

Slice Workloads by CPU, Memory, Reads, and Writes:

Along with query text, pull CPU times, memory usage, and file system reads/writes from database or log files and use a function like floor (log10()) to group the values into categories. For example, applying floor(log10(CPU_time)) reduces the number of distinct values for CPU_time into a reasonable set for analysis. Create a workload file with these custom categories and upload it to Navigator Optimizer. Then you can toggle the Group queries by list to identify the queries that consume the most CPU or that cause the highest number of disk reads:

Using CPU, Memory, Reads, and Writes Attributes on the Workload Slice & Dice Tab of the Dashboard

For details about slicing workloads to optimize Hive or Impala workloads, see Use Case: Evaluate Query Sets by CPU Time, Memory Usage, and File System Reads/Writes.

Slice & Dice Features That Support More Granular Analysis

Slice and Dice Workloads by "Compatibility"

To support efficient strategies for offloading sets of low risk queries to Hive or Impala, Navigator Optimizer now supports slicing query workloads by compatibility and by risk to the target platform specified. First, slice by Compatiblity:



Then view the segments of the workload that present low, medium, or high risk to the target platform where you want to migrate the workload. This enables you to identify low risk queries quickly so you can add them to the design page for analysis:



Slice and Dice Workloads by Compatibility in the Dashboard

Watch this feature in action:

Video: Slice and Dice Workloads by Compatibility

Click the YouTube link at the lower right corner of the video player to view the video at full-screen on YouTube.

Similar Queries Grouped by Level of Risk

Navigator Optimizer now slices groups of similar queries into three levels of risk: low, medium, and high. After uploading the workload, on the Dashboard, choose Group queries by Similar Queries and Navigator Optimizer automatically groups the similar queries into subcategories of risk:



Drill down to view more information about queries in a risk group:



Sort Queries by Incompatibility, Elapsed Time, or Uniqueness

Sort Similar Queries

Similar query groups can be grouped by Incompatibility Count, Total Elapsed Time, or Total Unique Queries.

Watch this feature in action:

Video: Sort Similar Queries by Incompatibility, Elapsed Time, or Uniqueness

Click the YouTube link at the lower right corner of the video player to view the video at full-screen on YouTube.

Sort Custom Tagged Queries

You can also sort workloads that contain custom tags with the same categories: Incompatibility Count, Total Elapsed Time, or Total Unique Queries.
  1. Upload a CSV file with custom tagged columns. See Formatting CSV Files.
  2. In the Dashboard, slice the workload by one of the custom tags:



  3. Then sort the custom tag group by incompatibility, elapsed time, or uniqueness: