Correlation heatmap

Cloudera Data Visualization enables you to create Correlation Heatmap visuals. Correlation Heatmaps use colored cells, typically in a monochromatic scale, to show a 2D correlation matrix (table) between two discrete dimensions or event types. The values of the first dimension appear as the rows of the table, and the values of the second dimension appear as its columns. The color of each cell is proportional to the number of measurements that match the corresponding pair of dimension values. This enables you to quickly identify incidence patterns and recognize anomalies.
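As a rough illustration of what the heatmap computes, the following Python sketch counts how many records fall into each pair of dimension values and scales each count to a color intensity between 0 (lightest) and 1 (darkest). This is not Cloudera Data Visualization code: the function names, the sample records, and the linear scaling from zero to the largest count are assumptions made for illustration.

```python
from collections import Counter


def count_matrix(records):
    """Count how many records match each (row value, column value) pair."""
    counts = Counter(records)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # Pairs that never occur get a count of 0 (an empty cell).
    matrix = [[counts.get((r, c), 0) for c in cols] for r in rows]
    return rows, cols, matrix


def color_intensity(matrix):
    """Assumed linear scale: 0 -> lightest shade, largest count -> darkest (1.0)."""
    peak = max((v for row in matrix for v in row), default=0)
    if peak == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[v / peak for v in row] for row in matrix]


# Hypothetical sample: one (pddistrict, descript) pair per incident.
incidents = [
    ("MISSION", "GRAND THEFT FROM LOCKED AUTO"),
    ("MISSION", "GRAND THEFT FROM LOCKED AUTO"),
    ("MISSION", "BATTERY"),
    ("SOUTHERN", "GRAND THEFT FROM LOCKED AUTO"),
]

rows, cols, matrix = count_matrix(incidents)
print(rows)    # ['MISSION', 'SOUTHERN']
print(matrix)  # [[1, 2], [0, 1]]
print(color_intensity(matrix))  # [[0.5, 1.0], [0.0, 0.5]]
```

Rows come from the first dimension and columns from the second, so swapping the order of the two dimensions transposes the matrix.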

Correlation Heatmap visuals are similar to Chord visuals because both compare exactly two dimensions. Correlation Heatmaps are ideal for comparing the measure value for each pair of dimension values.

The following steps demonstrate how to create a new Correlation Heatmap visual on the SFPD Incidents dataset, which is based on data previously imported into Cloudera Data Visualization from the sfpd_incidents.csv data file [data source default.sfpd_incidents].

For an overview of shelves that specify this visual, see Shelves for correlation heatmaps.

  1. Start a new visual based on the SFPD Incidents dataset.
    For instructions, see Creating a visual.
  2. In the VISUALS menu, find and click Correlation Heatmap.
    The shelves of the visual change to Dimensions, Measures, Tooltips, X Trellis, Y Trellis, and Filters. Both Dimensions and Measures are mandatory.
  3. To show specific items, populate the shelves from the available fields (Dimensions and Measures) in the DATA menu.
    1. Under Dimensions, select pddistrict and drag it to the Dimensions shelf.
    2. Under Dimensions, select descript and drag it to the Dimensions shelf.
    3. Under Measures, select Record Count and drag it to the Measures shelf.

      Record Count is defined as a sum of events. If you hover over it with your mouse, a black detail bubble shows its definition, sum(1).

  4. Click REFRESH VISUAL.
    The default correlation heatmap visual appears.
    The descript dimension has a very large number of distinct values, each of which becomes a column of the table. If you scroll to the right, you can see some cells rendered in dark shades of green.
  5. To examine a shorter list of categories, add filters to the visual.
    1. Under Dimensions, select datetime and drag it to the Filters shelf.
    2. Repeat this step with category and descript.
    3. On the Filters shelf, click the down arrow on the descript field, and then click Pick values from a list.
    4. Select the values that you want to show.
      In this example, 7 distinct values were selected.
  6. Click REFRESH VISUAL.
    This smaller matrix still shows the entire range of color values.
  7. Click the pencil/edit icon next to the title of the visualization to enter a name for the visual.

    In this example, the title is changed to 'SFPD Incidents - Correlation Heat Map'. You can also add a brief description of the visual as a subtitle below the title of the visualization.

  8. In the top left corner of the Dashboard Designer, click SAVE.
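The shelf configuration in steps 3 and 5 can be approximated in plain Python. In the following sketch, each record adds 1 to its (pddistrict, descript) cell, mirroring the sum(1) expression behind Record Count, and an optional set of picked values plays the role of the descript filter. The helper names and sample rows are hypothetical. The min-max rescaling is only an assumption, chosen because it matches the observation in step 6 that the filtered matrix still spans the full color range; the actual color scaling in Cloudera Data Visualization may differ.

```python
from collections import defaultdict


def record_count(records, row_key, col_key, keep_cols=None):
    """Aggregate sum(1) per (row, column) pair, optionally filtering column values.

    keep_cols mimics "Pick values from a list": records whose column value is
    not in the set are dropped before aggregation.
    """
    cells = defaultdict(int)
    for rec in records:
        col = rec[col_key]
        if keep_cols is not None and col not in keep_cols:
            continue
        cells[(rec[row_key], col)] += 1  # sum(1): each record adds one
    return dict(cells)


def rescale(cells):
    """Assumed min-max scaling of the visible cells to [0, 1]."""
    if not cells:
        return {}
    lo, hi = min(cells.values()), max(cells.values())
    span = hi - lo
    # If every visible cell has the same count, give them all the darkest shade.
    return {k: (v - lo) / span if span else 1.0 for k, v in cells.items()}


# Hypothetical sample rows from the SFPD Incidents dataset.
incidents = [
    {"pddistrict": "MISSION", "descript": "BATTERY"},
    {"pddistrict": "MISSION", "descript": "BATTERY"},
    {"pddistrict": "MISSION", "descript": "PETTY THEFT OF PROPERTY"},
    {"pddistrict": "TENDERLOIN", "descript": "BATTERY"},
    {"pddistrict": "TENDERLOIN", "descript": "LOST PROPERTY"},
]

picked = {"BATTERY", "PETTY THEFT OF PROPERTY"}
cells = record_count(incidents, "pddistrict", "descript", keep_cols=picked)
scaled = rescale(cells)
print(len(cells))                        # 3
print(scaled[("MISSION", "BATTERY")])    # 1.0
```

Pairs with no matching records never appear in the dictionary, so they correspond to empty cells rather than to the lightest shade.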