Creating a basic scatter visual

Cloudera Data Visualization enables you to create a basic Scatter visual.

The following steps demonstrate how to create a new Scatter visual that shows the trend of life expectancy and GDP per capita for selected countries.

  1. Start a new visual based on the World Life Expectancy dataset. The data source for this is samples.world_life_expectancy.
    For instructions, see Creating a visual.
  2. In the visuals menu, find and click Scatter.
    The shelves of the visual change. They are now X Axis, Y Axis, Colors, Size, Transition, Tooltips, and Filters. The mandatory shelves for Scatter visuals are X Axis and Y Axis. You can easily swap the fields placed on these two shelves by switching X Axis and Y Axis.
  3. Populate the shelves from the available fields of Dimensions and Measures in the DATA menu.
    1. Under Measures, select gdp_per_capita and drag it to the X Axis shelf.
    2. Under Measures, select life_expectancy and drag it to the Y Axis shelf.
  4. For both fields, remove the aggregate to see individual data points.
    1. On the shelf, on the sum(gdp_per_capita) field, click the (right arrow) icon to open the FIELD PROPERTIES menu.
    2. Expand Aggregates and click the check mark next to Sum to remove the aggregate.
    3. Repeat these steps for the sum(life_expectancy) field.
      The shelves now contain the modified fields.
  5. Click REFRESH VISUAL.

    The Scatter visual appears.

    While you can see the general shape of the data and a few outliers, there is too little distinguishing information to help you understand the trends.
  6. Use the Colors shelf to see if a pattern emerges.

    Under Dimensions, select country and drag it to the Colors shelf.

  7. Click REFRESH VISUAL and examine the resulting graph.
    You can now see the trend over time very clearly for some of the countries. Although the visualization is less than ideal, some details stand out:
    • Zimbabwe, in gray, shows an appreciable increase in life expectancy, but relative to other countries, very little economic improvement (measured as GDP per capita).
    • United States, in purple, shows the expected improvement in both longevity and in GDP.
    • Kuwait, in light green, is a true outlier that shows a phenomenal increase in GDP per capita, but suffers a noticeable decrease in life expectancy from about mid-century to the present day.
  8. Add aggregation back to the measures on the X Axis and Y Axis shelves. This time, you want to see the average of each measure for every country.
    1. On the shelf, on the gdp_per_capita field, click the (right arrow) icon to open the FIELD PROPERTIES menu.
    2. Expand Aggregates and select Average.
    3. Repeat these steps for the life_expectancy field.
      The shelves now contain the modified fields.
  9. Optional: Adjust the legend as described in Changing legend style and removing the legend.
  10. Click REFRESH VISUAL.
    The Scatter visual appears.
    The distribution is an average of ALL years covered by the dataset, from 1990 through 2010. Still, some outliers are already clearly visible: Qatar and Kuwait for exceptionally high GDP per capita, Tokelau for high life expectancy at very low GDP per capita, and Sierra Leone, with the lowest life expectancy in the world.
    Figure 1. Contrasting Average Life Expectancy and GDP Per Capita, World-Wide
  11. Click the pencil/edit icon next to the title of the visualization to enter a name for the visual.

    In this example, the title is changed to 'World Population - Scatter'. You can also add a brief description of the visual as a subtitle below the title of the visualization.

  12. At the top left corner of the Dashboard Designer, click SAVE.
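
The difference between removing the aggregate (step 4) and applying Average (step 8) can be sketched outside the product in plain Python. This is only an illustration of the idea, not how Cloudera Data Visualization computes the visual. The field names come from the dataset used above; the rows, the country labels "A" and "B", and the helper functions raw_points and average_points are hypothetical placeholders, not real values from samples.world_life_expectancy.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows for illustration only; these are NOT real dataset values.
rows = [
    {"country": "A", "year": 2000, "gdp_per_capita": 1000.0, "life_expectancy": 60.0},
    {"country": "A", "year": 2010, "gdp_per_capita": 3000.0, "life_expectancy": 64.0},
    {"country": "B", "year": 2000, "gdp_per_capita": 20000.0, "life_expectancy": 76.0},
    {"country": "B", "year": 2010, "gdp_per_capita": 30000.0, "life_expectancy": 80.0},
]


def raw_points(rows):
    """No aggregate: one (x, y) point per row, comparable to step 4."""
    return [(r["gdp_per_capita"], r["life_expectancy"]) for r in rows]


def average_points(rows):
    """Average per country: one (x, y) point per country, comparable to step 8."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["country"]].append(r)
    return {
        country: (
            mean(r["gdp_per_capita"] for r in members),
            mean(r["life_expectancy"] for r in members),
        )
        for country, members in groups.items()
    }


if __name__ == "__main__":
    print(len(raw_points(rows)), "unaggregated points")
    for country, (x, y) in average_points(rows).items():
        print(country, x, y)
```

With no aggregate, every row becomes its own point, which is why the trend over time is visible. With the average grouped by country, each country collapses to a single point, which makes outliers across countries easier to compare.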

Keep in mind that GDP per capita depends on the population of the country.

As a next step, see Adding size to a scatter visual to learn how to show population variation on this graph.