Accessing data using Apache Druid

Visualize data using Superset

In the Superset UI, you connect to Druid data by filling out a dialog containing the fully qualified domain names (FQDN) of nodes that run Druid components. You specify a slice of data to visualize and query Druid. The visualization appears in the Superset UI.

This task introduces you to the Superset Web UI, which appears after you sign in. From the UI, you can navigate to the Apache documentation to obtain information not covered in this documentation, such as defining a list of users who can access views, opening Superset functions to certain groups of users, setting up permissions, and viewing user statistics. For more information about authentication in Superset, see the Flask documentation (link below).
  • You are running the Druid and Superset services in Ambari.
  • You ingested data, such as the Wikipedia data from the Wikiticker example, into Druid.

    The data consists of records of edits to Wikipedia.

  1. In Ambari, in Services > Superset > Summary > Quick Links, click Superset.
  2. In Superset Sign In, enter the Superset Admin name admin and enter the Superset Admin password that you set up.
    The Superset Web UI appears.
  3. Select Sources > Druid Clusters.
  4. Select Sources > Refresh Druid Metadata.
    In List Druid Data Source, the wikipedia data source appears.
  5. Click the data source wikipedia.
    The Data Source & Chart Type pane appears on the left. The canvas for query results appears on the right.
    At the top of the canvas, the UI includes controls for viewing the query in JSON and downloading the query in JSON or CSV format.
  6. In Data Source & Chart Type, build a query that slices your Wikipedia data. For example, get the top 10 most-edited articles between September 12 and 13, 2015 by setting the following values.
    Visualization Type: Distribution - Bar Chart
    Time Granularity: All
    Time - Since: 9/12/2015
    Time - Until: 9/13/2015
    Query - Metrics: COUNT(*)
    Query - Series: page
    Query - Row limit: 10
    In Since and Until, click Free form and enter a date in the format shown above.
  7. Click Run Query.
    A bar chart appears showing the top 10 articles for the time frame you specified.
  8. On the canvas, change the default title of the visualization from undefined - untitled to Most Edits by Page Name, for example.
  9. Click Save a Slice, specify a file name, and click OK.
  10. In Data Source & Chart Type, create a table view that aggregates edits per channel by changing the following values, run the query, and save the slice:
    Visualization Type: Table View
    Time Granularity: 1 hour
    Time - Since: 9/12/2015
    Time - Until: 9/13/2015
    Group by: channel
    Metrics: SUM(added)
    Sort By: SUM(added)
    The resulting table shows the number of edits per channel.
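
The two slices built in the steps above correspond roughly to Druid native queries. The following Python sketch shows what comparable topN and groupBy query bodies could look like; it is an approximation for illustration, not the JSON that Superset itself generates. It assumes the data source is named wikipedia and has page, channel, and added columns as in the Wikiticker example. The function names, the output names count and sum_added, and the Broker host broker.example.com are hypothetical. Port 8082 and the /druid/v2 path are the usual Broker defaults and might differ in your cluster.

```python
import json
import urllib.request
from datetime import datetime

# Hypothetical Broker address; replace with the FQDN of your Druid Broker.
BROKER_URL = "http://broker.example.com:8082/druid/v2"


def iso_interval(since: str, until: str) -> str:
    """Convert M/D/YYYY dates, as typed in the Superset UI, to an ISO-8601 interval."""
    start = datetime.strptime(since, "%m/%d/%Y").date()
    end = datetime.strptime(until, "%m/%d/%Y").date()
    return f"{start.isoformat()}/{end.isoformat()}"


def top_pages_query(since: str, until: str, limit: int = 10) -> dict:
    """Approximate steps 6 and 7: the most-edited pages in the time range."""
    return {
        "queryType": "topN",
        "dataSource": "wikipedia",           # assumed data source name
        "intervals": [iso_interval(since, until)],
        "granularity": "all",                # Time Granularity: All
        "dimension": "page",                 # Series: page
        "aggregations": [{"type": "count", "name": "count"}],
        "metric": "count",                   # rank pages by the count
        "threshold": limit,                  # Row limit: 10
    }


def edits_per_channel_query(since: str, until: str) -> dict:
    """Approximate step 10: SUM(added) per channel, in hourly buckets."""
    return {
        "queryType": "groupBy",
        "dataSource": "wikipedia",
        "intervals": [iso_interval(since, until)],
        "granularity": "hour",               # Time Granularity: 1 hour
        "dimensions": ["channel"],           # Group by: channel
        "aggregations": [
            {"type": "longSum", "name": "sum_added", "fieldName": "added"}
        ],
        "limitSpec": {                       # Sort By: SUM(added), descending
            "type": "default",
            "columns": [{"dimension": "sum_added", "direction": "descending"}],
        },
    }


def post_query(query: dict, url: str = BROKER_URL) -> list:
    """POST a native query to the Broker and return the decoded JSON result."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the query bodies; call post_query() only against a reachable Broker.
    print(json.dumps(top_pages_query("9/12/2015", "9/13/2015"), indent=2))
    print(json.dumps(edits_per_channel_query("9/12/2015", "9/13/2015"), indent=2))
```

Running the script prints both query bodies without contacting a cluster. Comparing them with the JSON shown by the query viewing control at the top of the Superset canvas is a quick way to see how the UI fields map to Druid query properties.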