Using profiling

The Profiling tab provides statistics and visualizations for selected fields in a dataset. You can explore dimensions and measures, view data types, refresh results, and manage cards to better understand the dataset.

Enable data profiling in Site Settings > Data > Enable Data Profiling. For more information, see Managing data-related site settings.

  1. Click DATA on the main navigation bar.
    The Data view appears, open on the Datasets tab.
  2. Find the dataset that you want to profile, either by scrolling through the list or using the search function.
  3. Click the dataset you want to examine.
    The dataset side navigation pane opens for the selected dataset, displaying the Dataset Detail page.
  4. Click Dataset Profiling in the left navigation panel.
    The Dataset Profiling view opens on the Profiling tab. This tab provides the option to select fields from the dimensions and measures of the dataset.
  5. Select the fields that you want to examine and click Explore.
    Cards appear with statistics and visualizations for each selected field. If the query is running too long, you can cancel it by clicking Stop. You can rerun a query and get updated results by clicking Refresh.

    The following information is shown for the selected dataset fields.

    Dimension values:
    Most frequent
    The string that appears the most often in the dataset, helping to identify the most common value.
    Least frequent
    The string that appears the least often in the dataset, helping to identify rare or unusual values.
    Number of Empty string
    The count of cells that contain an empty string, for example cells that are not null but contain no characters.
    Empty cells
    The number of cells that are completely empty, including both null values and empty strings.
    Record count
    The total number of records in the dataset.
    Duplicated rows
    The number of rows that are exactly identical, helping to identify redundant data. A high number of duplicates often signifies low variability in the dataset.
    Distinct count
    The number of unique values in the dataset, showing how varied the data is.
    Longest word
    The string with the highest number of characters in the dataset, providing insight into the potential size of values.
    Shortest word
    The string with the fewest number of characters, indicating the shortest data entry.
    MIN
    The minimum value in the dataset based on alphabetical order, useful for sorting or range analysis.
    MAX
    The maximum value in the dataset based on alphabetical order, helping to define the upper range of the data.

    The count histogram for dimensions shows how many times each category appears, that is the number of occurrences (counts) of each category (dimension). This visualization is useful for understanding the quality, distribution, and structure of your dataset.

    Measure values:
    MIN
    The smallest numeric value in the dataset, showing the lower bound of your data.
    MAX
    The largest numeric value in the dataset, showing the upper bound of your data.
    Avg (Average)
    The sum of all numeric values divided by the total number of records, giving an overall sense of the central value.
    Standard Dev
    Standard deviation that shows how spread out the numbers are in the dataset. A higher value indicates more variability in the data and a lower value indicates that the data points are closer to the average.
    Number of Zeroes
    The count of records where the value is exactly zero, indicating gaps or null-equivalents in the data.
    Empty cells
    The number of cells that are completely empty, including both null values and missing values.
    Record count
    The total number of records in the dataset.
    Duplicated rows
    The number of rows where all values are identical, helping to identify redundant data. A high number of duplicates often signifies low variability in the dataset.
    Distinct count
    The number of unique values in the dataset, showing how varied the data is.
    Most frequent
    The numeric value that appears the most often in the dataset, helping to identify the most common value.
    Least frequent
    The numeric value that appears the least often in the dataset, helping to identify rare or unusual values.
    The frequency histogram for measures shows the distribution of numeric values (measures) within specific ranges (buckets), giving insight into their spread and common ranges.