Datasets in Cloudera Data Visualization

Datasets are the foundation and starting point for visualizing your data. They are defined based on the connections to your data and provide access to the specific tables in the data store.

A dataset is the logical representation of the data you want to use to build visuals. It acts as a logical pointer to a physical table or a defined structure in your data source. Datasets may represent the contents of a single data table or a data matrix from several tables that may reside in different data stores on the same connection.

Beyond providing access to data, datasets offer several features to facilitate data usage and visualization, including (but not limited to):
  • Table joins allow you to supplement the primary data with information from various other data sources. For more information, see Data modeling.

  • Derived fields/attributes support the creation of flexible expressions, both for dimensions and aggregates. For more information, see Creating calculated fields.

  • Hiding fields allows you to remove unnecessary fields or obscure sensitive data without affecting the underlying tables. For more information, see Hiding dataset fields from applications.

  • Changing data types of field attributes can help proper data type handling or correct processing of numeric codes (like event IDs). For more information, see Changing data type.

  • Changing default aggregation of fields at the dataset level prevents common mistakes when building visuals by setting appropriate field aggregations at the dataset level. For more information, see Changing field aggregation.

  • Providing user-friendly names for columns or derived attributes can simplify the visualization process by applying meaningful names to columns or derived attributes, reducing the need for manual aliases. For more information, see Automatically renaming dataset fields and Custom renaming dataset fields.

  • Using dataset versioning allows you to manage different versions of your datasets, which is especially useful when dealing with frequent updates or iterative changes. Versioning helps you to track changes and roll back to an earlier version if an error is introduced or an earlier version is preferred, ensuring data consistency and reliability. It also adds a layer of security and control, ensuring that data integrity is maintained even as datasets evolve. For more information, see Tracking dataset versions.