Datasets in Cloudera Data Visualization

Datasets are the foundation and starting point for visualizing your data. They are defined based on the connections to your data and provide access to the specific tables in the data store.

A dataset is the logical representation of the data you want to use to build visuals. It acts as a logical pointer to a physical table or a defined structure in your data source. Datasets may represent the contents of a single data table or a data matrix from several tables that may reside in different data stores on the same connection.
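The idea of a dataset as a logical pointer can be sketched in a few lines of Python. This is an illustrative model only, not Cloudera Data Visualization code or its API: the `TableRef` and `Dataset` classes, their fields, and the connection and table names are all hypothetical. The sketch shows a dataset that records where its data lives (one connection, possibly several data stores) without holding any rows itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableRef:
    """Hypothetical pointer to a physical table; it holds no rows."""
    data_store: str  # assumed to be a database or schema name
    table: str


@dataclass
class Dataset:
    """Hypothetical model of a dataset as a logical layer over tables."""
    name: str
    connection: str  # all referenced tables share this connection
    primary: TableRef
    joined: list[TableRef] = field(default_factory=list)

    def tables(self) -> list[str]:
        """Return the qualified name of every table the dataset points at."""
        return [f"{t.data_store}.{t.table}" for t in [self.primary, *self.joined]]


# A dataset built from two tables in different data stores, same connection.
sales = Dataset(
    name="Sales with customers",
    connection="warehouse",
    primary=TableRef("retail", "orders"),
    joined=[TableRef("crm", "customers")],
)
print(sales.tables())
```

Because the model stores only references, changing the dataset definition in this sketch never touches the underlying tables.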

Beyond providing access to data, datasets offer several features to facilitate data usage and visualization, including (but not limited to):

Table joins allow you to supplement the primary data with information from various other data sources. For more information, see Data modeling.
Derived fields/attributes support the creation of flexible expressions, both for dimensions and aggregates. For more information, see Creating calculated fields.
Hiding fields allows you to remove unnecessary fields or obscure sensitive data without affecting the underlying tables. For more information, see Hiding dataset fields from applications.
Changing the data types of field attributes helps ensure proper data type handling or the correct processing of numeric codes (such as event IDs). For more information, see Changing data type.
Changing the default aggregation of fields at the dataset level prevents common mistakes when building visuals, because each field already carries an appropriate aggregation. For more information, see Changing field aggregation.
Providing user-friendly names for columns or derived attributes simplifies the visualization process and reduces the need for manual aliases. For more information, see Automatically renaming dataset fields and Custom renaming dataset fields.
Using dataset versioning [Technical Preview] allows you to manage different versions of your datasets, which is especially useful when dealing with frequent updates or iterative changes. Each time a dataset is modified, a new version is automatically created, preserving a snapshot of its previous state.
Versioning enables you to:
- Track changes over time to understand how your data structure and configuration evolve.
- Restore a previous version if an error is introduced or an earlier version is preferred, helping maintain data consistency and reliability.
- Compare versions to identify what has been added, removed, or modified between two versions, providing clear visibility into the impact of changes before deciding to restore an earlier state.
Dataset versioning supports collaboration by ensuring changes made by different users are versioned and reversible. It also adds a layer of control and security, helping preserve data integrity as datasets evolve.

Note: Dataset versioning is currently available as a technical preview in Cloudera Data Visualization.
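The versioning behavior described above (a snapshot on every modification, restore, and compare) can be illustrated with a small in-memory model. This is a conceptual sketch under stated assumptions, not how Cloudera Data Visualization implements versioning: the `VersionedDataset` class, the `compare` helper, and the field names are hypothetical, and a dataset is reduced to a mapping of field names to data types.

```python
class VersionedDataset:
    """Hypothetical model: every change snapshots the previous state."""

    def __init__(self, fields: dict[str, str]):
        self.fields = dict(fields)  # field name -> data type
        self.history: list[dict[str, str]] = []  # snapshots, oldest first

    def modify(self, **changes):
        """Apply changes; a value of None removes the field."""
        self.history.append(dict(self.fields))  # snapshot before changing
        for name, dtype in changes.items():
            if dtype is None:
                self.fields.pop(name, None)
            else:
                self.fields[name] = dtype

    def restore(self, version: int):
        """Return to an earlier snapshot; the restore itself is versioned."""
        self.history.append(dict(self.fields))
        self.fields = dict(self.history[version])


def compare(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """List the fields added, removed, or modified between two versions."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in shared if old[k] != new[k]),
    }


ds = VersionedDataset({"event_id": "INT", "amount": "FLOAT"})
ds.modify(event_id="STRING", region="STRING")  # snapshot 0 is the original
ds.modify(amount=None)                         # snapshot 1 precedes the removal
diff = compare(ds.history[0], ds.fields)       # inspect the impact first
ds.restore(0)                                  # then return to the original
```

In this sketch the comparison runs before the restore, mirroring the workflow of reviewing the impact of changes before deciding to return to an earlier state.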