Tracking dataset versions [Technical Preview]

Cloudera Data Visualization supports dataset versioning, an important feature for tracking changes and ensuring the integrity of your datasets over time.

Dataset versioning lets you manage multiple versions of your datasets, which is particularly useful when dealing with frequent updates or iterative changes. Every time a dataset is modified, a new version is created automatically, so you can track how the structure and content of your data evolve. If an error is introduced or an earlier version is preferred, you can revert to a previous version. This feature also supports collaboration: changes made by different team members are versioned, and previous states can be restored if necessary.

To use this feature, you must first enable dataset version control in the site settings. For instructions on how to enable and configure dataset version control, see Managing version control site settings.

  1. On the main navigation bar, click DATA.
    The Data view opens, displaying the Datasets tab.
  2. Find the dataset that you want to review, by scrolling through the list or using the search function.
  3. Click the dataset name to open its details.
    The dataset side navigation pane opens for the selected dataset, displaying the Dataset Detail page.
  4. Click Version Control in the side navigation to view version history.

    The Version Control page shows the details of the current dataset version along with any previous versions, if available.

    Current version: The active version of the dataset. This is either the most recent version saved after modifications or a previous version that has been reinstated as the active version. It reflects the dataset's state, including any changes or configurations applied up to the point of its selection. The first version of a dataset is always the current version and remains so, even if unnamed, until a new version replaces it.

    Previous versions: Each dataset modification (for example, editing a dataset field, the data model, or the time model, or adding new segments) triggers the creation of a new version of the dataset. A snapshot of the dataset’s state prior to the change is saved as a previous version, and the modified dataset becomes the current version.

    Named and unnamed versions: By default, new dataset versions are assigned a timestamp as their name and are considered unnamed versions. To retain any version, you must assign it a name. Unnamed versions are deleted based on the settings configured in the version control site settings. You can rename a dataset version by clicking the edit icon next to its name.

  5. Optional: To change the current dataset version, identify the version on the Version Control page that you want to reinstate as the current version and click Change Version.
    The chosen version becomes the active version, replacing the previous current version.
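
The version lifecycle described above (current and previous versions, named and unnamed versions, and changing the current version) can be illustrated with a small conceptual model. This is only a sketch for reasoning about the behavior, not the Cloudera Data Visualization API: the class names, method names, and the `max_unnamed` retention rule are all hypothetical. In particular, the actual cleanup of unnamed versions depends on the version control site settings, which may work differently from the simple count-based limit assumed here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(eq=False)  # compare versions by identity, not by field values
class Version:
    """Hypothetical stand-in for one saved dataset version."""
    state: dict
    created: datetime
    name: Optional[str] = None  # None means the version is unnamed

    @property
    def label(self) -> str:
        # Unnamed versions are identified by their timestamp.
        return self.name or self.created.isoformat()


@dataclass
class DatasetHistory:
    """Hypothetical model of a dataset's version history."""
    max_unnamed: int = 3  # ASSUMPTION: count-based retention of unnamed versions
    versions: List[Version] = field(default_factory=list)
    current: Optional[Version] = None

    def modify(self, new_state: dict) -> Version:
        """A modification creates a new version, which becomes current."""
        version = Version(state=dict(new_state),
                          created=datetime.now(timezone.utc))
        self.versions.append(version)
        self.current = version
        self._prune_unnamed()
        return version

    def rename(self, version: Version, name: str) -> None:
        """Naming a version protects it from cleanup in this model."""
        version.name = name

    def change_version(self, version: Version) -> None:
        """Reinstate an existing version as the current version."""
        if version not in self.versions:
            raise ValueError("unknown version")
        self.current = version

    def previous_versions(self) -> List[Version]:
        """All retained versions other than the current one."""
        return [v for v in self.versions if v is not self.current]

    def _prune_unnamed(self) -> None:
        # Drop the oldest unnamed, non-current versions beyond the limit.
        unnamed = [v for v in self.versions
                   if v.name is None and v is not self.current]
        excess = max(len(unnamed) - self.max_unnamed, 0)
        for old in unnamed[:excess]:
            self.versions.remove(old)
```

In this model, calling `modify()` several times and then `rename()` on one of the resulting versions keeps that version available for a later `change_version()` call, while older unnamed versions eventually drop out of the history. This mirrors why a version must be named if you want to retain it.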