Tracking dataset versions [Technical Preview]

Cloudera Data Visualization supports dataset versioning, a feature that helps track changes, enable collaborative dataset development, and ensure the integrity of your datasets over time.

Dataset versioning allows you to manage multiple versions of a dataset as it evolves, which is particularly useful when dealing with frequent updates or iterative changes. Every time a dataset is modified, a new version is automatically created, preserving the previous state. Dataset versioning provides a collaborative environment by ensuring that changes made by different team members can be tracked, and previous states can be restored if necessary.

Key features of dataset versioning:
  • Track changes: Monitor how data structures and configurations evolve over time.
  • Restore previous versions: Revert to earlier versions to correct errors or undo unwanted changes.
  • Compare versions: Review structural differences between dataset versions to understand the impact of changes before switching.
Follow these steps to use dataset versioning in Cloudera Data Visualization.

To use this feature, you must first enable dataset version control in the site settings. For instructions on how to enable and configure dataset version control, see Managing version control site settings.

Accessing dataset versioning

  1. On the main navigation bar, click DATA.

    The Data view opens, displaying the Datasets tab.

  2. Find the dataset that you want to review by scrolling through the list or using the search function.

  3. Click the dataset name to view its details.

    The dataset side navigation pane opens for the selected dataset, displaying the Dataset Detail page.

  4. Click Version Control in the side navigation to view version history.

Viewing version history

The Version Control page shows the details of the current dataset version along with any previous versions, if available. Metadata includes version name, dataset name, IDs, creation and update timestamps, and user info.

  • Current version: When a dataset has only one version, that first version is the active, or current, version. It remains the current version, even if unnamed, until a new version replaces it. The current version is marked with a green Current Version tag.

  • Previous versions: Each dataset modification (for example, editing a dataset field, the data model, or the time model, or adding new segments) triggers the creation of a new version of the dataset. A snapshot of the dataset’s state prior to the change is saved as a previous version, and the modified dataset becomes the current version.
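
The relationship between the current version and previous versions can be pictured with a small conceptual sketch. This is only an illustration of the snapshot behavior described above, not how Cloudera Data Visualization implements versioning; the VersionedDataset class, its modify method, and the sample field and segment names are all hypothetical.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class VersionedDataset:
    """Toy model (hypothetical): one current state plus saved previous states."""
    current: dict
    previous: list = field(default_factory=list)

    def modify(self, **changes):
        # Save a snapshot of the dataset's state prior to the change.
        self.previous.append(deepcopy(self.current))
        # Apply the change; the modified state becomes the current version.
        self.current.update(changes)


# Hypothetical dataset definition with two fields and no segments.
ds = VersionedDataset(current={"fields": ["id", "amount"], "segments": []})

# Adding a segment creates a new current version and keeps the old state.
ds.modify(segments=["high_value"])
```

After the call to modify, ds.previous holds the original definition and ds.current holds the definition that includes the new segment.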

Managing versions

  • Sorting: Click any column header to sort the version list.

  • Filtering: Use the search bar to find versions by name, ID, or creator.

  • Naming: Versions are unnamed by default and display a timestamp as the name. Unnamed versions may be automatically deleted based on version control site settings. To preserve a version, click the icon next to its name and assign a custom name.

You can also manage versions using the action icons located at the end of each version row.

Reverting to a previous version

To make a previous version the current one:

  1. Click the icon next to the version you want to reinstate.

  2. Review the version details in the Change Dataset Versions modal window.

  3. Click Resume to apply the change or Cancel to discard the action.

Deleting dataset versions

Click the icon to remove a previous dataset version. For bulk deletion, select multiple dataset versions and click Delete to clean up version history.

Comparing dataset versions

To view the differences between the current and a previous version of a dataset, click the icon next to the previous version you want to compare with the current one.

The Compare Datasets page shows the changes using color-coded sections to help you assess the impact of modifications before deciding to restore a previous version.

  • Added (green): Entries that exist in the selected version but not in the current one.

  • Removed (red): Entries that are present in the current version but not in the selected one.

  • Modified (yellow): Entries that exist in both versions but have differences.

Each section can be expanded to view a detailed list of the changes.
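
The three categories follow the usual logic of comparing two sets of named entries. The sketch below illustrates that logic under the assumption that each version can be represented as a mapping from entry name to definition; the compare_versions function and the sample column names and types are hypothetical and are not part of the product.

```python
def compare_versions(current: dict, selected: dict) -> dict:
    """Classify entries the way the Compare Datasets page labels them."""
    # Added: entries in the selected version but not in the current one.
    added = {k: selected[k] for k in selected.keys() - current.keys()}
    # Removed: entries in the current version but not in the selected one.
    removed = {k: current[k] for k in current.keys() - selected.keys()}
    # Modified: entries in both versions whose definitions differ.
    modified = {
        k: (current[k], selected[k])
        for k in current.keys() & selected.keys()
        if current[k] != selected[k]
    }
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical column definitions for the current and a previous version.
current = {"id": "INT", "amount": "FLOAT", "region": "STRING"}
selected = {"id": "INT", "amount": "INT", "country": "STRING"}

diff = compare_versions(current, selected)
```

In this example, country is reported as added, region as removed, and amount as modified, while id is unchanged and appears in no category.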