Creating Datasets

You can group data assets into datasets. This enables you to organize data by business classification, purpose, protection requirements, and similar criteria. Examples of datasets include customer profiles, sales assets, financials, PII, and HR data.

  1. From the Datasets page, click Add Datasets.
    The Add page appears.
  2. Enter the following information.
    Field Name | Description | Example Values
    Name | Enter an appropriate dataset name. The name must be unique across the system. (Mandatory) | Customer Profiles, Sales Assets, Financials
    Description | Describe the purpose or intent of the dataset. (Mandatory) | Contains customer profiles: data assets for US and WW.
    Data Lake | Assign the dataset to a data lake. Choose from the list of available data lakes. (Mandatory) | dss_bbsh_clust3
    Tags | Add tags to your dataset for context and subsequent lookup. Tags enable you to quickly catalog, search, and retrieve asset collections in Cloudera Data Catalog, as well as share that information with others later. (Optional) | se, pii, geo, finance
    Public/Private | Select Public if you want other users to have access to this dataset. Select Private if you want to be the only user with access to this dataset. | Public/Private
  3. Click Next.
    The Dataset Details page appears for the new dataset.
  4. Click Add Assets to add related data assets into your dataset.
    The Asset Search page appears.
  5. Search for assets using the search bar.
    1. Use filters to search for specific assets based on the attributes of assets. Click Filter to display the filters available.
      • Created: Select the time to refine the search by when the asset was created.
      • Owner: Enter the name of an owner to refine the search by asset owner.
      • DB Name: Enter the name of the database.
      • Tag: Select the tag type (Table/Column), then enter the names of the tags.
    2. Select more than one filter if needed.
    3. Click Search to view the assets.
    4. Click Reset to reset the filters and search again.
    5. From the list, click to select the assets that you want to add to your dataset.
  6. Search for assets using the Advanced tab, if needed. Advanced search uses facets of technical and business metadata about the assets, such as those captured in Apache Atlas, to help users define and build collections of interest. Advanced search conditions are a subset of attributes for the Apache Atlas type hive_table.
  7. Click Add.
    The assets are added to the dataset and the Search page is refreshed.
  8. Close the Search tab by clicking Done.
    The Dataset Details page appears.
  9. Click Save to finish editing your dataset.
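
The procedure above can be pictured with a small in-memory model. The following Python sketch is illustrative only: it does not call Cloudera Data Catalog or Apache Atlas, and the names Asset, Dataset, validate_dataset, and search_assets are hypothetical. It assumes that Name, Description, and Data Lake are mandatory and that the name must be unique, as described in step 2, and it further assumes that multiple filters narrow the result together (logical AND), which the procedure does not state explicitly.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Asset:
    # Hypothetical asset record; attributes mirror the filters in step 5.
    name: str
    owner: str
    db_name: str
    created: date
    tags: set[str] = field(default_factory=set)


@dataclass
class Dataset:
    # Hypothetical dataset record; fields mirror the table in step 2.
    name: str
    description: str
    data_lake: str
    tags: set[str] = field(default_factory=set)
    public: bool = False  # False corresponds to Private
    assets: list[Asset] = field(default_factory=list)


def validate_dataset(ds, existing_names):
    """Return a list of problems with the mandatory fields of a new dataset."""
    problems = []
    mandatory = (
        ("Name", ds.name),
        ("Description", ds.description),
        ("Data Lake", ds.data_lake),
    )
    for label, value in mandatory:
        if not value.strip():
            problems.append(f"{label} is mandatory")
    if ds.name in existing_names:
        problems.append(f"Name '{ds.name}' is already in use")
    return problems


def search_assets(assets, owner=None, db_name=None, tag=None, created_since=None):
    """Keep assets that satisfy every supplied filter (assumed AND semantics)."""
    result = []
    for asset in assets:
        if owner is not None and asset.owner != owner:
            continue
        if db_name is not None and asset.db_name != db_name:
            continue
        if tag is not None and tag not in asset.tags:
            continue
        if created_since is not None and asset.created < created_since:
            continue
        result.append(asset)
    return result


# Usage: create a dataset, check it, then add the assets found by a search.
catalog = [
    Asset("customers_us", "alice", "sales", date(2023, 1, 10), {"pii", "geo"}),
    Asset("customers_ww", "bob", "sales", date(2023, 6, 2), {"pii"}),
    Asset("ledger", "alice", "finance", date(2022, 3, 5), {"finance"}),
]
ds = Dataset(
    name="Customer Profiles",
    description="Contains customer profiles: data assets for US and WW.",
    data_lake="dss_bbsh_clust3",
    tags={"pii"},
)
assert validate_dataset(ds, existing_names={"Financials"}) == []
ds.assets.extend(search_assets(catalog, db_name="sales", tag="pii"))
print([a.name for a in ds.assets])  # ['customers_us', 'customers_ww']
```

In this sketch, an omitted filter is simply not applied, which corresponds to leaving that filter empty in the Asset Search page.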