Collections
A collection is a group of flow definitions that provides both sorting and access control within the Catalog.
Collections are a layer of role-based access control in Cloudera Data Flow. You can think of them as logical containers that govern access to flow definitions. By default, flow definitions in the Cloudera Data Flow Catalog are visible to all users who have at least the DFCatalogViewer role. Collections help you sort and compartmentalize flow definitions for better visibility. Collections also provide fine-grained, role-based access control over flow definitions within the Catalog, which helps prevent accidental or malicious modification or deletion. To prevent accidental data loss, only empty collections can be deleted. A flow definition can be assigned to only one collection at a time, but it can be reassigned from one collection to another.
ReadyFlows are a special kind of flow definition: they cannot be assigned to a collection. However, if you create a flow draft based on a ReadyFlow, edit it in Flow Designer, and publish it to the Catalog as a new flow definition, that new flow definition can be assigned to a collection.
Flow definitions can have one of two states in the Catalog:
- Unassigned - the flow definition is not assigned to any collection. It is freely accessible to every user or group with the appropriate user role.
- Assigned to a collection - the flow definition is accessible only to users or groups that have access to that collection. This assignment is exclusive: you cannot assign the same flow definition to more than one collection at a time.
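The assignment rules described so far can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not a Cloudera Data Flow API: the `Catalog` class, its fields, and its methods are hypothetical names introduced for illustration. It shows exclusive assignment, reassignment between collections, the restriction on ReadyFlows, and the rule that only empty collections can be deleted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Catalog:
    """Toy model of Catalog collections (hypothetical, not a Cloudera API)."""

    # Flow definition name -> collection name, or None while unassigned.
    assignments: Dict[str, Optional[str]] = field(default_factory=dict)
    collections: Set[str] = field(default_factory=set)
    readyflows: Set[str] = field(default_factory=set)

    def assign(self, flow: str, collection: str) -> None:
        """Assign a flow definition to a collection, or reassign it."""
        if flow in self.readyflows:
            raise ValueError("ReadyFlows cannot be assigned to a collection")
        if flow not in self.assignments:
            raise KeyError(f"unknown flow definition: {flow}")
        if collection not in self.collections:
            raise KeyError(f"unknown collection: {collection}")
        # One slot per flow definition keeps the assignment exclusive.
        self.assignments[flow] = collection

    def delete_collection(self, collection: str) -> None:
        """Delete a collection only if no flow definition is assigned to it."""
        if collection not in self.collections:
            raise KeyError(f"unknown collection: {collection}")
        if collection in self.assignments.values():
            raise ValueError("only empty collections can be deleted")
        self.collections.remove(collection)


catalog = Catalog(
    assignments={"Flow definition 1": None},
    collections={"Collection_A", "Collection_B"},
    readyflows={"ReadyFlow 1"},
)
catalog.assign("Flow definition 1", "Collection_A")
catalog.assign("Flow definition 1", "Collection_B")  # reassigned, not duplicated
catalog.delete_collection("Collection_A")  # allowed because it is empty again
```

Storing a single collection name per flow definition, rather than a set of flow definitions per collection, is a design choice that makes it impossible for the model to hold the same flow definition in two collections at once.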
The roles associated with collections are additive: they work in conjunction with the user permissions that control the types of actions a user or group is allowed to perform in a Cloudera on cloud environment.
Collections introduce the following user roles:
- DFCollectionsAdmin
  - This role grants permission to manage DataFlow Catalog collections within a tenant.
- DFCollectionsCreator
  - This role grants permission to create DataFlow Catalog collections. A user who creates a collection automatically becomes the DFCollectionAdmin of that collection.
- DFCollectionAdmin
  - This role is automatically assigned to users with the DFCollectionsCreator role when they create a collection. DFCollectionAdmins can add and remove users and groups, change user and group roles (DFCollectionAdmin, DFCollectionMember, or DFCollectionViewer), modify the collection name and description, and delete the collection.
- DFCollectionMember
  - This role grants permission to publish, import, delete, view, and reassign flow definitions in a flow definition Catalog collection.
- DFCollectionViewer
  - This role grants read-only permissions on a specific flow definition Catalog collection.
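The role descriptions above can be summarized as a lookup from role to permitted actions. The following is a minimal sketch, not the product's permission model: the action labels (for example `publish_flow`) are hypothetical names that paraphrase the descriptions, and the mapping lists only what the text states explicitly. Combining roles by set union is an assumption based on the roles being described as additive.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical action labels that paraphrase the role descriptions.
ROLE_ACTIONS: Dict[str, FrozenSet[str]] = {
    "DFCollectionsAdmin": frozenset({"manage_collections"}),
    "DFCollectionsCreator": frozenset({"create_collection"}),
    "DFCollectionAdmin": frozenset({
        "add_remove_users_and_groups",
        "change_roles",
        "edit_name_and_description",
        "delete_collection",
    }),
    "DFCollectionMember": frozenset({
        "publish_flow",
        "import_flow",
        "delete_flow",
        "view_flow",
        "reassign_flow",
    }),
    "DFCollectionViewer": frozenset({"view_flow"}),
}


def allowed_actions(roles: Iterable[str]) -> FrozenSet[str]:
    """Return the union of actions granted by the given roles."""
    actions = set()
    for role in roles:
        # Unknown role names grant nothing in this sketch.
        actions |= ROLE_ACTIONS.get(role, frozenset())
    return frozenset(actions)


def can(roles: Iterable[str], action: str) -> bool:
    """Check whether any of the given roles grants the action."""
    return action in allowed_actions(roles)
```

For example, `can(["DFCollectionViewer"], "delete_flow")` evaluates to `False`, while `can(["DFCollectionMember"], "delete_flow")` evaluates to `True`.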
The example in the figure Catalog roles example represents a Cloudera Data Flow Catalog that contains the unassigned 'Flow definition 1', 'ReadyFlow 1' (ReadyFlows cannot be associated with flow definition collections), and two collections: 'Collection_A', which contains 'Flow definition 2', and 'Collection_B', which contains 'Flow definition 3'.
Users with the DFCatalogViewer role for the Cloudera Data Flow Catalog and the DFCollectionViewer role for 'Collection_A' can view 'Flow definition 2' in 'Collection_A', plus 'Flow definition 1' and 'ReadyFlow 1', because those are not assigned to any collection. They cannot view flow definitions associated with 'Collection_B'.
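The visibility outcome in this example can be reproduced with a short sketch. It is an illustrative model only: `visible_flows`, `VIEW_ROLES`, and the `has_catalog_view` flag (standing in for holding at least the DFCatalogViewer role) are hypothetical names. DFCollectionAdmin is left out of `VIEW_ROLES` only because the role description does not explicitly mention viewing flow definitions.

```python
from typing import Dict, Optional, Set

# Collection roles whose descriptions explicitly include viewing flow definitions.
VIEW_ROLES = {"DFCollectionMember", "DFCollectionViewer"}


def visible_flows(
    assignments: Dict[str, Optional[str]],
    has_catalog_view: bool,
    collection_roles: Dict[str, str],
) -> Set[str]:
    """Return the flow definitions a user can see in this toy model.

    assignments maps a flow definition to its collection (None if unassigned).
    collection_roles maps a collection name to the role the user holds in it.
    """
    if not has_catalog_view:
        return set()
    visible = set()
    for flow, collection in assignments.items():
        if collection is None:
            # Unassigned flow definitions and ReadyFlows are open to Catalog users.
            visible.add(flow)
        elif collection_roles.get(collection) in VIEW_ROLES:
            visible.add(flow)
    return visible


assignments = {
    "Flow definition 1": None,
    "ReadyFlow 1": None,
    "Flow definition 2": "Collection_A",
    "Flow definition 3": "Collection_B",
}
result = visible_flows(assignments, True, {"Collection_A": "DFCollectionViewer"})
print(sorted(result))
# prints ['Flow definition 1', 'Flow definition 2', 'ReadyFlow 1']
```

'Flow definition 3' is absent from the result because the user holds no role in 'Collection_B'.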