Network and security

Data Hub clusters run within the network that is part of your environment and inherit many of the environment’s security configurations:

Item | Description
Network and subnet | Data Hub clusters use the network and subnet specified or created on the environment level.
Security groups | Data Hub clusters use security group settings specified or created on the environment level.
Security and governance | Each Data Hub cluster is attached to a data lake running in the same environment. The data lake lets you centrally apply and enforce authentication, authorization, and audit policies across multiple ephemeral Data Hub clusters. It also provides a protective ring around the data wherever it is stored, whether in cloud object storage or HDFS.
Kerberos | All Data Hub clusters are Kerberized. The IPA server running within the environment provides Kerberos to all Data Hub clusters.
Secure gateway for access to cluster UIs and endpoints | Data Hub users can access cluster UIs and endpoints via a secure gateway with a proxy and web UI SSO powered by Apache Knox.
SSH access | SSH access to Data Hub clusters is available to an admin user (root access) and to additional users (basic level of access):
  • When registering an environment, your CDP administrator provides an SSH key. This key gives the CDP administrator root-level access to all Data Hub clusters.

  • Other users can set up a password and use it to access Data Hub clusters via SSH.

  • A Power User can add and delete SSH keys for all users, including machine users, and users can add and delete their own SSH keys. Once these SSH keys are uploaded and synced, they can be used to access workload cluster nodes. RSA and ED25519 keys are supported. See Managing SSH keys for more information.
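
Because only RSA and ED25519 keys are supported, it can help to check a public key's type before uploading it. The following is a minimal sketch, not part of CDP: the function name check_public_key and the constant SUPPORTED_KEY_TYPES are hypothetical, and the check assumes the key is in the standard single-line OpenSSH public key format ("<type> <base64-data> [comment]"). It verifies only the key type, not the key's cryptographic validity.

```python
import base64
import struct

# Key types this section lists as supported (RSA or ED25519),
# written as their OpenSSH type identifiers.
SUPPORTED_KEY_TYPES = {"ssh-rsa", "ssh-ed25519"}


def check_public_key(line: str) -> str:
    """Return the key type of an OpenSSH public key line, or raise ValueError.

    Hypothetical helper for illustration; it does not call any CDP API.
    """
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-data> [comment]'")
    declared_type, b64_data = parts[0], parts[1]

    if declared_type not in SUPPORTED_KEY_TYPES:
        raise ValueError(f"unsupported key type: {declared_type}")

    # binascii.Error is a subclass of ValueError, so one except clause suffices.
    try:
        blob = base64.b64decode(b64_data, validate=True)
    except ValueError as exc:
        raise ValueError("key data is not valid base64") from exc

    # The decoded blob begins with a 4-byte big-endian length followed by
    # the key type name; it should match the type declared in the text.
    if len(blob) < 4:
        raise ValueError("key data is too short")
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded_type != declared_type:
        raise ValueError("declared key type does not match key data")

    return declared_type
```

A typical use would be to read the contents of a .pub file, pass the text to check_public_key, and upload the key only if no ValueError is raised.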