Known issues

This section lists known issues that you might run into while using the Management Console service.

CDPD-50436 Kudu service user not authorized to access Hive warehouse in RAZ enabled AWS cluster

Problem: The Kudu service user is not authorized to access Hive warehouse locations in a RAZ enabled AWS cluster on cloud object stores, which under certain conditions can prevent Kudu tables from being created. This results in the following error:
ImpalaRuntimeException: Error creating Kudu table 'default.truckspeedevents'
CAUSED BY: NonRecoverableException: failed to create HMS catalog entry for table  [id=b764bceb167746b7bb3dc1e8722e66e6]: failed to create Hive MetaStore table: TException - service has thrown: MetaException(message=Got exception: java.nio.file.AccessDeniedException 

Workaround: Add the "kudu" service user to the allow list for the "Default: Hive warehouse locations" policy in the cm_s3 Ranger repository that governs your S3 cloud storage.
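
You can make this change in the Ranger Admin web UI. Alternatively, the following is a minimal sketch using the Apache Ranger REST API; the host, port, credentials, and the edited policy file are placeholders, and you should verify the endpoints against your Ranger version:

  # Fetch the policy by service and name to find its id and current allow list:
  curl -u <ranger-admin>:<password> \
    "https://<ranger-host>:6182/service/public/v2/api/service/cm_s3/policy/Default%3A%20Hive%20warehouse%20locations"

  # Add "kudu" to the users list of the allow condition in the returned JSON,
  # save it as updated-policy.json, then update the policy by id:
  curl -u <ranger-admin>:<password> -X PUT -H "Content-Type: application/json" \
    -d @updated-policy.json \
    "https://<ranger-host>:6182/service/public/v2/api/policy/<policy-id>"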

ENGESC-19520 CDP can't validate permissions

Problem: If your AWS organization uses Service Control Policies (SCPs) that enable or disable specific regions, CDP cannot validate permissions correctly due to an AWS SDK limitation.

Workaround: Disable the permission validations.

OPSAPS-64580 Control Plane installation is failing because of failure in Install Longhorn step

Problem: When using Cloudera Manager to install CDP Private Cloud Data Services, the installation may fail due to a failure in the Longhorn install step. You will see an error similar to the following in the stderr output:
++ openssl passwd -stdin -apr1
+ echo 'cm-longhorn:$apr1$gp2nrbtq$1KYPGI0QNlFJ2lo5sV62l0'
+ kubectl -n longhorn-system create secret generic basic-auth --from-file=auth
+ rm -f auth
+ kubectl -n longhorn-system apply -f /opt/cloudera/cm-agent/service/ecs/longhorn-ingress.yaml
Error from server (InternalError): error when creating "/opt/cloudera/cm-agent/service/ecs/longhorn-ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://rke2-ingress-nginx-controller-admission.kube-system.svc:443/networking/v1/ingresses?timeout=10s": x509: certificate signed by unknown authority

Workaround: Retry the installation by clicking the Resume button.

OPSX-735 Kerberos service may report errors if Cloudera Manager is not running

Problem: The Cloudera Manager Server in the base cluster must be running in order to generate Kerberos principals for Private Cloud. If there is downtime, you may observe Kerberos-related errors.

Workaround: Restore the Cloudera Manager Server to a running state. If you encountered Kerberos errors, retry the failed operation (for example, retry creating the Virtual Warehouse).
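
For example, a quick way to confirm the server's state on the Cloudera Manager host (the systemd unit name shown is the standard one for Cloudera Manager installations):

  sudo systemctl status cloudera-scm-server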

OPSX-1405 Multiple users creating the same environment may result in an unusable environment

Problem: If two users try to create an environment with the same name at the same time, it might result in an unusable environment.

Workaround: Delete the existing environment and try again with only one user creating the environment.

OPSX-1412 Creating an Environment may fail

Problem: When multiple users try to create the same environment at the same time, or when automation retries environment creation, the creation may fail because it collides with a previous request to create the environment.

Workaround: Delete the existing environment, wait 5 minutes, and try again.
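
If you automate the retry, a minimal sketch along the following lines avoids colliding with the previous request; the CDP CLI subcommands are real, but every parameter is a placeholder and the create arguments are elided:

  # Delete the colliding environment, wait out the 5-minute window, and
  # retry creation only once the environment is no longer resolvable.
  cdp environments delete-environment --environment-name <env-name>
  sleep 300
  if ! cdp environments describe-environment --environment-name <env-name> >/dev/null 2>&1; then
    cdp environments create-aws-environment --environment-name <env-name> ...
  fi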

OPSX-2062 Platform not shown on the Compute Cluster tab in Environments

Problem: On the Management Console Environments page, the platform is not displayed on the Compute Clusters tab.

OPSX-2713 ECS Installation fails to perform First Run of services

Problem: If an issue is encountered during the Install Control Plane step of Containerized Cluster First Run, the installation is re-attempted indefinitely rather than the command failing.

Workaround: Since the control plane is installed and uninstalled in a continuous cycle, it is often possible to address the cause of the failure while the command is still running, at which point the next attempted installation should succeed. If this is not successful, abort the First Run command, delete the Containerized Cluster, address the cause of the failure, and retry from the beginning of the Add Cluster wizard. Any nodes that are re-used must be cleaned before re-attempting installation.

OPSX-2772 Unable to modify permissions for Account Administrator

Problem: When a user with administrative privileges accesses the User Management > Update Roles page in the Management Console, the page presents options to select various roles. Selecting or deselecting these roles does not change the user's privileges: an administrative user has all privileges by default, and those privileges cannot be changed.

CB-11925: Knox Load Balancer API requests initiated from Knox Gateway hosts can fail with Connection timeout error

Problem: When logged into a Data Lake or Data Hub node that has a Knox Gateway service instance configured on it, making Knox API calls through the Knox load balancer can result in a connection timeout error. This happens because, for security reasons, the IP address of the request is preserved in the traffic passed through the load balancer. Because the IP address is preserved, the load balancer rejects "loopback" traffic, that is, traffic that originates from and is directed back to the same node.

Workaround: If Knox API calls need to be made while logged into a Knox gateway node, use the hostname of the node instead of the load balancer hostname in the API call.

The Knox load balancer hostname can be identified by the "-gateway" suffix in the first clause of the hostname with no numeric identifier. For example: <cluster-name>-gateway.<env-shortname>.<hash>.cloudera.site is the load balancer hostname, and <cluster-name>-gateway0.<env-shortname>.<hash>.cloudera.site is a direct node hostname.
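
For example, a hedged sketch of the difference when run from a gateway node (the Knox path and credentials are placeholders to substitute for your cluster):

  # Through the load balancer: times out when run on a Knox gateway node.
  curl -ku <user>:<password> "https://<cluster-name>-gateway.<env-shortname>.<hash>.cloudera.site/<knox-path>"

  # Against the node's own hostname: works.
  curl -ku <user>:<password> "https://$(hostname -f)/<knox-path>"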

OPSAPS-59129: CM reports the HDFS warning "Secure DataNode configuration is valid, but not recommended."

Problem: On Runtime versions 7.2.9 and earlier, Data Lake clusters that include the HDFS service show this warning in Cloudera Manager: "Secure DataNode configuration is valid, but not recommended." This warning is benign and can be ignored.

CDPSDX-2879 Ranger import fails when you create a Hive replication policy for a medium duty Data Lake cluster

Problem: When you create a Hive replication policy with the Include Sentry Permissions with Metadata or Skip URI Privileges option for a medium duty Data Lake cluster, Ranger import fails.

Workaround: Contact Cloudera Support before you choose the Include Sentry Permissions with Metadata option for a Hive replication policy on a medium duty Data Lake cluster.

CB-10535, CB-10372 Generate CLI command for existing environment should show 3 commands instead of 1

Problem: If you try to obtain the CDP CLI commands from an existing environment (Actions > Show CLI commands), only the create environment command is displayed instead of all three commands required to register an environment from the CDP CLI.

Workaround: You can obtain the command for creating a Data Lake from the Data Lake details page. The command for setting ID Broker mappings can be obtained from an existing environment or from the CDP CLI help, but you need to modify it manually to set the same mappings as in the source environment.
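
For reference, registering an environment from the CDP CLI typically involves three commands along these lines; the subcommands exist in the CDP CLI, but all parameters are elided here and must be filled in for your environment:

  cdp environments create-aws-environment --environment-name <env-name> ...
  cdp environments set-id-broker-mappings --environment-name <env-name> ...
  cdp datalake create-aws-datalake --datalake-name <datalake-name> ...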

CB-10706 SSO is not working for Solr/Namenode UI links

Problem: SSO login to an environment with a medium duty Data Lake breaks access to Solr and Namenode UI links.

Workaround: After you deploy a medium duty Data Lake, log in to the Gateway0 node and generate a password value:

openssl rand -base64 12

Then log in as root to both gateway nodes and run:

export KNOX_GATEWAY_DATA_DIR=/var/lib/knox/gateway/data
/opt/cloudera/parcels/CDH/lib/knox/bin/knoxcli.sh create-alias pac4j.password --cluster knoxsso --value "<the value generated above>"

Then in Cloudera Manager, restart the Knox service.
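
If you prefer to script the restart, the Cloudera Manager REST API exposes a restart command for services. This is a sketch only; the API version, TLS port, and the "knox" service name are assumptions to verify against your deployment:

  curl -u <admin-user>:<password> -X POST \
    "https://<cm-host>:7183/api/v41/clusters/<cluster-name>/services/knox/commands/restart"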

DWX-6635 Tags are not being added to S3 buckets

Problem: S3 buckets that are part of your AWS environment registered in CDP are not tagged during environment registration. This is because the PutBucketTagging permission is missing from the cross-account IAM role that CDP requires you to create for your environment's credential.

Workaround: You can:
  • Manually add tags to your S3 buckets used for existing environments.
  • Add the PutBucketTagging permission to the IAM role used for your provisioning credential so that S3 bucket tags are added automatically for any environments registered in CDP in the future.
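
Both options can be performed with the AWS CLI. The following is a minimal sketch in which the bucket name, tag values, role name, and policy name are placeholders:

  # Manually tag an existing bucket:
  aws s3api put-bucket-tagging --bucket <bucket-name> \
    --tagging 'TagSet=[{Key=<tag-key>,Value=<tag-value>}]'

  # Grant the cross-account role the s3:PutBucketTagging action:
  aws iam put-role-policy --role-name <cross-account-role> \
    --policy-name <policy-name> \
    --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"s3:PutBucketTagging","Resource":"arn:aws:s3:::<bucket-name>"}]}'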

CB-6924 Workaround for ZooKeeper external volume bug

Problem: In the current version of CDP, ZooKeeper might be configured to write to CDP's root disk, which is too small to accommodate the ZooKeeper data. To correct this issue, you need to reconfigure ZooKeeper to write to an external volume and move any existing ZooKeeper data to that volume.

Workaround: To check if ZooKeeper is configured to use an external volume, complete the following:

  1. In Cloudera Manager, navigate to: ZooKeeper menu item -> Configuration tab -> Filter to Server.
  2. If the dataDir and dataLogDir fields contain /hadoopfs/fs1/zookeeper, you do not need to do anything.
  3. If the fields contain any other values, you must reconfigure ZooKeeper.

To reconfigure ZooKeeper, complete the following:

  1. SSH into the host where the ZooKeeper server is running.
  2. Run the following command to change the user:

    sudo -su zookeeper

  3. Run the following command:

    cp -R /var/lib/zookeeper/ /hadoopfs/fs1/zookeeper

  4. Open the cluster from the Cloudbreak user interface.
  5. Log into the Cloudera Manager user interface.
  6. In Cloudera Manager, find the ZooKeeper service and navigate to its configuration, either with the Search box or from the side menu: ZooKeeper menu item -> Configuration tab -> Filter to Server.
  7. Change the following properties:
    dataDir: /hadoopfs/fs1/zookeeper
    dataLogDir: /hadoopfs/fs1/zookeeper
  8. Save your changes.
  9. Restart the stale services when prompted.

    You do not need to redeploy ZooKeeper.
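
Optionally, verify that the running server picked up the new directories. This check assumes the ZooKeeper four-letter-word whitelist permits the conf command and that nc is installed on the host:

  echo conf | nc localhost 2181 | grep -iE 'dataDir|dataLogDir'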

CB-3876 Data Warehouse and Machine Learning create security groups

Problem: If you choose to use your own security groups during environment registration, the Data Warehouse and Machine Learning services do not use these security groups; instead, they create their own.

Workaround: For instructions on how to restrict access on the security groups created by the Data Warehouse service, refer to Restricting access to endpoints in AWS environments.

CRB-971 Data Warehouse creates IAM, S3, and DynamoDB resources

Problem: The Data Warehouse service creates its own S3 buckets, DynamoDB tables, and IAM roles and policies. It does not use the environment's S3 bucket(s), DynamoDB table, and IAM roles and policies.

Workaround: There is no workaround.

CB-4176 Data Lake cluster repair fails after manual stop

Problem: Data Lake cluster repair fails after an instance has been stopped manually via AWS console or AWS CLI.

Workaround: After stopping a cluster instance manually, restart it manually via the AWS console or AWS CLI, and then use the Sync option in CDP to sync instance state.
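
For example, to restart the stopped instance from the AWS CLI before using Sync (the instance ID is a placeholder):

  aws ec2 start-instances --instance-ids <instance-id>
  aws ec2 wait instance-running --instance-ids <instance-id>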

CB-2813 Environment with ML workspaces in it can be deleted

Problem: When deleting an environment that uses a customer-created VPC and subnets, there is no mechanism in place to check for any existing ML workspaces running within the environment. As a result, an environment can be deleted when ML workspaces are currently running in it.

Workaround: If your environment was created within your existing VPC and subnets, ensure that no ML workspaces are running within the environment before you delete it.
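
For example, you can check for running workspaces from the CDP CLI before deleting the environment; the jq filter and the environmentName field are assumptions to verify against your CLI version:

  cdp ml list-workspaces | \
    jq '.workspaces[] | select(.environmentName == "<env-name>")'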

CB-3459 Subnet dependency error when deleting an environment

Problem: When deleting an environment that uses a VPC and subnets created by CDP, the environment deletion fails with an error similar to:
com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: AWS CloudFormation stack reached an error state: DELETE_FAILED reason: The subnet 'subnet-05606fd72fda58c8c' has dependencies and cannot be deleted. (Service: AmazonEC2; Status Code: 400; Error Code: DependencyViolation; Request ID: da9a7fe0-ac43-467e-9942-94f10e6bd2b7)
This error occurs if resources such as Data Warehouse instances or Machine Learning cluster nodes were not deleted prior to environment termination.

Workaround: Prior to terminating an environment, you must terminate all clusters running within that environment.

CB-4248 Expired certificate causes untrusted connection warning

Problem: CDP automatically generates an SSL certificate for every Data Lake and Data Hub cluster. There are two possibilities:
  • By default, CDP generates a trusted certificate valid for 3 months.
  • If generating a trusted certificate fails, CDP generates a self-signed certificate valid for 2 years.

In the first case, if your cluster stays active for over 3 months, the trusted certificate will expire and you will see an "untrusted connection" warning when trying to access cluster UIs from your browser.

Workaround: To fix this, you should generate a new certificate by using the following steps:
  1. Use the Renew certificate UI option:
    • For Data Lake - Click the Renew certificate button on the Data Lake details page.
    • For Data Hub - Click Actions > Renew certificate on Data Hub cluster details page.

    During certificate renewal, several related messages will be written to Event History. Once the certificate renewal has been completed, the following message appears: "Renewal of the cluster's certificate finished."

  2. Additionally, if your cluster was created prior to December 19, you need to perform the following manual steps:
    1. SSH to the Knox gateway host on your cluster.
    2. Run the hostname command to get your domain name.
    3. Run the following commands (just replace the domain name test-master.dev.cldr.work with your correct, fully-qualified domain name):
      sudo sh -c "/opt/salt_2017.7.5/bin/salt --out=newline_values_only 'test-master.dev.cldr.work' pillar.get gateway:userfacingcert > /etc/certs-user-facing/server.pem"
      sudo systemctl reload nginx.service

CB-23926: Autoscaling must be stopped before performing FreeIPA upgrade

Problem: During autoscaling, compute nodes are stopped, and stopped nodes cannot receive the latest FreeIPA IP address update. As a result, Data Hub clusters with autoscaling enabled become unhealthy after a FreeIPA upgrade.

Workaround:
  1. Disable autoscaling before performing the FreeIPA upgrade.
  2. Start all compute nodes that are in a STOPPED state.
  3. Perform the FreeIPA upgrade.
  4. Re-enable autoscaling after the FreeIPA upgrade completes successfully.