Known issues
This section lists known issues that you might run into while using the Management Console service.
CDPD-50436 Kudu service user not authorized to access Hive warehouse in RAZ enabled AWS cluster
ImpalaRuntimeException: Error creating Kudu table 'default.truckspeedevents'
CAUSED BY: NonRecoverableException: failed to create HMS catalog entry for table [id=b764bceb167746b7bb3dc1e8722e66e6]: failed to create Hive MetaStore table: TException - service has thrown: MetaException(message=Got exception: java.nio.file.AccessDeniedException
Workaround: Add "kudu" service user to the allow list for "Default: Hive warehouse
locations" in the cm_s3
Ranger repository in your S3 cloud storage.
ENGESC-19520 CDP can't validate permissions
Problem: If in your AWS organization you have Service Control Policies (SCPs) where certain regions are enabled or disabled, CDP can't validate the permissions correctly due to an AWS SDK limitation.
Workaround: Disable the permission validations.
OPSAPS-64580 Control Plane installation is failing because of failure in Install Longhorn step
++ openssl passwd -stdin -apr1 + echo 'cm-longhorn:$apr1$gp2nrbtq$1KYPGI0QNlFJ2lo5sV62l0' + kubectl -n longhorn-system create secret generic basic-auth --from-file=auth + rm -f auth + kubectl -n longhorn-system apply -f /opt/cloudera/cm-agent/service/ecs/longhorn-ingress.yaml Error from server (InternalError): error when creating "/opt/cloudera/cm-agent/service/ecs/longhorn-ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://rke2-ingress-nginx-controller-admission.kube-system.svc:443/networking/v1/ingresses?timeout=10s": x509: certificate signed by unknown authority
Workaround: Retry the installation by clicking the Resume button.
OPSX-735 Kerberos service may report errors if Cloudera Manager is not running
Problem: The Cloudera Manager Server in the base cluster must be running in order to generate Kerberos principals for Private Cloud. If there is downtime, you may observe Kerberos-related errors.
Workaround: Resolve downtime on Cloudera Manager. If you encountered Kerberos errors, you can retry the operation (such as retrying creation of the Virtual Warehouse).
OPSX-1405 Multiple users creating the same environment may result in an unusable environment
Problem: If two users try to create an environment with the same name at the same time, it might result in an unusable environment.
Workaround: Delete the existing environment and try again with only one user creating the environment.
OPSX-1412 Creating an Environment may fail
When multiple users try to create the same environment at the same time or use automation to create an environment with retries, environment creation may fail on collision with a previous request to create an environment.
Workaround: Delete the existing environment, wait 5 minutes, and try again.
OPSX-2062 Platform not shown on the Compute Cluster tab in Environments
Problem: In the Management Console Environments page the platform does not display on the Compute Clusters tab.
OPSX-2713 ECS Installation fails to perform First Run of services.
Problem: If an issue is encountered during the Install Control Plane step of Containerized Cluster First Run, installation will be re-attempted infinitely rather than the command failing.
Workaround: Since the control plane is installed and uninstalled in a continuous cycle, it is often possible to address the cause of the failure while the command is still running, at which point the next attempted installation should succeed. If this is not successful, abort the First Run command, delete the Containerized Cluster, address the cause of the failure, and retry from the beginning of the Add Cluster wizard. Any nodes that are re-used must be cleaned before re-attempting installation.
OPSX-2772 Unable to modify permissions for Account Administrator
Problem: When a user with administrative privileges accesses the
page in the Management Console, the user is presented with options to select various roles. Selecting or deselecting these roles does not change this user's privileges -- an administrative user, by default, has all privileges, and those privileges cannot be changed.CB-11925: Knox Load Balancer API requests initiated from Knox Gateway hosts can fail with Connection timeout error
Problem: When logged into a Data Lake or Data Hub node that has a Knox Gateway service instance configured on it, making Knox API calls through the Knox load balancer can result in a connection timeout error. This is because for security reasons, the IP address of the request is preserved in the traffic passed through the load balancer. Preserving the IP address means that the load balancer will reject "loopback" traffic, meaning traffic that originates and is directed back to the same node.
Workaround: If Knox API calls need to be made while logged into a Knox gateway node, use the hostname of the node instead of the load balancer hostname in the API call.
The Knox load balancer hostname can be identified by the "-gateway" suffix in the first clause of the hostname with no numeric identifier. For example: <cluster-name>-gateway.<env-shortname>.<hash>.cloudera.site is the load balancer hostname, and <cluster-name>-gateway0.<env-shortname>.<hash>.cloudera.site is a direct node hostname.
OPSAPS-59129: CM reports the HDFS warning "Secure DataNode configuration is valid, but not recommended."
Problem: On Runtime versions 7.2.9 and earlier, Data Lake clusters that include the HDFS service show this warning in Cloudera Manager: "Secure DataNode configuration is valid, but not recommended." This warning is benign and can be ignored.
CDPSDX-2879 Ranger import fails when you create a Hive replication policy for a medium duty Data Lake cluster
Problem: When you create a Hive replication policy with the Include Sentry Permissions with Metadata or Skip URI Privileges option for a medium duty Data Lake cluster, Ranger import fails. Before you choose the Include Sentry Permissions with Metadata option for a Hive replication policy for a medium duty Data Lake cluster, contact Cloudera Support.
CB-10535, CB-10372 Generate CLI command for existing environment should show 3 commands instead of 1
Problem: If you try to obtain the CDP CLI commands from an existing environment > Actions > Show CLI commands, only the create environment command is displayed instead of all three commands required for registering an environment from CDP CLI.
Workaround: You can obtain the command for creating a Data Lake from Data Lake details. The command to obtain the set IDbroker mappings can be obtained from an existing environment or from CDP CLI help, but you need to modify it manually to set the same mappings as in the source environment.
CB-10706 SSO is not working for Solr/Namenode UI links
Problem: SSO login to an environment with a medium duty Data Lake breaks access to Solr and Namenode UI links.
Workaround: After you deploy a medium duty Data Lake, login to Gateway0 and run:
openssl rand -base64 12
Then login as root to both gateway nodes and run:
export KNOX_GATEWAY_DATA_DIR=/var/lib/knox/gateway/data
/opt/cloudera/parcels/CDH/lib/knox/bin/knoxcli.sh create-alias pac4j.password --cluster knoxsso --value “the value from above"
Then in Cloudera Manager, restart the Knox service.
DWX-6635 Tags are not being added to S3 buckets
Problem: S3 buckets that are part of your AWS environment registered in CDP are not being tagged during environment registration. This is because the PutBucketTagging policy is missing from the cross-accunt IAM role that CDP requires you to create for your environment’s credential.
- Manually add tags to your S3 buckets used for existing environments.
- Add the PutBucketTagging policy to the IAM role used for your provisioning credential so that any environments registered in CDP in the future can automatically add S3 bucket tags.
CB-6924 Workaround for ZooKeeper external volume bug
Problem: In the current version of CDP, ZooKeeper might be configured to write to CDP's root disk which is too small to accommodate the ZooKeeper data. To correct this issue, you need to reconfigure ZooKeeper to write to an external volume and move any ZooKeeper data to that volume.
Workaround: To check if ZooKeeper is configured to use an external volume, complete the following:
- Open ZooKeeper and navigate to: ZooKeeper menu item -> Configuration tab -> Filter to Server.
- If the dataDir and dataLogDir fields
contain
/hadoopfs/fs1/zookeeper
you do not need to do anything. - If the fields contain any other values, you must reconfigure ZooKeeper.
To reconfigure ZooKeeper, complete the following:
ssh
into the machine where the ZooKeeper server is running .- Run the following command to change the user:
sudo -su zookeeper
- Run the following command:
cp -R /var/lib/zookeeper/ /hadoopfs/fs1/zookeeper
- Open the cluster from the Cloudbreak user interface.
- Log into the Clouder Manager user interface.
- Find ZooKeeper on the Cloudera Manager page and navigate to the configuration with either the Search box or select it from the side menu: ZooKeeper menu item -> Configuration tab -> Filter to Server.
- Change the following properties:
dataDir: /hadoopfs/fs1/zookeeper dataLogDir: /hadoopfs/fs1/zookeeper
- Save your changes
- Restart the Stale configuration
You do not need to redeploy ZooKeeper.
CB-3876 Data Warehouse and Cloudera AI create security groups
Problem: When during environment registration you choose to use your own security groups, the Data Warehouse and Cloudera AI services do not use these security groups but create their own.
Workaround: For instructions on how to restrict access on the security groups created by the Data Warehouse service, refer to Restricting access to endpoints in AWS environments.
CRB-971 Data Warehouse creates IAM, S3, and DynamoDB resources
Problem: The Data Warehouse service creates its own S3 buckets, DynamoDB tables, and IAM roles and policies. It does not use the environment's S3 bucket(s), DynamoDB table, and IAM roles and policies.
Workaround: There is no workaround.
CB-4176 Data Lake cluster repair fails after manual stop
Problem: Data Lake cluster repair fails after an instance has been stopped manually via AWS console or AWS CLI.
Workaround: After stopping a cluster instance manually, restart it manually via the AWS console or AWS CLI, and then use the Sync option in CDP to sync instance state.
CB-2813 Environment with Cloudera AI workbenches in it can be deleted
Problem: When deleting an environment that uses a customer-created VPC and subnets, there is no mechanism in place to check for any existing Cloudera AI workbenches running within the environment. As a result, an environment can be deleted when Cloudera AI workbenches are currently running in it.
Workaround: If using an environment created within your existing VPC and subnets, prior to deleting an environment, ensure that there are no Cloudera AI workbenches running within the environment.
CB-3459 Subnet dependency error when deleting an environment
com.sequenceiq.cloudbreak.cloud.exception.CloudConnectorException: AWS CloudFormation stack reached an error state: DELETE_FAILED reason: The subnet 'subnet-05606fd72fda58c8c' has dependencies and cannot be deleted. (Service: AmazonEC2; Status Code: 400; Error Code: DependencyViolation; Request ID: da9a7fe0-ac43-467e-9942-94f10e6bd2b7)
This
error occurs if there are resources such as instances used for Data Warehouse, or Cloudera AI cluster nodes that were not deleted prior to environment
termination.Workaround: Prior to terminating an environment, you must terminate all clusters running within that environment.
CB-4248 Expired certificate causes untrusted connection warning
- By default, CDP generates a trusted certificate valid for 3 months.
- If generating a trusted certificate fails, CDO generates a self-signed certificate valid for 2 years.
In the first case, if your cluster stays active for over 3 months, the trusted certificate will expire and you will see an "untrusted connection", warning when trying to access cluster UIs from your browser.
- Use the Renew certificate UI option:
- For Data Lake - Click the Renew certificate button on the Data Lake details page.
-
For Data Hub - Click Actions > Renew certificate on Data Hub cluster details page.
During certificate renewal, several related messages will be written to Event History. Once the certificate renewal has been completed, the following message appears: "Renewal of the cluster's certificate finished."
- Additionally, if your cluster was created prior to December 19, you need to perform the
following manual steps:
- SSH to the Knox gateway host on your cluster.
- Run the
hostname
command to get your domain name. - Run the following commands (just replace the domain name
test-master.dev.cldr.work
with your correct, fully-qualified domain name):sudo sh -c '/opt/salt_2017.7.5/bin/salt --out=newline_values_only 'test-master.dev.cldr.work' pillar.get gateway:userfacingcert > /etc/certs-user-facing/server.pem' sudo systemctl reload nginx.service
CB-23926: Autoscaling must be stopped before performing FreeIPA upgrade
Problem: During autoscaling, compute nodes are stopped and cannot receive the latest FreeIPA IP address update in a stopped state. This causes the Data Hub clusters to become unhealthy after the FreeIPA upgrade when autoscaling is enabled.
- Disable autoscaling before perfoming FreeIPA upgrade.
- Start all compute nodes, which are in a STOPPED state.
- Perform the FreeIPA upgrade.
- Enable autoscaling after FreeIPA upgrade is successful.