Restore event for an environment backup fails with an exception

Problem

When you delete an environment after a backup event and then try to restore that environment, the restore operation fails and the Not able to fetch details from Cluster:... exception appears.

Cause

During environment creation, the environment service creates an internal Cloudera Manager user with the Full Administrator role. The username is stored in the Control Plane database, and the password is stored in the vault. When you delete an environment, this internal Cloudera Manager user is deleted. The exception appears when the stored password is no longer valid or is missing. One reason the password can go missing is that the vault was rebuilt, for example to recover from vault corruption, without restoring the Cloudera Manager credentials.

Solution

  1. Get the internal Cloudera Manager username using the following commands to determine whether the credential is valid.
    1. Open a psql session in the Control Plane database pod using the kubectl exec -it cdp-embedded-db-0 -n [***CONTROL PLANE NAMESPACE***] psql command.
    2. Connect to the environment database using the \c db-env; command.
    3. Run the following SQL query in the cdp-embedded-db-0 pod:

      SELECT e.environment_crn, c.value FROM environments e JOIN configs c ON e.environment_crn = c.environment_crn WHERE e.environment_name = '[***YOUR ENV NAME***]' AND c.attr = 'cmUser';

      Sample output:
        environment_crn                                                                                                                  | value
        ---------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------
        crn:altus:environments:us-west-1:60e-46de-992-90b5-0ff943dae1c8:environment:test-saml2-env-1/48e9fcf-9220-4c8f-bc7d-caa96b1834f5 | __cloudera_internal_user__test-saml2-env-1-798414fe-faa6-43e1-ac9c-75c4d33ec294

      In this sample output, __cloudera_internal_user__test-saml2-env-1-798414fe-faa6-43e1-ac9c-75c4d33ec294 is the internal Cloudera Manager username.
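
      You can also run the same lookup non-interactively from the host where you run kubectl; this one-liner is a convenience sketch and assumes the default psql user in the pod can connect to the db-env database:

      kubectl exec -it cdp-embedded-db-0 -n [***CONTROL PLANE NAMESPACE***] -- psql -d db-env -c "SELECT e.environment_crn, c.value FROM environments e JOIN configs c ON e.environment_crn = c.environment_crn WHERE e.environment_name = '[***YOUR ENV NAME***]' AND c.attr = 'cmUser';"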
  2. Get the internal Cloudera Manager password using the following commands:
    1. Run the following commands to get the root token for the embedded vault:
      1. If you are using OCP:

        Run the following command:

        $ kubectl get secret vault-unseal-key -n [***VAULT_NAMESPACE***] -o jsonpath="{.data.init\.json}" | base64 -d

        Sample output:
        {"keys":["[***VALUE***]"],"keys_base64":["[***VALUE***]="],"recovery_keys":null,"recovery_keys_base64":null,"root_token":"[***VALUE***]"}

        The root_token field in the output is the vault root token.
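
        If the jq utility is installed on the host where you run kubectl (an assumption; jq is not required by this procedure), you can print only the token:

        $ kubectl get secret vault-unseal-key -n [***VAULT_NAMESPACE***] -o jsonpath="{.data.init\.json}" | base64 -d | jq -r '.root_token'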

      2. If you are using ECS, run the following commands on the Cloudera Manager server database host:
          • [root@cm_server_db_host ~]# psql -U cm cm
          • select * from CONFIGS where attr like '%vault_root%';

          Sample output:
           config_id  | role_id |    attr    |            value             | service_id | host_id | config_container_id | optimistic_lock_version | role_config_group_id | context | external_account_id | key_id
          ------------+---------+------------+------------------------------+------------+---------+---------------------+-------------------------+----------------------+---------+---------------------+--------
           1546337327 |         | vault_root | hvs.SvIrIhhffYEmVPEWN3TSEzks | 1546337154 |         |                     |                       0 |                      | NONE    |                     |

          The hvs.SvIrIhhffYEmVPEWN3TSEzks value in this sample output is the vault root token.
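
          You can also run the lookup non-interactively; this one-liner is a convenience sketch and assumes the cm database user can authenticate from that shell:

          [root@cm_server_db_host ~]# psql -U cm cm -c "select value from CONFIGS where attr like '%vault_root%';"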

    2. kubectl exec -it vault-0 -n [***VAULT_NAMESPACE***] /bin/sh
    3. export VAULT_TOKEN=[***VAULT ROOT TOKEN***]
    4. ~ $ vault secrets list -detailed -tls-skip-verify
      Sample output:
      Path          Plugin       Accessor              Default TTL    Max TTL    Force No Cache    Replication    Seal Wrap    External Entropy Access    Options           Description                                                UUID                                    Version    Running Version          Running SHA256    Deprecation Status
      ----          ------       --------              -----------    -------    --------------    -----------    ---------    -----------------------    -------           -----------                                                ----                                    -------    ---------------          --------------    ------------------
      cubbyhole/    cubbyhole    cubbyhole_35ff7854    n/a            n/a        false             local          false        false                      map[]             per-token private secret storage                           f2fa15ec-49-cea2-88f6-e6807c30fba3    n/a        v1.13.1+builtin.vault    n/a               n/a
      identity/     identity     identity_b7aa2294     system         system     false             replicated     false        false                      map[]             identity store                                             17990faa-e0-727a-92a3-aaaa1ff43393    n/a        v1.13.1+builtin.vault    n/a               n/a
      kv/           kv           kv_2ba3b77c           system         system     false             replicated     false        false                      map[version:2]    key/value secret storage                                   98b14495-b6-6958-04bc-1ca7c55d4590    n/a        v0.14.2+builtin          n/a               supported
      secret/       kv           kv_218f4379           system         system     false             replicated     false        false                      map[version:2]    key/value secret storage                                   06371963-e6-56c1-7ab3-d6c438720dbf    n/a        v0.14.2+builtin          n/a               supported
      sys/          system       system_46e657a4       n/a            n/a        false             replicated     true         false                      map[]             system endpoints used for control, policy and debugging    8ca5d96f-a45e-155a-cfc1-25a56b6a0de5    n/a        v1.13.1+builtin.vault    n/a               n/a
      

      In this command output, kv/ is the secret path.

    5. ~ $ vault kv list -tls-skip-verify kv
      Sample output:
      Keys
      ----
      [***CONTROL PLANE NAMESPACE***]
      
    6. ~ $ vault kv list -tls-skip-verify kv/[***CONTROL PLANE NAMESPACE***]
      Sample output:
      Keys
      ----
      data/
      liftie/
      test
      
    7. ~ $ vault kv list -tls-skip-verify kv/[***CONTROL PLANE NAMESPACE***]/data
      Sample output:
      Keys
      ----
      [***ENV NAME1***] 
      [***ENV NAME2***]

      Identify the environment for which the exception appeared.

    8. ~ $ vault kv list -tls-skip-verify kv/[***CONTROL PLANE NAMESPACE***]/[***ENTER THE ENV NAME WITH THE EXCEPTION***]
      Sample output:
      Keys
      ----
      [***RANDOM UUID***]
      
    9. ~ $ vault kv list -tls-skip-verify kv/[***CONTROL PLANE NAMESPACE***]/[***ENTER THE ENV NAME WITH THE EXCEPTION***]/[***RANDOM UUID***]
      Sample output:
      Keys
      ----
      cmPassword
      dockerConfigJson
      kubeconfig
      
    10. ~ $ vault kv get -tls-skip-verify kv/[***CONTROL PLANE NAMESPACE***]/[***ENTER THE ENV NAME WITH THE EXCEPTION***]/[***RANDOM UUID***]/cmPassword
      Sample output:
      ================ Secret Path ======================
      kv/[***CONTROL PLANE NAMESPACE***]/[***ENV NAME***]/[***RANDOM UUID***]/cmPassword
      
      
      ======= Metadata =======
      Key                Value
      ---                -----
      created_time       2023-11-15T04:32:36.477837897Z
      custom_metadata    <nil>
      deletion_time      n/a
      destroyed          false
      version            1
      
      ==== Data ====
      Key      Value
      ---      -----
      value    ae4cff8a-fcee-48e9-b381-4a16e883694a88c8d2
      

      The value is the cmPassword (Cloudera Manager password).
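
      If you already know the full secret path, you can also read the password in a single command from outside the vault pod. The following one-liner is a sketch; it assumes the password is stored under the value key, as shown in the sample output above:

      kubectl exec -it vault-0 -n [***VAULT_NAMESPACE***] -- /bin/sh -c "export VAULT_TOKEN=[***VAULT ROOT TOKEN***]; vault kv get -tls-skip-verify -field=value kv/[***CONTROL PLANE NAMESPACE***]/[***ENTER THE ENV NAME WITH THE EXCEPTION***]/[***RANDOM UUID***]/cmPassword"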

  3. Log into Cloudera Manager using the internal username (the cmUser value) and password (the cmPassword value) that you obtained in the previous steps. If the login fails, the stored credentials are stale; regenerate them as described in the following steps. A command-line check is sketched below as an alternative to the UI login.
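    This curl check is an illustrative sketch; the Cloudera Manager host, TLS port 7183, and API version v41 are assumptions that depend on your deployment:

      # Prints 200 when the credential is valid and has the Full Administrator role,
      # and 401 when the username or password is wrong. -k skips TLS certificate verification.
      curl -s -k -o /dev/null -w "%{http_code}\n" -u '[***CM USERNAME***]:[***CM PASSWORD***]' "https://[***CM HOST***]:7183/api/v41/users"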
  4. Run the following bash commands to generate new internal Cloudera Manager credentials:
    1. Run the uuidgen command to create the first universally unique identifier (UUID), which you use in the new Cloudera Manager username:
      [root@user ~]# uuidgen
      Sample output:
      dc7c7dd7-5a58-497a-a1d1-46cd
    2. Run the uuidgen command again to create a second UUID, which becomes the new Cloudera Manager password:
      [root@user ~]# uuidgen
      Sample output:
      9a863dc4-be61-430f-ac87-a4eba0
  5. Assemble the new Cloudera Manager username from the previous outputs in the "__cloudera_internal_user__" + [***ENTER THE ENV NAME WITH THE EXCEPTION***] + "-" + [***FIRST_UUID***] format.
    For example, __cloudera_internal_user__cldrienv1-dc7c7dd7-5a58-497a-a1d1-46cd. In this assembled username, the __cloudera_internal_user__ prefix is followed by the name of the environment with the exception (cldrienv1), a "-", and the first generated UUID (dc7c7dd7-5a58-497a-a1d1-46cd).

    The new Cloudera Manager password is the second UUID, for example, 9a863dc4-be61-430f-ac87-a4eba0.
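
    You can also generate and assemble the new credentials in one short bash sketch; the environment name here is an example:

      # Illustrative only: build the new username and password in one step.
      ENV_NAME=cldrienv1
      NEW_CM_USER="__cloudera_internal_user__${ENV_NAME}-$(uuidgen)"
      NEW_CM_PASSWORD="$(uuidgen)"
      echo "${NEW_CM_USER}"
      echo "${NEW_CM_PASSWORD}"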

  6. Go to the Cloudera Manager > Support > API Explorer > UsersResource > POST /users REST API, and perform the following steps:
    1. Click Try it out, and substitute the Cloudera Manager username and password in the following JSON string:
      {
        "items": [
          {
            "name": "[***NEW_CM_INTERNAL_USER***]",
            "password": "[***NEW_CM_INTERNAL_USER_PASSWORD***]",
            "authRoles": [
              {
                "displayName": "Full Administrator",
                "name": "ROLE_ADMIN"
              }
            ]
          }
        ]
      }
      
    2. Copy the JSON string into the REQUEST BODY, and click Execute.
      You get a 200 response code.
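
      If you prefer the command line to the API Explorer, an equivalent call can be made with curl; this is a sketch, and the admin credentials, Cloudera Manager host, TLS port 7183, and API version v41 are assumptions that depend on your deployment:

        # Creates the new internal user with the Full Administrator role.
        curl -s -k -u '[***CM ADMIN USER***]:[***CM ADMIN PASSWORD***]' \
          -X POST -H "Content-Type: application/json" \
          -d '{"items":[{"name":"[***NEW_CM_INTERNAL_USER***]","password":"[***NEW_CM_INTERNAL_USER_PASSWORD***]","authRoles":[{"displayName":"Full Administrator","name":"ROLE_ADMIN"}]}]}' \
          "https://[***CM HOST***]:7183/api/v41/users"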
  7. Verify whether you can use the username and password to log into Cloudera Manager.
  8. Replace the stale Cloudera Manager username with the new one by running the following commands:
    1. kubectl exec -it cdp-embedded-db-0 -n [***CONTROL PLANE NAMESPACE***] psql
    2. \c db-env;
    3. Run the following SQL queries in the cdp-embedded-db-0 pod:
      1. SELECT e.environment_crn, c.value FROM environments e JOIN configs c ON e.environment_crn = c.environment_crn WHERE e.environment_name = '[***YOUR ENV NAME***]' AND c.attr = 'cmUser';
        This query returns the environment_crn value that you use in the next command.
      2. UPDATE configs SET value='[***NEW CLOUDERA MANAGER INTERNAL USER***]' WHERE environment_crn='[***ENVIRONMENT CRN OF ENV WITH THE EXCEPTION***]' AND attr='cmUser';
        This command replaces the old Cloudera Manager username with the new one.
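    If you prefer to run the update non-interactively, the following one-liner is a sketch of the same change; it assumes the default psql user in the cdp-embedded-db-0 pod can connect to the db-env database:

      kubectl exec -it cdp-embedded-db-0 -n [***CONTROL PLANE NAMESPACE***] -- psql -d db-env -c "UPDATE configs SET value='[***NEW CLOUDERA MANAGER INTERNAL USER***]' WHERE environment_crn='[***ENVIRONMENT CRN OF ENV WITH THE EXCEPTION***]' AND attr='cmUser';"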
  9. Replace the stale Cloudera Manager password with the new password:
    1. Repeat the commands in Step 2 to find the Cloudera Manager password credential path in the vault, which is in the kv/[***CONTROL PLANE NAMESPACE***]/[***ENV NAME***]/[***RANDOM UUID***]/cmPassword format.
    2. Run the $ vault kv patch -tls-skip-verify kv/[***CONTROL PLANE NAMESPACE***]/[***ENV NAME WITH THE EXCEPTION***]/[***RANDOM UUID***]/cmPassword value=[***NEW_CM_INTERNAL_USER_PASSWORD***] command to write the new password.
    3. Verify whether the cmPassword is changed using the $ vault kv get -tls-skip-verify kv/[***CONTROL PLANE NAMESPACE***]/[***ENV NAME WITH THE EXCEPTION***]/[***RANDOM UUID***]/cmPassword command.