Technical service bulletins

Learn about the technical service bulletins (TSBs) with the Cloudera Data Engineering (CDE) service on public clouds, the impact or changes to the functionality, and the workaround.

Technical Service Bulletin (2022-588)
Title: Kubeconfig and new version of aws-iam-authenticator

Regenerate Kubeconfig and in conjunction use a newer version of aws-iam-authenticator on AWS. Kubeconfig in Cloudera Data Platform (CDP) Public Cloud Data Services needs to be regenerated because the Kubeconfig generated before June 15, 2022 uses an old APIVersion (client.authentication.k8s.io/v1alpha1) which is no longer supported. This causes compatibility issues with aws-iam-authenticator starting from v0.5.7. To be able to use the new aws-iam-authenticator, the Kubeconfig needs to be regenerated.

Severity
  • High
Components affected
  • Cloudera Machine Learning (CML) Data Service
  • Cloudera Data Engineering (CDE) Data Service
  • Cloudera Data Flow (CDF) Data Service
Products affected
  • CDP Public Cloud
Releases affected
  • All existing and upcoming CDP Public Cloud releases using the affected components, listed above.
Users affected
  • Users of CML, CDE, and CDF using Kubeconfig generated before June 15, 2022 and a version of aws-iam-authenticator prior to v0.5.7.
Impact
  • Customers using a Kubeconfig generated before June 15, 2022 and an aws-iam-authenticator version prior to v0.5.7 may see Kubernetes clients not able to access the cluster successfully.
Action required

From June 15, 2022 onwards, existing customers on AWS using a previously generated Kubeconfig will have to:

  • Regenerate and use the new Kubeconfig, and
  • Use a new version of aws-iam-authenticator starting with v0.5.7.

The newly generated Kubeconfig changes the APIVerson of the user’s section as:

Old:

ApiVersion: "client.authentication.k8s.io/v1alpha1"

New:

ApiVersion: "client.authentication.k8s.io/v1beta1"

For the latest update on this issue see the corresponding Knowledge Base article: TSB 2022-588: Kubeconfig and new version of aws-iam-authenticator

Technical Service Bulletin (2022-587)
Issue: CDE 1.14 and above using Kubernetes 1.21 will fail service account token renewal after 90 days

Cloudera Data Engineering (CDE) on Amazon Web Services (AWS) running version CDE 1.14 and above using Kubernetes 1.21 will observe failed jobs after 90 days of service uptime [1].

[1] “For Amazon Elastic Kubernetes Service (EKS) clusters, the extended expiry period is 90 days. Your Amazon EKS cluster's Kubernetes API server rejects requests with tokens older than 90 days.”

Symptoms
  • CDE Jobs (Spark or Airflow) fail with "Service Account May Have Been Revoked. Unauthorized"

Components affected
  • CDE (specifically Calico and Livy)

Products affected
  • Cloudera Data Engineering (CDE) in Cloudera Data Platform (CDP) Public Cloud on AWS

Releases affected
  • CDE Public Cloud releases 1.14, 1.15, 1.16

Users affected
  • Users that have CDE enabled for at least 90 days

  • Users on Amazon Elastic Container Service for Kubernetes (EKS) with Kubernetes 1.21

Impact
  • All CDE jobs will fail (new and existing ones including scheduled).

  • Diagnostic bundles will also fail to download.

Action required
  • Workaround: Restart specific components to force regenerate the token. There are two options to do this:

Option 1) Using kubectl:
  1. Setup kubectl for CDE.

    https://community.cloudera.com/t5/Community-Articles/Support-Video-Enabling-kubectl-for-CDE/ta-p/314200

  2. Delete "calico-node" pods.

    kubectl delete pods --selector k8s-app=calico-node --namespace
                  kube-system
  3. Delete Livy pods for all Virtual Clusters (VC).

    kubectl delete pods --selector app.kubernetes.io/name=livy
                --all-namespaces

If for some reason only one Livy pod needs to be fixed:

  1. Find VC ID through the UI under "Cluster Details”.

  2. Delete Livy pod:

    export VC_ID=<VC ID>
    kubectl delete pods --selector app.kubernetes.io/name=livy --namespace
                  ${​VC_ID}

Option 2) Using Kubernetes dashboard

  1. On the "Service Details" page copy the "RESOURCE SCHEDULER" link.

  2. Replace "yunikorn" part with the "dashboard" and open the resulting link in the browser.

  3. In the top left corner find the namespaces dropdown and choose "All namespaces".

  4. Search for "calico-node".

  5. For each pod in the "Pods" table click on the "Delete" option from the "hamburger" menu.

  6. Search for "livy".

  7. For each pod in the "Pods" table click on the "Delete" option from the "hamburger" menu.

  8. If for some reason only one Livy pod needs to be fixed:

  1. Find VC ID through the UI under "Cluster Details" and only delete the pod with the name starting with VC ID.

Addressed in release
  • Fixed in CDE 1.17

For the latest update on this issue see the corresponding Knowledge Base article: TSB 2022-587: CDE 1.14 and above using Kubernetes 1.21 will fail service account token renewal after 90 days