Deploying an application on Cloudera AI Inference service

Learn how to select a specific Cloudera AI Inference service instance and create a new application. You must configure your application to run on port 8080. Cloudera AI Inference service will map this internal port to port 443 for external access.
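
As an illustration of the port requirement, the following is a minimal sketch of an application that listens on port 8080, using only the Python standard library. The names APP_PORT, HealthHandler, and make_server are hypothetical, and binding to all interfaces (0.0.0.0) is an assumption about how the service reaches the process; a real application would typically use its own web framework.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

APP_PORT = 8080  # port the application must listen on, per the requirement above


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a small plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=APP_PORT):
    # Assumption: bind to all interfaces so the process is reachable
    # from outside its container.
    return HTTPServer(("0.0.0.0", port), HealthHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```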

To deploy an application on the Cloudera AI Inference service, complete the following steps:

  1. In the Cloudera console, click the Cloudera AI tile.
  2. Click Applications under Deployments on the left navigation menu.
    The Applications home page appears.
  3. Click the Deploy Application button.
    The Applications / Create Application page appears.
  4. From the Select Environment & Inference Service drop-down list, select your Cloudera environment and the Cloudera AI Inference service instance within which you want to create the application.
  5. In the Name textbox, enter a name for the application.
  6. In the Subdomain textbox, enter a subdomain name. The application will be deployed at https://<subdomain>.<ai_inference_domain>.
  7. In the Description textbox, add a description of the application.
  8. For the Select Source field, choose Git if your application is stored in a Git repository, or Docker if it is packaged as a container image.
    • For Git repositories, provide the following details:
      • Git URL: Enter the repository URL. For example: https://github.com/cloudera/caii-apps-demo.git.
      • Authentication: If the repository is private, enter the associated username and SSH key.
      • Entrypoint: Enter the path to your startup script. For example, cml/launch_app.py.
    • For Docker images, provide the following details:
      • Docker Image URL: Enter the full URL of the image.
      • Authentication: If the image is private, enter the associated username and access token.
  9. For the Authentication Type field, select SSO for a web application or JWT for an API-based application.
  10. For the Resource Profile field, configure the following resource requirements for each application replica:
    1. In the Instance Type field, select the instance type from the drop-down list.
    2. In the GPU field, specify the number of GPUs each application replica requires.
    3. In the CPU field, specify the number of CPUs each application replica requires.
    4. In the Memory field, specify the amount of CPU memory each application replica requires.
  11. Using the Autoscale Range slider bar, specify the minimum and maximum number of replicas for the application. The autoscaler scales the replicas to manage the incoming workload.
  12. In the Environment Variables field, add any custom key and value pairs your application needs.
  13. In the Tags field, add any custom key and value pairs you want to attach to the application as tags.
  14. Click the Create Application button to deploy the application.
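
Once the application is deployed, its external address follows the https://<subdomain>.<ai_inference_domain> pattern from step 6. The sketch below shows one way a client might build that URL and prepare a request to an application that uses the JWT authentication type. The function names and the sample values are hypothetical, and sending the token in an Authorization: Bearer header is an assumption, not something this procedure confirms.

```python
import urllib.request


def application_url(subdomain, ai_inference_domain):
    """Build the external HTTPS URL from the Subdomain value and the service domain."""
    return f"https://{subdomain}.{ai_inference_domain}"


def build_request(url, jwt_token):
    # Assumption: a JWT-protected application accepts the token as a bearer token.
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {jwt_token}"}
    )


if __name__ == "__main__":
    # Hypothetical values; substitute your own subdomain, domain, and token.
    url = application_url("my-app", "inference.example.com")
    request = build_request(url, "<your-jwt-token>")
    print(request.full_url)
```

The request object is only constructed here, not sent; pass it to urllib.request.urlopen when you have a real endpoint and token.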