Creating a Cloudera AI Inference service instance

The recommended way to create a Cloudera AI Inference service is to first generate the CLI input skeleton, customize the JSON file, and then pass the file to the creation command.

The following example shows how to create a Cloudera AI Inference service by generating the CLI input skeleton, customizing the JSON file, and then passing the file to the creation command:

  1. Generate the JSON skeleton payload and save it to a file:

    $ cdp ml create-ml-serving-app --generate-cli-skeleton > /tmp/create-serving-app-input.json
  2. Customize the JSON file to use when creating a Cloudera AI Inference service instance. The following are sample JSON files for AWS and Azure:

    AWS JSON file example

    {
        "appName": "my-aws-caii-cluster",
        "environmentCrn": "[***CDP-ENVIRONMENT-CRN***]",
        "clusterCrn": "[***COMPUTE-CLUSTER-CRN***]",
        "provisionK8sRequest": {
            "instanceGroups": [
                {
                    "instanceType": "m5.4xlarge",
                    "instanceTier": "ON-DEMAND",
                    "instanceCount": 1,
                    "name": "[***OPTIONAL-LEAVE BLANK***]",
                    "rootVolume": {
                        "size": 256
                    },
                    "autoscaling": {
                        "minInstances": 0,
                        "maxInstances": 5,
                        "enabled": true
                    }
                },
                {
                    "instanceType": "p4de.24xlarge",
                    "instanceCount": 1,
                    "rootVolume": {
                        "size": 1024
                    },
                    "autoscaling": {
                        "minInstances": 0,
                        "maxInstances": 5,
                        "enabled": true
                    }
                }
            ],
            "environmentCrn": "[***CDP-ENVIRONMENT-CRN***]",
            "tags": [
                {
                    "key": "experience",
                    "value": "cml-serving"
                }
            ]
        },
        "usePublicLoadBalancer": true,
        "skipValidation": false,
        "loadBalancerIPWhitelists": [
            ""
        ],
        "subnetsForLoadBalancers": [
            ""
        ],
        "staticSubdomain": "mydomain"
     }

    Azure JSON file example

    {
        "appName": "my-azure-caii-cluster",
        "environmentCrn": "[***CDP-ENVIRONMENT-CRN***]",
        "clusterCrn": "[***COMPUTE-CLUSTER-CRN***]",
        "provisionK8sRequest": {
            "instanceGroups": [
                {
                    "instanceType": "Standard_D4s_v3",
                    "instanceCount": 1,
                    "rootVolume": {
                        "size": 256
                    },
                    "autoscaling": {
                        "minInstances": 0,
                        "maxInstances": 5,
                        "enabled": true
                    }
                },
                {
                    "instanceType": "Standard_ND96asr_A100_v4",
                    "instanceCount": 1,
                    "rootVolume": {
                        "size": 1024
                    },
                    "autoscaling": {
                        "minInstances": 0,
                        "maxInstances": 5,
                        "enabled": true
                    }
                }
            ],
            "environmentCrn": "[***CDP-ENVIRONMENT-CRN***]",
            "tags": [
                {
                    "key": "experience",
                    "value": "cml-serving"
                }
            ]
        },
        "usePublicLoadBalancer": true,
        "skipValidation": false,
        "loadBalancerIPWhitelists": [
            ""
        ],
        "subnetsForLoadBalancers": [
            ""
        ],
        "staticSubdomain": "mydomain"
    }
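
    Hand-editing JSON often leaves behind a stray comma or a smart quote, so it can help to confirm that the customized file still parses before submitting it. A minimal, optional check using Python's built-in json.tool module (this assumes the skeleton was saved to /tmp/create-serving-app-input.json as in step 1):

```shell
# Optional sanity check: json.tool parses the file and reports syntax errors.
# Assumes the payload was saved to /tmp/create-serving-app-input.json in step 1.
python3 -m json.tool /tmp/create-serving-app-input.json > /dev/null \
  && echo "JSON is valid" \
  || echo "JSON syntax error"
```
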
    
  3. Use the JSON file created in the previous step to create the Cloudera AI Inference service instance:
    $ cdp ml create-ml-serving-app --cli-input-json file:///tmp/create-serving-app-input.json

    After a successful invocation of the create command, the CRN of the newly created Cloudera AI Inference service instance is displayed. The command adds the requested compute worker node groups to the existing Kubernetes cluster specified by the clusterCrn field in the request body and installs the necessary software components.

    A typical configuration with two worker node groups takes about 15-20 minutes to create.
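
    Because provisioning continues in the background after the create command returns, the instance is not usable immediately. One way to check progress is to query the instance with the CRN returned in step 3. The sketch below assumes the CDP CLI exposes a describe command for serving apps (cdp ml describe-ml-serving-app) and an --app-crn parameter; verify the exact command and flag names against cdp ml help for your CLI version:

```shell
# Hypothetical status check -- command name and --app-crn flag are assumptions;
# confirm with `cdp ml help` before use.
APP_CRN="[***SERVING-APP-CRN***]"   # the CRN printed by the create command
cdp ml describe-ml-serving-app --app-crn "$APP_CRN"
```
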