Using Cloudera AI Inference service

Cloudera AI Inference service provides a production-grade serving environment for hosting predictive and generative AI models. It is designed to handle the challenges of production deployments, such as high availability, performance, fault tolerance, and scalability. Cloudera AI Inference service allows data scientists and machine learning engineers to deploy their models quickly, without worrying about infrastructure and maintenance.

Cloudera AI Inference service is built with tight integration of NVIDIA NIM and NVIDIA Triton Inference Server, providing industry-leading inference performance on NVIDIA GPUs. Model endpoint orchestration and management are built on the KServe model inference platform, a Cloud Native Computing Foundation (CNCF) open-source project. The platform provides standards-compliant model inference protocols, such as the Open Inference Protocol for predictive models and the OpenAI API for generative AI models.
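Because both protocols are open standards, existing client tooling can typically be pointed at the service's endpoints. The following is a minimal sketch, not a definitive implementation: the endpoint URLs, model names, and the `CDP_TOKEN` environment variable are hypothetical placeholders, and it assumes a bearer token is accepted for authentication. It sends an OpenAI-style chat completion to a generative AI endpoint and an Open Inference Protocol (KServe v2) request to a predictive model endpoint.

```python
# Minimal sketch of calling Cloudera AI Inference service endpoints.
# All URLs, model names, and the CDP_TOKEN variable are placeholders,
# not values defined by this documentation.
import os

import requests
from openai import OpenAI

token = os.environ["CDP_TOKEN"]  # assumed bearer token for the service

# Generative AI endpoints expose the OpenAI API, so the standard
# OpenAI Python client works once pointed at the endpoint's base URL.
llm = OpenAI(
    base_url="https://your-inference-host/endpoints/llama-chat/v1",  # placeholder
    api_key=token,
)
reply = llm.chat.completions.create(
    model="llama-chat",  # placeholder model endpoint name
    messages=[{"role": "user", "content": "Summarize our Q3 sales trends."}],
)
print(reply.choices[0].message.content)

# Predictive model endpoints follow the Open Inference Protocol (KServe v2),
# which defines a /v2/models/<name>/infer REST route and a JSON tensor payload.
infer_url = "https://your-inference-host/endpoints/churn-model/v2/models/churn-model/infer"  # placeholder
payload = {
    "inputs": [
        {"name": "input-0", "shape": [1, 4], "datatype": "FP32", "data": [35.0, 1.0, 72.5, 0.0]}
    ]
}
resp = requests.post(infer_url, json=payload, headers={"Authorization": f"Bearer {token}"})
print(resp.json()["outputs"])
```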

Cloudera AI Inference service is seamlessly integrated with AI Registries, enabling users to store, manage, and track their AI models throughout their lifecycle. This integration provides a central location for model and application artifacts, metadata, and versions, making it easier to share and reuse AI artifacts across teams and projects. It also simplifies deploying models and applications to production: users can select artifacts from the registry and deploy them with a single command.