Serving Applications on Cloudera AI Inference service (Technical Preview)

Cloudera AI Inference service provides a production-grade serving environment for hosting applications. It handles complex production deployment requirements, including high availability, autoscaling, scale to zero, performance, and fault tolerance.

Applications deployed on Cloudera AI Inference service can scale alongside Model Endpoints, providing a scalable solution for various components, such as end-to-end applications, agents, and Model Context Protocol (MCP) clients and servers.

Currently, you can deploy application artifacts from Git or Docker repositories.
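As an illustration only, a Git-backed application deployment might be described by a configuration like the sketch below. Every field name here is hypothetical: the actual Cloudera AI Inference service schema is not documented in this section, so this shows the shape of such a descriptor, not the real API.

```yaml
# Hypothetical deployment descriptor -- field names are illustrative,
# not the actual Cloudera AI Inference service schema.
name: demo-app
source:
  type: git                  # a Docker image reference could be used instead
  repository: https://github.com/example/demo-app.git
  branch: main
scaling:
  minReplicas: 0             # scale to zero when idle
  maxReplicas: 4             # autoscale under load
```

A Docker-based deployment would swap the Git source for an image reference from a container registry; the scaling settings correspond to the autoscaling and scale-to-zero capabilities described above.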