Key Features

The key features of the Cloudera AI Inference service include:

  • Easy-to-use interface: Streamlines deployment and infrastructure complexity, meaningfully reducing time to value for AI use cases.
  • Real-time predictions: Allows users to serve AI models in real time, providing low-latency predictions for client requests.
  • Monitoring and logging: Includes functionality for monitoring and logging, making it easier to troubleshoot issues and optimize workload performance.
  • Advanced deployment patterns: Includes functionality for advanced deployment patterns, such as canary and blue-green deployments, and supports A/B testing. This enables users to roll out new model versions gradually and compare their performance before promoting them to production.
  • Optimized performance: Integrates with NVIDIA NIM microservices and NVIDIA Triton Inference Server to accelerate inference performance on NVIDIA accelerated infrastructure.
  • Model access: Offers access to NVIDIA foundation models, tailored for NVIDIA hardware to increase inference throughput and to reduce latency.
  • REST API: Provides APIs for deploying, managing, and monitoring model endpoints. These APIs enable integration with continuous integration and continuous deployment (CI/CD) pipelines and other tools used in Machine Learning Operations (MLOps) and Large Language Model Operations (LLMOps) workflows.
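
The canary pattern mentioned above can be illustrated with a minimal sketch: a router sends a small, configurable fraction of traffic to the new model version while the rest continues to hit the stable version. The version labels and weight value here are illustrative placeholders, not part of the service's actual configuration schema.

```python
import random

def route_request(canary_weight: float) -> str:
    """Route a single request: send a fraction of traffic to the canary.

    canary_weight is the probability (0.0-1.0) that a request is
    served by the new version instead of the stable one.
    """
    return "v2-canary" if random.random() < canary_weight else "v1-stable"

# Simulate 10,000 requests with 10% of traffic on the canary.
counts = {"v1-stable": 0, "v2-canary": 0}
for _ in range(10_000):
    counts[route_request(0.1)] += 1
```

In practice the weight is raised in stages (for example 5% → 25% → 100%) while the two versions' latency and accuracy metrics are compared, and rolled back to 0% if the canary underperforms.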
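
To show how the REST API might fit into a CI/CD pipeline, the sketch below constructs a hypothetical endpoint-creation request using only the Python standard library. The base URL, request path, payload fields, and bearer-token header are all assumptions for illustration; consult the actual Cloudera AI Inference service API reference for the real paths and schema.

```python
import json
from urllib import request

# Hypothetical base URL; a real deployment would use the service's
# published API address.
API_BASE = "https://ml-inference.example.com/api/v1"

def build_deploy_request(name: str, model_uri: str,
                         replicas: int = 1) -> request.Request:
    """Build a POST request that would create a model endpoint.

    The payload fields below are illustrative placeholders, not the
    service's actual request schema.
    """
    payload = {
        "name": name,
        "source": {"uri": model_uri},
        "autoscaling": {"minReplicas": replicas, "maxReplicas": replicas},
    }
    return request.Request(
        url=f"{API_BASE}/endpoints",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer <API_TOKEN>",  # placeholder token
        },
        method="POST",
    )

# A CI/CD job would build the request and submit it with
# urllib.request.urlopen(req) after a model passes its checks.
req = build_deploy_request("fraud-scorer", "registry://models/fraud/3")
```

Because endpoint management is exposed over HTTP, the same call can be issued from any pipeline tool (shell scripts, GitHub Actions, Jenkins) without an SDK dependency.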