Key Features
The key features of Cloudera AI Inference service include:
- Easy-to-use interface: Streamlines the complexities of deployment and infrastructure, significantly reducing time to value for AI use cases.
- Real-time predictions: Allows users to serve AI models in real time, providing low-latency predictions for client requests.
- Monitoring and logging: Provides built-in monitoring and logging, making it easier to troubleshoot issues and optimize workload performance.
- Advanced deployment patterns: Supports canary and blue-green deployments as well as A/B testing, enabling users to roll out new model versions gradually and compare their performance before promoting them to production.
- Optimized performance: Integrates with NVIDIA NIM microservices and NVIDIA Triton Inference Server to accelerate inference performance on NVIDIA accelerated infrastructure.
- Model access: Offers access to NVIDIA foundation models, tailored for NVIDIA hardware to increase inference throughput and to reduce latency.
- REST API: Provides APIs for deploying, managing, and monitoring model endpoints. These APIs enable integration with continuous integration and continuous deployment (CI/CD) pipelines and other tools used in Machine Learning Operations (MLOps) and Large Language Model Operations (LLMOps) workflows; see the sketch after this list.
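For illustration, the minimal sketch below shows how a REST API of this kind might be driven from a CI/CD script: deploy a new model version as a canary receiving a small share of traffic, wait for the endpoint to become ready, then send a test prediction. The base URL, endpoint paths, payload fields, state names, and token handling are all hypothetical placeholders, not the actual Cloudera AI Inference service API; consult the service's API reference for the real contract.

```python
import time

import requests

# Hypothetical base URL and bearer-token auth; replace with the values
# documented for your Cloudera AI Inference service deployment.
BASE_URL = "https://ml-inference.example.com/api/v1"
HEADERS = {"Authorization": "Bearer <CDP_TOKEN>"}

# Deploy a new model version as a canary taking 10% of traffic.
# The payload shape here is illustrative only.
deploy_payload = {
    "name": "fraud-detector",
    "model_version": "2.1.0",
    "traffic_percent": 10,  # hypothetical canary traffic split
}
resp = requests.post(
    f"{BASE_URL}/endpoints", json=deploy_payload, headers=HEADERS, timeout=30
)
resp.raise_for_status()
endpoint_id = resp.json()["id"]

# Poll the endpoint until it reports a ready state (state name assumed).
while True:
    status = requests.get(
        f"{BASE_URL}/endpoints/{endpoint_id}", headers=HEADERS, timeout=30
    ).json()
    if status.get("state") == "Ready":
        break
    time.sleep(10)

# Send a test prediction to the canary before widening the traffic split.
prediction = requests.post(
    f"{BASE_URL}/endpoints/{endpoint_id}/predict",
    json={"inputs": [[0.3, 1.7, 4.2]]},
    headers=HEADERS,
    timeout=30,
)
print(prediction.json())
```

A script like this slots naturally into an MLOps pipeline stage: the same calls that a user would make interactively become automated gates, with the canary traffic split widened only after the test predictions pass.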