OpenShift AI gives you the GPU infrastructure and ML tooling. What it doesn't give you is a way to offer that infrastructure as a product: a catalog teams order from, a portal they log into, and a metering layer that tracks GPU consumption per team or per job.
What OpenShift AI gives you, and what's missing
Cloud Orchestrator adds all three above OpenShift AI.
The Problem
AI teams, data scientists, and ML engineers need GPUs. The infrastructure is there. But without a delivery and billing layer, every request is manual, and every wasted GPU-hour is money nobody is tracking.
Data scientists wait days for GPU access. By the time resources arrive, the sprint is over and the experiment is stale.
GPUs sit idle between jobs. Nobody knows which team is consuming what. Finance has no data for chargebacks or budget allocation.
Trained models need to be served. There's no standard way to offer inference endpoints as a managed service across teams or customers.
Without hard multi-tenancy, one team's workloads can see or interfere with another's. Data science workloads often involve sensitive model weights and training data.
What You Can Offer
Cloud Orchestrator's XaaS SDK lets you define any AI service as a catalog item, with self-service access, quotas, and metering built in.
GPUaaS
Data scientists and ML engineers request GPU-backed clusters from the catalog: A100s, H100s, or whatever your hardware is. Provisioned in minutes. Metered per GPU-hour. Automatically decommissioned when done.
InferenceaaS
Teams deploy trained models to managed inference endpoints: isolated per team, auto-scaled, metered per request or per hour. No infrastructure management for the consuming team.
ModelaaS
A central model registry with controlled access: teams publish models, other teams consume them as managed services. Version control, access policy, and usage tracking included.
AI Dev Environments
Jupyter or equivalent environments with GPU backing, provisioned per user or per team on demand. Idle timeout and auto-cleanup prevent GPU waste.
How It Works
Define GPU cluster sizes, GPU types, and time limits as catalog items. Teams order from the catalog; Cloud Orchestrator provisions and enforces the limits.
Every GPU-hour consumed by every team is tracked. Whether you charge back to cost centres or bill external customers, the data is always available.
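As a rough illustration of what per-team GPU-hour metering boils down to, here is a minimal Python sketch. It is not Cloud Orchestrator's API: the `UsageRecord` type, the `chargeback` function, the team names, and the 2.50 rate are all hypothetical and exist only to show the arithmetic (GPUs allocated × hours held × rate, summed per team).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UsageRecord:
    """One metered allocation (hypothetical shape, for illustration only)."""
    team: str
    gpu_count: int
    start: datetime
    end: datetime

    @property
    def gpu_hours(self) -> float:
        # GPU-hours = number of GPUs held * wall-clock hours they were held
        return self.gpu_count * (self.end - self.start).total_seconds() / 3600


def chargeback(records, rate_per_gpu_hour):
    """Sum GPU-hours per team and convert them to a cost at a flat rate."""
    totals = defaultdict(float)
    for record in records:
        totals[record.team] += record.gpu_hours
    return {team: round(hours * rate_per_gpu_hour, 2) for team, hours in totals.items()}


records = [
    UsageRecord("vision", 4, datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),    # 8 GPU-hours
    UsageRecord("nlp", 8, datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 9, 30)),    # 4 GPU-hours
    UsageRecord("vision", 1, datetime(2024, 1, 2, 0), datetime(2024, 1, 2, 3)),     # 3 GPU-hours
]

bill = chargeback(records, rate_per_gpu_hour=2.50)
print(bill)
```

The same per-team totals could feed an internal cost-centre report or an external invoice; only the rate and the recipient differ.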
Each team's GPU workloads run in isolated environments. No cross-team visibility, no shared secrets, no accidental resource contention.
GPU allocations have hard quotas and optional auto-expiry. Idle resources are reclaimed automatically, so no GPU-hours sit unclaimed.
Cloud Orchestrator sits above OpenShift AI; it doesn't replace it. The ML tooling, GPU Operator, and KServe stack remain unchanged underneath.
Define any AI service as a catalog item. If it runs on Kubernetes, Cloud Orchestrator can wrap it in a self-service, metered offering.
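To make the catalog idea concrete, the sketch below models a catalog item with a GPU type, a GPU count, and a time limit, and checks an order against a team's hard quota before stamping it with an expiry. This is an assumption-laden illustration in Python, not the XaaS SDK: `CatalogItem`, `TeamQuota`, `place_order`, and the item name `h100-small` are all invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CatalogItem:
    """A self-service offering (hypothetical shape, not the real SDK)."""
    name: str
    gpu_type: str
    gpu_count: int
    max_hours: int  # time limit after which the allocation expires


@dataclass
class TeamQuota:
    """Hard cap on how many GPUs a team may hold at once."""
    team: str
    gpu_limit: int
    gpus_in_use: int = 0


def place_order(item: CatalogItem, quota: TeamQuota, now: datetime) -> dict:
    """Reject orders that would exceed the quota; otherwise record an expiry."""
    if quota.gpus_in_use + item.gpu_count > quota.gpu_limit:
        raise ValueError(
            f"{quota.team} would exceed its quota of {quota.gpu_limit} GPUs"
        )
    quota.gpus_in_use += item.gpu_count
    return {
        "team": quota.team,
        "item": item.name,
        "gpu_type": item.gpu_type,
        "expires_at": now + timedelta(hours=item.max_hours),
    }


item = CatalogItem(name="h100-small", gpu_type="H100", gpu_count=2, max_hours=24)
quota = TeamQuota(team="vision", gpu_limit=3)

order = place_order(item, quota, now=datetime(2024, 1, 1, 9))
print(order["expires_at"])  # one day after the order was placed
```

A second order for the same item would raise `ValueError` here, because 2 + 2 GPUs exceeds the team's limit of 3. In a real deployment the provisioning and reclamation would be done by the platform rather than by an in-memory counter.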
Start with a complimentary 2-hour design workshop. We design your service catalog, tenant model, and 90-day pilot scope with your team, on your infrastructure.