STAKATER AI Factory Sales Play
For organisations running OpenShift AI with GPU infrastructure who need to share it across teams — or sell it as a service — without rebuilding it for each new consumer.
You bought OpenShift AI for serious AI workloads. Expensive GPUs, model serving, pipelines, notebook environments. But access is limited to whoever knows the cluster admin. Everyone else files a ticket — or books capacity on AWS.
OpenShift AI with NVIDIA A100/H100 nodes. Jupyter notebooks, training pipelines, KServe inference endpoints. Production-grade — not a sandbox.
Consumer AI, network intelligence, fraud detection, customer analytics, internal tooling — all wanting GPU access from the same cluster. No operating model to share it safely.
For telcos: enterprise customers want sovereign AI compute — low latency, data residency, no Big Tech dependency. You have the infrastructure. You don't have the product.
The investment in AI infrastructure is significant. What's missing is the layer that turns a cluster into a service — so teams can self-serve instead of waiting in a queue.
What you've bought
Jupyter notebook servers, training pipelines, KServe model serving, MLflow experiment tracking. Operators deployed. Platform team manages it.
High-end GPU nodes for training and inference. MIG-capable. Expensive to procure and operate. Utilisation only visible to the platform team.
MinIO, Ceph, or NetApp for model artifacts, datasets, and pipeline outputs. One shared bucket space — no isolation between teams' data and models.
KServe, vLLM, or NVIDIA NIM for inference endpoints. Deployed by the platform team per request. No self-service. No tenant isolation between serving endpoints.
Enterprise identity. Teams authenticated. But no bridge to per-team workspace isolation, quota enforcement, or scoped access to models and datasets.
Data scientists request notebook profiles, GPU allocations, and storage buckets via ServiceNow. Average wait: days to weeks. Shadow cloud grows.
OpenShift AI is built for a platform team to run. It isn't built for 20 AI teams to share safely with resource isolation and self-service. That's the missing layer.
ResourceQuota limits CPU and memory, but not GPU scheduling priority. One team's training job starves another team's inference serving during a burst — with no visibility or isolation.
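To make that gap concrete, here is a minimal sketch of what stock Kubernetes gives you, assuming the NVIDIA device plugin exposes GPUs as nvidia.com/gpu: a ResourceQuota caps how many GPUs a namespace may request, and a PriorityClass expresses scheduling preference, but neither gives per-team isolation or noisy-neighbour protection on its own.

```yaml
# Illustrative only: stock Kubernetes primitives, not a complete isolation model.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-gpu-quota
  namespace: team-a
spec:
  hard:
    requests.nvidia.com/gpu: "8"    # caps the GPU count, not scheduling priority
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: inference-serving
value: 100000
globalDefault: false
description: "Serving endpoints outrank batch training when GPUs are contended."
```

Both primitives exist today; designing, applying, and policing them per team for twenty teams is the operating model most clusters never get.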
Data scientists cannot spin up their own notebook servers, storage buckets, or training jobs without a platform team member creating them. Every experiment starts with a ticket.
No published catalog of pre-approved notebook profiles, GPU tiers, and inference endpoint types. Every request is custom, every approval is ad hoc.
No per-team GPU usage tracking. Finance cannot charge back AI compute costs to business units. No data to justify the next GPU procurement — or to identify waste.
No workflow for promoting a model from experiment to production. No approval gate. No audit trail of which team deployed which model version to a serving endpoint.
No way to offer GPU compute or AI services to external customers. The infrastructure exists — there's nothing to sell it through.
GPU infrastructure is among the most expensive compute you will ever buy. Without the operating model to share it, you pay for the whole cluster and a handful of teams get to use it.
H100 GPUs cost $30k–$40k each. A cluster with 20 of them is $600k–$800k of hardware before you count the servers, networking, software, power, and cooling around them. Running at 35% utilisation because only a few teams have access isn't a GPU problem. It's an access model problem.
Data scientists who can't get internal GPU access spin up AWS p4d instances or Azure NDv4 clusters on corporate cards. Compliance doesn't know. Data leaves the perimeter. The internal cluster still sits idle.
AI teams run training jobs with no visibility into cost. A fine-tune that runs for 3 days on 8 GPUs (roughly 576 GPU-hours) could cost €2,000 in equivalent cloud compute — no one knows. No incentive to optimise.
For telcos: enterprise customers are actively looking for sovereign GPU compute. GDPR-compliant, low-latency, not AWS. You have the hardware. You have the network. You have no product to sell.
Sits on top of your existing OpenShift AI deployment. Adds multi-tenancy, self-service, GPU metering, and a service catalog — without replacing any of the RHOAI stack underneath.
KCP virtual control planes per team. GPU quotas enforced architecturally. Team A's training burst cannot starve Team B's serving endpoint. Hard resource boundaries — not just LimitRange.
Pre-approved, pre-configured items: notebook profiles (CPU-only, A100×1, A100×4, H100×8), fine-tuning job templates, inference endpoint tiers. Data scientists self-provision in under a minute.
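As an illustration of what a catalog item captures, here is a hypothetical tier definition. The field names are ours, not Cloud Orchestrator's actual schema; the point is that the profile is sized and approved once, then self-served by any team without a ticket.

```yaml
# Hypothetical catalog entry for illustration; field names are not the
# product's actual schema.
name: notebook-a100x1
description: Single-GPU notebook for interactive experimentation
resources:
  gpuType: nvidia-a100-80gb
  gpuCount: 1                  # a MIG slice tier could sit below this for lighter work
  cpu: "8"
  memory: 64Gi
policy:
  maxSessionsPerUser: 2
  idleShutdownMinutes: 120     # reclaim idle GPUs automatically
  approval: pre-approved       # provisioned without a ticket
```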
Per-team GPU utilisation tracked continuously. Cost per training run, per serving endpoint, per notebook session. Exported to finance systems for chargeback. Procurement decisions based on data, not estimates.
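For example, a chargeback export might carry records shaped like the sketch below. The format is hypothetical and the rate is an assumed internal price per GPU-hour; the exact schema depends on the finance system integration you agree with your finance team.

```yaml
# Hypothetical chargeback record; the rate is an assumed internal price per
# GPU-hour, not a quoted figure.
costCentre: CC-4710-fraud-detection
period: "2025-06"
usage:
  - workload: fine-tune-fraud-llm
    gpuType: nvidia-a100-80gb
    gpuHours: 576              # e.g. 8 GPUs for 72 hours
    ratePerGpuHour: 3.50       # EUR, assumed internal rate
    cost: 2016.00
  - workload: kserve-fraud-scoring-prod
    gpuType: nvidia-a100-80gb
    gpuHours: 220
    ratePerGpuHour: 3.50
    cost: 770.00
totalCost: 2786.00
```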
Each team's models, datasets, and pipeline artifacts scoped to their workspace. Sharing is explicit and audited. Team A cannot access Team B's model registry or training data by accident or design.
Approval workflow for promoting a model from experiment to production serving. Audit trail of every deployment — which model, which version, which team, which approver. Compliance evidence generated automatically.
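A hypothetical audit-trail entry, to show the shape of the evidence generated at promotion time. The record format and the names in it are illustrative, not the product's actual output.

```yaml
# Illustrative audit record; names and format are hypothetical.
event: model-promotion
model: fraud-scoring
version: "1.4.2"
sourceStage: experiment
targetStage: production
servingEndpoint: kserve/fraud-scoring-prod
requestedBy: team-fraud-detection/data-scientist
approvedBy: team-fraud-detection/ml-lead
timestamp: "2025-06-12T14:03:00Z"
```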
Extend the same operating model to external enterprise tenants. Offer GPU compute, fine-tuning capacity, and inference endpoints as a sovereign B2B AI service. Metered, billed, isolated.
RHOAI stays. Cloud Orchestrator adds the operating model above it. Teams self-serving in 6 weeks.
GPU topology mapped, team use cases classified, quota model agreed
We
· GPU cluster audit
· Team inventory + use case classification
· MIG partitioning design
You
· GPU admin access
· AI team leads engaged
· Use case priority list
Cloud Orchestrator + RHOAI integrated, first team workspace live
We
· Deploy on RHOAI cluster
· GPU quota enforcement
· Storage namespace isolation
You
· Dedicated admin namespace
· Object storage credentials
· SSO service account
3 AI teams self-provisioning, GPU metering active
We
· AI service catalog built
· Self-service notebook portal
· GPU-hour metering dashboard
You
· 3 pilot teams — varied maturity
· Data scientists in pilot
· Validate GPU scheduling
All teams onboarded, GPU chargeback to cost centres
We
· Full team onboarding
· Chargeback export to finance
· Model governance workflows
You
· Finance system integration
· Team onboarding comms
· Quota policy approval
New teams in under an hour, external AI services live
We
· External tenant onboarding
· New GPU class support
· Quarterly catalog review
You
· External customer pipeline
· GPU expansion plan
· New model type requests
AI platform deployments that stall almost always do so for the same reasons. Get these four things right in Assess and the factory runs.
How many GPUs per team? MIG partitioning or whole-GPU allocation? Priority classes for training vs inference? Agree this in Assess. Reworking GPU scheduling after teams are onboarded is painful and politically complex.
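A sketch of what that decision looks like in practice, using the mig-parted configuration format the NVIDIA GPU Operator consumes. Profile names assume A100 80GB cards; whole-GPU and sliced pools can coexist on different node groups. Changing the scheme later typically means draining GPU workloads from the affected nodes, which is why it belongs in Assess.

```yaml
# Sketch of a MIG partitioning choice in the nvidia mig-parted config format.
# Profile names assume A100 80GB hardware; adjust for yours.
version: v1
mig-configs:
  # Whole GPUs for large training jobs
  all-disabled:
    - devices: all
      mig-enabled: false
  # Seven 1g.10gb slices per card for notebooks and light inference
  all-1g.10gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb": 7
```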
The self-service portal must work for people who don't know Kubernetes. If only platform engineers validate it, you'll miss the UX gaps that block actual adoption. Put real data scientists in front of it in week five.
GPU chargeback is the business case. If finance is engaged late, the metering is built but there's no receiver for the data. Get the cost centre mapping and finance system integration scoped in Assess, not month three.
Every team will request a custom notebook profile or a unique GPU allocation. The catalog model only scales if custom configurations go through the catalog approval process — not as ad hoc exceptions. Set that expectation before go-live.
Cloud Orchestrator adds the operating model above your RHOAI stack —
multi-tenant, self-service, GPU-metered, and ready to monetise in six weeks.
stakater.com