AI Factory Sales Play

You built the AI factory.
Now who gets to use it?


For organisations running OpenShift AI with GPU infrastructure that need to share it across teams — or sell it as a service — without rebuilding it for each new consumer.

The GPU cluster is shared.
Your AI teams are not.

You bought OpenShift AI for serious AI workloads. Expensive GPUs, model serving, pipelines, notebook environments. But access is limited to whoever knows the cluster admin. Everyone else files a ticket — or books AWS.

🧠

Real AI workloads on real hardware

OpenShift AI with NVIDIA A100/H100 nodes. Jupyter notebooks, training pipelines, KServe inference endpoints. Production-grade — not a sandbox.

🏢

Many teams, one cluster

Consumer AI, network intelligence, fraud detection, customer analytics, internal tooling — all wanting GPU access from the same cluster. No operating model to share it safely.

📡

The external opportunity

For telcos: enterprise customers want sovereign AI compute — low latency, data residency, no Big Tech dependency. You have the infrastructure. You don't have the product.

OpenShift AI is the engine.
The factory floor is empty.

The investment in AI infrastructure is significant. What's missing is the layer that turns a cluster into a service — so teams can self-serve instead of waiting in a queue.

What you've bought

OpenShift AI (RHOAI)

Jupyter notebook servers, training pipelines, KServe model serving, MLflow experiment tracking. Operators deployed. Platform team manages it.

GPU nodes — A100 / H100

High-end GPU nodes for training and inference. MIG-capable. Expensive to procure and operate. Utilisation only visible to the platform team.

Object storage for models + data

MinIO, Ceph, or NetApp for model artifacts, datasets, and pipeline outputs. One shared bucket space — no isolation between teams' data and models.

Model serving infrastructure

KServe, vLLM, or NVIDIA NIM for inference endpoints. Deployed by the platform team per request. No self-service. No tenant isolation between serving endpoints.

Active Directory / SSO

Enterprise identity. Teams authenticated. But no bridge to per-team workspace isolation, quota enforcement, or scoped access to models and datasets.

ITSM — ticket-based access

Data scientists request notebook profiles, GPU allocations, and storage buckets via ServiceNow. Average wait: days to weeks. Shadow cloud grows.

Great infrastructure.
No operating model to share it.

OpenShift AI is built for a platform team to run. It isn't built for 20 AI teams to share safely with resource isolation and self-service. That's the missing layer.

Per-team GPU isolation

ResourceQuota limits CPU and memory, but not GPU scheduling priority. One team's training job starves another team's inference serving during a burst — with no visibility or isolation.
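
A minimal sketch of that gap, assuming a hypothetical "team-a" namespace: a standard Kubernetes ResourceQuota can cap the team's total GPU count (via the nvidia.com/gpu extended resource) alongside CPU and memory, but nothing in it decides whose pods win when the cluster is full. The names and figures below are illustrative only.

import json

# Illustrative quota for a hypothetical "team-a" namespace.
team_a_quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "team-a-gpu-quota", "namespace": "team-a"},
    "spec": {
        "hard": {
            "requests.cpu": "64",
            "requests.memory": "512Gi",
            "requests.nvidia.com/gpu": "8",  # caps GPU *count* only
        }
    },
}

# The quota says nothing about scheduling priority or preemption between
# teams; burst behaviour is decided elsewhere (PriorityClass, scheduler
# policy), which is exactly the isolation gap described above.
print(json.dumps(team_a_quota, indent=2))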

Self-service AI environments

Data scientists cannot spin up their own notebook servers, storage buckets, or training jobs without a platform team member creating them. Every experiment starts with a ticket.

AI service catalog

No published catalog of pre-approved notebook profiles, GPU tiers, and inference endpoint types. Every request is custom, every approval is ad hoc.

GPU-hour metering

No per-team GPU usage tracking. Finance cannot charge back AI compute costs to business units. No data to justify the next GPU procurement — or to identify waste.

Model governance per team

No workflow for promoting a model from experiment to production. No approval gate. No audit trail of which team deployed which model version to a serving endpoint.

External monetisation

No way to offer GPU compute or AI services to external customers. The infrastructure exists — there's nothing to sell it through.

H100s running at 35%.
While teams book AWS.

GPU infrastructure is among the most expensive compute you will ever buy. Without the operating model to share it, you pay for the whole cluster and a handful of teams get to use it.

Stranded GPU investment

H100 GPUs cost $30k–$40k each. A cluster with 20 of them is $600k–$800k of hardware before software, power, and cooling. Running at 35% utilisation because only a few teams have access isn't a GPU problem. It's an access model problem.

Shadow GPU spend

Data scientists who can't get internal GPU access spin up AWS p4d instances or Azure NDv4 clusters on corporate cards. Compliance doesn't know. Data leaves the perimeter. The internal cluster still sits idle.

No chargeback = no accountability

AI teams run training jobs with no visibility into cost. A fine-tune that runs for 3 days on 8 GPUs could cost €2,000 in equivalent cloud compute — no one knows. No incentive to optimise.
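
The arithmetic behind that figure, as a quick sketch (the per-GPU-hour rate is an assumption for illustration, not a quoted cloud price):

# Rough cost of an untracked fine-tune at an assumed on-demand
# rate of 3.50 EUR per GPU-hour for a comparable cloud GPU.
gpus = 8
hours = 3 * 24                    # 3 days
rate_eur_per_gpu_hour = 3.50      # assumption for illustration

gpu_hours = gpus * hours          # 576 GPU-hours
cost_eur = gpu_hours * rate_eur_per_gpu_hour
print(f"{gpu_hours} GPU-hours = {cost_eur:,.0f} EUR")   # 2,016 EUR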

Revenue opportunity missed

For telcos: enterprise customers are actively looking for sovereign GPU compute. GDPR-compliant, low-latency, not AWS. You have the hardware. You have the network. You have no product to sell.

Cloud Orchestrator adds the
commercial layer above RHOAI.

Sits on top of your existing OpenShift AI deployment. Adds multi-tenancy, self-service, GPU metering, and a service catalog — without replacing any of the RHOAI stack underneath.

Per-team GPU workspaces

KCP virtual control planes per team. GPU quotas enforced architecturally. Team A's training burst cannot starve Team B's serving endpoint. Hard resource boundaries — not just LimitRange.

AI service catalog

Pre-approved, pre-configured items: notebook profiles (CPU-only, A100×1, A100×4, H100×8), fine-tuning job templates, inference endpoint tiers. Data scientists self-provision in under a minute.
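
As an illustration of how such a catalog can be modelled (profile names, sizes, and the approval flag below are invented for the example, not the shipped catalog):

from dataclasses import dataclass
from typing import Optional

@dataclass
class NotebookProfile:
    """One pre-approved catalog entry a data scientist can self-provision."""
    name: str
    gpus: int
    gpu_model: Optional[str]
    memory_gib: int
    approval_required: bool = False  # catalog items are pre-approved by default

# Hypothetical tiers mirroring the examples above.
CATALOG = [
    NotebookProfile("cpu-only", gpus=0, gpu_model=None,   memory_gib=16),
    NotebookProfile("a100-x1",  gpus=1, gpu_model="A100", memory_gib=64),
    NotebookProfile("a100-x4",  gpus=4, gpu_model="A100", memory_gib=256),
    NotebookProfile("h100-x8",  gpus=8, gpu_model="H100", memory_gib=640,
                    approval_required=True),  # largest tier gated per policy
]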

GPU-hour metering

Per-team GPU utilisation tracked continuously. Cost per training run, per serving endpoint, per notebook session. Exported to finance systems for chargeback. Procurement decisions based on data, not estimates.
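
A sketch of what a chargeback export can look like once GPU-hours are metered per team. Team names, usage figures, and the internal rate are assumptions for illustration; the real export format depends on the finance system it feeds.

from collections import defaultdict

# Hypothetical metering records: (team, GPU-hours) per workload, one month.
usage = [
    ("fraud-detection",    412.0),
    ("network-intel",      118.5),
    ("customer-analytics",  73.0),
    ("fraud-detection",     96.0),
]

INTERNAL_RATE_EUR_PER_GPU_HOUR = 2.10  # assumed internal chargeback rate

totals = defaultdict(float)
for team, gpu_hours in usage:
    totals[team] += gpu_hours

# One chargeback line per team / cost centre for the finance export.
for team, gpu_hours in sorted(totals.items()):
    cost = gpu_hours * INTERNAL_RATE_EUR_PER_GPU_HOUR
    print(f"{team}: {gpu_hours:.1f} GPU-h -> {cost:,.2f} EUR")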

Model + data isolation

Each team's models, datasets, and pipeline artifacts scoped to their workspace. Sharing is explicit and audited. Team A cannot access Team B's model registry or training data by accident or design.

Model promotion governance

Approval workflow for promoting a model from experiment to production serving. Audit trail of every deployment — which model, which version, which team, which approver. Compliance evidence generated automatically.

External AI services (for telcos)

Extend the same operating model to external enterprise tenants. Offer GPU compute, fine-tuning capacity, and inference endpoints as a sovereign B2B AI service. Metered, billed, isolated.

From GPU cluster to AI factory as a service

RHOAI stays. Cloud Orchestrator adds the operating model above it. Teams self-serving in 6 weeks.

1
Assess Week 1–2

GPU topology mapped, team use cases classified, quota model agreed

We

· GPU cluster audit

· Team inventory + use case classification

· MIG partitioning design

You

· GPU admin access

· AI team leads engaged

· Use case priority list

2
Foundation Week 3–4

Cloud Orchestrator + RHOAI integrated, first team workspace live

We

· Deploy on RHOAI cluster

· GPU quota enforcement

· Storage namespace isolation

You

· Dedicated admin namespace

· Object storage credentials

· SSO service account

3
Pilot Week 5–7

3 AI teams self-provisioning, GPU metering active

We

· AI service catalog built

· Self-service notebook portal

· GPU-hour metering dashboard

You

· 3 pilot teams — varied maturity

· Data scientists in pilot

· Validate GPU scheduling

4
Production Month 2–3

All teams onboarded, GPU chargeback to cost centres

We

· Full team onboarding

· Chargeback export to finance

· Model governance workflows

You

· Finance system integration

· Team onboarding comms

· Quota policy approval

5
Scale Month 3+

New teams in under an hour, external AI services live

We

· External tenant onboarding

· New GPU class support

· Quarterly catalog review

You

· External customer pipeline

· GPU expansion plan

· New model type requests

What makes the
AI factory actually work.

AI platform deployments that stall almost always do so for the same reasons. Get these four things right in Assess and the factory runs.

GPU quota model agreed before build

How many GPUs per team? MIG partitioning or whole-GPU allocation? Priority classes for training vs inference? Agree this in Assess. Reworking GPU scheduling after teams are onboarded is painful and politically complex.
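
To make the partitioning choice concrete: one A100 80GB card can be allocated whole or split into MIG instances, and the choice determines how many isolated users each card supports. The instance counts below follow NVIDIA's published A100 MIG profiles; which profile suits which team is exactly the design decision to settle in Assess.

# Rough capacity per A100 80GB card, whole vs MIG-partitioned.
mig_profiles = {
    "7g.80gb": 1,  # whole card: one large training job
    "3g.40gb": 2,  # two medium fine-tuning or serving workloads
    "2g.20gb": 3,  # three mid-size notebook users
    "1g.10gb": 7,  # seven lightweight notebook or inference slices
}

for profile, max_instances in mig_profiles.items():
    print(f"{profile}: up to {max_instances} isolated instance(s) per card")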

Data scientists in the pilot — not just platform engineers

The self-service portal must work for people who don't know Kubernetes. If only platform engineers validate it, you'll miss the UX gaps that block actual adoption. Put real data scientists in front of it in week five.

Finance engaged for GPU chargeback from day one

GPU chargeback is the business case. If finance is engaged late, the metering is built but there's no receiver for the data. Get the cost centre mapping and finance system integration scoped in Assess, not month three.

Catalog discipline — say no to bespoke

Every team will request a custom notebook profile or a unique GPU allocation. The catalog model only scales if custom configurations go through the catalog approval process — not as ad hoc exceptions. Set that expectation before go-live.

The factory floor is built.
Open the doors.


Cloud Orchestrator adds the operating model above your RHOAI stack —
multi-tenant, self-service, GPU-metered, and ready to monetise in six weeks.



stakater.com