What Hardware Runs Local AI

Local AI hardware used to mean a server room, and now it mostly means a box the size of a hardcover book. In the past two years the machines that run serious AI models on-premise have collapsed in size, power draw, and price.

That collapse is the quiet enabling fact behind sovereign AI infrastructure: owning the machine became a realistic office purchase instead of a datacenter project. Understanding what class of machine this is comes before any vendor conversation.

A Petaflop on a Desk

The headline change is that petaflop-class AI machines now sit on a desk and plug into an ordinary wall outlet. NVIDIA's DGX Spark, the reference example of the class, delivers up to one petaflop of AI compute (a figure quoted at low-precision FP4) with 128 GB of unified memory in a six-inch-square box weighing 1.2 kilograms.[1]

For scale, that throughput once belonged to racks with dedicated power and cooling. The Spark draws 240 watts, less than many gaming PCs, and sells for $4,699 after launching at $3,999 in October 2025.[1][2][3] That retail figure covers the component, not the system; what a packaged deployment includes is covered on its own page.

The point is not this one product but the class it defined. Several manufacturers now build machines around the same chip, and the class as a whole is what an office actually shops for.

Memory Is the Gatekeeper

Memory capacity, not raw speed, decides which AI models a machine can run at all. A model must fit entirely in memory to run; a model that does not fit does not merely run slowly, it does not run in any practical sense.

Unified memory is what makes the desktop class work. The CPU and GPU share one 128 GB pool, so the whole allotment is available to the model instead of the small dedicated memory on an ordinary graphics card.

That capacity holds models up to roughly 200 billion parameters for inference when the weights are stored at about 4 bits each, which covers the strong open models most office work calls for.[1] Read the memory number first on any spec sheet; it defines which models your office can own.
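The fit-or-not rule above reduces to simple arithmetic: parameter count times bytes per parameter, plus some headroom for the runtime. The sketch below is a back-of-envelope estimate only. The `overhead` factor of 1.2 and the function names are illustrative assumptions, not measured values or any vendor's sizing method.

```python
# Back-of-envelope check: do a model's weights fit in a given memory pool?
# The overhead factor is an illustrative assumption, not a measured value.

GB = 1e9  # decimal gigabytes, the unit spec sheets usually quote


def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB."""
    return params_billions * 1e9 * bits_per_param / 8 / GB


def fits(params_billions: float, bits_per_param: int,
         memory_gb: float = 128.0, overhead: float = 1.2) -> bool:
    """True if weights plus an assumed runtime overhead fit in memory."""
    return weights_gb(params_billions, bits_per_param) * overhead <= memory_gb


if __name__ == "__main__":
    for params, bits in [(8, 16), (70, 16), (70, 8), (200, 4), (200, 16)]:
        need = weights_gb(params, bits)
        verdict = "fits" if fits(params, bits) else "does not fit"
        print(f"{params}B at {bits}-bit: {need:.0f} GB of weights, {verdict}")
```

Under these assumptions a 200-billion-parameter model needs about 100 GB for 4-bit weights, which is why the ceiling sits near that size, while the same model at 16 bits would need about 400 GB and is out of reach.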

The Three Practical Tiers

Office-scale local AI hardware sorts into three tiers, and most small and midsize businesses (SMBs) only ever need the first. Tier one is a single high-memory desktop box of the class above, and it serves a working office.

Tier two is a workstation carrying discrete GPUs: faster under load, but louder, hungrier, and costlier, worth it mainly when concurrency runs high. Tier three is rack servers, a different world of machine rooms, cooling, and dedicated staff that most small organizations never need to enter.

What the Box Does

All day, the box answers: it runs inference and retrieval for many users over documents it has already ingested. Training models from scratch is the datacenter story; running them is the office story, and it is a far lighter job.

One machine timeshares its GPU across the whole office, which is why a single box supports more staff than intuition suggests. FactoryOS ships on exactly this class of machine, one box serving an office through shared GPU scheduling, so a busy peak means a short queue, not a crash.
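The "short queue, not a crash" behavior can be illustrated with a bounded number of GPU slots that requests wait on. This is a minimal sketch of the general idea, not FactoryOS's actual scheduler. `GPU_SLOTS`, `answer`, `serve`, and the sleep that stands in for inference are all hypothetical.

```python
import asyncio

GPU_SLOTS = 2  # assumption: how many requests the GPU serves at once


async def answer(user: str, gpu: asyncio.Semaphore, log: list) -> None:
    """Wait for a free GPU slot, then run a stand-in inference call."""
    async with gpu:                    # extra requests wait here, they do not fail
        log.append(("start", user))
        await asyncio.sleep(0.01)      # placeholder for real model inference
        log.append(("done", user))


async def serve(users: list) -> list:
    """Submit every user's request at once and return the event log."""
    gpu = asyncio.Semaphore(GPU_SLOTS)
    log: list = []
    await asyncio.gather(*(answer(u, gpu, log) for u in users))
    return log


def peak_concurrency(log: list) -> int:
    """Largest number of requests that were on the GPU at the same time."""
    running = peak = 0
    for event, _ in log:
        running += 1 if event == "start" else -1
        peak = max(peak, running)
    return peak


if __name__ == "__main__":
    events = asyncio.run(serve([f"user{i}" for i in range(10)]))
    print("completed:", sum(1 for e, _ in events if e == "done"))
    print("peak on GPU:", peak_concurrency(events))
```

Ten simultaneous requests all complete, but never more than `GPU_SLOTS` occupy the GPU at once; the rest simply wait their turn.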

Power, Noise, and Space

Forget the server-room mental model; this class of machine needs a desk corner, a standard outlet, and a network cable. At 240 watts there is no electrical work, no dedicated cooling, and no more noise than an ordinary desktop PC.[1]

That practicality is part of the purchase decision. The machine sits in the room where the work happens, and nothing about the building has to change to accommodate it.
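The 240-watt figure can be put in context with a little arithmetic. The sketch below assumes a 120 V, 15 A office circuit and a placeholder electricity price of $0.15 per kWh; both are illustrative assumptions that vary by country and utility, and 240 W is treated as a ceiling rather than a typical draw.

```python
# Rough power arithmetic for a 240 W machine on an ordinary office circuit.
# Circuit rating and electricity price are illustrative assumptions.

BOX_WATTS = 240.0       # figure quoted in the text, treated as a ceiling
CIRCUIT_VOLTS = 120.0   # assumption: North American wall outlet
CIRCUIT_AMPS = 15.0     # assumption: common branch-circuit rating
PRICE_PER_KWH = 0.15    # assumption: placeholder rate in dollars


def circuit_share(watts: float, volts: float = CIRCUIT_VOLTS,
                  amps: float = CIRCUIT_AMPS) -> float:
    """Fraction of the circuit's rated capacity the machine would use."""
    return watts / (volts * amps)


def annual_kwh(watts: float, hours_per_day: float = 24.0) -> float:
    """Energy used in a year at a constant draw."""
    return watts / 1000.0 * hours_per_day * 365


if __name__ == "__main__":
    kwh = annual_kwh(BOX_WATTS)
    print(f"share of circuit: {circuit_share(BOX_WATTS):.0%}")
    print(f"energy per year:  {kwh:.0f} kWh")
    print(f"cost per year:    ${kwh * PRICE_PER_KWH:.0f}")
```

Even running flat out around the clock, the machine would use a small fraction of one circuit and roughly 2,100 kWh a year under these assumptions.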

What It Cannot Run

An honest spec sheet ends with a limit: the very largest frontier models still exceed any desktop box. Those trillion-parameter-class systems run only in datacenters and are rented by the token, not owned.

That is a scoping question, not a disqualifier. Most office work runs well on models that fit in 128 GB, and knowing which tasks genuinely need a frontier model is how you size the purchase rather than rule it out.
