What Hardware Runs Local AI
Local AI hardware used to mean a server room; now it mostly means a box the size of a hardcover book. In the past two years, the machines that run serious AI models on-premises have collapsed in size, power draw, and price.
That collapse is the quiet enabling fact behind sovereign AI infrastructure: owning the machine became a realistic office purchase instead of a datacenter project. Understanding what class of machine this is comes before any vendor conversation.
A Petaflop on a Desk
The headline change is that petaflop-class AI machines now sit on a desk and plug into an ordinary wall outlet. NVIDIA's DGX Spark, the reference example of the class, delivers up to one petaflop of AI compute (quoted at FP4 precision) with 128 GB of unified memory in a six-inch-square box weighing 1.2 kilograms.1
For scale, that throughput once belonged to racks with dedicated power and cooling. The Spark draws up to 240 watts, less than many gaming PCs, and sells for $4,699 after launching at $3,999 in October 2025.1, 2, 3 That retail figure is the price of the component, not the system; what a packaged deployment includes is covered separately.
The point is not this one product but the class it defined. Several manufacturers now build machines around the same chip, and the class as a whole is what an office actually shops for.
Memory Is the Gatekeeper
Memory capacity, not raw speed, decides which AI models a machine can run at all. A model must fit entirely in memory to run. One that does not fit does not run slowly; it does not run at all.
Unified memory is what makes the desktop class work. The CPU and GPU share one 128 GB pool, so the whole allotment is available to the model instead of the small dedicated memory on an ordinary graphics card.
At 4-bit precision, that capacity holds models of up to roughly 200 billion parameters for inference, which covers the strong open models most office work calls for.1 Read the memory number first on any spec sheet; it defines which models your office can own.
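The fit test is simple arithmetic: weight storage is roughly the parameter count times the bytes stored per parameter. This is a minimal sketch, assuming 4-bit quantization (0.5 bytes per parameter) as its default and counting weights only; a real deployment also needs room for the context cache, the operating system, and the application, so the practical ceiling sits below the raw 128 GB. The function names are hypothetical, not part of any vendor tool.

```python
# Hypothetical sizing helper: weights only, decimal GB, assumed quantization.
UNIFIED_MEMORY_GB = 128  # unified pool shared by CPU and GPU


def weights_gb(params_billions: float, bytes_per_param: float = 0.5) -> float:
    """Approximate weight storage in GB.

    (params_billions * 1e9) params * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    The 0.5 default assumes 4-bit weights.
    """
    return params_billions * bytes_per_param


def fits(params_billions: float, bytes_per_param: float = 0.5,
         memory_gb: float = UNIFIED_MEMORY_GB) -> bool:
    """True if the weights alone fit; ignores cache, OS, and app overhead."""
    return weights_gb(params_billions, bytes_per_param) <= memory_gb


if __name__ == "__main__":
    for label, params, bpp in [
        ("200B at 4-bit", 200, 0.5),
        ("200B at 16-bit", 200, 2.0),
        ("1T at 4-bit", 1000, 0.5),
    ]:
        print(f"{label}: {weights_gb(params, bpp):.0f} GB, fits={fits(params, bpp)}")
```

Run as a script, it reports 100 GB (fits) for a 200-billion-parameter model at 4-bit, 400 GB (does not fit) for the same model at 16-bit, and 500 GB (does not fit) for a trillion-parameter model even at 4-bit. Precision matters as much as parameter count.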
The Three Practical Tiers
Office-scale local AI hardware sorts into three tiers, and most small and midsize businesses (SMBs) only ever need the first. Tier one is a single high-memory desktop box of the class above, enough to serve a working office.
Tier two is a workstation carrying discrete GPUs: faster under load, but louder, hungrier, and costlier, worth it mainly when concurrency runs high. Tier three is rack servers, a different world of machine rooms, cooling, and dedicated staff that most small organizations never need to enter.
What the Box Does
What the box does all day is answer: it runs inference and retrieval for many users over documents it has already ingested. Training models from scratch is the datacenter story; running them is the office story, and it is a far lighter job.
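A common rule of thumb makes the gap concrete: training a dense model of N parameters on D tokens costs about 6ND floating-point operations, while generating one token costs about 2N, so a reply of T tokens costs about 2NT. The figures below are illustrative assumptions (a hypothetical 70-billion-parameter model trained on 15 trillion tokens, answering with a 1,000-token reply), and the time estimates divide by an idealized one-petaflop peak rather than real-world throughput.

```latex
\begin{aligned}
C_{\text{train}} &\approx 6ND = 6 \times (7\times10^{10}) \times (1.5\times10^{13}) = 6.3\times10^{24}\ \text{FLOPs} \\
C_{\text{reply}} &\approx 2NT = 2 \times (7\times10^{10}) \times 10^{3} = 1.4\times10^{14}\ \text{FLOPs} \\
t_{\text{train}} &\approx \frac{6.3\times10^{24}}{10^{15}\ \text{FLOP/s}} = 6.3\times10^{9}\ \text{s} \approx 200\ \text{years} \\
t_{\text{reply}} &\approx \frac{1.4\times10^{14}}{10^{15}\ \text{FLOP/s}} = 0.14\ \text{s}
\end{aligned}
```

Real inference is usually limited by memory bandwidth rather than peak compute, so actual reply times run longer, but a gap of roughly ten orders of magnitude is why training stays in the datacenter.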
One machine timeshares its GPU across the whole office, which is why a single box supports more staff than intuition suggests. FactoryOS ships on exactly this class of machine, one box serving an office through shared GPU scheduling, so a busy peak means a short queue, not a crash.
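The effect of timesharing can be sketched as a first-in, first-out queue. This is a hypothetical model, not FactoryOS's actual scheduler: it assumes one request occupies the GPU at a time for a fixed, made-up number of seconds, whereas production inference servers usually batch concurrent requests and would do better. The function name `fifo_waits` and its parameters are illustrative.

```python
def fifo_waits(arrival_times, service_seconds):
    """Serve requests one at a time, in arrival order, on one shared GPU.

    arrival_times: when each request arrives, in seconds.
    service_seconds: hypothetical fixed GPU time per request.
    Returns how long each request waits before the GPU starts on it.
    """
    gpu_free_at = 0.0
    waits = []
    for t in sorted(arrival_times):
        start = max(float(t), gpu_free_at)  # wait if the GPU is still busy
        waits.append(start - t)
        gpu_free_at = start + service_seconds
    return waits


# A burst: five staff submit at the same moment.
print(fifo_waits([0, 0, 0, 0, 0], service_seconds=2.0))
# A quiet stretch: requests arrive ten seconds apart.
print(fifo_waits([0, 10, 20], service_seconds=2.0))
```

The burst yields waits of 0, 2, 4, 6, and 8 seconds and every request is still served; the spread-out requests never wait at all. Under this model, a peak costs seconds of queueing, not failed requests.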
Power, Noise, and Space
Forget the server-room mental model; this class of machine needs a desk corner, a standard outlet, and a network cable. At 240 watts there is no electrical work, no dedicated cooling, and no more noise than an ordinary desktop PC.1
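As a rough check of the "no electrical work" claim, the current a device pulls is its power divided by the supply voltage. This sketch assumes 240 W is the worst-case draw and uses nominal 120 V (North American) and 230 V (European) outlets; the 15 A circuit it is compared against is a common North American rating, assumed here rather than taken from any spec sheet.

```latex
I = \frac{P}{V}
\qquad
I_{120\,\mathrm{V}} = \frac{240\,\mathrm{W}}{120\,\mathrm{V}} = 2\,\mathrm{A}
\qquad
I_{230\,\mathrm{V}} = \frac{240\,\mathrm{W}}{230\,\mathrm{V}} \approx 1.04\,\mathrm{A}
```

Even at the lower voltage, that is about 2 A of an assumed 15 A circuit, well within what an ordinary office outlet supplies.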
That practicality is part of the purchase decision. The machine sits in the room where the work happens, and nothing about the building has to change to accommodate it.
What It Cannot Run
An honest spec sheet ends with a limit: the very largest frontier models still exceed any desktop box. Those trillion-parameter-class systems run in datacenters and are typically rented by the token, not owned.
That is a scoping question, not a disqualifier. Most office work runs well on models that fit in 128 GB, and knowing which tasks genuinely need a frontier model is how you size the purchase rather than rule it out.