How FactoryOS Listens and Speaks

Voice is a first-class layer in FactoryOS, not a feature attached to one part of the product. Speech-to-text and text-to-speech both run on the system itself, available from the personal assistant chat, from knowledge chat, and from any workflow that wants to use them. Voice has its own settings tab and its own engines underneath.

Voice Is Not a Bolt-On

Most AI products treat voice as a wrapper over a cloud transcription service. FactoryOS treats voice as an in-house capability. Whatever you say to the system never leaves it, and whatever the system says back was generated on it.

That distinction matters more in some industries than others. For a law firm dictating client notes, a clinic recording patient encounters, or a finance team talking through M&A drafts, voice is the channel most likely to carry the most sensitive material, and so the one where a leak would do the most damage. Keeping the whole path local closes that door.

Speech In and Speech Out

Speech-to-text runs on Whisper, the well-known open-source transcription model, available in several local builds the system can pick from depending on what's installed and how fast it needs to be. Text-to-speech runs on Supertonic (a neural engine with its own catalog of voices) and Piper (a second neural engine that uses downloadable voice model files), with lightweight fallbacks like espeak available for low-resource setups.

The voice layer isn't tied to any one of these. It detects what's installed, picks the best available, and lists the rest as options. New STT and TTS engines can be added as better local models become available, and the system swaps to them once configured. The piece on top — how you talk to the assistant and how it talks back — stays the same regardless of which engine is doing the work.
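The detect, pick, and list behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FactoryOS's actual code: the `Engine` record, the `pick_engine` function, the numeric priorities, and the install probes are all hypothetical names and values invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Engine:
    name: str                         # label shown in the picker
    kind: str                         # "stt" or "tts"
    priority: int                     # hypothetical ranking; lower is preferred
    is_installed: Callable[[], bool]  # probe supplied by the caller


def pick_engine(
    engines: list[Engine], kind: str
) -> tuple[Optional[Engine], list[Engine]]:
    """Return (best installed engine, other installed engines) for one kind."""
    available = sorted(
        (e for e in engines if e.kind == kind and e.is_installed()),
        key=lambda e: e.priority,
    )
    if not available:
        return None, []
    return available[0], available[1:]


# Illustrative catalog only: the priorities and probe results are made up.
CATALOG = [
    Engine("supertonic", "tts", 0, lambda: False),  # pretend it is not installed
    Engine("piper", "tts", 1, lambda: True),
    Engine("espeak", "tts", 2, lambda: True),
    Engine("whisper", "stt", 0, lambda: True),
]

best_tts, other_tts = pick_engine(CATALOG, "tts")
best_stt, other_stt = pick_engine(CATALOG, "stt")
```

In this sketch, adding a new engine means adding one entry to the catalog. The code that asks for "the best TTS engine" does not change, which mirrors the idea that the layer on top stays the same regardless of which engine does the work.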

Two Ways It Listens

When voice is on, it runs in one of two modes:

- Push-to-talk. The microphone opens only while you hold a key, the way a radio works. Best for shared offices and meeting rooms.
- Open conversation. The microphone stays listening throughout, so you can speak freely without pressing anything. Best for private offices or working alone.

You pick the mode from the voice settings, and you can change it any time. Different surfaces can run different modes — your personal assistant might be open conversation at your desk while a focused chat in a shared space stays push-to-talk.

Open conversation is sometimes called "always listening," and that phrasing makes some people uneasy because it conjures a cloud service hoarding audio. The microphone is open locally and only locally — that audio never leaves the box, and the privacy section below covers exactly where every byte of it ends up.
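One way to picture the two modes, and the per-surface choice between them, is as a small piece of state plus a single rule for when the microphone is open. The sketch below is a guess at the shape, not the product's implementation: `ListenMode`, `mic_is_open`, and the surface names in `surface_modes` are assumptions made for illustration.

```python
from enum import Enum


class ListenMode(Enum):
    PUSH_TO_TALK = "push_to_talk"
    OPEN_CONVERSATION = "open_conversation"


def mic_is_open(mode: ListenMode, voice_enabled: bool, key_held: bool) -> bool:
    """Decide whether the microphone should be capturing right now."""
    if not voice_enabled:
        # Voice is opt-in: with the toggle off, the mic never opens.
        return False
    if mode is ListenMode.PUSH_TO_TALK:
        # Radio-style: open only while the key is held down.
        return key_held
    # Open conversation: the mic stays open for as long as voice is on.
    return True


# Hypothetical per-surface settings: each surface keeps its own mode.
surface_modes = {
    "personal_assistant": ListenMode.OPEN_CONVERSATION,
    "knowledge_chat": ListenMode.PUSH_TO_TALK,
}
```

Keeping the mode per surface, rather than global, is what lets the assistant at your desk stay open while a chat in a shared space waits for a key press.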

Voices You Pick or Train

The text-to-speech side comes with multiple voices, and the set is expandable. Each TTS engine ships with its own catalog — Supertonic has styles like M1 and F1; Piper draws from a library of downloadable voice models — and new voices can be added to either engine. The system catalogs them all in the same picker.

If you want to go further, FactoryOS can train a voice on a sample of your own speech. It takes about thirty minutes of recorded audio and a meaningful chunk of GPU time to do the training, but once it's done, the trained voice slots in like any other option. Some people enjoy the novelty of hearing their own voice come back from the assistant; others find it uncanny and stick to the stock voices. Either is fine.

Changing the voice doesn't require restarting anything. Pick a new one and the next utterance uses it.

Admins Set the Menu

Voice has its own permissions, separate from how individual users configure it. An admin can turn voice off entirely for everyone on the box, allow it for some roles and not others, or choose which STT and TTS engines are available system-wide. Those settings define the menu users get to see.

Inside that menu, individual users pick what they prefer — which engine, which voice, which mode — but only from the options the admin has enabled. A company comfortable with Whisper but wanting to standardize on Piper for output, for instance, can simply not turn Supertonic on; users still have voice, just with the engine the org chose for them.

That two-tier shape matches how the rest of FactoryOS works — defaults set on top, individual control below, both resolved through the same [permissions cascade](how-factoryos-decides-who-sees-what).
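The two-tier shape can be sketched as a function that takes the admin's menu and a user's preference and returns what the user actually gets. Everything named here (`AdminVoicePolicy`, `resolve_tts_engine`, the role names) is hypothetical, and the fallback to the first admin-enabled engine when a user's preference is not on the menu is an assumption of this sketch, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdminVoicePolicy:
    voice_enabled: bool = True                             # off means no voice for anyone
    allowed_roles: set[str] = field(default_factory=set)   # roles permitted to use voice
    tts_engines: list[str] = field(default_factory=list)   # engines the admin has enabled


def resolve_tts_engine(
    policy: AdminVoicePolicy, role: str, preferred: Optional[str]
) -> Optional[str]:
    """Return the TTS engine a user ends up with, or None if voice is unavailable."""
    if not policy.voice_enabled or role not in policy.allowed_roles:
        return None
    if not policy.tts_engines:
        return None
    if preferred in policy.tts_engines:
        # The user's choice wins, but only inside the admin's menu.
        return preferred
    # Assumed fallback: the first engine the admin enabled.
    return policy.tts_engines[0]


# The example from the text: the org standardizes on Piper for output.
policy = AdminVoicePolicy(
    voice_enabled=True,
    allowed_roles={"staff", "manager"},
    tts_engines=["piper"],
)

staff_engine = resolve_tts_engine(policy, "staff", "supertonic")
guest_engine = resolve_tts_engine(policy, "guest", "piper")
```

Here a staff member who prefers Supertonic still has voice, just through Piper, while a role the admin has not allowed gets no voice at all.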

Where Voice Shows Up

Voice is wired into multiple places. The personal assistant chat in the top bar is the most common surface — that's where the morning briefing might be read aloud, or where you might ask a question hands-free. The knowledge chat that's scoped to a project or channel has its own voice toggle. Workflows can include speech-to-text and text-to-speech as steps, letting an automated flow take audio in or deliver audio out without anyone writing code.

Each surface decides independently whether it accepts voice. Some default to text and let you flip voice on when it suits the moment; others might stay text-only depending on context.

Always Local, Always Optional

The full voice path runs on the box. Microphone audio is transcribed locally; generated speech is synthesized locally; nothing crosses a network boundary unless you've explicitly set up an integration that does. Open-conversation mode keeps the microphone open, but the audio it picks up has nowhere else to go — there is no upload, no remote transcription, no cloud index of what you said yesterday. The deeper privacy story — what touches disk, what's logged, what an auditor would see — is covered in [where your voice data actually goes](where-your-voice-data-actually-goes).

Voice is also opt-in. The capability sits dormant until you flip a toggle, and even when on, microphones aren't recording in the background — they're listening only inside the modes that explicitly require it. Someone who prefers typing can use FactoryOS for years and never trigger the voice layer, and someone who likes talking can use voice everywhere it's wired in. The choice belongs to the person at the keyboard — inside whatever menu the admin has set.
