Does Your AI Vendor Train on Your Data

Whether your AI vendor trains on your data depends almost entirely on which door you walked in through. AI vendor training data policies split sharply between consumer tiers and business contracts, and most small firms are standing on the consumer side without realizing it.

The honest answer is more nuanced than either the marketing or the alarm suggests. Here is what actually happens to the text your team pastes into cloud AI.

Consumer Tiers Train by Default

OpenAI may use conversations from consumer ChatGPT to train its models unless you opt out in settings. On the individual tier, your content is training material by default.[1] That is the tier your staff reaches for when nobody has issued them anything better.

Anthropic draws the line differently, tying consumer training to a model-improvement setting, while conversations flagged for safety review can be used to improve its safeguards either way.[2] Every consumer product handles this a little differently, which is exactly the problem: your exposure depends on a toggle your employees have probably never seen.

Enterprise Terms Are Genuinely Better

Enterprise and API terms are genuinely better, and it would be unfair to pretend otherwise. OpenAI does not train on inputs or outputs from its business products by default, and Anthropic makes the same commitment for its commercial offerings.[1, 2]

If your team uses cloud AI at all, a business contract is the floor, not a luxury. The real question is what a no-training promise does and does not cover.

Training Is Not the Only Exposure

A no-training clause does not mean your data never sits on the vendor's systems. Deleted chats and API logs are typically retained for up to 30 days, content flagged by safety systems can be pulled for human review, and every deletion schedule carries the same exception: unless legally required to retain it.[2, 3]

Your prompt can be trained on nothing and still be stored, still be readable, and still be discoverable. For a compliance officer, training, storage, human access, and discoverability are four separate questions, and the training toggle answers only the first -- the same split governs where your voice recordings end up.
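One way to keep those questions apart is to record each as its own field instead of a single "safe or not" flag. The sketch below is illustrative only: the class name, field names, and example values are hypothetical, with the 30-day figure borrowed from the typical retention window mentioned above, and your own contract may differ on every line.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VendorExposure:
    """Hypothetical record of what a vendor's terms allow for one tier."""
    trains_on_content: bool       # can prompts be used as training data?
    retention_days: int           # how long deleted content may be kept
    human_review_possible: bool   # can flagged content be read by staff?
    legal_hold_exception: bool    # can a legal duty override deletion?


def open_questions(exposure: VendorExposure) -> List[str]:
    """List the exposures that remain open under the given terms."""
    issues = []
    if exposure.trains_on_content:
        issues.append("training")
    if exposure.retention_days > 0:
        issues.append("storage")
    if exposure.human_review_possible:
        issues.append("human review")
    if exposure.legal_hold_exception:
        issues.append("discovery")
    return issues


# Illustrative business-tier values, not taken from any real contract.
business_tier = VendorExposure(
    trains_on_content=False,
    retention_days=30,
    human_review_possible=True,
    legal_hold_exception=True,
)
print(open_questions(business_tier))
# ['storage', 'human review', 'discovery']
```

Turning training off clears one entry from the list and leaves the rest, which is the gap a no-training clause does not close.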

Courts Can Override Deletion Promises

That legal exception stopped being theoretical in May 2025, when a federal magistrate judge in The New York Times' copyright suit ordered OpenAI to preserve consumer ChatGPT and API output logs that would otherwise have been deleted -- including chats users had erased expecting them gone within 30 days.[3]

The district court affirmed the order that June; the going-forward obligation ended in September, but the data already preserved stayed preserved, and in November the court ordered 20 million de-identified consumer chats produced to the plaintiffs.[4] Enterprise and zero-data-retention customers were excluded, which proves the point both ways: the contract tier mattered, and the vendor's own deletion policy did not.

You Inherit Every Terms Revision

The policy you evaluated at signing is not the policy you will be under next year. Cloud AI terms get revised, settings get renamed and reshuffled, and each revision applies to your data going forward whether or not anyone read the notice email.

That standing review burden is part of what renting your AI actually costs. Someone in your organization now owns the job of re-reading a vendor's data terms forever.
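Part of that review can be mechanized. As a minimal sketch, assuming you already keep a saved copy of the vendor's data terms as text, a fingerprint comparison can at least tell the owner when a re-read is due. The function names and the sample sentences are hypothetical, and a changed hash only says that the wording moved, not what the change means.

```python
import hashlib


def terms_fingerprint(terms_text: str) -> str:
    """Hash the terms after collapsing whitespace, so a reflowed page
    with identical wording does not raise a false alarm."""
    normalized = " ".join(terms_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def terms_changed(previous_fingerprint: str, current_text: str) -> bool:
    """Return True when the current terms differ from the reviewed copy."""
    return terms_fingerprint(current_text) != previous_fingerprint


# Made-up sample text standing in for the copy reviewed at signing.
baseline = terms_fingerprint("We do not train on business data by default.")

# Same wording, different line breaks: no review needed.
print(terms_changed(baseline, "We do not train on  business data\nby default."))

# Different wording: someone needs to read the new terms.
print(terms_changed(baseline, "We may train on business data unless you opt out."))
```

The first call prints False and the second prints True. Storing the fingerprint alongside the date of the last human review gives the audit trail a concrete entry for each revision.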

Architecture Beats Vendor Promises

For regulated data, the cleanest answer is architectural: data that never leaves the building needs no vendor promise at all. It is the same logic that makes HIPAA compliance an architecture problem rather than a paperwork one, and the reason privilege-sensitive work sits so uneasily with third-party processing.

An on-premise system like FactoryOS removes the vendor from the data path entirely -- there is no training policy to audit because there is no third party in the path. No retention window, no flagged-content review, no preservation order can reach data that never crossed your walls.

Match the Tier to the Stakes

The fair conclusion is proportionality, not panic. For low-stakes drafting, an enterprise cloud contract with training off is a defensible position; for patient records, client files, and anything a regulator can ask about, a promise is weaker than a boundary.
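That rule of thumb can be written down as a small lookup. The sketch below is a simplification under stated assumptions: the tier and stakes names are hypothetical labels, and treating regulated data as on-premise only is this article's position encoded as a hard rule, not a legal standard.

```python
from enum import Enum


class Tier(Enum):
    CONSUMER = "consumer"
    BUSINESS = "business"        # enterprise or API contract, training off
    ON_PREMISE = "on_premise"    # data never leaves the building


class Stakes(Enum):
    LOW = "low"                  # e.g. low-stakes drafting
    REGULATED = "regulated"      # e.g. patient records, client files


# Higher rank means a stronger boundary around the data.
_RANK = {Tier.CONSUMER: 0, Tier.BUSINESS: 1, Tier.ON_PREMISE: 2}


def minimum_tier(stakes: Stakes) -> Tier:
    """Weakest tier this article would call defensible for the stakes."""
    if stakes is Stakes.REGULATED:
        return Tier.ON_PREMISE
    return Tier.BUSINESS


def is_defensible(tier: Tier, stakes: Stakes) -> bool:
    """True when the tier in use is at least the minimum for the stakes."""
    return _RANK[tier] >= _RANK[minimum_tier(stakes)]


print(is_defensible(Tier.BUSINESS, Stakes.LOW))        # True
print(is_defensible(Tier.CONSUMER, Stakes.LOW))        # False
print(is_defensible(Tier.BUSINESS, Stakes.REGULATED))  # False
```

Running a check like this against a list of which tools each team actually uses turns the tier question into an inventory instead of a guess.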

So read your vendor's data terms this week and note which tier your team actually uses. Then ask one question: if those terms changed tomorrow, would you find out before your data did?
