Does Your AI Vendor Train on Your Data

Whether your AI vendor trains on your data depends almost entirely on which door you walked in through. Vendor training policies split sharply between consumer tiers and business contracts, and most small firms are standing on the consumer side without realizing it.

The honest answer is more nuanced than either the marketing or the alarm suggests. Here is what actually happens to the text your team pastes into cloud AI.

Consumer Tiers Train by Default

OpenAI may use conversations from consumer ChatGPT to train its models unless you opt out in settings -- on the individual tier, your content is training material by default.¹ That is the tier your staff reaches for when nobody has issued them anything better.

Anthropic draws the line differently, tying consumer training to a model-improvement setting, while conversations flagged for safety review can be used to improve its safeguards either way.² Every consumer product handles this a little differently, which is exactly the problem: your exposure depends on a toggle your employees have probably never seen.

Enterprise Terms Are Genuinely Better

Enterprise and API terms are genuinely better, and it would be unfair to pretend otherwise. OpenAI does not train on inputs or outputs from its business products by default, and Anthropic makes the same commitment for its commercial offerings.¹,²

If your team uses cloud AI at all, a business contract is the floor, not a luxury. The real question is what a no-training promise does and does not cover.
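To make the default-versus-opt-out distinction concrete, here is a minimal Python sketch of the decision the consumer and enterprise sections describe. It is a simplified model, not either vendor's actual logic: the `Tier` labels, the `opted_out` flag, and the `safety_flagged` exception are hypothetical stand-ins for the settings the policies mention, and real terms vary by product and change over time.

```python
from enum import Enum


class Tier(Enum):
    # Hypothetical labels; real products name and split their tiers differently.
    CONSUMER = "consumer"
    BUSINESS = "business"  # enterprise, team, or API contracts


def may_use_for_improvement(tier: Tier, opted_out: bool, safety_flagged: bool = False) -> bool:
    """Simplified, illustrative model of whether content may be used to
    improve a vendor's models or safeguards.

    Assumptions (not any vendor's actual rules):
    - Business tiers are excluded by default.
    - Consumer tiers are included by default unless the user opts out.
    - Safety-flagged consumer content may be used to improve safeguards
      regardless of the setting.
    """
    if tier is Tier.BUSINESS:
        return False
    if safety_flagged:
        return True
    return not opted_out


print(may_use_for_improvement(Tier.CONSUMER, opted_out=False))  # True: the default
print(may_use_for_improvement(Tier.CONSUMER, opted_out=True))   # False
print(may_use_for_improvement(Tier.CONSUMER, opted_out=True, safety_flagged=True))  # True
print(may_use_for_improvement(Tier.BUSINESS, opted_out=False))  # False
```

Note where the risk sits: the first call returns True without anyone doing anything, which describes the employee who never opened the settings page.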

Training Is Not the Only Exposure

A no-training clause does not mean your data never sits on the vendor's systems. Deleted chats and API logs are typically retained for up to 30 days, content flagged by safety systems can be pulled for human review, and every deletion schedule carries the same exception: unless legally required to retain it.²,³

Your prompt can be excluded from training and still be stored, still be readable, and still be discoverable. For a compliance officer those are three separate questions, and the training toggle answers only the first. The same split applies to voice recordings, not just typed prompts.
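The retention exception is easier to see as date arithmetic. The sketch below assumes a flat 30-day deletion window, taken from the "up to 30 days" figure above, and models a legal hold as something that suspends deletion entirely; `scheduled_deletion` and `RETENTION_DAYS` are hypothetical names, and actual vendor schedules differ by product.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed window, based on the "up to 30 days" figure in the text.
RETENTION_DAYS = 30


def scheduled_deletion(deleted_on: date, legal_hold: bool) -> Optional[date]:
    """Return the date a deleted item should be purged, or None if held.

    A legal hold overrides the schedule: the item is kept until the hold
    is lifted, so there is no purge date to report.
    """
    if legal_hold:
        return None
    return deleted_on + timedelta(days=RETENTION_DAYS)


print(scheduled_deletion(date(2025, 3, 1), legal_hold=False))  # 2025-03-31
print(scheduled_deletion(date(2025, 3, 1), legal_hold=True))   # None
```

The second call is the uncomfortable case: the user pressed delete, but because a hold applies, no purge date exists at all, and the user has no way to see that from their side.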

Courts Can Override Deletion Promises

That legal exception stopped being theoretical in May 2025, when a federal magistrate judge in The New York Times' copyright suit ordered OpenAI to preserve consumer ChatGPT and API output logs that would otherwise have been deleted -- including chats users had erased expecting them gone within 30 days.³

The district court affirmed the order that June; the going-forward obligation ended in September, but the data already preserved stayed preserved, and in November the court ordered 20 million de-identified consumer chats produced to the plaintiffs.⁴ Enterprise, Edu, and zero-data-retention customers were excluded, which cuts both ways: the contract tier mattered, and the vendor's own deletion policy did not.

You Inherit Every Terms Revision

The policy you evaluated at signing is not the policy you will be under next year. Cloud AI terms get revised, settings get renamed and reshuffled, and each revision applies to your data going forward whether or not anyone read the notice email.

That standing review burden is part of what renting your AI actually costs. Someone in your organization now owns the job of re-reading a vendor's data terms forever.
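One low-effort way to make that review burden visible is to fingerprint the terms text you reviewed and compare it against the current version on a schedule. The sketch below assumes you already have the policy text as a string (fetching and parsing the vendor's page is left out); `fingerprint` and `terms_changed` are hypothetical helpers, and a changed hash only tells you that something changed, not whether it matters.

```python
import hashlib


def fingerprint(terms_text: str) -> str:
    """SHA-256 of the terms text, with whitespace normalized so that
    reflowed lines or extra blank lines do not register as a revision."""
    normalized = " ".join(terms_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def terms_changed(reviewed_hash: str, current_text: str) -> bool:
    """True if the current terms differ from the version you reviewed."""
    return fingerprint(current_text) != reviewed_hash


# Hypothetical policy sentences, for illustration only.
reviewed = fingerprint("We do not train on business data by default.")
print(terms_changed(reviewed, "We do not train on business data  by default.\n"))  # False
print(terms_changed(reviewed, "We may train on business data."))  # True
```

Whitespace normalization is a deliberate trade-off: it suppresses noise from formatting changes at the cost of ignoring spacing-only edits, which should never change meaning. It also cannot catch a setting renamed inside the product itself; that still needs a person to look.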

Architecture Beats Vendor Promises

For regulated data, the cleanest answer is architectural: data that never leaves the building needs no vendor promise at all. It is the same logic that makes HIPAA compliance an architecture problem rather than a paperwork one, and the reason privilege-sensitive work sits so uneasily with third-party processing.

An on-premise system like FactoryOS removes the vendor from the data path entirely: there is no training policy to audit because no third party ever handles the data. No retention window, no flagged-content review, no preservation order can reach data that never crossed your walls.

Match the Tier to the Stakes

The fair conclusion is proportionality, not panic. For low-stakes drafting, an enterprise cloud contract with training off is a defensible position; for patient records, client files, and anything a regulator can ask about, a promise is weaker than a boundary.
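The proportionality rule can be written down as an explicit policy table, which makes it something you can audit rather than a habit. The sketch below is illustrative only: the `Sensitivity` categories and `Deployment` labels are hypothetical names mirroring the examples in this article, not a regulatory classification, and where each category belongs is a decision for your own compliance review.

```python
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical categories mirroring the examples in the text.
    LOW_STAKES = 1  # e.g. routine drafting
    REGULATED = 2   # e.g. patient records, client files


class Deployment(Enum):
    CONSUMER_CLOUD = "consumer cloud"
    ENTERPRISE_CLOUD = "enterprise cloud, training off"
    ON_PREMISE = "on-premise"


# Ordered from most to least third-party exposure.
EXPOSURE_ORDER = [Deployment.CONSUMER_CLOUD, Deployment.ENTERPRISE_CLOUD, Deployment.ON_PREMISE]

# Minimum containment required per category (an assumption reflecting the
# proportionality argument, not a legal standard).
MINIMUM_DEPLOYMENT = {
    Sensitivity.LOW_STAKES: Deployment.ENTERPRISE_CLOUD,
    Sensitivity.REGULATED: Deployment.ON_PREMISE,
}


def is_allowed(data: Sensitivity, deployment: Deployment) -> bool:
    """A deployment is allowed if it is at least as contained as the minimum."""
    required = MINIMUM_DEPLOYMENT[data]
    return EXPOSURE_ORDER.index(deployment) >= EXPOSURE_ORDER.index(required)


print(is_allowed(Sensitivity.LOW_STAKES, Deployment.ENTERPRISE_CLOUD))  # True
print(is_allowed(Sensitivity.REGULATED, Deployment.ENTERPRISE_CLOUD))   # False
print(is_allowed(Sensitivity.LOW_STAKES, Deployment.CONSUMER_CLOUD))    # False
```

Setting the low-stakes minimum to the enterprise tier encodes the earlier point that a business contract is the floor: even routine drafting does not go through a consumer account.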

So read your vendor's data terms this week and note which tier your team actually uses. Then ask one question: if those terms changed tomorrow, would you find out before the new terms applied to your data?

Sources

1. OpenAI, "How your data is used to improve model performance" -- consumer ChatGPT content may be used for training unless you opt out; business products (ChatGPT Team, Enterprise, API) are not trained on by default. https://openai.com/policies/how-your-data-is-used-to-improve-model-performance/
2. Anthropic, "Is my data used for model training?" -- consumer training tied to the Model Improvement setting; safety-flagged conversations may be used to improve safeguards; commercial products handled under separate terms. https://privacy.claude.com/en/articles/10023580-is-my-data-used-for-model-training
3. OpenAI, "How we're responding to The New York Times' data demands in order to protect user privacy" -- preservation order scope, 30-day deletion norms, legal-hold storage, Enterprise/Edu/ZDR exclusions. https://openai.com/index/response-to-nyt-data-demands/
4. Terms.Law, "OpenAI v. New York Times stopped being just a copyright case the moment the court turned to your ChatGPT logs," November 12, 2025 -- order timeline: issued May 13, 2025, going-forward obligation ended September 26, 2025, production of 20 million de-identified chats ordered November 2025. https://www.terms.law/2025/11/12/openai-v-new-york-times-stopped-being-just-a-copyright-case-the-moment-the-court-turned-to-your-chatgpt-logs/