Does Your AI Vendor Train on Your Data?
Whether your AI vendor trains on your data depends almost entirely on which door you walked in through. Vendor training-data policies split sharply between consumer tiers and business contracts, and most small firms are standing on the consumer side without realizing it.
The honest answer is more nuanced than either the marketing or the alarm suggests. Here is what actually happens to the text your team pastes into cloud AI.
Consumer Tiers Train by Default
OpenAI may use conversations from consumer ChatGPT to train its models unless you opt out in settings -- on the individual tier, your content is training material by default.1 That is the tier your staff reaches for when nobody has issued them anything better.
Anthropic draws the line differently, tying consumer training to a model-improvement setting, while conversations flagged for safety review can be used to improve its safeguards either way.2 Every consumer product handles this a little differently, which is exactly the problem: your exposure depends on a toggle your employees have probably never seen.
Enterprise Terms Are Genuinely Better
Enterprise and API terms are genuinely better, and it would be unfair to pretend otherwise. OpenAI does not train on inputs or outputs from its business products by default, and Anthropic makes the same commitment for its commercial offerings.1, 2
If your team uses cloud AI at all, a business contract is the floor, not a luxury. The real question is what a no-training promise does and does not cover.
Training Is Not the Only Exposure
A no-training clause does not mean your data never sits on the vendor's systems. Deleted chats and API logs are typically retained for up to 30 days, content flagged by safety systems can be pulled for human review, and every deletion schedule carries the same exception: unless legally required to retain it.2, 3
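The deletion arithmetic is simple right up until the exception applies. Here is a minimal sketch, assuming a flat 30-day window and a single yes/no legal-hold flag. Both are simplifications, since real windows vary by product and change over time, and `latest_purge_date` is a hypothetical helper, not any vendor's API.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: a flat 30-day retention window for deleted content.
# Real vendors differ by product and revise these numbers.
RETENTION_DAYS = 30


def latest_purge_date(deleted_on: date, legal_hold: bool = False,
                      retention_days: int = RETENTION_DAYS) -> Optional[date]:
    """Latest date a deleted record should be gone from vendor systems.

    Returns None under a legal hold: the "unless legally required"
    exception leaves no deadline the customer can rely on.
    """
    if legal_hold:
        return None
    return deleted_on + timedelta(days=retention_days)


print(latest_purge_date(date(2025, 5, 1)))                   # 2025-05-31
print(latest_purge_date(date(2025, 5, 1), legal_hold=True))  # None
```

The `None` result is the point. Once a hold applies, the schedule you were promised no longer produces a date at all.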
Your prompt can be trained on nothing and still be stored, still be readable, and still be discoverable. For a compliance officer, training, storage, access, and discovery are four separate questions, and the training toggle answers only the first. The same split governs where your voice recordings end up.
Courts Can Override Deletion Promises
That legal exception stopped being theoretical in May 2025, when a federal magistrate judge in The New York Times' copyright suit ordered OpenAI to preserve consumer ChatGPT and API output logs that would otherwise have been deleted -- including chats users had erased expecting them gone within 30 days.3
The district judge affirmed the order that June. The going-forward obligation ended in September, but the data already preserved stayed preserved, and in November the court ordered 20 million de-identified consumer chats produced to the plaintiffs.4 Enterprise and zero-data-retention customers were excluded from the preservation order, which proves the point both ways: the contract tier mattered, and the vendor's own deletion policy did not.
You Inherit Every Terms Revision
The policy you evaluated at signing is not the policy you will be under next year. Cloud AI terms get revised, settings get renamed and reshuffled, and each revision applies to your data going forward whether or not anyone read the notice email.
That standing review burden is part of what renting your AI actually costs. Someone in your organization now owns the job of re-reading a vendor's data terms forever.
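Part of that burden can be automated, though not removed. Here is a minimal sketch, assuming someone saves the text of the vendor's data terms on a schedule. Fetching and parsing the real page is left out, and every function name below is hypothetical. The idea is to fingerprint the version you reviewed and flag any later copy that differs.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line-wrapping changes don't count."""
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the normalized terms text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def terms_changed(saved_fingerprint: str, current_text: str) -> bool:
    """True if the current terms differ from the version last reviewed."""
    return fingerprint(current_text) != saved_fingerprint


reviewed = "We do not train on business data by default."
baseline = fingerprint(reviewed)

print(terms_changed(baseline, "We do not train on  business data\nby default."))  # False
print(terms_changed(baseline, "We may train on business data by default."))      # True
```

A changed fingerprint only says that something moved. A person still has to read the new text, so this shortens the re-reading job without eliminating it.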
Architecture Beats Vendor Promises
For regulated data, the cleanest answer is architectural: data that never leaves the building needs no vendor promise at all. It is the same logic that makes HIPAA compliance an architecture problem rather than a paperwork one, and the reason privilege-sensitive work sits so uneasily with third-party processing.
An on-premises system like FactoryOS removes the vendor from the data path entirely: with no third party processing your data, there is no training policy to audit. No vendor retention window, no flagged-content review, and no preservation order in someone else's lawsuit can reach data that never crossed your walls.
Match the Tier to the Stakes
The fair conclusion is proportionality, not panic. For low-stakes drafting, an enterprise cloud contract with training off is a defensible position; for patient records, client files, and anything a regulator can ask about, a promise is weaker than a boundary.
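The proportionality rule is simple enough to write down as a policy check. Here is a minimal sketch that assumes just two sensitivity classes and three deployment tiers. The class and function names are hypothetical, and a real data-classification scheme will have more levels.

```python
from enum import Enum


class Stakes(Enum):
    LOW = "low-stakes drafting"
    REGULATED = "patient records, client files, regulator-visible data"


class Deployment(Enum):
    CONSUMER_CLOUD = "consumer cloud tier"
    ENTERPRISE_CLOUD = "enterprise cloud contract, training off"
    ON_PREMISES = "on-premises, no third party in the data path"


# Tiers ordered from weakest to strongest data boundary.
TIER_ORDER = [Deployment.CONSUMER_CLOUD, Deployment.ENTERPRISE_CLOUD,
              Deployment.ON_PREMISES]


def minimum_deployment(stakes: Stakes) -> Deployment:
    """Lowest tier the proportionality rule allows for this class of data."""
    # A business contract is the floor for any use, so the consumer
    # tier is never returned as a minimum.
    if stakes is Stakes.LOW:
        return Deployment.ENTERPRISE_CLOUD
    return Deployment.ON_PREMISES


def allowed(stakes: Stakes, used: Deployment) -> bool:
    """True if the tier actually in use meets the floor for this data."""
    return TIER_ORDER.index(used) >= TIER_ORDER.index(minimum_deployment(stakes))


print(allowed(Stakes.LOW, Deployment.ENTERPRISE_CLOUD))        # True
print(allowed(Stakes.LOW, Deployment.CONSUMER_CLOUD))          # False
print(allowed(Stakes.REGULATED, Deployment.ENTERPRISE_CLOUD))  # False
```

The useful input is the `used` argument: the tier your team actually uses, not the one you meant to buy.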
So read your vendor's data terms this week and note which tier your team actually uses. Then ask one question: if those terms changed tomorrow, would you find out before your data did?