You Installed Claude. Now the Hard Work Starts

If you only have a minute, here is what you need to know.

The contract with Anthropic, OpenAI, or Google is the easiest part of enterprise AI deployment. Everything that matters comes after, and most IT organizations are not ready for it.
Shadow AI is already running inside your organization. The fix is not a ban. It is deploying a sanctioned alternative with proper controls before you enforce the policy.
Your data classification policy does not answer "can this data go to a cloud AI API?" You need to add that dimension explicitly, including a per-vendor DPA registry that tells every team which providers are cleared for which data tiers.
An API key in a .env file is not access control. The fix is an AI gateway layer (LiteLLM, Portkey, or Azure API Management) that enforces SSO, logs every interaction, and routes spend to the right cost center.
AI spend does not behave like cloud compute. The fix is per-team token budgets, model routing policies, and hard guardrails on any agentic workflow that can loop without a human checkpoint.
Every control described here is deployable in weeks, not quarters. The question is whether you do it before or after the incident.

In early 2023, a large global electronics manufacturer lifted an internal ban on AI tools after engineers pushed back. Nineteen days later, three separate data leaks had been reported internally. Source code. Proprietary testing procedures. The transcript of a confidential meeting, recorded and submitted to the AI to generate notes.

None of those incidents involved a compromised system or a bad actor. They involved employees using a tool that made their work easier, the way employees always have.

The enterprise AI license is the easy part. Procurement signs the agreement. IT provisions the keys. Leadership sends the announcement. What that process almost never covers: who on your network is already using AI outside your agreement, what data is moving to models you did not select, whether your compliance posture extends to cloud AI APIs, and what your finance team will say when the token bill arrives.

This is not an argument against deploying AI. It is an argument for understanding what "deployed" actually means inside a real organization, before the problems arrive rather than after.

Shadow AI is already in your building. Here is how to address it.

That sequence is instructive because it is documented in detail across Bloomberg, TechCrunch, and CNBC, and cataloged in the AI Incident Database. Most organizations are living the same pattern with less clarity about what is happening.

Netskope, LayerX, and Harmonic Security have each independently found significant portions of enterprise AI usage flowing through personal, unmanaged accounts. Harmonic analyzed more than 22 million enterprise AI prompts in 2025. The data categories appearing in those prompts: source code, intellectual property, credentials, regulated data. None of it covered by enterprise vendor agreements. No data processing agreements, no data residency controls, no mechanism to delete what has been sent.

72% of security professionals are neutral or not confident in their organization's ability to secure AI systems. Only 26% have comprehensive AI security governance in place.

— CSA / Google Cloud, State of AI Security and Governance, December 2025 (n=300)

Your existing DLP controls do not catch this because they were not built for it. Traditional data loss prevention intercepts outbound files and flagged email attachments. It was not designed to catch an employee who opens a browser tab, signs into a personal Google account, and pastes a contract draft into a chat interface. The action looks like ordinary web browsing. The loss is real.

What to actually do

The worst response to shadow AI is a ban with no alternative. Bans do not stop the behavior. They push it off your network and out of your visibility. The sequence that works: sanction a managed option first, then enforce the policy.

Give employees a sanctioned tool before you restrict the unsanctioned ones. Anthropic's Claude for Enterprise, OpenAI's ChatGPT Enterprise, and Google's Gemini for Workspace all provide tenant-isolated deployments with SSO, audit logs, and data contractually excluded from training. These are not premium upsells. They are what makes the tool safe to use at work. Most employees will use the approved option if one exists.

Then add network-level visibility. Microsoft Defender for Cloud Apps, Netskope, and Zscaler all have AI application categories that let you detect, classify, and policy-gate AI traffic. Start in monitor mode. Understand the actual usage pattern before you tighten controls. For organizations where SSL inspection is not feasible, Island Enterprise Browser and LayerX both enforce controls at the browser layer without needing to intercept TLS.

Update your DLP rules for AI-specific vectors. Microsoft Purview can detect sensitive content being pasted or uploaded to third-party AI services, including ChatGPT and Claude, via endpoint DLP with the Purview browser extension. The goal is detection first, enforcement second, with an approved alternative already in place.

The policy conversation changes completely when employees have a sanctioned option. "Use the company-approved Claude account" is a reasonable ask. "Do not use AI at all" is a ban that will fail quietly.

Your data classification policy predates the model. Here is how to extend it.

Every enterprise has a data classification framework: Public, Internal, Confidential, Restricted. It was built for a world where data moved through email and file shares, and access was controlled by who could authenticate to those systems.

That framework does not answer the question every employee is asking when they open an AI interface: "Can I put this here?"

The question your policy needs to answer is not who can access this data. It is where this data is permitted to go.

These are different questions because cloud AI APIs have a different risk profile than the systems your policy was written for. The things that matter now: which provider is receiving the data, whether they have signed a DPA and what it actually covers, where inference happens, and whether your inputs are excluded from training. A policy built for file-share access does not have answers for any of those.

IBM's 2025 Cost of a Data Breach report (Ponemon Institute, 600 breached organizations) found that 63% had no AI governance policy in place or were still developing one at the time of their breach. Not companies that ignored AI. Companies that moved faster than their governance.

What to actually do

Add an AI column to your classification matrix. For each existing data tier, define three explicit answers: consumer AI chat, enterprise AI chat with a DPA, and API or agentic integrations with a DPA.

Data classification matrix showing which data tier is permitted in each AI context: consumer chat, enterprise chat with a DPA, and API or agentic integrations

This does not have to be perfect on day one. The value is replacing individual judgment calls with an actual policy.

Build a DPA inventory. For every AI provider your organization uses, track: Is a DPA signed? What data residency does it cover? Is training excluded? What is the retention period? The practical answer for the major providers: Anthropic Enterprise, OpenAI Enterprise, Google Gemini for Workspace, and Azure OpenAI all provide DPAs and training exclusion. Consumer tiers of the same products typically do not. That single distinction resolves most of the classification question.

For HIPAA, FedRAMP, or financial services data, tenant-isolated inference is not optional. Azure OpenAI processes within your Azure tenant and is covered under Microsoft's HIPAA BAA. Anthropic signs a BAA for qualifying Enterprise and API customers. If your regulatory requirement is strict enough, locally-hosted open-weight models via Azure AI or AWS Bedrock may be the only path. Know which category your regulated data falls into before a use case makes that decision for you.

Identity without audit trails is theater. Here is how to build the instrumentation.

The dominant enterprise AI deployment pattern: a team gets an API key, drops it in a .env file, and starts building. Requests go out, responses come back, work gets done.

Who made each request? What data did they send? What did the model return? What use case was that key provisioned for? Could you produce that information for a compliance audit tomorrow?

Most organizations cannot.

97% of organizations that suffered AI-related breaches lacked proper AI access controls.

Among those with formal AI policies, only 34% conducted regular audits for unsanctioned AI usage. Only 22% performed adversarial testing of their AI systems.

— IBM 2025 Cost of a Data Breach Report, Ponemon Institute (600 breached organizations)

What to actually do

Put an AI gateway between your teams and the model APIs. Instead of every team holding keys directly to Anthropic, OpenAI, and Google, route all model traffic through a central gateway that handles authentication, logging, rate limiting, and cost attribution.

Three options worth knowing:

LiteLLM Proxy is open-source and self-hosted. The core proxy is free. Full RBAC, SSO integration, and audit logging are enterprise-tier features, sold separately. Most flexible for organizations that want to own their infrastructure and are not paying for a SaaS layer.

Portkey offers a managed cloud tier and a fully open-source self-hosted option (Apache 2.0 as of March 2026). Acquired by Palo Alto Networks in April 2026, which strengthens its enterprise security credentials. Observability, RBAC, SOC 2, and HIPAA coverage are on the managed tier.

Azure API Management works well as an LLM gateway for organizations already on Azure. Entra ID handles authentication natively, and logs go directly into Azure Monitor and Microsoft Sentinel.

Any of these means the control surface shifts from a shared API key to an authenticated user session tied to your identity provider. That is the change that makes auditing possible.

For AI tools employees use directly, use the enterprise tiers with SSO. Claude Enterprise supports SAML and SCIM. ChatGPT Enterprise supports SAML. Gemini for Workspace inherits Google Workspace identity. Every chat session gets attributed to a real user in your directory, and access is governed by the same lifecycle policies as your other enterprise software.

Define your logging minimum. At a minimum: user identity, timestamp, model, token counts, use-case tag, and whether the response was accepted or regenerated. For regulated environments, add prompt content hashing or full capture depending on your retention policy. Those logs need to land in your SIEM, not in a vendor dashboard you might lose access to.

Assign a service account per application. One key for the HR summarization tool, a separate one for the code review tool. A compromised key scoped to a known use case is manageable. A compromised key with access to the organization's entire model budget is not. Rotate keys automatically through Azure Key Vault, AWS Secrets Manager, or HashiCorp Vault.

The difference between an AI deployment and an auditable one is a gateway and a log pipeline. Both are deployable in a week without changing how your teams build.

The token bill that blindsided finance. Here is how to govern it.

Cloud compute has a cost model your FinOps team understands. Provision a VM and you pay for uptime. Over-provision and you waste money proportionally. Visible, bounded, predictable.

AI API spend does not work that way.

An LLM API charges per token, not per hour. Input and output tokens have different rates. Different models differ by an order of magnitude on the same task. An agent that calls a tool, encounters an error, retries, calls a second tool, and loops through its reasoning before producing output may consume twenty times the tokens of a direct query. A workflow processing documents with large contexts can produce a bill that looks like nothing in your existing cloud cost model.

I covered the mechanics in my earlier piece on tokenomics. The pattern is consistent: a team deploys AI to a hundred developers or knowledge workers, nobody establishes cost awareness, and three months later finance flags a line item that has gone from $5,000 a month to $85,000. Nobody can explain it because nobody was watching.

A hard spend cap is the wrong response. A cap records what you spent. It says nothing about what you got. It also pushes developers to personal accounts to keep working, which recreates the shadow AI problem you were trying to solve. The right response is visibility first, then governance.

What to actually do

Start with tagging. Every API call should carry a cost center tag, a team tag, and a use-case tag. Azure OpenAI and AWS Bedrock both support cost allocation tags that surface in cloud billing dashboards. Run your traffic through LiteLLM and that metadata flows through to whatever observability system you connect it to. You cannot govern what you cannot see.

Implement a model routing policy. A classification task, a structured extraction, or a summarization of a short document does not need a frontier model. Route those to Claude Haiku, GPT-4o-mini, or a comparable small model. Industry data consistently shows 50-80% cost reduction when routing is applied across mixed workloads. The savings come from matching model capability to actual task complexity, not from cutting corners on things that need real reasoning. Define the tiers and enforce them.

Set alerts before limits. Azure Cost Management, AWS Budgets, and LiteLLM's budget manager all support threshold alerting. Configure alerts at 50%, 75%, and 90% of the monthly budget. Alert the team lead, not just finance. The goal is giving teams the information to adjust before a hard stop triggers a production incident.

Put guardrails on every agentic workflow. An agent that can call tools, search, and retry spends tokens in ways a single-turn chat never does. Every agentic workflow needs a maximum step count, a per-run token budget, and a human checkpoint for high-cost or irreversible actions. Without those, a single looping agent can run up thousands of dollars in hours. With them, the worst case is a graceful failure and a logged cost-limit exception.

The FinOps Foundation's AI working group has published a governance framework for exactly this problem. Its core metric: cost per outcome versus the cost of doing the same work without AI. That is the number that tells you whether you are overspending or, more often than people expect, significantly underspending on something that is delivering real value.

Five gates before broad rollout

None of the four problems above require stopping your AI program. They require sequencing it correctly.

Work through each gate honestly. Not "we have a policy" but "we have a policy, it is enforced, and we can demonstrate it." The gap between those two answers is where incidents happen.

A broad rollout that passes all five gates is not a slower rollout. It is the rollout that does not get reversed.

Before you flip the switch

The pressure to move fast is real. Boards are asking about AI. Competitors appear to be deploying it.

The risk of deploying without governance is also real, and it behaves differently from the risks your security team is used to managing. Shadow AI incidents do not announce themselves. Data does not come back from a provider's servers with a notification that it left. Token bills arrive after the spend. Compliance gaps show up in audits, not dashboards.

Every control in this article is available today. Most of it comes from vendors you already have agreements with. Microsoft Purview, Defender for Cloud Apps, Azure API Management, and Azure OpenAI address most of the enterprise surface without introducing new vendors. LiteLLM is deployable in a day. None of this takes months.

The license was easy. The controls are the work.

Matthew Kruczek is Managing Director at EY, leading Microsoft domain initiatives within Digital Engineering. Connect with Matthew on LinkedIn to discuss AI governance and enterprise deployment strategy for your organization.

References

CSA / Google Cloud. "State of AI Security and Governance Report." December 2025. (n=300) cloudsecurityalliance.org
IBM / Ponemon Institute. "Cost of a Data Breach Report 2025." July 30, 2025. (600 breached organizations) ibm.com
Bloomberg, TechCrunch, CNBC. Enterprise ChatGPT data leak coverage. May 2, 2023.
AI Incident Database. Incident #768: Enterprise ChatGPT Data Leaks. incidentdatabase.ai
Harmonic Security. "Enterprise AI Prompt Risk Analysis." 2025. infosecurity-magazine.com
Netskope. "Cloud and Threat Report 2025/2026."
FinOps Foundation. "FinOps for AI Tools and Services Considerations." finops.org