Three in four enterprises have rolled back an AI agent. Here's why.
Last week we wrote about how the boundaries between marketer, agency, platform, publisher, retailer, and AI company are dissolving. This week the key players continued with more announcements. Google rolled out five new AI ad formats inside AI Mode. OpenAI added customizable ad units in ChatGPT. Canva completed its rollout across all four major AI assistants. Publicis acquired LiveRamp for $2.2B. Google's AI Overviews are now cutting publisher traffic 33% globally.
The other half of this issue is what production looks like after the demos. Three in four enterprises running AI customer-comms agents have rolled at least one back. The most-governed companies fail more, not less. In the lead piece we take apart what's actually breaking. In Policy Corner, our new AI Policy Lead unpacks the federal ruling that made consumer AI chats discoverable and what it means for your agency contracts. And in Cost Watch we tee up next week's deep dive with one number: AI tokens now cost more than the human employees they were meant to replace in some Microsoft workflows.
Let's get into it.
— Vas
This Week's Signals
AI & Ad Surfaces
Google launches conversational ad units inside AI Mode. At Marketing Live (May 20), Google unveiled five AI ad formats: Conversational Discovery ads with Gemini-generated creative tailored to each query; Highlighted Answers inside AI Mode recommendation lists; AI-powered Shopping ads where Gemini writes custom product explainers; Business Agent for Leads, a Gemini chatbot embedded directly in ads; and Direct Offers surfacing inside AI Mode responses. The buying surfaces are AI Max and Performance Max. The ad unit itself is changing shape, not just the surface it shows up on. (Google)
OpenAI ships larger ChatGPT ad formats with custom CTAs. OpenAI is testing a new ChatGPT ad unit with larger imagery and advertiser-personalized CTAs (shop now, book now, sign up, learn more). E-commerce ads get a dedicated portrait or landscape format showing pricing and customer reviews. OpenAI is running this directly, not through StackAdapt or Kargo. Placement mechanics inside the chat surface weren't disclosed. (Digiday)
Seven in ten consumers say they can spot an AI-generated ad. 70% report they can usually tell, describing AI ads as "missing its soul." 74% are more likely to buy from an ad they believe was made entirely by humans. 69% worry advertising will become "a sea of AI slop." Acceptance is conditional: 68% are fine with AI when it makes ads more helpful or relevant, and Gen Z and Millennials care more about overall vibe (70%) than whether AI was used. From Canva's State of Marketing and AI 2026 report. (Martech)
AI & The Open Web
Publisher traffic from Google is in measurable decline. Zero-click searches are now ~60% of all Google queries and 69% for news. Global publisher traffic from Google is down 33% year-on-year through November 2025. Individual publishers report steeper drops: HubSpot 70-80%, Chegg 49%, DMG Media up to 89%. NPR called it an "extinction-level event." Google says AI Overviews drive more clicks, not fewer; Similarweb data doesn't support that claim. (TheNextWeb)
Canva completes its cross-assistant rollout. Canva's new Connected App for Gemini (May 20) follows its earlier integrations with Claude, ChatGPT, and Copilot. The strategic point isn't the Gemini deal. It's that Canva has decided to be infrastructure inside every major AI assistant, not a destination. Ecosystem head Anwar Haneef: "We're making design accessible wherever people start their work." (TheNextWeb)
LinkedIn suppresses AI slop from recommendations. LinkedIn is keeping flagged AI-generated posts visible to direct connections but suppressing them from feed recommendations. Targeted formats include "it's not X, it's Y," engagement bait, recycled thought leadership, and bot comments. LinkedIn claims 94% detection accuracy in early testing. AI-assisted content remains allowed if there's an original idea. VP Product Laura Lorenzetti: "Stop letting AI do all the thinking for you." (TheNextWeb)
Big Picture: What AI Governance Is Measuring Wrong
For two years, the dominant story in enterprise AI has been about getting to production. Pilot purgatory. The chasm. The 70% that never ships. New research closes that chapter and opens a worse one.
Seventy-four percent of enterprises that have deployed AI customer-communications agents on customer-facing channels have been forced to roll at least one back. Among organizations classified as having "fully mature" AI governance, the rate is higher: 81%.
That second number isn't what it looks like. Mature-governance organizations have audit logs, behavioral monitoring, PII scanning, and customer-trust signals. Less mature ones don't. The 81% is the share of mature-governance orgs that detected a problem worth rolling back. The orgs reporting clean records aren't safer. They have less visibility into what's happening on their own channels.
That's the first mismatch. The dashboard isn't measuring failure rate. It's measuring detection.
The category mismatch
Then there's what is actually failing. Among orgs that had a rollback, the cited causes break down this way.
Two years of AI safety conversation has been about hallucination. RAG was built for it. Constitutional AI is for it. Eval benchmarks measure it. Vendor pitches lead with it.
Production data says the bigger killer is data exposure. Common failure modes include cross-user PII leakage (the agent surfaces one customer's data to another, as in OpenAI's March 2023 ChatGPT incident), unauthorized retrieval (the agent pulls a record the user shouldn't have access to), and session persistence bugs. The discourse focused on whether the model fabricates. The production reality is that the bigger risk is the model retrieving the wrong thing, accurately.
The two failure modes have different solutions, owned by different teams, requiring different budgets. Hallucination lives at the model layer: grounding, retrieval augmentation, output filtering. Data exposure lives at the infrastructure layer: access control, data masking, session isolation, audit trails. The teams aren't the same. The vendors aren't the same. The budgets aren't the same.
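To make the layer distinction concrete, here is a minimal Python sketch of an infrastructure-layer guard. Every name in it (Record, Session, fetch_for_session, the in-memory STORE) is hypothetical and does not come from the research or any vendor API. It illustrates three of the controls named above: an ownership check before retrieval, PII masking on the way out, and an audit entry for every attempt.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    record_id: str
    owner_id: str
    email: str
    note: str


@dataclass
class Session:
    user_id: str
    # Each session gets its own audit list; nothing is shared across users.
    audit_log: list = field(default_factory=list)


# Hypothetical in-memory store standing in for a real customer database.
STORE = {
    "r1": Record("r1", "alice", "alice@example.com", "Refund approved"),
    "r2": Record("r2", "bob", "bob@example.com", "Address change pending"),
}


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def fetch_for_session(session: Session, record_id: str):
    """Return a masked view of a record only if the session user owns it."""
    record = STORE.get(record_id)
    allowed = record is not None and record.owner_id == session.user_id
    # Audit every attempt, including denied ones.
    session.audit_log.append(
        (session.user_id, record_id, "allow" if allowed else "deny")
    )
    if not allowed:
        return None
    return {
        "record_id": record.record_id,
        "email": mask_email(record.email),
        "note": record.note,
    }
```

Note that nothing here touches the model. An agent with perfect grounding would still leak Bob's record on Alice's session if the ownership check were missing, because it would be retrieving a real record accurately.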
The spending mismatch
The same research shows that trust, security, and compliance is the #1 spending category in enterprise AI programs globally. 75% of programs cite it as a top spend area. 63% cite AI development. Governance has the bigger budget.
It also has the visible-failure category wrong and the dashboard metric wrong. Meanwhile, 84% of engineering teams are spending at least half their time building guardrails the platform should provide natively, and 35% spend most of their time there instead of on new capability.
What to do
Three changes worth proposing in your next AI risk review:
1. Pair every incident metric with a detection-coverage metric. "Number of rollbacks" is uninterpretable without "share of agent behavior we can monitor."
2. Reframe the threat model around data isolation, not content quality. Vendor evaluations need to ask about access control, session isolation, and audit completeness, not just hallucination rates.
3. Audit how much of your AI engineering time is going to platform-replacement work. If your team is building PII masking, audit trails, and context preservation from scratch, that cost compounds every time you add a channel or use case.
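The first recommendation can be sketched as a small Python calculation. The field names and the 50% coverage threshold are illustrative assumptions, not figures from the report. The sketch reports a rollback count only alongside the share of sessions that monitoring could actually see.

```python
from dataclasses import dataclass


@dataclass
class ChannelReport:
    name: str
    rollbacks: int           # incidents that led to a rollback
    monitored_sessions: int  # sessions covered by logging or monitoring
    total_sessions: int


def detection_coverage(report: ChannelReport) -> float:
    """Share of agent sessions the team can actually observe."""
    if report.total_sessions == 0:
        return 0.0
    return report.monitored_sessions / report.total_sessions


def interpret(report: ChannelReport, min_coverage: float = 0.5) -> str:
    """Pair the incident count with coverage; flag counts that can't be trusted."""
    coverage = detection_coverage(report)
    line = f"{report.name}: {report.rollbacks} rollbacks, coverage {coverage:.0%}"
    if coverage < min_coverage:
        # Assumed threshold: below it, a low count says little about safety.
        line += " (too low to interpret)"
    return line
```

Under these assumptions, a channel with zero rollbacks and 10% coverage is flagged rather than counted as clean, while a channel with three rollbacks and 90% coverage is reported as is.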
The right next question in any AI risk review isn't whether your governance is mature. It's whether anyone in the room could tell you, right now, what's happening on a customer channel and how fast they'd see it if something broke.
Source: Sinch, "The AI Production Paradox" (2026). Commissioned by Sinch, a CPaaS vendor that sells in the infrastructure layer the research identifies as the top predictor of success. n=2,527 enterprise decision-makers across 10 countries and 6 industries, fielded January 2026 via independent third-party panel. Full report with regional and vertical cuts publishes June 2026.
Policy Corner
The Heppner ruling and your agency contracts
In February 2026, Judge Rakoff (SDNY) ruled in United States v. Heppner that 31 documents a fraud defendant generated with Anthropic's Claude were not protected by attorney-client privilege. The reasoning was narrow: Claude isn't a lawyer, the consumer-tier privacy policy permits disclosure to third parties including government agencies, and the defendant wasn't acting at counsel's direction.
The implication marketers haven't worked through: the NDA you have with your agency may no longer be doing the confidentiality work you assumed it does. When the agency pastes brand strategy into a consumer AI chatbot, the AI provider is not a party to your NDA. The AI provider operates under its own terms, which in the consumer tier typically permit training on user inputs and disclosure under legal process. The agency may be in breach of the NDA. The AI provider isn't bound by anything you signed.
Our new AI Policy Lead unpacks how this reshapes agency contracts, procurement, and data classification, plus three other implications and the regulatory outlook.
Cost Watch
A parallel thread emerged this week. Fortune reported May 22 that Microsoft's own internal data is exposing the production-economics problem. In some workflows, AI tokens now cost more than the human employees they were meant to replace. CNBC on May 20 asked whether current pricing is sustainable enough to support OpenAI and Anthropic IPOs. Anthropic has moved off flat-rate enterprise pricing toward per-token billing this year. Uber reportedly burned through its entire 2026 AI coding tools budget in four months. The Microsoft / Claude Code consolidation below is partly a cost story. We'll examine inference economics in detail next week.
Other News
Microsoft pulls Claude Code from engineers' desks. By June 30, Microsoft's Experiences + Devices group (Windows, M365, Outlook, Teams, Surface) loses internal Claude Code access. Engineers move to GitHub Copilot CLI. Microsoft frames the change as "toolchain unification." Anthropic isn't banned: Claude still runs inside Copilot CLI, only the standalone Claude Code interface goes away. Engineers had been picking Anthropic's tool over Microsoft's own since the December 2025 rollout, which is part of why it's being pulled. (The Verge)
Publicis acquires LiveRamp for $2.2B. LiveRamp CEO Scott Howe has called the company the industry's "Switzerland" for its agnosticism across competing agencies and platforms. The acquisition tests whether that neutrality survives. Industry watchers expect agency clients to resist sharing client and proprietary data with a platform now owned by a holding-company competitor. Worth re-reading your LiveRamp contracts on data portability. (AdWeek)
Starbucks retires NomadGo AI inventory. Starbucks is winding down its NomadGo-built inventory tool in North America. The tablet camera + LiDAR setup failed at distinguishing similar products (oat milk vs dairy was a recurring miss). Starbucks framed it as a move toward "standardization." An internal note quoted an employee: "the thought behind it was great, but the execution was proving difficult." A visible operational AI rollback in the same week as the production-reality data above. (TheNextWeb)
Marketing Embeddings is read by 20,000+ CMOs, CTOs, and media leaders navigating AI's impact on marketing. Forward this to someone who needs to see it.