Packt

Why Most AI Systems Fail After Deployment + Google’s Big Bet on Agentic Enterprise AI

The Operational Crisis in AI Systems — And How Google, AWS & NVIDIA Are Rebuilding the Stack

This email was sent May 27, 2026 9:05am EDT


Merlyn Shelley

May 27

Build Reliable GenAI Applications with AI Evals, Observability & Testing

Welcome to DataPro #173 👋

As we embark on this week’s edition, we want to extend a special welcome to all our new readers joining us from the AI Skills Conference organized by Community Sprints. We’re excited to have you as part of the Packt DataPro community — one of the industry’s largest and fastest-growing Data Science, Machine Learning, and AI newsletters with 125K+ data professionals worldwide.

At DataPro, we go beyond headlines. Every week, we break down the real operational bottlenecks shaping modern AI systems, explore expert-led solutions, and highlight the most important advancements across Data Science, ML engineering, GenAI, MLOps, and enterprise AI infrastructure.

As part of the Packt ecosystem, you’ll also gain access to exclusive perks, expert sessions, community connections, research-driven insights, curated learning resources, and future goodies designed specifically for AI practitioners, data scientists, engineers, and decision-makers. So, stay tuned — there’s a lot ahead.

This week’s edition also features an insightful expert-led deep dive from AI Engineer Esther Guru, who explains why most ML systems fail in production and what it actually takes to build reliable, scalable machine learning infrastructure beyond model training. From data drift and monitoring to operational reliability and deployment trade-offs, the session delivers practical lessons every ML engineer should understand before shipping AI systems into the real world.

Alongside that, this week’s research and engineering roundup covers some of the most important developments shaping the future of AI infrastructure and agentic systems:

Stability AI’s Stable Audio 3, introducing scalable latent diffusion for long-form audio generation
Google’s Agentic Enterprise vision, featuring Gemini 3.5, Gemini Omni, Antigravity, and Gemini Spark
OmniVoice Studio, an open-source local alternative to ElevenLabs for voice AI workflows
Google Agent Executor, a new distributed runtime for reliable long-running AI agents
NVIDIA’s Gated DeltaNet-2, improving long-context reasoning in linear attention architectures
Google AI Edge Portal, enabling on-device LLM benchmarking across 120+ Android devices
AWS Bedrock AgentCore, powering scalable multi-agent orchestration, observability, and agentic commerce infrastructure

From multimodal AI and distributed agents to edge inference, voice AI, and production ML systems, this edition captures the accelerating shift toward scalable, operational AI engineering.

Let’s dive in. 🚀

Before we get into this week’s developments, join Amy Chen and Sujeet Mishra this weekend for an intensive hands-on workshop on building reliable GenAI systems with AI evaluations, observability, testing, and production-grade workflows. Learn how leading AI teams debug hallucinations, evaluate prompts and agents, monitor LLM systems in production, and build scalable evaluation pipelines for real-world AI applications.

If you’re building RAG systems, copilots, AI agents, or enterprise GenAI products, this session will provide practical frameworks and workflows you can immediately apply to improve reliability, performance, and shipping confidence.

⚡ Few seats left — claim your 15% discount by registering through the link below (discount already enabled).

Cheers,

Merlyn Shelley,

Growth Lead, Packt.

This Week’s Sponsor

Grow your Mac app with Setapp

Get up to 30K unique impressions right after launch while Setapp handles distribution, billing, licensing, taxes, and customer support.

You build great software. Setapp helps you grow revenue and reach the right users faster.

Join Setapp

Why Most ML Projects Fail in the Real World

AI engineer Esther Guru explains what it really takes to build reliable ML systems beyond just training models.

Beyond the Model: Building Real World Machine Learning Systems | From Training to Deployment — YouTube

Machine learning has become one of the defining technologies of modern software. From recommendation engines and fraud detection systems to healthcare applications and financial forecasting, ML is now deeply embedded into the products people use every day.

But despite all the excitement around artificial intelligence, the reality of deploying machine learning systems is far more complicated than most organizations expect.

Many machine learning projects never make it into production. Others reach deployment but degrade over time because the real world changes faster than the systems behind them. In many cases, companies spend heavily on sophisticated models only to discover that the real problem was never the model itself.

That was the focus of a recent Packt DataML Talk hosted by Abhishek Kaushik, where AI engineer and electrical engineer Esther Guru explored the realities of engineering machine learning systems for production.

Instead of focusing only on algorithms, Esther explained how real-world ML systems behave outside research environments, why production systems fail, and what engineers need to understand before deploying machine learning at scale.

The session offered an important reminder:

Machine learning in production is not just about building models. It is about building systems.

When Should You Use Machine Learning?

One of the biggest mistakes organizations make is assuming that every problem requires AI.

According to Esther, machine learning should only be used when a problem satisfies a few important conditions.

First, the problem must actually require learning from data. If it can already be solved with a fixed equation or a simple rule-based system, machine learning may only add unnecessary complexity.

For example:

Ohm’s Law already provides a direct mathematical relationship between voltage, resistance, and current. There is no need for a neural network to solve such a problem.

Similarly, if a sports website simply wants to display the top-performing football players based on statistics, a sorting query is enough. Building a machine learning model for that task would be excessive.
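As a rough illustration of the two examples above, both tasks reduce to a few lines of ordinary code with no model involved. The player names, the `goals` field, and the function names below are hypothetical, invented only for this sketch.

```python
def current_amps(voltage_volts: float, resistance_ohms: float) -> float:
    """Ohm's Law rearranged for current: I = V / R."""
    if resistance_ohms == 0:
        raise ValueError("resistance must be non-zero")
    return voltage_volts / resistance_ohms


def top_players(players: list[dict], n: int = 3) -> list[str]:
    """Return the names of the n players with the most goals (a plain sort)."""
    ranked = sorted(players, key=lambda p: p["goals"], reverse=True)
    return [p["name"] for p in ranked[:n]]


# Hypothetical statistics, invented for illustration only.
players = [
    {"name": "Player A", "goals": 12},
    {"name": "Player B", "goals": 27},
    {"name": "Player C", "goals": 19},
    {"name": "Player D", "goals": 8},
]

print(current_amps(12.0, 4.0))     # 3.0
print(top_players(players, n=2))   # ['Player B', 'Player C']
```

Neither function needs training data, evaluation, or monitoring, which is the point: when a rule or a query answers the question exactly, a model adds cost without adding value.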

Machine learning becomes useful when relationships between variables are too complex for traditional methods.

Problems such as predicting customer churn, forecasting housing prices, detecting fraud, or predicting the outcome of sports tournaments involve large numbers of interacting variables. These are the kinds of problems where ML systems become valuable.

Esther described machine learning as:

The process of learning complex patterns from existing data and using those patterns to make predictions on unseen data.

That definition highlights the most important requirement for ML systems: the ability to generalize.

A successful model is not one that memorizes training data. It is one that performs reliably on new data it has never seen before.
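The gap between memorizing and generalizing can be seen in a small sketch. The data here is synthetic (roughly y = 2x + 1, an assumption made for this example), and the two "models" are deliberately simple stand-ins: a lookup table that memorizes the training points, and a least-squares line. Neither comes from the talk itself.

```python
def mean_abs_error(predict, xs, ys):
    """Average absolute gap between predictions and true values."""
    return sum(abs(predict(x) - y) for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Lookup table: exact on seen inputs, falls back to the training mean."""
    table = dict(zip(xs, ys))
    fallback = sum(ys) / len(ys)
    return lambda x: table.get(x, fallback)


# Synthetic data (an assumption for this sketch): y is roughly 2x + 1.
xs = list(range(20))
ys = [2 * x + 1 + (0.1 if x % 2 else -0.1) for x in xs]

# Hold out every third point; the models never see these during fitting.
train = [(x, y) for x, y in zip(xs, ys) if x % 3 != 0]
test = [(x, y) for x, y in zip(xs, ys) if x % 3 == 0]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

memorizer = fit_memorizer(train_x, train_y)
line = fit_line(train_x, train_y)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(
        name,
        "train MAE:", round(mean_abs_error(model, train_x, train_y), 2),
        "held-out MAE:", round(mean_abs_error(model, test_x, test_y), 2),
    )
```

The memorizer scores perfectly on the data it has seen and poorly on the held-out points, while the line does well on both. Only the held-out error says anything about how a model might behave on new data.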

Research ML vs Production ML

One of the most valuable parts of the session was Esther’s explanation of the difference between research machine learning and production machine learning.

Many engineers assume that if a model performs well during experimentation, it is ready for deployment.

In reality, research ML and production ML operate under completely different conditions.

In research environments, engineers typically work with clean datasets, static files, and controlled experiments. Most datasets are already prepared and structured. Training happens in predictable conditions.

Production environments are very different.

Real-world data is messy, incomplete, noisy, and constantly changing. Data pipelines may fail. User behavior shifts. External events change the environment the model operates in.
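One simple way to notice this kind of change, sketched here under the assumption that a single numeric feature is being tracked, is to compare recent production values against the training baseline. The threshold of 3 standard errors, the function names, and the sample values are all illustrative choices, not something prescribed in the talk.

```python
from statistics import mean, stdev


def mean_shift_score(baseline: list[float], recent: list[float]) -> float:
    """Distance of the recent mean from the baseline mean, in standard errors."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline has no variation; score is undefined")
    standard_error = spread / len(recent) ** 0.5
    return abs(mean(recent) - mean(baseline)) / standard_error


def has_drifted(baseline, recent, threshold: float = 3.0) -> bool:
    """Flag drift when the shift exceeds the chosen threshold (an assumption)."""
    return mean_shift_score(baseline, recent) > threshold


# Hypothetical values of one feature, invented for illustration.
training_values = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 11.5, 8.5, 10.0, 10.0]
stable_week = [10.2, 9.8, 10.1, 9.9, 10.4]
shifted_week = [14.0, 15.5, 13.8, 14.9, 15.2]

print(has_drifted(training_values, stable_week))    # False
print(has_drifted(training_values, shifted_week))   # True
```

A check like this only catches shifts in the average of one feature. Real monitoring setups usually track many features and the shape of their distributions, but the idea is the same: keep a baseline from training and compare live data against it.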

This difference becomes especially important when discussing objectives.

In research, the main goal is usually simple: build the most accurate model possible.

In production, however, multiple teams have competing priorities.

An ML engineer may want a highly sophisticated deep learning system with maximum accuracy. Meanwhile, the business team may care more about infrastructure costs, scalability, and speed.

Production systems also prioritize latency.

Users expect predictions almost instantly. If a recommendation engine or AI assistant takes too long to respond, users may abandon the platform.

This means production systems must optimize not only for accuracy, but also for:

speed
scalability
cost efficiency
reliability
infrastructure performance

This is why production engineering matters just as much as model design.
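To make the latency point concrete, here is a minimal sketch of checking a model against a response-time budget. The 50 ms budget, the 95th-percentile target, and `toy_model` are assumptions chosen for illustration; a real service would measure end-to-end request time rather than a bare function call.

```python
import math
import time


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(predict, inputs) -> list[float]:
    """Time each call to predict and return the durations in milliseconds."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        durations.append((time.perf_counter() - start) * 1000)
    return durations


def within_budget(latencies_ms: list[float], budget_ms: float, pct: float = 95) -> bool:
    """True when the chosen percentile latency stays within the budget."""
    return percentile(latencies_ms, pct) <= budget_ms


def toy_model(x: float) -> float:
    """Hypothetical stand-in for a real model's predict call."""
    return 2 * x + 1


latencies = measure_latencies_ms(toy_model, range(1000))
print("p95 latency (ms):", percentile(latencies, 95))
print("within 50 ms budget:", within_budget(latencies, budget_ms=50))
```

Tracking a high percentile rather than the average is a common choice because a small share of slow responses can hurt the user experience even when the mean looks fine.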

Why Most ML Projects Fail

One of the most striking insights from the Packt Talk was how frequently ML projects fail.

Esther referenced industry findings showing that a large percentage of machine learning initiatives never even reach production. Even among deployed systems, many degrade significantly within months.

There are several reasons for this.

Poor Problem Framing

Many organizations begin AI projects without clearly defining the actual business problem.

Teams often say things like:

“We want to use AI.”
“We need a machine learning strategy.”
“Can we automate this?”

But vague business goals lead to vague ML objectives.

Without clear alignment between technical teams and business requirements, organizations end up building systems that never deliver meaningful value.

According to Esther, successful ML projects begin with proper problem framing.

Teams must understand:

whether ML is even necessary
what success actually looks like
which business metrics matter most
what constraints exist in production

Without this clarity, even technically strong models can fail.

Catch the complete story and technical insights on Packt’s Medium handle.

Data Science & ML Research Roundup

◾Stability AI Releases Stable Audio 3: A Family of Fast Latent Diffusion Models for Audio Generation and Editing: Stability AI has open-sourced Stable Audio 3, a new family of latent diffusion models for generating and editing stereo audio at 44.1 kHz. The release includes scalable models, variable-length generation, fast inference, inpainting, and a novel SAME autoencoder with 4096× compression. SA3 supports music and SFX creation, achieves strong benchmark scores, and enables long-form audio generation efficiently on consumer hardware.

◾Innovations from Google I/O 26 on Google Cloud: At Google I/O 2026, Google expanded its vision for the “Agentic Enterprise,” where AI agents move beyond chatbots to autonomously execute workflows across apps, data systems, and developer environments. The announcement introduced Gemini 3.5 Flash for advanced reasoning and coding, Gemini Omni for multimodal video generation, Antigravity for enterprise-scale agent orchestration, and Gemini Spark, a personal AI work agent integrated across Workspace and business tools. Together, these launches signal Google’s push toward AI-native productivity, software development, and enterprise automation.

◾Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs: OmniVoice Studio is emerging as a compelling open-source alternative to ElevenLabs, offering fully local voice AI capabilities without subscriptions or cloud processing. The desktop app supports voice cloning, video dubbing, diarization, dictation, and multilingual TTS across 646 languages. Built with FastAPI, WhisperX, Demucs, and Pyannote, it also includes an MCP server for integrations with Claude, Cursor, and custom AI workflows.

◾Agent Executor, Google’s distributed Agent Runtime: Google has open-sourced Agent Executor, a runtime standard designed to make long-running AI agents more reliable, scalable, and production-ready. Built for the emerging “agentic enterprise,” it introduces durable execution, secure sandboxing, session consistency, trajectory branching, and distributed deployment support. Combined with Agent Substrate for Kubernetes-scale orchestration, Google is positioning Agent Executor as foundational infrastructure for running millions of enterprise AI agents across hybrid environments.

◾NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule: NVIDIA has introduced Gated DeltaNet-2, a new linear attention architecture designed to improve long-context memory handling in AI models. By separating memory “erase” and “write” operations into independent channel-wise gates, the model achieves stronger retrieval accuracy and reasoning performance than Mamba-2, KDA, and prior DeltaNet variants. Trained at 1.3B parameters on 100B tokens, it also maintains efficient linear-time scaling for long-running sequence processing.

◾Benchmark LLMs on-device with AI Edge Portal: Google has expanded AI Edge Portal with new tools for benchmarking and debugging on-device LLMs across more than 120 Android device types. The update helps developers optimize latency, memory usage, and inference performance for mobile AI workloads using metrics like decode speed and initialization time. Google also introduced Model Explorer, a visualization tool for analyzing model graphs, quantization issues, and hardware compatibility in edge AI deployments.

◾Vibe-coded AI Studio apps with Firestore, Firebase, Cloud SQL: Google is expanding AI Studio into a full-stack “vibe coding” platform, enabling developers to build and deploy AI-powered applications directly to Google Cloud without needing billing setup or infrastructure management. The update adds support for Cloud SQL, Firestore, Firebase Auth, and Workspace integrations, while AI agents can now automatically provision databases, generate schemas, configure authentication, and deploy apps through natural language prompts.

◾Technical deep dive: AgentCore payments and innovation in agentic commerce: Amazon has introduced Bedrock AgentCore Payments, a managed infrastructure layer that enables AI agents to autonomously make secure microtransactions for APIs, MCPs, and digital services. Designed for the emerging “agentic economy,” the platform handles payment orchestration, wallet security, stablecoin support, spending guardrails, and observability through a single API. The release positions AgentCore as foundational infrastructure for scalable, enterprise-grade autonomous AI commerce.

◾Build highly scalable serverless LangGraph multi-agent systems in AWS with Amazon Bedrock AgentCore: AWS has introduced a serverless multi-agent AI architecture that combines LangGraph orchestration with Amazon Bedrock AgentCore Memory and Observability. The framework uses AWS Lambda, Step Functions, and Bedrock to build scalable AI workflows with persistent memory, real-time telemetry, and parallel agent coordination. Demonstrated through a marketing campaign review system, the solution highlights how enterprises can operationalize reliable, observable, and production-ready multi-agent AI systems on AWS.

◾Build high-performance generative AI systems with Strands Agents, NVIDIA NIM, and Amazon Bedrock AgentCore: AWS and NVIDIA have introduced a production-ready multi-agent AI architecture that combines NVIDIA NIM for GPU-accelerated inference, Strands Agents for orchestration, and Amazon Bedrock AgentCore for memory and observability. Designed for scalable enterprise AI systems, the framework supports parallel agent execution, persistent context, real-time monitoring, and serverless deployment, enabling organizations to build reliable, high-performance generative AI workflows at scale.

See you next time!
