Packt

US · packtpub.com

The Blueprint for Production-Ready AI Agents

Plus: 15× faster LLM inference, Google's Python UDFs, speech AI breakthroughs, and next-gen document intelligence.

This email was sent

June 25, 2026 9:35am EDT


Merlyn Shelley

Jun 25


Build self-service analytics 4× faster with OpenUI Cloud.

What if every user could turn a question into a dashboard? OpenUI Cloud plugs into your existing data stack, transforming plain-English queries into live charts, KPI cards, reports, and dashboards—without adding to your BI backlog or frontend roadmap.

Start Building Now

👋 Hey there, welcome to DataPro #175.

Building AI systems is no longer the hard part. Building AI systems that behave consistently, scale across teams, and are fast enough for production is. Many organizations are discovering that great prompts alone don’t translate into reliable products. What they need are reusable workflows, measurable infrastructure, and architectures designed for production from day one.

That’s exactly where this week’s leading story comes in. Elisa Terumi, PhD, explains why Skills are emerging as a foundational building block for AI agents, showing how they transform one-off prompts into reusable, version-controlled capabilities that make agentic systems easier to build, maintain, and scale.

In this week’s highlights:

🧠 Learn how AI Skills are reshaping agent architectures and why they’re becoming the new standard for reusable AI workflows.
⚡ Explore DFlash, a new speculative decoding technique delivering up to 15× higher LLM throughput on NVIDIA Blackwell GPUs.
📄 Discover Datalab’s lift and Baidu’s Unlimited OCR, two approaches pushing document AI forward with structured extraction and long-document parsing.
🎙️ See how speech AI is evolving with Gradium’s real-time translation models and Amazon Nova 2 Sonic powering natural voice agents.
Discover how OpenUI Cloud brings conversational analytics to your existing data stack.
📊 Dive into the latest cloud tooling, from BigQuery Managed Python UDFs and Google Observability Analytics to enterprise-scale AI deployments across banking and telecom.

Whether you’re building agentic applications, optimizing inference, or deploying AI into production, this edition brings together the architectural patterns, infrastructure updates, and engineering breakthroughs shaping the next generation of Data and AI systems.

Cheers,

Merlyn Shelley,

Growth Lead, Packt.

⚙️ This Week's Packt Expert Workshop: Production-Ready GenAI Starts with Evaluation

As GenAI becomes part of AI applications, data pipelines, and ML workflows, evaluating model outputs is no longer optional. Join Amy Chen and Surjeet Mishra to learn practical frameworks for measuring LLM quality, identifying failure modes, building evaluation pipelines, and implementing feedback loops that make GenAI systems more reliable, trustworthy, and production-ready.

🎟 Save 30% today with the registration link.

The workshop is filling up quickly. If GenAI is part of your AI, ML, or data stack, you won't want to miss it.

Join Amy Chen & Sujeet Mishra Live

What Are Skills in AI Agent Systems? And How to Build Your Own

Written by Elisa Terumi, PhD

The term sounds simple. But in modern AI systems, it has a very specific meaning.

And understanding it changes how you build with LLMs.

What are skills?

Skills are modular, reusable instruction sets that teach an AI system how to perform specific tasks or workflows.

In systems like Claude Code, a skill is typically:

A folder

Containing a SKILL.md file

With structured instructions describing how a task should be executed

Once defined, the system can automatically apply that knowledge whenever a relevant request appears.

This is the key shift:

Instead of repeating prompts, you encode behavior once — and reuse it.

I’ve created a repository with practical examples of skills — feel free to explore it here: https://github.com/elisaterumi-ai/agent-skills-in-practice

Skills as structured capabilities (not just prompts)

A common misconception is to treat skills as “saved prompts.”

They are not. A saved prompt is a one-off instruction you reuse manually. A skill is closer to a standard operating procedure (SOP) for AI:

It defines what to do

When to do it

How to do it consistently

The practical difference is significant. A prompt depends on you remembering to use it and applying it correctly each time. A Skill is activated automatically by the system when the context is relevant, follows a testable structure, and can be shared with your team as part of the repository.

Technically, a Skill combines instructions, workflows, and context to handle multi-step tasks — while a standalone tool executes one specific deterministic function, and a one-off prompt has no structure or reuse.

That combination is what makes Skills an architectural pattern, not just a convenience.

How skills work under the hood

The execution model is subtle — and important.

When a system (like Claude Code) runs:

It loads only skill names and descriptions

It receives a user request

It performs semantic matching

It selects relevant skills

It loads the full instructions and executes them

This has two implications:

Skills do not clutter the context window

They activate only when needed
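The five steps above can be sketched in a few lines of Python. This is only an illustration of the load-on-demand idea, not how Claude Code is implemented: the `Skill` class, the function names, and the `min_overlap` threshold are all hypothetical, and plain word overlap stands in for whatever semantic matching a real system performs.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    instructions: str  # full body; only read once the skill is selected


def build_index(skills):
    """Step 1: expose only names and descriptions, not the full instructions."""
    return {s.name: s.description for s in skills}


def _words(text):
    # Lowercase, strip basic punctuation, ignore very short words.
    cleaned = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 2}


def select_skills(request, index, min_overlap=2):
    """Steps 2-4: compare the request with each description.

    Word overlap is a crude, hypothetical stand-in for semantic matching.
    """
    wanted = _words(request)
    return [
        name
        for name, description in index.items()
        if len(wanted & _words(description)) >= min_overlap
    ]


def load_instructions(selected, skills):
    """Step 5: load the full instructions for the selected skills only."""
    by_name = {s.name: s for s in skills}
    return [by_name[name].instructions for name in selected]


skills = [
    Skill("pr-description",
          "Writes pull request descriptions. Use when creating a PR.",
          "Run git diff, then fill in the What / Why / Changes template."),
    Skill("commit-style",
          "Formats commit messages using the team convention.",
          "Use the imperative mood and keep the subject under 50 characters."),
]

index = build_index(skills)
selected = select_skills("Please write a pull request description", index)
context = load_instructions(selected, skills)
print(selected)
```

Only the matching skill's instructions end up in `context`. The other skill contributes nothing beyond its one-line description, which is why unused skills cost so little context.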

Skills vs prompts vs tools

Understanding this distinction is critical.

Prompts

One-off instructions

Not reusable

No structure

Tools

Execute a specific function

Deterministic behavior

Skills

Combine: instructions + workflows + context

Handle multi-step tasks

In other words:

A tool does one thing. A skill orchestrates how things should be done.

Why skills matter (from experimentation to production)

Skills are not just a convenience feature. They are an architectural pattern.

They enable:

Consistency → same output format every time

Reuse → define once, apply everywhere

Scalability → move from prompts to systems

Collaboration → share workflows across teams

In fact, skills are increasingly used to:

encode coding standards

enforce documentation formats

automate workflows

embed domain knowledge into AI systems

Where skills live

Skills are typically scoped at two levels:

Personal skills

Stored locally

Reused across projects

In Claude systems, personal skills live in ~/.claude/skills in your home directory. These follow you across all your projects — your commit style, your documentation format, how you like code explained.

Project skills

Stored in repositories

Version-controlled

Shared with teams

Project skills live in .claude/skills inside the repository root. Anyone who clones the repo gets these skills automatically. This is where team standards live: coding conventions, brand guidelines, project-specific processes. Because they sit inside the repository, they’re version-controlled alongside the code and shared naturally through Git.

This makes them part of the codebase — not just user configuration.
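To see which skills are available at each level, a short script can scan both locations. This is a rough sketch: the `find_skills` helper is hypothetical, it assumes the current working directory is the repository root, and it says nothing about how Claude itself resolves a name that appears at both levels.

```python
from pathlib import Path


def find_skills(home: Path, repo_root: Path):
    """Return (scope, skill_name) pairs for every folder holding a SKILL.md."""
    roots = [
        ("personal", home / ".claude" / "skills"),
        ("project", repo_root / ".claude" / "skills"),
    ]
    found = []
    for scope, root in roots:
        if not root.is_dir():
            continue  # this scope has no skills directory
        for skill_file in sorted(root.glob("*/SKILL.md")):
            found.append((scope, skill_file.parent.name))
    return found


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    for scope, name in find_skills(Path.home(), Path.cwd()):
        print(f"{scope}: {name}")
```

Folders without a SKILL.md file are skipped, so a half-finished skill directory does not show up in the listing.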

The Anatomy of a Skill

A Skill is a directory containing a SKILL.md file. The directory name should match the skill name. The file has two parts: a YAML metadata block at the top and Markdown instructions below.

The metadata defines name and description, both required. The description is the most critical field: it’s what Claude uses to decide whether the Skill is relevant. Two optional fields also exist: allowed-tools, which restricts which tools Claude can use while the Skill is active, and model, which specifies which Claude model to use for that Skill.

The instructions define the steps, rules, and output format. This is where the actual procedure lives.
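The two-part structure is easy to check mechanically. The sketch below is a deliberately simplified reader, not a YAML parser: `parse_skill` is a hypothetical helper that only understands `key: value` lines plus indented continuation lines, and it checks just the two required fields named above. A real setup would more likely use a proper YAML library.

```python
REQUIRED = ("name", "description")


def parse_skill(text):
    """Split SKILL.md text into (metadata dict, instructions string)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' metadata block")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("metadata block is never closed with '---'") from None

    metadata, key = {}, None
    for line in lines[1:end]:
        if not line.strip():
            continue  # tolerate blank lines inside the block
        if line[0].isspace() and key:
            # Indented line: continuation of the previous value.
            metadata[key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            metadata[key] = value.strip()

    missing = [field for field in REQUIRED if not metadata.get(field)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return metadata, "\n".join(lines[end + 1:]).strip()


example = """---
name: pr-description
description: Writes pull request descriptions. Use when creating a PR,
  writing a PR, or when the user asks to summarize changes for a pull request.
---
When writing a PR description:
1. Run `git diff main...HEAD` to see all changes on this branch
"""

metadata, instructions = parse_skill(example)
print(metadata["name"])
```

Running a check like this in CI is one way to make the "testable structure" concrete: a skill without a description fails fast instead of silently never activating.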

When should you create a skill?

A practical rule:

If you are repeating the same instructions more than once, you should create a skill.

Typical use cases:

Code review guidelines

Commit message formats

Documentation templates

Data processing pipelines

Domain-specific transformations

Practical Example: A PR Description Skill

Let’s build a personal Skill that teaches Claude to write pull request descriptions in a consistent format.

First, create the directory:

mkdir -p ~/.claude/skills/pr-description

Then create the SKILL.md file inside that directory:

---
name: pr-description
description: Writes pull request descriptions. Use when creating a PR,
  writing a PR, or when the user asks to summarize changes for a pull request.
---

When writing a PR description:

1. Run `git diff main...HEAD` to see all changes on this branch
2. Write a description following this format:

## What
One sentence explaining what this PR does.

## Why
Brief context on why this change is needed.

## Changes
- Bullet points of specific changes made
- Group related changes together
- Mention any files deleted or renamed

Restart Claude Code.

The next time you say “write a PR description for my changes,” Claude will recognize the request, load the Skill, and follow the template — same format every time.

Dive deeper into the topic on Packt’s Medium handle.

🕸️ Turn Connected Data Into Better AI Answers

Many RAG systems fail not because the model lacks knowledge, but because retrieval lacks structure. Join Bruno Gonçalves for a practical workshop on GraphRAG and learn how to build AI applications that can reason across relationships, answer multi-hop questions, and generate more trustworthy responses.

🎟️ Save 35% on your ticket with the DataPro community offer.

Discover how leading teams are tackling AI hallucinations and building systems that can reason across complex business data.

Save Your Spot Now!

Data Science & ML Research Roundup

◾ How Loka Built a Natural, Low-Latency Voice Agent with Amazon Nova 2 Sonic: Loka built a conversational AI agent with Amazon Nova 2 Sonic to eliminate the slow, robotic experience of traditional voice assistants. By using native speech-to-speech processing, the solution delivers faster responses, higher speech reasoning accuracy, and lower costs. Prompt engineering further boosted conversational quality, making the AI more natural, accurate, and production-ready for customer support at scale.

◾ Baidu Releases Unlimited OCR, a 3B Model That Keeps the KV Cache Flat for Long-Document Parsing: Baidu has open-sourced Unlimited OCR, a 3B-parameter model that solves a major OCR bottleneck by keeping memory usage constant, enabling efficient parsing of long documents in a single pass. Built on DeepSeek OCR, it delivers higher accuracy, faster throughput, and lower latency, making it well suited for large-scale document processing, transcription, and multimodal parsing workflows.

◾ Huntington Bank: Redacting sensitive data from 400M+ documents with AWS: Huntington Bank cut a multi-year compliance project down to months by building a scalable AWS-powered pipeline to detect and redact sensitive data across 400 million documents. Using Amazon Textract, SageMaker, Step Functions, and Lambda, the solution achieved over 95% redaction accuracy while securely processing documents at massive scale with high concurrency and PCI DSS compliance.

◾ Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas: Datalab has introduced lift, a 9B open-weights vision model that extracts structured JSON from PDFs and images using JSON schemas. It achieves 90.2% field accuracy while processing multi-page documents in a single pass, making it one of the strongest self-hostable extraction models. Schema-constrained decoding reduces hallucinations and enables reliable document automation workflows.

◾ Build a healthcare appointment agent with Amazon Nova 2 Sonic: AWS has published a reference architecture for building healthcare appointment agents with Amazon Nova 2 Sonic and Bedrock AgentCore. The speech-to-speech AI authenticates patients, confirms or reschedules appointments, collects pre-visit information, and escalates to staff when needed. Built with serverless AWS services and healthcare-specific tools, it enables natural, low-latency voice interactions that can help reduce appointment no-shows at scale.

◾ Gradium Launches stt-translate and s2s-translate, Real-Time Speech Translation Models Beating gpt-realtime-translate on Accuracy and Latency: Gradium has launched stt-translate and s2s-translate, real-time speech translation models that combine transcription, translation, and speech output into a faster two-model pipeline. Supporting five languages and 20 pairs, the models claim stronger BLEU accuracy than GPT and Gemini alternatives, 3-second average latency, live browser streaming, and voice control, including cloning for multilingual meetings, agents, and dubbing.

◾ Open models, global networks: How AT&T and GSMA are accelerating innovation with Gemma: Google Cloud and GSMA have introduced Open Telco AI, an initiative built on Gemma models to bring domain-specific AI to telecom networks. Fine-tuned on specialized telecom data, the open OTel models outperform larger general-purpose models on network tasks while reducing hallucinations through RAG. The project aims to accelerate AI-driven network automation, self-healing systems, and telecom-grade AI adoption.

◾ How to Design an OpenHarness Style Agent Runtime with Tools, Memory, Permissions, Skills, and Multi-Agent Coordination: This tutorial breaks down how to build an OpenHarness-style agent runtime from scratch, exposing the full mechanics behind modern agent systems. It walks through tool schemas, permissions, lifecycle hooks, memory, skills, retries, cost tracking, context compaction, and multi-agent coordination, giving developers a runnable framework for understanding how agents reason, call tools, manage state, and complete tasks.

◾ Query logs and traces with SQL in Observability Analytics: Google Cloud has rebranded Log Analytics as Observability Analytics, adding GA support for SQL-based analysis of logs and traces in a unified workspace. Developers can now join telemetry with business data to troubleshoot applications, optimize AI agents, and identify performance bottlenecks using BigQuery-powered SQL, while the new Observability API enables programmatic access for agentic workflows and automation.

◾ DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel for Up to 15x Higher Throughput on NVIDIA Blackwell: Researchers at UC San Diego have introduced DFlash, a speculative decoding method that generates entire token blocks in parallel instead of one token at a time. By combining lightweight diffusion drafting with autoregressive verification, DFlash delivers up to 6× faster lossless inference in research benchmarks, while NVIDIA reports up to 15× higher throughput on Blackwell GPUs for latency-sensitive AI workloads such as coding agents and reasoning models.

◾ Python UDF in BigQuery, now generally available: Google Cloud has announced the general availability of BigQuery Managed Python UDFs, enabling developers to run custom Python code and popular libraries like NumPy, pandas, and scikit-learn directly within BigQuery SQL. The serverless feature eliminates infrastructure management while supporting vectorized execution, configurable compute resources, external API integration, and production-grade monitoring for advanced analytics and machine learning workflows.

See you next time!
