The Blueprint for Production-Ready AI Agents
Plus: 15× faster LLM inference, Google's Python UDFs, speech AI breakthroughs, and next-gen document intelligence.
Build self-service analytics 4× faster with OpenUI Cloud.

What if every user could turn a question into a dashboard? OpenUI Cloud plugs into your existing data stack, transforming plain-English queries into live charts, KPI cards, reports, and dashboards, without adding to your BI backlog or frontend roadmap.

👋 Hey there, welcome to DataPro #175.

Building AI systems is no longer the hard part. Building AI systems that behave consistently, scale across teams, and are fast enough for production is. Many organizations are discovering that great prompts alone don’t translate into reliable products. What they need are reusable workflows, measurable infrastructure, and architectures designed for production from day one.

That’s exactly where this week’s leading story comes in. Elisa Terumi, PhD, explains why Skills are emerging as a foundational building block for AI agents, showing how they transform one-off prompts into reusable, version-controlled capabilities that make agentic systems easier to build, maintain, and scale.

In this week’s highlights:
Whether you’re building agentic applications, optimizing inference, or deploying AI into production, this edition brings together the architectural patterns, infrastructure updates, and engineering breakthroughs shaping the next generation of Data and AI systems.

Cheers, Merlyn Shelley, Growth Lead, Packt.

⚙️ This Week's Packt Expert Workshop: Production-Ready GenAI Starts with Evaluation

As GenAI becomes part of AI applications, data pipelines, and ML workflows, evaluating model outputs is no longer optional. Join Amy Chen and Surjeet Mishra to learn practical frameworks for measuring LLM quality, identifying failure modes, building evaluation pipelines, and implementing feedback loops that make GenAI systems more reliable, trustworthy, and production-ready.

🎟 Save 30% today with the registration link. The workshop is filling up quickly. If GenAI is part of your AI, ML, or data stack, you won't want to miss it.

What Are Skills in AI Agent Systems? And How to Build Your Own

Written by Elisa Terumi, PhD

The term sounds simple. But in modern AI systems, it has a very specific meaning. And understanding it changes how you build with LLMs.

What are skills?

Skills are modular, reusable instruction sets that teach an AI system how to perform specific tasks or workflows. In systems like Claude Code, a skill is typically:

• A folder
• Containing a SKILL.md file
• With structured instructions describing how a task should be executed

Once defined, the system can automatically apply that knowledge whenever a relevant request appears.

This is the key shift: Instead of repeating prompts, you encode behavior once and reuse it.

I’ve created a repository with practical examples of skills. Feel free to explore it here: https://github.com/elisaterumi-ai/agent-skills-in-practice

Skills as structured capabilities (not just prompts)

A common misconception is to treat skills as “saved prompts.” They are not. A saved prompt is a one-off instruction you reuse manually.
A skill is closer to a standard operating procedure (SOP) for AI:

• It defines what to do
• When to do it
• How to do it consistently

The practical difference is significant. A prompt depends on you remembering to use it and applying it correctly each time. A Skill is activated automatically by the system when the context is relevant, follows a testable structure, and can be shared with your team as part of the repository.

Technically, a Skill combines instructions, workflows, and context to handle multi-step tasks, while a standalone tool executes one specific deterministic function, and a one-off prompt has no structure or reuse. That combination is what makes Skills an architectural pattern, not just a convenience.

How skills work under the hood

The execution model is subtle and important. When a system (like Claude Code) runs:

1. It loads only skill names and descriptions
2. It receives a user request
3. It performs semantic matching
4. It selects relevant skills
5. It loads the full instructions and executes them

This has two implications:

• Skills do not clutter the context window
• They activate only when needed

Skills vs prompts vs tools

Understanding this distinction is critical.

Prompts
• One-off instructions
• Not reusable
• No structure

Tools
• Execute a specific function
• Deterministic behavior

Skills
• Combine: instructions + workflows + context
• Handle multi-step tasks

In other words: A tool does one thing. A skill orchestrates how things should be done.

Why skills matter (from experimentation to production)

Skills are not just a convenience feature. They are an architectural pattern.
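
Part of what makes this a pattern rather than a convenience is the loading behavior described under “How skills work under the hood”: only names and descriptions are held up front, and full instructions are loaded after a match. The following is a minimal Python sketch of that idea, not Claude Code's actual implementation. SkillStub, select_skills, handle, and make_loader are hypothetical names, the two skills are made up, and plain word overlap stands in for the semantic matching that a real system would perform.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SkillStub:
    """What is held up front: a name, a description, and a lazy loader."""
    name: str
    description: str
    load_instructions: Callable[[], str]  # only called after a match


def words(text: str) -> set[str]:
    """Lowercased alphanumeric words found in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_skills(request: str, stubs: list[SkillStub],
                  min_overlap: int = 2) -> list[SkillStub]:
    """Assumption: word overlap as a crude stand-in for semantic matching."""
    request_words = words(request)
    return [s for s in stubs
            if len(request_words & words(s.description)) >= min_overlap]


def handle(request: str, stubs: list[SkillStub]) -> list[str]:
    """Load full instructions only for the skills that matched."""
    return [s.load_instructions() for s in select_skills(request, stubs)]


loaded: list[str] = []  # records which skills had their full text loaded


def make_loader(name: str, body: str) -> Callable[[], str]:
    """Hypothetical loader; a real one might read SKILL.md from disk."""
    def load() -> str:
        loaded.append(name)
        return body
    return load


stubs = [
    SkillStub("pr-description",
              "Write a PR description for code changes in a consistent format",
              make_loader("pr-description", "Use the team PR template.")),
    SkillStub("commit-message",
              "Format git commit messages following team conventions",
              make_loader("commit-message", "Use the team commit style.")),
]

instructions = handle("write a PR description for my changes", stubs)
```

In this sketch the commit-message instructions are never loaded for the PR request, which mirrors the two implications above: unmatched skills cost only a name and a description, and full instructions appear only when needed.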
Skills enable:

• Consistency → same output format every time
• Reuse → define once, apply everywhere
• Scalability → move from prompts to systems
• Collaboration → share workflows across teams

In fact, skills are increasingly used to:

• encode coding standards
• enforce documentation formats
• automate workflows
• embed domain knowledge into AI systems

Where skills live

Skills are typically scoped at two levels:

Personal skills
• Stored locally
• Reused across projects

In Claude systems, personal skills live in ~/.claude/skills in your home directory. These follow you across all your projects: your commit style, your documentation format, how you like code explained.

Project skills
• Stored in repositories
• Version-controlled
• Shared with teams

Project skills live in .claude/skills inside the repository root. Anyone who clones the repo gets these skills automatically. This is where team standards live: coding conventions, brand guidelines, project-specific processes. Because they sit inside the repository, they’re version-controlled alongside the code and shared naturally through Git. This makes them part of the codebase, not just user configuration.

The Anatomy of a Skill

A Skill is a directory containing a SKILL.md file. The directory name should match the skill name. The file has two parts: a YAML metadata block at the top and Markdown instructions below.

The metadata defines name and description, both required. The description is the most critical field: it’s what Claude uses to decide whether the Skill is relevant. Two optional fields also exist: allowed-tools, which restricts which tools Claude can use while the Skill is active, and model, which specifies which Claude model to use for that Skill.

The instructions define the steps, rules, and output format. This is where the actual procedure lives.

When should you create a skill?

A practical rule: If you are repeating the same instructions more than once, you should create a skill.
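
Before writing your own, it can help to treat the SKILL.md shape from the anatomy section as something you can check. Here is a minimal Python sketch under stated assumptions: parse_skill_md is a hypothetical helper, the commit-message skill is invented for illustration, and the parser only understands flat key: value lines between two --- markers rather than full YAML. It is not how Claude Code itself reads skills.

```python
from typing import Optional

REQUIRED_FIELDS = ("name", "description")  # both required, per the text above


def parse_skill_md(text: str,
                   dir_name: Optional[str] = None) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (metadata, instructions) and check the basics.

    Assumption: metadata is flat `key: value` lines; real YAML is richer.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' metadata block")

    # Find the line that closes the metadata block.
    end = next((i for i, line in enumerate(lines[1:], start=1)
                if line.strip() == "---"), None)
    if end is None:
        raise ValueError("metadata block is never closed with '---'")

    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        metadata[key.strip()] = value.strip()

    missing = [k for k in REQUIRED_FIELDS if not metadata.get(k)]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))

    # The directory name should match the skill name.
    if dir_name is not None and dir_name != metadata["name"]:
        raise ValueError(f"directory {dir_name!r} does not match "
                         f"skill name {metadata['name']!r}")

    instructions = "\n".join(lines[end + 1:]).strip()
    return metadata, instructions


# Hypothetical skill content, made up for this example.
EXAMPLE_SKILL = """\
---
name: commit-message
description: Formats git commit messages. Use when asked to write a commit message.
---
# Commit messages

1. Summarize the change in one line.
2. Explain why the change was made.
"""

metadata, instructions = parse_skill_md(EXAMPLE_SKILL, dir_name="commit-message")
```

Optional keys such as allowed-tools and model would simply pass through into the metadata dictionary here. A stricter checker could validate their values, but this sketch does not guess at what the valid values are.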
Typical use cases:

• Code review guidelines
• Commit message formats
• Documentation templates
• Data processing pipelines
• Domain-specific transformations

Practical Example: A PR Description Skill

Let’s build a personal Skill that teaches Claude to write pull request descriptions in a consistent format.

First, create the directory:

mkdir -p ~/.claude/skills/pr-description

Then create the SKILL.md file inside that directory:

---

Restart Claude Code. The next time you say “write a PR description for my changes,” Claude will recognize the request, load the Skill, and follow the template, producing the same format every time.

Dive deeper into the topic on Packt’s Medium handle.

🕸️ Turn Connected Data Into Better AI Answers

Many RAG systems fail not because the model lacks knowledge, but because retrieval lacks structure. Join Bruno Gonçalves for a practical workshop on GraphRAG and learn how to build AI applications that can reason across relationships, answer multi-hop questions, and generate more trustworthy responses.

🎟️ Save 35% on your ticket with the DataPro community offer.

Data Science & ML Research Roundup

◾ How Loka Built a Natural, Low-Latency Voice Agent with Amazon Nova 2 Sonic: Loka built a conversational AI agent with Amazon Nova 2 Sonic to eliminate the slow, robotic experience of traditional voice assistants. By using native speech-to-speech processing, the solution delivers faster responses, higher speech reasoning accuracy, and lower costs. Prompt engineering further boosted conversational quality, making the AI more natural, accurate, and production-ready for customer support at scale.

◾ Baidu Releases Unlimited OCR, a 3B Model That Keeps the KV Cache Flat for Long-Document Parsing: Baidu has open-sourced Unlimited OCR, a 3B-parameter model that solves a major OCR bottleneck by keeping memory usage constant, enabling efficient parsing of long documents in a single pass.
Built on DeepSeek OCR, it delivers higher accuracy, faster throughput, and lower latency, making it well suited for large-scale document processing, transcription, and multimodal parsing workflows.

◾ Huntington Bank: Redacting sensitive data from 400M+ documents with AWS: Huntington Bank cut a multi-year compliance project down to months by building a scalable AWS-powered pipeline to detect and redact sensitive data across 400 million documents. Using Amazon Textract, SageMaker, Step Functions, and Lambda, the solution achieved over 95% redaction accuracy while securely processing documents at massive scale with high concurrency and PCI DSS compliance.

◾ Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas: Datalab has introduced lift, a 9B open-weights vision model that extracts structured JSON from PDFs and images using JSON schemas. It achieves 90.2% field accuracy while processing multi-page documents in a single pass, making it one of the strongest self-hostable extraction models. Schema-constrained decoding reduces hallucinations and enables reliable document automation workflows.

◾ Build a healthcare appointment agent with Amazon Nova 2 Sonic: AWS has published a reference architecture for building healthcare appointment agents with Amazon Nova 2 Sonic and Bedrock AgentCore. The speech-to-speech AI authenticates patients, confirms or reschedules appointments, collects pre-visit information, and escalates to staff when needed. Built with serverless AWS services and healthcare-specific tools, it enables natural, low-latency voice interactions that can help reduce appointment no-shows at scale.
◾ Gradium Launches stt-translate and s2s-translate, Real-Time Speech Translation Models Beating gpt-realtime-translate on Accuracy and Latency: Gradium has launched stt-translate and s2s-translate, real-time speech translation models that combine transcription, translation, and speech output into a faster two-model pipeline. Supporting five languages and 20 pairs, the models claim stronger BLEU accuracy than GPT and Gemini alternatives, 3-second average latency, live browser streaming, and voice control, including cloning for multilingual meetings, agents, and dubbing.

◾ Open models, global networks: How AT&T and GSMA are accelerating innovation with Gemma: Google Cloud and GSMA have introduced Open Telco AI, an initiative built on Gemma models to bring domain-specific AI to telecom networks. Fine-tuned on specialized telecom data, the open OTel models outperform larger general-purpose models on network tasks while reducing hallucinations through RAG. The project aims to accelerate AI-driven network automation, self-healing systems, and telecom-grade AI adoption.

◾ How to Design an OpenHarness Style Agent Runtime with Tools, Memory, Permissions, Skills, and Multi-Agent Coordination: This tutorial breaks down how to build an OpenHarness-style agent runtime from scratch, exposing the full mechanics behind modern agent systems. It walks through tool schemas, permissions, lifecycle hooks, memory, skills, retries, cost tracking, context compaction, and multi-agent coordination, giving developers a runnable framework for understanding how agents reason, call tools, manage state, and complete tasks.

◾ Query logs and traces with SQL in Observability Analytics: Google Cloud has rebranded Log Analytics as Observability Analytics, adding GA support for SQL-based analysis of logs and traces in a unified workspace.
Developers can now join telemetry with business data to troubleshoot applications, optimize AI agents, and identify performance bottlenecks using BigQuery-powered SQL, while the new Observability API enables programmatic access for agentic workflows and automation.

◾ DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel for Up to 15x Higher Throughput on NVIDIA Blackwell: Researchers at UC San Diego have introduced DFlash, a speculative decoding method that generates entire token blocks in parallel instead of one token at a time. By combining lightweight diffusion drafting with autoregressive verification, DFlash delivers up to 6× faster lossless inference in research benchmarks, while NVIDIA reports up to 15× higher throughput on Blackwell GPUs for latency-sensitive AI workloads such as coding agents and reasoning models.

◾ Python UDF in BigQuery, now generally available: Google Cloud has announced the general availability of BigQuery Managed Python UDFs, enabling developers to run custom Python code and popular libraries like NumPy, pandas, and scikit-learn directly within BigQuery SQL. The serverless feature eliminates infrastructure management while supporting vectorized execution, configurable compute resources, external API integration, and production-grade monitoring for advanced analytics and machine learning workflows.

See you next time!



