The feature that broke your model wasn't the algorithm
Plus: Claude Sonnet 5, AlphaEvolve's 4× breakthrough, Bedrock patterns, and production AI architectures.
Social engineering is about manipulating people's emotions. Identify the susceptibilities that hackers use to exploit people.

This NINJIO Insights Report dives into the key emotional susceptibilities that make social engineering work and offers concrete steps that your security team can take to equip your workforce to resist cyberattacks.

👋 Hi there, welcome to DataPro #176.

As foundation models become increasingly commoditized, competitive advantage is shifting somewhere less glamorous but far more consequential: the quality of your data, features, and production systems. This week’s lead story explores exactly that. In an insightful conversation, Cornellius Yudha Wijaya, Chief Product Officer and AI practitioner, shares why the features that deliver the best offline metrics often become the first ones to fail in production. Interviewed by Vaideeshwari Roshan, he explains why point-in-time correctness, feature stability, and engineering discipline matter far more than squeezing out another percentage point of model accuracy.

This week’s highlights:
If you’re building machine learning systems that need to survive production, scale across cloud infrastructure, or keep pace with the rapidly evolving AI ecosystem, this edition is packed with ideas, architectures, and lessons worth borrowing. Let’s get into it.

Cheers,
Merlyn Shelley
Growth Lead, Packt

The Feature That Looked Perfect… Until It Reached Production

Feature engineering is often described as the secret sauce behind great machine learning models. But after reading through a conversation between my colleague Vaideeshwari Roshan and data scientist Cornellius Yudha Wijaya, I realized it isn’t really about feature engineering. It’s about engineering trust.

Cornell has spent more than seven years building machine learning systems in insurance and at AI startups, working with sensitive enterprise data where every prediction has business consequences. His perspective is refreshingly different from the feature engineering tutorials most of us consume. Rather than focusing on creating more features, he focuses on building features that survive production.

Below is an edited version of that conversation, along with a few observations that stood out to me as I read through it.

🎙️ Behind the Conversation
Interviewer: Vaideeshwari Roshan, Portfolio Manager, Packt
Expert: Cornellius Yudha Wijaya, Chief Product Officer, Data Scientist, and Author

“Feature engineering isn’t about creating more features.”

One of the first things that struck me was how quickly Cornell shifted the conversation away from algorithms. Many of us instinctively think about machine learning in terms of model selection. Should we use XGBoost? CatBoost? Neural networks? Cornell’s answer reminded me that production ML teams often spend far more time debating the quality of their features than the sophistication of their models.

Vaideeshwari: You’ve worked on enterprise machine learning systems as well as AI products in startups. How has your perspective on feature engineering evolved?

Cornellius: Early in my career, I thought feature engineering was mostly about creating predictive variables from raw data. Over time, I realized that’s only one part of the job. Today, I think about whether a feature is reliable enough for production. Can it survive upstream data changes? Is it explainable to business stakeholders? Can another engineer understand and reproduce it? Those questions matter just as much as predictive performance.
✍️ Editor’s Reflection
That distinction feels increasingly relevant. As foundation models and AutoML continue to simplify model building, the competitive advantage is shifting elsewhere. The harder problem isn’t choosing an algorithm. It’s deciding which information your model should trust.

That philosophy became even clearer when Cornell described one of his largest production projects.

Building a churn prediction model where time was the biggest challenge

The project Cornell chose to discuss wasn’t unusual on paper. It was a customer churn prediction system built at Allianz Life Indonesia. The challenge wasn’t predicting churn. The challenge was making sure every prediction reflected what the business actually knew at that moment in time.

Vaideeshwari: What made this project particularly challenging from a feature engineering perspective?

Cornellius: The data came from multiple enterprise systems spanning different time periods. Every customer had to be represented at a specific reference date, and every feature had to be calculated only from information available before that point. That sounds simple, but it’s one of the easiest places to make mistakes. One incorrect join or timestamp can accidentally introduce future information into your feature set.

Vaideeshwari: You often emphasize “point-in-time correctness.” Why is it so important?

Cornellius: Because models should never learn from the future. If a feature accidentally contains information generated after your prediction date, validation results become misleading. The model appears much better than it really is. Then you deploy it, and suddenly performance drops because that future information no longer exists. Every feature should answer one question: “Would we have known this information at prediction time?” If the answer is no, the feature shouldn’t exist.

✍️ What surprised me most
Data leakage is something every ML engineer has heard about. But Cornell frames it differently.
He doesn’t describe it as a modeling mistake. He describes it as a feature engineering mistake. That’s an important mindset shift because it moves the discussion upstream, where these problems actually begin.

The most valuable features weren’t static

One misconception I had before reading this interview was assuming customer profiles would carry most of the predictive power. Instead, Cornell kept coming back to one word:
Vaideeshwari: Which features ultimately made the biggest difference?

Cornellius: Behavioral features consistently outperformed static customer attributes. Instead of describing who customers were, we focused on how they were changing. Rolling-window aggregates over the previous 7, 30, and 90 days helped capture recent activity. Trend features showed whether engagement was increasing or decreasing. Delta features highlighted meaningful behavioral shifts. For highly skewed financial variables, log transformations also improved robustness. Customers are constantly changing. Our features needed to reflect that.

✍️ Editor’s Reflection
That answer stayed with me. Good features don’t simply describe reality. They describe change. Whether you’re predicting churn, fraud, demand, or equipment failure, the strongest signal often isn’t the current state. It’s the direction of movement.

The feature that improved accuracy… and still got deleted

This was probably the most unexpected moment in the conversation. Cornell explained that his team deliberately removed features that improved model performance. Naturally, Vaideeshwari asked why.

Vaideeshwari: Why would you remove a feature that improves your validation metrics?

Cornellius: Because validation metrics aren’t the business objective. Some features looked extremely predictive because they contained subtle leakage. Others depended on business rules that changed frequently. A few simply weren’t stable over time. Keeping them improved offline accuracy. Removing them improved production reliability. Sometimes you have to trade a slightly better benchmark for a much better production system.
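Cornell’s point-in-time rule and the behavioral features he lists fit together naturally: every rolling window is measured backwards from the reference date, and nothing dated on or after that date is visible. The following is a minimal, stdlib-only sketch under stated assumptions, not the Allianz pipeline. The Event record, the feature names, and the exact delta and trend definitions are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    """Hypothetical activity record, e.g. one premium payment."""
    customer_id: str
    event_date: date
    amount: float


def _window(history, reference_date, start_days, end_days=0):
    """Amounts of events dated in [reference_date - start_days, reference_date - end_days)."""
    start = reference_date - timedelta(days=start_days)
    end = reference_date - timedelta(days=end_days)
    return [e.amount for e in history if start <= e.event_date < end]


def point_in_time_features(events, customer_id, reference_date, windows=(7, 30, 90)):
    """Behavioral features for one customer as of `reference_date`.

    Point-in-time rule: only events strictly before the reference date are
    visible, so nothing the business learned later can leak into training.
    """
    history = [
        e for e in events
        if e.customer_id == customer_id and e.event_date < reference_date
    ]

    features = {}
    # Rolling-window aggregates over the previous N days.
    for days in windows:
        amounts = _window(history, reference_date, days)
        total = sum(amounts)
        features[f"count_{days}d"] = len(amounts)
        features[f"sum_{days}d"] = total
        # log1p compresses heavy right skew (assumes non-negative amounts).
        features[f"log_sum_{days}d"] = math.log1p(total)

    # Delta: activity in the last 30 days minus the 30 days before that.
    recent = len(_window(history, reference_date, 30))
    previous = len(_window(history, reference_date, 60, 30))
    features["delta_count_30d"] = recent - previous

    # Trend: short-term daily rate minus long-term daily rate.
    # Positive means engagement is accelerating; negative means it is fading.
    short_rate = len(_window(history, reference_date, 7)) / 7
    long_rate = len(_window(history, reference_date, 90)) / 90
    features["trend_7d_vs_90d"] = short_rate - long_rate
    return features


# Usage with made-up events:
ref = date(2024, 6, 1)
events = [
    Event("c1", date(2024, 5, 30), 100.0),
    Event("c1", date(2024, 5, 20), 50.0),
    Event("c1", date(2024, 4, 15), 25.0),
    Event("c1", date(2024, 6, 1), 999.0),   # on the reference date: not visible
    Event("c1", date(2024, 6, 10), 500.0),  # in the future: not visible
    Event("c2", date(2024, 5, 31), 10.0),   # a different customer
]
feats = point_in_time_features(events, "c1", ref)
# feats["count_90d"] == 3 and feats["sum_90d"] == 175.0
```

The design choice that matters is filtering the history once, with a strict `<` against the reference date, before any aggregation runs; a single misplaced `<=` would let reference-day events leak into every feature. On larger tables the same rule is commonly expressed as an as-of join, for example with pandas’ `merge_asof`.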
Production doesn’t end with model deployment

One theme kept appearing throughout the interview. Feature engineering isn’t finished once the model is trained. In many ways, that’s when the real work begins.

Vaideeshwari: How did your team maintain feature quality after deployment?

Cornellius: Every feature was treated like a managed asset. We documented feature definitions, tracked version changes, maintained validation rules, and monitored data freshness, missing values, and business-rule consistency. When upstream systems changed unexpectedly, anomaly detection alerted us before the issue affected production predictions. Feature engineering is an ongoing operational process.

Vaideeshwari: Did richer features ever conflict with operational requirements?

Cornellius: Definitely. More sophisticated features often improve predictive performance. But they also increase computation costs, latency, and pipeline complexity. We had to balance model quality with operational SLAs and business value. The best feature isn’t always the most sophisticated one. It’s the one that delivers reliable value at scale.

Three lessons every ML engineer can borrow

Before wrapping up, Vaideeshwari asked Cornell what he hoped practitioners would remember most from his experience.

Vaideeshwari: If our readers remember only three lessons from this conversation, what should they be?

Cornellius:
1. Point-in-time correctness is non-negotiable. Never allow your model to learn from information it wouldn’t have during prediction.
2. Stability is a feature. Reliable, consistent features often outperform clever ones over the long term.
3. Every feature should support a business decision. Predictions become valuable only when they enable action.

Continue reading the full conversation on Packt’s Medium publication.
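The managed-asset approach Cornell describes (documented definitions, version tracking, validation rules, and monitoring of freshness and missing values) can be sketched as a small registry entry plus a per-batch check. This is a hypothetical, stdlib-only illustration, not his team’s tooling: FeatureSpec, check_feature, and every threshold are assumptions, and plain threshold rules stand in for the anomaly detection he mentions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical registry entry: a documented, versioned feature definition."""
    name: str
    version: int
    description: str
    max_null_rate: float        # share of missing values tolerated per batch
    max_staleness: timedelta    # how old the latest upstream update may be
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def check_feature(
    spec: FeatureSpec,
    values: Sequence[Optional[float]],
    last_updated: datetime,
    now: datetime,
) -> List[str]:
    """Return human-readable issues for one batch; an empty list means healthy."""
    label = f"{spec.name} v{spec.version}"
    if not values:
        return [f"{label}: empty batch"]

    issues = []

    # Missing values: an upstream change often shows up first as a spike in nulls.
    null_rate = sum(v is None for v in values) / len(values)
    if null_rate > spec.max_null_rate:
        issues.append(
            f"{label}: null rate {null_rate:.1%} exceeds {spec.max_null_rate:.1%}"
        )

    # Freshness: stale inputs silently turn into stale predictions.
    if now - last_updated > spec.max_staleness:
        issues.append(f"{label}: stale, last updated {last_updated.isoformat()}")

    # Validation rule: values must stay inside the documented range.
    out_of_range = [
        v for v in values
        if v is not None
        and ((spec.min_value is not None and v < spec.min_value)
             or (spec.max_value is not None and v > spec.max_value))
    ]
    if out_of_range:
        issues.append(
            f"{label}: {len(out_of_range)} value(s) outside "
            f"[{spec.min_value}, {spec.max_value}]"
        )
    return issues


# Usage with made-up numbers:
spec = FeatureSpec(
    name="log_sum_30d",
    version=2,
    description="log1p of payment amounts in the 30 days before the reference date",
    max_null_rate=0.05,
    max_staleness=timedelta(days=1),
    min_value=0.0,
)
now = datetime(2024, 6, 1, 9, 0)
healthy = check_feature(spec, [1.2, 3.4, 0.0, 5.1], now - timedelta(hours=3), now)
broken = check_feature(spec, [1.2, None, None, -4.0], now - timedelta(days=3), now)
```

Carrying the version in every message is a deliberate choice: when an alert fires, it helps separate an upstream data change from a newly deployed feature definition.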
Data Science & ML Research Roundup

◾ Gemini Enterprise Agent Platform remote MCP server: Google Cloud has introduced the Gemini Enterprise Agent Platform remote MCP server, giving developers a secure way to connect external AI agents and developer tools, such as Claude Code, to Google Cloud resources. Built on the open MCP standard, it enables access to Model Garden, prompt libraries, notebooks, and model management through a single governed interface, combining faster development with enterprise-grade security and IAM-based access controls.

◾ CUP (Common Useful Python): Building Reliable Python Workflows with Baidu’s Utility Toolkit. Baidu’s Common Useful Python (CUP) toolkit brings together production-ready utilities that simplify reliable Python development. From structured logging, configuration management, caching, and concurrency to ID generation, scheduling, networking, and resource monitoring, CUP provides a unified toolkit for building maintainable, scalable applications. The tutorial walks through practical workflows that help developers reduce boilerplate and improve the reliability of real-world Python systems.

◾ How Schrödinger sped up molecular discovery by 4x with AlphaEvolve: Schrödinger has accelerated molecular simulations by 4× using Google DeepMind’s AlphaEvolve, an evolutionary AI coding agent that optimizes performance-critical algorithms in machine-learned force fields (MLFFs). By replacing computational bottlenecks with AI-generated parallel implementations, the company significantly sped up model training and inference, enabling faster drug discovery, catalyst design, and materials research while paving the way for AI-optimized GPU kernels in scientific computing.

◾ Anthropic Claude Sonnet 5 vs Sonnet 4.6 vs Opus 4.8: Agentic Coding Benchmarks, API Pricing, and Cost-Performance Tradeoffs Compared. Anthropic has unveiled Claude Sonnet 5, its most capable mid-tier model yet, designed for long-running agentic workflows, autonomous coding, and tool use.
It outperforms Sonnet 4.6 across every published benchmark while narrowing the gap with Opus 4.8. With introductory API pricing of $2/$10 per million tokens, Sonnet 5 offers one of the strongest cost-to-performance ratios for AI coding, automation, and enterprise development workloads.

◾ Simplify multi-account access to Amazon Bedrock models with managed entitlements: Amazon Bedrock now supports managed entitlements, allowing enterprises to subscribe to third-party foundation models such as Anthropic Claude and Cohere once and securely distribute access across AWS accounts. By centralizing subscriptions with AWS License Manager, organizations simplify governance, streamline multi-account AI deployments, maintain consistent pricing, and eliminate the need for AWS Marketplace permissions in individual workload accounts.

◾ Implementing resilience patterns with Amazon Bedrock and LLM gateway: As generative AI moves into production, Amazon Bedrock provides a set of resilience patterns to keep LLM inference highly available, scalable, and cost efficient. The guide covers five production-ready approaches, including cross-Region inference, multi-account quota isolation, and intelligent request routing, helping organizations improve availability, handle traffic spikes, reduce throttling, and build reliable multi-model AI applications on AWS.

◾ How Outpost VFX Uses AWS to Accelerate AI Model Training for Visual Effects: Outpost VFX has accelerated AI-powered face replacement training by up to 8× using AWS multi-GPU P5 instances and PyTorch Distributed Data Parallel. By replacing single-GPU workflows with distributed training, the studio reduced model training from 1–2 weeks to just two days, enabling faster creative iteration, higher-quality outputs, and more efficient visual effects production at scale.

See you next time!




