GraphRAG vs RAG, Claude Mastery, LifeSciBench & 1M-Context Models
GraphRAG explained by a former JPMorgan VP • OpenAI's LifeSciBench • Vercel Eve • MiniMax-M3 • GLM-5.2 • Exclusive workshops.
👋 Hi there! Welcome to DataPro #174.

This week, we’re exploring a question that sits at the heart of enterprise AI: Why do RAG systems still hallucinate when the answer is already in the data? Former JPMorgan Chase VP of Data Science Bruno Gonçalves unpacks the retrieval challenges behind many RAG failures and explains why GraphRAG is gaining traction for multi-hop reasoning, corpus-wide analysis, and explainable AI.

If you're building AI applications today, these challenges will likely feel familiar. From unreliable retrieval and reasoning gaps to measuring AI ROI, scaling agents, and managing ever-growing context windows, organizations are moving beyond experimenting with AI and into the harder work of making it reliable, explainable, and production-ready. Many of the stories in this week's edition explore that transition from prototype to practice.

As part of our commitment to bringing practical, practitioner-led learning to the DataPro community, we’re also pleased to partner with GrowthSchool and Outskill. Together with AI communities around the world, we're helping surface high-quality learning opportunities, from free modules and workshops to expert-led training and hands-on sessions hosted by the Packt Virtual Conference team.

One such opportunity this week is a free workshop by Outskill on turning Claude into a 24/7 AI copilot, covering Skills, Connectors, Cowork, vibe coding, and workflow automation. It’s a practical, hands-on way to learn how experienced practitioners are integrating AI into their daily workflows. And if you can’t join live, every session is recorded and shared with registered attendees, so you can learn at your own pace and revisit the material whenever you need it.
Make Claude Your 2nd Brain That Works 24/7 by Mastering It in 16 Hours

This Week’s Highlights
🔬 OpenAI’s LifeSciBench Reveals How Far AI Still Is From Scientific Reasoning
🤖 Vercel Open-Sources Eve, the Agent Framework Behind 100+ Production AI Agents
📈 Google’s DORA Research Says AI ROI Starts With a Productivity Dip
⚡ Amazon SageMaker Async Inference Removes Mandatory S3 Uploads
🧠 MiniMax-M3 Launches With 1 Million Context and Sparse Attention Breakthroughs
🚀 GLM-5.2 Raises the Bar for Open-Source Agentic Coding Models
💻 Run a Powerful Coding Model Locally With Just 4.5GB of VRAM
Gemma4-12B-Coder combines verified reasoning traces and coding expertise in a lightweight, fully local package.

Let’s get into it.

Cheers,
Merlyn Shelley, Growth Lead, Packt.

Why RAG Hallucinates With the Answer in Hand
Written by Bruno Gonçalves, data scientist, educator, and former Vice President of Data Science and Finance at JPMorgan Chase.

Vector RAG has a strange failure pattern. The answer sits in your documents, and the retriever returns something plausible, yet the system still gets it wrong. The fault sits with retrieval scaffolding that cannot see structure, and it shows up in three predictable ways.

1. Multi-hop questions. Ask about the indirect suppliers of Company X. The retriever returns chunks that mention Company X, but the indirect suppliers live in documents that never name it. One document says firm A supplies firm B. Another says firm B supplies firm C. No single chunk holds the chain, so no similarity score will surface it. Cosine similarity has no notion of A → B → C. A knowledge graph stores those links as data: nodes represent entities and edges represent the relationships between them. With this formalism, the question is just a two-hop traversal that runs in milliseconds.

2. Global questions. “What are the main themes across these 500 documents?” Top-k retrieval grabs the ten chunks closest to the query and ignores the other ten thousand.
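As a minimal sketch of the two-hop supplier lookup from point 1, the snippet below stores each supply relationship as an edge and walks two hops back from Company X. The firm names, document IDs, and helper functions are hypothetical, and a plain dictionary stands in for a real graph store. It illustrates the idea only and is not the pipeline described in the article.

```python
from collections import defaultdict

# Hypothetical supply facts, one per source document: (supplier, customer, source).
# Note that doc_17 never mentions Company X, so a similarity search on
# "Company X" would have no reason to retrieve it.
FACTS = [
    ("Firm A", "Firm B", "doc_17"),
    ("Firm B", "Company X", "doc_42"),
    ("Firm D", "Company X", "doc_08"),
]


def build_graph(facts):
    """Map each customer to the (supplier, source) pairs that feed it."""
    suppliers_of = defaultdict(list)
    for supplier, customer, source in facts:
        suppliers_of[customer].append((supplier, source))
    return suppliers_of


def indirect_suppliers(graph, company):
    """Return two-hop suppliers of `company`, each with its evidence chain."""
    results = []
    for direct, direct_source in graph.get(company, []):
        for indirect, indirect_source in graph.get(direct, []):
            results.append({
                "supplier": indirect,
                "path": [indirect, direct, company],
                "sources": [indirect_source, direct_source],
            })
    return results


graph = build_graph(FACTS)
for hit in indirect_suppliers(graph, "Company X"):
    print(" -> ".join(hit["path"]), "| evidence:", hit["sources"])
```

Each result carries the path it followed and the documents behind every hop, so the answer can be traced back to its source text.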
The GraphRAG paper named the problem: global questions are summarization tasks, not retrieval tasks. The fix is to build a map first, then compute the answers from the map. Extract entities and relations from every chunk, cluster them into communities, summarize each community, then reduce those summaries into one response. On million-token test corpora, this beat vector RAG on both the comprehensiveness and the diversity of its answers.

3. Explainability. Vector RAG can show you a relevant chunk. It cannot show you a reason. A chunk might score 0.87 on query similarity, but that is the whole story. GraphRAG returns the chain of entities and relations behind the answer, hop by hop, back to the source text. In finance, healthcare, and law, that audit trail decides whether the system ships.

The thread running through all three failure modes is the same. Hallucination in RAG is rarely a generation problem. It is a retrieval problem wearing a generation costume. The model invents connections precisely where the retriever failed to supply them: across hops, across the whole corpus, and across the gap between an answer and its evidence. Hand the model real structure instead of a stack of similar-looking chunks, and the invented connections mostly disappear. The documents had the answer all along. The graph is what lets the system find it.

Full GraphRAG is expensive to index. The pipeline makes LLM calls for every chunk, every entity description, and every community summary. One practitioner account puts the price of indexing a single five-gigabyte legal dataset at $33,000 in early 2024. That number kept plenty of teams on plain vectors, whatever the quality argument said.

LazyGraphRAG attacked the cost in November 2024. It skips upfront summarization entirely, relying instead on noun-phrase extraction and co-occurrence statistics, with zero LLM calls, so it costs about the same as building a vector index, or about 0.1% of full GraphRAG.
The graph work shifts to query time, where an iterative search builds only the structure a given question needs. Answer quality matches GraphRAG global search at more than 700 times lower query cost, and the cost objection to graphs largely dissolved.

Want to Build a GraphRAG System?
Reading about GraphRAG is one thing. Building one is another. Start with [our walkthrough], where you’ll learn how to turn 2,000 news articles into a searchable knowledge graph.

Then take the next step on July 11 with our live Production GraphRAG Workshop, where you’ll build a complete GraphRAG chatbot from raw Wikipedia data in just 3.5 hours. You’ll learn entity and relationship extraction with spaCy and REBEL, graph construction with NetworkX, hybrid graph-plus-vector retrieval, and grounded generation with an LLM. You’ll also receive the recording, source code, slides, and a certificate, so you can revisit the material long after the session ends.

🎟️ Exclusive for DataPro readers: Save 35% on your ticket. Whether you’re exploring GraphRAG for the first time or looking to improve an existing RAG pipeline, this workshop will give you a practical framework for tackling the multi-hop questions traditional retrieval systems often miss. Basic Python and Docker knowledge is all you need. Seats are limited.

Data Science & ML Research Roundup

◾OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric: OpenAI has launched LifeSciBench, a new benchmark designed to test how well AI models perform real-world life sciences research, not just fact recall. Built by 173 PhD-level scientists, it includes 750 expert-authored tasks spanning 7 research workflows and 7 biological domains, supported by 1,062 artifacts and graded against 19,020 rubric criteria. Results show significant headroom: the top-performing model, GPT-Rosalind, achieved a 36.1% pass rate and 0.576 normalized score, while GPT-5.5 scored 25.7%.
Models struggled most with artifact-heavy tasks, experimental design, and precise sequence generation, highlighting how far AI remains from matching expert scientific reasoning.

◾Vercel Releases Eve: An Open-Source AI Agent Framework Where Each Agent is a Directory of Files Mapped to Capabilities: Vercel has open-sourced eve, the AI agent framework powering 100+ production agents internally. Built around a filesystem-first approach, each agent is simply a directory where files map to capabilities like tools, skills, channels, schedules, and subagents. The framework ships with built-in durability, sandboxed execution, human approvals, secure MCP/OpenAPI connections, multi-channel deployment, and observability. Vercel reports real-world adoption at scale: d0 handles 30,000+ data queries monthly, Vertex autonomously resolves 92% of support tickets, and its autonomous SDR delivers a reported 32× ROI. Agents can be scaffolded with a single command and deployed unchanged from local development to production.

◾How to measure the business value of generative AI: Google’s latest DORA research on AI-assisted software development argues that proving AI ROI requires more than measuring productivity gains. The report highlights a “J-curve” effect, where teams often experience an initial productivity dip due to learning new workflows, increased code-review demands, and bottlenecks in testing and approvals. While 90% of surveyed developers now use AI at work, financial outcomes vary widely, with successful organizations investing in workflow and cultural changes alongside tooling. DORA recommends building explicit ROI models that account for both visible costs and hidden adoption challenges, using frameworks and calculators that link AI investments to productivity, security, developer experience, and business growth.
◾Amazon SageMaker AI Async Inference now supports inline request payloads: Amazon SageMaker AI has added inline payload support for Async Inference, allowing developers to send request data directly through the new Body parameter instead of first uploading inputs to Amazon S3. The feature supports payloads up to 128 KB, eliminates an extra network round-trip, removes S3 upload costs and IAM dependencies, and simplifies client code while maintaining existing output behavior through S3. Designed to be backward compatible, it works with existing async endpoints without model or container changes and is available across 31 AWS commercial regions. AWS recommends inline payloads for smaller JSON and structured-data workloads, while larger inputs such as images and audio should continue using S3-based InputLocation workflows.

◾MiniMaxAI/MiniMax-M3: MiniMax has released MiniMax-M3, a native multimodal foundation model with a massive 1 million-token context window, 428B total parameters, and 23B activated parameters. Unlike models that add multimodal capabilities later, M3 is trained on text, images, and video from the first training step, enabling deeper cross-modal reasoning. Its new MiniMax Sparse Attention (MSA) architecture dramatically improves long-context efficiency, delivering 9× faster prefilling, 15× faster decoding, and reducing per-token compute costs to 1/20th of its predecessor (M2) at 1M-token context. The model also targets advanced coding and agentic workflows, achieving frontier-level results on long-horizon coding and cowork benchmarks, and supports three reasoning modes (enabled, adaptive, and disabled) to balance accuracy, latency, and throughput. Since its release, the open-source model has already recorded 56,000+ monthly downloads on Hugging Face.

See you next time!


