gm legends, happy Thursday.
The AI that launched in 2024 promising to replace software engineers, the model benchmarker that started running agents itself, and the forum thread asking who's actually watching what the agents are doing.
Is this your brand on Milled? Claim it.
|
|||
gm legends, happy Thursday.
The AI that launched in 2024 promising to replace software engineers, the model benchmarker that started running agents itself, and the forum thread asking who's actually watching what the agents are doing.
Devin Desktop lets you run and manage multiple Devin agents from a single interface, with parallel sessions and unified oversight across your fleet. Cognition built it after walking back their original pitch: Devin launched in 2024 promising to replace software engineers, and their CEO now says agents shouldn't replace humans at all.
🔥 Our Take: Devin launched in 2024 as the AI that would replace software engineers. The demos were contested, the benchmarks disputed, the $500/mo price arrived to underwhelmed reviews. Cognition's CEO now says AI agents shouldn't replace humans. A full reversal of the launch pitch. Devin Desktop is a good product built by a team that got the story wrong the first time.
Arena Agent Mode brings autonomous AI agents to arena.ai. The same platform that built its reputation benchmarking and comparing frontier models now deploys them to browse, research, code, and complete real-world tasks.
🔥 Our Take: Arena has spent years telling you which AI model is best. Now they're charging you to run it. The grader became the executor. You're paying for their expertise. Or you're paying them to validate their own rankings. Those are not the same thing.
They kept using words like colleague, coworker, team member. One CEO called it the glue holding their e-commerce business together, which is a lot, but also… you see why. It lives in Slack and plugs into 3,000+ tools, so instead of jumping between tabs, you just ask for the thing. Pull Stripe against HubSpot, check Sentry alerts, spin up a campaign brief, build a landing page, send a report upstairs. It all happens there.
It has already hit top 5 on Product Hunt with 130 comments, is SOC 2 certified, and your data does not train models.One user said it was the first time AI felt like a real coworker, which is either exciting or slightly concerning depending on your week.
PawPause is a free, open-source macOS app that detects when a cat settles on your keyboard and pauses all input system-wide. Milad Safarzadeh, a designer at Toggl, built it on-device with no cloud, no account, no subscription.
🔥 Our Take: This week, AI raised $65 billion and engineers were figuring out how to manage fleets of autonomous agents. Also: a product designer at a time-tracking app shipped a free Mac utility that watches for a cat. The $0 price is the point.
Aadil Ghani (@aadilghani) posted from inside a frustrating loop: start a task in Claude Code, Cursor, or Codex, switch contexts, come back 30-40 minutes later, find the agent either done long ago or stuck waiting for you. He asked how people handle visibility across multiple agents running in parallel.
The replies fanned out. Tina Chhabra pushed the problem further: an agent booking meetings or sending emails without you knowing is scarier than one writing code. Beatriz Albernaz framed it as a security issue. Simran Kumar said they started treating every agent like a background service that has to notify on state change.
The line that cut through was Conrad N's: "The tools that win may not be the ones building better agents, but the ones helping humans supervise dozens of them effectively."
We send this email daily.
Feel free to switch to weekly or unsubscribe from these emails at any time.