Daily AI Brief · 2026-06-12

Models

New models, weights and benchmarks.

Mistral is rumored to be raising €3B at €20B valuation

The funding round would value the company at around €20 billion (about $23.15 billion), nearly double its Series C valuation of €11.7 billion.

Funding

TechCrunch AI · RSS · 13d ago · 71

Cheaper, faster, and culturally aware, Avataar’s video AI is built for India’s scale

Avataar AI's distilled video model is priced at $0.005 for every second of generation

Multimodal

Hugging Face · RSS · 12d ago · 70

olmo-eval: An evaluation workbench for the model development loop

Simon Willison · RSS · 13d ago · 64

Claude Fable is relentlessly proactive

After two days of experience with Claude Fable 5 I think the best way to describe it is relentlessly proactive. It knows a whole lot of tricks and it will deploy pretty much any of them to get to its goal. I'll illustrate this with an example. I was hacking on Datasette Agent today when I noticed a glitch: a horizontal scrollbar that shouldn't be there in the jump menu chat prompt. I snapped a screenshot, started a fresh claude session in my datasette-agent checkout, dragged in the screenshot and told it: "Look at dependencies to help figure out why there is a horizontal scrollbar here." I had a hunch the cause was in a dependency of Datasette Agent (likely Datasette itself) and I knew Fable was good at digging into dependency code, either by inspecting installed files in its own virtual environment site-packages or by referencing a local checkout on disk. Telling it to start with dependencies felt like a good bet. I got distracted by a domestic task and wandered away

On-device · Agents · Coding

Industry

Funding, policy and market moves.

TechCrunch AI · RSS · 12d ago · 71

SpaceX IPO: Live updates on everything you need to know

TechCrunch has followed SpaceX's start, struggles, and successes from the early days. And we're here for what happens next too. This package of SpaceX IPO coverage includes who stands to win (and maybe some who won't), pre-IPO deals, and what's tucked inside its S-1 registration document.

TechCrunch AI · RSS · 12d ago · 71

SpaceX IPO: Everything you need to know

TechCrunch AI · RSS · 13d ago · 71

Theker just raised $85M to build the factory robot that doesn’t specialize in anything

Unlike humanoid robots designed around a fixed form — think Boston Dynamics — Theker's machines are built to be reconfigured.

TechCrunch AI · RSS · 13d ago · 71

Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world

The new round values the physical AI startup that aims to automate heavy engineering and drug design at $41 billion.

Papers

Research worth a read.

Simon Willison · RSS · 12d ago · 64

Quoting Andrew Singleton

Jenny owns a crematorium. John’s propane company gives her a $20 billion investment in return for 5 percent of her operation. Jenny throws $10 billion into the incinerator, then pays John $10 billion to buy propane to burn that money to ashes. John reports that his AI investments have generated $10 billion in revenue this quarter and that he owns 5 percent of a $100 billion business. A reporter from Forbes is assigned to profile John and Jenny, and over the course of his research, he becomes embroiled in a passionate but confusing three-way love affair with them, which eventually turns into a polyamorous common-law marriage. His profile is glowing, but light on financial details. (Andrew Singleton, AI Economics for Dummies)

arXiv cs.CL · Paper · 13d ago · 61

Shopping Reasoning Bench: An Expert-Authored Benchmark for Multi-Turn Conversational Shopping Assistants

arXiv:2606.12608v1. Conversational shopping assistants now serve hundreds of millions of customers, yet no existing benchmark jointly evaluates the open-ended multi-turn reasoning, domain expertise, and criterion-level quality that real shopping conversations demand. Shopping reasoning is unique among language model applications. Unlike factual question answering or verifiable code generation, it requires balancing subjective preferences, budget constraints, and cross-product trade-offs across multi-turn dialogue, capabilities absent from previous e-commerce and general-purpose benchmarks. We introduce the Shopping Reasoning Bench, an expert-authored benchmark of 525 missions (232 single-turn, 293 multi-turn) with 10,863 importance-weighted binary rubrics authored by retail domain experts. These criteria are organized under a taxonomy of five reasoning categories and fifteen subcategories covering diverse demands such as preference refinement, trade-off anal

Reasoning · Coding

arXiv cs.AI · Paper · 13d ago · 61

ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

arXiv:2606.12451v1. Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encoders that may under-capture specialized tool semantics, parametric tool retrieval addresses this by encoding each tool as a virtual token appended to the LLM vocabulary, fine-tuned in two stages (memorization then retrieval SFT) to use the LLM as a retriever, achieving strong performance on standard ToolBench retrieval benchmarks. Yet these benchmarks use verbose, fully-specified queries, and their evaluation applies constrained decoding that restricts outputs to valid token paths; neither reveals whether the model actually understands its tools. We introduce ToolSense, an open-source LLM-powered diagnostic framework that takes any tool catalog as input and automatically generates three benchmarks: a Realistic Retrieval Benchmark (RRB) with queries at three ambig

Open source

arXiv cs.AI · Paper · 13d ago · 61

Arbor: Tree Search as a Cognition Layer for Autonomous Agents

arXiv:2606.12563v1. Arbor is a multi-agent framework that introduces structured tree search as a cognition layer for autonomous agents operating in large, stateful action spaces. Prior autonomous optimization systems operate on isolated targets with stateless evaluation. Arbor instead maintains an explicit search tree of scored hypotheses that serves as the shared working memory across agents, evolving with every measurement, treating failures as diagnostic signal that reshapes subsequent exploration, and expanding as prior successes shift the bottleneck distribution. We validate Arbor on full-stack LLM inference optimization, a domain where achieving peak performance has historically required coordinated effort from engineering teams across the application, framework, compiler, kernel, and hardware stack. Arbor pairs an Orchestrator agent, which drives optimization by delegating to Domain Specialists across the inference stack, with a Critic agent that s

Reasoning · Agents

arXiv cs.AI · Paper · 13d ago · 61

Strategic Decision Support for AI Agents

arXiv:2606.12587v1. Traditionally, decision support studies how humans use machine learning models to make better decisions. In modern agentic systems, this division of roles is increasingly reversed: AI agents act on behalf of users, while humans and tools become support mechanisms around them. This role reversal brings reliability concerns to the forefront, since agentic errors can be consequential and agent behavior must remain aligned with human goals and constraints. Departing from the classical view of decision support, we revisit its two basic principles, the cost-value tradeoff of seeking support and the role of uncertainty quantification, in a setting where AI agents are the central actors. We propose a framework for strategic decision support for AI agents through an optimization problem that minimizes support usage subject to controlling a counterfactual missed-support error: the probability that the agent acts alone on instances where support

Agents

Big Tech

What the major labs and platforms shipped.

OpenAI Blog · RSS · 12d ago · 79

New OpenAI Academy courses for the next era of work

OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work.

OpenAI Blog · RSS · 13d ago · 79

How Preply combines AI and human tutors to personalize learning

Preply uses OpenAI to launch AI-generated lesson summaries, providing personalised feedback and language learning exercises.

TechCrunch AI · RSS · 12d ago · 71

Chinese cybercrime operation that used AI to scam ‘hundreds of thousands of victims’ sued by Google

The tech giant said a group called "Outsider Enterprise" used AI to scam hundreds of thousands of victims, sending 2.5 million text messages over a span of two weeks.

TechCrunch AI · RSS · 12d ago · 71

Google sues alleged Chinese cybercrime operation that used AI to send scam texts

The tech giant said a group called "Outsider Enterprise" used AI to scam hundreds of thousands of victims, sending 2.5 million text messages over a span of two weeks.

TechCrunch AI · RSS · 12d ago · 71

SpaceX, Anthropic, and OpenAI’s hot IPO summer

The IPO market is back, and it’s not the same companies leading the charge. FAANG had a good run, but a new acronym is taking over: MANGOS — Meta (or Microsoft, depending on who you ask), Anthropic, Nvidia, Google, OpenAI, and SpaceX. Half of that bunch is heading to public markets in the same window, and it’s a stress test for investors, for valuations, and for […]
