Daily AI Brief · 2026-06-11

Models

New models, weights and benchmarks.

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

datasette 1.0a33

Release: datasette 1.0a33. This alpha is a significant step on the road to a stable 1.0, finally extending the ?_extra= pattern I introduced in Datasette 1.0a3 to cover queries and rows in addition to tables. That pattern is also now documented! I wrote a whole lot more about the new release on the Datasette project blog, in "Datasette 1.0a33 with JSON extras in the API". Because API explorer tools are almost free to build now, I had Claude Fable 5 in Claude Code (for the plan) and GPT-5.5 xhigh in Codex Desktop (for the implementation) build me a custom extras API explorer to help demonstrate the feature.

Tags: projects, datasette, annotated-release-notes, ai-assisted-programming
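To make the ?_extra= pattern a little more concrete, here is a minimal sketch of how a client might build such request URLs. The host, database, table and row paths are hypothetical, the extra names (count, columns) are only illustrative, and passing one _extra parameter per extra is an assumption about how the API accepts them, so check the documented pattern before relying on it.

```python
from urllib.parse import urlencode


def extras_url(base_url, path, extras):
    """Build a JSON API URL that asks for the named extras via ?_extra=."""
    # One _extra=name pair per requested extra (assumed to be accepted);
    # urlencode takes care of escaping.
    query = urlencode([("_extra", name) for name in extras])
    return f"{base_url.rstrip('/')}/{path.strip('/')}.json?{query}"


# Hypothetical local instance, database and table names.
table_url = extras_url("http://localhost:8001", "mydb/mytable", ["count", "columns"])
row_url = extras_url("http://localhost:8001", "mydb/mytable/1", ["columns"])

print(table_url)
print(row_url)
```

The same helper is used for a table and for a row here only to illustrate that the release extends the pattern beyond tables; which extras each endpoint supports is not stated in this brief.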

Programming

Simon Willison · RSS · 14d ago · 64

datasette-agent 0.2a0

Release: datasette-agent 0.2a0. Highlights from the release notes:

Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext object, and await context.ask_user(...) can ask a yes/no, multiple-choice (options=[...]) or free-text (free_text=True) question. While a question is unanswered the agent turn suspends: the question renders as a form in the chat UI and persists to the internal database, so suspended conversations survive a server restart. Once answered, the tool re-executes from the top with stored answers replayed, so call ask_user() before performing side effects. #20

New built-in save_query tool: the agent can save SQL it has written as a Datasette stored query. Saving always requires human approval - the agent shows the full SQL plus the proposed name, database and visibility, and nothing is stored until you click Yes. #20

The ask_user() feature was enabled by the new LLM alpha I built yesterday with the help o
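The suspend-and-replay behaviour described in these release notes can be illustrated with a small stand-alone sketch. This is not datasette-agent's implementation: SketchToolContext, NeedsAnswer, run_turn and delete_rows_tool are all hypothetical names invented here, and the stored answers live in a plain list rather than a database. It only shows why a tool that re-executes from the top should ask its questions before doing anything with side effects.

```python
import asyncio


class NeedsAnswer(Exception):
    """Raised to suspend the tool until the user answers a question."""

    def __init__(self, question):
        super().__init__(question)
        self.question = question


class SketchToolContext:
    """Hypothetical stand-in for a tool context; replays stored answers in order."""

    def __init__(self, stored_answers):
        self._answers = list(stored_answers)
        self._cursor = 0

    async def ask_user(self, question, options=None, free_text=False):
        # Replay an earlier answer if one exists for this call position.
        if self._cursor < len(self._answers):
            answer = self._answers[self._cursor]
            self._cursor += 1
            return answer
        # Otherwise suspend: the caller would persist the question and wait.
        raise NeedsAnswer(question)


side_effects = []


async def delete_rows_tool(context):
    # Ask first, so a suspended run has not changed anything yet.
    confirmed = await context.ask_user("Delete 3 rows?")
    if confirmed:
        side_effects.append("deleted")
    return confirmed


def run_turn(tool, stored_answers):
    """Run the tool from the top; report whether it finished or suspended."""
    try:
        return ("done", asyncio.run(tool(SketchToolContext(stored_answers))))
    except NeedsAnswer as pending:
        return ("suspended", pending.question)


first = run_turn(delete_rows_tool, [])       # no answer stored yet
second = run_turn(delete_rows_tool, [True])  # re-run with the answer replayed
print(first, second, side_effects)
```

In this sketch the first run suspends before touching side_effects, and only the second run, which replays the stored answer, performs the deletion once. Had the side effect come before ask_user(), it would have happened on both runs.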

Agents

Simon Willison · RSS · 13d ago

asyncinject 0.7

Release: asyncinject 0.7. I built this utility library a few years ago to support an asyncio dependency injection pattern. I was using it with Datasette, and Claude Fable 5 spotted some bugs in the dependency, which it then fixed for me. It's a very proactive model!

Tags: async, projects, python, claude-mythos
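For readers unfamiliar with the idea, an asyncio dependency injection pattern can be sketched with the standard library alone. The code below is not asyncinject's API or implementation; resolve and the example functions are hypothetical. It assumes the pattern in its simplest form: each async function names its dependencies as parameters, and a resolver runs them concurrently, sharing any dependency that several functions need.

```python
import asyncio
import inspect


async def resolve(target, registry, cache=None):
    """Call target, first resolving each parameter name via registry."""
    cache = {} if cache is None else cache
    names = list(inspect.signature(target).parameters)

    async def get(name):
        # Store a task per dependency so shared dependencies run only once.
        if name not in cache:
            cache[name] = asyncio.ensure_future(
                resolve(registry[name], registry, cache)
            )
        return await cache[name]

    # Independent dependencies are awaited concurrently.
    values = await asyncio.gather(*(get(name) for name in names))
    return await target(**dict(zip(names, values)))


calls = []


async def user_id():
    calls.append("user_id")
    return 7


async def profile(user_id):
    return {"id": user_id}


async def permissions(user_id):
    return ["read"] if user_id == 7 else []


async def page(profile, permissions):
    return {"profile": profile, "permissions": permissions}


registry = {fn.__name__: fn for fn in (user_id, profile, permissions)}
result = asyncio.run(resolve(page, registry))
print(result, calls)
```

Here profile and permissions both depend on user_id, which runs a single time because its task is cached and awaited by both.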

Products

Product launches and noteworthy updates.

TechCrunch AI · RSS · 13d ago · 71

Pool’s new app turns your screenshots into something useful

Pool's new app automatically sorts screenshots into personalized collections, tracks down the original links behind saved content, and helps you rediscover products, recipes, travel ideas, and other things you meant to revisit.

TechCrunch AI · RSS · 13d ago · 71

DoorDash’s new AI chatbot lets you order with prompts and photos

The new chatbot, called Ask DoorDash, allows users to search the app for what they're looking for in their own words instead of having to scroll through restaurants and stores to build a cart.

Industry

Funding, policy and market moves.

TechCrunch AI · RSS · 13d ago · 71

SpaceX officially prices shares at $135 in the largest IPO ever

With its official share pricing announcement, SpaceX's IPO has begun.

TechCrunch AI · RSS · 13d ago · 71

SpaceX SPV investors won’t know their true holdings until post-IPO lock-ups lift

After SpaceX makes its public debut, lower-tier SPV investors face hidden fees, lengthy payout delays, and the risk of outright fraud.

TechCrunch AI · RSS · 14d ago · 71

Opendoor’s India exit is fueling a bigger conversation about AI and outsourcing

The decision comes as India emerges as the world’s largest GCC market.

Papers

Research worth a read.

OpenAI Blog · RSS · 14d ago · 79

How an astrophysicist uses Codex to help simulate black holes

Discover how astrophysicist Chi-kwan Chan uses Codex to build black hole simulations, helping scientists study extreme physics and test Einstein’s theory of general relativity.

arXiv cs.LG · Paper · 14d ago · 61

A prior-free blind detection of information leakage from model predictions

arXiv:2606.11267v1 Announce Type: new Abstract: Data leakage -- contamination of a model with information unavailable at baseline -- is the dominant reproducibility failure in machine-learning-based science, yet detection tools require training code, external data, or domain expertise. None operates on the artifact an auditor most often holds: the model's output. We ask what can be decided about leakage from predictions and outcomes alone. We give a decision-theoretic framework in which leakage diagnostics are functionals of the predicted-risk/outcome law, parameterized by a threshold-weighting linked to proper scoring rules and decision-curve analysis. We prove a sharp impossibility: a recalibrated leak matching an honest model's calibration and discrimination is indistinguishable from honest performance by any function of the predictions, so the broad class is detectable only against an externally supplied ceiling on achievable discrimination. We then prove what leakage canno

Programming

arXiv cs.LG · Paper · 14d ago · 61

Restless bandits with imperfect binary feedback: PCL-indexability analysis and computation

arXiv:2606.11192v1 Announce Type: new Abstract: We study restless bandits with binary latent states and imperfect binary feedback, motivated by opportunistic spectrum access with sensing errors. For the associated belief-state model, we develop a partial conservation laws (PCL)-based analytical and computational framework for establishing indexability and evaluating the Whittle index, building on a verification theorem for real-state discounted restless bandits. The framework analyzes the stochastic dynamics via an associated deterministic skeleton, renewal decompositions, and combinatorics on words. It yields tractable expressions for discounted reward and resource metrics in several threshold regimes, enabling full verification of the PCL-indexability conditions there. For the remaining regime, where a complete analytic verification is not achieved in this paper, we derive efficient numerical schemes for computing the relevant marginal metrics and the marginal productivity (MP) inde

arXiv cs.LG · Paper · 14d ago · 61

To Intervene or Not: Guiding Inference-time Alignment with Probabilistic Model Blending

arXiv:2606.11201v1 Announce Type: new Abstract: The wide deployment of LLMs has made model alignment necessary to make newly trained models safely and effectively respond to user instructions. Among different methods, inference-time alignment is often cheaper as it intervenes (i.e., offers guidances) only during output generation. Existing proposals apply guidances extracted from certain aligned models without properly assessing their reliability. Nonetheless, our systematic evaluation reveals that guidance effectiveness varies drastically across models; since ineffective guidances lead to further confusion and thus further interventions, the resulting excessive interventions typically indicate poor performance. To make interventions more effective and thus more efficient, we introduce BlendIn, an inference-time alignment framework that shifts from binary decisions to creating hybrid distributions integrating both models' knowledge. BlendIn stabilizes inference-time alignment by perfo

Inference safety

arXiv cs.LG · Paper · 14d ago · 61

Dual-Stance Evaluation of Sycophancy: The Structure of Agreement and the Limits of Intervention

arXiv:2606.11205v1 Announce Type: new Abstract: Activation steering can shift LLM behaviour, but standard evaluations do not typically test whether a sycophancy-reduction direction also suppresses agreement with factually correct statements. We introduce dual-stance evaluation, which tests both stances of each topic, and apply it to centroid-difference steering on Llama-3-8B-Instruct. We find a dissociation: the model represents sycophantic and factual agreement in geometrically distinct subspaces, yet the steering direction projects equally onto both and cannot differentially target either. The direction accordingly reduces agreement with factually correct statements (e.g. that the Earth is round) as well as sycophantic ones. All other static properties of the two activation groups are matched, suggesting the behavioural dissociation arises from generation dynamics or from finer-grained structure that residual-stream analysis cannot resolve. The pattern illustrates a general gap: rep

Big Tech

What the major labs and platforms shipped.

OpenAI Blog · RSS · 14d ago · 79

Supporting Europe’s work in ensuring a trustworthy AI ecosystem

OpenAI supports the EU Code of Practice on AI content transparency, advancing provenance standards and tools to help people understand AI-generated content.

Programming

Google AI Blog · RSS · 13d ago · 75

Our new community investments in Virginia support local jobs and expand energy affordability.

We’re helping build the state’s next-generation workforce and investing in energy programs.

On-device

OpenAI Blog · RSS · 14d ago · 72

OpenAI to acquire Ona

OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows.

OpenAI Blog · RSS · 14d ago · 72

BBVA puts AI at the core of banking with OpenAI

Learn how BBVA scaled ChatGPT Enterprise to 100,000 employees and partnered with OpenAI to accelerate AI-powered banking transformation worldwide.

TechCrunch AI · RSS · 13d ago · 71

Deezer’s new tool can identify AI music from Spotify, Apple Music, and others

Deezer introduced a tool that scans playlists from Spotify, Apple Music, and other platforms to identify AI music.
