Daily AI Brief · 2026-06-19

Products

Product launches and noteworthy updates.

TechCrunch AI · RSS · 5d ago · 71

Billionaire Ambani wants AI in every call, app, and home

Reliance is weaving AI into telecom services used by more than 500 million people.

TechCrunch AI · RSS · 5d ago · 71

The US says ASML’s top chip tool may be in China. ASML says it isn’t

There's a commercial logic that cuts against the idea that ASML would risk its export license to arm a Chinese customer.

Simon Willison · RSS · 6d ago · 64

Datasette Apps: Host custom HTML applications inside Datasette

Today we launched a new plugin for Datasette, datasette-apps, with a launch announcement post on the Datasette project blog. That post covers the what; I'm going to expand on it a little here to provide the why. The TL;DR: Datasette Apps are self-contained HTML+JavaScript applications that run in a tightly constrained sandbox hosted on your Datasette application. They can use JavaScript to run read-only SQL queries against data in Datasette, and they can run write queries too if you configure them with stored queries. Here's a very simple example and a more complex custom timeline example. Apps are allowed to run JavaScript and render HTML and CSS, but their access is limited: the sandbox they run in prevents them from accessing cookies or localStorage, and an injected CSP header (thanks to this research) prevents them from making HTTP requests to outside hosts, so a malicious or buggy app cannot exfiltrate priv…
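The two restrictions described above (no cookies or localStorage, no requests to outside hosts) map onto two standard browser mechanisms: an iframe `sandbox` attribute that omits `allow-same-origin`, which gives the frame an opaque origin, and a Content-Security-Policy whose `connect-src` names only the allowed host. The sketch below shows that general pattern in Python. It is not the datasette-apps implementation; `build_csp`, `sandboxed_iframe`, and the chosen directives are hypothetical.

```python
# Sketch of the general technique (sandboxed iframe + injected CSP).
# NOT the datasette-apps implementation; all names here are hypothetical.
from html import escape


def build_csp(datasette_origin: str) -> str:
    """Return a CSP that allows inline script/style but only lets the app
    make network requests (fetch/XHR/WebSocket) to a single origin."""
    directives = {
        "default-src": "'none'",          # block anything not listed below
        "script-src": "'unsafe-inline'",  # the app's own inline JavaScript
        "style-src": "'unsafe-inline'",   # the app's own inline CSS
        "img-src": "data:",               # inline images only
        "connect-src": datasette_origin,  # no requests to outside hosts
    }
    return "; ".join(f"{name} {value}" for name, value in directives.items())


def sandboxed_iframe(app_html: str, datasette_origin: str) -> str:
    """Embed untrusted app HTML in an iframe whose sandbox omits
    allow-same-origin, so the frame gets an opaque origin and cannot
    read the host page's cookies or localStorage."""
    csp = escape(build_csp(datasette_origin))
    document = (
        "<!doctype html><html><head>"
        f'<meta http-equiv="Content-Security-Policy" content="{csp}">'
        f"</head><body>{app_html}</body></html>"
    )
    # srcdoc is itself an HTML attribute, so the whole document is escaped again.
    return f'<iframe sandbox="allow-scripts" srcdoc="{escape(document)}"></iframe>'


if __name__ == "__main__":
    print(sandboxed_iframe("<h1>Hello</h1>", "https://datasette.example"))
```

Because the frame's origin is opaque, any request it makes to the Datasette host is cross-origin, so a real system would also need CORS or a `postMessage` bridge to the parent page; how datasette-apps handles that is not covered here.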

Industry

Funding, policy and market moves.

TechCrunch AI · RSS · 5d ago · 71

The CEO of Allbirds’ new AI biz has a plan, but no employees

Call it a startup with a sole founder and a very large seed round, but what's next is less clear.

TechCrunch AI · RSS · 6d ago · 71

Source: Elastic agrees to buy CRV-backed DeductiveAI for up to $85M

DeductiveAI, a startup that uses AI to catch and resolve bugs in software, was founded just three years ago.

Papers

Research worth a read.

arXiv cs.LG · Paper · 6d ago · 61

Zero-Inflated Gaussian Distributions Enable Parameter-Space Sparsity in Estimation-of-Distribution Algorithms

arXiv:2606.19369v1 · Abstract: Estimation-of-distribution algorithms (EDAs) are a powerful class of evolutionary methods for black-box optimization, especially when little is known about the structure of the objective. Whereas classical evolutionary algorithms rely on hand-designed mutation and crossover operators, which are hard to devise for unknown problem structures and are a source of bias, EDAs sidestep operator design entirely: they fit a probability distribution to the best individuals and sample the next generation from it. EDAs are well established on continuous parameter spaces, but they have not previously been generalized to sparse ones, in which most coefficients of a good solution are exactly zero. Existing sparse black-box optimizers therefore reintroduce exactly what EDAs were designed to avoid: hand-crafted sparsity operators, bi-level schemes alternating between the support set and the active values, zeroing thresholds, and other baked-in assumptions. We close this gap…
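The abstract is cut off before the method, but the title suggests the core idea: model each coordinate as a mixture of a point mass at exactly zero and a Gaussian, so sparsity is learned from the elite set instead of being imposed by a zeroing operator. The Python sketch below is a generic illustration of that idea under stated assumptions, not the paper's algorithm; the clipping bounds `P_MIN`/`P_MAX`, the fallback `sigma`, and all function names are hypothetical.

```python
# Generic zero-inflated Gaussian EDA sketch, NOT the paper's algorithm.
# P_MIN/P_MAX and the sigma fallbacks are hypothetical choices.
import random
import statistics

P_MIN, P_MAX = 0.05, 0.95  # keep some exploration: no coordinate freezes at zero/nonzero


def fit_zig(elites):
    """Per coordinate, fit P(x_i == 0) plus a Gaussian over the nonzero values."""
    params = []
    for i in range(len(elites[0])):
        nonzero = [x[i] for x in elites if x[i] != 0.0]
        p_zero = 1.0 - len(nonzero) / len(elites)
        if len(nonzero) >= 2:
            mu, sigma = statistics.fmean(nonzero), statistics.pstdev(nonzero)
        elif nonzero:
            mu, sigma = nonzero[0], 1.0
        else:
            mu, sigma = 0.0, 1.0
        params.append((min(max(p_zero, P_MIN), P_MAX), mu, max(sigma, 1e-3)))
    return params


def sample_zig(params, rng):
    """Draw one individual: exactly zero with prob p, else Gaussian(mu, sigma)."""
    return [0.0 if rng.random() < p else rng.gauss(mu, sigma) for p, mu, sigma in params]


def zig_eda(objective, dim, pop_size=100, n_elite=20, generations=60, seed=0):
    """Minimize `objective` by repeatedly refitting the model to the elite."""
    rng = random.Random(seed)
    params = [(0.5, 0.0, 2.0)] * dim  # start undecided about sparsity
    best = None
    for _ in range(generations):
        population = sorted((sample_zig(params, rng) for _ in range(pop_size)), key=objective)
        if best is None or objective(population[0]) < objective(best):
            best = population[0]
        params = fit_zig(population[:n_elite])  # truncation selection, then refit
    return best


if __name__ == "__main__":
    target = [3.0, 0, 0, 0, 0, -2.0, 0, 0, 0, 0]  # sparse optimum
    sphere = lambda x: sum((a - b) ** 2 for a, b in zip(x, target))
    best = zig_eda(sphere, dim=len(target))
    print([round(v, 2) for v in best], round(sphere(best), 4))
```

Note that the zeros in `best` are produced by sampling from the point mass, not by thresholding small values, which is the property the abstract contrasts with hand-crafted sparsity operators.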

arXiv cs.CL · Paper · 6d ago · 61

Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling

arXiv:2606.19354v1 · Abstract: Test-time scaling (TTS) has emerged as a powerful paradigm for improving the reasoning performance of large language models (LLMs) by investing additional compute at inference time. A central component of TTS is the verifier, which selects or scores candidate solutions to guide the search process. While prior work has explored the benefit of verification, a fundamental question remains underexplored: what is the optimal granularity of verification under a given compute budget? Coarse-grained outcome reward models (ORMs) and fine-grained process reward models (PRMs) represent two extremes, yet neither alone achieves compute-optimality across all regimes. In this paper, we establish a unified theoretical framework, called GRACE (Granularity-Regulated Adaptive Computational Efficiency), that characterizes the optimal verification granularity as an explicit fu…
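The budget tension behind the question "what is the optimal granularity of verification?" can be shown with simple arithmetic: checking more often costs more per candidate, which leaves fewer candidates under a fixed budget. The Python sketch below is a toy cost model, not the GRACE framework (whose formula is cut off in the abstract); `affordable_candidates` and all cost units are hypothetical.

```python
# Toy compute-accounting sketch, NOT the GRACE framework. It only shows how
# verification granularity trades off against the number of affordable
# candidates. All costs are made-up units.
import math


def affordable_candidates(budget, gen_cost, verify_cost, steps, granularity):
    """Candidates that fit in `budget` if each is generated once and
    verified every `granularity` reasoning steps.

    granularity == steps -> one outcome-level check (ORM-like)
    granularity == 1     -> one check per step (PRM-like)
    """
    if not 1 <= granularity <= steps:
        raise ValueError("granularity must be between 1 and steps")
    checks = math.ceil(steps / granularity)
    return budget // (gen_cost + checks * verify_cost)


if __name__ == "__main__":
    for g in (10, 5, 2, 1):
        print(g, affordable_candidates(1000, 50, 10, 10, g))
```

With these made-up numbers the count drops from 16 candidates (one check each) to 6 (a check at every step). The toy ignores the benefit that finer checks can prune bad partial solutions early; weighing that against the lost candidates is the trade-off the paper sets out to characterize.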

Reasoning

arXiv cs.CL · Paper · 6d ago · 61

Trustworthy Multi-Agent Systems: Mitigating Semantic Drift with the Argent Signaling Protocol

arXiv:2606.19356v1 · Abstract: When multi-agent LLM systems produce bad answers, not all failures are equal: some answers are grounded in the right material but incomplete, while others are simply ungrounded and should be stopped. Current retry strategies treat both cases identically (try again and hope for the best), leaving human supervisors unable to tell whether a retry was warranted or whether the system should have halted instead. We introduce the Argent Signaling Protocol (ASP), a compact machine-readable header that accompanies every AI-generated response with structured quality signals: certainty (@C), grounding (@G), stochasticity (@S), and an assumption index that classifies the evidentiary basis of each claim. These signals enable a controller to distinguish repairable failures from containment failures and route each case differently. We evaluate ASP in two modes. In standalone mode, a 27-question document-grounded QA benchmark over the Array BioPharm…
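The routing idea in the abstract (retry grounded-but-incomplete answers, halt ungrounded ones) can be sketched as a small controller over the three numeric signals. The Python below assumes a hypothetical header syntax (`@C:0.9 @G:0.8 @S:0.1`), hypothetical thresholds, and uses low certainty as a stand-in for "incomplete"; none of this is the protocol's actual wire format or policy, and the assumption index is omitted.

```python
# Hypothetical controller for ASP-style signals. The header syntax, the
# thresholds, and the certainty-as-completeness proxy are all assumptions.
import re

SIGNAL_RE = re.compile(r"@([CGS]):([0-9]*\.?[0-9]+)")


def parse_header(header: str) -> dict:
    """Extract certainty (C), grounding (G), and stochasticity (S) scores."""
    signals = {key: float(value) for key, value in SIGNAL_RE.findall(header)}
    missing = {"C", "G", "S"} - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return signals


def route(signals: dict, grounding_min: float = 0.5, certainty_min: float = 0.7) -> str:
    """Ungrounded -> containment failure (halt); grounded but low
    certainty -> repairable failure (retry); otherwise accept."""
    if signals["G"] < grounding_min:
        return "halt"
    if signals["C"] < certainty_min:
        return "retry"
    return "accept"


if __name__ == "__main__":
    for header in ("@C:0.9 @G:0.8 @S:0.1", "@C:0.4 @G:0.8 @S:0.1", "@C:0.9 @G:0.2 @S:0.1"):
        print(header, "->", route(parse_header(header)))
```

Checking grounding before certainty encodes the abstract's priority: an ungrounded answer should be stopped regardless of how confident it looks.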

Agents

arXiv cs.CL · Paper · 6d ago · 61

Characterizing Narrative Content in Web-scale LLM Pretraining Data

arXiv:2606.19468v1 · Abstract: The narrative composition of web-scale LLM pretraining corpora remains largely unexplored, even though narrative is a fundamental mode of human communication. We present the first fine-grained study of narrative features in Dolma, a 3-trillion-token open pretraining corpus. Drawing on narrative theory, we design a framework spanning three core narrative elements (agency, setting, and events), operationalized as 11 interpretable dimensions. After sampling and annotating a diverse set of 400 passages, we fine-tune and validate NarraBERT, a RoBERTa-based model for fine-grained narrative prediction. We apply NarraBERT to 3M passages, producing a new dataset, NarraDolma. We find that (i) narrative structure is measurable at scale across extremely heterogeneous data, (ii) a continuous, multidimensional narrative structure underlies web text, and (iii) narrative qualities are unequally distributed across pretraining sources and topics in…

arXiv cs.CL · Paper · 6d ago · 61

Reliability without Validity: A Systematic, Large-Scale Evaluation of LLM-as-a-Judge Models Across Agreement, Consistency, and Bias

arXiv:2606.19544v1 · Abstract: LLM-as-a-Judge has become the dominant evaluation paradigm for language models, but judge validation in practice relies on exact-match agreement, a metric that does not correct for chance and systematically overstates discriminative ability. We present the largest systematic evaluation of LLM-as-a-Judge to date: 21 judges from nine providers across MT-Bench, JudgeBench, and RewardBench, evaluated under three protocols (agreement, consistency, bias audit) over 118 runs and approximately 541,000 individual judgments. Four findings emerge, consistent across the full cohort, including the April 2026 frontier: kappa deflation between exact match and Cohen's kappa is universal (33–41 pp on MT-Bench), judge rankings shift by up to 14 positions across benchmarks, high test-retest reliability (>0.95) coexists with severe position bias (>0.10) in two production-deployed judges (instantiating a consistency-bias paradox), and verbosity bias is sm…
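The claim that exact-match agreement "does not correct for chance" is easy to see with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected if both raters labeled independently at their own base rates. The Python sketch below uses made-up labels, not data from the paper: a judge that always answers "A" agrees with a reference set that is 80% "A" on 80% of items yet has a kappa of 0.

```python
# Exact-match agreement vs Cohen's kappa. Illustrative data only,
# not taken from the paper.
from collections import Counter


def exact_match(a, b):
    """Fraction of items on which the two raters give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """kappa = (p_o - p_e) / (1 - p_e), with p_e from each rater's base rates."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_o = exact_match(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[k] / n) * (cb[k] / n) for k in ca.keys() | cb.keys())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    human = ["A"] * 80 + ["B"] * 20
    judge = ["A"] * 100  # a degenerate judge that always answers "A"
    print(exact_match(human, judge))   # 0.8
    print(cohens_kappa(human, judge))  # 0.0
```

Here the 80-point gap between exact match and kappa comes entirely from the label imbalance, which is the kind of inflation the paper's "kappa deflation" figures measure.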

Big Tech

What the major labs and platforms shipped.

TechCrunch AI · RSS · 5d ago · 76

The US banned Anthropic’s Fable 5 release, but the numbers don’t seem to care

Just as last week was ending, the US government forced Anthropic to pull its two newest models, Fable 5 and Mythos 5, citing national security concerns after Amazon researchers allegedly found a way to bypass Fable 5’s guardrails. Cybersecurity researchers have since signed an open letter calling the move dangerous, and Anthropic itself noted the same jailbreaks exist in other models. So is […]

Safety

TechCrunch AI · RSS · 5d ago · 71

Is the US government’s Anthropic ban accidentally helping the brand?

Safety