Daily AI Brief - Today in AI

Models

New models, weights and benchmarks.

Introducing computer use in Gemini 3.5 Flash

Quoting Tom MacWright

In the last few months, I've started to see [job applications] that were clearly cowritten by an LLM, link to an LLM-generated portfolio site, which then links to LLM-generated GitHub projects, with purely LLM-generated commit messages. [...] My other reaction is that I don't know anything about these people. They haven't put themselves out there. They haven't said anything true. [...] The perfected, generated, prompted resume is generic and impersonal. It tells me nothing about this person, other than that they use particular tools.

Tom MacWright, "Accidental anonymity"

Hugging Face · RSS · 1d ago · 63

Introducing the FFASR Leaderboard: Benchmarking ASR in the Real World

Products

Product launches and noteworthy updates.

TechCrunch AI · RSS · 10h ago · 71

Facebook rolls out an AI companion app for creators

The new app, which is currently being tested with select creators, will have Facebook's recently-launched AI creator assistant built into it.

TechCrunch AI · RSS · 11h ago · 71

Figma adds code layers, support for animations, more AI features in new update

Figma's update adds a new code layer, support for motion and shaders, and the ability to create custom plugins for various tasks using AI.

Coding

Industry

Funding, policy and market moves.

TechCrunch AI · RSS · 6h ago · 71

AI was supposed to kill engineering jobs, but new data suggests they’re the most resilient

While AI dominates the layoff narrative, engineers are actually making up a larger share of new hires, according to SignalFire data.

TechCrunch AI · RSS · 6h ago · 71

The memory chip crunch is paying off for this US company

Revenue quadrupled to $41.45 billion compared with the same period a year ago. The company's profit, meanwhile, rose from $1.88 billion to an incredible $28.2 billion year-over-year.

TechCrunch AI · RSS · 8h ago · 71

Companies are scrambling to stop employees from maxing out AI budgets with small tasks

The tokenmaxxing era was brief. We now appear to be entering the era of token rationing.

TechCrunch AI · RSS · 11h ago · 71

Agility Robotics plans to go public via SPAC in a $2.5B deal

Agility Robotics, the humanoid robotics startup that spun out of Oregon State University in 2015, expects to generate $620 million in proceeds.

TechCrunch AI · RSS · 14h ago · 71

3 days left to save up to $190 on your TechCrunch Founder Summit 2026 pass

You have just 3 days left to save up to $190 on your pass to TechCrunch Founder Summit 2026 before Early Bird rates end on June 26 at 11:59 p.m. PT.

Papers

Research worth a read.

arXiv cs.LG · Paper · 1d ago · 61

Synergizing Physically Constrained MCMC and Chemical-Informed Gaussian Processes for Reaction Network Discovery

arXiv:2606.23757v1. Abstract: Extracting interpretable governing equations from sparse, noisy chemical time-series data remains difficult because discrete reaction topology and continuous kinetic parameters are tightly coupled. We present PC-MCMC-CIGP, a reproducible gray-box workflow that combines spike-and-slab topology sampling, hard conservation and thermodynamic screening, and a Chemical-Informed Gaussian Process (CIGP) residual model for parameter calibration and experimental design. The methodological contribution is not a new MCMC or GP family in isolation; rather, it is the integration of these components into a physically constrained workflow with explicit uncertainty-aware acquisition choices. On the H2 + Br2 benchmark, the constrained sampler distinguishes elementary radical pathways from deceptive phenomenological fits in our experiments. On styrene epoxidation, the CIGP optimization loop improves final yield by 12.5% over the reported GP-BO baseline. A

arXiv cs.CL · Paper · 1d ago · 61

When Retrieval Metrics Mislead: Measuring Policy Signal in Long-Horizon Tool-Use Agents

arXiv:2606.23937v1. Abstract: Exact-match retrieval recall is often used as a proxy for whether a retriever supplies useful policy context to a downstream decision model. We test this proxy for pre-action policy classification in tau-bench using Qwen2.5-3B/7B classifiers. Under gold-policy conditioning, a compact structured state improves macro-F1 over raw trajectories by 0.13-0.17 after tuning. We then replace the benchmark-designated policy clause with the top-ranked clause retrieved from decision-time context. Although the exact governing clause is retrieved at rank 1 for only 7% of airline states, the primary 3B classifier obtains macro-F1 0.58 with retrieved clauses versus 0.60 with gold clauses (Delta=-0.02, task-cluster 95% CI [-0.23,+0.21]); mismatched-policy and no-policy controls score 0.32 and 0.21. We do not detect a macro-F1 difference between retrieved and gold clauses in this configuration, although the interval remains too wide to establish non-inferi

arXiv cs.CL · Paper · 1d ago · 61

Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

arXiv:2606.23948v1. Abstract: Self-supervised and supervised speech models are increasingly used to investigate which linguistic information their internal representations encode, and at what level of abstraction they encode it. One underexplored phenomenon is consonant cluster reduction (CCR) in African American English (AAE), a widespread phonological process and a source of automatic speech recognition (ASR) disparity. To examine how CCR is represented, we conduct speaker-independent layer-wise probing of wav2vec2-base and Whisper-small using two tasks: segmental reduction detection and segmental restoration of underlying cluster identity. Both models distinguish reduced and canonical forms with high accuracy. Crucially, reduced segments retain cues to their underlying stops, indicating that CCR is encoded as structured gradient phonological variation rather than simple segmental deletion. These results demonstrate structured phonological encoding of AAE CCR patte

arXiv cs.CL · Paper · 1d ago · 61

Does My Embedding Reflect That $A = B$? Evaluating Mathematical Equivalence in Embedding Models

arXiv:2606.23959v1. Abstract: Because mathematics is highly abstract, a single statement can take very different forms depending on what subfield it is framed in. There are many examples where breakthroughs occurred after researchers discovered that a question had already been answered in a different field. At the same time, the growth of new resources related to formalization has increased the need for tools that enable efficient and reliable navigation between mathematical 'languages' (e.g., from Lean to natural language). In this paper, we investigate whether current embedding models capture mathematical equivalence. To do this, we introduce the Mathematically Equivalent but Lexically Different Pairs (MELD) Dataset, a collection of mathematically equivalent statements that are expressed in very different language. We show that current state-of-the-art embedding models tend to group statements by the terminology used to make them instead of the underlying math. Mot

arXiv cs.CL · Paper · 1d ago · 61

Faithful by Construction: Claim-Anchored Attribution for Multi-Document Summarization

arXiv:2606.23989v1. Abstract: End-to-end large language models (LLMs) produce fluent multi-document summaries but remain prone to hallucination, and the attributions they offer are typically coarse (whole documents or passages) and generated post hoc, leaving each summary statement hard to verify. We revisit the modular Extract-Select-Rewrite paradigm and recast its intermediate representation as the unit of attribution. We present CAMS, a Claim-Anchored Multi-document Summarization framework that (i) extracts atomic claims with token-level provenance from every source document, (ii) clusters equivalent claims across documents while flagging inter-source conflicts, (iii) selects a support-aware and salient subset, and (iv) rewrites the selection into a summary in which every sentence is anchored to a support-checked claim that links back to one or more source spans. Because content is localized before it is realized, the pipeline is attribution-oriented by construc

Big Tech

What the major labs and platforms shipped.

OpenAI Blog · RSS · 22h ago · 79

OpenAI and Broadcom unveil LLM-optimized inference chip

OpenAI and Broadcom introduce Jalapeño, a custom AI chip built for LLM inference to improve performance, efficiency, and scale across AI systems.

Inference

TechCrunch AI · RSS · 6h ago · 71

AI researchers continue to leave Google for its rivals

Top AI researchers Jonas Adler and Alexander Pritzel are leaving Google for Anthropic, following the departures of top scientists Noam Shazeer and John Jumper.

TechCrunch AI · RSS · 13h ago · 71

OpenAI unveils its first custom chip, built by Broadcom

Named Jalapeño, the new processor was designed specifically for the unique needs of OpenAI's inference systems.

Inference

Hugging Face · RSS · 12h ago · 70

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

NVIDIA Blog · RSS · 1d ago · 67

NVIDIA and AWS Collaborate to Bring AI to Production at Scale

Building AI systems at scale is demanding, requiring low-latency inference, fast vector search, strong GPU price-performance and infrastructure that can grow without multiplying operational complexity. NVIDIA’s latest work with Amazon Web Services (AWS) addresses each of those constraints. Across Amazon OpenSearch and Amazon EC2, NVIDIA AI infrastructure is giving enterprises more practical paths to deploy […]

Inference

AI Hot Daily Brief · 2026-06-24
