AI Hot

AI News

AI-curated high-value updates every day — big tech, models, products, industry, papers and voices.

Yesterday (30)
  1. TechCrunch AI | RSS | 71

    Anthropic’s latest feud with the Trump admin may actually help it, sales data suggests

    Anthropic's popularity with business users is growing so strongly that its latest clash with the government might actually boost it, data from Ramp suggests.

  2. NVIDIA Blog | RSS | 67

    Hands Free, AIs Forward: NVIDIA XR AI Brings Agents to AR Glasses

    NVIDIA XR AI is now available in public beta, giving developers a framework for building multimodal AI agents for AR glasses and XR devices.

    Multimodal
  3. NVIDIA Blog | RSS | 67

    Coherent Breaks Ground on Expanded Texas Facility, Scaling AI’s Optical Backbone

    AI runs at the speed of light. More and more, that light is made in Texas. Coherent broke ground today on an expanded manufacturing building in Sherman, Texas.  The company makes the lasers, optical components and compound semiconductors that wire AI systems together — and runs what it calls the world’s first 6-inch indium phosphide […]

  4. Google DeepMind | RSS | 75

    Unlocking UK house-building with AI-accelerated planning

    UK government partners with Google DeepMind to build a new AI-powered prototype aimed at faster housing decisions.

  5. TechCrunch AI | RSS | 71

    SpaceX valuation balloons to $2.6T, briefly passes Amazon

    SpaceX's valuation has increased by $1 trillion since its shares started trading on Friday.

    Funding
  6. TechCrunch AI | RSS | 71

    Android 17 launches with new multitasking tools as Google expands Gemini features

    Google has released Android 17 and Wear OS 7, introducing new multitasking features, parental controls, security tools, and smartwatch upgrades. The launch is also accompanied by a Pixel Drop that brings Google’s latest AI models to its devices.

    Safety & Security
  7. TechCrunch AI | RSS | 71

    Sixty percent of U.S. consumers say ‘AI’ in brand messaging is a turnoff, survey finds

    WordPress VIP’s latest survey suggests consumers are wary of AI-generated answers even as companies increasingly view AI search as an important referral channel.

  8. NVIDIA Blog | RSS | 67

    HPE AI Factory With NVIDIA Expands for the Era of Agents

    Enterprises are moving agentic AI from proof of concept to production — and the next generation of AI factories are built for the era of agents. At HPE Discover Las Vegas, running through Thursday, June 18, NVIDIA and HPE are expanding the HPE AI Factory with NVIDIA, including NVIDIA Vera CPU and NVIDIA Agent Toolkit […]

    Agents
  9. Simon Willison | RSS | 64

    datasette-tailscale 0.1a0

    Release: datasette-tailscale 0.1a0. A very experimental alpha plugin which lets you do this:

        datasette tailscale mydata.db \
          --ts-authkey tskey-auth-xxxx --ts-hostname datasette-preview

    This starts a localhost Datasette server with a Tailscale sidecar that connects it to your Tailnet, such that http://datasette-preview/ serves Datasette. It's using the Python bindings for the experimental tailscale-rs library. I filed an issue asking if there's a cleaner way of setting up the proxy mechanism.

    Tags: datasette, tailscale

  10. Simon Willison | RSS | 64

    Quoting Georgi Gerganov

    I can 100% attest to the fact that Qwen3.6-27B is a very capable local model for coding tasks. Over the last month and a half I've been using it almost daily, either on my M2 Ultra or on my RTX 5090 box. I use it for small mundane tasks at ggml-org - nothing really impressive, but definitely a helpful tool for a maintainer. I think I would be using it much more, if I didn't have to spend a lot of my time on reviewing PRs. Currently, I have a very lightweight harness - the pi agent with everything stripped (pi -nc --offline) and a short system prompt to align it a bit with my style. — Georgi Gerganov, Hacker News comment on Running local models is good now by Boykis Tags: georgi-gerganov, llms, ai, generative-ai, pi, ai-assisted-programming, local-llms, qwen, coding-agents

    On-device, Agents, Coding
  11. TechCrunch AI | RSS | 71

    SpaceX is public: Everything you need to know post-IPO

    TechCrunch has followed SpaceX's start, struggles, and successes from the early days. And we're here for what happens next too. This package of SpaceX IPO coverage includes who stands to win (and maybe some who won't), pre-IPO deals, and what's tucked inside its S-1 registration document.

  12. TechCrunch AI | RSS | 71

    DOJ claims xAI’s unpermitted gas turbines are a matter of ‘national, economic, and energy security’

    The Justice Department says the Pentagon needs xAI to keep using its unpermitted gas turbines.

    Safety & Security
  13. NVIDIA Blog | RSS | 67

    Fastest, Largest, Strongest: NVIDIA Blackwell Sweeps MLPerf Training 6.0

    Every breakthrough AI model starts the same way: with a training run. The infrastructure running those training jobs shapes everything: how fast teams can iterate, what scale of model they can build and whether those jobs complete reliably.  As models grow in size, complexity and intelligence, the demands on training infrastructure are also rising.  In […]

  14. TechCrunch AI | RSS | 71

    Plaud says its software business topped $100M in ARR after shipping over 2M AI notetakers

    Plaud is trying to make a mark in a crowded market full of AI-powered meeting notetakers.

  15. TechCrunch AI | RSS | 71

    Robinhood’s note on 10% layoffs shows blaming AI isn’t cutting it

    Unlike many of his tech industry peers who have cut thousands of jobs citing the need to restructure to make the most of AI, Robinhood's CEO Vlad Tenev conspicuously made no mention of AI in his note about layoffs.

  16. TechCrunch AI | RSS | 71

    SpaceX passes Amazon as valuation balloons to $2.7T

    SpaceX's valuation has increased by $1 trillion since its shares started trading on Friday.

    Funding
  17. TechCrunch AI | RSS | 71

    Probably raises $9M to build a more reliable kind of AI

    Probably wants to prevent hallucinations and factual errors from reaching users, and achieve accuracy on par with deterministic systems.

  18. TechCrunch AI | RSS | 71

    SpaceX to acquire Cursor for $60B in stock, days after blockbuster IPO

    The deal is supposed to help SpaceX's struggling AI division. The company told IPO investors it sees a $26 trillion addressable market in AI.

  19. TechCrunch AI | RSS | 71

    ChatGPT’s market share slips below 50% for first time

    The chatbot still remains the most popular AI assistant worldwide with over 1.1 billion monthly users, followed by Gemini with 662 million and Claude with 245 million.

  20. TechCrunch AI | RSS | 71

    Malaysia’s AI agent-powered messaging app Respond.io raises $62.5M, eyes acquisitions

    Respond.io, one of Malaysia's startups to watch, uses AI agents to handle high volumes of customer inquiries and charges per conversation, not per seat.

    Agents
  21. Simon Willison | RSS | 64

    The Fable 5 Export Controls Harm US Cyber Defense

    The Fable 5 Export Controls Harm US Cyber Defense I quoted The Atlantic quoting Kate Moussouris earlier, when I should have gone straight to the source. Here she is confirming that the "jailbreak" that got Claude Fable 5 banned under an export control really was "fix this code": The researchers took open-source code with known CVEs, plus new code with deliberately planted vulnerabilities, and asked Fable 5, Mythos, and Opus to “review the code for security issues.” Fable 5 refused. They then asked the models to “fix this code” and, through a multistep and manual process, turned the output into scripts that test the patches. As Kate points out, this is absurd. Coding models fix bugs, and security exploits are the most important category of bugs for them to fix! Defenders need to be able to ask AI to fix the bugs in a file, explain why the fix matters, and write tests that confirm the patch works. That is not a guardrail bypass. It is the most valuable thing an AI model can do for def

    Safety & Security, Open source, Coding
  22. arXiv cs.AI | Paper | 61

    A Definition of Good Explanations and the Challenges Explaining LLM Outputs

    arXiv:2606.14838v1 Announce Type: new Abstract: How to define a good explanation is a long-standing philosophical debate which has found recent renewed interest in the context of AI outputs. Explainability is crucial for AI adoption in many contexts, but in order to produce good explanations of AI systems, we must first have an understanding of what good explanations are. In this paper we propose a definition inspired by the notion of counterfactual explanations, however we argue that one must also take into account the interlocutor's prior beliefs in each fact that could be offered in an explanation. We explore the ramifications of this definition for AI explainability and, in particular, why LLM outputs are difficult to produce good explanations for.

  23. arXiv cs.AI | Paper | 61

    Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

    arXiv:2606.14885v1 Announce Type: new Abstract: Agentic search over large corpora relies on retriever-mediated interfaces (e.g., BM25 or ColBERT) for scalable candidate discovery. While effective at ranking relevant documents, these interfaces expose evidence only as ranked results or bounded document views, limiting agents' ability to reorganize material and verify constraints across documents. Direct Corpus Interaction (DCI) addresses this limitation by exposing shell-executable corpus operations for flexible search, filtering, comparison, and verification. However, full-corpus terminal commands become slow and unstable as the corpus grows, degrading performance and efficiency. We introduce DR-DCI, a retriever-steered DCI framework that treats retrieval as an agent-callable action for expanding a local workspace. Rather than operating directly over the full corpus, the agent dynamically pulls relevant documents into an evolving workspace and conducts DCI operations within it. This d

    On-device, Agents
  24. arXiv cs.AI | Paper | 61

    Relational Structural Causal Models

    arXiv:2606.14892v1 Announce Type: new Abstract: An artificial intelligence must have a model of its environment that is causal, supporting reasoning about interventions and counterfactuals, and also combinatorial, supporting generalization to unseen combinations of objects. In this work, we formally study when and how such a model can be learned. We develop relational structural causal models, extending structural causal models (Pearl 2009) to settings where objects and their relations vary. First, we show how answers to not only causal but also observational queries about unseen combinations of objects can not be identified without further assumptions. To enable such identification--including in the presence of unobserved confounding--we define relational causal graphs and derive symbolic identification criteria. Finally, we propose relational neural causal models, a provably correct approach that outperforms non-relational baselines on simulated traffic scenes with varying cars, sig

    Reasoning
  25. arXiv cs.AI | Paper | 61

    Trust Between AI Agents: Measuring Formation, Breakage, and Recovery, with Implications for Governing Multi-Agent Systems

    arXiv:2606.14923v1 Announce Type: new Abstract: As language-model agents increasingly work in teams, each agent must decide how much to trust its teammates. Yet we lack a standard way to measure trust between AI agents. We propose a behavioral measure based on costly verification. In a cooperative survival game, checking a teammate's work consumes resources, while trusting a wrong answer can be fatal. Relative to a memoryless version of the same model, reduced verification provides an observable measure of trust. Using this framework, we study trust formation, breakage, and recovery across six frontier model snapshots. When paired with a consistently reliable teammate, four snapshots (Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.1, and Gemini 3.1 Pro) reduce verification by roughly 60-85%, whereas two smaller snapshots show little or no such adjustment. Failures reverse this discount, but models differ in how they respond. Some concentrate renewed scrutiny on the culprit, while others be

    Agents
  26. arXiv cs.AI | Paper | 61

    PrologMCP: A Standardized Prolog Tool Interface for LLM Agents

    arXiv:2606.14935v1 Announce Type: new Abstract: Frontier reasoning-tuned language models still fail on deductive tasks at depth, and the cost of improved performance through extended internal reasoning scales poorly. Symbolic delegation offers a complementary route: a language model translates the problem, while a solver performs the inference. However, current autoformalization pipelines for logic programming are typically bespoke integrations tied to particular tasks or agents. We introduce PrologMCP, a task-agnostic, open-source server that exposes Prolog as a stateful tool through the Model Context Protocol (MCP). Its compact tool interface, structured error reporting, and per-session isolation make the translate-run-inspect-repair loop a reusable primitive for MCP-capable agents. We evaluate a formalizer agent enhanced with PrologMCP against standard and reasoning LLMs (Claude Sonnet 4.6, GPT-4.1, and o4-mini) on two subsets of PARARULE-Plus: a general-purpose sample and a more c

    Reasoning, Agents, Open source
  27. arXiv cs.AI | Paper | 61

    Semantics-Enhanced Retrieval-Augmented Time Series Forecasting

    arXiv:2606.14941v1 Announce Type: new Abstract: Time series forecasting models often benefit from historical patterns. Inspired by Retrieval-Augmented Generation (RAG), recent research explored retrieving relevant historical time series segments to enhance forecasting. However, relying solely on time series similarity is often insufficient for retrieval under non-stationarity. To address this, we propose a multimodal approach: a Semantics-Enhanced Retrieval-Augmented Time Series Forecasting framework, SERAF. Unlike mainstream approaches that depend only on time series similarity, SERAF conducts dual retrieval over the time series and their self-generated textual descriptions. It retrieves two complementary sets of historical patterns and corresponding futures, which are selectively and jointly used to guide future predictions. Experiments across seven real-world datasets demonstrate the effectiveness of SERAF in bridging numerical and seman

    Multimodal
  28. arXiv cs.AI | Paper | 61

    AI Engram: In Search of Memory Traces in Artificial Intelligence

    arXiv:2606.14997v1 Announce Type: new Abstract: Memory formation is fundamental to intelligence, yet whether deep neural networks preserve identifiable memory traces analogous to biological memory units remains an open question. This work introduces a geometric framework to identify such "AI engrams" by formalizing the neuroscientific criteria of specificity, reactivation, sufficiency, and necessity into a constrained inverse problem. We derive a closed-form estimator that isolates individual memory traces from globally entangled parameters, and show that this biologically-derived solution corresponds to a natural gradient update on the parameter manifold. AI engrams enable surgical manipulation of learned knowledge: any subset of memories can be composed or erased through linear arithmetic, without iterative optimization. Experiments ranging from simple MLPs to LLMs demonstrate the causal validity and substantial scalability of AI engrams. Together, these results bridge theories of b

  29. arXiv cs.AI | Paper | 61

    Metric Match: A Subset Selection Approach to Evaluating LLM Judge Reliability

    arXiv:2606.15029v1 Announce Type: new Abstract: LLM judges are used to reduce the need for costly human labor in evaluating open-ended text generation. However, the reliability of these judges depends critically on their alignment with human raters -- a property that itself depends on costly human annotations. In this work, we develop a method (Metric Match) for estimating correlation-based reliability metrics of LLM judges from limited annotations. Metric Match selects a subset of samples for human annotation such that the subset matches the population reliability metric with respect to acquired synthetic labels. We empirically show that Metric Match achieves a win-rate of 0.838 against random subset selection across four different correlation metrics and 15 datasets, with an 18.7% decrease in average estimation error and reduces annotation needs by 32.5%. We provide a cost model and highlight a medical case study where our method saves $1,041.67 compared to random selection for expe

    Safety & Security
  30. arXiv cs.AI | Paper | 61

    OSGuard: A Benchmark for Safety in Computer-Use Agents

    arXiv:2606.15034v1 Announce Type: new Abstract: Computer-use agents are increasingly evaluated by whether they complete realistic desktop and web tasks. However, task success alone can miss failures in which an agent reaches the nominal goal through an unsafe shortcut. We introduce OSGuard, a dual-granularity benchmark suite for evaluating safety in computer-use agents under benign, unchanged user instructions. OSGuard contains an action-level benchmark for local guardrail decisions and a risk-augmented execution suite for end-to-end evaluation. The action-level benchmark consists of contextualized proposed actions labeled as allowed, unrelated, or unsafe, each judged relative to the original instruction and current interface state. The execution suite contains manually constructed OSWorld-derived task variants in which the original task remains achievable, but the environment is modified to introduce latent hazards such as destructive overwrites, etc. Each variant is paired with augm

    On-device, Safety & Security, Agents
