New models, open weights, and evaluations.
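Ethan Mollick compared GPT-5.2, released seven months ago, with the new GLM-5.2 Deep Think Max, giving both the same prompt: generate a shader that runs in Twigl and depicts an infinite city of Gothic towers half-submerged in a stormy ocean. GLM-5.2 made several errors. Ethan had previously received early access to GPT-5.2 and had shown a version of this shader that GPT-5.2 Pro generated in a single pass.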
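In June 2026, Berkeley RDI released the Agents' Last Exam (ALE) benchmark, comprising more than 1,500 tasks drawn from real work across 55 non-physical occupations. Evaluations of frontier agents including Fable 5, GPT-5.5, and Composer 2.5 showed a 0% success rate for all of them at the hardest tier. Overall task performance was close, but per-task cost varied widely (about $15.70 for Fable 5, $3.80 for GPT-5.5, and $1.33 for Composer 2.5). On the CLI subset, ALE-CLI, the best pass rate was only 25.2%. The main failure mode was agents declaring a task complete without verifying their output. The dataset, code, and CLI subset have been open-sourced.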
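The chatbot remains the world's most popular AI assistant, with more than 1.1 billion monthly active users, followed by Gemini (662 million) and Claude (245 million).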
Notable product launches and updates.
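Respond.io is one of the Malaysian startups worth watching. It uses AI agents to handle high volumes of customer inquiries and charges per conversation rather than per seat.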
Funding, policy, and market moves.
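A recent WordPress VIP survey shows that although enterprises increasingly see AI search as an important traffic channel, consumers remain wary of AI-generated answers.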
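TechCrunch has followed SpaceX's beginnings, struggles, and successes from the early days, and we are also watching what comes next. This coverage of the SpaceX IPO includes who stands to benefit (and who may not), the deals made ahead of the listing, and what is buried in its S-1 registration filing.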
The Justice Department says the Pentagon needs xAI to keep using its unpermitted gas turbines.
Plaud is trying to carve out a place in a crowded market of AI-powered meeting notetakers.
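Unlike many of his tech industry peers, who have cut thousands of employees citing the need to optimize for AI, Robinhood CEO Vlad Tenev conspicuously left AI out of his layoff memo.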
Research and papers worth reading.
arXiv:2606.14838v1. How to define a good explanation is a long-standing philosophical debate that has found renewed interest in the context of AI outputs. Explainability is crucial for AI adoption in many contexts, but in order to produce good explanations of AI systems, we must first understand what good explanations are. In this paper we propose a definition inspired by the notion of counterfactual explanations; however, we argue that one must also take into account the interlocutor's prior beliefs in each fact that could be offered in an explanation. We explore the ramifications of this definition for AI explainability and, in particular, why it is difficult to produce good explanations for LLM outputs.
arXiv:2606.14885v1. Agentic search over large corpora relies on retriever-mediated interfaces (e.g., BM25 or ColBERT) for scalable candidate discovery. While effective at ranking relevant documents, these interfaces expose evidence only as ranked results or bounded document views, limiting agents' ability to reorganize material and verify constraints across documents. Direct Corpus Interaction (DCI) addresses this limitation by exposing shell-executable corpus operations for flexible search, filtering, comparison, and verification. However, full-corpus terminal commands become slow and unstable as the corpus grows, degrading performance and efficiency. We introduce DR-DCI, a retriever-steered DCI framework that treats retrieval as an agent-callable action for expanding a local workspace. Rather than operating directly over the full corpus, the agent dynamically pulls relevant documents into an evolving workspace and conducts DCI operations within it. [...]
Today's key moves from big tech and platforms.
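OpenAI introduced deployment simulation, a method that uses real conversation data to predict how an AI model will behave before it is deployed, with the aim of improving safety and evaluation accuracy.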
The UK government is working with Google DeepMind to develop a new AI-powered prototype intended to speed up housing decisions.
SpaceX's valuation has grown by $1 trillion since its stock began trading on Friday.
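Google released Android 17 and Wear OS 7, introducing new multitasking features, parental controls, security tools, and smartwatch upgrades. The release is accompanied by a Pixel Drop that brings Google's latest AI models to its devices.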