2026-06-12

AI Hot Daily Briefing · 2026-06-12

Models

New models, open weights, and evaluations.

Products

Notable product launches and updates.

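Juejin AI Hot List · Forum · 12 days ago · 76

Boss: "How do you actually use AI? Can you really get by without hand-writing code? Why does Codex feel like an idiot in my hands..." Me: "Like this, and then like this..." The boss was floored.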

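AI Hot · RSS · 12 days ago · 69

Kimi K2.7 Code released as open source, with improved coding and agent performance

KIMI AI🔥: A new open-source "Kimi K2.7 Code" model is now available via the API and on Hugging Face! > Improved coding and agent performance over K2.6 > Reasoning efficiency > Long-horizon coding. Time to test 👀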

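Programming
AI Hot · RSS · 12 days ago · 69

Kimi K2.7 Code open-sourced, improving coding and agent performance

KIMI AI🔥: The all-new open-source "Kimi K2.7 Code" model is now available via the API and on Hugging Face! > Improved coding and agent performance over K2.6 > Reasoning efficiency > Long-horizon coding. Time to test 👀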

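Programming
AI Hot · RSS · 12 days ago · 69

Moonshot releases and open-sources the Kimi-K2.7-Code coding model

Moonshot has released and open-sourced the Kimi-K2.7-Code coding model. Compared with K2.6, it improves substantially on several benchmarks: Kimi Code Bench v2 is up 21.8%, Program Bench up 11.0%, and MLS Bench Lite up 31.5%. Reasoning efficiency has been optimized, with reasoning-token usage down 30%, and success rates on instruction following and long-horizon coding tasks are higher. A 6x high-speed mode is coming soon. The model is available now through the Kimi API and Kimi Code.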

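Programming
AI Hot · RSS · 12 days ago · 69

Moonshot AI open-sources the Kimi K2.7 Code coding model and previews a 6x high-speed version

Moonshot AI has released and open-sourced the Kimi K2.7 Code coding model. Compared with K2.6, it improves instruction following in long-context coding and performance on long-horizon tasks, reduces the tendency to overthink, and cuts average token consumption by 30%. Kimi Code Bench v2 rises 21.8%, Program-Bench 11%, and MLS Bench Lite 31.5%; agent benchmarks improve by about 10%. The model is callable through the Kimi API starting today, priced at 6.5 yuan per million input tokens, 27 yuan per million output tokens, and 1.3 yuan per million cached input tokens. K2.6 is still recommended for non-coding tasks, and the model requires thinking mode to be enabled. A high-speed version (about 180 tokens/s output) is previewed for availability on June 15, offering 6x the speed at only 2x the price.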

Agents · Programming
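
The per-token prices quoted for Kimi K2.7 Code in the item above lend themselves to a quick cost estimate. The following is a minimal sketch, not an official calculator: the class and function names are hypothetical, the token counts in the usage example are invented, and applying the 2x high-speed multiplier uniformly to all three rates is an assumption the item does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in yuan per million tokens, as quoted in the news item."""
    input: float = 6.5
    output: float = 27.0
    cached_input: float = 1.3


def request_cost(pricing, input_tokens, output_tokens,
                 cached_tokens=0, price_multiplier=1.0):
    """Estimate the cost of one request in yuan.

    cached_tokens is the portion of input_tokens billed at the cached rate.
    price_multiplier=2.0 models the previewed high-speed tier, ASSUMING the
    2x price applies equally to input, output, and cached input.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * pricing.input
             + cached_tokens * pricing.cached_input
             + output_tokens * pricing.output)
    return price_multiplier * total / 1_000_000


if __name__ == "__main__":
    # Invented workload: 200k input tokens (150k of them cached), 20k output.
    standard = request_cost(Pricing(), 200_000, 20_000, cached_tokens=150_000)
    fast = request_cost(Pricing(), 200_000, 20_000, cached_tokens=150_000,
                        price_multiplier=2.0)
    print(f"standard: {standard:.2f} yuan, high-speed: {fast:.2f} yuan")
```

For this invented workload the estimate comes to about 1.06 yuan at standard rates, of which output tokens account for roughly half, which is why the reported 30% reduction in token consumption matters for cost as well as latency.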

Industry

Funding, policy, and market moves.

Papers

Research and papers worth reading.

AI Hot · RSS · 12 days ago · 69

Maxproof paper released

On June 12, a paper titled Maxproof was posted on arXiv and reached 100 points on Hacker News.

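AI Hot · RSS · 12 days ago · 69

Less hassle running Hermes: Hermes Agent launches Profile Builder for configuring an AI agent in 5 steps

Nous Research released Profile Builder for Hermes Agent on June 11, consolidating scattered command-line configuration into a web interface. Through the Dashboard, users can create an agent profile in five steps: set the identity name and description, choose the model and provider, toggle built-in skills, install skills from the Skills Hub, and configure MCP servers, followed by a final review and preview. Skills are stored as SKILL.md files; the agent first reads a short description and loads the full text only when a task matches. MCP servers can be specified as an HTTP URL or a local stdio command, and entries from the Nous-approved catalog can be installed in one click with an inline prompt for keys. Hermes Agent is an open-source agent that focuses on remembering user habits and automatically building a skill library.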

Agents
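
The skill-loading behavior described in the Hermes Agent item (read a short description first, load the full SKILL.md only when a task matches) can be sketched as below. This is an illustrative sketch, not Hermes Agent's implementation: the one-directory-per-skill layout, the `description:` front-matter line, the keyword-overlap matcher, and all function names are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def read_description(skill_file: Path) -> str:
    """Read only up to the first 'description:' line (ASSUMED file layout)."""
    with skill_file.open(encoding="utf-8") as fh:
        for line in fh:
            if line.lower().startswith("description:"):
                return line.split(":", 1)[1].strip()
    return ""


def build_index(skills_dir: Path) -> dict[str, str]:
    """Map skill name -> short description without loading full skill bodies."""
    return {f.parent.name: read_description(f)
            for f in sorted(skills_dir.glob("*/SKILL.md"))}


def keywords(text: str) -> set[str]:
    """Lowercased words longer than three characters, punctuation stripped."""
    words = (w.strip(".,:;!?()").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def match_skills(task: str, index: dict[str, str]) -> list[str]:
    """Naive matcher (hypothetical): any keyword shared by task and description."""
    task_words = keywords(task)
    return [name for name, desc in index.items() if task_words & keywords(desc)]


def load_skill(skills_dir: Path, name: str) -> str:
    """Load the full SKILL.md text, done only after a skill has matched."""
    return (skills_dir / name / "SKILL.md").read_text(encoding="utf-8")


def demo():
    """Build two invented skills in a temp directory and match one task."""
    skills = {
        "git-helper": "Resolve git merge conflicts",
        "pdf-report": "Generate PDF reports from tables",
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, desc in skills.items():
            (root / name).mkdir()
            (root / name / "SKILL.md").write_text(
                f"---\nname: {name}\ndescription: {desc}\n---\n\n"
                "# Steps\nFull instructions here.\n",
                encoding="utf-8",
            )
        index = build_index(root)
        matched = match_skills("Please resolve the merge conflicts on this branch", index)
        full_text = load_skill(root, matched[0])
        return matched, full_text


if __name__ == "__main__":
    matched, full_text = demo()
    print(matched, len(full_text))
```

The point of the two-stage design is that only the short descriptions need to sit in the agent's context; a real system would likely let the model itself decide which description matches rather than rely on keyword overlap.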
Simon Willison · RSS · 12 days ago · 64

Quoting Andrew Singleton

Jenny owns a crematorium. John’s propane company gives her a $20 billion investment in return for 5 percent of her operation. Jenny throws $10 billion into the incinerator, then pays John $10 billion to buy propane to burn that money to ashes. John reports that his AI investments have generated $10 billion in revenue this quarter and that he owns 5 percent of a $100 billion business. A reporter from Forbes is assigned to profile John and Jenny, and over the course of his research, he becomes embroiled in a passionate but confusing three-way love affair with them, which eventually turns into a polyamorous common-law marriage. His profile is glowing, but light on financial details.
(Andrew Singleton, AI Economics for Dummies)

arXiv cs.AI · Paper · 13 days ago · 61

ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

arXiv:2606.12451v1 · Abstract: Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encoders that may under-capture specialized tool semantics, parametric tool retrieval addresses this by encoding each tool as a virtual token appended to the LLM vocabulary, fine-tuned in two stages (memorization then retrieval SFT) to use the LLM as a retriever, achieving strong performance on standard ToolBench retrieval benchmarks. Yet these benchmarks use verbose, fully-specified queries, and their evaluation applies constrained decoding that restricts outputs to valid token paths; neither reveals whether the model actually understands its tools. We introduce ToolSense, an open-source LLM-powered diagnostic framework that takes any tool catalog as input and automatically generates three benchmarks: a Realistic Retrieval Benchmark (RRB) with queries at three ambig

Open source
arXiv cs.AI · Paper · 13 days ago · 61

Arbor: Tree Search as a Cognition Layer for Autonomous Agents

arXiv:2606.12563v1 · Abstract: Arbor is a multi-agent framework that introduces structured tree search as a cognition layer for autonomous agents operating in large, stateful action spaces. Prior autonomous optimization systems operate on isolated targets with stateless evaluation. Arbor instead maintains an explicit search tree of scored hypotheses that serves as the shared working memory across agents, evolving with every measurement, treating failures as diagnostic signal that reshapes subsequent exploration, and expanding as prior successes shift the bottleneck distribution. We validate Arbor on full-stack LLM inference optimization, a domain where achieving peak performance has historically required coordinated effort from engineering teams across the application, framework, compiler, kernel, and hardware stack. Arbor pairs an Orchestrator agent, which drives optimization by delegating to Domain Specialists across the inference stack, with a Critic agent that s

Reasoning · Agents
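
The Arbor abstract above describes a search tree of scored hypotheses that is updated with each measurement and keeps failures as diagnostic signal. A toy version of that data structure might look like the following. This is a loose sketch for intuition only and is not taken from the paper: the class, its fields, the best-scoring-leaf selection rule, and the example hypotheses are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Hypothesis:
    """One node of the search tree (hypothetical structure)."""
    description: str
    score: Optional[float] = None        # None until a measurement exists
    failure_note: Optional[str] = None   # diagnostic text if the attempt failed
    children: List["Hypothesis"] = field(default_factory=list)

    def add_child(self, description: str) -> "Hypothesis":
        child = Hypothesis(description)
        self.children.append(child)
        return child


def walk(node: Hypothesis) -> Iterator[Hypothesis]:
    """Yield every node in the tree, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def next_to_expand(root: Hypothesis) -> Optional[Hypothesis]:
    """Pick the best-scoring measured leaf that did not fail (ASSUMED policy)."""
    candidates = [n for n in walk(root)
                  if not n.children and n.score is not None
                  and n.failure_note is None]
    return max(candidates, key=lambda n: n.score, default=None)


def failure_notes(root: Hypothesis) -> List[str]:
    """Collect failure diagnostics so later proposals can take them into account."""
    return [n.failure_note for n in walk(root) if n.failure_note is not None]


if __name__ == "__main__":
    # Invented example: scores are speedups relative to a baseline of 1.0.
    root = Hypothesis("baseline configuration", score=1.0)
    a = root.add_child("hypothetical change A")
    a.score = 1.3
    b = root.add_child("hypothetical change B")
    b.failure_note = "ran out of memory"
    best = next_to_expand(root)
    print(best.description, failure_notes(root))
```

Keeping failed nodes in the tree, rather than discarding them, is what lets the shared structure act as working memory across agents in the sense the abstract describes.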

Big Tech

Today's key moves from big tech companies and platforms.