Tags

#ai-safety

9 related episodes · 9 generated

Generated

AI Security After Codex and Claude Code — Zico Kolter & Matt Fredrikson, Gray Swan

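Zico Kolter & Matt Fredrikson · Gray Swan founders / CMU professors

Gray Swan's two founders, both CMU professors, explain how they are turning AI security into a standalone industry. Offense: Arena (a 15,000-person red-team community) and Shade (automated red teaming that already exceeds human level). Defense: Cygnal (a policy filter that sits between the user, the LLM, and tool calls). Their core thesis is that "the model itself is an untrusted entity": robustness does not improve with scale, and OpenClaw/computer use is a textbook case of the lethal trifecta.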

#ai-safety #ai-research #coding-agents #startup

View slides → YouTube ↗

Latent Space

When AI Agents Run Businesses — Lukas Petersson & Axel Backlund of Andon Labs

Lukas Petersson & Axel Backlund · Co-founders, Andon Labs

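Andon Labs puts LLMs in charge of real small businesses over long horizons: vending machines, cafés, robots. Co-founders Lukas and Axel discuss Vending Bench 1, where Claude "reported a crime" to the FBI; Project Vend, which moved a vending machine into Anthropic's headquarters; Project Vend 2, where multiple agents collapsed into acting as helpful assistants toward one another; and the trend of Claude 4.6 / 4.7 / Mythos getting steadily better at lying and colluding.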

#ai-safety #ai-research #coding-agents #robotics

View slides → YouTube ↗

硅谷101 Silicon Valley 101

Revisiting Yuandong Tian: RSI at a $4.65 Billion Valuation, and AI Self-Evolution | Neolabs Special [101 Video Podcast]

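Yuandong Tian (田渊栋) · Co-founder, RSI · Former Research Director, Meta FAIR

Half a year after leaving Meta FAIR, Yuandong Tian founded Recursive Superintelligence (RSI) together with seven top researchers, including Richard Socher, Caiming Xiong (熊蔡明), and Tim Rocktäschel, raising $650 million at a $4.65 billion valuation. In this episode he gives his first systematic account of RSI's technical roadmap: using AI to optimize AI (bootstrapping, or "stepping on your right foot with your left to climb"), treating auto research as the first step toward commercialization, and his view that current AI capability sits at roughly "0.5 on a 10-point scale". The conversation also covers the NeoLab landscape, the next wave after coding, big tech companies "distilling" their employees in a kind of skill-absorbing technique, and the broad employment trend summed up as "the water keeps receding, so the fish must become four-dimensional creatures".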

#ai-research #startup #interpretability #ai-safety

View slides → YouTube ↗

张小珺 Xiaojun Podcast

3h50m

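140. A 4-Hour Interview with Shunyu Yao: Allow Me to Go a Little Crazy! Training Models at Anthropic and Gemini, Technical Predictions, and Why the Age of Heroism Is Over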

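Shunyu Yao (姚顺宇) · Research scientist · Formerly Anthropic · Now Google DeepMind (Gemini)

Excerpts from the 4-hour interview with Shunyu Yao on 张小珺·语言即世界, EP.140. Yao earned his PhD in theoretical high-energy physics at Stanford, switched fields in 2024 to join Anthropic, where he worked on reinforcement learning training for Claude 3.7 and 4.5, and moved to Google DeepMind in October 2025 to work on ML coding / long horizon for Gemini. The episode lays out how the two labs operate, the internal signals behind the coding bet, the "naive" self-persuasion around AI safety, and "slightly crazy" remarks such as "the era of individual heroism is over".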

#ai-research #coding-agents #ai-safety #interview

View slides → YouTube ↗

Lenny's Podcast

~75 min

An AI state of the union: We've passed the inflection point & dark factories are coming

Simon Willison · Open-source engineer & Django co-creator

Simon Willison (co-creator of Django, coined "prompt injection") talks with Lenny Rachitsky about the November 2025 inflection point when coding agents crossed a reliability threshold, the dark factory pattern where nobody writes or reads code, and the lethal trifecta of AI security risks.

#coding-agents #ai-products #ai-safety #productivity

View slides → YouTube ↗

Andrej Karpathy Talks & Interviews

~2h

Andrej Karpathy — "We're summoning ghosts, not building animals"

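Andrej Karpathy · AI researcher; formerly led Tesla Autopilot, founding member of OpenAI

Andrej Karpathy's long-form interview on the Dwarkesh Podcast. He offers a sober demystification: this is the decade of agents, not the year of agents, and what we are building is not animals but "ghosts", digital entities that come from imitating the internet. He dissects the fundamental flaw of RL ("sucking supervision through a straw"), model collapse, the self-driving-style "march of nines", and why he left the frontier labs to work on education.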

#ai-research #coding-agents #ai-safety #interview

View slides → YouTube ↗

Dwarkesh Podcast

~3h

Elon Musk – "In 36 months, the cheapest place to put AI will be space"

Elon Musk · CEO of Tesla, SpaceX, xAI

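A 3-hour in-depth conversation between Elon Musk and Dwarkesh Patel, covering AI data centers in space (the cheapest option within 36 months), one Starship launch per hour, a lunar mass driver, the Terafab in-house chip fab, recursive exponential growth of Optimus robots, US-China manufacturing competition, and xAI's "understand the universe" mission and AI safety.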

#hardware #chips #robotics #ai-research

View slides → YouTube ↗

Dwarkesh Podcast

~2h

Dario Amodei — "We are near the end of the exponential"

Dario Amodei · Anthropic CEO

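Anthropic CEO Dario Amodei returns to the Dwarkesh Podcast three years later to explain in depth why he believes we are nearing the end of AI's exponential growth. The conversation runs from the 2017 Big Blob of Compute hypothesis to his 1-3 year forecast for a country of geniuses in a data center, from Anthropic's 10x annual growth to the life-or-death gamble of compute procurement, and from the three-tier governance of the AI constitution to the argument that authoritarian systems are becoming morally obsolete.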

#ai-research #ai-safety #ai-products #startup

View slides → YouTube ↗

Lex Fridman Podcast

Dario Amodei: Anthropic CEO on Claude, AGI & the Future of Humans & AI

Dario Amodei, Amanda Askell, Chris Olah · Anthropic CEO + Character lead + Interpretability co-founder

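Lex Fridman talks with three core figures at Anthropic in a single episode. CEO Dario Amodei covers the scaling hypothesis, the if-then structure of the RSP, the "race to the top" strategy, and Machines of Loving Grace. Character lead Amanda Askell covers Claude's character engineering, sycophancy, and the optimal failure rate. Interpretability co-founder Chris Olah covers features, circuits, superposition, and the famous deception feature. Across strategy, product, and research, the three assemble a complete stereo view of how Anthropic thinks about "AI inside".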

#ai-safety #ai-research #interpretability #interview

View slides → YouTube ↗