"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
Erik Torenberg, Nathan Labenz
A biweekly podcast where hosts Nathan Labenz and Erik Torenberg interview the builders on the edge of AI and explore the dramatic shift it will unlock in the co...
Can AIs do AI R&D? Reviewing REBench Results with Neev Parikh of METR
In this episode of The Cognitive Revolution, Nathan explores METR's groundbreaking REBench evaluation framework with Neev Parikh. We dive deep into how this new benchmark assesses AI systems' ability to perform real machine learning research tasks, from optimizing GPU kernels to fine-tuning language models. Join us for a fascinating discussion about the current capabilities of AI models like Claude 3.5 and GPT-4, and what their performance tells us about the trajectory of artificial intelligence development.
Check out METR's work:
blog post: https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/
paper: https://metr.org/AI_R_D_Evaluation_Report.pdf
jobs: https://hiring.metr.org/
The Cognitive Revolution Ask Me Anything and Listener Survey: https://docs.google.com/forms/d/1aYv2XLID7RqGxj2_Y4_6x9mo_aqXcGCeLw1EQhy4IpY/edit
Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse
SPONSORS:
GiveWell: GiveWell has spent over 17 years researching global health and philanthropy to identify the highest-impact giving opportunities. Over 125,000 donors have contributed more than $2 billion, saving over 200,000 lives through evidence-backed recommendations. First-time donors can have their contributions matched up to $100 before year-end. Visit https://GiveWell.org, select podcast, and enter Cognitive Revolution at checkout to make a difference today.
SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognitive
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
Weights & Biases RAG++: Advanced training for building production-ready RAG applications. Learn from experts to overcome LLM challenges, evaluate systematically, and integrate advanced features. Includes free Cohere credits. Visit https://wandb.me/cr to start the RAG++ course today.
CHAPTERS:
(00:00:00) Teaser
(00:01:04) About the Episode
(00:05:14) Introducing METR
(00:07:36) Specialization of AI Risk
(00:09:52) AI R&D vs. Autonomy
(00:12:41) Benchmark Design Choices
(00:16:04) Benchmark Design Principles (Part 1)
(00:18:54) Sponsors: GiveWell | SelectQuote
(00:21:44) Benchmark Design Principles (Part 2)
(00:22:35) AI vs. Human Evaluation
(00:26:55) Optimizing Runtimes
(00:36:02) Sponsors: Oracle Cloud Infrastructure (OCI) | Weights & Biases RAG++
(00:38:20) AI Myopia
(00:43:37) Optimizing Loss
(00:47:59) Optimizing Win Rate
(00:50:24) Best of K Analysis
(01:02:26) Best of K Limitations
(01:09:04) Agent Interaction Modalities
(01:12:34) Analyzing Benchmark Results
(01:17:16) Model Performance Differences
(01:22:49) Elicitation and Scaffolding
(01:27:08) Context Window & Best of K
(01:35:17) Reward Hacking & Bad Behavior
(01:43:47) Future Directions & Hiring
(01:46:20) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://www.linkedin.com/in/nathanlabenz/
--------
1:47:58
Breakthroughs in AI for Biology: AI Lab Groups & Protein Model Interpretability with Prof James Zou
Nathan discusses groundbreaking AI and biology research with Stanford Professor James Zou from the Chan Zuckerberg Initiative. In this episode of The Cognitive Revolution, we explore two remarkable papers: the virtual lab framework that created novel COVID treatments with minimal human oversight, and InterPLM's discovery of new protein motifs through mechanistic interpretability. Join us for a fascinating discussion about how AI is revolutionizing biological research and drug discovery.
Got questions about AI? Submit them for our upcoming AMA episode + take our quick listener survey to help us serve you better - https://docs.google.com/forms/d/e/1FAIpQLSefHvs1-1g5xeqM7wSirQkzTtK-1fgW_OjyHPH9DvmbVAjEzA/viewform
SPONSORS:
SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognitive
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
80,000 Hours: 80,000 Hours is dedicated to helping you find a fulfilling career that makes a difference. With nearly a decade of research, they offer in-depth material on AI risks, AI policy, and AI safety research. Explore their articles, career reviews, and a podcast featuring experts like Anthropic CEO Dario Amodei. Everything is free, including their Career Guide. Visit https://80000hours.org/cognitiverevolution to start making a meaningful impact today.
GiveWell: GiveWell has spent over 17 years researching global health and philanthropy to identify the highest-impact giving opportunities. Over 125,000 donors have contributed more than $2 billion, saving over 200,000 lives through evidence-backed recommendations. First-time donors can have their contributions matched up to $100 before year-end. Visit https://GiveWell.org, select podcast, and enter Cognitive Revolution at checkout to make a difference today.
CHAPTERS:
(00:00:00) Teaser
(00:00:35) About the Episode
(00:04:30) Virtual Lab
(00:08:09) AI Designs Nanobodies
(00:14:43) Novel AI Pipeline
(00:20:31) Human-AI Interaction (Part 1)
(00:20:33) Sponsors: SelectQuote | Oracle Cloud Infrastructure (OCI)
(00:23:22) Human-AI Interaction (Part 2)
(00:32:31) Sponsors: 80,000 Hours | GiveWell
(00:35:10) Project Cost & Time
(00:41:04) Future of AI in Bio
(00:45:46) InterPLM: Intro
(00:50:30) AI Found New Concepts
(00:55:02) Discovering New Motifs
(00:57:14) Limitations & Future
(01:01:32) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://www.linkedin.com/in/nathanlabenz/
Youtube: https://www.youtube.com/@CognitiveRevolutionPodcast
--------
1:02:49
Scouting Frontiers in AI for Biology: Dynamics, Diffusion, and Design, with Amelie Schreiber
Nathan welcomes back computational biochemist Amelie Schreiber for a fascinating update on AI's revolutionary impact in biology. In this episode of The Cognitive Revolution, we explore recent breakthroughs including AlphaFold3, ESM3, and new diffusion models transforming protein engineering and drug discovery. Join us for an insightful discussion about how AI is reshaping our understanding of molecular biology and making complex protein engineering tasks more accessible than ever before.
Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse
SPONSORS:
Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive
SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognitive
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
Weights & Biases RAG++: Advanced training for building production-ready RAG applications. Learn from experts to overcome LLM challenges, evaluate systematically, and integrate advanced features. Includes free Cohere credits. Visit https://wandb.me/cr to start the RAG++ course today.
CHAPTERS:
(00:00:00) Teaser
(00:00:46) About the Episode
(00:04:30) AI for Biology
(00:07:14) David Baker's Impact
(00:11:49) AlphaFold 3 & ESM3
(00:16:40) Protein Interaction Prediction (Part 1)
(00:16:44) Sponsors: Shopify | SelectQuote
(00:19:18) Protein Interaction Prediction (Part 2)
(00:31:12) MSAs & Embeddings (Part 1)
(00:32:32) Sponsors: Oracle Cloud Infrastructure (OCI) | Weights & Biases RAG++
(00:34:49) MSAs & Embeddings (Part 2)
(00:35:57) Beyond Structure Prediction
(00:51:13) Dynamics vs. Statics
(00:57:24) In-Painting & Use Cases
(00:59:48) Workflow & Platforms
(01:06:45) Design Process & Success Rates
(01:13:23) Ambition & Task Definition
(01:19:25) New Models: PepFlow & GeoAB
(01:28:23) Flow Matching vs. Diffusion
(01:30:42) ESM3 & Multimodality
(01:37:10) Summary & Future Directions
(01:45:34) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://www.linkedin.com/in/nathanlabenz/
Youtube: https://www.youtube.com/@CognitiveRevolutionPodcast
Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431
--------
1:47:28
Building Government's Largest Civilian AI Team with DHS AI Corps' Director, Michael Boyce
In this episode of The Cognitive Revolution, Nathan interviews Michael Boyce, Director of DHS's AI Corps, about bringing modern AI capabilities to federal government. We explore how the largest civilian AI team in government is transforming DHS's 22 agencies, from developing shared AI infrastructure to innovative applications like AI-powered asylum interview training. Join us for an insightful conversation about the intersection of artificial intelligence and public service, and discover why AI professionals should consider a career in government.
Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse
SPONSORS:
Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive
SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognitive
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
80,000 Hours: 80,000 Hours is dedicated to helping you find a fulfilling career that makes a difference. With nearly a decade of research, they offer in-depth material on AI risks, AI policy, and AI safety research. Explore their articles, career reviews, and a podcast featuring experts like Anthropic CEO Dario Amodei. Everything is free, including their Career Guide. Visit https://80000hours.org/cognitiverevolution to start making a meaningful impact today.
RECOMMENDED PODCAST:
Unpack Pricing - Dive into the dark arts of SaaS pricing with Metronome CEO Scott Woody and tech leaders. Learn how strategic pricing drives explosive revenue growth in today's biggest companies like Snowflake, Cockroach Labs, Dropbox and more.
Apple: https://podcasts.apple.com/us/podcast/id1765716600
Spotify: https://open.spotify.com/show/38DK3W1Fq1xxQalhDSueFg
CHAPTERS:
(00:00:00) Teaser
(00:01:00) About the Episode
(00:03:38) Introducing Michael Boyce
(00:05:49) What is Homeland Security?
(00:09:52) History of AI at DHS
(00:13:15) Generative AI at DHS
(00:16:03) Structure of the AI Corps (Part 1)
(00:18:17) Sponsors: Shopify | SelectQuote
(00:20:51) Structure of the AI Corps (Part 2)
(00:22:04) Opportunities for AI at DHS
(00:25:34) Bureaucracy Hacker
(00:30:34) The Manager's Role (Part 1)
(00:35:24) Sponsors: Oracle Cloud Infrastructure (OCI) | 80,000 Hours
(00:38:04) Internal Chatbot Project
(00:43:28) AI Role Playing for Training
(00:49:55) A Request for Startups
(00:57:46) Generative AI for Quality Check
(01:03:20) AI Training at DHS
(01:06:07) Metrics and the Future of AI
(01:13:26) Non-Generative AI at DHS
(01:19:08) AI and Automation at DHS
(01:23:03) Join the AI Corps
(01:28:39) Outro
--------
1:30:11
Emergency Pod: o1 Schemes Against Users, with Alexander Meinke from Apollo Research
In this emergency episode of The Cognitive Revolution, Nathan discusses alarming findings about AI deception with Alexander Meinke from Apollo Research. They explore Apollo's groundbreaking 70-page report on "Frontier Models Are Capable of In-Context Scheming," revealing how advanced AI systems like OpenAI's o1 can engage in deceptive behaviors. Join us for a critical conversation about AI safety, the implications of scheming behavior, and the urgent need for better oversight in AI development.
Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse
SPONSORS:
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognitive
80,000 Hours: 80,000 Hours is dedicated to helping you find a fulfilling career that makes a difference. With nearly a decade of research, they offer in-depth material on AI risks, AI policy, and AI safety research. Explore their articles, career reviews, and a podcast featuring experts like Anthropic CEO Dario Amodei. Everything is free, including their Career Guide. Visit https://80000hours.org/cognitiverevolution to start making a meaningful impact today.
Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive
RECOMMENDED PODCAST:
Unpack Pricing - Dive into the dark arts of SaaS pricing with Metronome CEO Scott Woody and tech leaders. Learn how strategic pricing drives explosive revenue growth in today's biggest companies like Snowflake, Cockroach Labs, Dropbox and more.
Apple: https://podcasts.apple.com/us/podcast/id1765716600
Spotify: https://open.spotify.com/show/38DK3W1Fq1xxQalhDSueFg
CHAPTERS:
(00:00:00) Teaser
(00:00:53) About the Episode
(00:08:10) Introducing Alexander Meinke
(00:10:17) Red Teaming GPT-4
(00:17:07) Chain of Thought Access (Part 1)
(00:20:24) Sponsors: Oracle Cloud Infrastructure (OCI) | SelectQuote
(00:22:48) Chain of Thought Access (Part 2)
(00:26:07) Multimodal Models
(00:29:33) Defining Scheming
(00:33:51) Taxonomy of Scheming (Part 1)
(00:39:40) Sponsors: 80,000 Hours | Shopify
(00:42:29) Taxonomy of Scheming (Part 2)
(00:43:09) Instruction Hierarchy
(00:49:04) Types of Scheming
(01:00:49) Covert Subversion
(01:14:25) Deferred Subversion
(01:28:24) Sandbagging
(01:35:48) Magnitudes & Trends
(01:48:18) Chain of Thought Reasoning
(01:57:02) Closing Thoughts
(02:05:19) Outro
PRODUCED BY:
http://aipodcast.ing
About "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
A biweekly podcast where hosts Nathan Labenz and Erik Torenberg interview the builders on the edge of AI and explore the dramatic shift it will unlock in the coming years.
The Cognitive Revolution is part of the Turpentine podcast network. To learn more: turpentine.co