Machine Learning Street Talk (MLST)


Available episodes

5 results of 211
  • ARC Prize v2 Launch! (Francois Chollet and Mike Knoop)
    We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC Prize! In version 2, the challenges have been calibrated with humans such that at least 2 humans could solve each task in a reasonable time, but also adversarially selected so that frontier reasoning models can't solve them. The best LLMs today get negligible performance on this challenge. https://arcprize.org/

    SPONSOR MESSAGES:
    ***
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT:
    https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0

    TOC:
    1. ARC v2 Core Design & Objectives
    [00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture
    [00:03:16] 1.2 Test-Time Optimization and AGI Assessment
    [00:06:24] 1.3 Human-AI Capability Analysis
    [00:13:02] 1.4 OpenAI o3 Initial Performance Results
    2. ARC Technical Evolution
    [00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements
    [00:21:12] 2.2 Human Validation Methodology
    [00:26:05] 2.3 Task Design and Gaming Prevention
    [00:29:11] 2.4 Intelligence Measurement Framework
    3. O3 Performance & Future Challenges
    [00:38:50] 3.1 O3 Comprehensive Performance Analysis
    [00:43:40] 3.2 System Limitations and Failure Modes
    [00:49:30] 3.3 Program Synthesis Applications
    [00:53:00] 3.4 Future Development Roadmap

    REFS:
    [00:00:15] On the Measure of Intelligence, François Chollet
    https://arxiv.org/abs/1911.01547
    [00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop
    https://arcprize.org/
    [00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team
    https://arcprize.org/blog/oai-o3-pub-breakthrough
    [00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al.
    https://arxiv.org/abs/2201.11903
    [00:21:45] ARC-v2 benchmark tasks, Mike Knoop
    https://arcprize.org/blog/introducing-arc-agi-public-leaderboard
    [00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al.
    https://arxiv.org/html/2412.04604v2
    [00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt
    https://arxiv.org/abs/2412.04604
    [00:48:55] The Bitter Lesson, Rich Sutton
    http://www.incompleteideas.net/IncIdeas/BitterLesson.html
    [00:53:30] Decoding strategies in neural text generation, Sina Zarrieß
    https://www.mdpi.com/2078-2489/12/9/355/pdf
    --------  
    54:15
  • Test-Time Adaptation: the key to reasoning with DL (Mohamed Osman)
    Mohamed Osman joins to discuss MindsAI's highest scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and flexibility of the network.

    SPONSOR MESSAGES:
    ***
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT + REFS:
    https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0

    Mohamed Osman (Tufa Labs)
    https://x.com/MohamedOsmanML
    Jack Cole (Tufa Labs)
    https://x.com/MindsAI_Jack
    How and why deep learning for ARC paper:
    https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf

    TOC:
    1. Abstract Reasoning Foundations
    [00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview
    [00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning
    [00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture
    [00:20:26] 1.4 Technical Implementation with Long T5 Model
    2. ARC Solution Architectures
    [00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions
    [00:27:54] 2.2 Model Generalization and Function Generation Challenges
    [00:32:53] 2.3 Input Representation and VLM Limitations
    [00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration
    [00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches
    3. Advanced Systems Integration
    [00:43:00] 3.1 DreamCoder Evolution and LLM Integration
    [00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs
    [00:54:15] 3.3 ARC v2 Development and Performance Scaling
    [00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations
    [01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution

    REFS:
    [00:01:32] Original ARC challenge paper, François Chollet
    https://arxiv.org/abs/1911.01547
    [00:06:55] DreamCoder, Kevin Ellis et al.
    https://arxiv.org/abs/2006.08381
    [00:12:50] Deep Learning with Python, François Chollet
    https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438
    [00:13:35] Deep Learning with Python, François Chollet
    https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438
    [00:13:35] Influence of pretraining data for reasoning, Laura Ruis
    https://arxiv.org/abs/2411.12580
    [00:17:50] Latent Program Networks, Clement Bonnet
    https://arxiv.org/html/2411.08706v1
    [00:20:50] T5, Colin Raffel et al.
    https://arxiv.org/abs/1910.10683
    [00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.
    https://arxiv.org/abs/2411.02272
    [00:34:15] Six finger problem, Chen et al.
    https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf
    [00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI
    https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
    [00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.
    https://arxiv.org/html/2412.04604v2
    [00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis
    https://arxiv.org/html/2503.15540
    [00:54:25] Abstraction and Reasoning Corpus, François Chollet
    https://github.com/fchollet/ARC-AGI
    [00:57:10] O3 breakthrough on ARC-AGI, OpenAI
    https://arcprize.org/
    [00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell
    https://arxiv.org/abs/2305.07141
    [01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Yang, Zhilin and Dai, Zihang and Salakhutdinov, Ruslan and Cohen, William W.
    http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf
    --------  
    1:03:36
  • GSMSymbolic paper - Iman Mirzadeh (Apple)
    Iman Mirzadeh from Apple, who recently published the GSM-Symbolic paper, discusses the crucial distinction between intelligence and achievement in AI systems. He critiques current AI research methodologies, highlighting the limitations of Large Language Models (LLMs) in reasoning and knowledge representation.

    SPONSOR MESSAGES:
    ***
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT + RESEARCH:
    https://www.dropbox.com/scl/fi/mlcjl9cd5p1kem4l0vqd3/IMAN.pdf?rlkey=dqfqb74zr81a5gqr8r6c8isg3&dl=0

    TOC:
    1. Intelligence vs Achievement in AI Systems
    [00:00:00] 1.1 Intelligence vs Achievement Metrics in AI Systems
    [00:03:27] 1.2 AlphaZero and Abstract Understanding in Chess
    [00:10:10] 1.3 Language Models and Distribution Learning Limitations
    [00:14:47] 1.4 Research Methodology and Theoretical Frameworks
    2. Intelligence Measurement and Learning
    [00:24:24] 2.1 LLM Capabilities: Interpolation vs True Reasoning
    [00:29:00] 2.2 Intelligence Definition and Measurement Approaches
    [00:34:35] 2.3 Learning Capabilities and Agency in AI Systems
    [00:39:26] 2.4 Abstract Reasoning and Symbol Understanding
    3. LLM Performance and Evaluation
    [00:47:15] 3.1 Scaling Laws and Fundamental Limitations
    [00:54:33] 3.2 Connectionism vs Symbolism Debate in Neural Networks
    [00:58:09] 3.3 GSM-Symbolic: Testing Mathematical Reasoning in LLMs
    [01:08:38] 3.4 Benchmark Evaluation and Model Performance Assessment

    REFS:
    [00:01:00] AlphaZero chess AI system, Silver et al.
    https://arxiv.org/abs/1712.01815
    [00:07:10] Game Changer: AlphaZero's Groundbreaking Chess Strategies, Sadler & Regan
    https://www.amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184
    [00:11:35] Cross-entropy loss in language modeling, Voita
    http://lena-voita.github.io/nlp_course/language_modeling.html
    [00:17:20] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in LLMs, Mirzadeh et al.
    https://arxiv.org/abs/2410.05229
    [00:21:25] Connectionism and Cognitive Architecture: A Critical Analysis, Fodor & Pylyshyn
    https://www.sciencedirect.com/science/article/pii/001002779090014B
    [00:28:55] Brain-to-body mass ratio scaling laws, Sutskever
    https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training
    [00:29:40] On the Measure of Intelligence, Chollet
    https://arxiv.org/abs/1911.01547
    [00:33:30] On definition of intelligence, Gignac et al.
    https://www.sciencedirect.com/science/article/pii/S0160289624000266
    [00:35:30] Defining intelligence, Wang
    https://cis.temple.edu/~wangp/papers.html
    [00:37:40] How We Learn: Why Brains Learn Better Than Any Machine... for Now, Dehaene
    https://www.amazon.com/How-We-Learn-Brains-Machine/dp/0525559884
    [00:39:35] Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and Sander
    https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475
    [00:43:15] Chain-of-thought prompting, Wei et al.
    https://arxiv.org/abs/2201.11903
    [00:47:20] Test-time scaling laws in machine learning, Brown
    https://podcasts.apple.com/mv/podcast/openais-noam-brown-ilge-akkaya-and-hunter-lightman-on/id1750736528?i=1000671532058
    [00:47:50] Scaling Laws for Neural Language Models, Kaplan et al.
    https://arxiv.org/abs/2001.08361
    [00:55:15] Tensor product variable binding, Smolensky
    https://www.sciencedirect.com/science/article/abs/pii/000437029090007M
    [01:08:45] GSM-8K dataset, OpenAI
    https://huggingface.co/datasets/openai/gsm8k
    --------  
    1:11:23
  • Reasoning, Robustness, and Human Feedback in AI - Max Bartolo (Cohere)
    Dr. Max Bartolo from Cohere discusses machine learning model development, evaluation, and robustness. Key topics include model reasoning, the DynaBench platform for dynamic benchmarking, data-centric AI development, model training challenges, and the limitations of human feedback mechanisms. The conversation also covers technical aspects like influence functions, model quantization, and the PRISM project.

    Max Bartolo (Cohere):
    https://www.maxbartolo.com/
    https://cohere.com/command

    TRANSCRIPT:
    https://www.dropbox.com/scl/fi/vujxscaffw37pqgb6hpie/MAXB.pdf?rlkey=0oqjxs5u49eqa2m7uaol64lbw&dl=0

    TOC:
    1. Model Reasoning and Verification
    [00:00:00] 1.1 Model Consistency and Reasoning Verification
    [00:03:25] 1.2 Influence Functions and Distributed Knowledge Analysis
    [00:10:28] 1.3 AI Application Development and Model Deployment
    [00:14:24] 1.4 AI Alignment and Human Feedback Limitations
    2. Evaluation and Bias Assessment
    [00:20:15] 2.1 Human Evaluation Challenges and Factuality Assessment
    [00:27:15] 2.2 Cultural and Demographic Influences on Model Behavior
    [00:32:43] 2.3 Adversarial Examples and Model Robustness
    3. Benchmarking Systems and Methods
    [00:41:54] 3.1 DynaBench and Dynamic Benchmarking Approaches
    [00:50:02] 3.2 Benchmarking Challenges and Alternative Metrics
    [00:50:33] 3.3 Evolution of Model Benchmarking Methods
    [00:51:15] 3.4 Hierarchical Capability Testing Framework
    [00:52:35] 3.5 Benchmark Platforms and Tools
    4. Model Architecture and Performance
    [00:55:15] 4.1 Cohere's Model Development Process
    [01:00:26] 4.2 Model Quantization and Performance Evaluation
    [01:05:18] 4.3 Reasoning Capabilities and Benchmark Standards
    [01:08:27] 4.4 Training Progression and Technical Challenges
    5. Future Directions and Challenges
    [01:13:48] 5.1 Context Window Evolution and Trade-offs
    [01:22:47] 5.2 Enterprise Applications and Future Challenges

    REFS:
    [00:03:10] Research at Cohere with Laura Ruis et al., Max Bartolo, Laura Ruis et al.
    https://cohere.com/research/papers/procedural-knowledge-in-pretraining-drives-reasoning-in-large-language-models-2024-11-20
    [00:04:15] Influence functions in machine learning, Koh & Liang
    https://arxiv.org/abs/1703.04730
    [00:08:05] Studying Large Language Model Generalization with Influence Functions, Roger Grosse et al.
    https://storage.prod.researchhub.com/uploads/papers/2023/08/08/2308.03296.pdf
    [00:11:10] The LLM ARChitect: Solving ARC-AGI Is A Matter of Perspective, Daniel Franzen, Jan Disselhoff, and David Hartmann
    https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf
    [00:12:10] Hugging Face model repo for C4AI Command A, Cohere and Cohere For AI
    https://huggingface.co/CohereForAI/c4ai-command-a-03-2025
    [00:13:30] OpenInterpreter
    https://github.com/KillianLucas/open-interpreter
    [00:16:15] Human Feedback is not Gold Standard, Tom Hosking, Max Bartolo, Phil Blunsom
    https://arxiv.org/abs/2309.16349
    [00:27:15] The PRISM Alignment Dataset, Hannah Kirk et al.
    https://arxiv.org/abs/2404.16019
    [00:32:50] How adversarial examples arise, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry
    https://arxiv.org/abs/1905.02175
    [00:43:00] DynaBench platform paper, Douwe Kiela et al.
    https://aclanthology.org/2021.naacl-main.324.pdf
    [00:50:15] Sara Hooker's work on compute limitations, Sara Hooker
    https://arxiv.org/html/2407.05694v1
    [00:53:25] DataPerf: Community-led benchmark suite, Mazumder et al.
    https://arxiv.org/abs/2207.10062
    [01:04:35] DROP, Dheeru Dua et al.
    https://arxiv.org/abs/1903.00161
    [01:07:05] GSM8k, Cobbe et al.
    https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k
    [01:09:30] ARC, François Chollet
    https://github.com/fchollet/ARC-AGI
    [01:15:50] Command A, Cohere
    https://cohere.com/blog/command-a
    [01:22:55] Enterprise search using LLMs, Cohere
    https://cohere.com/blog/commonly-asked-questions-about-search-from-coheres-enterprise-customers
    --------  
    1:23:11
  • Tau Language: The Software Synthesis Future (sponsored)
    This sponsored episode features mathematician Ohad Asor discussing logical approaches to AI, focusing on the limitations of machine learning and introducing the Tau language for software development and blockchain tech. Asor argues that machine learning cannot guarantee correctness. Tau allows logical specification of software requirements, automatically creating provably correct implementations with potential to revolutionize distributed systems. The discussion highlights program synthesis, software updates, and applications in finance and governance.

    SPONSOR MESSAGES:
    ***
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT + RESEARCH:
    https://www.dropbox.com/scl/fi/t849j6v1juk3gc15g4rsy/TAU.pdf?rlkey=hh11h2mhog3ncdbeapbzpzctc&dl=0

    Tau:
    https://tau.net/
    Tau Language:
    https://tau.ai/tau-language/
    Research:
    https://tau.net/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf

    TOC:
    1. Machine Learning Foundations and Limitations
    [00:00:00] 1.1 Fundamental Limitations of Machine Learning and PAC Learning Theory
    [00:04:50] 1.2 Transductive Learning and the Three Curses of Machine Learning
    [00:08:57] 1.3 Language, Reality, and AI System Design
    [00:12:58] 1.4 Program Synthesis and Formal Verification Approaches
    2. Logical Programming Architecture
    [00:31:55] 2.1 Safe AI Development Requirements
    [00:32:05] 2.2 Self-Referential Language Architecture
    [00:32:50] 2.3 Boolean Algebra and Logical Foundations
    [00:37:52] 2.4 SAT Solvers and Complexity Challenges
    [00:44:30] 2.5 Program Synthesis and Specification
    [00:47:39] 2.6 Overcoming Tarski's Undefinability with Boolean Algebra
    [00:56:05] 2.7 Tau Language Implementation and User Control
    3. Blockchain-Based Software Governance
    [01:09:10] 3.1 User Control and Software Governance Mechanisms
    [01:18:27] 3.2 Tau's Blockchain Architecture and Meta-Programming Capabilities
    [01:21:43] 3.3 Development Status and Token Implementation
    [01:24:52] 3.4 Consensus Building and Opinion Mapping System
    [01:35:29] 3.5 Automation and Financial Applications

    CORE REFS (more in pinned comment):
    [00:03:45] PAC (Probably Approximately Correct) Learning framework, Leslie Valiant
    https://en.wikipedia.org/wiki/Probably_approximately_correct_learning
    [00:06:10] Boolean Satisfiability Problem (SAT), Various
    https://en.wikipedia.org/wiki/Boolean_satisfiability_problem
    [00:13:55] Knowledge as Justified True Belief (JTB), Matthias Steup
    https://plato.stanford.edu/entries/epistemology/
    [00:17:50] Wittgenstein's concept of the limits of language, Ludwig Wittgenstein
    https://plato.stanford.edu/entries/wittgenstein/
    [00:21:25] Boolean algebras, Ohad Asor
    https://tau.net/tau-language-research/
    [00:26:10] The Halting Problem
    https://plato.stanford.edu/entries/turing-machine/#HaltProb
    [00:30:25] Alfred Tarski (1901-1983), Mario Gómez-Torrente
    https://plato.stanford.edu/entries/tarski/
    [00:41:50] DPLL
    https://www.cs.princeton.edu/~zkincaid/courses/fall18/readings/SATHandbook-CDCL.pdf
    [00:49:50] Tarski's undefinability theorem (1936), Alfred Tarski
    https://plato.stanford.edu/entries/tarski-truth/
    [00:51:45] Boolean Algebra mathematical foundations, J. Donald Monk
    https://plato.stanford.edu/entries/boolalg-math/
    [01:02:35] Belief Revision Theory and AGM Postulates, Sven Ove Hansson
    https://plato.stanford.edu/entries/logic-belief-revision/
    [01:05:35] Quantifier elimination in atomless boolean algebra, H. Jerome Keisler
    https://people.math.wisc.edu/~hkeisler/random.pdf
    [01:08:35] Quantifier elimination in Tau language specification, Ohad Asor
    https://tau.ai/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf
    [01:11:50] Tau Net blockchain platform
    https://tau.net/
    [01:19:20] Tau blockchain's innovative approach treating blockchain code itself as a contract
    https://tau.net/Whitepaper.pdf
    --------  
    1:41:19

About Machine Learning Street Talk (MLST)

Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from MIT Doctor of Philosophy Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/).

v7.11.0 | © 2007-2025 radio.de GmbH
Generated: 3/25/2025 - 2:57:39 AM