Course Description

Over the last two years, artificial intelligence has advanced far beyond language modeling. This seminar explores the frontier of AI systems that perceive, reason, and act in the world—not just predict the next token. We will examine breakthroughs in world models, scientific discovery, embodied and agentic intelligence, video generation, and robotics, as well as emerging architectures that challenge the dominance of the transformer. Students will analyze seminal papers, discuss ethical and scientific implications, and present projects that critically explore or prototype these new paradigms.

Key themes include how AI is evolving from pattern completion to world simulation; how models learn causality, physics, and control; and how AI for science is accelerating discovery in biology, chemistry, and genomics.

Core Topics

  1. From Language to World Models
    • Why "beyond LLMs"? The shift from symbolic reasoning to grounded simulation.
    • Case study: Genie 3 and generative world models for interactive environments.
    • Topics: latent dynamics, representation learning, causal structure, and embodied simulation.
  2. Video Generation and World Simulation
    • Sora and its successors as general-purpose visual simulators.
    • From frame interpolation to full 3D world reasoning.
    • Concepts: diffusion transformers, scene coherence, physical consistency.
  3. Vision and Perception Foundation Models
    • SAM 2, OpenSeeD, and the rise of streaming visual memory models.
    • From static vision to continuous perception.
    • Applications in tracking, AR/VR, and real-world robotics.
  4. Embodied AI and Robotics Foundation Models
    • RT-X, RT-2, and the scaling laws of robot data.
    • Visuomotor learning and shared policy spaces across robots.
    • Experiments in real-time control, imitation learning, and sim2real transfer.
  5. Agentic and Interactive Systems
    • Beyond LLM-based agents: perception-driven autonomy, memory, and tool use.
    • Multimodal reasoning, temporal abstraction, and embodied decision-making.
    • Ethical implications of autonomous agents.
  6. AI for Science I: Molecular and Structural Discovery
    • AlphaFold 3 and diffusion-based biomolecular modeling.
    • Integration of physical priors and experimental validation.
    • Scientific impact and reproducibility.
  7. AI for Science II: Genomics and the “Alpha Genome” Vision
    • Scaling AI to understand gene regulation, protein interactions, and epigenetics.
    • Cross-modal learning from DNA, RNA, and protein sequences.
    • Toward AI-driven discovery pipelines in life sciences.
  8. Architectural Innovations Beyond Transformers
    • Mamba and selective state-space models (SSMs).
    • Hybrid attention–state-space architectures and efficient sequence modeling.
    • The next generation of hardware-efficient AI.
  9. Cognitive and Predictive Models of the World
    • Predictive processing, active inference, and their computational analogues.
    • Links between neuroscience, reinforcement learning, and world modeling.
  10. Ethics, Evaluation, and Explainability Beyond Text
    • How to benchmark video, robotic, and world models.
    • Provenance, safety, and control in generative simulation.
    • Societal implications of synthetic worlds.