This seminar provides an exploration of large language models (LLMs), covering both foundational concepts and the latest advancements in the field. Participants will gain a comprehensive understanding of the architecture, training, and applications of LLMs, based on seminal research papers. The course will be organised as a journal club: students present individual papers, which are then discussed in the group to ensure that everyone understands the ideas presented.

### Potential Topics

   - Neural networks and deep learning basics

   - Sequence modeling and RNNs (Recurrent Neural Networks)

   - Vaswani et al.'s "Attention Is All You Need" paper

   - Self-attention mechanism (see the sketch after this list)

   - Multi-head attention and positional encoding

   - GPT-1: Radford et al.'s pioneering work

   - GPT-2: Scaling and implications

   - GPT-3: Architectural advancements and few-shot learning

   - BERT (Bidirectional Encoder Representations from Transformers)

   - T5 (Text-To-Text Transfer Transformer)

   - DistilBERT and efficiency improvements

   - Mamba and other state space models (SSMs): Design principles and performance

   - FlashAttention and related methods: Improving efficiency and scalability

   - Training regimes and resource requirements

   - Fine-tuning and transfer learning

   - Emergence of new capabilities
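
As a small preview of the self-attention topic listed above, the following is a minimal, illustrative sketch of single-head scaled dot-product attention in NumPy. The weight matrices, dimensions, and variable names are arbitrary choices for the example, not taken from any particular paper's implementation.

```python
# Minimal sketch of scaled dot-product self-attention (as in "Attention Is All You Need").
# Dimensions and weights here are illustrative toy values.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of token embeddings X (n x d_model)."""
    Q = X @ Wq   # queries (n x d_k)
    K = X @ Wk   # keys    (n x d_k)
    V = X @ Wv   # values  (n x d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # pairwise attention scores (n x n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # weighted sum of values (n x d_v)

# Toy usage: 4 tokens, model dimension 8, head dimension 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 4)
```

Multi-head attention, as discussed in the seminar, repeats this computation with several independent sets of projection matrices and concatenates the results.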