This seminar is about random objects in very high-dimensional Euclidean spaces, as they appear for instance in data science. One intriguing property of the associated theory is that our low-dimensional intuition often fails spectacularly, as the following examples show.

  1. An n-dimensional standard normal distribution is highly concentrated around a sphere (i.e., the surface of a ball) of radius square root of n, rendering our "bell-curve" intuition useless.
  2. Consider the unit sphere in high dimensions, and let A be a subset of it covering at least half of its area. Then even a small neighborhood of A covers almost the whole sphere: the fraction left uncovered is exponentially small in the size of the neighborhood.
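
The first example can be checked numerically. The following is a minimal sketch, not part of the seminar material: it draws standard normal vectors in increasing dimension n and compares their Euclidean norms with the square root of n. The function names `sample_norms` and `summarize`, the sample size and the seed are illustrative assumptions.

```python
import math
import random


def sample_norms(n, num_samples, seed=0):
    """Return the Euclidean norms of num_samples standard normal vectors in R^n."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    return [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
        for _ in range(num_samples)
    ]


def summarize(n, num_samples=200, seed=0):
    """Return (mean norm / sqrt(n), empirical standard deviation of the norm)."""
    norms = sample_norms(n, num_samples, seed)
    mean = sum(norms) / len(norms)
    var = sum((x - mean) ** 2 for x in norms) / len(norms)
    return mean / math.sqrt(n), math.sqrt(var)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        ratio, spread = summarize(n)
        print(f"n={n:5d}  mean norm / sqrt(n) = {ratio:.3f}  std of norm = {spread:.3f}")
```

As n grows, the ratio should approach 1 while the spread of the norm stays bounded, so relative to the radius the samples lie in an ever thinner shell.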

We will explore the mathematical background of these phenomena and review, in particular, inequalities concerning concentration of probability. Depending on the number of participants and their interests, we might also learn about consequences of these inequalities for random graphs and matrix completion.
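
As a rough preview, the two introductory examples can be phrased as concentration inequalities. The statements below are a sketch in the spirit of [Ver]; the notation (the normalized surface measure \sigma, the neighborhood A_t, and the unspecified absolute constant c > 0) is an assumption made here for illustration.

```latex
% Example 1: the norm of a standard normal vector concentrates around \sqrt{n}.
\[
  X \sim N(0, I_n)
  \quad\Longrightarrow\quad
  \mathbb{P}\bigl( \bigl| \|X\|_2 - \sqrt{n} \bigr| \ge t \bigr)
  \le 2 \exp(-c\,t^2)
  \qquad \text{for all } t \ge 0 .
\]

% Example 2: concentration of measure on the unit sphere S^{n-1}.
% \sigma is the normalized surface measure and
% A_t = \{ x \in S^{n-1} : \operatorname{dist}(x, A) \le t \}.
\[
  \sigma(A) \ge \tfrac12
  \quad\Longrightarrow\quad
  \sigma(A_t) \ge 1 - 2 \exp(-c\,n\,t^2)
  \qquad \text{for all } t \ge 0 .
\]
```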

Organization

Time & location: Wed 16:15 - 17:45, Room: A3/HS001 (Hörsaal), Exception: A6/SR 025/026 on June 15

Contact

Péter Koltai (peter.koltai@fu-berlin.de)

Prerequisites

A rigorous course in probability theory, as well as undergraduate linear algebra and calculus.

Literature

[Ver] R. Vershynin: High-Dimensional Probability
https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf

[BHK] A. Blum, J. Hopcroft, and R. Kannan: Foundations of Data Science.
https://home.ttic.edu/~avrim/book.pdf

Procedure

The seminar will consist of weekly student talks (~60 min), each followed by a discussion. Each talk will be moderated by another student participating in the seminar. Every speaker should present a detailed outline of their talk to P. Koltai at least two weeks before the talk. Please make an appointment via peter.koltai@fu-berlin.de.

The final grade will be based on the participant's own talk(s).

Topics will be assigned during the first class on Wednesday, Apr 20. These include (page numbers refer to the online version of the book [Ver]):

0. Appetizer: Probabilistic proof and approximate version of Carathéodory's theorem
1. Basics on random variables (pp. 6-12)
2. Hoeffding (pp. 13-19)
3. Chernoff + degrees of random graphs (pp. 19-23)
4. Sub-gaussian distributions: Definition and examples (pp. 24-29)
5. General Hoeffding’s and Khintchine’s inequalities, centering, sub-exponential distributions (pp. 29-35)
6. Bernstein’s inequality & outlook (pp. 37-40)
7. Random vectors in high dimensions: norm & PCA (pp. 42-47)
8. TBA
8+. Johnson-Lindenstrauss lemma & others

Schedule

For the schedule, see this link.