Course

Natural Language Processing

Prof Dr Christoph Schommer, University Luxembourg, Dept of Computer Science, Campus Belval. Home: illas.uni.lu/members/christoph_schommer Email: christoph.schommer@{uni.lu, fu-berlin.de}

What's it all about?

The processing and understanding of natural language is one of the most important aspects of Artificial Intelligence. This is due not only to, e.g., the simulation of natural conversation among humans as a cognitive process, but also to everyday text-related applications that underline the importance of AI as a supporting instrument ("AI is for humans"). Concrete examples are chatbot technology, text generation, the application of machine learning in text-intensive environments, and the retrieval of the right information by search engines. Note that ambiguity remains an issue that even deepl.com and translate.google.com are unable to solve (October 2020).

This course will take place in Winter Term 2020/21 from 5 November 2020 until 25 February 2021 (15 weeks) on Thursdays, 10h15 - 11h45 (Lecture) and 12h15 - 13h45 (Tutorial) via Webex. In terms of content, this is not a linguistics course, but rather a course that covers aspects of language processing, machine learning, and computational methods. The course is also not intended to be a monologue by the professor; we will work on the topic together.

Our weekly meeting link:
    https://unilu.webex.com/unilu/j.php?MTID=mbf9408197d3ac6f555770c1db0411717

5 November 2020 - 25 February 2021

{ Meeting number: 163 940 6354
Password: yQ6tVkmN5M6 // Host key: 698099 }

The course is organized as a Lecture and a Tutorial; the Tutorial is presented by the course participants ("Presentation groups" 1-11).

____________

IMPORTANT (4 November 2020)

I have set up a DOODLE poll in which you can select your day of presentation (since we are around 45 participants, 11 presentation days are offered):

Tutorials : DOODLE POLL

Please note that a) each participant can select only one slot and b) the number of presenters on a given day is limited to 2.

In the lecture part of the course, we discuss selected aspects of selected chapters of the book by David Jurafsky, James Martin: "Speech and Language Processing". Source: see https://web.stanford.edu/~jurafsky/slp3/

In the Tutorial, you work with the book "Natural Language Processing with Python" by Steven Bird, Ewan Klein, and Edward Loper. Source: https://www.nltk.org/book/

Additional references:

  • The Natural Language Toolkit: download + user manual: http://www.nltk.org/
  • J. Allen: Natural Language Understanding (Pearson)
  • C. Manning, H. Schütze: Foundations of Statistical Natural Language Processing (MIT Press)
  • S. Russell, P. Norvig: Artificial Intelligence: A Modern Approach (Pearson)

Contact

For questions or comments of any kind, do not hesitate to write to me: christoph.schommer@{uni.lu, fu-berlin.de}

Please note that the course takes place only if the number of participants is >= 5.

Evaluation
The evaluation is as follows: 40% Final Assignment + 60% Tutorial. Both the Final Assignment and the Tutorial must be completed; otherwise no mark will be given. Please note that the Final Assignment takes place on 4 March 2021 from 10-12.
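
The weighting can be sketched in a few lines of Python. This is only an illustration: the function name `final_mark` and the idea of passing both scores on the same numeric scale are assumptions, since the grading scale itself is not specified here.

```python
from typing import Optional

# Weights in percent, as stated above.
FA_WEIGHT = 40
TUTORIAL_WEIGHT = 60


def final_mark(fa_score: Optional[float],
               tutorial_score: Optional[float]) -> Optional[float]:
    """Combine both parts into one mark.

    Returns None if either part is missing, because no mark is
    given unless both parts have been completed.
    Assumption: both scores use the same (unspecified) scale.
    """
    if fa_score is None or tutorial_score is None:
        return None
    return (FA_WEIGHT * fa_score + TUTORIAL_WEIGHT * tutorial_score) / 100


if __name__ == "__main__":
    print(final_mark(10, 20))    # both parts completed
    print(final_mark(None, 20))  # Final Assignment missing: no mark
```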

Lecture

In the lecture, I will address selected topics of Natural Language Processing as given in the book by Jurafsky as well as below.

Tutorial
Depending on the number of participants, the size of a presentation group varies from 1 to 4; the default is 2. The documents are available as PDF under Resources. You may also find the documents as HTML pages by clicking on the corresponding link below:

Please note that there are no exercises; the Tutorial is mainly self-study.

What you will learn in the Tutorial part (see book by Bird et al.):

  1.  How simple programs can help you manipulate and analyze language data, and how to write these programs.
  2.  How key concepts from NLP and linguistics are used to describe and analyze language.
  3.  How data structures and algorithms are used in NLP.
  4.  How language data is stored in standard formats, and how data can be used to evaluate the performance of NLP techniques.
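
To give a flavour of the first point, here is a minimal sketch of a "simple program" that analyzes language data by counting word frequencies. It deliberately uses only the Python standard library rather than NLTK, so it runs without any installation; the function name `word_frequencies`, the crude regular-expression tokenizer, and the sample sentence are assumptions made for illustration and are not taken from the book.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, and count them."""
    # Very rough tokenizer: runs of letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)

# The two most frequent tokens with their counts.
print(freqs.most_common(2))
```

The book by Bird et al. develops the same idea with NLTK's own tokenizers and frequency tools, which handle punctuation and corpora far more carefully than this sketch.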

Please note:

  • Each chapter suggests exercises, which can be addressed by the presentation group.
  • The presentation should balance presentation slides and work in the programming environment. Everyone is encouraged to follow the instructions of the presenters.
  • Each presentation group is responsible for ensuring that the content is covered.
  • The goal of the presentation is to teach the other participants.
  • Some chapters are complex, and not all of their content can be presented. The presentation group's task is therefore also to give a fair overview of the chapter and to explain why the presented content has been chosen.

Evaluation Scheme - Lecture + Tutorial

Lecture

We will have a final assignment at the end of the course. The final assignment takes up to 100 minutes of work (net; including the time for receiving and sending, the whole time is 120 minutes), during which you work on the question sheet that I will send to you by email. You are then asked to send me your answer sheet by email, too. The questions are less about facts and more about an understanding of the overall subject.

Please note: we will have a test assignment under real conditions on 18 February 2021. The purpose of this assignment/exam is a) to learn how I ask questions and what kind of answers I expect and b) to have a summary/conclusion of the course. We will then discuss the test assignment/exam a week later (25 February). Please note that your answers are not evaluated and do not count. It is a dress rehearsal.

Tutorial

  • Has the presentation been understood by the other participants? (The feedback of the audience is needed here.)
  • Has the presentation been nearly complete with respect to the given chapter (not in quantity, but with regard to content)?
  • Has the presentation been convivial (best) or sleepy (worst)?
  • Has the presented content been correct?
  • How were the questions answered by the presentation group members?
  • Did the group act in a balanced way? // worst: no one speaks; second worst: one speaks, the others remain silent.
  • Presence of the participants

Please note: the presenters must have their video switched ON. The programming environment and/or slides have to be shared with all.

  • I expect each participant to be present during all Tutorials. If someone cannot attend, please let the presenters and myself know. Thank you!
  • Each participant is invited to follow the instructions of the presenting group. The presenting group may direct questions to the audience.
  • Each participant is responsible for installing the software (Python, NLTK) themselves.
  • Deliveries: 2h Tutorial presentation
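
Regarding the software installation, the following is a minimal sketch of how you might check your own setup before a Tutorial. It only tests whether a module can be found by the current Python interpreter; the helper name `check_module` is hypothetical, and the suggested `pip install nltk` command assumes that you manage packages with pip.

```python
import importlib.util
import sys


def check_module(name: str) -> bool:
    """Return True if the module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    if check_module("nltk"):
        print("nltk is installed.")
    else:
        # Assumption: pip is used as the package manager.
        print("nltk was not found; try: pip install nltk")
```

Please see http://www.nltk.org/ for the official installation instructions and for how to download the data sets used in the book.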

Preliminary Schedule

05 November
   10-12 Course Overview; Introduction
   12-14 Tutorial Overview

12 November
   10-12 Regular expressions, Word repair
   12-14 { Self-study }

19 November
   10-12 Language Models: n-grams, Smoothing
   12-14 { Self-study }

26 November
   10-12 Vector Space Model; tf.idf; Word embeddings
   12-14 { Self-study }

03 December
   10-12 Part-of-Speech; HMM; Viterbi
   12-14 Group 01: Stachnik, Vindimian

10 December
   10-12 Naive Bayes; Precision, Recall, and F-measure
   12-14 Group 02: Bich, Kaibel

17 December
   10-12 Practical NLP aspects - my talk at the European Language Resource Consortium (ELRC)
   12-14 Group 03: Akperov, Golghalyani

07 January  
   10-12 Connectionism for a Word-Sense Disambiguation
   12-14 Group 04: Kothari, Schäfer

14 January
   10-12 Context-free Grammars, CYK Chart Parser
   12-14 Group 05: Bockhorn, Chisaru

21 January
   10-12 Information Extraction, Associative Memories based on conversations
   12-14 Group 06: Mies, Pinto

28 January
   10-12 Text classification
   12-14 Group 07: Kirchner, Szwedowicz

04 February
   10-12 Test assignment as a Summary of the course + Preparation
   12-14 Group 08: Akkus, Baral

11 February
   10-12 We discuss the test exam/assignment (dress rehearsal)
   12-14 Group 09: Hristov, Rademann

18 February
   10-12 Group 10: PS, Comak
   12-14 Group 11: Cornelius, Lin

25 February
   10-12 self-study + preparation for the final assignment (no course, no tutorial today)
   12-14 self-study + preparation for the final assignment (no course, no tutorial today)

FINAL EXAM/ASSIGNMENT :      4 March 2021, 10-12.

Rules and conditions:

  • The final assignment (FA) takes place from 10h00 to 11h50; the FA is designed for 90 minutes. However, I grant an additional 10 minutes for sending me your answer sheet by email. This makes 110 minutes in total.
  • Your submission should contain in the subject: "NLP FA - <FirstName Lastname>". Submission: via Assignment
  • The FA is open-book, which means that you can use courseware, books, et cetera. However, group work is NOT allowed.
  • Once you submit your answer sheet, you automatically declare that you have done the FA by yourself and that all the writings have been done by yourself as well (no copy from anywhere). The FA counts for 40%, the other 60% are from your Tutorial presentation.
  • Please note that there are 3 types of questions: Fact questions, Opinion questions, and Understanding questions:
    • Fact questions: you have to calculate or explain concretely.
    • Opinion questions: you have to state your opinion/impression about something. In this case, please use substantial and independent arguments. Examples are: explainability, handling (as a Human-Computer Interface), performance (time, storage), expandability, reliability, applicability, and others.
  • There will be no questions regarding nltk/python programming. Only the aspects that we have discussed in the course count.
  • Good Luck!