INFORMATION RETRIEVAL  

(Pro)Seminar, Summer Term 2020

Christoph Schommer, Last Update: 1 September 2020

{

Preliminary Discussion (Vorbesprechung): 1 September 2020
An online meeting invitation was sent to you on 18 August 2020:

WEBEX

Where: https://unilu.webex.com/unilu/j.php?MTID=m43e46106464f5b1c008f00df14a8edf6

Meeting number (access code): 163 924 4216
Meeting password: IR1418PrelDisc

}

 

ORGANISATION

The course is a (pro)seminar with a main focus on Information Retrieval. We will have a preliminary discussion (German: Vorbesprechung) on Tuesday, 1 September 2020, at 10h00 c.t. An invitation to the Webex online meeting has been sent to you and can be found above.
 

The course takes place from 14 to 18 September 2020 as follows (see the final presentation schedule below under "PAPERS"):

Meeting Link (Monday - Friday):
https://unilu.webex.com/unilu/j.php?MTID=m6cfe93599ac5d1909719991a0a1311e0
Meeting number: 163 902 0702
Password: 3yJ739gb3rm
Host key: 886605
  • Monday: Lecture from 09h15 - 10h45
    • What is Information Retrieval?
    • Boolean Retrieval
    • Posting lists and Inverted Index Construction
  • Tuesday: Lecture from 09h15 - 10h45; Talks from 11h00 - 15h00
    • Natural Language Processing: Tokenization, Lemmatization, Stemming; Porter Stemmer.
    • Word Repair: n-grams and Jaccard, Soundex.
  • Wednesday: Lecture from 09h15 - 10h45; Talks from 11h00 - 15h00
    • Word Repair: Levenshtein (Edit) Distance
    • Wildcards and data structures (B-tree; reverse tree).
    • Ranking: tf and idf, document frequency, calculation of a score.
    • Vector Space model: representing documents and queries as points in the space (-> vectors).
    • Vector Space model: use of the angle/cosine to determine the similarity/distance between two vectors; dot product.
  • Thursday: Lecture from 09h15 - 10h45; Talks from 11h00 - 15h00
    • PageRank idea
    • Evaluation with Precision and Recall; F-measure; van Rijsbergen's alpha.
    • kappa-model
  • Friday: Lecture from 09h15 - 10h45; Talks from 11h00 - 15h00
    • Query expansion; user feedback.
    • In brief: Apriori and association discovery for query expansion.
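
To give a first impression of some of the lecture topics listed above (n-grams with Jaccard, Levenshtein distance, tf-idf weighting, and cosine similarity), here is a minimal Python sketch. It is not part of the official course material; all function names are illustrative, and the idf variant log10(N / df) is an assumption (other variants exist).

```python
import math
from collections import Counter


def ngrams(word, n=2):
    """Return the set of character n-grams of a word (bigrams by default)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def levenshtein(s, t):
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution or match
        prev = curr
    return prev[-1]


def tf_idf_vectors(docs):
    """Map each tokenized document to {term: tf * idf}, idf = log10(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log10(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy collection (made up for illustration only).
docs = [["information", "retrieval"],
        ["information", "extraction"],
        ["boolean", "retrieval"]]
vectors = tf_idf_vectors(docs)
```

For example, levenshtein("kitten", "sitting") returns 3, and the bigram sets of "night" and "nacht" share only "ht", so their Jaccard coefficient is 1/7.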

Please note that the course and the presentations will be held in English.

CONTENT

In the lecture part, we discuss selected aspects of search engines, the role of Natural Language Processing, and typical topics such as ranking, evaluation, the role of feedback, query expansion, quality aspects, and more.

In the talks T1-T16, each candidate contributes once to the course with a paper review talk. For this purpose, each candidate has to deliver the following for a paper of his/her choice:

  • A talk of up to 30 minutes (where the paper review is presented) plus Q&A.
  • A written version of the paper review of 1000 to 1200 words (in PDF format).

Each paper review should discuss the following points:

  • Give a short summary of the paper. What is the paper about and what is the main contribution of the paper?
    (As mentioned in the Vorbesprechung, you may use the abstract of the paper, but you should write in your own words and extend this part with additional aspects, for example the structure of the paper. Important: this part has to be objective, without any subjective comments!)
  • Discuss the clarity of writing, i.e., the style, the presentation of figures and tables, the usage of acronyms (fluent reading guaranteed?), spelling errors, etc. Do you believe that the presented topic has been understood?
  • Is the paper technically sound? What is the scientific novelty? Have tests and experiments been carried out, and are they convincing? Is the data sufficiently explained? Are the results sufficiently discussed? Are sources sufficiently cited? Do you see any plagiarism?
  • Do you see a relevance of the work to other fields (cross-usage)?
  • Is there a critical reflection on the content by the authors (e.g., a SWOT analysis)? Are there convincing suggestions and explanations regarding future work?
  • Which audience do you see as appropriate (industry and/or academia; practitioners or theorists; only Computer Science or multi-disciplinary work)? Why?
  • What are the best and the weakest points of the paper? State one further positive and one further negative point.
  • How do you evaluate the speaker's presentation (see the link to the speaker's presentation behind each paper)?
  • Please include your scores:
    • Overall Rating for the paper: 5 (Full Accept), 4 (Weak Accept), 3 (Borderline), 2 (Weak Reject), 1 (Reject).
    • What is your confidence in *your* rating: 5 (I am an expert), 4 (I am very convinced), 3 (I am confident), 2 (I am not sure), 1 (I am not confident at all).
    • Also: do you recommend the paper as a poster, as a short presentation, or as a full presentation?
    • Do you recommend the paper for a Best Paper Award? Please explain your answer!

Submission Deadline: Friday, 25 September 2020, 11h00 CEST (confirmed)

 

 

EVALUATION

  • 40% Presentation of your paper review.
  • 50% Executive summary of your paper review.
  • 10% Presence during the course.

The main reference is the book by Manning, C., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, Cambridge University Press. See https://nlp.stanford.edu/IR-book/information-retrieval-book.html .

PAPERS

The following papers were presented at the 42nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. ACM SIGIR is the premier scientific conference in the broad area of Information Retrieval. URL: https://sigir.org/sigir2019/ Each paper has 10 pages. Please note that for each paper, a video (a recording of the speaker's presentation) is available: http://www.sigir.org/sigir2019/program/schedule/

TUESDAY, 15 September

  • 11h00 - 11h45 Glenn Schneider: #A Hate Speech Detection is not as easy as you may think
  • 11h45 - 12h30 Evghenii Orenciuc: #D Relational Collaborative Filtering - Modeling Multiple Item Relations for Recommendation
  • 12h30 - 13h15 Juri Torhoff: #H ENT Rank - Retrieving Entities for Topical Information Needs through Entity-Neighbor-Text Relations

WEDNESDAY, 16 September

  • 11h00 - 11h45 Victoriya Kralewa: #Q An Efficient Adaptive Transfer Neural Network for Social-aware Recommendation
  • 11h45 - 12h30 Nils Thiele: #G Asking Clarifying Questions in Open-Domain Information-Seeking Conversation
  • 12h30 - 13h15 Tim Kluge: #K Teach Machine How to Read: Reading Behavior Inspired Relevance Estimation

THURSDAY, 17 September

  • 11h00 - 11h45 Samuel Enderwitz: #C Adaptive Multi-Attention Network Incorporating Answer Information for Duplicate Question Detection
  • 11h45 - 12h30 Jonas Schäfer: #L Context-Aware Intent Identification in Email Conversations
  • 12h30 - 13h15 Fritz Cremer: #J Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos

FRIDAY, 18 September

  • 11h00 - 11h45 Shiho Onitsuka: #O Health Cards for Consumer Health Search 
  • 11h45 - 12h30 -
  • 12h30 - 13h15  -

_____________________________________________________________________________________________________________________

NOT ASSIGNED (as of 1 September; deadline for expressing interest: Thursday, 10 September):

  • #B Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs
  • #E Neural Graph Collaborative Filtering
  • #F Context Attentive Document Ranking and Query Suggestion
  • #I Transparent, Scrutable and Explainable User Models for Personalized Recommendation
  • #M DivGraphPointer - A Graph Pointer Network for Extracting Diverse Keyphrases
  • #N Personalized Fashion Recommendation with Visual Explanations based on Multimodal Attention Network - Towards Visually Explainable Recommendation
  • #P Lifelong Sequential Modeling with Personalized Memorization for User Response Prediction
  • #R Online User Representation Learning Across Heterogeneous Social Networks

CONTACT

By Email: christoph.schommer@fu-berlin.de (or christoph.schommer@uni.lu)