This repository aggregates reading materials, lab templates, datasets and other electronic resources useful for learning about search, recommendations and other IR topics.
-
As of 2021, there is still THE BOOK, which is both good and old. It is mandatory reading because it covers all the necessary topics. Unfortunately, it was written just before multimedia retrieval, recommender systems and machine learning became commonplace.
-
Latent space approximation is an important topic; see the discussions of ALS, Word2Vec and BERT.
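To make the idea concrete, here is a minimal sketch of ALS (alternating least squares) on a tiny ratings table. It is a toy under stated assumptions, not the method from the linked discussion: the latent space has a single dimension (rank 1) so that every update has a closed form, and the function name `als_rank1`, the `reg` value and the data layout are all hypothetical choices made for illustration.

```python
def als_rank1(ratings, n_users, n_items, reg=0.1, iters=20):
    """Rank-1 ALS over observed entries only.

    ratings: dict mapping (user, item) -> observed value (illustrative layout).
    Returns one latent factor per user and one per item.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        # Fix item factors; each user factor is a 1-D ridge regression.
        for a in range(n_users):
            num = sum(v[i] * r for (x, i), r in ratings.items() if x == a)
            den = sum(v[i] ** 2 for (x, i), _ in ratings.items() if x == a) + reg
            u[a] = num / den
        # Fix user factors; update each item factor the same way.
        for b in range(n_items):
            num = sum(u[x] * r for (x, i), r in ratings.items() if i == b)
            den = sum(u[x] ** 2 for (x, i), _ in ratings.items() if i == b) + reg
            v[b] = num / den
    return u, v


def predict(u, v, user, item):
    """Predicted rating is the product of the two latent factors."""
    return u[user] * v[item]
```

Real recommenders use a higher rank, where each update becomes a small linear system instead of a scalar division, but the alternation between the two sides is the same.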
-
Indexing is the circulatory system of search. Proximity graphs lie at the 0th level of the theory; one level higher you will find NSW and HNSW graphs. Among search trees, don't forget to read about Annoy. For modern inverted indices, please refer to this paper and its predecessor.
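As a reminder of what the simplest inverted index looks like, here is a minimal sketch. It assumes whitespace tokenization, lowercasing and boolean AND queries; the names `build_index` and `search_and` are hypothetical, and none of the compression or skipping techniques from the papers above are shown.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_and(index, query):
    """Return sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # .get avoids creating empty postings for unseen terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))
```

With `docs = ["search trees and graphs", "inverted index for search", "graph index"]`, the query `"search index"` matches only the document with id 1, because it is the only one present in both posting sets.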
-
Written in 2003, this paper by the creator of Shazam is still important for audio retrieval. Also consider Query by Humming and Hum to Search by Google.
-
Image retrieval starts with low-level features such as SIFT, Haralick and Xerox features, and continues with machine learning: classifiers and autoencoders.
-
Topic modelling is usually a side topic in IR, but it becomes very important when it comes to clustering, debiasing and analysis. The two leading approaches are PLSA and LDA. Here are some interesting materials: topic modelling (rus), tutorials, LDA original paper.
Please find them on a separate page.
- See this course, especially its advanced part, if you want to learn more about IR with neural networks.
- Here you can find IR-related labs, e.g. on LDA.