Okapi BM25

A Python implementation of the Okapi BM25 ranking function for document retrieval.

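Given a query $Q$ containing keywords $q_1, \dots, q_n$, the BM25 score of a document $D$ is

$$\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}$$

where $f(q_i, D)$ is the number of times $q_i$ occurs in $D$, $|D|$ is the length of $D$ in words, $\text{avgdl}$ is the average document length in the collection, and $k_1$ and $b$ are free parameters.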
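As a rough illustration of the standard BM25 formula, the contribution of a single query term can be computed as in the sketch below. The function names `bm25_term_score` and `bm25_score` are hypothetical and do not come from this repository, and the defaults `k1=1.5` and `b=0.75` are commonly used values assumed here, not settings confirmed by the code.

```python
def bm25_term_score(tf, doc_len, avgdl, idf, k1=1.5, b=0.75):
    """Contribution of one query term to a document's BM25 score.

    tf      : number of times the term occurs in the document
    doc_len : length of the document in words
    avgdl   : average document length in the collection
    idf     : inverse document frequency of the term
    k1, b   : free parameters (assumed defaults, not taken from the repo)
    """
    # Length normalization: equals 1 when the document has average length,
    # and grows for longer documents, which lowers their score.
    norm = 1.0 - b + b * (doc_len / avgdl)
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


def bm25_score(term_freqs, doc_len, avgdl, idfs, k1=1.5, b=0.75):
    """Sum the per-term contributions over all query terms.

    term_freqs : dict mapping term -> frequency in this document
    idfs       : dict mapping each query term -> its IDF
    """
    return sum(
        bm25_term_score(term_freqs.get(term, 0), doc_len, avgdl, idf, k1, b)
        for term, idf in idfs.items()
    )
```

A term that does not occur in the document contributes nothing, and for a fixed term frequency a longer-than-average document scores lower than a shorter one.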

Implementation

There are two main modules:

QueryParser parses the query into a list of keywords.

BuildIndex builds an inverted index and computes the scores of the documents according to the BM25 ranking function.

process_files: processes the corpus files and returns a dictionary
index_one_file & regular_index: map each word to its positions in the corresponding document
inverted_index: returns a dictionary keyed by word; each value is another dictionary that maps a filename to the word's positions in that file
inverse_df: returns a dictionary that maps each word to its IDF
docLen and avgdocl: compute the length of each document and the average document length in the collection, respectively
BM25scores: returns the BM25 scores of the documents
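The steps above can be sketched end to end as follows. This is a minimal illustration, not the code in this repository: the function names only loosely mirror the ones listed, the regex tokenizer is an assumption, the IDF variant `ln(1 + (N - n + 0.5) / (n + 0.5))` is one common choice among several, and `k1=1.5`, `b=0.75` are assumed defaults.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each word to {document name: [positions of the word in that document]}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(name, []).append(pos)
    return index


def inverse_df(index, n_docs):
    """Map each word to its IDF, using an assumed variant that is never negative."""
    return {
        word: math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for word, postings in index.items()
    }


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return {document name: BM25 score} for the given query string."""
    index = build_inverted_index(docs)
    lengths = {name: len(tokenize(text)) for name, text in docs.items()}
    avgdl = sum(lengths.values()) / len(lengths) if lengths else 0.0
    idf = inverse_df(index, len(docs))

    scores = {name: 0.0 for name in docs}
    for term in tokenize(query):
        # Only documents that contain the term receive a contribution.
        for name, positions in index.get(term, {}).items():
            tf = len(positions)
            norm = 1 - b + b * lengths[name] / avgdl
            scores[name] += idf[term] * tf * (k1 + 1) / (tf + k1 * norm)
    return scores
```

For example, calling `bm25_scores("quick fox", docs)` with `docs` as a dictionary that maps filenames to their text returns one score per file, with a score of 0.0 for files that contain none of the query keywords.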
