Vector Space Model (VSM) For Information Retrieval

K214553 Syeda Rabia Hashmi

Introduction:

This project implements a basic Information Retrieval System using the Vector Space Model (VSM). It allows users to search for documents within a corpus based on a query. The system preprocesses the documents, calculates TF-IDF values, and ranks documents based on cosine similarity.

Features:

Tokenization: Tokenizes the documents, removing punctuation, numbers, and stop words. It also performs stemming using the Porter Stemmer algorithm.
Inverted Index: Builds an inverted index that maps each term to the documents in which it occurs. I reused the indexes built for my previous project, the Boolean Retrieval IR Model.
TF-IDF Calculation: Calculates TF-IDF values for each term-document pair.
Cosine Similarity: Computes the normalized cosine similarity between the query and each document, then ranks the documents by that score.
Web Interface: Provides a simple web interface for users to enter search queries and view results.
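
The TF-IDF and cosine similarity steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in Vector_Space_Model.py: the function names, the tiny in-memory corpus, the inline stop-word set, and the weighting (raw term frequency times log(N/df)) are all assumptions, and stemming is left out so the sketch needs nothing beyond the standard library.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word set; the project loads its own list from Stopword-List.txt.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "for", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic runs only, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_tfidf(docs):
    """Return (idf, vectors), where vectors maps doc id -> {term: tf * idf}."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    df = Counter()
    for terms in tokens.values():
        df.update(set(terms))  # count each term once per document
    # Assumed weighting: idf = log(N / df); the project may use a variant.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in Counter(terms).items()}
        for doc_id, terms in tokens.items()
    }
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, idf, vectors):
    """Rank documents by cosine similarity to the query, best match first."""
    q_tf = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items() if t in idf}
    scores = {doc_id: cosine(q_vec, vec) for doc_id, vec in vectors.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(doc_id, score) for doc_id, score in ranked if score > 0]


# Toy corpus for illustration only.
docs = {
    "1": "Neural networks for image classification",
    "2": "Boolean retrieval and inverted index construction",
    "3": "Ranking documents with the vector space model",
}
idf, vectors = build_tfidf(docs)
results = search("vector space ranking", idf, vectors)
```

Here only document 3 shares terms with the query, so it is the only result returned; documents with a score of zero are filtered out.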

Usage:

Ensure you have Python installed on your system.
Install the necessary dependencies with pip, for example pip install flask nltk.
Place your documents in the ResearchPapers directory.
Run the Vector_Space_Model.py file to start the Flask web server.
Access the search interface in your web browser at http://localhost:5000.
Enter your query in the search bar and press Enter or click the Search button.
View the search results on the results page.
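
The search flow in the steps above could be wired up in Flask along these lines. This is a hedged sketch rather than the project's actual routing: the route paths, the q parameter, the inline template, the toy corpus, and the rank_documents helper (a simple term-overlap stand-in for the real VSM ranking) are all hypothetical.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline template so the sketch is self-contained; the project keeps its HTML in templates/.
PAGE = """
<form action="/search" method="get">
  <input name="q" value="{{ query }}">
  <button type="submit">Search</button>
</form>
<ul>
{% for doc_id, score in results %}
  <li>Document {{ doc_id }}: {{ score }}</li>
{% endfor %}
</ul>
"""

# Toy corpus for illustration only.
TOY_CORPUS = {"1": "vector space model", "2": "boolean retrieval"}


def rank_documents(query):
    """Hypothetical stand-in for the VSM ranking: scores by shared-term count."""
    terms = set(query.lower().split())
    scores = {d: len(terms & set(text.split())) for d, text in TOY_CORPUS.items()}
    hits = [(d, s) for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)


@app.route("/")
def index():
    # Empty search form with no results yet.
    return render_template_string(PAGE, query="", results=[])


@app.route("/search")
def search():
    # Read the query string, rank the documents, and render the results list.
    query = request.args.get("q", "")
    return render_template_string(PAGE, query=query, results=rank_documents(query))

# To serve this locally, call app.run(port=5000) at the end of the file.
```

With app.run(port=5000) added, the form would be reachable at http://localhost:5000, matching the address given in the steps above.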

File Structure:

Vector_Space_Model.py: Main Flask application file that handles routing and search functionality.
inverted_indexA2.txt: Text file containing the inverted index data.
TF-IDF.txt: Text file containing TF-IDF values for documents.
Stopword-List.txt: List of stop-words.
ResearchPapers/: Directory containing the corpus of documents.
templates/: Directory containing HTML templates for the web interface.

Requirements:

Python 3.x
Flask
NLTK (Natural Language Toolkit)

Credits:

This project was created by Syeda Rabia Hashmi (roll no. K214553), NUCES FAST, Karachi. It is provided as an educational tool for learning about Information Retrieval and the Vector Space Model.
