Parses a set of input HTML files, builds an inverted index, and writes a series of output files that record the tokens found in each input file. Written in Python.

katherine-atwell/inverted-index

This program builds an inverted index from a directory of input HTML files. It writes the following outputs: a directory file, a set of files containing the individual tokens in each input file, a set of files containing the TF-IDF score of each unique token in each file, and a dictionary and postings file.
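
To make the idea concrete, here is a minimal sketch of tokenizing HTML and building a TF-IDF weighted inverted index in memory. It is not the repository's code: the names `tokenize_html` and `build_index` are hypothetical, and the weighting (relative term frequency times the natural log of N/df) is an assumption, since the description above does not say which TF-IDF variant is used or how the output files are laid out.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def tokenize_html(html):
    """Return lowercase alphanumeric tokens from the text of `html`."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def build_index(docs):
    """Map each token to {doc_id: tf-idf weight}, given {doc_id: html}.

    Assumed weighting: tf = count / tokens in document, idf = ln(N / df).
    """
    counts = {doc_id: Counter(tokenize_html(html)) for doc_id, html in docs.items()}

    # Document frequency: number of documents containing each token.
    doc_freq = Counter()
    for tf in counts.values():
        doc_freq.update(tf.keys())

    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, tf in counts.items():
        total = sum(tf.values())
        for token, freq in tf.items():
            idf = math.log(n_docs / doc_freq[token])
            index[token][doc_id] = (freq / total) * idf
    return dict(index)
```

In this sketch the keys of the returned dictionary play the role of the dictionary file, and each inner mapping plays the role of that token's postings list. A token that appears in every document gets a weight of zero under this weighting.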
