Tilde Search Engine

Search engine for tilde-based websites

Discovery

Responsible for:

Not creepy at all. Responsible for:

Content explanation

tokenize_corpus and Porter files - clean corpus data into stemmed tokens. They need a stopwords.txt file in the same directory
data file - interfaces with the project's text and JSON files for easy data management
parse_url file - handles HTML, including making requests and parsing text and metadata
init_freq_dir file - creates and/or updates the document frequency dictionary
crawl file - goes through URLs and gathers tags + metadata for the dictionaries
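
To make the tokenizing and document frequency steps above more concrete, here is a minimal sketch in Python. It is not the code from this repo: the names tokenize, document_frequency and STOPWORDS are hypothetical, the stopword set stands in for the contents of stopwords.txt, and the Porter stemmer is left out (the stem argument is a placeholder that returns each word unchanged).

```python
import re
from collections import Counter

# Hypothetical stand-in for the words listed in stopwords.txt.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def tokenize(text, stopwords=STOPWORDS, stem=lambda word: word):
    """Lowercase the text, split it into alphabetic tokens, drop stopwords, then stem.

    `stem` defaults to the identity function; a real Porter stemmer
    could be passed in its place.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(word) for word in words if word not in stopwords]


def document_frequency(docs):
    """Map each token to the number of documents that contain it at least once."""
    freq = Counter()
    for text in docs.values():
        # A set makes each token count once per document.
        freq.update(set(tokenize(text)))
    return dict(freq)


# Made-up page texts, keyed by a made-up user page name.
docs = {
    "~alice": "The garden of the tilde town",
    "~bob": "A tilde server and a garden gnome",
}
df = document_frequency(docs)
```

With these two sample pages, df maps "tilde" and "garden" to 2 and every other remaining token to 1. Counting each token once per document, rather than once per occurrence, is what makes this a document frequency instead of a plain word count.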

This document last updated: Jul 20 2020
