nlp

A Python project for a Language Technology/NLP course. Using data gathered from various news portals by custom crawlers built with Scrapy, together with the NLTK module, we create a vector space representation of the collection and an inverted index. Ultimately, the two basic functionalities are:

relevant article search, based on the tf-idf metric for the search query keywords
categorization of a text by looking at the top features (frequency-wise) of its content
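
The two functionalities can be illustrated with a minimal sketch. It is not the code from `inverted_idx.py` or `main_file.py`; it assumes Python 3, a plain regex tokenizer in place of NLTK, one common tf-idf variant (relative term frequency times `log(N / df)`), and a simple overlap rule for categorization. All function names and the sample data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simplified stand-in for NLTK tokenization: lowercase alphabetic tokens.
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Return {term: {doc_id: tf-idf weight}} for a {doc_id: text} mapping."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total                        # relative term frequency
            idf = math.log(n_docs / doc_freq[term])   # assumed idf variant
            index[term][doc_id] = tf * idf
    return index


def search(index, query):
    """Rank documents by the summed tf-idf weight of the query keywords."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def categorize(text, category_features, top_n=5):
    """Pick the category whose feature set overlaps most with the text's
    top_n most frequent terms (an assumed, simplified decision rule)."""
    top_terms = {term for term, _ in Counter(tokenize(text)).most_common(top_n)}
    return max(category_features, key=lambda c: len(top_terms & category_features[c]))


if __name__ == "__main__":
    # Hypothetical sample collection; the real project reads crawled articles.
    docs = {
        "a1": "Stocks rallied as markets opened higher",
        "a2": "The team won the match after extra time",
        "a3": "Markets fell while the team of analysts watched stocks",
    }
    index = build_inverted_index(docs)
    print(search(index, "stocks markets"))

    features = {
        "finance": {"stocks", "markets", "shares"},
        "sports": {"team", "match", "goal"},
    }
    print(categorize("Stocks and markets: stocks slid as markets wobbled", features))
```

Storing the weights inside the inverted index means a query only touches the posting lists of its own keywords, so documents that share no term with the query are never scored.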

Folders and files

Part 2
cnn
guardian
nytimes
onion
reuters
washington_post
.gitignore
GetXML.py
README.md
SetXML.py
command.txt
inverted_idx.py
main_file.py

stikos/nlp
