GitHub - neofit77/newsPaper

NewsPaper is a Python project that scrapes the key elements of blog posts (author, text, publication date, and keywords). It is especially useful for finding keywords in a text for SEO or marketing.
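
As a rough illustration of the extraction described above, the sketch below uses the newspaper3k `Article` class to pull the four fields from one URL. It is not the project's spider code: the helper names `article_to_row` and `extract_article` and the output column names are assumptions made for this example.

```python
from datetime import datetime


def article_to_row(url, authors, text, publish_date, keywords):
    """Flatten the extracted fields into one CSV-friendly dict.

    The column names are hypothetical, not taken from the project.
    """
    return {
        "url": url,
        "author": ", ".join(authors),
        "text": text,
        # publish_date may be None when the page does not expose a date
        "publication_date": publish_date.isoformat() if publish_date else "",
        "keywords": ", ".join(keywords),
    }


def extract_article(url):
    """Download one article and return its key elements (needs network access)."""
    # Imported lazily so this file still loads without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the HTML
    article.parse()     # fills authors, text, publish_date
    article.nlp()       # fills keywords; needs the nltk "punkt" tokenizer data
    return article_to_row(
        url, article.authors, article.text, article.publish_date, article.keywords
    )
```

If `article.nlp()` complains about a missing tokenizer, running `nltk.download("punkt")` once usually resolves it.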

Requirements:

Python 3 (tested with version 3.6)
The Python libraries newspaper3k and nltk
Scrapy (version 1.7 is used in the program)
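
A quick way to confirm the requirements above are in place is to check that each library can be found by the interpreter. This is a minimal sketch; the function names are made up for this example. Note that the newspaper3k package is imported under the name `newspaper`.

```python
import sys
from importlib.util import find_spec

# Import names of the listed requirements; newspaper3k is imported as "newspaper".
REQUIRED_MODULES = ("newspaper", "nltk", "scrapy")


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the names of the modules that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """The project targets Python 3 (it was tested with 3.6)."""
    return version_info[0] >= 3


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3 is required.")
    missing = missing_requirements()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Anything reported as missing can normally be installed with pip, for example `pip install newspaper3k nltk scrapy`.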

Instructions:

In the csvInput folder, save a CSV file containing the URLs of the articles you want to process (see inputFile.csv in the csvInput folder for an example input file).
Open a terminal and go to newsPaper (the main folder).
Run the command "scrapy crawl news -o <your_output_file_name.csv>" (other output extensions also work: json, xml).
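
The steps above can also be scripted. The sketch below writes an input file and builds the crawl command as an argument list. It assumes the input CSV holds one URL per row with no header; that layout is a guess, so check inputFile.csv for the real format. The helper names `write_input_csv` and `build_crawl_command` are hypothetical.

```python
import csv
from pathlib import Path

# Output extensions named in the instructions above.
SUPPORTED_EXTENSIONS = {".csv", ".json", ".xml"}


def write_input_csv(path, urls):
    """Write one URL per row (ASSUMED layout; compare with inputFile.csv)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for url in urls:
            writer.writerow([url])
    return path


def build_crawl_command(output_file):
    """Return the scrapy command from the instructions as an argument list."""
    suffix = Path(output_file).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError("Unsupported output extension: " + repr(suffix))
    # "news" is the spider name given in the instructions.
    return ["scrapy", "crawl", "news", "-o", output_file]
```

From the main folder, the command could then be run with `subprocess.run(build_crawl_command("result.json"), check=True)`.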

Folders and files:

.idea
newsPaper
README.md
output.csv
scrapy.cfg
