Sentiment analysis with Pumpkin

The workflow contains three workers (seeds):

tweetinject.py - reads a file with tweets and sends them to the queue
filter.py - performs sentiment analysis on incoming tweets
collector.py - counts tweets and writes the result to a file

Install missing dependencies if required

pip install pika
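To see whether the install step is needed at all, a quick check along the following lines can help. This is only a sketch: the helper name `is_installed` is made up for illustration and is not part of Pumpkin, and it assumes Python 3.4 or newer (on an older interpreter, a plain `import pika` inside a try/except does the same job).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # 'pika' is the dependency named above; extend the list if you need more.
    for name in ["pika"]:
        if is_installed(name):
            print("%s: found" % name)
        else:
            print("%s: missing, try: pip install %s" % (name, name))
```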

Prepare pumpkin environment

Copy the required files to your VM, as explained in the assignment.

Workers (seeds)

Copy the workers you want to run to the ~/pmk-seeds directory, or to the directory you specify with the --taskdir option (see the Run Pumpkin section below).

Do the following if you want to run all three workers on one VM:

scp *.py pumpkin:pmk-seeds/

Classifier

Copy the trained classifier to the required directory. Create the directory on the VM first, then copy the file to it:

mkdir -p /home/pumpkin/nltk_data/classifiers

scp movie_reviews_NaiveBayes.pickle pumpkin1:nltk_data/classifiers/
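After copying, it can be worth confirming on the VM that the file is where the steps above put it and that it unpickles. The sketch below is an assumption-laden check, not part of the provided workers: `classifier_path` and `load_pickled` are hypothetical helper names, the location is taken to be `nltk_data/classifiers` under the home directory, and unpickling an NLTK classifier only works if `nltk` is importable in the same interpreter. Only unpickle files you trust.

```python
import os
import pickle


def classifier_path(home_dir, filename="movie_reviews_NaiveBayes.pickle"):
    """Build the path that the copy step above is assumed to target."""
    return os.path.join(home_dir, "nltk_data", "classifiers", filename)


def load_pickled(path):
    """Unpickle and return whatever object is stored at the given path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    path = classifier_path(os.path.expanduser("~"))
    if os.path.exists(path):
        classifier = load_pickled(path)
        print("loaded", type(classifier).__name__)
    else:
        print("classifier not found at", path)
```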

Configuration file

Complete the Pumpkin configuration file (see the slides) and copy it into the working directory:

scp pumpkin.cfg pumpkin1:pumpkin/

You should also change the group name. Messages are identified in the system by a pair: the message type and the group. If several people keep the same group name, your workers might exchange messages with other people's workers.
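Why the group name matters can be illustrated with a toy router keyed on the (message type, group) pair. This is a hypothetical in-memory model, not Pumpkin's actual implementation; the class `ToyBroker` and the names `"tweet"`, `"default"`, and `"team42"` are invented for the example.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical router: delivery is decided by the (type, group) pair."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, msg_type, group, worker_name):
        # Register a worker for one (message type, group) pair.
        self._subscribers[(msg_type, group)].append(worker_name)

    def recipients(self, msg_type, group):
        # Return a copy of the workers that would receive such a message.
        return list(self._subscribers.get((msg_type, group), []))


if __name__ == "__main__":
    broker = ToyBroker()
    # Two people left the group name unchanged, so they share a key.
    broker.subscribe("tweet", "default", "my_filter")
    broker.subscribe("tweet", "default", "someone_elses_filter")
    # A unique group name keeps the traffic separate.
    broker.subscribe("tweet", "team42", "team42_filter")
    print(broker.recipients("tweet", "default"))
    print(broker.recipients("tweet", "team42"))
```

In this model, a message of type `"tweet"` sent in the shared group reaches both filters, while the one sent in the unique group reaches only its own worker.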

Run Pumpkin

To start Pumpkin, run the following command, replacing $taskdir with the chosen directory, such as '~/pmk-seeds':

python DRHarness.py --supernode --taskdir $taskdir --broadcast --endpoints="tcp://*:*" --gonzales

Prepare final result

Perform the computations as explained in the assignment. Then plot a graph that shows how the number of positive and negative tweets changes over time. First, if needed, sort the data:

sort --output=tweetstats.data tweetstats.data

Then plot the graph using gnuplot:

gnuplot plot.gnu

Finally, open the file tweetstats.png.
