GitHub - sourabhparvatikar/Hadoop-Projects-AWS

Hadoop MapReduce using AWS

Steps:

Create a Maven project.
Create the Map and Reduce classes.
Add the required dependencies to pom.xml.
Run the project as a Java application.
Test the application locally.
Export the project as a runnable JAR file.
Create an Amazon S3 bucket and upload all input files and JARs to it.
Create a cluster in Amazon EMR, selecting the required EC2 instance type and number of instances. One instance will be the master and the others will be slaves.
Once the instances are running, create a task by selecting the JAR and input files uploaded to the S3 bucket earlier and specifying an output folder in the S3 bucket.
When the task completes, the output is written to the specified output folder in the S3 bucket.
Download the output files if required.
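
The Map and Reduce logic from the steps above can be prototyped and tested locally before anything is uploaded to S3. The following is a minimal sketch in plain Java, not the repository's code: `WordCountSketch`, `map`, `reduce`, and `run` are hypothetical names, the tokenization (lowercasing, splitting on non-alphanumeric characters) is an assumption, and a real job would extend Hadoop's `Mapper` and `Reducer` classes rather than use plain collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> shuffle -> reduce flow for word count.
// All names are hypothetical; no Hadoop classes are used here.
class WordCountSketch {

    // Map phase: emit one (word, 1) pair per token in a line.
    // Assumed tokenization: lowercase, split on non-alphanumeric characters.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all values that were emitted for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the shuffle), then reduce.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
        System.out.println(run(List.of("the quick fox", "The lazy dog, the end")));
    }
}
```

Keeping `map` and `reduce` as separate methods mirrors the two classes the steps ask for, so the same logic can later be moved into the Hadoop classes with little change.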

Tasks Completed:

Counting the number of occurrences of each word in the given input file.
Counting the number of occurrences of each two-word sequence in the given input file.
Counting the number of occurrences of individual words in the given input file that also appear in a separate .txt file, using Hadoop's distributed cache on AWS.
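
The second and third tasks differ from plain word count mainly in what the mapper emits. Below is a hedged sketch in plain Java, assuming that a two-word sequence means two adjacent tokens on the same line and that the second .txt file is a list of words to keep. `TaskVariantsSketch`, `tokens`, `bigrams`, and `countListed` are hypothetical names, not taken from the repository. In an actual Hadoop job the word list would reach each node through the distributed cache; here it is simply passed in as a `Set`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java illustration of the two word-count variants.
// All names are hypothetical; no Hadoop classes are used here.
class TaskVariantsSketch {

    // Assumed tokenization: lowercase, split on non-alphanumeric characters.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String t : line.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Two-word sequences: count each pair of adjacent tokens within a line.
    static Map<String, Integer> bigrams(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            List<String> t = tokens(line);
            for (int i = 0; i + 1 < t.size(); i++) {
                counts.merge(t.get(i) + " " + t.get(i + 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Filtered count: count only tokens that appear in the supplied word list
    // (a stand-in for the file that would be shipped via the distributed cache).
    static Map<String, Integer> countListed(List<String> lines, Set<String> wordList) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String t : tokens(line)) {
                if (wordList.contains(t)) {
                    counts.merge(t, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be");
        // Prints {be or=1, not to=1, or not=1, to be=2}
        System.out.println(bigrams(lines));
        // Prints {be=2, not=1}
        System.out.println(countListed(lines, Set.of("be", "not", "zebra")));
    }
}
```

Words in the list that never occur in the input (such as `zebra` above) do not appear in the output at all in this sketch; emitting them with a count of zero would be an equally reasonable choice.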

DistributedWordCount
WordCount
WordCountDouble
README.md
project_report.pdf
