In industry, it is fairly common to collect application logs and then analyze aggregated data from them to evaluate the performance of a service or system. In this project, I taught myself how this long, repetitive, and tedious process can be done with MapReduce in a highly efficient, parallel, and fun way. After the experiments, I compared the analysis across different parallel methods and explored the reasons for the results.
Data Source: https://s3.amazonaws.com/amazon-reviews-pds/readme.html