This is a project that my teammates and I completed in IST 707 (Data Analytics) at Syracuse University.
Rotten Tomatoes staff first collect online reviews from writers who are certified members of various writing guilds or film critic associations. To be accepted as a critic on the website, a critic's original reviews must garner a specific number of "likes" from users. Those classified as "Top Critics" generally write for major newspapers. Critics upload their reviews to the movie's page on the website and must mark each review "fresh" if it is generally favorable or "rotten" otherwise. The critic has to do this by hand because some reviews are purely qualitative and give no numeric score, so the system cannot classify them automatically.
The data comes from the publicly available Kaggle website, from the dataset Rotten-tomatoes-movies-and-critics-datasets. Rotten Tomatoes is a review-aggregation website for movies. From best to worst, its ratings are ‘certified fresh’, ‘fresh’ and ‘rotten’. Audiences also use the website to express their opinions about movies. We chose this dataset because it captures the essential attributes we need to build a robust model that predicts how good a movie is. It records both the audience score and the critic score, which is essential for capturing how closely the two opinions track each other (their collinearity) and for making a balanced judgment about how good a movie will be based on its various attributes. In this way, the dataset reflects the attitude of the market towards movies.
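As a rough illustration of what comparing the two opinions can look like, here is a minimal Python sketch using pandas. It is not the project's actual code. The column names (tomatometer_rating, audience_rating, tomatometer_status) and the toy rows are assumptions made for the example, so adjust them to match the real Kaggle file.

```python
import pandas as pd

# Toy rows standing in for the Kaggle data. The column names and values
# are assumptions for illustration only, not taken from the real file.
movies = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D", "E"],
    "tomatometer_rating": [95.0, 72.0, 40.0, 15.0, 88.0],  # critic score
    "audience_rating": [90.0, 65.0, 55.0, 30.0, 80.0],     # audience score
    "tomatometer_status": [
        "Certified-Fresh", "Fresh", "Rotten", "Rotten", "Certified-Fresh",
    ],
})


def score_correlation(df, critic_col="tomatometer_rating",
                      audience_col="audience_rating"):
    """Pearson correlation between critic and audience scores.

    Rows where either score is missing are dropped first.
    """
    pair = df[[critic_col, audience_col]].dropna()
    return pair[critic_col].corr(pair[audience_col])


# How closely do critics and audiences agree on these toy rows?
corr = score_correlation(movies)

# How many movies fall into each rating class?
status_counts = movies["tomatometer_status"].value_counts()

print(f"critic/audience correlation: {corr:.3f}")
print(status_counts)
```

A correlation close to 1 would suggest the two scores carry largely the same information, which matters when both are used as predictors in one model.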