
Recommender Systems

In this notebook, I practiced building different types of recommender systems using the Pandas library and various machine learning algorithms.

Part 1 - Popularity-Based Recommenders

Our main goal in this section is to develop a recommender system that suggests restaurants to consumers. I started with two datasets from the University of California, Irvine (UCI) "Restaurant & consumer data". To achieve this goal, I built a popularity-based recommender, which ranks restaurants by how popular they are among users. The assumption is that the restaurant with the highest rating count is the most popular. The downside of this method is that it cannot produce personalized results. By looking at the rating counts, I found the top 5 most popular places. Then I looked up the cuisine of these top 5 restaurants, and most of them turned out to be Mexican.

table_1
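The popularity-based recommender described above can be sketched in a few lines of pandas. This is a minimal sketch on made-up toy data, not the notebook's code: the column names `userID`, `placeID`, `rating` and `Rcuisine` are assumptions modeled on the UCI files, and the function `top_n_popular` is hypothetical.

```python
import pandas as pd

# Hypothetical toy ratings: one row per (user, place) rating.
# Column names are assumptions modeled on the UCI restaurant data.
ratings = pd.DataFrame({
    "userID":  ["U1", "U2", "U3", "U1", "U2", "U4", "U3", "U4", "U5"],
    "placeID": [101,  101,  101,  102,  102,  102,  103,  104,  101],
    "rating":  [2,    1,    2,    0,    1,    2,    1,    2,    2],
})

# Hypothetical cuisine lookup table (placeID -> cuisine).
cuisine = pd.DataFrame({
    "placeID": [101, 102, 103, 104],
    "Rcuisine": ["Mexican", "Mexican", "Bar", "Cafeteria"],
})

def top_n_popular(ratings: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Rank places by how many ratings they received (popularity = rating count)."""
    counts = (
        ratings.groupby("placeID")["rating"]
        .count()
        .rename("rating_count")
        .sort_values(ascending=False)
    )
    return counts.head(n).reset_index()

top = top_n_popular(ratings, n=5)
# Attach the cuisine of each popular place (left join keeps the popularity order).
top_with_cuisine = top.merge(cuisine, on="placeID", how="left")
print(top_with_cuisine)
```

Because the ranking only counts ratings, every user gets the same list, which is exactly the lack of personalization noted above.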

Part 2 - Correlation-Based Recommenders

Here, I continued building a recommender system that offers consumers the best possible options. I built a correlation-based recommender, which, unlike a popularity-based system, does take users' preferences into account. A correlation-based recommender suggests items based on users' ratings: it chooses items according to how well they correlate with other items across users' ratings. Specifically, it uses Pearson's r correlation coefficient to recommend the items that are most similar to the items a user has already chosen.
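
Pearson's r between two items is computed over the ratings that the same users gave to both items. The sketch below spells out the formula by hand so it can be compared with NumPy's built-in result; the rating vectors are hypothetical, and `pearson_r` is an illustrative helper, not part of the notebook.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean()  # deviations from each item's mean rating
    yd = y - y.mean()
    denom = np.sqrt((xd ** 2).sum() * (yd ** 2).sum())
    if denom == 0:
        return float("nan")  # undefined when either vector is constant
    return float((xd * yd).sum() / denom)

# Hypothetical ratings that five users gave to two items (same user order).
item_a = [2, 1, 0, 2, 1]
item_b = [2, 1, 1, 2, 0]
print(round(pearson_r(item_a, item_b), 3))  # → 0.643
```

A value near 1 means users who liked one item tended to like the other, which is what makes the second item a candidate recommendation.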
In this section, I added another dataset from the same source that provides the name, address, and other details of each restaurant. I extracted the place ID and name columns to merge them with the previous datasets. Then I calculated the average rating each place received, and I counted the number of ratings for each place to see how popular it is. The data frame contains 130 unique places that have been reviewed. A Mexican restaurant called 'Tortas Locas Hipocampo' has the highest number of ratings. I wanted to see which restaurants are most similar to 'Tortas' so I could recommend them to consumers, so I calculated the correlation between users' ratings of 'Tortas' and their ratings of every other restaurant. This produced a list of restaurants with their Pearson's r values. After post-processing this list, I recommended Restaurante 'El Reyecito' to users who like 'Tortas'. The same process can be repeated for any other restaurant.
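
The workflow above (merge in names, compute mean rating and rating count, pivot into a user-by-place matrix, correlate every place with a target) might look like the following sketch. The toy data, the column names, and the minimum-rating-count filter are assumptions; the notebook's actual post-processing is not described, so the threshold here is only one plausible choice.

```python
import pandas as pd

# Hypothetical toy data; column names (userID, placeID, rating, name) are assumptions.
ratings = pd.DataFrame({
    "userID":  ["U1", "U2", "U3", "U4", "U1", "U2", "U3", "U4", "U1", "U2", "U3"],
    "placeID": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "rating":  [2, 1, 0, 2, 2, 1, 0, 1, 0, 1, 2],
})
places = pd.DataFrame({"placeID": [1, 2, 3], "name": ["Place A", "Place B", "Place C"]})

# Attach restaurant names to each rating.
df = ratings.merge(places, on="placeID")

# Per-place mean rating and number of ratings.
stats = df.groupby("name")["rating"].agg(mean_rating="mean", rating_count="count")

# User x place matrix (NaN where a user did not rate a place).
matrix = df.pivot_table(index="userID", columns="name", values="rating")

target = "Place A"
# Pearson correlation of every place's rating column with the target's column,
# computed over the users who rated both places.
similar = matrix.corrwith(matrix[target]).rename("pearson_r").to_frame()
similar = similar.join(stats["rating_count"]).drop(index=target)

# Post-processing (an assumption): ignore rarely rated places, sort by similarity.
MIN_RATINGS = 3  # hypothetical threshold
recs = similar[similar["rating_count"] >= MIN_RATINGS].sort_values("pearson_r", ascending=False)
print(recs)
```

Filtering on rating count guards against places whose correlation rests on only one or two shared raters and is therefore unreliable.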

Part 3 - Classification-Based Collaborative Filtering (Machine Learning-Based Recommenders: Logistic Regression)

Here, I wanted to build a system that helps a bank decide which new clients to make an offer to. I developed a classification-based collaborative filtering system, which can make personalized recommendations because it takes into account users' attributes as well as their purchase history and other contextual data (e.g., browsing history). The data for this section comes from the University of California, Irvine "Bank Marketing Data Set". To predict which clients are most likely to accept the bank's offer, I focused on the client dataset and trained a logistic regression model. I evaluated the model with several metrics (precision, recall, and F1-score), and it reached 89% accuracy.
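A minimal scikit-learn sketch of this kind of classifier is shown below. It uses synthetic data from `make_classification` as a stand-in for the client attributes, so the feature count, class weights, and split sizes are all assumptions; the feature scaling step is likewise an illustrative choice, not necessarily what the notebook did.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for client attributes (X) and "accepted the offer" labels (y).
# Imbalanced classes are an assumption for illustration.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.88, 0.12], random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(classification_report(y_test, y_pred))
print("accuracy:", acc)

# Rank test clients by predicted probability of accepting the offer (highest first).
proba = model.predict_proba(X_test)[:, 1]
top_clients = proba.argsort()[::-1][:10]
```

When one class dominates, a model can score high accuracy by mostly predicting the majority class, so precision, recall, and F1-score for the positive class are worth reading alongside accuracy.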

Part 4 - Collaborative Filtering System

In this section, I aimed to recommend movies to users based on the movies they previously enjoyed. The recommender I built here uses collaborative filtering, which makes recommendations based on the reactions of similar users. To build it, I started by obtaining two datasets from GroupLens at the University of Minnesota. I merged the two datasets and built a utility matrix with users along one dimension and items (movies) along the other. Each entry r_ij is the rating (from 1 to 5) that user i gave item j, and it represents how much that user appreciates the item. Using TruncatedSVD from scikit-learn, I decomposed the utility matrix into three compressed matrices. This improves efficiency, because the compressed representation can be used without referring back to the original dataset. In addition, SVD reveals latent variables: hidden factors that affect the behavior of the dataset. Next, I used Pearson's r correlation coefficient to measure how similar each movie is to the other movies on the basis of user tastes. As an example, I isolated Star Wars to find the next best movie for Star Wars fans. Below is the list of results that I got:
table_1
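The pipeline described above (utility matrix, truncated SVD, Pearson correlation in latent space) can be sketched as follows. The ratings are made up, the column names are assumptions, and filling unrated cells with 0 and using `n_components=3` are illustrative choices for a tiny matrix, not the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD

SW, RJ = "Star Wars (1977)", "Return of the Jedi (1983)"
SS, FA = "Sleepless in Seattle (1993)", "Fargo (1996)"

# Hypothetical merged ratings (user_id, title, rating on a 1-5 scale).
# Column names and values are illustrative assumptions, not the GroupLens data.
ratings = pd.DataFrame(
    [
        (1, SW, 5), (1, RJ, 5), (1, SS, 1), (1, FA, 3),
        (2, SW, 4), (2, RJ, 5), (2, SS, 2),
        (3, SW, 1), (3, RJ, 2), (3, SS, 5), (3, FA, 4),
        (4, SW, 5), (4, RJ, 4), (4, FA, 2),
        (5, RJ, 1), (5, SS, 4), (5, FA, 5),
        (6, SW, 2), (6, RJ, 1), (6, SS, 5), (6, FA, 3),
    ],
    columns=["user_id", "title", "rating"],
)

# Utility matrix: rows = users, columns = movies, entry r_ij = rating (0 = unrated).
utility = ratings.pivot_table(index="user_id", columns="title", values="rating", fill_value=0)

# Transpose so each movie is a row, then compress the user dimension with truncated SVD.
X = utility.T
svd = TruncatedSVD(n_components=3, random_state=17)
latent = svd.fit_transform(X)  # shape: (n_movies, n_components)

# Pearson correlation between every pair of movies in latent space.
corr = np.corrcoef(latent)

titles = list(utility.columns)
target = titles.index(SW)
# Similarity of every other movie to the target, most similar first.
scores = (
    pd.Series(corr[target], index=titles)
    .drop(SW)
    .sort_values(ascending=False)
)
print(scores)
```

Correlating the compressed latent vectors rather than the raw rating columns lets movies be compared through shared hidden factors even when few users rated both.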

Part 5 - Content-Based Recommender

In the last section, I built a content-based recommender system that recommends items based on their attributes. Here, I used the mtcars dataset from Kaggle. For example, if a client states a preference for a car with a certain number of cylinders, forward gears, carburetors, and so on, the system can recommend the car in the dataset that best matches the client's preferences. To do this, I used the nearest neighbors algorithm to fit the model.
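
A minimal sketch of this nearest-neighbors lookup with scikit-learn's `NearestNeighbors` is shown below. The three columns follow mtcars naming (`cyl`, `gear`, `carb`), but the rows and car names are hypothetical placeholders, not values from the real dataset.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical rows using mtcars-style columns:
# cyl (cylinders), gear (forward gears), carb (carburetors).
cars = pd.DataFrame(
    {"cyl": [4, 6, 8, 4, 8], "gear": [4, 4, 3, 5, 5], "carb": [1, 4, 2, 2, 8]},
    index=["car_a", "car_b", "car_c", "car_d", "car_e"],
)

# Fit a nearest-neighbors index on the car attributes (Euclidean distance by default).
nn = NearestNeighbors(n_neighbors=1)
nn.fit(cars.values)

# A client's stated preference, in the same column order: 6 cylinders, 4 gears, 3 carburetors.
preference = [[6, 4, 3]]
distances, indices = nn.kneighbors(preference)
best_match = cars.index[indices[0][0]]
print(best_match, distances[0][0])
```

These three features have similar ranges, so raw Euclidean distance is reasonable here; with wider-ranging mtcars columns such as `hp` or `disp`, scaling the features first would stop one column from dominating the distance.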