Note: "All new ideas/techniques welcome." Thanks
This project uses a memory-based collaborative filtering technique.
Collaborative Filtering:
Data: ratings.txt is converted into a table with user_id as the index (rows) and Movie_id as the columns.
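A minimal sketch of this pivot step, in plain Python. The layout of ratings.txt is not described here, so the sketch assumes the file has already been parsed into (user_id, movie_id, rating) triples; the function name build_user_movie_table and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_user_movie_table(ratings):
    """Pivot (user_id, movie_id, rating) triples into {user_id: {movie_id: rating}}."""
    table = defaultdict(dict)
    for user_id, movie_id, rating in ratings:
        # A later rating for the same (user, movie) pair overwrites an earlier one.
        table[user_id][movie_id] = float(rating)
    return dict(table)


# Hypothetical sample rows standing in for the parsed contents of ratings.txt.
sample_ratings = [
    (1, 10, 4.0), (1, 20, 5.0),
    (2, 10, 3.0), (2, 30, 2.0),
]
user_movie = build_user_movie_table(sample_ratings)
```

A nested dict keeps only the ratings that exist, which suits a sparse table; a pandas pivot would be an equally reasonable choice.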
The user-user similarity matrix is computed from this user_movie table.
The top k most similar users, i.e. the top k columns of the similarity matrix, are selected for a given user_id. Note: k = 10 is used here. The value of k can be varied depending on whether precision or recall is more important.
Both Pearson correlation and cosine similarity were tried; cosine similarity was selected because of its simplicity and speed.
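The similarity and top k steps can be sketched as follows. This is an illustration under assumptions, not the project's actual code: unrated movies are treated as 0, the table is a {user_id: {movie_id: rating}} dict, and the function names are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two sparse rating dicts (a missing rating counts as 0)."""
    common = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[m] * ratings_b[m] for m in common)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar_users(user_movie, user_id, k=10):
    """Return up to k (other_user_id, similarity) pairs, most similar first."""
    scores = [
        (other, cosine_similarity(user_movie[user_id], user_movie[other]))
        for other in user_movie
        if other != user_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy table.
toy_table = {
    1: {10: 4.0, 20: 5.0},
    2: {10: 4.0, 20: 5.0, 30: 1.0},
    3: {30: 5.0},
}
neighbours = top_k_similar_users(toy_table, user_id=1, k=1)
```

Computing similarities only for the active user, rather than the full user-user matrix, is a design choice made here to keep the sketch short.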
For a given user id, ratings are predicted using the prediction formula.
The movies with the highest predicted ratings that the active user has not yet rated are recommended.
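The prediction formula itself is not spelled out here. One common choice for memory-based collaborative filtering, assumed in the sketch below, is the mean-centred, similarity-weighted average over the neighbours. The function names, the toy table, and the similarity values are hypothetical.

```python
def mean_rating(ratings):
    """Average of a user's ratings, or 0.0 for a user with no ratings."""
    return sum(ratings.values()) / len(ratings) if ratings else 0.0


def predict_rating(user_movie, user_id, movie_id, neighbours):
    """Assumed formula: user mean + weighted average of neighbours' mean-centred ratings."""
    user_mean = mean_rating(user_movie[user_id])
    numerator = 0.0
    denominator = 0.0
    for other, sim in neighbours:
        other_ratings = user_movie[other]
        if movie_id not in other_ratings:
            continue  # this neighbour gives no evidence for the movie
        numerator += sim * (other_ratings[movie_id] - mean_rating(other_ratings))
        denominator += abs(sim)
    if denominator == 0:
        return user_mean
    return user_mean + numerator / denominator


def recommend(user_movie, user_id, neighbours, n=5):
    """Top n movies by predicted rating among those the active user has not rated."""
    seen = set(user_movie[user_id])
    candidates = {m for other, _ in neighbours for m in user_movie[other]} - seen
    scored = [(m, predict_rating(user_movie, user_id, m, neighbours)) for m in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical toy table and precomputed (neighbour_id, similarity) pairs.
toy_table = {
    1: {10: 4.0, 20: 2.0},
    2: {10: 5.0, 20: 3.0, 30: 4.0, 40: 1.0},
    3: {10: 4.0, 30: 5.0},
}
toy_neighbours = [(2, 0.9), (3, 0.5)]
recommendations = recommend(toy_table, user_id=1, neighbours=toy_neighbours)
```

Mean-centring compensates for users who rate consistently high or low; a plain weighted average of raw ratings would be a simpler alternative.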
Popular/Top rated model:
Movies are arranged in descending order by the number of users who watched them and by the corresponding movie rating.
The top n movies are selected.
From these top n, the movies that the active user has not yet seen are recommended.
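A rough sketch of the popular/top rated model. It assumes that "watched" means "rated", that ties in watch count are broken by mean rating, and that the table is a {user_id: {movie_id: rating}} dict; the function name and sample data are hypothetical.

```python
def top_rated_for_user(user_movie, user_id, n=10):
    """Rank movies by (number of raters, mean rating), keep the top n, drop seen ones."""
    stats = {}  # movie_id -> [rating count, rating sum]
    for ratings in user_movie.values():
        for movie_id, rating in ratings.items():
            entry = stats.setdefault(movie_id, [0, 0.0])
            entry[0] += 1
            entry[1] += rating
    ranked = sorted(
        stats,
        key=lambda m: (stats[m][0], stats[m][1] / stats[m][0]),
        reverse=True,
    )
    top_n = ranked[:n]
    seen = user_movie.get(user_id, {})
    return [m for m in top_n if m not in seen]


# Hypothetical toy table.
toy_table = {
    1: {10: 4.0},
    2: {10: 5.0, 20: 3.0, 30: 5.0},
    3: {10: 3.0, 20: 4.0},
}
popular_for_user_1 = top_rated_for_user(toy_table, user_id=1, n=3)
```

Because the seen movies are filtered out after the top n cut, the returned list can be shorter than n, which matches the order of the steps described above.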
Areas of improvement:
Machine learning techniques can be combined for personal profiling.
Handling of new users and new items (movies) still needs to be implemented.
SVD can be used to reduce the dimensionality of the sparse matrix.
Content-based filtering can be combined with the existing approach.
Note: Updating the User-Movie table takes a long time, nearly 30 minutes. If possible, this needs to be reduced with more efficient code.
Alternatively, it can be implemented in Apache Spark.
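As an illustration of the SVD idea listed above, a truncated SVD can replace the sparse user-movie matrix with a low-rank approximation. This sketch uses NumPy and assumes unrated entries are filled with 0, which is a simplification; the matrix values and the function name are hypothetical.

```python
import numpy as np


def low_rank_approximation(matrix, rank):
    """Keep only the `rank` largest singular values of the matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]


# Hypothetical 4 users x 4 movies table; 0 stands for "not rated".
ratings_matrix = np.array(
    [
        [5, 4, 0, 1],
        [4, 5, 0, 1],
        [1, 0, 5, 4],
        [0, 1, 4, 5],
    ],
    dtype=float,
)
approx = low_rank_approximation(ratings_matrix, rank=2)
```

The entries of approx at the unrated positions can be read as rough rating estimates. For a large table, a sparse solver such as scipy.sparse.linalg.svds would likely be more practical than a dense SVD.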
Front End:
Contains 3 pages: login page, home page and admin page.
The login page contains a list of test user ids and a test password for testing purposes. pwd: 123Swaroop
The home page contains Top rated and Recommended movies for a given user id.
The admin page can be used to update the user movie table. Admin pwd: (check the code)