Merge pull request #19 from emirkmo/collab_filter

Add Course 3 notes and code
Showing 5 changed files with 177 additions and 1 deletion.
@@ -0,0 +1,12 @@
# Collaborative Filtering Algorithm

Learn both the feature vector X and the user parameter (linear regression) vector W and b.
Users (samples) that rated, i.e. have a parameter for a given feature, are kept track of in
a binary matrix R. Matrix Y holds the ratings. Features X and parameters W and b must be learned collaboratively.

Features = X
User pars = w, b
R = mapping between users and movie ratings
Y = movie ratings

Y(movie, user) = R(movie, user) * (w(user) . x(movie) + b(user))
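
A minimal sketch of that prediction formula vectorized in numpy (the toy values and shapes below are illustrative assumptions, not from the course):

```Python
import numpy as np

# Toy data: 3 movies with 2 features each, 2 users.
X = np.array([[1.0, 0.2], [0.5, 0.9], [0.0, 1.0]])  # movie features
W = np.array([[2.0, 0.0], [0.1, 1.5]])              # per-user parameter vectors
b = np.array([[0.5, -0.2]])                         # per-user bias terms
R = np.array([[1, 0], [1, 1], [0, 1]])              # R[i, j] = 1 if user j rated movie i

# Y_hat(movie, user) = R(movie, user) * (w(user) . x(movie) + b(user)), vectorized
Y_hat = R * (X @ W.T + b)
print(Y_hat)
```
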
@@ -0,0 +1,119 @@
# Content based filtering

## Difference from collaborative filtering

Learn to match features instead of learning
parameters on top of features.

So users have features, movies have features;
build a vector for each feature set, then predict the user/movie
rating match. (Recommend a movie to the user, or predict the user's score for a movie.)

There is no constant vector `b`.

The prediction is `V_M . V_U`; both vectors must be calculated from the raw feature vectors.

### How to calculate V? Use deep learning (a neural network, NN)

The NN output layer should not have a single unit but many
(one unit per vector element). How many is a design choice (e.g. 32). Hidden layers can have any complexity, but the output layers producing `V_M` and `V_U` must match in size!

For binary labels, instead of using the raw dot product, take the sigmoid of
it and predict a match where g(V_U . V_M) ≈ 1.

## Cost Function

```Latex
J = \sum_{(i,j):\, r(i,j)=1} \left( v_u^{(j)} \cdot v_m^{(i)} - y^{(i,j)} \right)^2 + \text{NN regularization}
```

Basically we need labels Y: existing movie/user ratings (matches).
The same cost function trains the NNs for both vectors.

### Tips

To find similar movies, take the squared L2 distance `||v_m(k) - v_m(i)||^2`.
This can and should be pre-computed (see the sketch below)!
Now you have a similarity matrix, and movies are related like
a graph.
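
A minimal sketch of that pre-computation in numpy (the movie vectors `Vm` and their shapes are illustrative assumptions, not from the course):

```Python
import numpy as np

# Learned movie vectors, one row per movie (illustrative: 100 movies, 32-dim vectors).
Vm = np.random.rand(100, 32)

# All pairwise squared L2 distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
sq_norms = np.sum(Vm**2, axis=1)
dist_sq = sq_norms[:, None] + sq_norms[None, :] - 2 * (Vm @ Vm.T)

# The k most similar movies to movie i (skip the first sort index: the movie itself)
i, k = 0, 5
neighbors = np.argsort(dist_sq[i])[1 : k + 1]
```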

NN benefit realized: the movie and user NNs integrate easily
by taking the dot product of the output layer of each.
Really powerful!

The feature engineering is critical.

The algorithm as described is computationally expensive to run
and needs modifications to scale.

## Scale up Recommender system

Retrieval & Ranking

### Retrieval

Generate a large list of plausible item candidates.

Use the pre-computed `||v_m(k) - v_m(j)||^2` distances.

Find similar movies, top movies in the user's 3 most-viewed genres, top movies of
all time, top X movies in the same country, etc.

### Ranking

Now that we have a small list of movies, rank them.
V_m can be pre-computed (since new users appear and user
feature values change far more often than movie features).
We only need to calculate V_u for the pared-down list from the retrieval
step, which is fast. It can even be done on the edge device; see the sketch below.
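
A minimal sketch of the ranking step (the precomputed candidate vectors and the user vector are illustrative assumptions):

```Python
import numpy as np

# Precomputed vectors for the retrieved candidate movies (illustrative shapes).
Vm_candidates = np.random.rand(50, 32)  # (num_candidates, vector_dim)
vu = np.random.rand(32)                 # user vector, computed once per request

# The predicted match score is the dot product; rank candidates best-first.
scores = Vm_candidates @ vu
ranking = np.argsort(-scores)
```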

The retrieval step should be tuned using offline experiments,
A/B testing, etc.

## Ethics

Don't be evil. Don't be naive.
Think about the goal. Think about bad actors.

Be transparent with users. You need to be careful with exploitative recommendations.

## Tensorflow Recommender Algorithm

Same idea as any NN: one Sequential model from Keras per tower.

```Python
import tensorflow as tf

# Assumed defined elsewhere: num_user_features, num_item_features,
# user_train, item_train, y_train (feature counts and training arrays).

# One tower per entity; hidden layer sizes are illustrative, but both
# output layers must have the same size (e.g. 32).
num_outputs = 32
user_nn = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(num_outputs),
])
item_nn = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(num_outputs),
])

# Input layers (note: shape must be a tuple)
input_user = tf.keras.layers.Input(shape=(num_user_features,))
input_item = tf.keras.layers.Input(shape=(num_item_features,))

vu = user_nn(input_user)
vu = tf.linalg.l2_normalize(vu, axis=1)  # normalize vectors to unit L2 norm
vm = item_nn(input_item)
vm = tf.linalg.l2_normalize(vm, axis=1)

# Keras dot product layer joins the two towers
output = tf.keras.layers.Dot(axes=1)([vu, vm])

# Use simple MSE for the loss
cost_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# Training model using the Keras functional API
n_iterations = 30
model = tf.keras.Model([input_user, input_item], output)
model.compile(optimizer=optimizer, loss=cost_fn)
model.fit([user_train, item_train], y_train, epochs=n_iterations)
```

### Lab

Uses sklearn's StandardScaler for the user features but MinMaxScaler for the target; not clear why. Uses the scaler's `inverse_transform` to get back the originals. Uses the ready-made `train_test_split` for the split, with a 20% test set.
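
A minimal sketch of that preprocessing (array shapes and names are illustrative assumptions, not the lab's actual data):

```Python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

X_users = np.random.rand(1000, 14)          # user features (illustrative)
y = np.random.uniform(0.5, 5.0, (1000, 1))  # rating targets (illustrative)

# Standardize features; squash the target into the scaler's default [0, 1] range.
user_scaler = StandardScaler().fit(X_users)
target_scaler = MinMaxScaler().fit(y)
X_scaled = user_scaler.transform(X_users)
y_scaled = target_scaler.transform(y)

# Ready-made 80/20 split.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_scaled, test_size=0.2, random_state=1
)

# inverse_transform recovers the original rating units (shown on y_test).
y_test_original = target_scaler.inverse_transform(y_test)
```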

Based on the fact that the test loss is similar to the training
loss, we infer that the model has not substantially overfit.
(Weird not to use a CV set, but the model parameters and parts were
just given, so there was no need.)
@@ -0,0 +1,11 @@
# PCA

Each principal component (PC) is a projection that "explains" the maximum remaining variance.
PCA used to be popular for dimensionality reduction and compression,
especially during training or feature selection,
but nowadays it is mainly used for visualization in AI/ML.

Look into eigenvectors and eigenvalues for a deeper understanding.

Just use sklearn.
I published a paper on this, so no need for more.
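
A minimal sklearn sketch (2 components for visualization; the data is an illustrative assumption):

```Python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 10)  # illustrative high-dimensional data

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # project onto the top-2 principal components

# Fraction of the total variance each component "explains"
print(pca.explained_variance_ratio_)
```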
@@ -66,4 +66,7 @@ exclude = '''
  | htmlcov
  | .coverage
)/
'''

[tool.mypy]
plugins = "numpy.typing.mypy_plugin"
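
With this section in pyproject.toml, running `mypy` from the project root picks up the numpy typing plugin automatically, since mypy reads its configuration from pyproject.toml natively.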
@@ -0,0 +1,31 @@
import numpy as np
import numpy.typing as npt


def cofi_cost_func(
    X: npt.NDArray[np.number],
    W: npt.NDArray[np.number],
    b: npt.NDArray[np.number],
    Y: npt.NDArray[np.number],
    R: npt.NDArray[np.number],
    lam: float,
) -> float:
    """Return the regularized cost for collaborative filtering, using numpy.

    Args:
        X np(num_feature_samples, num_features): matrix of feature samples
        W np(num_parameter_samples, num_features): matrix of parameter samples
        b np(1, num_parameter_samples): constant parameter vector per parameter sample
        Y np(num_feature_samples, num_parameter_samples): matrix of targets per feature sample
        R np(num_feature_samples, num_parameter_samples): R(i, j) = 1 if feature sample i has parameters from sample j
        lam (float): regularization parameter

    Simplest example: X holds features of movies and W holds per-user rating parameters;
    Y is the matrix of user ratings for each movie, and R just records whether a user rated a movie.
    """
    # Regularization is simple and applies to all values
    regularization: float = (np.sum(W**2) + np.sum(X**2)) * (lam / 2)

    # Vectorized implementation, analogous to linear regression;
    # R masks out (movie, user) pairs with no rating.
    cost: float = np.sum((R * (np.dot(X, W.T) + b - Y)) ** 2) / 2

    return cost + regularization
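

# A quick usage sketch with toy shapes (values are illustrative, not from the course):
if __name__ == "__main__":
    num_movies, num_users, num_features = 4, 3, 2
    rng = np.random.default_rng(0)
    X = rng.normal(size=(num_movies, num_features))
    W = rng.normal(size=(num_users, num_features))
    b = rng.normal(size=(1, num_users))
    Y = rng.uniform(0.5, 5.0, size=(num_movies, num_users))
    R = (rng.random((num_movies, num_users)) > 0.3).astype(float)  # ~70% rated
    print(cofi_cost_func(X, W, b, Y, R, lam=1.5))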