Merge pull request #19 from emirkmo/collab_filter
Add Course 3 notes and code
emirkmo authored Nov 19, 2023
2 parents 0991b9c + dbeabc1 commit 28d0a9b
Showing 5 changed files with 177 additions and 1 deletion.
12 changes: 12 additions & 0 deletions Course3/Notes/collab_filter.md
@@ -0,0 +1,12 @@
# Collaborative Filtering Algorithm

Learn both the feature vectors X and the per-user (linear regression) parameters W and b.
Which users (samples) rated which items, i.e. have parameters for a given feature sample, is kept track of in
a binary matrix R. Matrix Y holds the ratings. Features X and parameters W and b must be learned collaboratively.

Features = X
User pars = w, b
R = mapping between users and movie ratings
Y = movie ratings

Y(movie, user) = R(movie, user) * (w(user) . x(movie) + b(user))
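A minimal numpy sketch of the prediction above (the shapes and toy values are illustrative, not from the course):

```Python
import numpy as np

# Toy setup: 3 movies, 2 users, 2 learned features per movie/user.
X = np.array([[1.0, 0.1], [0.2, 1.0], [0.9, 0.8]])  # movie features (num_movies, num_features)
W = np.array([[5.0, 0.0], [0.0, 5.0]])              # user parameters (num_users, num_features)
b = np.array([[0.5, -0.5]])                          # per-user constant (1, num_users)
R = np.array([[1, 0], [0, 1], [1, 1]])               # R(movie, user) = 1 if the user rated the movie

# Y_hat(movie, user) = R(movie, user) * (w(user) . x(movie) + b(user))
Y_hat = R * (X @ W.T + b)
print(Y_hat)
```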
119 changes: 119 additions & 0 deletions Course3/Notes/content_filter.md
@@ -0,0 +1,119 @@
# Content-based filtering

## Difference from collaborative filtering

Learn to match features instead of learning
per-user parameters on item features.

So users have features and movies have features:
create a vector from each feature set, then predict the user/movie
rating match (recommend a movie to a user, or predict a user's score for a movie).

No constant vector `b`.

The prediction is the dot product `V_M . V_U`. Both vectors must be calculated from the raw feature vectors.

### How to calculate V? Use deep learning (neural network NN)

The NN output layer should not have a single unit, but many
(one unit per vector element). How many? Something like 32. The hidden layers can be of any complexity, but the output layers producing `V_M` and `V_U` must have matching sizes!

For binary labels, instead of the raw dot product, simply apply a sigmoid to
the dot product of V_U and V_M, and predict a match where g(V_U . V_M) ≈ 1.

## Cost Function

```latex
J = \sum_{(i,j):\, r(i,j)=1} \left( v_u^{(j)} \cdot v_m^{(i)} - y^{(i,j)} \right)^2 + \text{NN regularization}
```

Basically we need labels Y, i.e. existing movie/user ratings (matches).
The same cost function trains the NNs for both vectors.
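A tiny numpy sketch of this cost for already-computed vectors (the mask `R` and the simple L2 term standing in for the NN regularization are assumptions consistent with the notes):

```Python
import numpy as np

def content_cost(V_u, V_m, Y, R, lam=0.0):
    """Squared error over rated (i, j) pairs, plus a stand-in regularization term."""
    err = (V_m @ V_u.T - Y) * R                    # only count pairs with r(i, j) = 1
    reg = lam * (np.sum(V_u**2) + np.sum(V_m**2))  # stand-in for the NN weight regularization
    return np.sum(err**2) + reg

rng = np.random.default_rng(0)
V_m, V_u = rng.normal(size=(3, 4)), rng.normal(size=(2, 4))  # 3 movies, 2 users, 4-dim vectors
Y = np.array([[5.0, 0.0], [0.0, 3.0], [4.0, 2.0]])
R = np.array([[1, 0], [0, 1], [1, 1]])
print(content_cost(V_u, V_m, Y, R, lam=0.1))
```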

### Tips

To find similar movies, take the squared L2 distance between their vectors, `||v_m(k) - v_m(i)||^2`.
This can and should be pre-computed!
Now you have a similarity matrix. Movies are related like
a graph.
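A minimal sketch of pre-computing that distance matrix and reading off the most similar movies (pure numpy; names and sizes are illustrative):

```Python
import numpy as np

def pairwise_sq_dists(V_m):
    """Squared L2 distance between every pair of movie vectors."""
    sq_norms = np.sum(V_m**2, axis=1)
    return sq_norms[:, None] + sq_norms[None, :] - 2 * (V_m @ V_m.T)

V_m = np.random.default_rng(1).normal(size=(5, 4))  # 5 movies, 4-dim vectors
D = pairwise_sq_dists(V_m)
np.fill_diagonal(D, np.inf)                          # a movie should not match itself
most_similar = np.argsort(D, axis=1)[:, :2]          # the 2 nearest neighbours of each movie
print(most_similar)
```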

NN benefit realized: it allows easily integrating the movie and
user NNs by taking the dot product of the output layer of each.
Really powerful!

The feature engineering is critical.

The algorithm as described is computationally expensive to run,
and needs modifications to scale.

## Scale up Recommender system

Retrieval & Ranking

### Retrieval

Generate a large list of plausible item candidates.

Use the pre-computed `||V_m(k) - V_m(j)||^2`.

Find similar movies, the user's top 3 most-viewed genres, top movies of
all time, top X movies in the same country, etc.

### Ranking

Now we have a small list of movies; rank them.
V_m can be pre-computed (new users appear and user
feature values change far more often than item vectors).
We only need to compute V_u and score the pared-down list
from the retrieval step, which is fast. It can even be done on the edge.
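A rough sketch of the two-stage flow (the candidate sources, names, and data structures here are illustrative assumptions, not from the course):

```Python
import numpy as np

def retrieve(user_id, neighbors, recently_watched, top_movies, k=100):
    """Retrieval: cheaply assemble a large candidate list from pre-computed sources."""
    candidates = set(top_movies)
    for movie_id in recently_watched[user_id]:
        candidates.update(neighbors[movie_id])  # similar movies via pre-computed ||V_m(k) - V_m(j)||^2
    return list(candidates)[:k]

def rank(user_features, candidates, user_nn, V_m):
    """Ranking: one V_u forward pass, then dot products against pre-computed V_m rows."""
    v_u = user_nn(user_features)    # small, fast computation (can run on the edge)
    scores = V_m[candidates] @ v_u  # one score per retrieved candidate
    return [candidates[i] for i in np.argsort(-scores)]
```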

Retrieval step should be tuned using offline experiments
and A/B testing, etc.

## Ethics

Don't be evil. Don't be naive.
Think about goal. Think about bad actors.

Be transparent with users. Need to be careful with exploitative recommendations.

## Tensorflow Recommender Algorithm

Same as a normal NN: a Sequential model from Keras for each of the user and item networks, combined with a dot product.

```Python
import tensorflow as tf

user_nn = tf.keras.models.Sequential([tf.keras.layers.Dense(..., activation='relu'), ...])
...

# Add the input layer for the user features
input_user = tf.keras.layers.Input(shape=(num_user_features,))

vu = user_nn(input_user)
vu = tf.linalg.l2_normalize(vu, axis=1)  # L2-normalize the output vector
# Repeat for the item/movie network
input_item = tf.keras.layers.Input(shape=(num_item_features,))
vm = ...

# Keras dot product layer combines the two output vectors
output = tf.keras.layers.Dot(axes=1)([vu, vm])

# Use simple MSE for the loss
cost_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# Build and train the model using the Keras API
n_iterations = 30
model = tf.keras.Model([input_user, input_item], output)
model.compile(optimizer=optimizer, loss=cost_fn)
model.fit([user_train, item_train], y_train, epochs=n_iterations)
```

### Lab

Uses sklearn's StandardScaler for the user features but MinMaxScaler for the target. Not clear why. Uses the scaler's `inverse_transform` to get back the original values. Ready-made `train_test_split` for the split, with a 20% test set.
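A minimal sketch of that preprocessing (variable names and array shapes are illustrative; the lab's actual data differs):

```Python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

user_features = np.random.rand(100, 5)    # stand-in for the lab's user feature matrix
ratings = np.random.rand(100, 1) * 4 + 1  # stand-in target: ratings in [1, 5]

feature_scaler = StandardScaler().fit(user_features)
target_scaler = MinMaxScaler().fit(ratings)

X = feature_scaler.transform(user_features)
y = target_scaler.transform(ratings)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# After prediction, the scaler's inverse_transform maps outputs back to the original rating scale.
y_orig = target_scaler.inverse_transform(y_test)
```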

Based on the fact that the test loss is similar to the training
loss, we infer that the model has not substantially overfit.
(Weird not to use a CV set, but the model parameters and parts were
just given, so there was no need.)
11 changes: 11 additions & 0 deletions Course3/Notes/pca.md
@@ -0,0 +1,11 @@
# PCA

Each principal component is a projection that "explains" the maximum remaining variance.
PCA used to be popular for dimensionality reduction and compression,
especially during training or for feature selection,
but nowadays it is mainly used for visualization in AI/ML.

Look into eigenvectors and eigenvalues for a deeper understanding.

Just use sklearn.
I published a paper on this, so no need for more.
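Since the note says "just use sklearn", a minimal example with random data, keeping 2 components for visualization:

```Python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(200, 10))  # 200 samples, 10 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)             # project onto the top 2 principal components

print(pca.explained_variance_ratio_)    # fraction of variance each PC "explains"
print(X_2d.shape)                       # (200, 2), ready to scatter-plot
```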
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -66,4 +66,7 @@ exclude = '''
| htmlcov
| .coverage
)/
'''
'''

[tool.mypy]
plugins = "numpy.typing.mypy_plugin"
31 changes: 31 additions & 0 deletions rawsight/recommender/collaborative_filtering.py
@@ -0,0 +1,31 @@
import numpy as np
import numpy.typing as npt


def cofi_cost_func(
X: npt.NDArray[np.number],
W: npt.NDArray[np.number],
b: npt.NDArray[np.number],
Y: npt.NDArray[np.number],
R: npt.NDArray[np.number],
lam: float,
) -> float:
"""Return cost with regularization using numpy for collaborative learning
Args:
X np(num_feature_samples, num_features)): matrix of feature samples
W np(num_parameter_samples, num_features)) : matrix of parameter samples
b np(1, num_parameter_samples) : constant parameter vector per param sample.
Y np(num_feature_samples,num_parameter_samples) : matrix of pars per feature sample
R np(num_feature_samples,num_parameter_samples) : R(i, j) = 1 if feature sample has parameters.
lam (float): regularization parameter
Simples example is X features of movies and W is features of user ratings (for movies)
Y is matrix of user ratings for each movie and R just records if a user rated a movie.
"""
# Regularization is simple and applies to all values
regularization: float = (np.sum(W**2) + np.sum(X**2)) * (lam / 2)

# Linear regression analog vectorized implementation.
cost: float = np.sum((R * (np.dot(X, W.T) + b - Y)) ** 2) / 2

return cost + regularization
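A quick sanity check of `cofi_cost_func` on a toy example (the values below are made up for illustration):

```Python
import numpy as np

from rawsight.recommender.collaborative_filtering import cofi_cost_func

X = np.array([[1.0, 0.2], [0.5, 1.0], [0.9, 0.7]])  # 3 movies, 2 features each
W = np.array([[4.0, 0.0], [0.0, 4.0]])              # 2 users, 2 parameters each
b = np.array([[0.5, -0.5]])                          # per-user constant parameter
Y = np.array([[4.5, 0.0], [0.0, 3.5], [4.0, 2.0]])  # ratings (0 where unrated)
R = np.array([[1, 0], [0, 1], [1, 1]])               # R(i, j) = 1 if user j rated movie i

print(cofi_cost_func(X, W, b, Y, R, lam=1.0))        # scalar cost including regularization
```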

1 comment on commit 28d0a9b

@github-actions


Coverage Report
| File | Stmts | Miss | Cover |
| --- | --- | --- | --- |
| **rawsight** | | | |
| `__init__.py` | 9 | 0 | 100% |
| `input_validation.py` | 14 | 4 | 71% |
| `normalization.py` | 95 | 21 | 78% |
| `optimizers.py` | 41 | 15 | 63% |
| `regression.py` | 97 | 21 | 78% |
| `scoring.py` | 3 | 1 | 67% |
| **rawsight/cost_functions** | | | |
| `__init__.py` | 2 | 0 | 100% |
| `cost_function_factory.py` | 55 | 8 | 85% |
| `cost_functions.py` | 40 | 7 | 82% |
| `regularization.py` | 23 | 3 | 87% |
| **rawsight/datasets** | | | |
| `__init__.py` | 1 | 0 | 100% |
| `datasets.py` | 94 | 39 | 59% |
| **rawsight/models** | | | |
| `__init__.py` | 4 | 0 | 100% |
| `linear.py` | 23 | 3 | 87% |
| `logistic.py` | 27 | 4 | 85% |
| `model.py` | 112 | 34 | 70% |
| `polynomial.py` | 11 | 8 | 27% |
| `softmax.py` | 34 | 18 | 47% |
| **rawsight/nn** | | | |
| `__init__.py` | 0 | 0 | 100% |
| `layers.py` | 57 | 57 | 0% |
| `networks.py` | 28 | 28 | 0% |
| **rawsight/tests** | | | |
| `__init__.py` | 0 | 0 | 100% |
| `test_binary_tree.py` | 26 | 0 | 100% |
| `test_linear_regression.py` | 64 | 3 | 95% |
| `test_logistic_regression.py` | 56 | 4 | 93% |
| `test_normalization.py` | 45 | 5 | 89% |
| `test_softmax.py` | 43 | 16 | 63% |
| **rawsight/trees** | | | |
| `__init__.py` | 4 | 0 | 100% |
| `_splitter_protocol.py` | 6 | 1 | 83% |
| `binary_tree.py` | 29 | 0 | 100% |
| `infogain.py` | 29 | 2 | 93% |
| `splitting.py` | 30 | 1 | 97% |
| `tree.py` | 5 | 0 | 100% |
| `tree_builder.py` | 47 | 1 | 98% |
| **TOTAL** | 1154 | 304 | 74% |
