Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snow Leopards- Ja Hopkins #153

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open

Snow Leopards- Ja Hopkins #153

wants to merge 15 commits into from

Conversation

jahopkins
Copy link

I couldn't finish wave 3

Copy link

@anselrognlie anselrognlie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start.

The tests for the waves you completed are passing, and that code is well-organized. I included feedback on the first function of wave 3, as well as an example of applying the suggested changes to that function.

Take a look at that, and then let's get together to look over the rest of the project!

raise Exception("Test needs to be completed.")
# raise Exception("Test needs to be completed.")

assert updated_data["watched"][0]["title"] == MOVIE_TITLE_1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +152 to +153
assert updated_data["watched"][1] == movie_to_watch
assert movie_to_watch in updated_data["watched"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two checks are more or less doing the same thing. Consider using three in checks, one for checking which movie is remaining int the watchlist list, and two for each of the movies we expect to find in the watched list

@@ -27,7 +27,7 @@ def test_empty_watched_average_rating_is_zero():
# Assert

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀 It looks like you missed uncommenting the skip on the test on line 18. When I did so, the test passed. 👍

# dictionary contains 3 key-value pairs
# return none if falsy
title_genre_rating = {}
if title and genre and rating:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice truthy checks. But consider reversing the sense of the comparison so that the more error-like condition is handled within the conditional check, and the main logic (making the dictionary) doesn't need to be nested.

Structuring the checks to look for the more error-like conditions is a pattern referred to as a guard clause, and it's a common way of organizing our code.

    if not title or not genre or not rating:
        # code to run when one of the inputs is falsy

    # code to run when all inputs are truthy

# return none if falsy
title_genre_rating = {}
if title and genre and rating:
title_genre_rating.update({'title': title, 'genre': genre, 'rating': rating})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that the argument to update is a newly created dictionary. We could assign that new dictionary directly to our variable, or even return it, rather than updating the empty dictionary we started with.

        title_genre_rating = {'title': title, 'genre': genre, 'rating': rating}

genres_watched[movie["genre"]] += 1

num_of_genres_watched = genres_watched.values()
max_num_genres_watched = max(num_of_genres_watched)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice use of max to find the highest count, but notice then that we must iterate back through the data again and to find a genre that had this count. Alternatively, we could iterate over the frequency map ourselves to find the max count, keeping track of the key (genre) ourself.

We'll see that there's really not a performance difference, though, and arguably this code is a little simpler. (but writing a max finder ourself is great practice!).

unique_movies = []
for watched_movie in user_data["watched"]:
# friends_movies.append(watched_movie)
for movie in user_data["friends"]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more accurate name for what this variable represents is friend. We're iterating over the friends list, which is a list of friend dictionaries, not movie dictionaries.

# friends_movies.append(watched_movie)
for movie in user_data["friends"]:
# for movie in user_data["friends"]:
if watched_movie["title"] in unique_movies:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is supposed to be building up a list of movies not their titles, so I would expect unique_movies to contain movies, not titles.

# for movie in user_data["friends"]:
if watched_movie["title"] in unique_movies:
continue
elif watched_movie["title"] not in movie["watched"]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From above, movie would be more accurately called friend. So this code is checking whether the title (a string) of the movie we're looking at is in a list of movies (each is a dictionary), which will never be true. A string does not equal a dictionary (python will not look inside the dictionaries for matching values).

So to determine whether a the watched_movie is something this friend has watched, we should check something like

            elif watched_movie not in friend["watched"]:

if watched_movie["title"] in unique_movies:
continue
elif watched_movie["title"] not in movie["watched"]:
unique_movies.append(watched_movie["title"])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we make the above change, we can't be certain that watched_movie should be added to the reasult list. It hasn't been watched by this friend, but it could still have been watched by a different friend.

What we could do is use a boolean value to see whether we've seen the movie in any of the friend's lists (initialized to False, since we wouldn't have found the movie yet). If we find the movie, set the flag to True. If we get through all the friends and haven't found the movie, then we can add it to the result list.

Try adding this additional logic yourself. I have one possible way of doing this in the next comment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    unique_movies = []
    for watched_movie in user_data["watched"]:
        found_movie = False
        for friend in user_data["friends"]:
            if watched_movie in unique_movies:
                found_movie = True
            elif watched_movie in friend["watched"]:
                found_movie = True
                
        if not found_movie:
            unique_movies.append(watched_movie)

    return unique_movies

Copy link

@anselrognlie anselrognlie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I added a few more comments primarily around how we could think about improving the time complexity in Wave 3.

Comment on lines +60 to +61
assert friends_unique_movies[-2] == HORROR_1
assert friends_unique_movies[0] == FANTASY_4

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There weren't any particular specifications as to the ordering of the movies in the list of results. As a result, I would tend to use an in check for the movies we expect to see, since that won't require knowing at what exact position each movie ended up. This lets us enforce the required behaviors in the test without being overly strict.

In fact, the checks from the previous test are exctly what we would need to add.

    assert INTRIGUE_3 in friends_unique_movies
    assert HORROR_1 in friends_unique_movies
    assert FANTASY_4 in friends_unique_movies

Since this test already has a check for the list having 3 items, if we then check for 3 expected items and find them all, this also means that each item appears in the result only once (no duplication).

Comment on lines +95 to +98
for friend in user_data["friends"]:
for movie_details in friend["watched"]:
movie_title = movie_details["title"]
movies_watched_by_friends.append(movie_title)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice way to build up a list of movie titles the friends have watched. Notice that since you decided to use the title as representative of the entire movie record ( areasonable decision given our data), we could use a set for the movies_watched_by_friends collection rather than a list. This would improve the complexity of the in check on line 102 (since sets have constant time lookup, while a list is linear time).

if movie_watched not in movies_watched_by_friends:
unique_movies.append(movie)
# watched_titles.append(movie["title"])
return(unique_movies)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for () around the return value. We can wrap the value with () if we are returning the result of a long expression and need to line wrap, but otherwise, we generally leave them off. We would also leave a space between return and the () around the expression, since return isn't a function and we don't want it to look like we're doing a function call.

Comment on lines +111 to +114
for movie in user_data["watched"]:
movie_watched = movie["title"]
# add all the movies watched to new list
all_movies_watched.append(movie_watched)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, since you're using the title as a proxy for the entire moview, we could use a set for all_movies_watched for improved in check performance.

for movie_details in friend["watched"]:
movie_title = movie_details["title"]
if (movie_title not in all_movies_watched and
movie_details not in unique_movies_watched_by_friends):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice way to filter out a movie that's already been added from another friend. Here, we do need to build a list of movie detail records, not just the title. So unique_movies_watched_by_friends must be a list. However, if we still wanted to gain the constant in lookup even for the result list, we could create another set in which we store only the titles of the movies we've added to unique_movies_watched_by_friends, and look up a movie's title in that set before adding it to the result list, and adding the title to the set if we do end up adding a movie to the result list.

As it is, doing the in checks (which are linear) in a loop (essentially linear) gives us an overall quadratic time complexity. But we could get this down to linear by building that extra set, since the operations in the loop would become constant.


# -----------------------------------------
# ------------- WAVE 4 --------------------
# -----------------------------------------

def get_available_recs(user_data):
recommended_movies = []
friends_watched = get_friends_unique_watched(user_data)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Great reuse of your get_friends_unique_watched function to simplify the implementation here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants