Add Trending Field to Solr #10057

Open
wants to merge 9 commits into
base: master

Conversation

benbdeitch
Collaborator

Closes #7429

This PR adds support for trending scores to Solr, allowing us to better track which works are seeing a statistically notable increase in popularity. It adds several new fields and comes with two scripts, one to be run daily and the other hourly, to keep this information up to date.
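
As a rough illustration of what a "statistically notable increase" could look like in code, here is a minimal sketch that scores the current count against recent history as a z-score. The description above does not spell out the formula, so the z-score itself, the function name, and the way the 10/7 floor is applied are all assumptions; the floor only echoes a threshold mentioned in a diff comment further down this page.

```python
from statistics import mean, pstdev

# Assumed floor: skip works averaging fewer than 10 events per 7 days.
MIN_MEAN = 10 / 7


def trending_z_score(current: float, history: list[float]) -> float:
    """Return how many standard deviations `current` sits above the mean of `history`.

    Returns 0.0 when there is no history, too little activity, or no variation.
    """
    if not history:
        return 0.0
    avg = mean(history)
    if avg < MIN_MEAN:
        return 0.0
    spread = pstdev(history)
    if spread == 0:
        return 0.0
    return (current - avg) / spread
```

With a history of `[2, 4, 4, 4, 5, 5, 7, 9]` (mean 5, population standard deviation 2), a current count of 10 scores 2.5.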

This PR is still in draft mode, as there is not yet any code to run the scripts automatically.

Technical

This implementation uses Solr's ability to update documents in place, which requires the new trending fields to be neither stored nor indexed, and instead to have docValues enabled. Essentially, they are left out of Solr's inverted index and kept as a more conventional document-to-value mapping.

This approach A) is more performant than atomic updates, and B) avoids the issues that atomic updates can have with copyField values.
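
To make the in-place update idea concrete, here is a minimal sketch that builds the JSON body for such an update. The field name `trending_score_example` and the helper are hypothetical, and `key` is assumed to be the unique key field (it is the field used in the search example on this page). The `{"set": value}` modifier is Solr's regular atomic-update syntax; Solr applies it in place when the target field is a non-indexed, non-stored, single-valued numeric docValues field.

```python
import json


def build_in_place_update(work_key: str, scores: dict[str, int]) -> str:
    """Build a JSON update body that sets each score field on one document."""
    # "key" is assumed to be the schema's unique key field.
    doc: dict[str, object] = {"key": work_key}
    for field, value in scores.items():
        # The "set" modifier replaces the field's current value.
        doc[field] = {"set": value}
    return json.dumps([doc])


# Hypothetical field name, for illustration only.
body = build_in_place_update("/works/OL54120W", {"trending_score_example": 3})
```

A body like this would be POSTed to the core's `/update` handler with a JSON content type.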

The relevant cron commands are located in an added file, docker/cron.local.

Testing

  1. Delete your Solr container and all related volumes.
  2. Run docker compose up.
  3. In your local Solr instance, search for a work (e.g. key:"/works/OL54120W") and check that the new fields are present.
  4. Save a work to your 'want-to-read' list.
  5. Set up a docker/cron.local file for the cron jobs, along with a new container to run them in. Change the schedules on the cron tasks so they run more frequently; * * * * * will make them run every minute.
  6. Make sure the container has access to both the dbnet and webnet networks, and has depends_on: db.
  7. After a minute or so, run the search on Solr again and see whether the appropriate trending fields have updated. You can also check the logs of the cron-jobs container in Docker to see whether the jobs are running correctly.
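
The checks in steps 3 and 7 can be scripted. The sketch below is only an illustration: the Solr URL, the core name, and the field name `trending_score_example` are placeholders to adjust for your local setup and for the fields this PR actually adds.

```python
from urllib.parse import urlencode

# Placeholder values: adjust to your local Solr and the PR's real field names.
SOLR_SELECT_URL = "http://localhost:8983/solr/openlibrary/select"
EXPECTED_FIELDS = ["trending_score_example"]


def build_query_url(work_key: str) -> str:
    """Build a select URL that asks for the work's key and the expected fields."""
    params = {
        "q": f'key:"{work_key}"',
        # Name the fields explicitly rather than relying on a glob pattern.
        "fl": ",".join(["key", *EXPECTED_FIELDS]),
        "wt": "json",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


def missing_fields(response: dict) -> list[str]:
    """Return the expected fields that are absent from the first returned doc."""
    docs = response.get("response", {}).get("docs", [])
    if not docs:
        return list(EXPECTED_FIELDS)
    return [field for field in EXPECTED_FIELDS if field not in docs[0]]
```

Fetching the URL from `build_query_url` and passing the parsed JSON to `missing_fields` shows whether the fields exist; comparing the values before and after step 7 shows whether the cron jobs updated them.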

Screenshot

Stakeholders

@cdrini

github-actions bot added the "Priority: 2 Important, as time permits. [managed]" label on Nov 20, 2024
Collaborator

cdrini left a comment

Niiiiice! Getting super close; next week after these changes, let's start adding these fields to prod solr I think!

Collaborator

@benbdeitch notes this is for testing and should be deleted 😁

openlibrary/solr/data_provider.py (outdated, resolved)
openlibrary/solr/data_provider.py (outdated, resolved)
openlibrary/solr/data_provider.py (outdated, resolved)
openlibrary/tests/solr/test_update.py (outdated, resolved)
scripts/calculate_trending_scores_hourly.py (outdated, resolved)
scripts/calculate_trending_scores_hourly.py (outdated, resolved)
scripts/calculate_trending_scores_hourly.py (outdated, resolved)
scripts/calculate_trending_scores_hourly.py (outdated, resolved)
return doc_data


# If the arithmetic mean is below 10/7 (i.e: there have been)
Collaborator

Ran out of time; will finish the rest.

Collaborator

cdrini commented Dec 11, 2024

Oh, I forgot: also add a dummy override of the get_trending_scores method to class LocalPostgresDataProvider(DataProvider) that returns 0s.
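
Something along these lines, as a rough sketch only. The `DataProvider` class here is a stand-in, and the method signature, return shape, and field name are guesses rather than what is actually in data_provider.py:

```python
class DataProvider:
    """Stand-in for the real base class; only the method under discussion is modelled."""

    def get_trending_scores(self, work_key: str) -> dict[str, int]:
        # Signature and return type are assumed for illustration.
        raise NotImplementedError


class LocalPostgresDataProvider(DataProvider):
    def get_trending_scores(self, work_key: str) -> dict[str, int]:
        # Dummy override: always report zero for every (hypothetical) trending field.
        return {"trending_score_example": 0}
```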

Labels
Priority: 2 Important, as time permits. [managed]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add trending score to solr
2 participants