Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: API to perform ORA Workflows batch update #2031

Merged
merged 21 commits into from
Sep 27, 2023
Merged

Conversation

jszewczulak
Copy link
Contributor

@jszewczulak jszewczulak commented Aug 18, 2023

TL;DR - Added API to batch-update ORA Workflows for different scopes. Each endpoint is decorated as a Celery task to be executed asynchronously

JIRA: JIRA-1295

What changed?

  • added new module with API (Tasks

Developer Checklist

Testing Instructions

  • pytest openassessment/tests/test_workflow_batch_update_api.py::TestWorkflowBatchUpdateAPI
  • end-to-end testing will be performed in stage env when Celery worker and queue are in place (separate PR) and scheduler is configured (WIP)

Reviewer Checklist

Collectively, these should be completed by reviewers of this PR:

  • I've done a visual code review
  • I've tested the new functionality

FYI: @openedx/content-aurora

@jszewczulak jszewczulak requested a review from a team as a code owner August 18, 2023 13:37
@codecov
Copy link

codecov bot commented Aug 23, 2023

Codecov Report

Attention: 45 lines in your changes are missing coverage. Please review.

Comparison is base (b897495) 95.04% compared to head (5f75d72) 94.92%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2031      +/-   ##
==========================================
- Coverage   95.04%   94.92%   -0.12%     
==========================================
  Files         178      183       +5     
  Lines       18145    18619     +474     
  Branches     1656     1704      +48     
==========================================
+ Hits        17246    17675     +429     
- Misses        677      712      +35     
- Partials      222      232      +10     
Flag Coverage Δ
unittests 94.92% <90.52%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
openassessment/__init__.py 100.00% <100.00%> (ø)
...ment/management/tests/test_update_ora_workflows.py 100.00% <100.00%> (ø)
...ssment/management/commands/update_ora_workflows.py 91.30% <91.30%> (ø)
...nt/workflow/test/test_workflow_batch_update_api.py 99.18% <99.18%> (ø)
openassessment/workflow/tasks.py 52.94% <52.94%> (ø)
...enassessment/workflow/workflow_batch_update_api.py 80.00% <80.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@jansenk jansenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions about the overall approach.

openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
Copy link
Contributor

@nsprenkle nsprenkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a start but still feels unpolished. There are certainly more elegant ways and with less code duplication by focusing on more modular scope (single submission, group of submissions for a single ORA, and all candidate submissions for an ORA) instead of re-writing the function over and over with slight modification.

I think @jansenk has good points of how to make a smaller set of functions that can be called with the appropriate scope.

openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
@jszewczulak
Copy link
Contributor Author

Based on our discussions I have introduced chained asynchronous task execution and data caching.
These are the scopes (async tasks) that will be executed:

>update_workflows_for_all_blocked_submissions_task
>>update_workflows_for_course_task
>>>update_workflows_for_ora_block_task
>>>>update_workflow_for_submission_task

I left out the (update workflows for) student scope because it breaks the scalability pattern thus requires further complexity in logic and local data cache handling. Out of scope for this effort.

Additional thoughts on implemented changes comparing to earlier (flattened) execution sequence:

Chained/nested Celery tasks execution may prove to be problematic in maintenance:

  1. Celery provides means to control and monitor tasks but they are designed to cover all use cases thus may not be completely coverying our unique needs. E.g. Handling errors when updating individual submissions - before we could set threshold to ignore certain number of errors, then stop execution if the number of errors are greater than threshold.
  2. Asynchronous execution is always challenging to orchestrate, with proper error handling, etc. Stacking multiple async layers, delegating process control and having no single place to handle errors (the way we need) is risky.
  3. This approach scales but most probably we will never need to handle amount of data that would prove to be problematic with earlier, flatted execution chain flow. Submitting a single async job with all submission level updates done synchronously would still delegate task to be executed on separate process but giving us better control over process. I don't think the goal is the infinite scalability here. Single celery process execution environment could be further scaled vertically if needed.
  4. We are losing capability to invoke jobs synchronously
  5. we are forced to keep @shared_task wrappers in the same module as the other API, breaking tasks.py module convention

Data cache - Implemented approach is not the most efficient but it is a compromise to achieve scalability and the API clarity:
update_workflows_for_course(course_id, workflow_update_parameters=None)
For all scopes the workflow_update_parameters data cache dict structure is the same, with each scope consuming the data it requires
To optimize it (lookup) we would need to either introduce more cache dict structures or duplicate data, or create separate DTO class, or introduce class level state, all of which introduce additional complexity, risks and other overheads, hence the compromise.

Copy link
Contributor

@nsprenkle nsprenkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving in the right direction, but could still use some cleanup for better modularity and removing duplicate item lookups.

openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
@jansenk jansenk self-requested a review September 13, 2023 21:10
Copy link
Contributor

@nsprenkle nsprenkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still one necessary optimization blocking approval, then a few nits which can be taken into consideration or ignored.

openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jansenk jansenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions. it seems like you may be making this more complicated than this needs to be?

openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jansenk jansenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I want to merge this too, but I noticed a couple more issues

openassessment/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow/workflow_batch_update_api.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jansenk jansenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few small nits I'd like if you'd address but other than that I believe this is complete

@@ -259,7 +262,7 @@ def get_blocked_peer_workflows(course_id=None, item_id=None, submission_uuid=Non

filters = {
'created_at__lte': timezone.now() - datetime.timedelta(days=7),
'completed_at__isnull': True,
'grading_completed_at__isnull': True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also add completed_at__isnull: False here. If a learner hasn't completed the number of peer reviews they need to complete, their workflow isn't going to do anything when we update it.

openassessment/workflow/workflow_batch_update_api.py Outdated Show resolved Hide resolved
openassessment/workflow/workflow_batch_update_api.py Outdated Show resolved Hide resolved
@jszewczulak jszewczulak merged commit 5a09368 into master Sep 27, 2023
7 of 9 checks passed
@jszewczulak jszewczulak deleted the js/ora_batch_update branch September 27, 2023 00:18
nsprenkle pushed a commit that referenced this pull request Nov 6, 2023
* feat: added API to perform batch ORA workflow update
* feat: added Celery async tasks for batch ORA workflow update
BryanttV pushed a commit to eduNEXT/edx-ora2 that referenced this pull request Feb 6, 2024
* feat: added API to perform batch ORA workflow update
* feat: added Celery async tasks for batch ORA workflow update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants