feat: add auto switch scheduler e2e test #3486

Merged
4 commits merged into dragonflyoss:main on Sep 29, 2024

PoloDyBala
Contributor

@PoloDyBala PoloDyBala commented Sep 8, 2024

Description

This pull request adds an e2e test that checks how Dragonfly behaves when scheduler instances are stopped during a preheat job. The test ensures that the preheat job still completes correctly when some scheduler instances are unavailable.

Test Summary

  1. Preheat Job Execution:

    • Start a preheat job for the file /bin/md5sum using the Dragonfly manager.
    • Verify that the preheat job starts successfully and retrieve its job ID.
  2. Scheduler Scaling:

    • Before scaling down, retrieve and print the IP addresses of the scheduler pods to ensure they are active.
    • Wait for a short period to ensure that all scheduler instances are active.
    • Scale down the number of scheduler replicas to 1.
    • The replica count is restored to its original value after the job completes.
  3. Job Completion Check:

    • Monitor the status of the preheat job until it completes.
    • Ensure that the job completes successfully.
  4. Scheduler IP Verification:

    • After scaling down, retrieve and print the IP addresses of the remaining scheduler pods.
  5. File Integrity Check:

    • Verify the integrity of the preheated file by comparing its SHA256 hash with the expected value.
  6. View logs in /tmp/artifact:

    • Nodes selected without deleting the scheduler (screenshot: 00935b3d8d634430899af37ef335991)
    • Nodes selected when deleting the scheduler (screenshot: 63a9314862a077f704cd51359d76576)

Key Points

  • Scaling Down Scheduler: The test involves scaling down the scheduler replicas to test the system's robustness. The scaling operation is reversed after the test to restore the original state.
  • Job Monitoring: A mechanism is in place to check the completion status of the preheat job and ensure it succeeds.
  • File Integrity: After the job completes, file integrity is verified by comparing the computed SHA256 hash with the expected hash.

Related Issue

dragonflyoss/client#591

@PoloDyBala PoloDyBala requested a review from a team as a code owner September 8, 2024 14:06
@PoloDyBala PoloDyBala changed the title from "feature:add auto switch scheduler e2e test" to "feat:add auto switch scheduler e2e test" Sep 8, 2024
@PoloDyBala PoloDyBala changed the title from "feat:add auto switch scheduler e2e test" to "feat: add auto switch scheduler e2e test" Sep 8, 2024
Member

@gaius-qi gaius-qi left a comment

LGTM

@gaius-qi gaius-qi merged commit 4a7ae85 into dragonflyoss:main Sep 29, 2024
26 checks passed

codecov bot commented Sep 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 51.82%. Comparing base (6b46d0e) to head (f26ba83).
Report is 27 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##           main    #3486       +/-   ##
=========================================
+ Coverage      0   51.82%   +51.82%     
=========================================
  Files         0      190      +190     
  Lines         0    20377    +20377     
=========================================
+ Hits          0    10561    +10561     
- Misses        0     9012     +9012     
- Partials      0      804      +804     
Flag Coverage Δ
unittests 51.82% <ø> (?)

see 190 files with indirect coverage changes
