
fs/ufs: change default locking protocol #12756

Merged (1 commit) on Aug 14, 2024

Conversation

@edgargabriel (Member) commented on Aug 13, 2024

The fs/ufs component by default disables all file locking before read/write operations (except for NFS file systems). This was based on the assumption that the OS itself performs the required locking operations and hence we don't have to add any.

This assumption is incorrect when using data sieving. In data sieving, the code 'ignores' small gaps when we write to a file, and instead performs a read-modify-write sequence itself for performance reasons. The OS, however, cannot protect the gap in the read-modify-write sequence. In addition, within a collective operation not all aggregators might want to use data sieving. Hence, enabling locking just for the data-sieving routines is insufficient; all processes have to perform the locking. Therefore, our two options are: a) disable write data-sieving by default, or b) enable range-locking by default.

After some testing, I think enabling range-locking by default is the safer and better approach. It doesn't seem to show a significant performance impact on my test systems.

Note that on Lustre file systems we can keep the default at no-locking as far as I can see, since the collective algorithm used by Lustre is unlikely (maybe even impossible) to produce this pattern. I did, however, add an MCA parameter that allows us to control the locking algorithm used by the Lustre component as well, in case we need to change that for a particular use case or platform.

Fixes Issue #12718

The fs/ufs component by default disabled all file locking before
read/write operations (except for NFS file systems). This was based on
the assumption that the file system itself performs the required
locking operations and hence we don't have to add any.

This assumption is incorrect when using data sieving. In data sieving,
the code 'ignores' small gaps when we write to a file, and instead
performs a read-modify-write sequence itself for performance reasons.
The problem, however, is that even within a collective operation not
all aggregators might want to use data sieving. Hence, enabling locking
just for the data-sieving routines is insufficient; all processes have
to perform the locking. Therefore, our two options are: a) disable
write data-sieving by default, or b) enable range-locking by default.

After some testing, I think enabling range-locking by default is the
safer and better approach. It doesn't seem to show any significant
performance impact on my test systems.

Note that on Lustre file systems we can keep the default at no-locking
as far as I can see, since the collective algorithm used by Lustre is
unlikely to produce this pattern. I did, however, add an MCA parameter
that allows us to control the locking algorithm used by the Lustre
component as well, in case we need to change that for a particular
use case or platform.

Fixes Issue open-mpi#12718

Signed-off-by: Edgar Gabriel <[email protected]>
@@ -93,6 +95,15 @@ lustre_register(void)
MCA_BASE_VAR_TYPE_INT, NULL, 0, 0,
OPAL_INFO_LVL_9,
MCA_BASE_VAR_SCOPE_READONLY, &mca_fs_lustre_stripe_width);
mca_fs_lustre_lock_algorithm = 0;
(void) mca_base_component_var_register(&mca_fs_lustre_component.fsm_version,
Contributor
nit (feel free to ignore): This could be more user friendly if it were implemented with mca_base_var_enum_t, but that requires more code.

}
}
else {
fh->f_flags |= OMPIO_LOCK_NEVER;
}
free (fstype);
Contributor
Not knowing this code path very well, it looks to me like you handled Lustre and NFS just fine. However, is there any case other than NFS and Lustre? Seems that previously such a case would have set OMPIO_LOCK_NEVER but now it would be unset (LOCK_RANGES).

Member Author

The fs/ufs component handles everything that is not Lustre at the moment (e.g. NFS, local file systems, BeeGFS, etc.). We treat all of them equally except for NFS. Lustre requires special handling because it allows us to modify some additional parameters (stripe size, stripe count, etc.), hence it has its own component. So from the fs/ufs component's perspective it's either NFS (in which case we activate locking) or everything else (in which case we now do range locking instead of no locking).

Member Author

Forgot that we also have separate components for GPFS and IME; not sure how much they are in use, however. We also used to have separate components for PVFS2/OrangeFS, but that was removed a few weeks ago.

Contributor

I just re-read the commit message. Your intent is to change those paths from LOCK_NEVER to LOCK_RANGE, which is what you have done. 👍

@edgargabriel (Member Author)
@lrbison thank you!

@edgargabriel edgargabriel merged commit 247cfb6 into open-mpi:main Aug 14, 2024
17 checks passed
2 participants