-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Stale] Compare identical environments by using a same_file::Handle
#179
Draft
darnuria
wants to merge
11
commits into
meilisearch:main
Choose a base branch
from
darnuria:open_env_device_inode
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
e880cf7
Check env file already open with samefile.
darnuria 092ab54
Apply Review suggestion: deconstruct explicitly handle.
darnuria ded5dd8
Fixup documentation of the OPENED_ENV.
darnuria 86da304
Review: we don't need canonnicalize anymore.
darnuria 295d159
Fix hardlink test.
darnuria d720f09
fixup! document path changes
darnuria b065e91
Fixup! Hardlink will not work reviewed my unixbasics
darnuria 641d229
Fixup! Integrate suggestion from review about the tests.
darnuria 833e433
Fix symlink no subdir
darnuria 911ba89
Fix windows details.
darnuria 83b4386
Windows moving dir is just a burden.
darnuria File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ use std::{ | |
use std::{fmt, io, mem, ptr, sync}; | ||
|
||
use once_cell::sync::Lazy; | ||
use same_file::Handle; | ||
use synchronoise::event::SignalEvent; | ||
|
||
use crate::mdb::error::mdb_result; | ||
|
@@ -29,34 +30,25 @@ use crate::{ | |
|
||
/// The list of opened environments, the value is an optional environment, it is None | ||
/// when someone asks to close the environment, closing is a two-phase step, to make sure | ||
/// noone tries to open the same environment between these two phases. | ||
/// none tries to open the same environment between these two phases. | ||
/// | ||
/// Trying to open a None marked environment returns an error to the user trying to open it. | ||
static OPENED_ENV: Lazy<RwLock<HashMap<PathBuf, EnvEntry>>> = Lazy::new(RwLock::default); | ||
/// Trying to open a `None` marked environment returns an error to the user trying to open it. | ||
/// | ||
/// [same_file::Handle] abstract the platform details of checking if the given paths points to | ||
/// the same file on the filesystem. | ||
/// | ||
/// ## Safety | ||
/// | ||
/// Mind that Handle currently open the file, so avoid writing through the fd held by [same_file::Handle], | ||
/// since the file will also be opened by LMDB. | ||
static OPENED_ENV: Lazy<RwLock<HashMap<Handle, EnvEntry>>> = Lazy::new(RwLock::default); | ||
|
||
struct EnvEntry { | ||
env: Option<Env>, | ||
signal_event: Arc<SignalEvent>, | ||
options: EnvOpenOptions, | ||
} | ||
|
||
// Thanks to the mozilla/rkv project | ||
// Workaround the UNC path on Windows, see https://github.com/rust-lang/rust/issues/42869. | ||
// Otherwise, `Env::from_env()` will panic with error_no(123). | ||
#[cfg(not(windows))] | ||
fn canonicalize_path(path: &Path) -> io::Result<PathBuf> { | ||
path.canonicalize() | ||
} | ||
|
||
#[cfg(windows)] | ||
fn canonicalize_path(path: &Path) -> io::Result<PathBuf> { | ||
let canonical = path.canonicalize()?; | ||
let url = url::Url::from_file_path(&canonical) | ||
.map_err(|_e| io::Error::new(io::ErrorKind::Other, "URL passing error"))?; | ||
url.to_file_path() | ||
.map_err(|_e| io::Error::new(io::ErrorKind::Other, "path canonicalization error")) | ||
} | ||
|
||
#[cfg(windows)] | ||
/// Adding a 'missing' trait from windows OsStrExt | ||
trait OsStrExtLmdb { | ||
|
@@ -180,22 +172,26 @@ impl EnvOpenOptions { | |
pub fn open<P: AsRef<Path>>(&self, path: P) -> Result<Env> { | ||
let mut lock = OPENED_ENV.write().unwrap(); | ||
|
||
let path = match canonicalize_path(path.as_ref()) { | ||
let (path, handle) = match Handle::from_path(&path) { | ||
Err(err) => { | ||
if err.kind() == NotFound && self.flags & (Flag::NoSubDir as u32) != 0 { | ||
let path = path.as_ref(); | ||
match path.parent().zip(path.file_name()) { | ||
Some((dir, file_name)) => canonicalize_path(dir)?.join(file_name), | ||
Some((dir, file_name)) => { | ||
let handle = Handle::from_path(&dir)?; | ||
let path = dir.join(file_name); | ||
(path, handle) | ||
} | ||
None => return Err(err.into()), | ||
} | ||
} else { | ||
return Err(err.into()); | ||
} | ||
} | ||
Ok(path) => path, | ||
Ok(handle) => (path.as_ref().to_path_buf(), handle), | ||
}; | ||
|
||
match lock.entry(path) { | ||
match lock.entry(handle) { | ||
Entry::Occupied(entry) => { | ||
let env = entry.get().env.clone().ok_or(Error::DatabaseClosing)?; | ||
let options = entry.get().options.clone(); | ||
|
@@ -206,7 +202,6 @@ impl EnvOpenOptions { | |
} | ||
} | ||
Entry::Vacant(entry) => { | ||
let path = entry.key(); | ||
let path_str = CString::new(path.as_os_str().as_bytes()).unwrap(); | ||
|
||
unsafe { | ||
|
@@ -255,6 +250,7 @@ impl EnvOpenOptions { | |
env, | ||
dbi_open_mutex: sync::Mutex::default(), | ||
path: path.clone(), | ||
handle: Handle::from_path(path)?, | ||
}; | ||
let env = Env(Arc::new(inner)); | ||
let cache_entry = EnvEntry { | ||
|
@@ -277,9 +273,11 @@ impl EnvOpenOptions { | |
} | ||
|
||
/// Returns a struct that allows to wait for the effective closing of an environment. | ||
pub fn env_closing_event<P: AsRef<Path>>(path: P) -> Option<EnvClosingEvent> { | ||
pub fn env_closing_event<P: AsRef<Path>>(path: P) -> Result<Option<EnvClosingEvent>> { | ||
Kerollmops marked this conversation as resolved.
Show resolved
Hide resolved
|
||
let lock = OPENED_ENV.read().unwrap(); | ||
lock.get(path.as_ref()).map(|e| EnvClosingEvent(e.signal_event.clone())) | ||
let handle = Handle::from_path(path)?; | ||
|
||
Ok(lock.get(&handle).map(|e| EnvClosingEvent(e.signal_event.clone()))) | ||
} | ||
|
||
/// An environment handle constructed by using [`EnvOpenOptions`]. | ||
|
@@ -288,7 +286,8 @@ pub struct Env(Arc<EnvInner>); | |
|
||
impl fmt::Debug for Env { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
let EnvInner { env: _, dbi_open_mutex: _, path } = self.0.as_ref(); | ||
// We don't include the header since it's containing platform specifics details | ||
let EnvInner { env: _, dbi_open_mutex: _, path, handle: _ } = self.0.as_ref(); | ||
f.debug_struct("Env").field("path", &path.display()).finish_non_exhaustive() | ||
} | ||
} | ||
|
@@ -297,6 +296,7 @@ struct EnvInner { | |
env: *mut ffi::MDB_env, | ||
dbi_open_mutex: sync::Mutex<HashMap<u32, Option<(TypeId, TypeId)>>>, | ||
path: PathBuf, | ||
handle: Handle, | ||
} | ||
|
||
unsafe impl Send for EnvInner {} | ||
|
@@ -307,7 +307,7 @@ impl Drop for EnvInner { | |
fn drop(&mut self) { | ||
let mut lock = OPENED_ENV.write().unwrap(); | ||
|
||
match lock.remove(&self.path) { | ||
match lock.remove(&self.handle) { | ||
None => panic!("It seems another env closed this env before"), | ||
Some(EnvEntry { signal_event, .. }) => { | ||
unsafe { | ||
|
@@ -641,7 +641,10 @@ impl Env { | |
Ok(()) | ||
} | ||
|
||
/// Returns the canonicalized path where this env lives. | ||
/// Returns the `Path` where this [Env] lives. | ||
/// | ||
/// The `Path` returned will be the one from the first opening. Like if you `EnvOpenOptions::open` through | ||
/// a symlink or a moved the directory containing it will be the case. | ||
pub fn path(&self) -> &Path { | ||
&self.0.path | ||
} | ||
|
@@ -653,7 +656,7 @@ impl Env { | |
/// when all references are dropped, the last one will eventually close the environment. | ||
pub fn prepare_for_closing(self) -> EnvClosingEvent { | ||
let mut lock = OPENED_ENV.write().unwrap(); | ||
match lock.get_mut(self.path()) { | ||
match lock.get_mut(&self.0.handle) { | ||
None => panic!("cannot find the env that we are trying to close"), | ||
Some(EnvEntry { env, signal_event, .. }) => { | ||
// We remove the env from the global list and replace it with a None. | ||
|
@@ -797,7 +800,7 @@ mod tests { | |
eprintln!("env closed successfully"); | ||
|
||
// Make sure we don't have a reference to the env | ||
assert!(env_closing_event(&dir.path()).is_none()); | ||
assert!(env_closing_event(&dir.path()).unwrap().is_none()); | ||
Kerollmops marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
|
||
#[test] | ||
|
@@ -830,6 +833,59 @@ mod tests { | |
.unwrap(); | ||
} | ||
|
||
#[test] | ||
#[cfg(unix)] | ||
fn open_env_with_named_path_symlink() { | ||
let dir = tempfile::tempdir().unwrap(); | ||
let dir_symlink = tempfile::tempdir().unwrap(); | ||
|
||
let env_name = dir.path().join("babar.mdb"); | ||
let symlink_name = dir_symlink.path().join("babar.mdb.link"); | ||
fs::create_dir_all(&env_name).unwrap(); | ||
|
||
let env = EnvOpenOptions::new() | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&env_name) | ||
.unwrap(); | ||
assert_eq!(env_name, env.path()); | ||
|
||
std::os::unix::fs::symlink(&env_name, &symlink_name).unwrap(); | ||
|
||
let env = EnvOpenOptions::new() | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&symlink_name) | ||
.unwrap(); | ||
|
||
// The path is recycle from the first opening of the env. | ||
assert_eq!(env_name, env.path()); | ||
} | ||
|
||
// Unix only since managing to support *moving* directory on windows was too | ||
// much pain. By moving we mean not copy-recreating, just changing name. | ||
// https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-movefileexa | ||
#[test] | ||
#[cfg(unix)] | ||
fn open_env_with_named_path_rename() { | ||
let dir = tempfile::tempdir().unwrap(); | ||
let env_name = dir.path().join("babar.mdb"); | ||
fs::create_dir_all(&env_name).unwrap(); | ||
|
||
let env = EnvOpenOptions::new() | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&env_name) | ||
.unwrap(); | ||
assert_eq!(env_name, env.path()); | ||
|
||
let env_renamed = dir.path().join("serafina.mdb"); | ||
std::fs::rename(&env_name, &env_renamed).unwrap(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for the windows part maybe the little UNIX api? https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/rename-wrename?view=msvc-170 |
||
|
||
let env = EnvOpenOptions::new() | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&env_renamed) | ||
.unwrap(); | ||
assert_eq!(env_name, env.path()); | ||
} | ||
|
||
#[test] | ||
#[cfg(not(windows))] | ||
fn open_database_with_writemap_flag() { | ||
|
@@ -853,6 +909,95 @@ mod tests { | |
let _env = envbuilder.open(&dir.path().join("data.mdb")).unwrap(); | ||
} | ||
|
||
#[test] | ||
#[cfg(unix)] | ||
fn open_env_with_named_path_symlink_and_no_subdir() { | ||
let env_dir = tempfile::Builder::new().suffix("heed_test_no_subdir_env").tempdir().unwrap(); | ||
let symlink_parent = tempfile::tempdir().unwrap(); | ||
let env_path = env_dir.path(); | ||
let db_file = env_dir.path().join("data.mdb"); | ||
|
||
let symlink_path = symlink_parent.path().join("env.link"); | ||
|
||
let mut envbuilder = EnvOpenOptions::new(); | ||
unsafe { envbuilder.flag(crate::Flag::NoSubDir) }; | ||
let env = envbuilder | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&db_file) | ||
.unwrap(); | ||
assert_eq!(db_file, env.path()); | ||
|
||
std::os::unix::fs::symlink(&env_path, &symlink_path).unwrap(); | ||
let _env = envbuilder | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&symlink_path) | ||
.unwrap(); | ||
|
||
// Checkout that we get the path of the first openning. | ||
assert_eq!(db_file, env.path()); | ||
assert_ne!(symlink_path, env.path()); | ||
|
||
let _env = envbuilder | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&env_path) | ||
.unwrap(); | ||
assert_eq!(db_file, env.path()); | ||
assert_ne!(symlink_path, env.path()); | ||
} | ||
|
||
#[test] | ||
#[cfg(windows)] | ||
fn open_env_with_named_path_symlinkfile_and_no_subdir() { | ||
let dir = tempfile::tempdir().unwrap(); | ||
let dir_symlink = tempfile::tempdir().unwrap(); | ||
// We do a temp dir because CI returned | ||
// Os { code: 5, kind: PermissionDenied, message: "Access is denied." } | ||
let parent_symlink = tempfile::tempdir().unwrap(); | ||
|
||
let env_name = dir.path().join("babar.mdb"); | ||
let symlink_name = parent_symlink.path().join("babar.mdb.link"); | ||
|
||
let mut envbuilder = EnvOpenOptions::new(); | ||
unsafe { envbuilder.flag(crate::Flag::NoSubDir) }; | ||
let _env = envbuilder | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&env_name) | ||
.unwrap(); | ||
|
||
std::os::windows::fs::symlink_dir(&dir.path(), &symlink_name).unwrap(); | ||
let _env = envbuilder | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&symlink_name) | ||
.unwrap(); | ||
|
||
let _env = envbuilder | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&env_name) | ||
.unwrap(); | ||
} | ||
|
||
#[test] | ||
#[cfg(windows)] | ||
fn open_env_with_named_path_symlink() { | ||
let dir = tempfile::tempdir().unwrap(); | ||
let dir_symlink = tempfile::tempdir().unwrap(); | ||
|
||
let env_name = dir.path().join("babar.mdb"); | ||
let symlink_name = dir_symlink.path().join("babar.mdb.link"); | ||
fs::create_dir_all(&env_name).unwrap(); | ||
|
||
std::os::windows::fs::symlink_dir(&env_name, &symlink_name).unwrap(); | ||
let env = EnvOpenOptions::new() | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&symlink_name) | ||
.unwrap(); | ||
|
||
let env = EnvOpenOptions::new() | ||
.map_size(10 * 1024 * 1024) // 10MB | ||
.open(&env_name) | ||
.unwrap(); | ||
} | ||
|
||
#[test] | ||
fn create_database_without_commit() { | ||
let dir = tempfile::tempdir().unwrap(); | ||
|
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaved a mark since
Handle
open the file and hold a file descriptor https://github.com/BurntSushi/same-file/blob/master/src/unix.rs#L10There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so can we add a comment about it? I mean a really documentation comment in the lib.rs file. Under an Important/Remark header, please? I don't know how LMDB behaves towards that.
I am not sure that we really care about this sentence as we keep it open the whole env-lifetime right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(backported in description in case this PR got stuck/closed to help somebody)
Yeah my worries comes from that line, and my first idea to just check
(dev_t, ino_t
)For sure it may break the
flock()
ing (not the best system api..), and the multi-process features of lmdb is really handy.Checked-out lmdb sources:
About
flock
ing in lmdb.h the sentense you pointed changed with:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what do we conclude? Can we still use the
same_file::Handle
struct? If not, what do we plan? Is it even possible to get thedev_t
andino_t
without opening the file?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stat
function family (except thefstatat
) check metadata and return and don't maintain a file descriptor.It's the kind of function used by
metadata
in rust std, sadly posix never standardized a "device / FSID" so ino_t/dev_t is left to implementation detail.For windows it's more https://docs.rs/winapi-util/latest/winapi_util/file/fn.information.html
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfileinformationbyhandle