Video files organize #841

Merged
merged 76 commits into from
Feb 8, 2022
76 commits
7908c99
get images method
Saksham20 Nov 24, 2021
ddd48a6
rename, symlinks
Saksham20 Nov 24, 2021
0a8c2e0
docstrings, suggested changes
Saksham20 Nov 27, 2021
7dc800e
docstrings
Saksham20 Nov 27, 2021
2abc92e
create external-paths option, separate method for organizing files, r…
Saksham20 Nov 29, 2021
4129109
add typehints, update docstrings, check external paths as option
Saksham20 Nov 29, 2021
fb572b8
fix:open nwbfile in r+
Saksham20 Nov 30, 2021
d2eaf03
using os.path for handling os paths
Saksham20 Nov 30, 2021
8d14488
add external-files-mode, assert files-mode to be copy/move when rewrite
Saksham20 Nov 30, 2021
ab8c7a1
rewrite option change to "external-file"
Saksham20 Dec 3, 2021
eccc1d3
using logger for warnings
Saksham20 Dec 3, 2021
71fbb0b
assert external_files_mode as one of 'move/copy'
Saksham20 Dec 4, 2021
2dd53b8
testing local files pass
Saksham20 Dec 6, 2021
4ae10bb
error string fix
Saksham20 Dec 6, 2021
d0c754f
add tests
Saksham20 Dec 12, 2021
e23dca8
video test cli fixes
Saksham20 Dec 13, 2021
a3ca690
add validation for specific video files on upload
Saksham20 Dec 13, 2021
4c854e5
open nwbfiles and test
Saksham20 Dec 13, 2021
a6f4242
imageseries uuid copy bug fix
Saksham20 Dec 13, 2021
4145445
update setup to include datalad
Saksham20 Dec 13, 2021
6ec8e92
datalad as test dependency
Saksham20 Dec 13, 2021
80a0616
keep only double dash for video options
Saksham20 Dec 13, 2021
dd2b787
logger use %s formatting
Saksham20 Dec 13, 2021
5ae93ef
typehints, docstrings update
Saksham20 Dec 13, 2021
3f3f095
using tmp_path fixtures
Saksham20 Dec 13, 2021
3e14aa9
filter future warning from datalad
Saksham20 Dec 14, 2021
ac7c0fb
Merge branch 'master' into video_files_organize
Saksham20 Dec 15, 2021
69408b2
run pre-commit
Saksham20 Dec 15, 2021
fe3c33a
code review suggestions
Saksham20 Dec 16, 2021
d16b932
run pre-commit
Saksham20 Dec 16, 2021
e7c27b7
fixing test_metadata.test_get_metadata
Saksham20 Jan 15, 2022
432d9a6
Merge branch 'master' into video_files_organize
Saksham20 Jan 15, 2022
2b3c97b
update test.yml
Saksham20 Jan 16, 2022
1c3d55e
manual precommit
Saksham20 Jan 16, 2022
0832f3f
external file extensions const upper case
Saksham20 Jan 18, 2022
effc815
add dated wtf to debug
Saksham20 Jan 19, 2022
dff2e86
Apply suggestions from code review
Saksham20 Jan 19, 2022
5a49a8a
actively assert pattern in external_files
Saksham20 Jan 19, 2022
3b435b3
Update dandi/cli/cmd_organize.py
Saksham20 Jan 19, 2022
2a8c992
add git-annex before run all tests
Saksham20 Jan 19, 2022
5d60bfa
fix clirunner error
Saksham20 Jan 19, 2022
221b48b
external file name assertion fix
Saksham20 Jan 19, 2022
4cc47d6
remove pyfakefs
Saksham20 Jan 21, 2022
90978c7
pip installation of datalad-installer in ubuntu
Saksham20 Jan 21, 2022
37e951e
external files mode supplied only when video_mode is not None
Saksham20 Jan 21, 2022
d37a975
improve doc for external-files-mode cmd option
Saksham20 Jan 21, 2022
5a3a644
rename extentions supported to VIDEO_FILE_EXTENSIONS
Saksham20 Jan 24, 2022
c4a2bb4
separate method for video files validation
Saksham20 Jan 24, 2022
d6f6522
use one option to rewrite external files
Saksham20 Jan 24, 2022
602098b
update tests with new rewroting options
Saksham20 Jan 24, 2022
5a2e238
create video files using opencv
Saksham20 Jan 24, 2022
52e26cf
tests bug fix
Saksham20 Jan 25, 2022
fdde52e
platform independent file loc
Saksham20 Jan 25, 2022
0ed3fa3
validation tests
Saksham20 Jan 25, 2022
5b13d32
implementing intuitive naming of options, using symlinks/hardlinks fo…
Saksham20 Jan 27, 2022
8988cbb
merge master
Saksham20 Jan 31, 2022
4bb0f96
remove validation test
Saksham20 Jan 31, 2022
09a5ee8
Merge branch 'master' into video_files_organize
Saksham20 Jan 31, 2022
49375d9
optimize imports in command line commands
Saksham20 Feb 1, 2022
0398f95
Merge remote-tracking branch 'CN/video_files_organize' into video_fil…
Saksham20 Feb 1, 2022
9877870
bug fix
Saksham20 Feb 2, 2022
525534d
np all remove
Saksham20 Feb 3, 2022
5ec96a4
np all remove
Saksham20 Feb 3, 2022
abef9b3
Merge branch 'dandi:master' into video_files_organize
Saksham20 Feb 4, 2022
56a8e50
Merge branch 'masterCN' into video_files_organize
Saksham20 Feb 4, 2022
832ac63
Merge remote-tracking branch 'CN/video_files_organize' into video_fil…
Saksham20 Feb 4, 2022
4241edc
update click option: --update-external-file-paths and comments
Saksham20 Feb 5, 2022
ba0b74e
raise exception using click.UsageError
Saksham20 Feb 5, 2022
9d95d1d
explicit type hints for list of dicts
Saksham20 Feb 5, 2022
13b9927
code review suggestions
Saksham20 Feb 5, 2022
a43c81a
code review suggestions, remane fixture names to nouns
Saksham20 Feb 5, 2022
dedf55e
Merge remote-tracking branch 'CN/video_files_organize' into video_fil…
Saksham20 Feb 5, 2022
af91844
fixture fix
Saksham20 Feb 5, 2022
58c9de3
Apply suggestions from code review
Saksham20 Feb 8, 2022
4c894a8
PEP 257
Saksham20 Feb 8, 2022
d1a71e5
stringifying list
Saksham20 Feb 8, 2022
67 changes: 64 additions & 3 deletions dandi/cli/cmd_organize.py
@@ -34,6 +34,22 @@
default="auto",
show_default=True,
)
@click.option(
"--update-external-file-paths",
is_flag=True,
default=False,
help="Rewrite the 'external_file' arguments of ImageSeries in NWB files. "
"The new values will correspond to the new locations of the video files "
"after being organized. "
"This option requires --files-mode to be 'copy' or 'move'",
)
@click.option(
"--media-files-mode",
type=click.Choice(["copy", "move", "symlink", "hardlink"]),
default=None,
help="How to place the video files on disk when they are organized "
"alongside the NWB files.",
)
@click.argument("paths", nargs=-1, type=click.Path(exists=True))
@devel_debug_option()
@map_to_click_exceptions
@@ -43,6 +59,8 @@ def organize(
invalid="fail",
files_mode="auto",
devel_debug=False,
update_external_file_paths=False,
media_files_mode=None,
):
"""(Re)organize files according to the metadata.

@@ -80,11 +98,13 @@ def organize(
from ..dandiset import Dandiset
from ..metadata import get_metadata
from ..organize import (
_create_external_file_names,
create_unique_filenames_from_metadata,
detect_link_type,
filter_invalid_metadata_rows,
organize_external_files,
)
from ..pynwb_utils import ignore_benign_pynwb_warnings
from ..pynwb_utils import ignore_benign_pynwb_warnings, rename_nwb_external_files
from ..utils import Parallel, copy_file, delayed, find_files, load_jsonl, move_file

in_place = False # If we deduce that we are organizing in-place
@@ -104,6 +124,11 @@ def act(func, *args, **kwargs):
lgr.debug("%s %s %s", func.__name__, args, kwargs)
return func(*args, **kwargs)

if update_external_file_paths and files_mode not in ["copy", "move"]:
raise click.UsageError(
"--files-mode needs to be one of 'copy/move' for the rewrite option to work"
)

if dandiset_path is None:
dandiset = Dandiset.find(os.curdir)
if not dandiset:
@@ -140,7 +165,7 @@ def act(func, *args, **kwargs):
"Only 'dry' or 'move' mode could be used to operate in-place "
"within a dandiset (no paths were provided)"
)
lgr.info(f"We will organize {dandiset_path} in-place")
lgr.info("We will organize %s in-place", dandiset_path)
in_place = True
paths = dandiset_path

@@ -214,6 +239,37 @@ def _get_metadata(path):

metadata = create_unique_filenames_from_metadata(metadata)

# update metadata with external_file information:
external_files_missing_in_nwbfiles = [
len(m["external_file_objects"]) == 0 for m in metadata
]

if all(external_files_missing_in_nwbfiles) and update_external_file_paths:
lgr.warning(
"--update-external-file-paths specified but no external_files found "
"linked to any nwbfile found in %s",
paths,
)
elif not all(external_files_missing_in_nwbfiles) and not update_external_file_paths:
files_list = [
metadata[no]["path"]
for no, a in enumerate(external_files_missing_in_nwbfiles)
if not a
]
raise click.UsageError(
"--update-external-file-paths option not specified but found "
"external video files linked to the nwbfiles "
f"{', '.join(files_list)}"
)

if update_external_file_paths and media_files_mode is None:
media_files_mode = "symlink"
lgr.warning(
"--media-files-mode not specified, setting to recommended mode: 'symlink' "
)

metadata = _create_external_file_names(metadata)

# Verify first that the target paths do not exist yet, and fail if they do
# Note: in "simulate" mode we do early check as well, so this would be
# duplicate but shouldn't hurt
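The option checks added to `organize()` can be read as one small decision function. The sketch below is not the PR's code: `resolve_media_mode` is a hypothetical helper, `ValueError` stands in for `click.UsageError`, and the metadata records are plain dicts carrying only the `path` and `external_file_objects` keys used above.

```python
from typing import List, Optional


def resolve_media_mode(
    metadata: List[dict],
    files_mode: str,
    update_external_file_paths: bool,
    media_files_mode: Optional[str],
) -> Optional[str]:
    """Hypothetical helper mirroring the option checks in organize()."""
    # The PR requires copy/move here (presumably because the organized
    # NWB file is later opened in r+ mode and rewritten).
    if update_external_file_paths and files_mode not in ("copy", "move"):
        raise ValueError("--files-mode must be 'copy' or 'move'")

    # Paths of NWB files that reference at least one external video file.
    with_videos = [m["path"] for m in metadata if m["external_file_objects"]]

    if with_videos and not update_external_file_paths:
        raise ValueError(
            "--update-external-file-paths not specified but external files "
            "are linked from: " + ", ".join(with_videos)
        )

    # Default applied by the PR when the flag is set but no mode is given.
    if update_external_file_paths and media_files_mode is None:
        return "symlink"
    return media_files_mode
```

In this reading, the flag with no linked videos only produces a warning in the PR, so the sketch lets that case fall through to the `symlink` default as well.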
@@ -313,10 +369,15 @@ def _get_metadata(path):
if op.exists(d):
try:
os.rmdir(d)
lgr.info(f"Removed empty directory {d}")
lgr.info("Removed empty directory %s", d)
except Exception as exc:
lgr.debug("Failed to remove directory %s: %s", d, exc)

# create video file names and rewrite the external_file paths in the NWB files:
if update_external_file_paths:
rename_nwb_external_files(metadata, dandiset_path)
organize_external_files(metadata, dandiset_path, media_files_mode)

def msg_(msg, n, cond=None):
if hasattr(n, "__len__"):
n = len(n)
3 changes: 3 additions & 0 deletions dandi/consts.py
@@ -148,6 +148,9 @@ class DandiInstance(NamedTuple):
#: of retries)
RETRY_STATUSES = (500, 502, 503, 504)

VIDEO_FILE_EXTENSIONS = [".mp4", ".avi", ".wmv", ".mov", ".flv"]
VIDEO_FILE_MODULES = ["processing", "acquisition"]

#: Maximum allowed depth of a Zarr directory tree
MAX_ZARR_DEPTH = 5
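
As a rough illustration of how the new list is used: `_get_image_series` in `dandi/pynwb_utils.py` tests `Path(ext_file).suffix in VIDEO_FILE_EXTENSIONS`, which is a case-sensitive comparison. The sketch below shows that check with a hypothetical helper, `is_video_file`, which is not part of the PR. The `case_sensitive=False` branch is an assumption about how one might accept names such as `clip.MP4`; the PR itself does not do this.

```python
from pathlib import Path

# Same values as the constant added in dandi/consts.py.
VIDEO_FILE_EXTENSIONS = [".mp4", ".avi", ".wmv", ".mov", ".flv"]


def is_video_file(name: str, case_sensitive: bool = True) -> bool:
    """Hypothetical helper: check a file name against VIDEO_FILE_EXTENSIONS."""
    suffix = Path(name).suffix
    if not case_sensitive:
        # Assumption: normalizing case is acceptable for the caller.
        suffix = suffix.lower()
    return suffix in VIDEO_FILE_EXTENSIONS
```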

82 changes: 81 additions & 1 deletion dandi/organize.py
@@ -9,13 +9,15 @@
import os.path as op
from pathlib import Path
import re
from typing import List
import uuid

import numpy as np

from . import get_logger
from .exceptions import OrganizeImpossibleError
from .pynwb_utils import get_neurodata_types_to_modalities_map, get_object_id
from .utils import ensure_datetime, flattened, yaml_load
from .utils import copy_file, ensure_datetime, flattened, move_file, yaml_load

lgr = get_logger()

@@ -172,6 +174,84 @@ def create_unique_filenames_from_metadata(metadata):
return metadata


def _create_external_file_names(metadata: List[dict]) -> List[dict]:
"""Updates the metadata dict with renamed external files.

Renames the external_file attribute in an ImageSeries according to the rule:
<nwbfile name>/<ImageSeries uuid>_external_file_<no><.ext>
For example, a file initially recorded as:
external_file = [name1.mp4]
is renamed to:
external_file = [dandiset-path-of-nwbfile/
dandi-renamed-nwbfile_name(folder without extension .nwb)/
f'{ImageSeries.object_id}_external_file_0.mp4']
The renamed paths are stored in a new field of each metadata dict:
meta['external_file_objects'][0]['external_files_renamed'] = <list of renamed paths>

Parameters
----------
metadata: list
list of metadata dictionaries created during the call to pynwb_utils._get_pynwb_metadata
Returns
-------
metadata: list
updated list of metadata dictionaries
"""
metadata = deepcopy(metadata)
for meta in metadata:
if "dandi_path" not in meta or "external_file_objects" not in meta:
continue
nwb_folder_name = op.splitext(op.basename(meta["dandi_path"]))[0]
for ext_file_dict in meta["external_file_objects"]:
renamed_path_list = []
uuid_str = ext_file_dict.get("id", str(uuid.uuid4()))
for no, ext_file in enumerate(ext_file_dict["external_files"]):
renamed = op.join(
nwb_folder_name, f"{uuid_str}_external_file_{no}{ext_file.suffix}"
)
renamed_path_list.append(renamed)
ext_file_dict["external_files_renamed"] = renamed_path_list
return metadata
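
The naming rule in the docstring can be tried in isolation. This is a minimal sketch, assuming only the rule as implemented in the loop above; `renamed_external_files` is a hypothetical helper, not a function from the PR, and it takes the ImageSeries object id as a plain string.

```python
import os.path as op
from pathlib import Path
from typing import List


def renamed_external_files(
    dandi_path: str, object_id: str, files: List[Path]
) -> List[str]:
    """Hypothetical helper applying the naming rule to one ImageSeries."""
    # Folder named after the organized NWB file, minus the .nwb extension.
    nwb_folder_name = op.splitext(op.basename(dandi_path))[0]
    return [
        op.join(nwb_folder_name, f"{object_id}_external_file_{no}{f.suffix}")
        for no, f in enumerate(files)
    ]
```

As in the PR's loop, the result is relative to the directory that holds the organized NWB file, and the original file name is dropped apart from its extension.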


def organize_external_files(
metadata: List[dict], dandiset_path: str, files_mode: str
) -> None:
"""Organizes the external_files into the new Dandiset folder structure.

Parameters
----------
metadata: list
list of metadata dictionaries created during the call to pynwb_utils._get_pynwb_metadata
dandiset_path: str
full path of the main dandiset folder.
files_mode: str
one of "symlink", "copy", "move", "hardlink"

"""
for e in metadata:
for ext_file_dict in e["external_file_objects"]:
for no, (name_old, name_new) in enumerate(
zip(
ext_file_dict["external_files"],
ext_file_dict["external_files_renamed"],
)
):
new_path = op.join(dandiset_path, op.dirname(e["dandi_path"]), name_new)
name_old_str = str(name_old)
os.makedirs(op.dirname(new_path), exist_ok=True)
if files_mode == "symlink":
os.symlink(name_old_str, new_path)
elif files_mode == "hardlink":
os.link(name_old_str, new_path)
elif files_mode == "copy":
copy_file(name_old_str, new_path)
elif files_mode == "move":
move_file(name_old_str, new_path)
else:
raise NotImplementedError(files_mode)
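
The `if`/`elif` chain above maps each mode to one file operation. The sketch below expresses the same idea as a lookup table. It is not the PR's implementation: `place_file` and `FILE_ACTIONS` are hypothetical names, and `shutil.copy2` / `shutil.move` are stand-ins for `dandi.utils.copy_file` / `move_file`, used only to keep the example free of dandi imports.

```python
import os
import shutil

# Stand-ins for the four modes; shutil replaces dandi's own helpers here.
FILE_ACTIONS = {
    "symlink": os.symlink,
    "hardlink": os.link,
    "copy": shutil.copy2,
    "move": shutil.move,
}


def place_file(src: str, dst: str, files_mode: str) -> None:
    """Hypothetical helper: put src at dst using the requested mode."""
    try:
        action = FILE_ACTIONS[files_mode]
    except KeyError:
        # Same failure as the final else-branch in organize_external_files.
        raise NotImplementedError(files_mode) from None
    # Create the per-NWB-file folder before placing the video in it.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    action(src, dst)
```

One detail that applies to the `symlink` mode in either form: `os.symlink` stores the source path exactly as given, so a relative `external_file` would be resolved relative to the new link's directory. Whether that matters depends on how the paths are recorded in the NWB files.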


def _assign_obj_id(metadata, non_unique):
msg = "%d out of %d paths are not unique" % (len(non_unique), len(metadata))

85 changes: 85 additions & 0 deletions dandi/pynwb_utils.py
@@ -17,6 +17,8 @@

from . import __version__, get_logger
from .consts import (
VIDEO_FILE_EXTENSIONS,
VIDEO_FILE_MODULES,
metadata_nwb_computed_fields,
metadata_nwb_file_fields,
metadata_nwb_subject_fields,
@@ -230,9 +232,92 @@ def _get_pynwb_metadata(path: Union[str, Path]) -> Dict[str, Any]:
key = f[len("number_of_") :]
out[f] = len(getattr(nwb, key, []) or [])

# get external_file data:
out["external_file_objects"] = _get_image_series(nwb)

return out


def _get_image_series(nwb: pynwb.NWBFile) -> List[dict]:
"""Retrieves all ImageSeries related metadata from an open nwb file.

Specifically it pulls out the ImageSeries uuid, name and all the
externally linked files named under the argument 'external_file'.

Parameters
----------
nwb: pynwb.NWBFile

Returns
-------
out: List[dict]
list of dicts: [{id: <ImageSeries uuid>, name: <ImageSeries name>,
external_files: [<ImageSeries.external_file entries>]}]
If no ImageSeries is found in the modules checked, an empty list is returned.
"""
out = []
for module_name in VIDEO_FILE_MODULES:
module_cont = getattr(nwb, module_name)
for name, ob in module_cont.items():
if isinstance(ob, pynwb.image.ImageSeries) and ob.external_file is not None:
out_dict = dict(id=ob.object_id, name=ob.name, external_files=[])
for ext_file in ob.external_file:
if Path(ext_file).suffix in VIDEO_FILE_EXTENSIONS:
out_dict["external_files"].append(Path(ext_file))
else:
lgr.warning(
"external file %s should have one of the extensions: %s",
ext_file,
", ".join(VIDEO_FILE_EXTENSIONS),
)
out.append(out_dict)
return out


def rename_nwb_external_files(metadata: List[dict], dandiset_path: str) -> None:
"""Renames the external_file attribute in an ImageSeries datatype in an open nwb file.

It pulls information about the ImageSeries objects from metadata:
metadata["external_file_objects"] populated during _get_pynwb_metadata() call.

Parameters
----------
metadata: List[dict]
list of dictionaries containing the metadata gathered from the nwbfile
dandiset_path: str
base path of dandiset
"""
for meta in metadata:
if not all(i in meta for i in ["path", "dandi_path", "external_file_objects"]):
lgr.warning(
"could not rename external files, update metadata "
'with "path", "dandi_path", "external_file_objects"'
)
return
dandiset_nwbfile_path = op.join(dandiset_path, meta["dandi_path"])
with NWBHDF5IO(dandiset_nwbfile_path, mode="r+", load_namespaces=True) as io:
nwb = io.read()
for ext_file_dict in meta["external_file_objects"]:
# retrieve nwb neurodata object of the given object id:
container_list = [
child
for child in nwb.children
if ext_file_dict["id"] == child.object_id
]
if len(container_list) == 0:
continue
else:
container = container_list[0]
# rename all external files:
for no, (name_old, name_new) in enumerate(
zip(
ext_file_dict["external_files"],
ext_file_dict["external_files_renamed"],
)
):
container.external_file[no] = str(name_new)
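
The rewrite step can be exercised without an NWB file. In the sketch below, `FakeImageSeries` is a stand-in for `pynwb.image.ImageSeries` that carries only the two attributes touched above, and `rewrite_external_files` is a hypothetical helper, not the PR's function. The lookup uses a dict keyed by `object_id` instead of the list comprehension, which is a substitution made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeImageSeries:
    """Stand-in for pynwb.image.ImageSeries (only the fields used here)."""

    object_id: str
    external_file: List[str] = field(default_factory=list)


def rewrite_external_files(
    children: List[FakeImageSeries], external_file_objects: List[dict]
) -> int:
    """Hypothetical helper: apply renamed paths, return how many changed."""
    by_id: Dict[str, FakeImageSeries] = {c.object_id: c for c in children}
    changed = 0
    for ext in external_file_objects:
        container = by_id.get(ext["id"])
        if container is None:
            continue  # same as the `len(container_list) == 0` branch
        for no, name_new in enumerate(ext["external_files_renamed"]):
            # Overwrite entry `no` in place, as the PR does on the open file.
            container.external_file[no] = str(name_new)
            changed += 1
    return changed
```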


@validate_cache.memoize_path
def validate(path: Union[str, Path], devel_debug: bool = False) -> List[str]:
"""Run validation on a file and return errors