Video files organize #841

Merged
merged 76 commits into from
Feb 8, 2022
Changes from 61 commits
7908c99
get images method
Saksham20 Nov 24, 2021
ddd48a6
rename, symlinks
Saksham20 Nov 24, 2021
0a8c2e0
docstrings, suggested changes
Saksham20 Nov 27, 2021
7dc800e
docstrings
Saksham20 Nov 27, 2021
2abc92e
create external-paths option, separate method for organizing files, r…
Saksham20 Nov 29, 2021
4129109
add typehints, update docstrings, check external paths as option
Saksham20 Nov 29, 2021
fb572b8
fix:open nwbfile in r+
Saksham20 Nov 30, 2021
d2eaf03
using os.path for handling os paths
Saksham20 Nov 30, 2021
8d14488
add external-files-mode, assert files-mode to be copy/move when rewrite
Saksham20 Nov 30, 2021
ab8c7a1
rewrite option change to "external-file"
Saksham20 Dec 3, 2021
eccc1d3
using logger for warnings
Saksham20 Dec 3, 2021
71fbb0b
assert external_files_mode as one of 'move/copy'
Saksham20 Dec 4, 2021
2dd53b8
testing local files pass
Saksham20 Dec 6, 2021
4ae10bb
error string fix
Saksham20 Dec 6, 2021
d0c754f
add tests
Saksham20 Dec 12, 2021
e23dca8
video test cli fixes
Saksham20 Dec 13, 2021
a3ca690
add validation for specific video files on upload
Saksham20 Dec 13, 2021
4c854e5
open nwbfiles and test
Saksham20 Dec 13, 2021
a6f4242
imageseries uuid copy bug fix
Saksham20 Dec 13, 2021
4145445
update setup to include datalad
Saksham20 Dec 13, 2021
6ec8e92
datalad as test dependency
Saksham20 Dec 13, 2021
80a0616
keep only double dash for video options
Saksham20 Dec 13, 2021
dd2b787
logger use %s formatting
Saksham20 Dec 13, 2021
5ae93ef
typehints, docstrings update
Saksham20 Dec 13, 2021
3f3f095
using tmp_path fixtures
Saksham20 Dec 13, 2021
3e14aa9
filter future warning from datalad
Saksham20 Dec 14, 2021
ac7c0fb
Merge branch 'master' into video_files_organize
Saksham20 Dec 15, 2021
69408b2
run pre-commit
Saksham20 Dec 15, 2021
fe3c33a
code review suggestions
Saksham20 Dec 16, 2021
d16b932
run pre-commit
Saksham20 Dec 16, 2021
e7c27b7
fixing test_metadata.test_get_metadata
Saksham20 Jan 15, 2022
432d9a6
Merge branch 'master' into video_files_organize
Saksham20 Jan 15, 2022
2b3c97b
update test.yml
Saksham20 Jan 16, 2022
1c3d55e
manual precommit
Saksham20 Jan 16, 2022
0832f3f
external file extensions const upper case
Saksham20 Jan 18, 2022
effc815
add dated wtf to debug
Saksham20 Jan 19, 2022
dff2e86
Apply suggestions from code review
Saksham20 Jan 19, 2022
5a49a8a
actively assert pattern in external_files
Saksham20 Jan 19, 2022
3b435b3
Update dandi/cli/cmd_organize.py
Saksham20 Jan 19, 2022
2a8c992
add git-annex before run all tests
Saksham20 Jan 19, 2022
5d60bfa
fix clirunner error
Saksham20 Jan 19, 2022
221b48b
external file name assertion fix
Saksham20 Jan 19, 2022
4cc47d6
remove pyfakefs
Saksham20 Jan 21, 2022
90978c7
pip installation of datalad-installer in ubuntu
Saksham20 Jan 21, 2022
37e951e
external files mode supplied only when video_mode is not None
Saksham20 Jan 21, 2022
d37a975
improve doc for external-files-mode cmd option
Saksham20 Jan 21, 2022
5a3a644
rename extentions supported to VIDEO_FILE_EXTENSIONS
Saksham20 Jan 24, 2022
c4a2bb4
separate method for video files validation
Saksham20 Jan 24, 2022
d6f6522
use one option to rewrite external files
Saksham20 Jan 24, 2022
602098b
update tests with new rewroting options
Saksham20 Jan 24, 2022
5a2e238
create video files using opencv
Saksham20 Jan 24, 2022
52e26cf
tests bug fix
Saksham20 Jan 25, 2022
fdde52e
platform independent file loc
Saksham20 Jan 25, 2022
0ed3fa3
validation tests
Saksham20 Jan 25, 2022
5b13d32
implementing intuitive naming of options, using symlinks/hardlinks fo…
Saksham20 Jan 27, 2022
8988cbb
merge master
Saksham20 Jan 31, 2022
4bb0f96
remove validation test
Saksham20 Jan 31, 2022
09a5ee8
Merge branch 'master' into video_files_organize
Saksham20 Jan 31, 2022
49375d9
optimize imports in command line commands
Saksham20 Feb 1, 2022
0398f95
Merge remote-tracking branch 'CN/video_files_organize' into video_fil…
Saksham20 Feb 1, 2022
9877870
bug fix
Saksham20 Feb 2, 2022
525534d
np all remove
Saksham20 Feb 3, 2022
5ec96a4
np all remove
Saksham20 Feb 3, 2022
abef9b3
Merge branch 'dandi:master' into video_files_organize
Saksham20 Feb 4, 2022
56a8e50
Merge branch 'masterCN' into video_files_organize
Saksham20 Feb 4, 2022
832ac63
Merge remote-tracking branch 'CN/video_files_organize' into video_fil…
Saksham20 Feb 4, 2022
4241edc
update click option: --update-external-file-paths and comments
Saksham20 Feb 5, 2022
ba0b74e
raise exception using click.UsageError
Saksham20 Feb 5, 2022
9d95d1d
explicit type hints for list of dicts
Saksham20 Feb 5, 2022
13b9927
code review suggestions
Saksham20 Feb 5, 2022
a43c81a
code review suggestions, remane fixture names to nouns
Saksham20 Feb 5, 2022
dedf55e
Merge remote-tracking branch 'CN/video_files_organize' into video_fil…
Saksham20 Feb 5, 2022
af91844
fixture fix
Saksham20 Feb 5, 2022
58c9de3
Apply suggestions from code review
Saksham20 Feb 8, 2022
4c894a8
PEP 257
Saksham20 Feb 8, 2022
d1a71e5
stringifying list
Saksham20 Feb 8, 2022
66 changes: 63 additions & 3 deletions dandi/cli/cmd_organize.py
@@ -34,6 +34,22 @@
    default="auto",
    show_default=True,
)
@click.option(
    "--modify-external-file-fields",
    is_flag=True,
    default=False,
    help="Using this option will rewrite the 'external_file' argument of the "
    "ImageSeries in the nwb file. The value written will correspond to the"
    " new location of the video files after being organized. This option will "
    "work only if --files-mode is copy/move since the original nwb file is "
    "modified",
)
@click.option(
    "--media-files-mode",
    type=click.Choice(["copy", "move", "symlink", "hardlink"]),
    default=None,
    help="Supply one of copy/move/symlink/hardlink as the value",
)
@click.argument("paths", nargs=-1, type=click.Path(exists=True))
@devel_debug_option()
@map_to_click_exceptions
@@ -43,6 +59,8 @@ def organize(
    invalid="fail",
    files_mode="auto",
    devel_debug=False,
    modify_external_file_fields=False,
    media_files_mode=None,
):
    """(Re)organize files according to the metadata.

@@ -80,11 +98,13 @@ def organize(
    from ..dandiset import Dandiset
    from ..metadata import get_metadata
    from ..organize import (
        _create_external_file_names,
        create_unique_filenames_from_metadata,
        detect_link_type,
        filter_invalid_metadata_rows,
        organize_external_files,
    )
    from ..pynwb_utils import ignore_benign_pynwb_warnings
    from ..pynwb_utils import ignore_benign_pynwb_warnings, rename_nwb_external_files
    from ..utils import Parallel, copy_file, delayed, find_files, load_jsonl, move_file

    in_place = False  # If we deduce that we are organizing in-place
@@ -104,6 +124,11 @@ def act(func, *args, **kwargs):
lgr.debug("%s %s %s", func.__name__, args, kwargs)
return func(*args, **kwargs)

    if modify_external_file_fields and files_mode not in ["copy", "move"]:
        raise ValueError(
            "files_mode needs to be one of 'copy/move' for the rewrite option to work"
        )

if dandiset_path is None:
dandiset = Dandiset.find(os.curdir)
if not dandiset:
@@ -140,7 +165,7 @@ def act(func, *args, **kwargs):
"Only 'dry' or 'move' mode could be used to operate in-place "
"within a dandiset (no paths were provided)"
)
lgr.info(f"We will organize {dandiset_path} in-place")
lgr.info("We will organize %s in-place", dandiset_path)
in_place = True
paths = dandiset_path
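
The `lgr.info` change in this hunk swaps an f-string for `%s`-style arguments, which lets the logging module defer interpolation until a record is actually rendered. A minimal sketch with the standard `logging` module follows; the logger name `demo`, the `ListHandler` class and the path value are hypothetical and not part of the PR.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps emitted records in a list."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


lgr = logging.getLogger("demo")  # hypothetical logger name
lgr.setLevel(logging.INFO)
lgr.propagate = False
handler = ListHandler()
lgr.addHandler(handler)

dandiset_path = "/tmp/dandiset"  # placeholder value

# The template and its argument are stored separately on the record;
# interpolation happens only when the message is rendered.
lgr.info("We will organize %s in-place", dandiset_path)
# A DEBUG call is rejected by the level check before any formatting occurs.
lgr.debug("Removed empty directory %s", dandiset_path)

record = handler.records[0]
print(record.msg)
print(record.args)
print(record.getMessage())
```

With an f-string the message would be built before the level check, even for calls that are never emitted.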

@@ -214,6 +239,36 @@ def _get_metadata(path):

metadata = create_unique_filenames_from_metadata(metadata)

    # update metadata with external_file information:
    external_files_missing_in_nwbfiles = [
        len(m["external_file_objects"]) == 0 for m in metadata
    ]

    if all(external_files_missing_in_nwbfiles) and modify_external_file_fields:
        lgr.warning(
            "modify_external_file_fields specified but no external_files found "
            "linked to any nwbfile found in %s",
            paths,
        )
    elif (
        not all(external_files_missing_in_nwbfiles) and not modify_external_file_fields
    ):
        raise ValueError(
            "modify_external_file_fields option not specified but found "
            "external video files linked to the nwbfiles "
            f"""{[metadata[no]['path']
            for no, a in enumerate(external_files_missing_in_nwbfiles)
            if not a]}"""
        )

    if modify_external_file_fields and media_files_mode is None:
        media_files_mode = "symlink"
        lgr.warning(
            "media_files_mode not specified, setting to recommended mode: 'symlink' "
        )

    metadata = _create_external_file_names(metadata)

# Verify first that the target paths do not exist yet, and fail if they do
# Note: in "simulate" mode we do early check as well, so this would be
# duplicate but shouldn't hurt
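
The option checks added to `organize()` in the hunks above can be read as one small decision function. The sketch below restates them under that assumption; `check_external_file_options` is a hypothetical helper, not a function in dandi, and the warning calls from the diff are left out for brevity.

```python
from typing import List, Optional


def check_external_file_options(
    metadata: List[dict],
    modify_external_file_fields: bool,
    files_mode: str,
    media_files_mode: Optional[str],
) -> Optional[str]:
    """Hypothetical helper mirroring the checks in the diff.

    Returns the media files mode that would be used.
    """
    if modify_external_file_fields and files_mode not in ("copy", "move"):
        raise ValueError(
            "files_mode needs to be one of 'copy/move' for the rewrite option to work"
        )
    # Paths of NWB files that reference at least one external video file.
    with_external = [m["path"] for m in metadata if m["external_file_objects"]]
    if with_external and not modify_external_file_fields:
        raise ValueError(
            "modify_external_file_fields option not specified but found "
            f"external video files linked to the nwbfiles {with_external}"
        )
    if modify_external_file_fields and media_files_mode is None:
        media_files_mode = "symlink"  # default chosen in the diff
    return media_files_mode


metadata = [
    {"path": "a.nwb", "external_file_objects": []},
    {"path": "b.nwb", "external_file_objects": [{"id": "x", "external_files": ["v.mp4"]}]},
]
mode = check_external_file_options(metadata, True, "copy", None)
print(mode)
```

Building the list of offending paths directly, rather than indexing back through a list of booleans, keeps the error message logic in one comprehension.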
@@ -313,10 +368,15 @@ def _get_metadata(path):
if op.exists(d):
try:
os.rmdir(d)
lgr.info(f"Removed empty directory {d}")
lgr.info("Removed empty directory %s", d)
except Exception as exc:
lgr.debug("Failed to remove directory %s: %s", d, exc)

    # create video file names and rewrite the nwb file's external files:
    if modify_external_file_fields:
        rename_nwb_external_files(metadata, dandiset_path)
        organize_external_files(metadata, dandiset_path, media_files_mode)

def msg_(msg, n, cond=None):
if hasattr(n, "__len__"):
n = len(n)
3 changes: 3 additions & 0 deletions dandi/consts.py
@@ -148,6 +148,9 @@ class DandiInstance(NamedTuple):
#: of retries)
RETRY_STATUSES = (500, 502, 503, 504)

VIDEO_FILE_EXTENSIONS = [".mp4", ".avi", ".wmv", ".mov", ".flv"]
VIDEO_FILE_MODULES = ["processing", "acquisition"]

#: Maximum allowed depth of a Zarr directory tree
MAX_ZARR_DEPTH = 5

81 changes: 80 additions & 1 deletion dandi/organize.py
@@ -9,13 +9,15 @@
import os.path as op
from pathlib import Path
import re
from typing import List
import uuid

import numpy as np

from . import get_logger
from .exceptions import OrganizeImpossibleError
from .pynwb_utils import get_neurodata_types_to_modalities_map, get_object_id
from .utils import ensure_datetime, flattened, yaml_load
from .utils import copy_file, ensure_datetime, flattened, move_file, yaml_load

lgr = get_logger()

@@ -172,6 +174,83 @@ def create_unique_filenames_from_metadata(metadata):
    return metadata


def _create_external_file_names(metadata: list) -> List[dict]:
    """
    Renames the external_file attribute in an ImageSeries according to the rule:
    Example, the initial name of file:
        external_file = [name1.mp4]
    rename to:
        external_file = [dandiset-path-of-nwbfile/
            dandi-renamed-nwbfile_name(folder without extension .nwb)/
            f'{ImageSeries.object_id}_external_file_0.mp4']
    This is stored in a new field in the
    metadata['external_file_objects'][0]['external_files_renamed']
    Parameters
    ----------
    metadata: list
        list of metadata dictionaries created during the call to pynwb_utils._get_pynwb_metadata
    Returns
    -------
    metadata: list
        updated list of metadata dictionaries
    """
    metadata = deepcopy(metadata)
    for meta in metadata:
        if "dandi_path" not in meta or "external_file_objects" not in meta:
            continue
        nwb_folder_name = op.splitext(op.basename(meta["dandi_path"]))[0]
        for ext_file_dict in meta["external_file_objects"]:
            renamed_path_list = []
            uuid_str = ext_file_dict.get("id", str(uuid.uuid4()))
            for no, ext_file in enumerate(ext_file_dict["external_files"]):
                renamed = op.join(
                    nwb_folder_name, f"{uuid_str}_external_file_{no}{ext_file.suffix}"
                )
                renamed_path_list.append(str(renamed))
            ext_file_dict["external_files_renamed"] = renamed_path_list
    return metadata
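
The naming rule in `_create_external_file_names` can be tried in isolation. The following is a sketch only: `renamed_external_files` is a hypothetical helper, and the path and object id are placeholders rather than values from a real dandiset.

```python
import os.path as op
from pathlib import Path


def renamed_external_files(dandi_path, object_id, external_files):
    """Hypothetical helper applying the rule from the docstring above."""
    # Folder named after the organized NWB file, without the .nwb extension.
    nwb_folder_name = op.splitext(op.basename(dandi_path))[0]
    return [
        op.join(nwb_folder_name, f"{object_id}_external_file_{no}{Path(f).suffix}")
        for no, f in enumerate(external_files)
    ]


names = renamed_external_files(
    "sub-01/sub-01_ses-1_image.nwb",  # placeholder dandi_path
    "1234-abcd",  # placeholder ImageSeries object id
    ["videos/cam1.mp4", "videos/cam2.avi"],
)
for name in names:
    print(name)
```

The resulting names are relative: `organize_external_files` later joins them onto the directory that holds the organized NWB file.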


def organize_external_files(
    metadata: list, dandiset_path: str, files_mode: str
) -> None:
    """
    Organizes the external_files into the new Dandiset folder structure.

    Parameters
    ----------
    metadata: list
        list of metadata dictionaries created during the call to pynwb_utils._get_pynwb_metadata
    dandiset_path: str
        full path of the main dandiset folder.
    files_mode: str
        one of "symlink", "copy", "move", "hardlink"

    """
    for e in metadata:
        for ext_file_dict in e["external_file_objects"]:
            for no, (name_old, name_new) in enumerate(
                zip(
                    ext_file_dict["external_files"],
                    ext_file_dict["external_files_renamed"],
                )
            ):
                new_path = op.join(dandiset_path, op.dirname(e["dandi_path"]), name_new)
                name_old_str = str(name_old)
                if not op.exists(op.dirname(new_path)):
                    os.makedirs(op.dirname(new_path))
                if files_mode == "symlink":
                    os.symlink(name_old_str, new_path)
                elif files_mode == "hardlink":
                    os.link(name_old_str, new_path)
                elif files_mode == "copy":
                    copy_file(name_old_str, new_path)
                elif files_mode == "move":
                    move_file(name_old_str, new_path)
                else:
                    raise NotImplementedError(files_mode)
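
The mode dispatch in `organize_external_files` can be exercised without a dandiset. This sketch assumes `shutil.copy2` and `shutil.move` as stand-ins for dandi's `copy_file` and `move_file`, and uses `exist_ok=True` instead of the explicit existence check; `place_file` and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def place_file(src: str, dst: str, files_mode: str) -> None:
    """Hypothetical stand-in for one iteration of the loop above."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if files_mode == "symlink":
        os.symlink(src, dst)
    elif files_mode == "hardlink":
        os.link(src, dst)
    elif files_mode == "copy":
        shutil.copy2(src, dst)  # stand-in for dandi's copy_file
    elif files_mode == "move":
        shutil.move(src, dst)  # stand-in for dandi's move_file
    else:
        raise NotImplementedError(files_mode)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "cam1.mp4")
    with open(src, "wb") as f:
        f.write(b"not a real video")

    copied = os.path.join(tmp, "dandiset", "sub-01", "a", "x_external_file_0.mp4")
    place_file(src, copied, "copy")
    copy_ok = os.path.exists(src) and os.path.exists(copied)

    moved = os.path.join(tmp, "dandiset", "sub-01", "b", "x_external_file_0.mp4")
    place_file(src, moved, "move")
    move_ok = not os.path.exists(src) and os.path.exists(moved)

print(copy_ok, move_ok)
```

One point to keep in mind with the symlink mode: `os.symlink` stores the source string as given, so a relative source path is resolved relative to the directory containing the link, not the current working directory.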


def _assign_obj_id(metadata, non_unique):
msg = "%d out of %d paths are not unique" % (len(non_unique), len(metadata))

88 changes: 88 additions & 0 deletions dandi/pynwb_utils.py
@@ -17,6 +17,8 @@

from . import __version__, get_logger
from .consts import (
    VIDEO_FILE_EXTENSIONS,
    VIDEO_FILE_MODULES,
    metadata_nwb_computed_fields,
    metadata_nwb_file_fields,
    metadata_nwb_subject_fields,
@@ -230,9 +232,95 @@ def _get_pynwb_metadata(path: Union[str, Path]) -> Dict[str, Any]:
key = f[len("number_of_") :]
out[f] = len(getattr(nwb, key, []) or [])

# get external_file data:
out["external_file_objects"] = _get_image_series(nwb)

return out


def _get_image_series(nwb: pynwb.NWBFile) -> List[dict]:
    """
    This method supports _get_pynwb_metadata() in retrieving all ImageSeries
    related metadata from an open nwb file.
    Specifically it pulls out the ImageSeries uuid, name and all the
    externally linked files
    named under the argument 'external_file'.
    Parameters
    ----------
    nwb: pynwb.NWBFile
    Returns
    -------
    out: List[dict]
        list of dicts : [{id: <ImageSeries uuid>, name: <ImageSeries name>,
        external_files=[ImageSeries.external_file]}]
        if no ImageSeries found in the given modules to check, then it returns an empty list.
    """
    out = []
    for module_name in VIDEO_FILE_MODULES:
        module_cont = getattr(nwb, module_name)
        for name, ob in module_cont.items():
            if isinstance(ob, pynwb.image.ImageSeries) and ob.external_file is not None:
                out_dict = dict(id=ob.object_id, name=ob.name, external_files=[])
                for ext_file in ob.external_file:
                    if Path(ext_file).suffix in VIDEO_FILE_EXTENSIONS:
                        out_dict["external_files"].append(Path(ext_file))
                    else:
                        lgr.warning(
                            "external file %s should be one of: %s",
                            ext_file,
                            VIDEO_FILE_EXTENSIONS,
                        )
                out.append(out_dict)
    return out
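
The filtering step in `_get_image_series` can be illustrated without pynwb. In this sketch `FakeImageSeries` and `collect_external_files` are hypothetical stand-ins, the warning call is omitted, and the extension list is copied from `dandi/consts.py` in this PR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

VIDEO_FILE_EXTENSIONS = [".mp4", ".avi", ".wmv", ".mov", ".flv"]


@dataclass
class FakeImageSeries:
    """Hypothetical stand-in for pynwb.image.ImageSeries."""

    object_id: str
    name: str
    external_file: Optional[List[str]] = None


def collect_external_files(series: List[FakeImageSeries]) -> List[dict]:
    out = []
    for ob in series:
        if ob.external_file is None:
            continue  # series stores its data inside the NWB file
        out_dict = dict(id=ob.object_id, name=ob.name, external_files=[])
        for ext_file in ob.external_file:
            # Plain membership test, so the comparison is case-sensitive.
            if Path(ext_file).suffix in VIDEO_FILE_EXTENSIONS:
                out_dict["external_files"].append(Path(ext_file))
        out.append(out_dict)
    return out


result = collect_external_files(
    [
        FakeImageSeries("id-1", "cam1", ["a.mp4", "b.txt", "c.MP4"]),
        FakeImageSeries("id-2", "raw"),
    ]
)
print(result)
```

Because the suffix check is a plain list membership test, an upper-case suffix such as `.MP4` does not match any entry and the file is left out of `external_files`.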


def rename_nwb_external_files(metadata: list, dandiset_path: str) -> None:
    """
    This method renames the external_file attribute in an ImageSeries datatype in an open nwb file.
    It pulls information about the ImageSeries objects
    from metadata: metadata["external_file_objects"]
    populated during _get_pynwb_metadata() call.

    Parameters
    ----------
    metadata: List[dict]
        list of dictionaries containing the metadata gathered from the nwbfile
    dandiset_path: str
        base path of dandiset
    """
    for meta in metadata:
        if not np.all(
            [i in meta for i in ["path", "dandi_path", "external_file_objects"]]
        ):
            lgr.warning(
                "could not rename external files, update metadata "
                'with "path", "dandi_path", "external_file_objects"'
            )
            return
        dandiset_nwbfile_path = op.join(dandiset_path, meta["dandi_path"])
        with NWBHDF5IO(dandiset_nwbfile_path, mode="r+", load_namespaces=True) as io:
            nwb = io.read()
            for ext_file_dict in meta["external_file_objects"]:
                # retrieve nwb neurodata object of the given object id:
                container_list = [
                    child
                    for child in nwb.children
                    if ext_file_dict["id"] == child.object_id
                ]
                if len(container_list) == 0:
                    continue
                else:
                    container = container_list[0]
                # rename all external files:
                for no, (name_old, name_new) in enumerate(
                    zip(
                        ext_file_dict["external_files"],
                        ext_file_dict["external_files_renamed"],
                    )
                ):
                    container.external_file[no] = str(name_new)
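
The lookup-and-rewrite step in `rename_nwb_external_files` can be sketched without opening an NWB file. `FakeContainer` and `rewrite_external_files` below are hypothetical stand-ins, and a dictionary keyed by object id replaces the list comprehension from the diff; this is an illustration of the idea, not the PR's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeContainer:
    """Hypothetical stand-in for an ImageSeries read from an NWB file."""

    object_id: str
    external_file: List[str] = field(default_factory=list)


def rewrite_external_files(children, external_file_objects):
    """Hypothetical helper: apply renamed paths to matching containers in place."""
    by_id = {child.object_id: child for child in children}
    for ext_file_dict in external_file_objects:
        container = by_id.get(ext_file_dict["id"])
        if container is None:
            continue  # no object with this id; skipped, as in the diff
        for no, name_new in enumerate(ext_file_dict["external_files_renamed"]):
            container.external_file[no] = str(name_new)


children = [FakeContainer("id-1", ["videos/cam1.mp4"])]
rewrite_external_files(
    children,
    [
        {"id": "id-1", "external_files_renamed": ["nwbfile/id-1_external_file_0.mp4"]},
        {"id": "missing", "external_files_renamed": ["unused.mp4"]},
    ],
)
print(children[0].external_file)
```

In both the sketch and the diff the index `no` is a position in the collected list of external files, so it lines up with the stored `external_file` entries only when no entry was dropped by the extension filter.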


@validate_cache.memoize_path
def validate(path: Union[str, Path], devel_debug: bool = False) -> List[str]:
"""Run validation on a file and return errors