Merge pull request #23 from CCBR/issue_18
Issue 18
kopardev authored Oct 16, 2024
2 parents e26ffc6 + 7b3cbbc commit 24d09ce
Showing 10 changed files with 190 additions and 26 deletions.
34 changes: 26 additions & 8 deletions README.md
Expand Up @@ -2,8 +2,6 @@

**Park** an **arc**hived project tool**kit**!

[![test](https://github.com/CCBR/parkit/actions/workflows/test.yml/badge.svg)](https://github.com/CCBR/parkit/actions/workflows/test.yml)
[![docs](https://github.com/CCBR/parkit/actions/workflows/docs.yml/badge.svg)](https://github.com/CCBR/parkit/actions/workflows/docs.yml)

> DISCLAIMERS:
>
Expand All @@ -17,6 +15,8 @@ When a project comes to a completion, most analysts have folders (or `.tar` file

The analyst can use `parkit` to park these folders directly onto HPCDME's **CCBR_Archive** object store vault. A typical project, say `ccbrXYZ`, can be parked at `/CCBR_Archive/GRIDFTP/Project_CCBR-XYZ` with collections "Analysis" and "Rawdata".
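
As a rough illustration of this layout, the sketch below builds the vault paths for a project. The function name `vault_paths` is hypothetical and not part of `parkit`; only the path pattern and the two collection names come from the text above.

```python
from pathlib import PurePosixPath


def vault_paths(project_id: str) -> dict:
    """Return the project path and its two collection paths on the vault."""
    # Assumed pattern: /CCBR_Archive/GRIDFTP/Project_CCBR-<id>/<collection>
    project = PurePosixPath("/CCBR_Archive/GRIDFTP") / f"Project_CCBR-{project_id}"
    return {
        "project": str(project),
        "Analysis": str(project / "Analysis"),
        "Rawdata": str(project / "Rawdata"),
    }


for name, path in vault_paths("XYZ").items():
    print(name, path)
```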

!!! note `projark` command is preferred for CCBR **proj**ect **ark**iving

### Prerequisites:

- On helix or biowulf you can get access to `parkit` by loading the appropriate conda env
Expand All @@ -30,7 +30,7 @@ The analyst can use `parkit` to park these folders directly on to HPCDME's **CCB

- The **HPC_DM_UTILS** environment variable must be set before calling `parkit`. Its value must also be passed via `--hpcdmutilspath` to the `parkit_folder2hpcdme` and `parkit_tarball2hpcdme` end-to-end workflows.

> If not on helix or biowulf then you will have to **clone** the repo and **pip install** it. Then setup [HPC_DME_APIs](https://github.com/CBIIT/HPC_DME_APIs) appropriately.
!!! warning If you are not on helix or biowulf, you will have to **clone** the repo and **pip install** it, then set up [HPC_DME_APIs](https://github.com/CBIIT/HPC_DME_APIs) appropriately.

### Usage:

Expand Down Expand Up @@ -104,7 +104,8 @@ options:
We also have end-to-end slurm-supported folder-to-hpcdme and tarball-to-hpcdme workflows:

- `parkit_folder2hpcdme`
- `parkit_tarball2hpcdme`
- `parkit_tarball2hpcdme` and
- `projark` [ recommended for archiving CCBR projects to the GRIDFTP folder under CCBR_Archive ]

If run with `--executor slurm`, this interfaces with the job scheduler on Biowulf and submits the individual steps of these E2E workflows as interdependent jobs.
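
One way to picture "interdependent jobs" is a chain in which each step waits for the earlier ones to finish successfully. The sketch below only builds `sbatch` command lines and submits nothing. It assumes Slurm's `--dependency=afterok:<jobid>` and `--parsable` options; the helper name, script file names, and job IDs are made up for illustration, and the `run_sbatch_cmd` helper used by the real workflows may work differently.

```python
from typing import List, Optional


def sbatch_argv(script: str, job_name: str,
                after: Optional[List[str]] = None) -> List[str]:
    """Build (but do not run) an sbatch command line for one workflow step."""
    argv = ["sbatch", "--parsable", f"--job-name={job_name}"]
    if after:
        # afterok: start only once all listed jobs have finished with exit code 0
        argv.append("--dependency=afterok:" + ":".join(after))
    argv.append(script)
    return argv


# Hypothetical job IDs, in the form sbatch --parsable would print them
print(sbatch_argv("createtar.sh", "createtar"))
print(sbatch_argv("deposittar.sh", "deposittar", after=["1001", "1002"]))
```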

Expand Down Expand Up @@ -135,14 +136,12 @@ options:
--version print version
```
> NOTE:
>
> By default, `parkit_folder2hpcdme` parks files under `/CCBR_Archive/GRIDFTP/Project_CCBR-12345/Analysis`. If the `--rawdata` flag is provided on the command line, the tarball is parked at `/CCBR_Archive/GRIDFTP/Project_CCBR-12345/Rawdata` instead.
### `parkit_tarball2hpcdme`
```bash
parkit_tarball2hpcdme --help
%> parkit_tarball2hpcdme --help
usage: parkit_tarball2hpcdme [-h] [--restartfrom RESTARTFROM] [--executor EXECUTOR] [--tarball TARBALL] [--dest DEST]
[--projectdesc PROJECTDESC] [--projecttitle PROJECTTITLE] [--cleanup] --hpcdmutilspath HPCDMUTILSPATH
[--version]
Expand All @@ -165,3 +164,22 @@ options:
what should be the value of env var HPC_DM_UTILS
--version print version
```
### `projark`
```bash
%> projark --help
usage: projark [-h] --folder FOLDER --projectnumber PROJECTNUMBER
[--executor EXECUTOR] [--rawdata] [--cleanup]

Wrapper for folder2hpcdme for quick CCBR project archiving!

options:
-h, --help show this help message and exit
--folder FOLDER Input folder path to archive
--projectnumber PROJECTNUMBER
CCBR project number.. destination will be
/CCBR_Archive/GRIDFTP/Project_CCBR-<projectnumber>
--executor EXECUTOR slurm or local
--rawdata If tarball is rawdata and needs to go under folder
Rawdata
--cleanup post transfer step to delete local files
```
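
For scripted use, the `projark` options listed above can be assembled programmatically. The following is a minimal sketch and not part of `parkit`: the function name `projark_argv` and the example folder are hypothetical, while the flags and the `slurm` default come from the help text and the script in this commit.

```python
import shlex
from typing import List


def projark_argv(folder: str, projectnumber: str, executor: str = "slurm",
                 rawdata: bool = False, cleanup: bool = False) -> List[str]:
    """Assemble a projark command line from the documented options."""
    argv = ["projark", "--folder", folder,
            "--projectnumber", projectnumber,
            "--executor", executor]
    if rawdata:
        argv.append("--rawdata")   # park under Rawdata instead of Analysis
    if cleanup:
        argv.append("--cleanup")   # post-transfer deletion of local files
    return argv


# Print the command for review; pass the list to subprocess.run to execute it
print(shlex.join(projark_argv("/data/ccbr12345", "12345", rawdata=True)))
```

Note that `projark` also expects the `SOURCE_CONDA_CMD` and `HPC_DM_UTILS` environment variables to be set before it runs.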
1 change: 1 addition & 0 deletions pyproject.toml
Expand Up @@ -40,6 +40,7 @@ test = [
parkit = "parkit.__main__:main"
parkit_folder2hpcdme = "parkit.parkit_folder2hpcdme:main"
parkit_tarball2hpcdme = "parkit.parkit_tarball2hpcdme:main"
projark = "parkit.projark:main"
update_collection_metadata = "parkit.update_collection_metadata:main"

[tool.setuptools.dynamic]
12 changes: 9 additions & 3 deletions src/parkit/__main__.py
Expand Up @@ -48,6 +48,12 @@ def main():
help="destination path in vault (Analysis collection goes under here)",
required=True,
)
parser_createmetadata.add_argument(
"--collectiontype",
type=str,
help="type of collection ... Analysis[default] or Rawdata",
default="Analysis" # or Rawdata
)

# Create a subcommand for "createemptycollection"
parser_createemptycollection = subparsers.add_parser(
Expand Down Expand Up @@ -82,7 +88,7 @@ def main():
parser_deposittar.add_argument(
"--collectiontype",
type=str,
help="path to tarball",
help="type of collection ... Analysis[default] or Rawdata",
default="Analysis" # or Rawdata
)

Expand All @@ -102,9 +108,9 @@ def main():
args.dest, projectdesc=args.projectdesc, projecttitle=args.projecttitle
)
elif args.command == "createmetadata":
tar_json_path = createmetadata(args.tarball, args.dest)
tar_json_path = createmetadata(args.tarball, args.dest, args.collectiontype)
files_created.append(tar_json_path)
filelist_json_path = createmetadata(args.tarball + ".filelist", args.dest)
filelist_json_path = createmetadata(args.tarball + ".filelist", args.dest, args.collectiontype)
files_created.append(filelist_json_path)
elif args.command == "deposittar":
deposittocollection(args.tarball, args.dest, args.collectiontype)
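
The new `--collectiontype` option is declared as free-form text (`type=str`), so a misspelling such as `rawdata` is not rejected at parse time. One possible tightening, shown here as a standalone sketch rather than the code in this commit, is to let `argparse` validate the value with `choices`. The parser below is a stripped-down stand-in for the real `createmetadata` subparser, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in for the createmetadata subparser, with a validated collection type."""
    parser = argparse.ArgumentParser(prog="createmetadata-sketch")
    parser.add_argument("--tarball", type=str, required=True)
    parser.add_argument("--dest", type=str, required=True)
    parser.add_argument(
        "--collectiontype",
        type=str,
        choices=["Analysis", "Rawdata"],  # anything else is a usage error
        default="Analysis",
        help="type of collection ... Analysis[default] or Rawdata",
    )
    return parser


args = build_parser().parse_args(
    ["--tarball", "ccbr12345.tar", "--dest", "/CCBR_Archive/GRIDFTP/Project_CCBR-12345"]
)
print(args.collectiontype)
```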
19 changes: 19 additions & 0 deletions src/parkit/projark.py
@@ -0,0 +1,19 @@
import sys, os
import subprocess
from pathlib import Path


def main():
    # Path to your bash script
    p = Path(__file__).absolute()
    pp = str(p.parent)

    # script_path = 'parkit/scripts/parkit_folder2hpcdme'
    script_path = os.path.join(pp, "scripts", "projark")  # projark ... archive a ccbr project!!

    # Pass all arguments to the bash script and exit with its status code
    sys.exit(subprocess.run([script_path] + sys.argv[1:]).returncode)


if __name__ == "__main__":
    main()
32 changes: 27 additions & 5 deletions src/parkit/scripts/parkit_folder2hpcdme
Expand Up @@ -92,6 +92,7 @@ parser.add_argument('--projectdesc',required=False, help='project description')
parser.add_argument('--projecttitle',required=False, help='project title')
parser.add_argument('--rawdata',action='store_true', help='If tarball is rawdata and needs to go under folder Rawdata')
parser.add_argument('--cleanup',action='store_true', help='post transfer step to delete local files')
parser.add_argument('--makereadme',action='store_true', help='make readme file with destination location on vault')
parser.add_argument('--hpcdmutilspath', required=True, help='what should be the value of env var HPC_DM_UTILS')
parser.add_argument('--version',action='store_true', help='print version')
EOF
Expand All @@ -104,12 +105,15 @@ fi
# folder is required
required_argument "$FOLDER" "--folder"
TARBALL="${FOLDER}.tar"
README="${FOLDER}.README"

# cleanup option
if [[ "${CLEANUP}" == "yes" ]];then
for f in "${TARBALL}"* *response-header.tmp *response-message.json.tmp
do
rm -iv $f
if [[ "$f" != "${TARBALL}.filelist" ]];then
rm -iv $f
fi
done
exit 0
fi
Expand Down Expand Up @@ -176,7 +180,10 @@ echo "################ Running createtar #############################"
cmd="${PARKIT} createtar --folder \"${FOLDER}\""
echo $cmd
if [[ "$EXECUTOR" == "local" ]];then
`$cmd`
eval "$cmd"
if [[ "$?" != "0" ]];then
exit 1
fi
else
run_sbatch_cmd "$cmd" "$dependency" "createtar" "$HPCDMUTILSPATH"
jid="$JID"
Expand All @@ -196,7 +203,10 @@ echo "############ Running createemptycollection ######################"
cmd="${PARKIT} createemptycollection --dest \"${DEST}\" --projectdesc \"${PROJECTDESC}\" --projecttitle \"${PROJECTTITLE}\""
echo $cmd
if [[ "$EXECUTOR" == "local" ]];then
`$cmd`
eval "$cmd"
if [[ "$?" != "0" ]];then
exit 1
fi
else
dependency=""
if [[ "$jobids" != "" ]];then
Expand All @@ -218,9 +228,15 @@ fi # RUN_createemptycollection ends
if [[ "$RUN_createmetadata" == "1" ]];then
echo "########### Running createmetadata ##############################"
cmd="${PARKIT} createmetadata --tarball \"${TARBALL}\" --dest \"${DEST}\""
if [[ "${RAWDATA}" == "yes" ]];then
cmd="${cmd} --collectiontype \"Rawdata\""
fi
echo $cmd
if [[ "$EXECUTOR" == "local" ]];then
`$cmd`
eval "$cmd"
if [[ "$?" != "0" ]];then
exit 1
fi
else
dependency=""
if [[ "$jobids" != "" ]];then
Expand All @@ -247,7 +263,10 @@ echo "############# Running deposittar ###############################"
fi
echo $cmd
if [[ "$EXECUTOR" == "local" ]];then
`$cmd`
eval "$cmd"
if [[ "$?" != "0" ]];then
exit 1
fi
else
dependency=""
if [[ "$jobids" != "" ]];then
Expand All @@ -262,5 +281,8 @@ echo "############# Running deposittar ###############################"
jobids="$jobids:$jid"
fi
fi
if [[ "${MAKEREADME}" == "yes" ]];then
echo "${TARBALL} parked at ${DEST} on HPCDME!" >> "${README}"
fi
echo "################################################################"
fi # RUN_deposittar ends
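
The revised cleanup loop in this script keeps `${TARBALL}.filelist` and removes the other matches. The same selection logic is sketched below in Python for illustration. The function name and the example file names are hypothetical; the real script deletes interactively with `rm -iv`.

```python
from typing import Iterable, List


def files_to_remove(tarball: str, names: Iterable[str]) -> List[str]:
    """Select cleanup candidates, sparing the tarball's .filelist."""
    keep = f"{tarball}.filelist"
    selected = []
    for name in names:
        # Mirrors the shell globs: "${TARBALL}"* *response-header.tmp *response-message.json.tmp
        matches = (
            name.startswith(tarball)
            or name.endswith("response-header.tmp")
            or name.endswith("response-message.json.tmp")
        )
        if matches and name != keep:
            selected.append(name)
    return selected


listing = ["proj.tar", "proj.tar.filelist", "proj.tar.md5",
           "a.response-header.tmp", "notes.txt"]
print(files_to_remove("proj.tar", listing))
```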
37 changes: 31 additions & 6 deletions src/parkit/scripts/parkit_tarball2hpcdme
Expand Up @@ -85,6 +85,7 @@ parser.add_argument('--dest',required=False, help='vault collection path (Analys
parser.add_argument('--projectdesc',required=False, help='project description')
parser.add_argument('--projecttitle',required=False, help='project title')
parser.add_argument('--cleanup',action='store_true', help='post transfer step to delete local files')
parser.add_argument('--makereadme',action='store_true', help='make readme file with destination location on vault')
parser.add_argument('--hpcdmutilspath', required=True, help='what should be the value of env var HPC_DM_UTILS')
parser.add_argument('--version',action='store_true', help='print version')
EOF
Expand All @@ -100,11 +101,14 @@ if [ ! -f "${TARBALL}" ];then
echo "${TARBALL} does not exist!"
exit 1
fi
README="${TARBALL}.README"
# cleanup option
if [[ "${CLEANUP}" == "yes" ]];then
for f in "${TARBALL}"* *response-header.tmp *response-message.json.tmp
do
rm -iv $f
if [[ "${TARBALL}.filelist" != "${f}" ]];then
rm -iv $f
fi
done
exit 0
fi
Expand Down Expand Up @@ -169,7 +173,10 @@ echo "################ Running tarprep #############################"
cmd="${PARKIT} tarprep --tarball \"${TARBALL}\""
echo $cmd
if [[ "$EXECUTOR" == "local" ]];then
`$cmd`
eval "$cmd"
if [[ "$?" != "0" ]];then
exit 1
fi
else
run_sbatch_cmd "$cmd" "$dependency" "tarprep" "$HPCDMUTILSPATH"
jid="$JID"
Expand All @@ -189,7 +196,10 @@ echo "############ Running createemptycollection ######################"
cmd="${PARKIT} createemptycollection --dest \"${DEST}\" --projectdesc \"${PROJECTDESC}\" --projecttitle \"${PROJECTTITLE}\""
echo $cmd
if [[ "$EXECUTOR" == "local" ]];then
`$cmd`
eval "$cmd"
if [[ "$?" != "0" ]];then
exit 1
fi
else
dependency=""
if [[ "$jobids" != "" ]];then
Expand All @@ -211,9 +221,15 @@ fi # RUN_createemptycollection ends
if [[ "$RUN_createmetadata" == "1" ]];then
echo "########### Running createmetadata ##############################"
cmd="${PARKIT} createmetadata --tarball \"${TARBALL}\" --dest \"${DEST}\""
if [[ "${RAWDATA}" == "yes" ]];then
cmd="${cmd} --collectiontype \"Rawdata\""
fi
echo $cmd
if [[ "$EXECUTOR" == "local" ]];then
`$cmd`
eval "$cmd"
if [[ "$?" != "0" ]];then
exit 1
fi
else
dependency=""
if [[ "$jobids" != "" ]];then
Expand All @@ -235,9 +251,15 @@ fi # RUN_createmetadata ends
if [[ "$RUN_deposittar" == "1" ]];then
echo "############# Running deposittar ###############################"
cmd="${PARKIT} deposittar --tarball \"${TARBALL}\" --dest \"${DEST}\""
echo $cmd
if [[ "${RAWDATA}" == "yes" ]];then
cmd="${cmd} --collectiontype \"Rawdata\""
fi
echo $cmd
if [[ "$EXECUTOR" == "local" ]];then
`$cmd`
eval "$cmd"
if [[ "$?" != "0" ]];then
exit 1
fi
else
dependency=""
if [[ "$jobids" != "" ]];then
Expand All @@ -252,5 +274,8 @@ echo "############# Running deposittar ###############################"
jobids="$jobids:$jid"
fi
fi
if [[ "${MAKEREADME}" == "yes" ]];then
echo "${TARBALL} parked at ${DEST} on HPCDME!" >> "${README}"
fi
echo "################################################################"
fi # RUN_deposittar ends
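
In local mode, each step now runs through `eval "$cmd"` and the script exits as soon as a step returns a non-zero status. A minimal Python sketch of that stop-on-first-failure pattern follows; `run_steps` is a hypothetical helper, not part of `parkit`, and the commands passed to it are placeholders.

```python
import subprocess
import sys
from typing import List, Sequence


def run_steps(steps: Sequence[List[str]]) -> int:
    """Run commands in order; return the first non-zero exit code, else 0."""
    for argv in steps:
        print(" ".join(argv))          # echo the command, as the script does
        returncode = subprocess.run(argv).returncode
        if returncode != 0:
            return returncode          # later steps are skipped
    return 0


if __name__ == "__main__":
    # Placeholder step that always succeeds
    ok = [sys.executable, "-c", "pass"]
    print(run_steps([ok, ok]))
```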
73 changes: 73 additions & 0 deletions src/parkit/scripts/projark
@@ -0,0 +1,73 @@
#!/usr/bin/env bash

SCRIPTNAME="$BASH_SOURCE"
SCRIPTDIRNAME=$(readlink -f "$(dirname "$SCRIPTNAME")")

# add "bin" to PATH
if [[ ":$PATH:" != *":${SCRIPTDIRNAME}:"* ]];then
export PATH=${PATH}:${SCRIPTDIRNAME}
fi

# rely on PATH to resolve "parkit" to the python entry point
RESOURCEDIR=$(dirname "$SCRIPTDIRNAME")
TOOLDIR="$SCRIPTDIRNAME"
TOOLNAME="parkit"
# PARKIT="${TOOLDIR}/${TOOLNAME}"
PARKIT="parkit"

# Check if --version is provided as the first argument
if [[ "$1" == "--version" ]]; then
echo "projark is using the following parkit version:"
${PARKIT} --version
exit 0
fi

# Test if parkit is working
${PARKIT} > /dev/null 2>&1 || { echo "${PARKIT} not found or cannot be run!"; exit 1; }

ARGPARSE_DESCRIPTION="Wrapper for folder2hpcdme for quick CCBR project archiving!"
source ${RESOURCEDIR}/resources/argparse.bash || exit 1
argparse "$@" <<EOF || exit 1
parser.add_argument('--folder',required=True,help='Input folder path to archive')
parser.add_argument('--projectnumber',required=True,help='CCBR project number.. destination will be /CCBR_Archive/GRIDFTP/Project_CCBR-<projectnumber>')
parser.add_argument('--executor',required=False,default='slurm', help='slurm or local')
parser.add_argument('--rawdata',required=False,action='store_true', help='If tarball is rawdata and needs to go under folder Rawdata')
parser.add_argument('--cleanup',required=False,action='store_true', help='post transfer step to delete local files')
EOF

# Destination path for archiving
TITLE="CCBR-${PROJECTNUMBER}"
DEST="/CCBR_Archive/GRIDFTP/Project_${TITLE}"

# Check if SOURCE_CONDA_CMD is set
if [ -z "${SOURCE_CONDA_CMD}" ];then
echo "SOURCE_CONDA_CMD env variable must be set"
exit 1
else
echo "SOURCE_CONDA_CMD is set to: $SOURCE_CONDA_CMD"
fi

# Check if HPC_DM_UTILS is set
if [ -z "$HPC_DM_UTILS" ]; then
echo "HPC_DM_UTILS environment variable is not set."
exit 1 # Exit the script with an error code
else
echo "HPC_DM_UTILS is set to: $HPC_DM_UTILS"
fi

# Call folder2hpcdme with necessary parameters
cmd="parkit_folder2hpcdme --folder \"$FOLDER\" --dest \"$DEST\" --projecttitle \"$TITLE\" --projectdesc \"$TITLE\" --executor \"$EXECUTOR\" --hpcdmutilspath \"$HPC_DM_UTILS\" --makereadme"
if [[ "${RAWDATA}" == "yes" ]];then
cmd="${cmd} --rawdata"
fi
if [[ "${CLEANUP}" == "yes" ]];then
cmd="${cmd} --cleanup"
fi
echo $cmd
eval "$cmd"
status=$?

# Exit with the same status code as folder2hpcdme
exit $status
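
The two environment checks in this script can be expressed as a single helper. This is a sketch only: `missing_env` is a hypothetical name, and treating an empty value as missing mirrors the `-z` tests in the script.

```python
import os
from typing import List, Mapping, Sequence

REQUIRED = ("SOURCE_CONDA_CMD", "HPC_DM_UTILS")


def missing_env(env: Mapping[str, str] = os.environ,
                required: Sequence[str] = REQUIRED) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_env()
if missing:
    print("must be set:", ", ".join(missing))
```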
2 changes: 1 addition & 1 deletion src/parkit/src/VERSION
@@ -1 +1 @@
2.0.2
2.0.2-dev