Skip to content
This repository has been archived by the owner on Jul 16, 2024. It is now read-only.

Add wscleaner v1.0 #18

Merged
merged 15 commits into from
Oct 28, 2019
Merged

Add wscleaner v1.0 #18

merged 15 commits into from
Oct 28, 2019

Conversation

NMNS93
Copy link

@NMNS93 NMNS93 commented May 30, 2019

Fix #17


This change is Reviewable

@NMNS93 NMNS93 requested a review from andyb3 May 30, 2019 09:02
@NMNS93 NMNS93 self-assigned this May 30, 2019
Copy link
Contributor

@Graeme-Smith Graeme-Smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 14 files reviewed, 10 unresolved discussions (waiting on @andyb3 and @NMNS93)


wscleaner/wscleaner/lib.py, line 18 at r2 (raw file):

from functools import partial

I'm not familiar with this module - what is it used for?


wscleaner/wscleaner/lib.py, line 23 at r2 (raw file):

class RunFolder():
    """A local directory containing files with the 'fastq.gz' extension

Hi could we have a discussion on the use of RunFolder() over RunFolder(object) - is this script always going to be run in python3 or does it need to be Python2/3 agnostic?


wscleaner/wscleaner/lib.py, line 42 at r2 (raw file):

        self.dx_project = DxProjectRunFolder(self.name)

    @property

Not too familiar with the @Property syntax - could we please discuss what it is doing?


wscleaner/wscleaner/lib.py, line 56 at r2 (raw file):

        """
        # Find paths of files with fastq.gz extension
        fastq_paths = self.path.rglob('*.fastq.gz')

Small thing - would the regex '*.fastq.gz$' be better (more stringent)?


wscleaner/wscleaner/lib.py, line 75 at r2 (raw file):

        runfolder_name (str): The name of a local runfolder    
    Attributes:
        id (str): A DNAnexus project ID. Project matches input runfolder based on business rules.

What does "business rules" refer to?


wscleaner/wscleaner/lib.py, line 87 at r2 (raw file):

    def find_fastqs(self):
        """Returns a list of files in the DNAnexus project (self.id) with the fastq.gz extension"""
        fastq_regex = 'fastq.gz'

I cannot see where this variable is being used - 'fastq.gz' seems to be hardcoded in the line below.


wscleaner/wscleaner/lib.py, line 114 at r2 (raw file):

        uploaded_runfolder = dxpy.describe(self.id)['name'][4:]
        # Set logfiles locatoin. This is expected in 'Logfiles/', a subdirectory of the uploaded runfolder 
        logfile_dir = str(Path('/',uploaded_runfolder,'Logfiles'))

Will this script always be run in a linux environemt or does it need to be OS agnostic?


wscleaner/wscleaner/lib.py, line 138 at r2 (raw file):

    
    def __bool__(self):
        """Boolean expressions on class instances will return True if a single DNAnexus project was found."""

Could we discuss what the function is doing please?


wscleaner/wscleaner/lib.py, line 189 at r2 (raw file):

        for directory in subdirectories:
            rf = RunFolder(directory)
            # Criteria for runfolder: Older than or equal to min_age and containsn fastq.gz files

typo all([fastq in local_fastqs for fastq in dx_fastqs])
)containsn


wscleaner/wscleaner/lib.py, line 207 at r2 (raw file):
Should:

            all([fastq in local_fastqs for fastq in dx_fastqs])
            )

be:

            all([fastq in dx_fastqs for fastq in local_fastqs])
            )
`

?

Copy link
Contributor

@Graeme-Smith Graeme-Smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 14 files at r1, 2 of 6 files at r2.
Reviewable status: 6 of 14 files reviewed, 13 unresolved discussions (waiting on @andyb3 and @NMNS93)


wscleaner/wscleaner/main.py, line 31 at r2 (raw file):

    parser.register('action', 'setkey', SetKeyAction)
    parser.register('action', 'printkey', PrintKeyAction)
    # Define CLI arguments

Can we discuss what these parser.register statements are doing?


wscleaner/wscleaner/main.py, line 35 at r2 (raw file):

    parser.add_argument('--print-key', nargs=0, action='printkey', help='Print the cached DNA Nexus API key')
    parser.add_argument('--dry-run', help='Perform a dry run without deleting files', action='store_true', default=False)
    parser.add_argument('root', help='A directory containing runfolders to process')

Should the command be —root to be consistent with other commands?


wscleaner/wscleaner/main.py, line 38 at r2 (raw file):

    parser.add_argument('--logfile', help='A path for the application logfile', default='mokaguys_logger.log')
    # Get version from setup.py as version CLI response
    version_number = pkg_resources.require("wscleaner")[0].version

I’ll be using this approach to avoid hard coding version numbers in future.

Copy link
Contributor

@Graeme-Smith Graeme-Smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 14 files at r1, 1 of 6 files at r2.
Reviewable status: 11 of 14 files reviewed, 15 unresolved discussions (waiting on @andyb3 and @NMNS93)


wscleaner/test/test_all.py, line 18 at r2 (raw file):

    """Test that an authentication token is passed to pytest as a command line argument"""
    assert auth_token != None

I think PEP8 prefers the comparison to be in the form:
assett auth_token is not none


wscleaner/test/test_all.py, line 104 at r2 (raw file):

        rfm.delete(test_folder)
        assert test_folder.name in rfm.deleted

I like the use of monkeypatch package - everyday a school day.


wscleaner/wscleaner/lib.py, line 190 at r2 (raw file):

            rf = RunFolder(directory)
            # Criteria for runfolder: Older than or equal to min_age and containsn fastq.gz files
            if (rf.age >= min_age) and (rf.find_fastqs(count=True) > 0):

typo containsn

Copy link
Contributor

@Graeme-Smith Graeme-Smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 6 files at r2.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @andyb3 and @NMNS93)

Copy link
Author

@NMNS93 NMNS93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 12 of 14 files reviewed, 15 unresolved discussions (waiting on @andyb3 and @Graeme-Smith)


wscleaner/test/test_all.py, line 18 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

I think PEP8 prefers the comparison to be in the form:
assett auth_token is not none

Agreed!


wscleaner/test/test_all.py, line 104 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

I like the use of monkeypatch package - everyday a school day.

Haha! I was trying to find somewhere to test it. I don't find patching very intuitive... I also hear there's a pytest-mock extension that is easier.


wscleaner/wscleaner/lib.py, line 18 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

I'm not familiar with this module - what is it used for?

partial gets you a new function object where some of the arguments have been set/frozen. I don't seem to use it so I've removed this line.


wscleaner/wscleaner/lib.py, line 23 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

Hi could we have a discussion on the use of RunFolder() over RunFolder(object) - is this script always going to be run in python3 or does it need to be Python2/3 agnostic?

Made a note to discuss. Installation in pytest enforces python3 as I don't believe we are supporting py2 going forward.


wscleaner/wscleaner/lib.py, line 42 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

Not too familiar with the @Property syntax - could we please discuss what it is doing?

Made a note to discuss. Properties turn functions into attributes. Age isn't something that a runfolder does, but a property it has, however it is calculated so has to be a function. Syntax makes it a little cleaner, thought it made sense here!


wscleaner/wscleaner/lib.py, line 56 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

Small thing - would the regex '*.fastq.gz$' be better (more stringent)?

Something amazing just happened. I put $ at the end, ran the test for this (test_find_fastqs) and it failed. Checked the docs and pathlib.rglob works like bash globs/wildcards. So it's equivalent to writing ls *.fastq.gz on the command line, rather than regular expression.

Amazing because tests!!!


wscleaner/wscleaner/lib.py, line 75 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

What does "business rules" refer to?

I meant the format, i.e. 003_190909_XJFFSJDFJX_NGS123. I have changed this to be clearer. I also added the runfolder attribute.


wscleaner/wscleaner/lib.py, line 87 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

I cannot see where this variable is being used - 'fastq.gz' seems to be hardcoded in the line below.

Removed.


wscleaner/wscleaner/lib.py, line 114 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

Will this script always be run in a linux environemt or does it need to be OS agnostic?

Good point - this function counts logfiles on DNA nexus. They use the linux convention of '/' to refer to runfolders so I don't think this will change.


wscleaner/wscleaner/lib.py, line 138 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

Could we discuss what the function is doing please?

Made a note to discuss.


wscleaner/wscleaner/lib.py, line 190 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

typo containsn

Done.


wscleaner/wscleaner/lib.py, line 207 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

Should:

            all([fastq in local_fastqs for fastq in dx_fastqs])
            )

be:

            all([fastq in dx_fastqs for fastq in local_fastqs])
            )
`

?

Good spot! This whole function was convoluted and we no longer ignore undetermined files.


wscleaner/wscleaner/main.py, line 31 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

Can we discuss what these parser.register statements are doing?

Made note to discuss!


wscleaner/wscleaner/main.py, line 35 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

Should the command be —root to be consistent with other commands?

Allows for the following cli wscleaner . - made a note to discuss


wscleaner/wscleaner/main.py, line 38 at r2 (raw file):

Previously, Graeme-Smith (Graeme) wrote…

I’ll be using this approach to avoid hard coding version numbers in future.

It's neat isn't it!

Copy link
Contributor

@Graeme-Smith Graeme-Smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3, 2 of 2 files at r4.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @andyb3)

Copy link
Contributor

@Graeme-Smith Graeme-Smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r5.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @andyb3)

@Graeme-Smith Graeme-Smith merged commit 3234bd5 into master Oct 28, 2019
@Graeme-Smith Graeme-Smith deleted the development branch October 28, 2019 11:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add script to clear runfolders from workstation
2 participants