Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Before and After 2R #16

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Before and After 2R #16

wants to merge 1 commit into from

Conversation

DanielLeski
Copy link
Collaborator

These are the updated files that are used to create the plots after count scaling on the 2R chromosome arm.

@DanielLeski DanielLeski requested a review from rnowling June 8, 2021 20:47
Copy link
Collaborator

@rnowling rnowling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! Some comments that will help remove dead (old) code and increase code clarity.

"jellyfish dump -c -o {output.counts} {input.jf}"
"jellyfish dump -c -L 2 {input.jf} > {output.counts}"

rule extract_features:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you aren't using this rule, you should just delete it for clarity

shell:
"scripts/pca.py --feature-matrices {input.feature_matrices} --groups-fl {params.groups_fl} --plot-fl {output.plot}"

rule pca_count_binary:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add the pca outputs to a list in a top-level rule below. This will allow you to run everything with a single command.

@@ -32,10 +25,9 @@ merge_batch_size: 512
min_doc_freq: 2

# feature extraction
n_features: 30
n_features: 16
n_rand_dim: 3000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we aren't using random projection, you should remove this to avoid confusion

@@ -32,10 +25,9 @@ merge_batch_size: 512
min_doc_freq: 2

# feature extraction
n_features: 30
n_features: 16
n_rand_dim: 3000
use_binary_features: "--binary"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above -- with your changes, this isn't used so you should just remove it

@@ -32,10 +25,9 @@ merge_batch_size: 512
min_doc_freq: 2

# feature extraction
n_features: 30
n_features: 16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to explain that this means 2^16 -- I made this ambiguous by calling it n_features but expecting it to be log2_n_features

import numpy as np


def load_rand_proj(flname):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

def parse_args():
parser = argparse.ArgumentParser()

parser.add_argument("--rand-proj-fl",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

type=str,
required=True)

parser.add_argument("--binary",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this -- no longer used

type=str,
required=True)

parser.add_argument("--passlist-bf",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants