-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Before and After 2R #16
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job! Some comments that will help remove dead (old) code and increase code clarity.
"jellyfish dump -c -o {output.counts} {input.jf}" | ||
"jellyfish dump -c -L 2 {input.jf} > {output.counts}" | ||
|
||
rule extract_features: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you aren't using this rule, you should just delete it for clarity
shell: | ||
"scripts/pca.py --feature-matrices {input.feature_matrices} --groups-fl {params.groups_fl} --plot-fl {output.plot}" | ||
|
||
rule pca_count_binary: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add the pca outputs to a list in a top-level rule below. This will allow you to run everything with a single command.
@@ -32,10 +25,9 @@ merge_batch_size: 512 | |||
min_doc_freq: 2 | |||
|
|||
# feature extraction | |||
n_features: 30 | |||
n_features: 16 | |||
n_rand_dim: 3000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we aren't using random projection, you should remove this to avoid confusion
@@ -32,10 +25,9 @@ merge_batch_size: 512 | |||
min_doc_freq: 2 | |||
|
|||
# feature extraction | |||
n_features: 30 | |||
n_features: 16 | |||
n_rand_dim: 3000 | |||
use_binary_features: "--binary" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above -- with your changes, this isn't used so you should just remove it
@@ -32,10 +25,9 @@ merge_batch_size: 512 | |||
min_doc_freq: 2 | |||
|
|||
# feature extraction | |||
n_features: 30 | |||
n_features: 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment to explain that this means 2^16 -- I made this ambiguous by calling it n_features
but expecting it to be log2_n_features
import numpy as np | ||
|
||
|
||
def load_rand_proj(flname): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this
def parse_args(): | ||
parser = argparse.ArgumentParser() | ||
|
||
parser.add_argument("--rand-proj-fl", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this
type=str, | ||
required=True) | ||
|
||
parser.add_argument("--binary", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this -- no longer used
type=str, | ||
required=True) | ||
|
||
parser.add_argument("--passlist-bf", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this
These are the updated files that are used to create the plots after count scaling on the 2R chromosome arm.