-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excel validator #62
base: master
Are you sure you want to change the base?
Excel validator #62
Conversation
…ss of library submission sheets. Checks for empty entries, missing values and does basic checks on indexes, like check for double index in pool, color balance and length.
…leaned up code and added comments.
@@ -0,0 +1,495 @@ | |||
#!/usr/bin/env python | |||
|
|||
# load libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you organise library?
I like to sort things ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting is nice, but note that from __future__ import print_function
is special and must be the first line in the script.
return(False, warnings_numeric) | ||
|
||
# currently not used | ||
#def validate_mol(self, min_mol, max_mol): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How could this part of the code be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for future implementation of a check on pool molarity. I am waiting for Mattias to do some stuff, so Johannes (or me) can create a database, which is required to know which application requires what molarity.
You know that you're better at Python than I. If I remember well there should be some guidelines for such coding practices. I would comments about what the functions return, just before the definition of the function instead of at the end. |
Nice work Fran. Out of curiosity how this works I looked through your code and left some comments. Sorry for sticking my nose in 😇 |
library_information_validator.py
Outdated
min_length = sorted(count_length.keys())[0] | ||
index_list_colour = [] | ||
for index in index_seq: | ||
index_colour = index.replace('T','G').replace('A','R').replace('C', 'R') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure what you do here (maybe you should add more comments explaining it 😛).. I'm pretty sure on two channel chemistry an A gives signal in both channels, i.e. it has both colors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right! My bad! I was too concentrated on getting the script to work...
library_information_validator.py
Outdated
if lev.distance(index.lower(), index_seq[i].lower()) < LibrarySheet.MAX_DISTANCE: | ||
logger.warning('The index sequences {} and {} in pool {}'\ | ||
' display low diversity (only {} nt difference).'\ | ||
.format(index,index_seq[i], pool_name, lev.distance(index.lower(), index_seq[i].lower()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You calculate the same Lev. distance twice if the test returns true, if it's an expensive operation maybe you should store the result from lev.distance
in a variable for re-use.
library_information_validator.py
Outdated
index_count = 1 | ||
for index in index_seq: | ||
# checks that indexes only contain valid letters | ||
charRE = re.compile(r'[^ATCGNatcgn\-.]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex doesn't change between iterations so you should probably compile it outside the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to point out that .
is a special regex character which means "any character" and that your regex was broken. However it seems that if it's within [^.]
then it's a special case where it's not a special character. Anyway, I would escape it to help future people reading your code 😅 (it should still work the same when escaped)
charRE = re.compile(r'[^ATCGNatcgn\-\.]')
library_information_validator.py
Outdated
for row in index_list_colour: | ||
try: | ||
column.append(row[row_nr]) | ||
except IndexError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you should motivate why you catch and ignore this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and log - even if it's a debug
level log. pass
is bad style 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you didn't ask for my review, but as others were getting in on the act I couldn't resist taking a look as well..
Some of the code made me go a bit cross-eyed when trying to figure out what you were doing, but generally it looks good - nice work! 👍
I've left a few comments generally whinging about code style and being "Pythonic". None of these are a big deal though, they're just best-practice things. They make life easier for anyone else maintaining your code (typically that person is future-you). People whined at me about this stuff for several years and it was annoying, but now that it's mostly ingrained, it's my time to wine at other people.. 😁
@@ -0,0 +1,495 @@ | |||
#!/usr/bin/env python | |||
|
|||
# load libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting is nice, but note that from __future__ import print_function
is special and must be the first line in the script.
library_information_validator.py
Outdated
warning_low_div, warning_index_length,\ | ||
warning_index_balance = validator.validate_index(result_index, pool, sindex) | ||
|
||
pool_warning = pool, [warning_low_div, warning_index_length, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, the code clarity here is a little scary. Is there a reason that you can't use a dictionary? eg. something like:
pool_warnings.append({
'pool': pool,
'warning_low_div': warning_low_div,
'warning_index_length': warning_index_length,
'warning_index_balance': warning_index_balance,
'warnings_cycle': warnings_cycle[i],
'warning_index_mix': warning_index_mix,
'warning_length_comp': warning_length_comp,
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason, just inexperience :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It made a bit more sense as I got further on into the code later. But generally, yes returning a structured response like this is better IMHO.
library_information_validator.py
Outdated
for row in index_list_colour: | ||
try: | ||
column.append(row[row_nr]) | ||
except IndexError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and log - even if it's a debug
level log. pass
is bad style 😉
library_information_validator.py
Outdated
@@ -57,12 +59,11 @@ def projectID(self): | |||
if(len(re.findall('P\d+P\d+', plate_id))>0): | |||
project_id_user = re.findall('P\d+', plate_id)[0] | |||
else: | |||
logger.error( | |||
sys.exit(logger.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument for sys.exit()
is usually a number which corresponds to an exit code. 0
is good, anything other than 0
is bad (error) - typically 1
when writing code.
logger.error
is probably returning None
or something else odd. What you want instead I think is the logger.error
call followed by sys.exit(1)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I got the idea to do it like this from the python documentation. There is says as follows:
"The optional argument arg can be an integer giving the exit status (defaulting to zero), or another type of object. [...] If another type of object is passed,
None
is equivalent to passing zero, and any other object is printed to stderr and results in an exit code of1
. In particular,sys.exit("some error message")
is a quick way to exit a program when an error occurs."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - but this doesn't really change my point. logger.error
always returns None
(I just checked), which means that you are always exiting with exit status 0
- this is interpreted as exiting successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ps. To clarify: sys.exit("some error message")
would exit with a non-zero exit code. It's just sys.exit(logger.error("some error message"))
which won't..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:( and I thought I was right for once. But I bow to the master and will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FranBonath should we close this one? |
shrug |
Script to check library submission sheets for: