Code Review Feb 27 #12

iamciera · 2019-02-28T00:43:31Z

Goal

Since we have a list of where all the TFBS are in a sequence, we need to use that list to get the nucleotides in those regions from all 24 species (whether they have a TFBS there or not).

@ndesaraju, you can take the clunkier but likely faster approach described below. @joanne-chen, can you investigate whether this can be done within the code already written, when we make the raw_sequence.....

Steps for how I see it can be performed

Scrape the unique align positions from the align_position column of the files in output/map_motif_bcd_with_threshold. These files show where all the TFBS are found in each species across the entire region (one file per region). We only need to know that a TFBS was found at a given position in one species to know that we need to retrieve that position in all 24 species, even though more than one species may have that same align position. That is why we only need the unique align_position numbers.
Now you should have a list of unique align_position numbers, each corresponding to the start of a TFBS. Next you need to get the raw position from the raw_position column for every species.
You can find the raw_position that corresponds to every align_position in the files in map_motif_bcd_no_threshold. As a sanity check, the number of raw positions you end up with should equal the number of unique align_position numbers * 24 species. How you grab every species might be a little tricky since you can't just grab the header information, but see below for a list of all 24 species.
Now that you have the starting raw_position for each of the 24 species, you can use that number to grab, in each species: (a) length_of_the_TFBS nucleotides forward from that position, to capture the TFBS itself, and (b) n nucleotides beyond the TFBS (up to raw_position + length_of_the_TFBS + n) and n nucleotides behind it (back to raw_position - n). You will be grabbing the raw_sequence from the raw directory.
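
To make the steps above concrete, here is a rough Python sketch. It is only one possible way to do it, not the existing pipeline code. It assumes each file has already been read into a list of dicts (for example with csv.DictReader), that the rows carry a `species` column alongside `align_position` and `raw_position` (the `species` column name is a guess), and that raw_position is a 0-based index into the raw sequence. The function names are made up for illustration.

```python
def unique_align_positions(threshold_rows):
    """Step 1: sorted unique align_position values from the thresholded hits."""
    return sorted({int(row["align_position"]) for row in threshold_rows})


def raw_positions_by_species(no_threshold_rows, align_positions):
    """Steps 2-3: map (species, align_position) -> raw_position.

    Assumes a hypothetical "species" column in the no-threshold rows.
    """
    wanted = set(align_positions)
    lookup = {}
    for row in no_threshold_rows:
        align = int(row["align_position"])
        if align in wanted:
            lookup[(row["species"], align)] = int(row["raw_position"])
    return lookup


def extract_window(raw_sequence, raw_position, tfbs_length, n):
    """Step 4: the TFBS plus n flanking nucleotides on each side.

    Assumes raw_position is 0-based; the window is clipped at the
    sequence ends instead of wrapping around.
    """
    start = max(raw_position - n, 0)
    end = min(raw_position + tfbs_length + n, len(raw_sequence))
    return raw_sequence[start:end]


# Tiny made-up example with two species and two unique align positions.
threshold_rows = [
    {"species": "Dkik", "align_position": "10"},
    {"species": "MEMB002A", "align_position": "10"},
    {"species": "Dkik", "align_position": "25"},
]
no_threshold_rows = [
    {"species": "Dkik", "align_position": "10", "raw_position": "4"},
    {"species": "MEMB002A", "align_position": "10", "raw_position": "6"},
    {"species": "Dkik", "align_position": "25", "raw_position": "12"},
    {"species": "MEMB002A", "align_position": "25", "raw_position": "14"},
    {"species": "Dkik", "align_position": "30", "raw_position": "17"},
]

positions = unique_align_positions(threshold_rows)
lookup = raw_positions_by_species(no_threshold_rows, positions)
window = extract_window("AAACCCGGGTTTAAACCCGGG", raw_position=4, tfbs_length=3, n=2)
```

With the real data, the sanity check from the third step would be that `len(lookup)` equals `len(positions) * 24`. If raw_position turns out to be 1-based in the files, the start index in `extract_window` needs to be shifted by one.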

Note

It would be good to know which TFBSs in which species are on the original output/map_motif_bcd_with_threshold lists.
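
One way to keep track of this could be to flag each retrieved region with whether its (species, align_position) pair appears in the thresholded files. The sketch below assumes rows are dicts with a hypothetical `species` column plus `align_position`; the function name and the `in_threshold_list` key are invented for illustration.

```python
def flag_original_hits(all_rows, threshold_rows):
    """Return copies of all_rows with a boolean in_threshold_list field."""
    on_original = {
        (row["species"], int(row["align_position"])) for row in threshold_rows
    }
    flagged = []
    for row in all_rows:
        key = (row["species"], int(row["align_position"]))
        # dict(row, ...) copies the row so the input rows are left untouched
        flagged.append(dict(row, in_threshold_list=key in on_original))
    return flagged


# Made-up example: only Dkik had a thresholded TFBS at align position 10.
example_threshold = [{"species": "Dkik", "align_position": "10"}]
example_all = [
    {"species": "Dkik", "align_position": "10"},
    {"species": "MEMB002A", "align_position": "10"},
]
flagged_rows = flag_original_hits(example_all, example_threshold)
```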

List of 24 species

Dkik
MEMB002A
MEMB002B
MEMB002C
MEMB002D
MEMB002E
MEMB002F
MEMB003A
MEMB003B
MEMB003C
MEMB003D
MEMB003E
MEMB003F
MEMB004A
MEMB004B
MEMB004E
MEMB005D
MEMB006B
MEMB006C
MEMB007A
MEMB007B
MEMB007C
MEMB007D
MEMB008C
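
Since the species names can't simply be read from the header, one option is to hard-code the list above and use it to check coverage. This is a sketch only; it assumes a lookup keyed by (species, align_position) tuples, which is a made-up structure and not something the current code necessarily produces.

```python
SPECIES = [
    "Dkik",
    "MEMB002A", "MEMB002B", "MEMB002C", "MEMB002D", "MEMB002E", "MEMB002F",
    "MEMB003A", "MEMB003B", "MEMB003C", "MEMB003D", "MEMB003E", "MEMB003F",
    "MEMB004A", "MEMB004B", "MEMB004E",
    "MEMB005D",
    "MEMB006B", "MEMB006C",
    "MEMB007A", "MEMB007B", "MEMB007C", "MEMB007D",
    "MEMB008C",
]


def missing_species(lookup, align_positions, species=SPECIES):
    """For each align position, list the species with no raw_position found.

    lookup is assumed to map (species, align_position) -> raw_position.
    Positions covered by every species are left out of the result.
    """
    missing = {}
    for pos in align_positions:
        absent = [name for name in species if (name, pos) not in lookup]
        if absent:
            missing[pos] = absent
    return missing


# Made-up example: position 10 is complete, position 25 lacks one species.
example_lookup = {(name, 10): 0 for name in SPECIES}
example_lookup.update({(name, 25): 0 for name in SPECIES if name != "MEMB008C"})
gaps = missing_species(example_lookup, [10, 25])
```

An empty result from `missing_species` is equivalent to the sanity check of unique align positions * 24 entries.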
