fastqformatdetect.pl
#!/usr/bin/perl
# Author: Martin Dahlo
#
# Usage: perl scriptname.pl <infile>
# ex.
# perl scriptname.pl reads.fq
use warnings;
use strict;
=pod
Detects the quality-score encoding of a fastq file. In its current state,
it can only differentiate between sanger and solexa/illumina.
If the need arises, checks for the different versions of the illumina format
could easily be added. (Please upload an update if you implement this.)
The code can easily be copied into another script and altered to do something
other than die once it has determined the format.
Pseudo code
* Open the fastq file
* Look at each quality ASCII char and convert it to a number
* Depending on whether that number is above or below certain thresholds,
  determine the format.
=cut
# get variables
my $usage = "Usage: perl scriptname.pl <infile>\n";
my $fq = shift or die $usage;
# open the file
open(FQ, "<", $fq) or die "Could not open $fq: $!\n";
# initiate
my @line;
my $l;
my $number;
# go through the file
while(<FQ>){
    # if it is the line before the quality line
    if($_ =~ /^\+/){
        $l = <FQ>; # get the quality line
        last unless defined $l; # stop if the file ends right after the '+' line
        $l =~ s/[\r\n]+$//; # strip the line ending so it is not mistaken for a quality char
        @line = split(//,$l); # divide in chars
        for(my $i = 0; $i <= $#line; $i++){ # for each char
            $number = ord($line[$i]); # get the number represented by the ascii char
            # check if it is sanger or illumina/solexa, based on the ASCII image at http://en.wikipedia.org/wiki/FASTQ_format#Encoding
            if($number > 76){ # if solexa/illumina
                die "This file is solexa/illumina format\n"; # print result to terminal and die
            }elsif($number < 59){ # if sanger
                die "This file is sanger format\n"; # print result to terminal and die
            }
        }
    }
}
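# A minimal sketch of the reuse suggested in the description above: the same
# threshold test wrapped in a subroutine that returns the format instead of
# dying. The name classify_quality_line is hypothetical and not part of the
# original script; the thresholds (59 and 76) are the ones used in the loop above.
sub classify_quality_line {
    my ($qual) = @_;
    $qual =~ s/[\r\n]+\z//; # drop the line ending so it is not scored
    for my $char (split //, $qual) {
        my $code = ord $char;
        return "solexa/illumina" if $code > 76; # above the range the script accepts as sanger
        return "sanger" if $code < 59; # below the range the script accepts as solexa/illumina
    }
    return; # undef: every char fell in the ambiguous 59..76 range
}
# Possible use, assuming $l holds a quality line read from the file:
#   my $format = classify_quality_line($l);
#   print "This file is $format format\n" if defined $format;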