About a year ago I heard about the BioJS project while looking for a component to display multiple sequence alignments. Back then I couldn't find a plugin that suited me, so I planned to design and implement my own once it was time to develop the web server for my project. Since my software wasn't ready for production at that point, I didn't pursue it further. However, I did read the documentation on how to design components, so when I applied to GSoC 2014 I was fully aware of the technicalities. I decided to start my project as a clone of the BioJS project, making sure it compiles (I'm not currently compiling it for each test, but I do so now and then). The current branch for this project is located at: https://github.com/homonecloco/biojs/tree/BAMViewer
I haven't written JavaScript since the days when the catchy word "Ajax" and CSS started to change web design. I come from the age of TD and TR. As such, I decided to start my return to .js with the basics. I'm starting by writing a parser for the SAM format, according to version 1.4 of the specification (http://samtools.sourceforge.net). The current parser looks like:
parse_sam_line: function(sam_line){
  var currentline = sam_line.split("\t");
  var obj = {
    qname : currentline[0],
    flags : parseInt(currentline[1], 10),
    rname : currentline[2],
    pos   : parseInt(currentline[3], 10),
    mapq  : parseInt(currentline[4], 10),
    cigar : currentline[5],
    rnext : currentline[6],
    pnext : parseInt(currentline[7], 10),
    tlen  : parseInt(currentline[8], 10),
    seq   : currentline[9],
    qual  : currentline[10],
    len   : 100, //TODO: change this to use the cigar.
    // The parentheses matter: in JavaScript, > binds tighter than &.
    has_flag : function (f){ return (this.flags & f) > 0; }
  };
  /* Flag bits still to be exposed through has_flag:
  @is_paired = (@flag & 0x0001) > 0
  @is_mapped = @flag & 0x0002 > 0
  @query_unmapped = @flag & 0x0004 > 0
  @mate_unmapped = @flag & 0x0008 > 0
  @query_strand = !(@flag & 0x0010 > 0)
  @mate_strand = !(@flag & 0x0020 > 0)
  @first_in_pair = @flag & 0x0040 > 0
  @second_in_pair = @flag & 0x0080 > 0
  @primary = !(@flag & 0x0100 > 0)
  @failed_quality = @flag & 0x0200 > 0
  @is_duplicate = @flag & 0x0400 > 0 */
  // Optional TAG:TYPE:VALUE fields start after the 11 mandatory columns.
  for(var j = 11; j < currentline.length; j++){
    var tag = currentline[j].split(":");
    var value = tag.slice(2).join(":"); // the value itself may contain ":"
    if (tag[1] == "i"){
      obj[tag[0]] = parseInt(value, 10);
    }else if (tag[1] == "f"){
      obj[tag[0]] = parseFloat(value);
    }else{
      obj[tag[0]] = value;
    }
  }
  return obj;
}
The parser creates a new object by splitting each line of the SAM file on its tabs. The numeric values are parsed accordingly, and the extra fields at the end of the line are stored in the same dictionary, keyed by their tag. The length of the aligned bases is not stored in the format, so in the following weeks I'll come back to parse the CIGAR string, which is the field that contains the details of the alignment. Also, the has_flag function is used to test the binary flags as stated in the documentation.
In order to avoid loading the whole SAM file (these are usually multi-gigabyte files), I'm using Ajax to load particular regions. At this point, I'm assuming I have a server that can be queried with the standard region format used by samtools (chromosome:start-end). For testing purposes, it is possible to point to a local SAM file. The loader looks like:
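As a rough idea of what the pending CIGAR step could look like, here is a minimal sketch that sums the CIGAR operations that consume reference bases (M, D, N, = and X in the SAM specification). The function name cigarReferenceLength is hypothetical and not part of the component; it only illustrates how the hard-coded len of 100 might eventually be replaced.

```javascript
// Hypothetical helper, not part of the BioJS component.
// Returns how many reference bases an alignment spans,
// given its CIGAR string (for example "76M" or "10S60M2D30M").
function cigarReferenceLength(cigar) {
  if (cigar === "*") {
    return 0; // "*" means the CIGAR is unavailable
  }
  var re = /(\d+)([MIDNSHP=X])/g;
  var length = 0;
  var match;
  while ((match = re.exec(cigar)) !== null) {
    var count = parseInt(match[1], 10);
    var op = match[2];
    // Only M, D, N, = and X advance along the reference.
    if ("MDN=X".indexOf(op) !== -1) {
      length += count;
    }
  }
  return length;
}

// 10 soft-clipped bases are ignored; 60 + 2 + 30 reference bases are counted.
console.log(cigarReferenceLength("10S60M2D30M")); // 92
```

Insertions (I) and clipping (S, H) are skipped on purpose, since they add bases to the read without moving along the reference.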
load_region: function(region){
  var reference = this.reference;
  var reg = region.toString(); //Format: chromosome:start-end
  jQuery.ajax({
    type: "GET",
    url: this.dataSet,
    data: { region: reg, ref: reference },
    dataType: "text",
    // Extra settings are visible as "this" inside the callbacks.
    container: this,
    success: function (data) {
      var reads = this.container.parse_sam(data);
      if(reads.length > 0){
        this.container.add_alignments(reads);
        this.container.render_visible();
        this.container._move_to_top();
      } else {
        alert("Unknown format detected");
      }
    },
    error: function (qXHR, textStatus, errorThrown) {
      alert(" Error loading the SAM File! \n" + textStatus +
        "\n" + errorThrown + "\n" + qXHR );
    }
  });
}
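To make the region format concrete, here is a small sketch of how a chromosome:start-end string could be split into its parts and turned back into the string sent to the server. The parseRegion helper and the entry, start and end field names are assumptions made for illustration; the real region object in the component may look different.

```javascript
// Hypothetical helper: parse a samtools-style region string
// ("chromosome:start-end") into an object with a matching toString().
function parseRegion(str) {
  // The greedy (.+) keeps any ":" inside the chromosome name itself.
  var match = /^(.+):(\d+)-(\d+)$/.exec(str);
  if (match === null) {
    throw new Error("Invalid region: " + str);
  }
  return {
    entry: match[1],
    start: parseInt(match[2], 10),
    end: parseInt(match[3], 10),
    toString: function () {
      return this.entry + ":" + this.start + "-" + this.end;
    }
  };
}

var region = parseRegion("chr1:100-200");
console.log(region.entry, region.start, region.end); // chr1 100 200
console.log(region.toString());                      // chr1:100-200
```

This sketch does not handle the thousands separators that samtools also accepts in coordinates.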
Also, to avoid loading the same coordinate several times and to enable asynchronous loading of subregions, a caching mechanism keeps a dictionary keyed by position, holding all the reads that start there. If a read is loaded twice for some region, only the first copy is kept. This is needed because when you query a region, you also get reads on the flanks that overlap the next region.
The current code adding to the cache is:
add_alignments: function(alignments){
  var als = this.alignments;
  var added = 0;
  for(var i = 0; i < alignments.length; i++){
    var aln = alignments[i];
    if("undefined" === typeof als[aln.pos]){
      als[aln.pos] = {};
    }
    var current_alignments = als[aln.pos];
    // Keep only the first copy of a read seen at this position.
    if("undefined" === typeof current_alignments[aln.qname]){
      added++;
      current_alignments[aln.qname] = aln;
    }
  }
  return added; // number of reads that were new to the cache
}
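The following stand-alone sketch shows the effect of this cache when two overlapping queries return the same read, and how the cached reads for a window could be collected again. Both addToCache and readsStartingIn are hypothetical names written for this example; they mirror the idea of add_alignments but are not the component's code.

```javascript
// Hypothetical stand-alone version of the cache, for illustration only.
// cache maps position -> { qname -> read }.
function addToCache(cache, reads) {
  var added = 0;
  reads.forEach(function (read) {
    if (typeof cache[read.pos] === "undefined") {
      cache[read.pos] = {};
    }
    if (typeof cache[read.pos][read.qname] === "undefined") {
      cache[read.pos][read.qname] = read; // first copy wins
      added++;
    }
  });
  return added;
}

// Collect the cached reads whose start position lies in [start, end].
function readsStartingIn(cache, start, end) {
  var found = [];
  for (var pos = start; pos <= end; pos++) {
    var bucket = cache[pos];
    if (typeof bucket === "undefined") {
      continue;
    }
    for (var qname in bucket) {
      if (Object.prototype.hasOwnProperty.call(bucket, qname)) {
        found.push(bucket[qname]);
      }
    }
  }
  return found;
}

var cache = {};
addToCache(cache, [{ qname: "r1", pos: 100 }, { qname: "r2", pos: 150 }]);
// The second query overlaps the first one, so r2 comes back again and is ignored.
var added = addToCache(cache, [{ qname: "r2", pos: 150 }, { qname: "r3", pos: 210 }]);
console.log(added); // 1
```

One caveat of keying by position and read name: two mates of a pair that share a name and start at the same position would collide, so a real implementation might need to include the first/second-in-pair flag in the key.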

