About a year ago, I heard about the BioJS project, I was looking for a component to display multiple sequence alignments. Back then, I could´t find a plugin suitable for me, and I thought I would design and implement mine when the time to develop the web server of my project was in place. As at that point my software wash´t ready to production, I didm´t pursued it further. However, I did read the documentation of how to design the components, so I when applying to the GSoC 2014 I was completely aware of the technicalities. So I decided to start my project as a clone of the BioJS project, making sure it compiles (I´m currently not compiling it for each test, but I test it now and then). The current branch for this protect is located in: https://github.com/homonecloco/biojs/tree/BAMViewer
I haven´t written javascript since the time when the ajax catchy word and CSS started to change the design. I come from the age of the TD and TR. As such, I decided to start my return to .js with the basics. I´m starting by writing a parser for the SAM format, according to the specification 1.4 ( http://samtools.sourceforge.net). The current parser looks like:
parse_sam_line: function(sam_line){ var currentline = sam_line.split("\t"); var cigar = currentline[5] var obj = { qname : currentline[0] , flags : parseInt(currentline[1],10), rname : currentline[2] , pos : parseInt(currentline[3],10) , mapq : parseInt(currentline[4],10) , cigar : currentline[5] , rnext : currentline[6] , pnext : parseInt(currentline[7],10), tlen : parseInt(currentline[8],10) , seq : currentline[9] , qual : currentline[10] , len : 100, //TODO: change this to use the cigar. has_flag : function (f){return flags & f > 0 } }; /* @is_paired = (@flag & 0x0001) > 0 @is_mapped = @flag & 0x0002 > 0 @query_unmapped = @flag & 0x0004 > 0 @mate_unmapped = @flag & 0x0008 > 0 @query_strand = !(@flag & 0x0010 > 0) @mate_strand = !(@flag & 0x0020 > 0) @first_in_pair = @flag & 0x0040 > 0 @second_in_pair = @flag & 0x0080 > 0 @primary = !(@flag & 0x0100 > 0) @failed_quality = @flag & 0x0200 > 0 @is_duplicate = @flag & 0x0400 > 0*/ for(var j=12;j < currentline.length;j++){ var tag = sam_line[j].split(":") if (tag[1] == "i"){ obj[tag[0]] = parseInt(tag[2]); }else if (tag[1 == "f"]){ obj[tag[0]] = parseFloat(tag[2]); } else{ obj[tag[0]] = tag[2]; } } return obj; }
The parser creates a new object, by splitting each line of the SAM by it´s tabs. The numeric values are parsed accordingly and the extra fields are stored in a dictionary at the end. The length of the aligned bases is not in the format, so in the following weeks I´ll return to parse the CIGAR line, which is the field that contains the details of the alignments. Also, the has_flag function is used to validate the binary flags as stated in the documentation.
In order to avoid loading all the SAM file (usually multi-gigabyte files), I’m using ajax to load particular regions. At this point, I’m assuming I have a server which can query in the standard region used by samtools (chromosome:start-end). For testing purposes, it is possible to point to a local sam file. The loader looks like:
load_region: function(region){ reference = this.reference; reg = region.toString; //Format: chromsome:start-end jQuery.ajax({ type: "GET", url: this.dataSet, data: { region: reg, ref: this.reference } , dataType: "text", container: this, success: function (data) { correct = true reads = this.container.parse_sam(data); if(reads.length > 0){ this.container.add_alignments(reads); this.container.render_visible(); this.container._move_to_top(); } else { alert("Unknown format detected"); } }, error: function (qXHR, textStatus, errorThrown) { alert(" Error loading the SAM File! \n" + textStatus + "\n" + errorThrown + "\n" + qXHR ); } }); }
Also, to avoid loading severalties the same coordinate, and enable asynchronous loading of subregions, a caching mechanism keeps a dictionary of positions with all the reads overlapping, and if for some region a read is loaded twice, it only keeps the first read. This is needed, because when you query a region, you get reads in the flanks that overlap to the next region.
The current code adding to the cache is:
add_alignments: function(alignments){ var als = this.alignments; var added = 0; for(var i = 0; i < alignments.length; i++){ var aln = alignments[i]; if("undefined" === typeof als[aln.pos]){ als[aln.pos] = {}; } var current_alignments = als[aln.pos]; if("undefined" === typeof als[aln.pos][aln.qname]){ added ++; als[aln.pos][aln.qname] = aln; } } }
No hay comentarios:
Publicar un comentario