lunes, 9 de junio de 2014

Week 2: SAMtools server



BAM are usually used to store short read alignments. The size of BAM files is usually in the range of the gigabytes. As such, a backend server is required for any practical applications. For the development of a realistic use case of the BioJS BAMViewer. My first open source software is a binder for samtools in ruby (https://github.com/helios/bioruby-samtools). As I know pretty well the library I thought I could use it for my backend. Originally, I intended to use Rails, but at the end I settled up to Sintra(http://www.sinatrarb.com) as the web server.


As I intend to use this service to make a BAM viewer, I implemented the following functions:, I query the list of references (so, you can list all the chromosomes without having them in a database),

  • alignment: Gets the alignments in a region
  • wig: Converts the BAM file to wig, I will use it to display the coverage across the region
  • reference: Returns the string of the region. This way, you don’t need to load a full chromosome.
  • list: gets all the reference chromosome/contigs.

Each service has a function to dispatch its content, but since creating the object containing the sam file is the same across them, I have a function to wrap this behaviour

  

def get_bam(bam,reference)
        return @bam_files[bam] if @bam_files[bam] 
        
        bam_path = "#{self.settings.folder.to_s}/#{bam}.bam"
        reference_path =  "#{self.settings.reference.to_s}/#{reference}"
        return nil unless File.file?(bam_path)
        @bam_files[bam] = Bio::DB::Sam.new( 
            :fasta =>  reference_path,
            :bam => bam_path
        )
        
        return @bam_files[bam] 
      end
Then, each query has a method to dispatch it, the alignment function is:
  
         get '/*/*/alignment' do |ref, bam|
            region = params[:region]
            reg = Bio::DB::Fasta::Region.parse_region(region)
            stream do |out|
              get_bam(bam, ref).fetch(reg.entry, reg.start, reg.end) do |sam|
                out << "#{sam.sam_string}\n"
              end
            end
          end
For the wig output, I didn’t have in the library a function to convert from the SAM file, so I monkey patched the Region class, and with that I can generate wig files with arbitrary bin sizes.
  
    class Bio::DB::Fasta::Region
       def to_wig(opts={})
        step_size = opts[:step_size] ? opts[:step_size] : 1
        out = StringIO.new
        out << "fixedStep chrom=#{self.entry} span=#{step_size}\n"
        return out.string if self.pileup.size == 0
        current_start = self.pileup[0].pos
        current_acc = 0.0
        current_end = current_start + step_size
        self.pileup.each do |pile|

         if pile.pos >= current_end
          out << current_start << " " << (current_acc/step_size).to_s << "\n"
          current_start = pile.pos
          current_acc = 0.0
          current_end = current_start + step_size
         end
         current_acc += pile.coverage
        end
        out.string
       end
      end

No hay comentarios:

Publicar un comentario