Newest Questions

18 votes
3 answers
5k views

Convert a BAM file from one reference to another?

I have a set of BAM files that are aligned using the NCBI GRCh37 human genome reference (with the chromosome names as NC_000001.10) but I want to analyze them using a BED file that has the UCSC hg19 ...
morgantaschuk
50 votes
6 answers
16k views

Feature annotation: RefSeq vs Ensembl vs Gencode, what's the difference?

What are the actual differences between different annotation databases? My lab, for reasons still unknown to me, prefers Ensembl annotations (we're working with transcript/exon expression estimation)...
Plasma
  • 603
23 votes
4 answers
3k views

Are there any rolling hash functions that can hash a DNA sequence and its reverse complement to the same value?

A common bioinformatics task is to decompose a DNA sequence into its constituent k-mers and compute a hash value for each k-mer. Rolling hash functions are an appealing solution for this task, since ...
Daniel Standage
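
To make the task in the excerpt above concrete, here is a minimal sketch of one way to get a strand-independent value per k-mer. It is not taken from the question or its answers. It uses an exact 2-bit rolling encoding (A=0, C=1, G=2, T=3) rather than a true hash, keeps the forward and reverse-complement codes in step, and returns the smaller of the two. The function name `canonical_kmer_codes` and the `CODE` table are hypothetical, and the sketch assumes the input contains only upper-case A, C, G and T.

```python
# Hypothetical illustration: 2-bit rolling encoding of canonical k-mers.
# Assumes seq contains only the upper-case letters A, C, G, T.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def canonical_kmer_codes(seq, k):
    """Return one integer per k-mer; a k-mer and its reverse complement
    always map to the same integer."""
    if k <= 0:
        raise ValueError("k must be positive")
    mask = (1 << (2 * k)) - 1   # keeps only the last k bases of the forward code
    shift = 2 * (k - 1)         # bit position of the leading base
    fwd = 0                     # code of the current k-mer
    rev = 0                     # code of its reverse complement
    codes = []
    for i, base in enumerate(seq):
        c = CODE[base]
        # Forward strand: push the new base in on the right.
        fwd = ((fwd << 2) | c) & mask
        # Reverse strand: drop the oldest base on the right and put the
        # complement (3 - c) of the new base on the left.
        rev = (rev >> 2) | ((3 - c) << shift)
        if i >= k - 1:
            codes.append(min(fwd, rev))
    return codes


print(canonical_kmer_codes("ACGTT", 3))
```

Each step costs O(1) regardless of k, which is the property that makes rolling schemes attractive. For large k the exact code could be fed into an ordinary integer hash; that extra step is left out here.
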
11 votes
2 answers
4k views

Difference between BWA-backtrack and BWA-MEM

Many of my colleagues recommend I use BWA-MEM instead of regular old BWA. The problem is I don't understand why and reading the BWA man page doesn't seem to help the matter. What is the difference ...
David Ross
42 votes
4 answers
64k views

What is the difference between FASTA, FASTQ, and SAM file formats?

I'd like to learn the differences between 3 common formats such as FASTA, FASTQ and SAM. How are they different? Are there any benefits of using one over another? Based on Wikipedia pages, I can't ...
kenorb
  • 1,323
11 votes
1 answer
125 views

What are the optimal parameters for docking a large ligand using Hex?

I'm looking to dock a large ligand (~90kDa) to a slightly larger receptor (~125kDa) using Hex. If anyone is familiar with docking large structures, are there any recommended parameters for ...
Te-Yo
  • 303
13 votes
2 answers
1k views

Mapping drug names to ATC codes

I'm interested in working with the medication information provided by the UK Biobank. In order to get these into a usable form I would like to map them to ATC codes. Since many of the drugs listed in ...
Greg
  • 831
20 votes
2 answers
671 views

Accuracy of the original human DNA datasets sequenced by Human Genome Project?

The Human Genome Project was the project of 'determining the sequence of nucleotide base pairs that make up human DNA, and of identifying and mapping all of the genes of the human genome'. It was ...
kenorb
  • 1,323
52 votes
9 answers
7k views

What's the most efficient file format for the storage of DNA sequences?

I'd like to learn which format is most commonly used for storing the full human genome sequence (4 letters without a quality score) and why. I assume that storing it in plain-text format would be ...
kenorb
  • 1,323
