Newest Questions
6,549 questions
18 votes, 3 answers, 5k views
Convert a BAM file from one reference to another?
I have a set of BAM files that are aligned using the NCBI GRCh37 human genome reference (with chromosome names such as NC_000001.10), but I want to analyze them using a BED file that has the UCSC hg19 ...
50 votes, 6 answers, 16k views
Feature annotation: RefSeq vs Ensembl vs Gencode, what's the difference?
What are the actual differences between different annotation databases?
My lab, for reasons still unknown to me, prefers Ensembl annotations (we're working with transcript/exon expression estimation)...
23 votes, 4 answers, 3k views
Are there any rolling hash functions that can hash a DNA sequence and its reverse complement to the same value?
A common bioinformatics task is to decompose a DNA sequence into its constituent k-mers and compute a hash value for each k-mer. Rolling hash functions are an appealing solution for this task, since ...
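The task described here can be sketched as a polynomial (Rabin-Karp-style) rolling hash: each k-mer's hash is derived from the previous one in constant time by removing the outgoing base and appending the incoming one. This is a minimal illustration only. The A/C/G/T encoding, `BASE`, and `MOD` are arbitrary choices, and `kmer_hash`, `rolling_kmer_hashes`, and `reverse_complement` are hypothetical helper names. The sketch does not make a k-mer and its reverse complement hash to the same value, which is exactly the gap the question is about.

```python
# Minimal sketch (illustrative, not a specific library's method): polynomial
# rolling hash over the k-mers of a DNA sequence. Encoding, BASE and MOD are
# arbitrary assumptions chosen for this example.

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = 4
MOD = (1 << 61) - 1  # Mersenne prime used as the modulus

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_hash(kmer: str) -> int:
    """Hash a single k-mer from scratch (reference for the rolling version)."""
    h = 0
    for base in kmer:
        h = (h * BASE + ENCODE[base]) % MOD
    return h


def rolling_kmer_hashes(seq: str, k: int) -> list[int]:
    """Return the hash of every k-mer in seq, updating in O(1) per step."""
    if k <= 0 or k > len(seq):
        return []
    top = pow(BASE, k - 1, MOD)  # weight of the leftmost (outgoing) base
    h = kmer_hash(seq[:k])
    hashes = [h]
    for i in range(k, len(seq)):
        # Remove the outgoing base, shift by one position, add the incoming base.
        h = (h - ENCODE[seq[i - k]] * top) % MOD
        h = (h * BASE + ENCODE[seq[i]]) % MOD
        hashes.append(h)
    return hashes
```

With this encoding, `kmer_hash("ACG")` is 6, while its reverse complement `CGT` hashes to 27, so a plain rolling hash like this one treats the two strands as different k-mers.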
11 votes, 2 answers, 4k views
Difference between BWA-backtrack and BWA-MEM
Many of my colleagues recommend I use BWA-MEM instead of regular old BWA. The problem is I don't understand why and reading the BWA man page doesn't seem to help the matter.
What is the difference ...
42 votes, 4 answers, 64k views
What is the difference between FASTA, FASTQ, and SAM file formats?
I'd like to learn the differences between three common formats: FASTA, FASTQ, and SAM. How are they different? Are there any benefits of using one over another?
Based on Wikipedia pages, I can't ...
11 votes, 1 answer, 125 views
What are the optimal parameters for docking a large ligand using Hex?
I'm looking to dock a large ligand (~90 kDa) to a slightly larger receptor (~125 kDa) using Hex. If anyone is familiar with docking large structures, are there any recommended parameters for ...
13 votes, 2 answers, 1k views
Mapping drug names to ATC codes
I'm interested in working with the medication information provided by the UK Biobank. In order to get these into a usable form, I would like to map them to ATC codes. Since many of the drugs listed in ...
20 votes, 2 answers, 671 views
Accuracy of the original human DNA datasets sequenced by Human Genome Project?
The Human Genome Project was the project of 'determining the sequence of nucleotide base pairs that make up human DNA, and of identifying and mapping all of the genes of the human genome'. It was ...
52 votes, 9 answers, 7k views
What's the most efficient file format for the storage of DNA sequences?
I'd like to learn which format is most commonly used for storing the full human genome sequence (4 letters without a quality score) and why.
I assume that storing it in plain-text format would be ...