Highest scored questions

191 votes

4 answers

124k views

Why does the SARS-Cov2 coronavirus genome end in aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (33 a's)?

The SARS-Cov2 coronavirus's genome was released, and is now available on Genbank. Looking at it... ...

Rebecca J. Stones

1,725

asked Jan 25, 2020 at 0:55

52 votes

9 answers

7k views

What's the most efficient file format for the storage of DNA sequences?

I'd like to learn which format is most commonly used for storing the full human genome sequence (4 letters without a quality score) and why. I assume that storing it in plain-text format would be ...

kenorb

1,323

asked May 16, 2017 at 18:01

50 votes

6 answers

16k views

Feature annotation: RefSeq vs Ensembl vs Gencode, what's the difference?

What are the actual differences between different annotation databases? My lab, for reasons still unknown to me, prefers Ensembl annotations (we're working with transcript/exon expression estimation)...

Plasma

603

asked May 16, 2017 at 19:24

42 votes

4 answers

64k views

What is the difference between FASTA, FASTQ, and SAM file formats?

I'd like to learn the differences between 3 common formats such as FASTA, FASTQ and SAM. How are they different? Are there any benefits of using one over another? Based on Wikipedia pages, I can't ...

kenorb

1,323

asked May 16, 2017 at 18:37

35 votes

4 answers

9k views

Why does the FASTA sequence for coronavirus look like DNA, not RNA?

I'm looking at a genome sequence for 2019-nCoV on NCBI. The FASTA sequence looks like this: ...

jameshfisher

453

asked Feb 9, 2020 at 17:13

35 votes

2 answers

3k views

Why do some assemblers require an odd-length kmer for the construction of de Bruijn graphs?

Why do some assemblers like SOAPdenovo2 or Velvet require an odd-length k-mer size for the construction of de Bruijn graph, while some other assemblers like ABySS are fine with even-length k-mers?

Kamil S Jaron

5,662

asked May 19, 2017 at 18:34

34 votes

3 answers

28k views

Uppercase vs lowercase letters in reference genome

I am using a reference genome for mm10 mouse downloaded from NCBI, and would like to understand in greater detail the difference between lowercase and uppercase letters, which make up roughly equal ...

Scott Gigante

2,183

asked May 24, 2017 at 3:26

28 votes

7 answers

10k views

Read length distribution from FASTA file

I have a single ~10GB FASTA file generated from an Oxford Nanopore Technologies' MinION run, with >1M reads of mean length ~8Kb. How can I quickly and efficiently calculate the distribution of read ...

Scott Gigante

2,183

asked May 17, 2017 at 4:38
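
As a rough illustration of the task in the question above, a read-length distribution can be computed in one streaming pass, so the file never has to fit in memory. This is only a minimal sketch, not one of the answers from the thread: the function names `read_lengths` and `length_histogram` are made up for illustration, and it assumes plain uncompressed FASTA where each record starts with a `>` header line.

```python
from collections import Counter
from typing import Iterable, Iterator


def read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each FASTA record, streaming line by line."""
    length = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may be wrapped over several lines
    if length is not None:
        yield length  # last record in the file


def length_histogram(lines: Iterable[str]) -> Counter:
    """Map each read length to the number of reads with that length."""
    return Counter(read_lengths(lines))
```

An open file handle can be passed directly, e.g. `with open("reads.fasta") as fh: hist = length_histogram(fh)`, where `reads.fasta` is a placeholder path.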

28 votes

8 answers

1k views

How to version the code and the data during the analysis?

I am currently looking for a system which will allow me to version both the code and the data in my research. I think my way of analyzing data is not uncommon, and this will be useful for many people ...

Iakov Davydov

2,765

asked May 18, 2017 at 9:27

27 votes

4 answers

7k views

Why sequence the human genome at 30x coverage?

A bit of a historical question on a number, 30 times coverage, that's become so familiar in the field: why do we sequence the human genome at 30x coverage? My question has two parts: Who came up with ...

719016

2,374

asked Aug 4, 2017 at 15:10

26 votes

5 answers

4k views

What happens if a major bug is discovered in a bioinformatic package that has been used in published literature?

Yesterday I was debugging some things in R trying to get a popular Flow Cytometry tool to work on our data. After a few hours of digging into the package I discovered that our data was hitting an edge ...

Nic Barker

361

asked Nov 14, 2017 at 0:26

24 votes

4 answers

17k views

What Ensembl genome version should I use for alignments? (e.g. toplevel.fa vs. primary_assembly.fa)

When you look at all the genome files available from Ensembl, you are presented with a bunch of options. Which one is the best to use/download? You have a combination of choices. First part options: ...

story

1,613

asked Jun 7, 2017 at 13:23

24 votes

4 answers

971 views

Tools for simulating Oxford Nanopore reads

Are there any free open source software tools available for simulating Oxford Nanopore reads?

Daniel Standage

5,080

asked May 22, 2017 at 18:19

23 votes

4 answers

3k views

Are there any rolling hash functions that can hash a DNA sequence and its reverse complement to the same value?

A common bioinformatics task is to decompose a DNA sequence into its constituent k-mers and compute a hash value for each k-mer. Rolling hash functions are an appealing solution for this task, since ...

Daniel Standage

5,080

asked May 16, 2017 at 19:15
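
For context on the question above, the usual non-rolling baseline is to hash the canonical form of each k-mer, i.e. the lexicographically smaller of the k-mer and its reverse complement, so both strands give the same value. The sketch below shows that baseline only. It is not a rolling hash (each k-mer is rehashed from scratch), the helper names are invented for illustration, `zlib.crc32` is just a convenient deterministic hash, and it assumes uppercase A/C/G/T input.

```python
import zlib
from typing import Iterator

# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def strand_neutral_hashes(seq: str, k: int) -> Iterator[int]:
    """Yield one hash per k-mer; a k-mer and its reverse complement hash equally."""
    for i in range(len(seq) - k + 1):
        yield zlib.crc32(canonical(seq[i:i + k]).encode("ascii"))
```

Hashing a sequence and hashing its reverse complement should then give the same values in reverse order. The cost is O(k) work per k-mer, which is exactly what a rolling hash is meant to avoid.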

22 votes

10 answers

7k views

What is the fastest way to calculate the number of unknown nucleotides in FASTA / FASTQ files?

I used to work with publicly available genomic references, where basic statistics are usually available and if they are not, you have to compute them only once so there is no reason to worry about ...

Kamil S Jaron

5,662

asked Jun 1, 2017 at 16:47
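
As a point of reference for the question above, a plain-Python baseline for counting unknown nucleotides might look like the sketch below. It makes no claim to being the fastest approach. The function name `count_unknown` is hypothetical, and it assumes FASTA input only: in FASTQ the quality lines can legitimately contain the character `N`, so they would need to be skipped separately.

```python
from typing import Iterable


def count_unknown(lines: Iterable[str]) -> int:
    """Count N/n characters in the sequence lines of a FASTA stream."""
    total = 0
    for line in lines:
        if line.startswith(">"):
            continue  # header lines may contain 'N' in names; skip them
        total += line.count("N") + line.count("n")
    return total
```

It accepts any iterable of lines, so an open file handle works: `with open("genome.fa") as fh: n = count_unknown(fh)`, where `genome.fa` is a placeholder path.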
