Frequent Questions

18 votes
12 answers
10k views

How to convert fasta file to tab delimited file

I have a fasta file like
>sample 1 gene 1
atgc
>sample 1 gene 2
atgc
>sample 2 gene 1
atgc
I want to get the following output, with one break between ...
AudileF • 965
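
The excerpt above is cut off before the desired output is shown, so the exact layout the asker wants is unknown. As a minimal sketch, assuming the goal is one row per record with the header and the joined sequence separated by a tab, a conversion could look like this (`fasta_to_tsv` is a hypothetical helper name, not taken from the question):

```python
def fasta_to_tsv(lines):
    """Yield 'header<TAB>sequence' rows from an iterable of FASTA lines.

    Assumption: one output row per record; wrapped sequence lines are joined.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield f"{header}\t{''.join(chunks)}"
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield f"{header}\t{''.join(chunks)}"


example = [
    ">sample 1 gene 1", "atgc",
    ">sample 1 gene 2", "atgc",
    ">sample 2 gene 1", "atgc",
]
rows = list(fasta_to_tsv(example))
```

For a file on disk, the same generator can be fed an open file handle, since it only iterates over lines.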
19 votes
2 answers
3k views

How can we distinguish between true zero and dropout-zero counts in single-cell RNA-seq?

In single-cell RNA-seq data we have an inflated number of 0 (or near-zero) counts due to low mRNA capture rate and other inefficiencies. How can we decide which genes are 0 due to gene dropout (lack ...
Peter • 2,644
18 votes
4 answers
12k views

How to compute RPKM in R?

I have the following data of fragment counts for each gene in 16 samples: ...
Iakov Davydov
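
The question asks for R, and its data are elided in the excerpt. The arithmetic itself is language-independent, so here is a small sketch in Python using the usual definition, RPKM = count × 10^9 / (gene length in bp × total mapped counts in the sample). The gene names, counts, and lengths below are made up for illustration.

```python
def rpkm(counts, lengths_bp):
    """Compute RPKM for one sample.

    counts:     dict of gene -> fragment/read count in this sample
    lengths_bp: dict of gene -> gene length in base pairs
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts; RPKM is undefined")
    # count / (length in kb) / (total counts in millions)
    return {gene: c * 1e9 / (lengths_bp[gene] * total) for gene, c in counts.items()}


# Hypothetical toy data, not from the question.
toy_counts = {"geneA": 500, "geneB": 1500}
toy_lengths = {"geneA": 1000, "geneB": 3000}
toy_rpkm = rpkm(toy_counts, toy_lengths)
```

In this toy sample both genes get the same RPKM, because geneB has three times the counts but is also three times as long.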
13 votes
4 answers
4k views

What methods are available to find a cutoff value for non-expressed genes in RNA-seq?

I have a gene expression count matrix produced from bulk RNA-seq data. I'd like to find genes that were not expressed in a group of samples and were expressed in another group. The problem of course ...
Peter • 2,644
191 votes
4 answers
124k views

Why does the SARS-Cov2 coronavirus genome end in aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (33 a's)?

The SARS-Cov2 coronavirus's genome was released, and is now available on Genbank. Looking at it... ...
Rebecca J. Stones
22 votes
10 answers
7k views

What is the fastest way to calculate the number of unknown nucleotides in FASTA / FASTQ files?

I used to work with publicly available genomic references, where basic statistics are usually available and if they are not, you have to compute them only once so there is no reason to worry about ...
Kamil S Jaron
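
As a correctness baseline rather than a fast solution, counting unknown nucleotides can be sketched in a few lines of Python. Two assumptions are made here: "unknown" means `N` or `n`, and FASTQ records are exactly four lines each (unwrapped). The function names are hypothetical.

```python
def count_n_fasta(lines):
    """Count N/n characters in FASTA sequence lines (header lines are skipped)."""
    return sum(
        line.count("N") + line.count("n")
        for line in lines
        if not line.startswith(">")
    )


def count_n_fastq(lines):
    """Count N/n characters in FASTQ sequence lines.

    Assumes strict four-line records, so the sequence is every line
    whose index is 1 modulo 4; quality lines are never inspected.
    """
    return sum(
        line.count("N") + line.count("n")
        for i, line in enumerate(lines)
        if i % 4 == 1
    )


fasta_total = count_n_fasta([">seq1", "ACGTNNAC", "nnAC", ">seq2", "NNNN"])
fastq_total = count_n_fastq(["@r1", "ACNNT", "+", "IIN!I", "@r2", "NNNN", "+", "!!!!"])
```

Any faster tool can be checked against a baseline like this on a small file before being trusted on a large one.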
19 votes
7 answers
18k views

How to convert FASTA to BED

I have a FASTA file: ...
SmallChess • 2,766
1 vote
3 answers
721 views

RNASeq: Normalization, stabilization, gene length and rlog

I was thinking about the best method for normalization that takes gene length into account (in order to compare genes)... Do you think I can do the following: taking raw counts and dividing each gene by ...
Nin00 • 11
24 votes
4 answers
971 views

Tools for simulating Oxford Nanopore reads

Are there any free open source software tools available for simulating Oxford Nanopore reads?
Daniel Standage
21 votes
3 answers
9k views

How exactly is "effective length" used in FPKM calculated?

According to this famous blog post, the effective transcript length is: $\tilde{l}_i = l_i - \mu$ where $l_i$ is the length of transcript and $\mu$ is the average fragment length. However, typically ...
user172818 • 6,605
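
The formula quoted in the excerpt, $\tilde{l}_i = l_i - \mu$, can be plugged into the usual FPKM expression, FPKM = count × 10^9 / (effective length × total fragments). The sketch below does only that. The excerpt is cut off after "However, typically ...", so how to treat transcripts no longer than the mean fragment length is left open, and this code simply rejects them. The function names and the numbers are illustrative.

```python
def effective_length(transcript_length, mean_fragment_length):
    """Effective length as quoted in the excerpt: l_i minus mu."""
    eff = transcript_length - mean_fragment_length
    if eff <= 0:
        # Assumption: no fallback is defined here for short transcripts.
        raise ValueError("transcript is not longer than the mean fragment length")
    return eff


def fpkm(fragment_count, eff_length, total_fragments):
    """Fragments per kilobase of effective length per million fragments."""
    return fragment_count * 1e9 / (eff_length * total_fragments)


# Hypothetical numbers for illustration only.
eff = effective_length(2000, 200)                                # 2000 - 200
value = fpkm(fragment_count=900, eff_length=eff, total_fragments=1_000_000)
```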
19 votes
2 answers
15k views

Obtaining uniquely mapped reads from BWA mem alignment

This is based on a question from betsy.s.collins on BioStars. The original post can be found here. Does anyone have any suggestions for other tags or filtering steps on BWA-generated BAM files that ...
gringer • 15.5k
11 votes
1 answer
2k views

Why does a very strong BLAST hit get lost when I change num_alignments, num_descriptions or max_target_seqs parameter?

Disclaimer: This is a self-answered question for documentation purposes, and I adapted it from the following GitHub gist. Especially from users terrycojones and peterjc as well as sujaikumar who ...
voiDnyx • 401
5 votes
2 answers
14k views

Manually define clusters in Seurat and determine marker genes

I want to define two clusters of cells in my dataset and find marker genes that are specific to one and the other. Is there a way to do this in Seurat? Say, if I ...
Nikita Vlasenko
5 votes
3 answers
2k views

Access base aligned to particular reference position

The short version: If I have a SAM record, is there any simple way to retrieve the base aligned to a particular reference position without computing a pileup? The long version: I'm using pysam to ...
Daniel Standage
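
One way to answer the short version without a pileup is to walk the record's CIGAR string and track reference and read coordinates side by side. The sketch below is pure Python and assumes a 0-based leftmost reference position (as pysam's `reference_start` provides), a CIGAR string, and the read sequence. `base_at` is a hypothetical helper, not a pysam function.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(ref_pos, read_start, cigar, seq):
    """Return the read base aligned to 0-based ref_pos.

    Returns None if the position is not covered by the read, or falls
    inside a deletion (D) or a skipped region (N).
    """
    ref, query = read_start, 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes both reference and read
            if ref <= ref_pos < ref + length:
                return seq[query + (ref_pos - ref)]
            ref += length
            query += length
        elif op in "IS":           # consumes read only
            query += length
        elif op in "DN":           # consumes reference only
            if ref <= ref_pos < ref + length:
                return None
            ref += length
        # H and P consume neither coordinate
    return None


# Toy record: 2 soft-clipped bases, 3 matches, 1 insertion, 2 matches,
# a 2-base deletion, then 3 matches, starting at reference position 100.
toy_seq = "TTACGGCTAGA"
toy_cigar = "2S3M1I2M2D3M"
hit = base_at(104, 100, toy_cigar, toy_seq)
deleted = base_at(105, 100, toy_cigar, toy_seq)
```

Here `hit` is the base aligned to position 104, and `deleted` is `None` because position 105 lies in the deletion.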
2 votes
1 answer
2k views

Protein ligand docking: how to convert <protein>.pdb to <protein>.maps.fld?

Hello, I'm helping to develop a cloud docking tool for screening compounds, similar to Swissdock but with mass throughput and GPU optimizations. Specifically helping screen existing drugs against ...
Paul Shen
