Frequent Questions
402 questions
18 votes, 12 answers, 10k views
How to convert fasta file to tab delimited file
I have a fasta file like
>sample 1 gene 1
atgc
>sample 1 gene 2
atgc
>sample 2 gene 1
atgc
I want to get the following output, with one break between ...
19 votes, 2 answers, 3k views
How can we distinguish between true zero and dropout-zero counts in single-cell RNA-seq?
In single-cell RNA-seq data we have an inflated number of 0 (or near-zero) counts due to low mRNA capture rate and other inefficiencies.
How can we decide which genes are 0 due to gene dropout (lack ...
18 votes, 4 answers, 12k views
How to compute RPKM in R?
I have the following data of fragment counts for each gene in 16 samples:
...
13 votes, 4 answers, 4k views
What methods are available to find a cutoff value for non-expressed genes in RNA-seq?
I have a gene expression count matrix produced from bulk RNA-seq data. I'd like to find genes that were not expressed in a group of samples and were expressed in another group.
The problem of course ...
191 votes, 4 answers, 124k views
Why does the SARS-Cov2 coronavirus genome end in aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (33 a's)?
The SARS-CoV-2 coronavirus genome was released and is now available on GenBank. Looking at it...
...
22 votes, 10 answers, 7k views
What is the fastest way to calculate the number of unknown nucleotides in FASTA / FASTQ files?
I used to work with publicly available genomic references, where basic statistics are usually available and, if they are not, you only have to compute them once, so there is no reason to worry about ...
19 votes, 7 answers, 18k views
How to convert FASTA to BED
I have a FASTA file:
...
1 vote, 3 answers, 721 views
RNASeq: Normalization, stabilization, gene length and rlog
I was thinking about the best method for normalization, which takes gene length into account (in order to compare genes)...
Do you think I can do the following:
- taking raw counts and dividing each gene by ...
24 votes, 4 answers, 971 views
Tools for simulating Oxford Nanopore reads
Are there any free, open-source software tools available for simulating Oxford Nanopore reads?
21 votes, 3 answers, 9k views
How exactly is "effective length" used in FPKM calculated?
According to this famous blog post, the effective transcript length is:
$\tilde{l}_i = l_i - \mu$
where $l_i$ is the length of transcript and $\mu$ is the average fragment length. However, typically ...
19 votes, 2 answers, 15k views
Obtaining uniquely mapped reads from BWA mem alignment
This is based on a question from betsy.s.collins on BioStars. The original post can be found here.
Does anyone have any suggestions for other tags or filtering steps on BWA-generated BAM files that ...
11 votes, 1 answer, 2k views
Why does a very strong BLAST hit get lost when I change num_alignments, num_descriptions or max_target_seqs parameter?
Disclaimer: This is a self-answered question for documentation purposes, adapted from the following GitHub gist, especially from users terrycojones and peterjc as well as sujaikumar who ...
5 votes, 2 answers, 14k views
Manually define clusters in Seurat and determine marker genes
I want to define two clusters of cells in my dataset and find marker genes that are specific to one and the other. Is there a way to do this in Seurat? Say, if I ...
5 votes, 3 answers, 2k views
Access base aligned to particular reference position
The short version: If I have a SAM record, is there any simple way to retrieve the base aligned to a particular reference position without computing a pileup?
The long version: I'm using pysam to ...
2 votes, 1 answer, 2k views
Protein ligand docking: how to convert <protein>.pdb to <protein>.maps.fld?
Hello, I'm helping to develop a cloud docking tool for screening compounds, similar to SwissDock but with mass throughput and GPU optimizations.
Specifically, I'm helping screen existing drugs against ...