miRMaster Tutorial

Getting started

Introduction

With miRMaster you can analyse your NGS sncRNA samples. To get started you only need two things:

Sequencing files in FASTQ format (uncompressed or .gz compressed).
The name of the library preparation kit or the library construct layout.

Once you have this information, click the Launch experiment button on the homepage or in the navigation bar. You will be guided through three steps, in which you name your experiment, specify the experimental protocol used and finally upload your sequencing files.

If you have additional metadata for your samples, such as a diagnosis for which you would like to perform differential expression analysis, you can provide it by clicking on the buttons below the upload area. This will allow you to name your annotations and to upload an annotation file. Once you have chosen your files and added your metadata, you can click the launch button to start processing your data and sending it to the server. You can follow the progress in real-time by clicking the follow button.

This will open the experiment status page in a new tab, where you will be able to track the progress of the analysis of all your uploaded samples. Real-time web reports are provided for each sample that has been uploaded, allowing you to directly inspect your data. They provide information on the preprocessing, mapping, quantification and miRNA candidate prediction steps.

As soon as all samples have been analyzed, the results can be downloaded and an overall web-report is created and displayed at the top of the page.

Demo run and example samples

We provide two ways for trying out miRMaster:

You can directly inspect a pre-generated demo run of 70 samples (48 Alzheimer's Disease, 22 controls) from GEO (GSE46579).
You can download an example set of 4 samples (2 Alzheimer's Disease, 2 controls), configure your own run and follow the result generation in real-time.

Want to know more?

For a more detailed explanation of the options provided for the analysis setup please read the Starting an analysis section. If you would like more information on the report pages please read the Inspecting results section.

Starting an analysis

The analysis configuration is divided into the following three steps.

1 - Experiment information

Email (optional)	We will notify you as soon as the analysis of your uploaded samples is done.
Experiment Name (required)	Assign a name to your experiment run so that you will be able to recognize it later.
Aggregated Data Analysis	You acknowledge and agree that your data are included in an aggregated analysis. Please note: no experimental information or other data than summarized reads are considered. With the aggregated data analysis you will learn about the specificity of findings in your experiment and at the same time you further the development of miRMaster and miRCarta.

2 - Settings

The settings are divided into the following five categories. All options except the 3' adapter have reasonable default settings. The options for each category can be shown by clicking on the category bars. Even more options can be shown by toggling the expert mode switch.

Preprocessing

3'-Adapter (required)	Provide or choose the 3' adapter of your sequencing run.
5' Barcode length	Length of the barcode that should be removed at the beginning of each read.
Adapter barcode length	Length of the adapter barcode which precedes the actual adapter that should be removed.
UMI length	Length of the UMI. A value of 0 implies that no UMI was used. We assume that the read starts with the UMI.
Minimum read length	Only trimmed reads with at least the specified length will be kept.
Maximum adapter edit distance	Maximum edit distance of the adapter to the reads.
Minimum read/adapter overlap	How many nucleotides must overlap between the found adapter and read, in case the adapter is not fully contained in the read.
Trim leading Ns	The first X leading bases are discarded if they contain Ns.
Trim trailing Ns	The last X trailing bases are discarded if they contain Ns.
Discard reads containing Ns	If reads still contain Ns after the trimming they will be discarded.
Sliding window size	The size of the sliding window to look at when averaging base qualities.
Sliding window required quality	The minimum average window quality needed to keep the sequence. If the quality falls below the minimum the sequence is trimmed from that position on.
Number of threads	The number of threads to use for preprocessing your samples in the browser.
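
The two sliding window options can be illustrated with a minimal Python sketch. It assumes that base qualities have already been decoded to integers and that the read is cut at the first window whose average quality falls below the required value. The function name and the default values are illustrative assumptions and do not reflect miRMaster's implementation or defaults.

```python
def sliding_window_trim(seq, quals, window_size=4, required_quality=20):
    """Cut a read at the first window whose mean quality is too low.

    seq   -- read sequence as a string
    quals -- list of integer base qualities, one per base
    The defaults are illustrative assumptions, not miRMaster defaults.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq:
        return seq, quals
    # Reads shorter than the window are evaluated as a single, shorter window.
    last_start = max(len(seq) - window_size, 0)
    for start in range(last_start + 1):
        window = quals[start:start + window_size]
        if sum(window) / len(window) < required_quality:
            # Trim from the start of the failing window onwards.
            return seq[:start], quals[:start]
    return seq, quals
```

A read trimmed this way would then still have to pass the minimum read length check before being kept.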

Mapping

Alignment tool	Bowtie or STAR
Allowed number of mappings	Reads mapping to more than the allowed number of locations will be discarded.
Mapping seed length	Only mismatches in the seed region count.
Allowed mismatches	Maximum number of allowed mismatches in the seed region.
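
As a rough illustration of how the seed length and the allowed mismatches interact, the following Python sketch counts mismatches only within the seed region of an ungapped alignment. It assumes the reference string is the genomic window the read was aligned to; the function names are hypothetical, and the aligner's actual behavior (Bowtie or STAR) may differ in detail. The default values mirror the genome mapping defaults described in this tutorial (18 nucleotides, no mismatches).

```python
def seed_mismatches(read, reference, seed_length):
    """Count mismatches within the first seed_length bases of the read."""
    seed_read = read[:seed_length]
    seed_ref = reference[:seed_length]
    return sum(1 for a, b in zip(seed_read, seed_ref) if a != b)


def passes_seed_filter(read, reference, seed_length=18, allowed_mismatches=0):
    """Accept the alignment if the seed region has few enough mismatches."""
    return seed_mismatches(read, reference, seed_length) <= allowed_mismatches


read = "ACGTACGTACGTACGTACGTAA"
# Mismatch at position 21 (outside the seed): does not count.
ref_tail_mismatch = read[:20] + "C" + read[21:]
# Mismatch at position 4 (inside the seed): counts.
ref_seed_mismatch = read[:3] + "A" + read[4:]
```

With these inputs, the first reference is accepted and the second is rejected under the default settings.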

Quantification

Maximal allowed mismatches for the quantification of other ncRNAs	The maximum number of mismatches allowed during the quantification of other ncRNAs.
Maximal allowed mismatches for miRNA quantification	The maximum number of mismatches allowed during the quantification of known and novel miRNAs.
Maximal allowed distance from the 5'/3' end for miRNA/isomiR/mismatch quantification	The maximum allowed distance of the read 5'/3' end to the 5'/3' end of the annotated miRNA position.
Maximal allowed mismatches for isomiR quantification	The maximum number of mismatches allowed inside a read (the flanks are excluded to account for non-templated nucleotide additions) during the isomiR quantification of known and novel miRNAs.
Maximal allowed mismatches for quantification of mismatches in miRNAs	The maximum number of mismatches allowed during the quantification of mismatches in known and novel miRNAs.

miRNA candidate prediction - precursor excision

Minimum read stack height	The minimum count for reads from which precursors are excised.
Downstream window size	In downstream (5' to 3') direction, the highest read stack is searched in a window size of X nucleotides, from which two potential precursors are excised.
Short/long sequence extension	Related to the highest read mapping stack. On one side a short sequence extends the mapping region, while on the other side a long sequence extends it. As a result, two potential precursors are excised for that region: short-mapped-long and long-mapped-short.
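
The short/long extension can be sketched in Python as follows. The sketch assumes 0-based, end-exclusive coordinates on the forward strand and ignores strand handling; the function name and the extension lengths are hypothetical values chosen for illustration, not miRMaster's defaults.

```python
def excise_precursors(genome, stack_start, stack_end, short_ext=10, long_ext=70):
    """Excise the two candidate precursors around a read stack.

    genome                -- reference sequence as a string
    stack_start/stack_end -- 0-based, end-exclusive stack coordinates
    short_ext/long_ext    -- illustrative extension lengths (assumptions)
    Returns a dict mapping the layout name to (start, end, sequence).
    """
    def window(left_ext, right_ext):
        # Clamp the window to the boundaries of the reference sequence.
        start = max(stack_start - left_ext, 0)
        end = min(stack_end + right_ext, len(genome))
        return start, end, genome[start:end]

    return {
        "short-mapped-long": window(short_ext, long_ext),
        "long-mapped-short": window(long_ext, short_ext),
    }


genome = "A" * 100 + "C" * 22 + "G" * 100
candidates = excise_precursors(genome, 100, 122)
```

Both candidates contain the mapped region; they differ only in which side receives the long extension.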

miRNA candidate prediction - filtering

Minimal/Maximal length of potential major miRNA	If the length of a potential mature miRNA is lower/higher than this threshold, then the corresponding precursor will be discarded.
Minimal required base pair ratio in the potential major miRNA	If a potential mature miRNA has a base pair ratio lower than this threshold, the corresponding precursor will be discarded.
Maximal allowed difference in length between the 5p- and 3p-segment	If the length difference between a 5p- and 3p-segment is higher than this threshold, then the corresponding precursor will be discarded.
Maximal portion of reads mapping inconsistently according to Dicer processing	If the portion of reads mapping inconsistently to a precursor according to Dicer processing is higher than the specified maximum, that precursor will be discarded.

miRNA candidate prediction - merging and global signature filtering

Maximal read score portion of reads mapping inconsistently according to Dicer processing	If the read score portion of reads mapping inconsistently to a precursor according to Dicer processing is higher than the specified maximum, that precursor will be discarded.
Use default penalty parameters	By default, reads mapping with mismatches are penalized by a dividing factor if they occur in at most 10% of all samples (but at most 10 samples). The dividing factor is the limit of occurring samples minus 1, but at least 2. This means that penalties are only introduced when uploading 10 or more samples.
Mismatch score penalty (modifiable when default penalty = NO)	When scoring the global signature we penalize reads mapping with mismatches to the precursor by a dividing factor.
Maximum occurring number of samples for score penalty (modifiable when default penalty = NO)	If a read mapping with a mismatch occurs in more than the specified number of samples its score is not penalized.
Maximum merging distance	Precursors are merged if the distances of their start and end positions are within the specified maximum distance.
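
The default penalty rule can be worked through with a short Python sketch. It assumes that the 10% limit is rounded down to a whole number of samples, which is consistent with the statement that penalties only apply from 10 samples onwards, but the exact rounding is an assumption. The function names are hypothetical.

```python
def default_penalty(num_samples):
    """Return (sample_limit, dividing_factor) for the default penalty rule.

    sample_limit is 10% of all samples (rounded down, an assumption),
    capped at 10. The dividing factor is the limit minus 1, but at least 2.
    """
    sample_limit = min(num_samples // 10, 10)
    dividing_factor = max(sample_limit - 1, 2)
    return sample_limit, dividing_factor


def penalized_score(score, samples_with_read, num_samples):
    """Divide the score of a mismatched read if it occurs in few samples."""
    sample_limit, dividing_factor = default_penalty(num_samples)
    if 1 <= samples_with_read <= sample_limit:
        return score / dividing_factor
    return score
```

For example, with 70 samples the limit is 7 samples and the dividing factor is 6, so a mismatched read found in 5 samples has its score divided by 6, while one found in 8 samples is left unchanged. With 9 samples the limit is 0, so no read is penalized.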

3 - Data upload

In the last step, first select the samples that should be analyzed, either via drag-and-drop or by clicking in the upload box. If you have additional metadata for your samples, such as a diagnosis for which you would like to perform differential expression analysis, you can then provide it by clicking the corresponding buttons below the upload area.

Click the launch button once all samples and metadata have been provided to start the analysis. A button to follow the real-time progress of the analysis will then appear. The preprocessing and upload progress are shown in progress bars in a table. Once a bar reaches 100% and the file has been uploaded, it will turn green. During preprocessing, adapters are removed from the reads, low-quality reads are filtered out or trimmed, and reads with identical sequences are collapsed, retaining only each sequence and the number of reads that share it.
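
Read collapsing can be pictured with a minimal Python sketch. It assumes uncompressed four-line FASTQ records that have already been trimmed; the function names are hypothetical and the sketch is not the code that runs in the browser.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line (2nd line) of every 4-line FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip()


def collapse_reads(sequences):
    """Map each distinct sequence to the number of reads that share it."""
    return Counter(sequences)


fastq_lines = [
    "@read1", "TGAGGTAGTAGGTTGTATAGTT", "+", "IIIIIIIIIIIIIIIIIIIIII",
    "@read2", "TGAGGTAGTAGGTTGTATAGTT", "+", "IIIIIIIIIIIIIIIIIIIIII",
    "@read3", "TAGCTTATCAGACTGATGTTGA", "+", "IIIIIIIIIIIIIIIIIIIIII",
]
collapsed = collapse_reads(read_fastq_sequences(fastq_lines))
```

Here three reads collapse to two distinct sequences, one of them with a count of 2.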

Inspecting results

Status page (example)

The status page displays the real-time progress of your analysis. You can see how many samples have been transferred to the server and their current analysis progress. Each row in the status table shows the timestamp when the sample reached the server, the sample name, its current analysis progress, the status and a link to its real-time report. If provided, additional metadata is displayed in the table as well.

If an analysis fails, the bar will turn red and a failure icon will be shown in the status column. Samples that have not been processed yet will have grey bars and a pending status icon.

Once all individual samples have been processed an overall report will be generated. When the total progress reaches 100%, the results will be available for download at the top of the page, as well as a link to the web-report.

Single sample report (example)

The report is divided into the following six categories. All plots and tables can be downloaded in multiple formats (png, svg, pdf, csv, xlsx).

1 - Overview

The overview page provides information about the experiment id, name, the sample name and optionally provided annotations.

2 - Preprocessing

The preprocessing page provides information about the sequences found in the sample and is subdivided into seven tabs.

Overview

Basic information on the number of sequences processed, the mean GC-content and the detected FASTQ quality encoding.
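
To give an idea of what these overview values mean, the Python sketch below computes a GC content and guesses the quality encoding. Both functions are assumptions for illustration: the GC value is computed over all bases pooled together, and the encoding guess uses a common heuristic (characters with ASCII codes below 59 only occur in Phred+33 files) that can misclassify files containing only high qualities. The method miRMaster actually uses is not described here.

```python
def gc_content_percent(sequences):
    """Percentage of G and C bases over all bases of all sequences."""
    total = sum(len(s) for s in sequences)
    if total == 0:
        return 0.0
    gc = sum(s.count("G") + s.count("C") for s in sequences)
    return 100.0 * gc / total


def guess_quality_offset(quality_strings):
    """Guess the Phred offset (33 or 64) from the lowest quality character.

    Heuristic only: a Phred+33 file without any low-quality base would be
    reported as Phred+64.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters given")
    return 33 if min(codes) < 59 else 64
```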

Base quality

A plot displaying the mean quality and standard deviation of all valid sequences at each position.
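
The values behind such a plot can be derived as in the following Python sketch. It assumes Phred+33 encoded quality strings and uses the population standard deviation; both choices, as well as the function name, are assumptions made for illustration.

```python
import statistics


def per_position_quality(quality_strings, offset=33):
    """Return a list of (position, mean quality, standard deviation).

    Reads may differ in length, so later positions are based on fewer reads.
    """
    columns = {}
    for quality in quality_strings:
        for position, char in enumerate(quality):
            columns.setdefault(position, []).append(ord(char) - offset)
    return [
        (position, statistics.fmean(values), statistics.pstdev(values))
        for position, values in sorted(columns.items())
    ]


# 'I' encodes quality 40 and '5' encodes quality 20 with an offset of 33.
profile = per_position_quality(["II", "5"])
```

In this small example the first position has a mean quality of 30 with a standard deviation of 10, and the second position is covered by a single read with quality 40.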

Sequence quality distribution

A plot displaying the number of sequences having a specific mean quality. Most sequences should have a mean quality near 40.

Base distribution

A plot displaying the base distribution at each position found in all valid sequences. Unknown bases that could not be recognized by the sequencer are described by N.

GC content distribution

A plot showing the distribution of the mean GC content in percent among the valid sequences.

Sequence length distribution

A plot showing the distribution of the sequence length among the sequences. For miRNA sequencing data peaks should be observed around 22 nucleotides.

Overrepresented sequences

A table showing the 50 most frequently found sequences, displaying their sequence, count and percentage of all valid sequences.

3 - Mapping

The mapping page provides information on the number of mapped sequences against different references. We report the mapping statistics regarding all reads on one hand, and regarding the collapsed reads (leading to only unique sequences and counting each as occurring only once) on the other. It is subdivided into five tabs.
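
The difference between the two kinds of statistics can be made concrete with a small Python sketch. It assumes a dictionary of collapsed read counts and a set of sequences that mapped to the reference; the names are hypothetical.

```python
def mapping_statistics(collapsed_counts, mapped_sequences):
    """Return the mapped portion for all reads and for collapsed reads.

    collapsed_counts -- dict mapping each distinct sequence to its read count
    mapped_sequences -- set of distinct sequences that mapped to the reference
    """
    total_reads = sum(collapsed_counts.values())
    total_unique = len(collapsed_counts)
    mapped_reads = sum(
        count for seq, count in collapsed_counts.items() if seq in mapped_sequences
    )
    mapped_unique = sum(1 for seq in collapsed_counts if seq in mapped_sequences)
    return {
        "all_reads": mapped_reads / total_reads if total_reads else 0.0,
        "collapsed_reads": mapped_unique / total_unique if total_unique else 0.0,
    }


counts = {"TGAGGTAGTAGGTTGTATAGTT": 90, "ACGTACGTACGTACGTAC": 5, "TTTTTTTTTTTTTTTTTT": 5}
stats = mapping_statistics(counts, {"TGAGGTAGTAGGTTGTATAGTT"})
```

In this example 90% of all reads map, but only one of three collapsed sequences does, which shows why a few highly abundant sequences can dominate the statistics based on all reads.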

Genome

By default, reads are mapped against the species' genome without mismatches in the first 18 nucleotides, and reads mapping to more than 5 locations are discarded. Pie charts for the number and portion of reads mapping to the genome are shown.

miRNA

Reads are mapped against the species' miRBase v22 [1] while allowing one mismatch by default. We report the number and portion of reads mapping to known miRNAs within a window of 2 bases upstream and 5 bases downstream. Furthermore, we report the portion of reads mapping exclusively to a miRBase stem-loop, but not to a miRNA.

Other ncRNA

Reads are mapped against other ncRNAs of Ensembl (release 100) [2] without mismatches.

Bacteria/Viruses

We map all reads that did not align to the species' genome (with at most one mismatch) to all 7,556 bacteria and 7,026 virus sequences of NCBI RefSeq release 74 [3] and report the number of perfectly mapping reads. Reads mapping to bacteria or viruses can indicate exogenous miRNAs, but also reagent contamination or diseases such as sepsis.

4 - Quantification other ncRNA

The ncRNA quantification page provides information about the amount of found known ncRNAs from GtRNAdb (tRNAs), RNACentral (piRNAs), Ensembl and circBase (circRNAs). It is subdivided into nine tabs.

5 - Quantification miRNA

The miRNA quantification page provides information about the amount of found known miRNA sequences in the sample. All found known miRNAs are listed in a table and include the number of reads that mapped to them within a 2 bases upstream and 5 bases downstream window (per default).

Each miRNA is clickable, and additional information is provided for the selected miRNA and its precursor stem-loop. The sequence and the minimum free energy are reported, as well as the secondary structure. By default, the secondary structure nucleotides are colored according to the type of structure they belong to:

Green: Stems (canonical helices)
Red: Multiloops (junctions)
Yellow: Interior loops
Blue: Hairpin loops
Orange: 5' and 3' unpaired region

The 3' and 5' miRNA are highlighted specifically as well.

The pileup plot of the reads mapping to the stem-loop can be inspected. It can be filtered according to the minimum number of reads mapping to a position. Furthermore, the plot can be dragged and zoomed to visualize the exact sequences. The number of reads sharing a displayed sequence is shown to the left of the pileup plot (e.g. x1234 is a read occurring 1,234 times).

The read distribution on the precursor is visualized showing the total number of sequences mapping at each position. The distribution plot can be displayed either as bar plot or as area plot.
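
The per-position totals behind the distribution plot can be computed as in the following Python sketch. It assumes alignments given as 0-based, end-exclusive positions on the precursor together with the collapsed read count; the function name and input layout are hypothetical.

```python
def read_distribution(precursor_length, alignments):
    """Sum the read counts covering each position of the precursor.

    alignments -- iterable of (start, end, count) tuples, where start/end are
                  0-based, end-exclusive positions on the precursor and count
                  is the number of reads sharing that sequence
    """
    coverage = [0] * precursor_length
    for start, end, count in alignments:
        # Clamp to the precursor so overhanging reads do not raise errors.
        for position in range(max(start, 0), min(end, precursor_length)):
            coverage[position] += count
    return coverage


# A read occurring 1,234 times and a shifted read occurring 10 times.
coverage = read_distribution(10, [(0, 4, 1234), (2, 6, 10)])
```

Positions covered by both reads add up to 1,244, while positions covered by neither stay at 0.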

6 - Prediction

The prediction page provides information about the predicted precursors in the sample, before the merging step performed for the overall report. All predictions are listed in a table and include the number of reads mapping to the 5' and 3' predicted miRNAs, the genomic location of the precursor, its predicted probability of being a true precursor and whether it contains already known miRNAs. Since we don't exclude known precursors, many entries in this table will contain re-discovered known precursors.

Each precursor is clickable, and additional information is provided for the selected prediction, as in the quantification tab. The sequence and the minimum free energy are reported, as well as the secondary structure. The additional information can be shown for the precursor sequence (determined by the start of the 5' miRNA and the end of the 3' miRNA) and also for the precursor stem-loop which was determined during the precursor excision step.

By default, the secondary structure nucleotides are colored according to the type of structure they belong to:

Green: Stems (canonical helices)
Red: Multiloops (junctions)
Yellow: Interior loops
Blue: Hairpin loops
Orange: 5' and 3' unpaired region

The 3' and 5' miRNA are highlighted specifically as well.

The pileup plot of the reads mapping to the precursor/stem-loop can be inspected. It can be filtered according to the minimum number of reads mapping to a position. Furthermore, the plot can be dragged and zoomed to visualize the exact sequences. The number of reads sharing a displayed sequence is shown to the left of the pileup plot (e.g. x1234 is a read occurring 1,234 times).

The read distribution on the precursor/stem-loop is visualized showing the total number of sequences mapping at each position. The distribution plot can be displayed either as bar plot or as area plot.

Overall report (example)

The report is divided into six categories and is structured similarly to the single sample report. All plots and tables can be downloaded in multiple formats (png, svg, pdf, csv, xlsx). For all categories except the first, at least an overview table is shown. A detailed per-sample table is available for the same categories as well; however, it is hidden by default to allow faster page loading. The overview table provides the median, mean, standard deviation and coefficient of variation for each reported measure. If groups were specified, the measures are provided for each group.
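
The measures in the overview table can be reproduced from a downloaded per-sample table with a few lines of Python. The sketch below assumes one numeric value per sample and uses the sample standard deviation; whether the report uses the sample or the population formula is not stated here, so this is an assumption, as are the function names and example values.

```python
import statistics


def summarize(values):
    """Median, mean, standard deviation and coefficient of variation."""
    mean = statistics.fmean(values)
    # The sample standard deviation needs at least two values.
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return {
        "median": statistics.median(values),
        "mean": mean,
        "sd": sd,
        "cv": sd / mean if mean else float("nan"),
    }


def summarize_by_group(value_by_sample, group_by_sample):
    """Compute the same measures separately for each annotated group."""
    grouped = {}
    for sample, value in value_by_sample.items():
        grouped.setdefault(group_by_sample[sample], []).append(value)
    return {group: summarize(values) for group, values in grouped.items()}


values = {"s1": 10, "s2": 20, "s3": 30, "s4": 5}
groups = {"s1": "AD", "s2": "AD", "s3": "AD", "s4": "control"}
summary = summarize_by_group(values, groups)
```

In this made-up example the AD group has a median and mean of 20, a standard deviation of 10 and a coefficient of variation of 0.5.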