LIRBase: a comprehensive collection of long inverted repeats in eukaryotes

LIRBase houses a total of 6,789,791 long inverted repeats (LIR, longer than 800 nt) in 427 eukaryotic genomes and provides various functionalities to facilitate the annotation and functional studies of LIRs and their derived small RNAs.

Statistics

6,789,791 long inverted repeats

427 eukaryotic genomes

374 species

LIRs in 77 invertebrate metazoa genomes

LIRs in 142 plant genomes

LIRs in 208 vertebrate genomes

Functionalities of LIRBase

Browse LIRBase by species/genomes

Search LIRBase by genomic locations

Search LIRBase by the identifiers of LIRs

Search LIRBase by sequence similarity using BLAST

Detect and annotate long inverted repeats in user-uploaded DNA sequences

Identify candidate LIRs encoding long hpRNAs by aligning sRNA sequencing data to LIRs

Differential expression analysis of LIRs or small RNAs between different biological samples/tissues

Identify protein-coding genes targeted by the small RNAs derived from a LIR

Predict and visualize the secondary structure of potential long hpRNA encoded by a LIR

Browse long inverted repeats identified in 77 Invertebrate metazoa genomes

Species
LIRs of Invertebrate metazoa

Browse long inverted repeats identified in 142 Plant genomes

Species
LIRs of Plant

Browse long inverted repeats identified in 208 Vertebrate genomes

Species
LIRs of Vertebrate

Search by genomic location

Choose genome

Choose chromosome


Structure of LIRs in the search result

Sequence of LIRs in the search result

Search result

Search by LIR identifier

Input
Output

Input LIR identifiers

Choose genome


Structure of LIRs in the search result

Sequence of LIRs in the search result

Search result

Search LIRBase by sequence similarity using BLAST

Input
Output

Paste or upload input data?

Input sequence

Upload file


Example BLAST input data

Choose BLAST databases

E-value cutoff

Maximum no. of hits


BLAST result

Structure of LIRs in the BLAST result

Sequence of LIRs in the BLAST result

BLAST output

Download structure of predicted LIRs

Download sequence of predicted LIRs

Identify candidate LIRs encoding long hpRNAs by aligning sRNA sequencing data to LIRs

Input
Output

Input sRNA read count data

Paste sRNA read count

Upload sRNA read count file


Example input data

Choose a LIR database to align the sRNA data

Max number of alignment hits of sRNA

Max number of mismatches allowed


sRNA alignment summary

sRNA alignment result

sRNA read count of aligned LIRs

Alignment summary of sRNA sequencing data to each LIR

Differentially expressed LIRs/sRNAs

Predicted targets of small RNAs encoded by a LIR

Alignment of small RNAs against their targets

Download LIR secondary structure in PDF file

Why LIR and LIRBase?

Long inverted repeat, long hpRNA and siRNA

An inverted repeat is a single-stranded nucleotide sequence followed downstream by its reverse complement. The intervening sequence between the initial sequence and its reverse complement can be of any length, including zero. When transcribed, a long inverted repeat can form a long hairpin RNA (hpRNA), which is much longer than typical animal or plant pre-miRNAs.
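The definition above can be illustrated with a minimal Python sketch. It only handles perfect inverted repeats; real LIRs tolerate mismatches and indels between the arms, and this is not the algorithm used by IRF. The function names are illustrative only.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_inverted_repeat(arm: str, spacer: str = "") -> str:
    """Join an arm, an optional spacer, and the reverse complement of the arm.

    The spacer corresponds to the intervening sequence and may be empty.
    """
    return arm + spacer + reverse_complement(arm)


def is_perfect_inverted_repeat(seq: str, arm_len: int) -> bool:
    """Check that the last arm_len bases are the reverse complement
    of the first arm_len bases."""
    if arm_len <= 0 or 2 * arm_len > len(seq):
        return False
    return seq[-arm_len:] == reverse_complement(seq[:arm_len])
```

For example, `build_inverted_repeat("AAGC", "TT")` places a 2-nt spacer between the arm `AAGC` and its reverse complement `GCTT`.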

Henderson et al. first reported the biogenesis of small interfering RNAs (siRNAs) from a long inverted repeat in Arabidopsis thaliana (Henderson et al. 2006 Nature Genetics). This siRNA biogenesis pathway was soon verified in Drosophila (Czech et al. 2008 Nature). In 2008, Okamura et al. systematically characterized the genes and mechanisms underlying the biogenesis of 21-22-nucleotide siRNAs from long hpRNAs encoded by LIRs in Drosophila (Okamura et al. 2008 Nature). They found that Dicer-2, Hen1 and Argonaute 2 play vital roles in this pathway. The pathway was later characterized further in Arabidopsis (Dunoyer et al. 2010 EMBO J).

LIRs can act as functional genomic elements in eukaryotic genomes.
The following image shows a typical long inverted repeat and the small RNAs derived from it, as analyzed with LIRBase.

siRNAs derived from long inverted repeats play important biological roles

In 2018, Lin et al. identified two long hpRNAs in Drosophila simulans that are processed into 21-nt siRNAs (Tao et al. 2007a PLOS Biology; Tao et al. 2007b PLOS Biology; Lin et al. 2018 Developmental Cell). These siRNAs repress the expression of the Dox and MDox genes, which promote X chromosome transmission by suppressing Y-bearing sperm. As a result, the two long hpRNAs and their derived siRNAs are critical for maintaining a balanced sex ratio in the offspring of Drosophila simulans.

The biological functions of siRNAs derived from long inverted repeats have also been reported in other animals and in plants in recent years.

In mouse, siRNAs derived from LIRs were reported to regulate gene expression in oocytes (Tam et al. 2008 Nature; Watanabe et al. 2008 Nature).

In Drosophila, another hpRNA and the derived siRNAs were reported to regulate testis gene expression and control male fertility (Wen et al. 2015 Molecular Cell).

In apple, a long hpRNA and the generated siRNAs contributed to the resistance of apple to leaf spot disease (Zhang et al. 2018 Plant Cell).

In soybean, a long hpRNA and the derived 22-nt siRNAs regulate the seed coat color of soybean (Tuteja et al. 2009 Plant Cell; Cho et al. 2013 PLOS ONE; Jia et al. 2020 Plant Cell).

In rice, we previously found that several LIRs were present in one parental genome of an elite hybrid but absent from the other parental genome (Yao et al. 2020 Computational and Structural Biotechnology Journal). As a result, siRNAs derived from these LIRs were detected in only one parent. The association between the LIRs and the siRNAs was further verified in an F2 population derived from a self-cross of the elite hybrid.

Comprehensive genome-wide identification of LIRs and long hpRNAs in eukaryotic genomes is urgently needed

In 2013, Axtell called for the comprehensive genome-wide identification and annotation of long inverted repeats and long hpRNAs (Axtell et al. 2013 Annual Review of Plant Biology). However, genome-wide identification and annotation of long inverted repeats have been conducted in only a few organisms. Until now, no database or web server has existed for the annotation and analysis of long inverted repeats and long hpRNAs.

Using Inverted Repeats Finder (IRF) (Warburton et al. 2004 Genome Research), we identified a total of 6,789,791 long inverted repeats in the whole genomes of 427 eukaryotes, including 297,317 LIRs in 77 invertebrate metazoa genomes, 1,902,296 LIRs in 142 plant genomes and 4,590,178 LIRs in 208 vertebrate genomes. We required a minimum length of 400 nt for each arm of a long inverted repeat identified by IRF, to remove potential miniature inverted-repeat transposable elements (MITEs) and Alu elements from the IRF results.
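The arm-length requirement can be sketched as a simple filter. This is an illustration only: the record type, its field names and the use of 1-based inclusive coordinates are assumptions, and parsing of the actual IRF output is not shown.

```python
from typing import NamedTuple


class InvertedRepeat(NamedTuple):
    """Arm coordinates of one inverted repeat (assumed 1-based, inclusive)."""
    left_start: int
    left_end: int
    right_start: int
    right_end: int


MIN_ARM_LEN = 400  # minimum arm length in nt, as stated in the text


def arm_lengths(ir: InvertedRepeat) -> tuple:
    """Return the lengths of the left and right arms."""
    return (ir.left_end - ir.left_start + 1,
            ir.right_end - ir.right_start + 1)


def is_long_inverted_repeat(ir: InvertedRepeat,
                            min_arm_len: int = MIN_ARM_LEN) -> bool:
    """Keep an inverted repeat only if both arms reach the minimum length."""
    left, right = arm_lengths(ir)
    return left >= min_arm_len and right >= min_arm_len
```

Because both arms must be at least 400 nt, every retained repeat spans at least 800 nt, which matches the "longer than 800 nt" definition of a LIR.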

Nomenclature of a long inverted repeat in LIRBase

Each long inverted repeat has a unique identifier in LIRBase, determined by the species name and several features of the LIR: the chromosome ID, the start and end coordinates of the left arm, and the start and end coordinates of the right arm.

Please note that the sequence of a LIR in LIRBase is composed of the left arm sequence, the loop sequence and the right arm sequence, as well as two 200-bp sequences flanking the LIR (the left flanking sequence and the right flanking sequence). The genomic coordinates of both arms of the LIR are reflected in the identifier of the LIR, whereas the flanking sequences are not.
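The layout of a stored LIR sequence can be worked out from the arm coordinates, as in the sketch below. It assumes 1-based inclusive coordinates and full-length flanks; how LIRBase handles LIRs closer than 200 bp to a chromosome end is not stated in the text, so that case is not modeled. The function names are illustrative.

```python
FLANK_LEN = 200  # length of each flanking sequence in bp, as stated in the text


def lir_segment_lengths(left_start, left_end, right_start, right_end,
                        flank_len=FLANK_LEN):
    """Return the length of each segment of a stored LIR sequence, in order."""
    if not (left_start <= left_end < right_start <= right_end):
        raise ValueError("arm coordinates must be ordered from left to right")
    return {
        "left_flank": flank_len,
        "left_arm": left_end - left_start + 1,
        "loop": right_start - left_end - 1,  # zero when the arms are adjacent
        "right_arm": right_end - right_start + 1,
        "right_flank": flank_len,
    }


def stored_sequence_length(left_start, left_end, right_start, right_end,
                           flank_len=FLANK_LEN):
    """Total length of the stored sequence: flanks, arms and loop."""
    segments = lir_segment_lengths(left_start, left_end, right_start,
                                   right_end, flank_len)
    return sum(segments.values())
```

With arms at 1001-1500 and 1601-2100, the loop is 100 nt and the stored sequence is 1,500 nt long, of which only the 1,100 nt between the outer arm coordinates are described by the identifier.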

References

Axtell et al. (2013), Classification and Comparison of Small RNAs from Plants, Annual Review of Plant Biology
Cho et al. (2013), The Transition from Primary siRNAs to Amplified Secondary siRNAs That Regulate Chalcone Synthase During Development of Glycine max Seed Coats, PLOS ONE
Czech et al. (2008), An endogenous small interfering RNA pathway in Drosophila, Nature
Dunoyer et al. (2010), An endogenous, systemic RNAi pathway in plants, EMBO J (Note: This article has been retracted due to image irregularities, although the authors consider that its core conclusions remain valid.)
Henderson et al. (2006), Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning, Nature Genetics
Jia et al. (2020), Soybean DICER-LIKE2 Regulates Seed Coat Color via Production of Primary 22-Nucleotide Small Interfering RNAs from Long Inverted Repeats, Plant Cell
Lin et al. (2018), The hpRNA/RNAi Pathway Is Essential to Resolve Intragenomic Conflict in the Drosophila Male Germline, Developmental Cell
Okamura et al. (2008), The Drosophila hairpin RNA pathway generates endogenous short interfering RNAs, Nature
Tam et al. (2008), Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes, Nature
Tao et al. (2007a), A sex-ratio Meiotic Drive System in Drosophila simulans. I: An Autosomal Suppressor, PLOS Biology
Tao et al. (2007b), A sex-ratio Meiotic Drive System in Drosophila simulans. II: An X-linked Distorter, PLOS Biology
Tuteja et al. (2009), Endogenous, Tissue-Specific Short Interfering RNAs Silence the Chalcone Synthase Gene Family in Glycine max Seed Coats, Plant Cell
Warburton et al. (2004), Inverted Repeat Structure of the Human Genome: The X-Chromosome Contains a Preponderance of Large, Highly Homologous Inverted Repeats That Contain Testes Genes, Genome Research
Watanabe et al. (2008), Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes, Nature
Wen et al. (2015), Adaptive Regulation of Testis Gene Expression and Control of Male Fertility by the Drosophila Hairpin RNA Pathway, Molecular Cell
Yao et al. (2020), Features of sRNA biogenesis in rice revealed by genetic dissection of sRNA expression level, Computational and Structural Biotechnology Journal
Zhang et al. (2018), A Single-Nucleotide Polymorphism in the Promoter of a Hairpin RNA Contributes to Alternaria alternata Leaf Spot Resistance in Apple (Malus × domestica), Plant Cell

Tutorial of LIRBase

LIRBase is a database with a comprehensive collection of long inverted repeats in 427 eukaryotic genomes.

Using IRF (https://tandem.bu.edu/irf/irf.download.html), we identified a total of 6,789,791 long inverted repeats in the whole genomes of 427 eukaryotes, including 297,317 LIRs in 77 invertebrate metazoa genomes, 1,902,296 LIRs in 142 plant genomes and 4,590,178 LIRs in 208 vertebrate genomes. LIRBase is deployed at https://venyao.xyz/lirbase/ for online use.

The homepage of LIRBase displays the main functionalities of LIRBase (Figure 1).

Browse long inverted repeats (LIR) identified in 427 eukaryotic genomes for the sequences, structures of LIRs, and the overlaps between LIRs and genes.

Search LIRBase for long inverted repeats in a specific genome by genomic locations.

Search LIRBase for long inverted repeats in a specific genome by the identifiers of long inverted repeats.

Search LIRBase by sequence similarity using BLAST.

Detect and annotate long inverted repeats in user-uploaded DNA sequences.

Align small RNA sequencing data to long inverted repeats of a specific genome to detect the origination of small RNAs from long inverted repeats and quantify the expression level of long inverted repeats.

Perform differential expression analysis of long inverted repeats or small RNAs between different biological samples/tissues.

Identify protein-coding genes targeted by the small RNAs derived from a LIR through detecting the complementary matches between small RNAs and the cDNA sequence of protein-coding genes.

Predict and visualize the secondary structure of potential hpRNA encoded by a LIR using RNAfold.

Figure 1. The homepage of LIRBase.

1. Browse LIRBase for long inverted repeats identified in 427 eukaryotic genomes

The images and the species names of 77 invertebrate metazoa genomes are listed in the Species panel of the Invertebrate metazoa submenu under the Browse menu (Figure 2). The images and the species names of 142 plant genomes are listed in the Species panel of the Plant submenu under the Browse menu. The images and the species names of 208 vertebrate genomes are listed in the Species panel of the Vertebrate submenu under the Browse menu.

Figure 2. Species name and images of 77 Invertebrate metazoa genomes listed in the Species panel of the Invertebrate metazoa submenu under the Browse menu.

Clicking the image or the species name of any genome takes you to the LIRs of Invertebrate metazoa panel of the Invertebrate metazoa submenu under the Browse menu, which displays all the LIRs identified in the selected genome (Figure 3). This panel shows a brief summary of all the LIRs identified in the selected genome and a table listing the structure of each LIR. Three buttons are displayed below the table when any row of the table is clicked (Figure 3).

Figure 3. List of all the LIRs identified for a selected genome.

The three buttons below the table can be clicked to display the sequence, structure of the selected LIR and the overlaps between the selected LIR and genes, respectively (Figure 4).

Figure 4. Detailed information of a selected LIR.

2. Search LIRBase by genomic locations

LIRBase allows searching for LIRs identified in any of the 427 eukaryotic genomes by genomic locations (Figure 5).

Figure 5. The Search by genomic location submenu under the Search menu.

The detailed steps to search LIRBase by genomic locations are shown in Figure 6. The search results are displayed as a data table (Figure 6). Each row of the data table represents a LIR in the search result. Three buttons are displayed below the table when any row of the table is selected. The detailed information of a LIR, including its sequence, its structure and its overlaps with genes, can be viewed by clicking the three buttons (Figure 6).

Figure 6. Steps to search LIRBase by genomic location.
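Conceptually, a search by genomic location is an interval query. The sketch below returns every LIR that overlaps the query region; whether LIRBase reports overlapping LIRs or only LIRs fully contained in the region is not stated in the text, so the overlap rule is an assumption, as are the record layout and the placeholder identifiers.

```python
def overlaps(lir_start, lir_end, query_start, query_end):
    """True if two closed intervals share at least one position."""
    return lir_start <= query_end and query_start <= lir_end


def search_by_location(lirs, chrom, start, end):
    """Return the LIR records on chrom that overlap the region start..end.

    lirs is an iterable of (lir_id, chrom, start, end) tuples, a
    hypothetical layout used only for this illustration.
    """
    return [rec for rec in lirs
            if rec[1] == chrom and overlaps(rec[2], rec[3], start, end)]


# Placeholder records; these are not real LIRBase identifiers.
example_lirs = [
    ("lir_a", "chr1", 100, 2000),
    ("lir_b", "chr1", 5000, 7000),
    ("lir_c", "chr2", 100, 2000),
]
hits = search_by_location(example_lirs, "chr1", 1500, 5000)
```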

3. Search LIRBase by the identifiers of LIRs

LIRBase allows searching for LIRs identified in any of the 427 eukaryotic genomes by the identifiers (IDs) of long inverted repeats (Figure 7).

Figure 7. The Search by LIR identifier submenu under the Search menu.

The detailed steps to search LIRBase by the identifiers of LIRs are shown in Figure 8.

Figure 8. Steps to search LIRBase by LIR identifiers.

After the Search button in the Input panel shown in Figure 8 is clicked, the results are displayed as a data table in the Output panel (Figure 9). The results can also be downloaded by clicking the download buttons at the top of the data table. Each row of the data table represents a LIR in the search result. Three buttons are displayed below the table when any row of the table is clicked. The detailed information of a LIR, including its sequence, its structure and its overlaps with genes, can be viewed by clicking the three buttons (Figure 9).

Figure 9. The Output panel of the Search by LIR identifier submenu.

4. Search LIRBase using BLAST

Users can also search LIRBase by sequence similarity using BLAST (Figure 10). A graphical interface was implemented in LIRBase for users to perform BLAST alignments through the NCBI BLAST+ program. A BLASTN database was constructed for the LIRs identified in each of the 427 eukaryotic genomes. Users can choose to BLAST against one or more of these databases. The detailed steps to perform BLAST in LIRBase are shown in Figure 10.

Figure 10. Steps to BLAST in LIRBase.

Once the BLAST alignment is finished, you are taken to the Output panel of the Blast menu, which displays the BLAST result in a data table (Figure 11). The complete BLAST results can be downloaded by clicking the download buttons at the top of the data table. Each row of the data table represents a BLAST hit. Clicking a row of this table displays the detailed information of the selected BLAST hit, including the alignment between the query sequence and the subject LIR sequence in the BLAST database. Three buttons are also displayed below the table when any row of the table is clicked. The structure and sequence of the LIR in this BLAST hit, and the overlaps between this LIR and genes in the corresponding genome, can be viewed by clicking the three buttons (Figure 11).

Figure 11. The Output panel of the Blast submenu.

5. Annotate long inverted repeats in user-uploaded DNA sequences

The software IRF (https://tandem.bu.edu/irf/irf.download.html) was used to identify long inverted repeats in the 427 eukaryotic genomes collected in LIRBase. IRF can only be used from the command line. We implemented a graphical interface for users to annotate long inverted repeats in user-uploaded DNA sequences with IRF (Figure 12). The detailed steps to annotate LIRs in user-uploaded DNA sequences are shown in Figure 12. The input DNA sequences for IRF can be pasted into a text area or uploaded from a local text file. The input data must be DNA sequences in FASTA format. Each sequence should have a unique ID starting with “>”.

Figure 12. The Annotate menu of LIRBase to annotate LIRs in user-uploaded DNA sequences.

The sequences and structures of LIRs identified by IRF can be downloaded as text files (Figure 12). The results of IRF are listed in a data table (Figure 12). Each row shows the structure of an identified long inverted repeat. Two buttons are displayed below the table when any row of the table is clicked. The sequence and structure of the selected LIR can be viewed by clicking the two buttons (Figure 12).
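The input requirements for this module (FASTA format, a unique ID on each header line) can be checked before uploading with a short script such as the sketch below. The accepted alphabet (A, C, G, T, N) and the rule that the ID is the first word of the header are assumptions; LIRBase may apply different checks.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence ID to sequence.

    Raises ValueError for a missing header, an empty or duplicated ID,
    an empty sequence, or characters outside A, C, G, T and N.
    """
    chunks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty sequence ID")
            current = parts[0]  # ID is taken as the first word of the header
            if current in chunks:
                raise ValueError("duplicate sequence ID: " + current)
            chunks[current] = []
        elif current is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks[current].append(line)

    records = {}
    for seq_id, parts in chunks.items():
        seq = "".join(parts).upper()
        if not seq:
            raise ValueError("no sequence for ID: " + seq_id)
        if set(seq) - set("ACGTN"):
            raise ValueError("non-DNA characters in sequence: " + seq_id)
        records[seq_id] = seq
    return records
```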

6. Identify candidate LIRs encoding long hpRNAs by aligning sRNA sequencing data to LIRs

When transcribed, a long inverted repeat can form a long hairpin RNA (hpRNA), which is much longer than typical animal or plant pre-miRNAs. Henderson et al. (2006) first reported the biogenesis of small interfering RNAs (siRNAs) from a long inverted repeat in Arabidopsis thaliana. This siRNA biogenesis pathway was soon reported and verified in other plants and in animals.

To facilitate the annotation of small RNAs derived from LIRs, we implemented a function in LIRBase that aligns user-uploaded small RNA sequencing data to all the identified LIRs of a genome with Bowtie (Figure 13). The input should be small RNA read counts rather than raw small RNA sequencing data, as shown in Figure 13. The read count data can be pasted into the text area provided or uploaded from a local text file.

Figure 13. The Quantify submenu under the Expression menu of LIRBase to align small RNA sequencing data to a LIR database.
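Read counts are obtained by collapsing identical reads. The sketch below shows the idea, assuming adapter-trimmed reads are already available as plain strings. The tab-separated "sequence, count" output is an assumption made for illustration; the example input data provided in LIRBase shows the exact format expected.

```python
from collections import Counter


def collapse_reads(reads):
    """Count how many times each distinct small RNA sequence occurs.

    Reads are upper-cased and blank entries are skipped.
    """
    return Counter(r.strip().upper() for r in reads if r.strip())


def format_read_counts(counts):
    """Format counts as tab-separated lines, most abundant sequence first.

    Ties are broken alphabetically so that the output is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(seq + "\t" + str(n) for seq, n in ordered)
```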

After the Align! button is clicked, the alignment is performed by Bowtie. The alignment results are displayed in the Output panel of the Quantify submenu (Figure 14). The detailed alignment result, the summary of the alignment and the sRNA read count of aligned LIRs can be downloaded. In addition, the alignment summary for all aligned LIRs can be viewed as a data table. Clicking a single row of the data table plots the size distribution of sRNAs and the alignment of sRNAs to the LIR. The detailed information of the chosen LIR is displayed by clicking the four buttons below the table of sRNA alignment summary.

Figure 14. The Output panel of the Quantify submenu of LIRBase.

For each LIR in the data table of sRNA alignment summary, the following information is displayed in separate columns.

The number of sRNAs aligned to the LIR.

The number of sRNA sequencing reads aligned to the LIR.

The percentage of 21-nt and 22-nt sRNAs among all sRNAs aligned to the LIR.

The percentage of 24-nt sRNAs among all sRNAs aligned to the LIR.

The percentage of sRNAs aligned to the arms of the LIR among all sRNAs aligned to the LIR.

The percentage of sRNAs aligned to the loop of the LIR among all sRNAs aligned to the LIR.

The percentage of sRNAs aligned to the flanking sequences of the LIR among all sRNAs aligned to the LIR.

At the top of this table, we can set thresholds on the different columns to identify candidate LIRs encoding long hpRNAs. For example, we can identify LIRs encoding candidate long hpRNAs in the genome of Minghui 63 with the following criteria: (1) at least 90 sRNAs aligned to the LIR, (2) at least 80% of sRNAs aligned to the arms of the LIR, and (3) at least 50% of sRNAs being 21 or 22 nt long (Figure 15).

Figure 15. Set the values of different columns of the table of sRNA alignment summary to identify LIRs encoding candidate long hpRNAs.
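The same three thresholds can be applied offline to a downloaded alignment summary. In the sketch below, the dictionary keys (`n_srnas`, `pct_arms`, `pct_21_22`) are hypothetical stand-ins for the corresponding columns of the summary table, and the identifiers and values are made up for illustration.

```python
def is_candidate_hprna_lir(row, min_srnas=90, min_pct_arms=80.0,
                           min_pct_21_22=50.0):
    """Apply the three example thresholds from the text to one summary row."""
    return (row["n_srnas"] >= min_srnas
            and row["pct_arms"] >= min_pct_arms
            and row["pct_21_22"] >= min_pct_21_22)


# Made-up summary rows; the keys are hypothetical column names.
summary = [
    {"lir": "lir_a", "n_srnas": 150, "pct_arms": 92.0, "pct_21_22": 61.5},
    {"lir": "lir_b", "n_srnas": 40, "pct_arms": 95.0, "pct_21_22": 70.0},
    {"lir": "lir_c", "n_srnas": 300, "pct_arms": 55.0, "pct_21_22": 80.0},
    {"lir": "lir_d", "n_srnas": 90, "pct_arms": 80.0, "pct_21_22": 50.0},
]
candidates = [row["lir"] for row in summary if is_candidate_hprna_lir(row)]
```

Here `lir_b` fails the sRNA count threshold and `lir_c` fails the arm percentage threshold, so only `lir_a` and `lir_d` are retained.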

7. Differential expression analysis of long inverted repeats and small RNAs

By aligning small RNA sequencing data to LIRBase, we can obtain the small RNA read count for each LIR in a genome. With multiple biological samples/tissues, we can perform differential expression analysis of long inverted repeats between different biological samples/tissues (Figure 16). The R package DESeq2 (http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) was utilized to perform differential expression analysis. A read count matrix and a sample information table are required as input data for the differential expression analysis. The sample names in the count matrix and the sample names in the information table must be in the same order. Check the example data provided by LIRBase for the format of a sample information table.

Figure 16. The DESeq submenu under the Expression menu of LIRBase to perform differential expression analysis of LIRs/sRNAs.
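Because the sample names must appear in the same order in both inputs, it can help to check this before uploading. The following sketch assumes the sample names have already been extracted as two lists, one from the header of the count matrix and one from the sample information table; the function name is illustrative.

```python
def check_sample_order(count_matrix_samples, sample_table_samples):
    """Return True if both inputs list the same samples in the same order.

    Raises ValueError with a message that distinguishes a pure ordering
    problem from a mismatch in the sample names themselves.
    """
    counts = list(count_matrix_samples)
    info = list(sample_table_samples)
    if counts == info:
        return True
    if sorted(counts) == sorted(info):
        raise ValueError("same samples, but listed in a different order")
    raise ValueError("sample names differ between the two inputs")
```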

The results of DESeq2 can be downloaded as a plain text file or can be viewed in a data table (Figure 16). In addition, the MA-plot and the volcano plot showing the identified differentially expressed LIRs/sRNAs are also generated. A heatmap displaying the sample-to-sample distances can be viewed by clicking the button at the bottom of the sidebar panel under the DESeq submenu.

8. Predict mRNA targets of small RNAs encoded by a LIR

An analysis module was implemented to predict the mRNA targets of small RNAs encoded by a LIR by detecting complementary matches between the small RNAs and the cDNA sequences of protein-coding genes. The input data should be all the small RNAs encoded by a LIR, either in FASTA format or as sequences only (Figure 17). The small RNA sequences are then aligned to the cDNA sequences of a specific genome by Bowtie, and the alignments are processed to identify complementary matches between the small RNAs and the cDNA sequences. An example output is shown in Figure 17.

Figure 17. The Target menu of LIRBase.
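The notion of a complementary match can be illustrated with a simplified sketch that looks for perfectly complementary sites only, using plain string search. This is not the Bowtie-based procedure used by LIRBase, and it ignores mismatches, which a real alignment may allow.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_perfect_target_sites(srna, cdna):
    """Return 0-based start positions in cdna that are perfectly
    complementary to srna (overlapping sites included).

    A small RNA pairs with the reverse complement of its own sequence,
    so the search string is the reverse complement of the small RNA.
    RNA input is accepted by converting U to T first.
    """
    site = reverse_complement(srna.upper().replace("U", "T"))
    target = cdna.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions
```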

9. Predict and visualize the secondary structure of the potential hpRNA encoded by a LIR

We used the RNAfold software to predict and visualize the secondary structure of the potential hpRNA encoded by a LIR (Figure 18). The input data should be the DNA sequence of a single LIR. The secondary structure in dot-bracket notation is displayed in the main panel. The secondary structure can be viewed as a PNG image by clicking the button at the top of the main panel. A high-resolution image of the predicted secondary structure can be downloaded as a PDF file.

Figure 18. The Visualize menu of LIRBase.
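In dot-bracket notation, matching parentheses mark paired bases and dots mark unpaired bases. A minimal parser for the basic notation is sketched below; it assumes only the characters `(`, `)` and `.` are present and does not handle extended notations (for example, additional bracket types for pseudoknots).

```python
def dot_bracket_pairs(structure):
    """Return the base pairs encoded by a dot-bracket string as a sorted
    list of (i, j) tuples with 0-based positions and i < j."""
    stack = []
    pairs = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError("unexpected character %r" % char)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def paired_fraction(structure):
    """Fraction of positions that take part in a base pair."""
    if not structure:
        return 0.0
    return 2 * len(dot_bracket_pairs(structure)) / len(structure)
```

A long hpRNA is expected to show a high paired fraction along its arms, with the unpaired positions concentrated in the loop.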

10. Information of 427 genomes collected in LIRBase

The information of 427 genomes collected in LIRBase is displayed in the Download menu of LIRBase (Figure 19).

Figure 19. The Download menu of LIRBase.

11. Download LIRs identified in 427 eukaryotic genomes, and the corresponding BLAST/Bowtie index database

In addition to being used online at https://venyao.xyz/lirbase/, LIRBase can be deployed on a personal local or web Linux server. Deployment of LIRBase is platform independent, i.e., LIRBase can be deployed on any platform with the R environment available. The detailed steps are described in the Installation submenu under the Help menu of LIRBase (Figure 20).

Figure 20. The Installation submenu under the Help menu of LIRBase.

The source code of LIRBase is deposited in GitHub (https://github.com/venyao/LIRBase). Because the files of identified LIRs and the corresponding BLAST/Bowtie databases of the 427 eukaryotic genomes are too large, these datasets were not deposited in GitHub. Instead, they can be downloaded from https://venyao.xyz/lirbase/ through the Download menu (Figure 21).

Figure 21. The Download menu of LIRBase.

12. About LIR and LIRBase

The definition of a long inverted repeat, the biogenesis pathway of siRNAs from long inverted repeats and the biological roles of the siRNAs generated in this pathway are described in the About submenu under the Help menu of LIRBase (Figure 22). These findings imply that a platform for comprehensive annotation and analysis of siRNAs derived from long inverted repeats is urgently needed.

Figure 22. The About submenu under the Help menu of LIRBase.

LIRBase

A total of 427 eukaryotic genomes were collected and the long inverted repeats (LIR, longer than 800 nt) in these genomes were systematically identified. The following functionalities are implemented in LIRBase.

Browse LIRs identified in 427 eukaryotic genomes for the sequences, structures of LIRs, and the overlaps between LIRs and genes.

Search LIRBase for LIRs in a specific genome by genomic locations.

Search LIRBase for LIRs in a specific genome by the identifiers of LIRs.

Search LIRBase by sequence similarity using BLAST.

Detect and annotate LIRs in user-uploaded DNA sequences.

Align small RNA sequencing data to LIRs of a specific genome to detect the origination of small RNAs from LIRs and quantify the expression level of small RNAs and LIRs.

Perform differential expression analysis of LIRs or small RNAs between different biological samples/tissues.

Identify protein-coding genes targeted by the small RNAs derived from a LIR through detecting the complementary matches between small RNAs and the cDNA sequence of protein-coding genes.

Predict and visualize the secondary structure of potential hpRNA encoded by a LIR using RNAfold.

Use LIRBase online

LIRBase is deployed at venyao.xyz/lirbase/ for online use.

Deploy LIRBase on local or web Linux server

Step 1: Install R

Please check CRAN (cran.r-project.org) for the installation of R.

Step 2: Install the R Shiny package and other packages required by LIRBase

Start an R session and run these lines in R:

# try an http CRAN mirror if https CRAN mirror doesn't work  
install.packages("data.table")
install.packages("DT")
install.packages("ggplot2")
# grid ships with base R, so it does not need to be installed from CRAN
install.packages("gridExtra")
install.packages("htmlwidgets")
install.packages("pheatmap")
install.packages("RColorBrewer")
install.packages("shiny")
install.packages("shinyBS")
install.packages("shinycssloaders")
install.packages("shinydashboard")
install.packages("shinydisconnect")
install.packages("shinyjqui")
install.packages("shinyWidgets")
install.packages("stringr")
install.packages("tidyr")
install.packages("dplyr")
install.packages("XML")

install.packages("BiocManager")
BiocManager::install("apeglm")
BiocManager::install("Biostrings")
BiocManager::install("DESeq2")
BiocManager::install("GenomicRanges")

# install shinysky
install.packages("devtools")
devtools::install_github("venyao/ShinySky", force=TRUE)

For more information, please check the following pages:
cran.r-project.org/web/packages/shiny/index.html
github.com/rstudio/shiny
shiny.rstudio.com

Step 3: Install Shiny-Server

Please check the following pages for the installation of shiny-server.
rstudio.com/products/shiny/download-server/
github.com/rstudio/shiny-server/wiki/Building-Shiny-Server-from-Source

Step 4: Install BLAST+

Download and install BLAST+ on your system PATH. Check opensource.com/article/17/6/set-path-linux for the setting of system PATH in Linux.
Please check blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download for the download and installation of BLAST+.

Step 5: Install Bowtie

Download and install Bowtie on your system PATH. Check opensource.com/article/17/6/set-path-linux for the setting of system PATH in Linux.
Please check bowtie-bio.sourceforge.net/index.shtml and github.com/BenLangmead/bowtie for the download and installation of Bowtie.
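After Steps 4 and 5, a quick check can confirm that the command-line tools are visible on the system PATH. The sketch below uses Python's standard library. `blastn` and `makeblastdb` are executables shipped with BLAST+, and `bowtie` and `bowtie-build` are shipped with Bowtie; which of them LIRBase actually calls is an assumption here.

```python
import shutil


def find_tools(names=("blastn", "makeblastdb", "bowtie", "bowtie-build")):
    """Map each executable name to its full path, or None if it is not
    found on the system PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=("blastn", "makeblastdb", "bowtie", "bowtie-build")):
    """Return the names of the executables that are not on the PATH."""
    return [name for name, path in find_tools(names).items() if path is None]
```

Calling `missing_tools()` on the server should return an empty list once both packages are installed; note that the Shiny Server user (`shiny` in Step 7) must see the same PATH.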

Step 6: Upload source files of LIRBase

Put the directory containing the code and data of LIRBase in /srv/shiny-server.
The BLASTN database files downloaded from the Data menu of LIRBase should be placed in the LIRBase_blastdb directory under the www directory of LIRBase.
The Bowtie index files downloaded from the Data menu of LIRBase should be placed in the LIRBase_bowtiedb directory under the www directory of LIRBase.
The Inverted_repeat_structure files downloaded from the Data menu of LIRBase should be placed in the Table directory under the www directory of LIRBase.
The Inverted_repeat_sequence files downloaded from the Data menu of LIRBase should be placed in the Fasta directory under the www directory of LIRBase.
The IRF_stem_alignment files downloaded from the Data menu of LIRBase should be placed in the HTML directory under the www directory of LIRBase.

Step 7: Configure shiny server (/etc/shiny-server/shiny-server.conf)

# Define the user to spawn R Shiny processes
run_as shiny;

# Define a top-level server which will listen on a port
server {  
  # Use port 3838  
  listen 3838;  
  # Define the location available at the base URL  
  location /lirbase {  
    # Directory containing the code and data of LIRBase  
    app_dir /srv/shiny-server/LIRBase;  
    # Directory to store the log files  
    log_dir /var/log/shiny-server;  
  }  
}

Step 8: Change the owner of the LIRBase directory

$ chown -R shiny /srv/shiny-server/LIRBase

Step 9: Start Shiny-Server

$ sudo systemctl start shiny-server  # systemd-based systems; on Upstart systems use: sudo start shiny-server

Now, the LIRBase app is available at http://IPAddressOfTheServer:3838/lirbase/, the location defined in Step 7 (remember to replace IPAddressOfTheServer with the actual IP address of your Linux server).

LIRBase

Source code: github.com/venyao/LIRBase
Online use: venyao.xyz/lirbase/
Contact: yaowen@henau.edu.cn

IRF: Inverted Repeats Finder, a program to identify inverted repeats in the genome.
P-MITE: Plant miniature inverted-repeat transposable elements (MITEs) Database.
Gramene: a curated, open-source, integrated data resource for comparative functional genomics in crops and model plant species.
Ensembl: a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation.
Ensembl Metazoa: a genome browser for metazoa genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation.
Bowtie: an ultrafast, memory-efficient short read aligner.
DESeq2: Differential gene expression analysis based on the negative binomial distribution.
ViennaRNA: stand-alone programs for the prediction and comparison of RNA secondary structures.
RNAfold web server: predict secondary structures of single stranded RNA or DNA sequences online.
miRBase: the microRNA database.
psRNATarget: a plant small RNA target analysis server.
ASRD: The Arabidopsis Small RNA Database.

Wen Yao, PhD, Professor

College of Life Sciences
Henan Agricultural University
Zhengzhou 450002, China

yaowen (AT) henau.edu.cn
venyao (AT) qq.com

Agricultural Road No. 63 (450002), Zhengzhou, Henan, China

Zhang Zhang, PhD, Professor

Associate Director of National Genomics Data Center (NGDC)
China National Center for Bioinformation (CNCB)
Beijing Institute of Genomics (BIG)
Chinese Academy of Sciences (CAS)

zhangzhang (AT) big.ac.cn

No.1 Beichen West Road, Chaoyang District, Beijing 100101, China


