CisReg: A benchmarking set of structured cis-regulatory RNA elements

CisReg: a set of structured cis-regulatory elements located in mRNA sequences and their flanking regions.

Published in Lange, Maticzka et al., "Global or local? Predicting secondary structure and accessibility in mRNAs", submitted to NAR, October 2011.

The CisReg 1.0 data were processed in May 2011 from Rfam 10.0. Data curation by Sita J. Lange (1), Joshua N. Gagnon (2) and Chris M. Brown (2). The data set was used in the development of the LocalFold algorithm.

(1) Department of Computer Science, Albert-Ludwigs-University, Freiburg, Germany

(2) Department of Biochemistry and Genetics Otago, University of Otago, Dunedin, New Zealand

Seed families were carefully selected from the Rfam database (version 10.0) and filtered to optimise sequence quality. The consensus structure of each seed alignment was mapped onto each individual sequence in that alignment, and each sequence was then folded with RNAfold using its mapped consensus structure as a constraint. Alignment, family, sequence and structure information is saved in separate files.
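The mapping step above can be sketched as follows. This is a minimal, hypothetical illustration, not the CisReg pipeline itself: it assumes the consensus is plain dot-bracket with only "(" and ")" as pairs (Rfam's WUSS notation would first need converting), that "-" and "." mark gaps, and that pairs involving a gap, or (optionally) non-canonical pairs, are simply dropped. The function names are invented for this example.

```python
from __future__ import annotations

# Hypothetical sketch: project an alignment-level dot-bracket consensus
# structure onto one gapped sequence from a seed alignment.
GAPS = set("-.")
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def consensus_pairs(consensus: str) -> dict[int, int]:
    """Return a symmetric column -> partner-column map.

    Only '(' and ')' are treated as paired; every other character is unpaired.
    """
    stack: list[int] = []
    partner: dict[int, int] = {}
    for i, c in enumerate(consensus):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return partner


def map_consensus(aligned: str, consensus: str,
                  canonical_only: bool = True) -> tuple[str, str]:
    """Return (ungapped sequence, mapped dot-bracket structure)."""
    if len(aligned) != len(consensus):
        raise ValueError("aligned sequence and consensus differ in length")
    seq = aligned.upper().replace("T", "U")
    partner = consensus_pairs(consensus)
    bases: list[str] = []
    structure: list[str] = []
    for i, base in enumerate(seq):
        if base in GAPS:
            continue  # gap columns disappear from the single sequence
        bases.append(base)
        j = partner.get(i)
        # Keep a pair only if the partner column is not a gap in this sequence.
        keep = j is not None and seq[j] not in GAPS
        if keep and canonical_only:
            keep = base + seq[j] in CANONICAL  # symmetric, so both ends agree
        if keep:
            structure.append("(" if i < j else ")")
        else:
            structure.append(".")
    return "".join(bases), "".join(structure)
```

For example, `map_consensus("GG-AAAACC", "(((...)))")` returns `("GGAAAACC", "((....))")`: the innermost consensus pair loses its partner to a gap and becomes unpaired. The two strings could then be given to RNAfold with its `-C` option (sequence line followed by constraint line); the exact RNAfold options used for CisReg are not stated here.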

The RFxxxxx.struct files (e.g. RF00032.struct) are the key files used for benchmarking structure prediction. They contain all the information needed to evaluate how well the final structure of each element fits the consensus structure of its family. Based on these values, the best candidates can be chosen for evaluating structure prediction programs and for similar tasks.
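The exact measures stored in the .struct files are not described on this page. As an assumption, one common way to score how well a final structure fits a reference structure (such as the mapped consensus) is base-pair sensitivity and positive predictive value (PPV). The sketch below computes both for two dot-bracket strings of equal length; `base_pairs` and `compare_structures` are hypothetical names.

```python
from __future__ import annotations


def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs, with i < j, in a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def compare_structures(reference: str, predicted: str) -> tuple[float, float]:
    """Return (sensitivity, PPV) of the predicted pairs against the reference pairs."""
    if len(reference) != len(predicted):
        raise ValueError("structures must have the same length")
    ref, pred = base_pairs(reference), base_pairs(predicted)
    shared = len(ref & pred)
    # Convention chosen for this sketch: an empty denominator counts as 1.0.
    sensitivity = shared / len(ref) if ref else 1.0
    ppv = shared / len(pred) if pred else 1.0
    return sensitivity, ppv
```

For instance, `compare_structures("((....))", "(......)")` returns `(0.5, 1.0)`: the prediction recovers one of the two reference pairs and proposes no pairs outside the reference.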

Set A. Structured elements with simple secondary structures

mRNA set (3)

RF00031 SECIS Selenocysteine insertion sequence

RF00032 Histone3 Histone 3' UTR stem-loop

RF00037 IRE Iron response element

Genomic set (6)

RF00023 tmRNA transfer-messenger RNA

RF00036 RRE HIV Rev response element

RF00038 PrfA PrfA thermoregulator UTR

RF00040 rne5 RNase E 5' UTR element

RF00041 Entero_OriR Enteroviral 3' UTR element

RF00048 Entero_CRE Enterovirus cis-acting replication element

Set B. Structured elements with more complex secondary structures

Some cis-regulatory elements carry a qualifier or subclass of one of the following six types; most of these were placed in this set.

riboswitch - two conformations are biologically relevant
pseudoknots - not predicted by many current algorithms
Internal Ribosome Entry Site (IRES) - large structures, most of which have not been verified experimentally; only a small part of the sequence and structure is required for function.
thermoregulator (5)
frameshift_element (5)
leader (10)

In addition, 57 elements were placed in Set B because they had the characteristics of riboswitches or pseudoknots but had not been classified as such.

Set B elements (4)

RF00050 FMN FMN riboswitch (RFN element)

RF00059 TPP TPP riboswitch (THI element)

RF00061 IRES_HCV Hepatitis C virus internal ribosome entry site

RF00080 yybP-ykoY yybP-ykoY leader

Related datasets

Cis-regulatory elements classified as such by Rfam were listed from here. The full listing of 222 cis-regulatory elements is here.

BRaliBase I: benchmarking secondary structure prediction algorithms (2004). It contains no cis-regulatory elements, but does include two small RNAs (RNase P and tRNA-Phe).

RNA STRAND: contains two families of cis-regulatory structures.

CamparNA processed several 3D structures of cis-regulatory elements (e.g. IRE) for analysis. A list of CisReg elements with (usually partial) 3D structures is here.

CMfinder benchmarking used 11 families of Rfam cis-regulatory structures.

Joshua Gagnon, Vlad Kazantsev and Chris Brown