RiboGap

RiboGap is a database-backed web server that allows novice and skilled users to find intergenic sequences (IGRs), including noncoding RNAs and terminators, across entire prokaryotic genomes. An intergenic sequence is the region between two coding sequences; it includes the 5' and 3' regions, untranslated regions (UTRs), and introns. Such a region may or may not contain regulatory elements.


Search Panel Information

RiboGap is a relational database built on MySQL, presented to users through many selectable boxes (display fields). It offers useful utilities in addition to finding intergenic sequences.

RiboGap interface


The website offers two versions: default and advanced.

The default version contains 36 selectable boxes. For example, one might search for the existence of noncoding RNAs, such as known riboswitches, in an intergenic sequence, or find all possible operons that follow an intergenic sequence matching a defined set of criteria. It is also possible to look for specific keywords in coding sequence descriptions and extract the intergenic sequences of the corresponding genes. This search is done by checking the boxes of interest (some illustrations are given below).

The advanced version is more exhaustive and contains 67 selectable boxes. In addition to the boxes, users can enter their own queries in MySQL in the text area (some examples are given below).

Display

cdd: stands for Conserved Domain Database, from NCBI Conserved Domains (covers most known protein domains). Note that one protein can have many domains.

cds: stands for protein coding sequence from NCBI microbe

fragment: stands for chromosome information (or plasmid, a "fragment" of DNA) from NCBI nucleotide and from NCBI taxonomy

gap3: stands for sequence information for the 3-prime UTR (untranslated region) from NCBI nucleotide

gap5: stands for sequence information for the 5-prime UTR (untranslated region) from NCBI nucleotide

operon: stands for operon data from the Operon DataBase (ODB). Operons are transcribed as polycistronic mRNAs that can contain many ORFs.

organism: stands for organism information from NCBI microbe. (Note that although we tried to provide as much information as possible, not all organisms have data in this table because some sources have not been updated.)

rna_family: stands for family of RNA from Rfam. An RNA family is a group of RNA sequences that share a common ancestor. RNA families are also grouped by common general function, e.g. miRNAs, riboswitches, tRNAs.

rna_known: stands for known RNA from Rfam. An RNA corresponds to a unique RNA found in a given sequence at a given position. RNAs often have a 3' and a 5' single-stranded sequence. Rfam is a database dedicated to noncoding RNA families.

Conditions :
- 67 selectable parameters (see RiboGap interface: boxes area)
- Comparison operator

Character string:

find some pattern : Equal the pattern, e.g. lysin*. The '*' character matches any string (including the empty string) and the '_' character matches any single character. For example, lysin* matches lysin, lysine and lysination.

all except : Not equal the pattern

is null : NULL values represent missing or unknown data

is not null : values must be present
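
The wildcard rules above can be illustrated with a short Python sketch. It is not RiboGap's own code: the helper names pattern_to_regex and matches are made up for illustration, and whether the server anchors a pattern to the whole field or ignores case is not stated here, so whole-value, case-sensitive matching is an assumption.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a '*' / '_' wildcard pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any string, possibly empty
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)

def matches(pattern: str, value: str) -> bool:
    # Assumption: the pattern must cover the whole value, case-sensitively.
    return pattern_to_regex(pattern).fullmatch(value) is not None

for word in ["lysin", "lysine", "lysination", "lysis"]:
    print(word, matches("lysin*", word))
```

The first three words match lysin*, while lysis does not because its fifth letter is not 'n'.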

Number:

>= : Greater than or equal to

<= : Less than or equal to

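REGEXP : Matches (or does not match) a regular expression such as '[[:<:]]Zn[[:>:]]|zinc', which finds either the standalone word Zn or the string zinc. In MySQL, [[:<:]] and [[:>:]] mark the beginning and the end of a word. More information about the regular expressions here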
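
To see what the word-boundary markers do, the same expression can be tried outside the database. The sketch below rewrites the MySQL markers as Python's \b, which is roughly equivalent; the function names and the product descriptions are invented for illustration. Case sensitivity of REGEXP in MySQL depends on the column collation, so the ignore_case switch is an assumption to adjust as needed.

```python
import re

# MySQL writes word boundaries as [[:<:]] and [[:>:]]; Python uses \b for both.
MYSQL_PATTERN = "[[:<:]]Zn[[:>:]]|zinc"

def mysql_to_python(pattern: str) -> str:
    """Rewrite MySQL word-boundary markers for Python's re module."""
    return pattern.replace("[[:<:]]", r"\b").replace("[[:>:]]", r"\b")

def regexp(pattern: str, value: str, ignore_case: bool = False) -> bool:
    """Return True if the (MySQL-style) pattern is found anywhere in value."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(mysql_to_python(pattern), value, flags) is not None

# Hypothetical product descriptions, not taken from RiboGap.
for product in ["Zn-dependent hydrolase",
                "zinc transporter ZntB",
                "high-affinity transporter ZnuA"]:
    print(product, regexp(MYSQL_PATTERN, product))
```

The first two descriptions match (the word Zn, then the string zinc). The third does not, because Zn is followed by another letter and so is not a whole word.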

- Logical operator
There are two logical operators: AND and OR.

result_number :
The number of results (hits) displayed. It is possible to choose between 200, 500, 1000, 5000 or 10000 results.

Email :
Because some queries take longer to execute, results can be sent by email.


RiboGap interface






Note that RiboGap relies on annotation from different databases (NCBI, Rfam, ODB) and thus will also include any misannotation found in the original data source. Therefore, one should be careful not to treat all query results as equivalent to experimentally confirmed results. Very wide-ranging queries for numerous different genes on multiple genomes are likely to include misannotated features within the corresponding sequences.




Examples of Search



Example #1 for finding the organisms which have the lysine riboswitch. For this search, the following boxes are selected :

Select a field
fragment category selected the following field:
fragment

organism category selected the following field:
organism

rna_family category selected the following boxes:
type
description

rna_known category selected the following boxes:
start
end
strand

Condition :
rna_family category selected the following boxes:
description
Equal the pattern
lysin*

The table of result is :



Example #2 for finding intergenic sequences for particular genes (e.g. calcium-related genes in E. coli, NC_000913.3). For this search, the following boxes are selected :

Select a field
cds category selected the following field:
product
fragment category selected the following boxes:
fragment
gap5 category selected the following boxes:
start
end
strand
sequence

Condition :
fragment Equal the pattern NC_000913 AND
product REGEXP (Calcium) AND
sequence is not null

The table of result is :



Note that in the example shown above, the entire sequence is not shown.
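
The three conditions of Example #2 can be read as one SQL query. The sketch below shows this on a toy in-memory SQLite database rather than on RiboGap itself: the table layout, the join, the made-up rows, and the translation of "Equal the pattern" into LIKE with a trailing % are all assumptions for illustration, not the server's actual schema or query.

```python
import re
import sqlite3

def regexp(pattern, value):
    """SQLite calls this for `value REGEXP pattern`; NULL never matches."""
    # Assumption: case-insensitive search, as is common for MySQL REGEXP.
    return value is not None and re.search(pattern, value, re.IGNORECASE) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)  # SQLite has no built-in REGEXP

# Hypothetical two-table layout with invented rows.
conn.executescript("""
CREATE TABLE cds  (id INTEGER PRIMARY KEY, fragment TEXT, product TEXT);
CREATE TABLE gap5 (cds_id INTEGER, "start" INTEGER, "end" INTEGER,
                   strand TEXT, sequence TEXT);
INSERT INTO cds VALUES
  (1, 'NC_000913.3', 'calcium/sodium:proton antiporter'),
  (2, 'NC_000913.3', 'DNA polymerase III subunit'),
  (3, 'NC_000913.3', 'calcium-binding protein');
INSERT INTO gap5 VALUES
  (1, 100, 180, '+', 'ATGCATTTAG'),
  (2, 900, 950, '-', 'GGCATTA'),
  (3, 2000, 2000, '+', NULL);
""")

QUERY = """
SELECT c.product, c.fragment, g."start", g."end", g.strand, g.sequence
FROM cds AS c JOIN gap5 AS g ON g.cds_id = c.id
WHERE c.fragment LIKE 'NC_000913%'   -- fragment Equal the pattern NC_000913
  AND c.product REGEXP 'Calcium'     -- product REGEXP (Calcium)
  AND g.sequence IS NOT NULL         -- sequence is not null
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With these toy rows only the first gene is returned: the second has no calcium-related product, and the third has no 5' sequence stored.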

Examples #3 for quickly finding all the intergenic sequences for one or several microorganisms. Here we show three examples for illustration.

Example #3a for finding whether there is a regulatory element in this intergenic sequence (e.g. a TPP riboswitch). For this search, the following boxes are selected :

Select a field
fragment category selected the following boxes:
fragment
gap5 category selected the following boxes:
start
end
rna_family category selected the following boxes:
description

Condition :
fragment Equal the pattern NC_000913 AND
description Equal the pattern TPP

The table of result is :



Example #3b for finding the position of our regulatory elements. For this search, the following boxes are selected :

Select a field
fragment category selected the following boxes:
fragment
gap5 category selected the following boxes:
start
end
rna_family category selected the following boxes:
rna_known category selected the following boxes:
start
end

Condition :
fragment Equal the pattern NC_000913 AND
description Equal the pattern TPP

The table of result is :



Example #3c for finding how many operons could be regulated by this element. For this search, the following boxes are selected :

Select a field
fragment category selected the following boxes:
fragment
gap5 category selected the following boxes:
start
end
operon category selected the following boxes:
description
rna_family category selected the following boxes:
rna_known category selected the following boxes:
start
end

Condition :
fragment Equal the pattern NC_000913 AND
description Equal the pattern TPP

The table of result is :



Example #4 for finding genes that have no IGR because the coding sequences partially overlap, so that the size of the gap is negative. For this search, the following boxes are selected :

Select a field
cds category selected the following boxes:
gene
start
end
strand
fragment category selected the following boxes:
fragment
gap5 category selected the following boxes:
start
end
strand
size

Condition :
fragment Equal the pattern NC_000913 AND
size <= ''
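
As a rough illustration of why a gap size can be negative, the sketch below computes the distance between neighbouring coding sequences from their coordinates. The formula (next start minus previous end minus one, with 1-based inclusive coordinates), the gene names and the positions are assumptions for illustration; RiboGap's own definition of size may differ, and strand is ignored here.

```python
from typing import NamedTuple

class Cds(NamedTuple):
    gene: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

def gap_sizes(cds_list):
    """Yield (upstream gene, downstream gene, gap size) for neighbouring CDS."""
    ordered = sorted(cds_list, key=lambda c: c.start)
    for left, right in zip(ordered, ordered[1:]):
        # Negative when the downstream CDS starts before the upstream one ends.
        yield left.gene, right.gene, right.start - left.end - 1

# Hypothetical genes: geneB starts 4 bases before geneA ends.
genes = [Cds("geneA", 100, 400), Cds("geneB", 397, 900), Cds("geneC", 950, 1200)]
overlapping = [(a, b, size) for a, b, size in gap_sizes(genes) if size < 0]
print(overlapping)  # [('geneA', 'geneB', -4)]
```

Only the geneA/geneB pair is reported, since geneB and geneC are separated by a positive gap of 49 bases.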

The table of result is :