mGene.web Disclaimer

Dear mGene User,

mGene.web is a web interface to mGene built on the Galaxy framework [3]. It runs code similar to that released under the GPL at http://mgene.org/download. This software is the result of further development of mGene as described in [1].

The original software [1] has been made easier to use, more stable, and applicable to a broader range of organisms. However, mGene in general (see also here) and mGene.web in particular still have a few limitations:

For well-annotated genomes, extensive annotations exist. Because training requires a relatively large amount of memory and computing time, we have to limit the amount of training data in these cases. We sub-sample the data and may therefore obtain sub-optimal results compared with training on all data.
mGene.web currently works best for a few well-annotated regions of a genome with chromosomes/contigs smaller than 20 Mb that contain at least a few hundred and at most a few thousand genes. For longer contigs, mGene.web may fail because of technical limitations (the current version of Octave has a 2 GB variable size limit, and the cluster nodes have only 8 GB of RAM).
Currently, the speed of the system depends heavily on the number of chromosomes/contigs, because each one is processed separately. We therefore do not recommend gene prediction on draft assemblies with hundreds or thousands of contigs.
Moreover, a few built-in settings are better suited to organisms with compact genomes; for instance, introns must be shorter than 20 kb. We intend to make these options configurable in the near future.
For technical reasons, mGene.web does not support model selection: all models are trained with preset hyper-parameters. Model selection can significantly improve prediction results.
Currently, mGene.web does not support prediction tuning. There are several ways to trade off sensitivity against specificity in gene prediction, but right now these expert options are not supported by the web service.
Reading genes for training from GFF3 files is tricky. We accept a gene's annotation only if it satisfies certain, quite stringent conditions, which leads to smaller training sets. In some cases we try to infer missing information, such as the open reading frame (an inference that can easily be wrong). Training on poor-quality input data generally leads to poor prediction performance.
Currently, mGene uses regions around annotated genes for training the gene predictor, with a heuristic determining where to cut intergenic regions. (These regions are assembled into blocks for training.) This strategy generally works reasonably well (even for incomplete annotations that lack many genes), but is inferior to using fully annotated regions (as done in [1]). There will be more options to control this behaviour in future versions.
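
Before uploading data, some of the limits listed above can be checked locally. The following Python sketch is not part of mGene or mGene.web; it is a minimal illustration that assumes a plain FASTA genome file and a GFF3 annotation whose exon features carry a Parent attribute. The 20 Mb contig limit and the 20 kb intron limit are taken from the list above; the threshold of 100 contigs is our own assumption standing in for "hundreds", and all function names are hypothetical.

```python
from collections import defaultdict

MAX_CONTIG_BP = 20_000_000  # contigs should be smaller than 20 Mb (from the list above)
MAX_INTRON_BP = 20_000      # introns have to be shorter than 20 kb (from the list above)
MAX_CONTIGS = 100           # assumption: stands in for "hundreds or thousands of contigs"


def contig_lengths(fasta_lines):
    """Return {contig_name: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # contig name is the first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def long_introns(gff3_lines, max_intron=MAX_INTRON_BP):
    """Return (parent_id, intron_length) for gaps between exons that reach max_intron."""
    exons = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((int(cols[3]), int(cols[4])))
    hits = []
    for parent, spans in exons.items():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            # GFF3 coordinates are 1-based and inclusive on both ends.
            intron = next_start - prev_end - 1
            if intron >= max_intron:
                hits.append((parent, intron))
    return hits


def preflight(fasta_lines, gff3_lines, max_contig=MAX_CONTIG_BP,
              max_intron=MAX_INTRON_BP, max_contigs=MAX_CONTIGS):
    """Collect human-readable warnings for inputs that exceed the limits."""
    warnings = []
    lengths = contig_lengths(fasta_lines)
    for name, n in lengths.items():
        if n >= max_contig:
            warnings.append(f"{name}: {n} bp, not below the {max_contig} bp contig limit")
    if len(lengths) > max_contigs:
        warnings.append(f"{len(lengths)} contigs: each is processed separately, expect slow runs")
    for parent, n in long_introns(gff3_lines, max_intron):
        warnings.append(f"{parent}: intron of {n} bp, not below the {max_intron} bp limit")
    return warnings
```

A typical call would pass two open file handles, for example `preflight(open("genome.fa"), open("genes.gff3"))` (hypothetical file names), and print the returned warnings. An empty list only means that none of these three checks fired; it does not guarantee that mGene.web will accept the annotation.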

We are working to resolve these issues, to make the software and the web service more stable, and to add more features to mGene.

If you are interested in using mGene.web for annotating an organism's genome or for comparing it with other predictions, please let us know (support@mgene.org or Gunnar.Raetsch@tuebingen.mpg.de). We would be glad to help you obtain the best possible results with mGene.

The mGene Development Team

References:

[1]	Schweikert et al. mGene: Accurate Computational Gene Finding with Application to Nematode Genomes. Submitted to Genome Research 2009.

[2]	Schweikert et al. mGene.web: a web service for accurate computational gene finding. Nucleic Acids Research 2009.

[3]	Giardine B, Riemer C, Hardison R, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 15:1451–1455.
