mGene.web: A Web Service for Accurate Computational Gene Finding
mGene.web is a web service for the genome-wide prediction of protein-coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures, including untranslated regions, in an increasing number of organisms. mGene.web additionally allows users to train the system for other organisms at the push of a button, a functionality that greatly accelerates the annotation of newly sequenced genomes. The system is built in a highly modular way, so that individual components of the framework, such as the promoter prediction tool or the splice site predictor, can be used on their own. The underlying gene finding system, mGene, is based on discriminative machine learning techniques, and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).
The web service has been described in [1] and can be found here: http://galaxy.raetschlab.org. mGene.web is an interface to the mGene gene finding system (see http://mgene.org and [3] for more details) that is available for download at http://mgene.org/download.
Main Features
- Simple one-step procedure to train an ab initio gene predictor for a new organism based on a FASTA and a GFF3 (or GTF) file.
- Gene prediction for a growing list of organisms from a given FASTA file using pretrained mGene instances.
- Easy access to the signal predictions, e.g. for splice sites, transcription start sites, etc.
- Integration of externally provided signal or content predictions/tracks into the mGene gene finder.
- High accuracy of mGene's gene and signal predictions.
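Since training takes a FASTA file together with a GFF3 annotation, it can help to check locally that the two files agree before uploading them. The following is a minimal sketch of such a check and is not part of mGene.web; the function names `read_fasta_lengths` and `check_gff3` are hypothetical, and the only assumptions are the standard FASTA layout and the nine tab-separated GFF3 columns.

```python
import io


def read_fasta_lengths(handle):
    """Return {sequence_id: length} for a FASTA stream."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token of the header.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def check_gff3(handle, lengths):
    """Return a list of human-readable problems found in a GFF3 stream."""
    problems = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {lineno}: expected 9 columns, got {len(cols)}")
            continue
        seqid = cols[0]
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            problems.append(f"line {lineno}: start/end are not integers")
            continue
        if seqid not in lengths:
            problems.append(f"line {lineno}: sequence '{seqid}' not in FASTA")
        elif not (1 <= start <= end <= lengths[seqid]):
            problems.append(f"line {lineno}: coordinates {start}-{end} outside '{seqid}'")
    return problems


if __name__ == "__main__":
    # Tiny in-memory example; real files would be opened with open(path).
    fasta = io.StringIO(">chrI example\nACGTAC\nGT\n")
    gff = io.StringIO("##gff-version 3\nchrI\tsrc\tgene\t1\t8\t.\t+\t.\tID=g1\n")
    print(check_gff3(gff, read_fasta_lengths(fasta)))
```

An empty result means every annotated feature refers to a sequence present in the FASTA file and lies within its bounds (GFF3 coordinates are 1-based and inclusive).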
Getting Started
A number of examples with easy step-by-step explanations can be found here.
Workflows
mGene.web has a very flexible modular setup. Using the Galaxy workflow system, we are able to pass this strength on to web service users without complicated and confusing parameterization procedures. User-defined pipelines can be built from modules via a graphical workflow editor and can be shared among users. We provide a number of predefined workflows that combine different modules of our system to perform common tasks. More information on the different workflows and links for importing them into your Galaxy environment can be found here.
Libraries
Using the mechanisms of the Galaxy framework [2], we offer pretrained signal, content and gene-structure predictors for an increasing number of organisms. To obtain such a classifier, please follow the simple steps described here.
These classifiers come with meta-information such as the number of training examples and the performance on an appropriate holdout set.
Prediction Results
We tested the web service tools for a large number of organisms.
A detailed list of the results for gene, signal and content predictions can be found here.
Gene State Model
mGene.web is intended to be applicable to a wide range of organisms. We therefore use a more general state model than in the original version of mGene, which was developed for gene prediction in nematodes. As a consequence, predicting trans-splicing and operons is not supported in the mGene.web model. We also excluded poly-A signals, as there is usually no training data available for them. However, we do model splicing of UTRs. The complete model applied in the GeneTrain tool is depicted in Figure 2.
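To make these modelling choices concrete, the following toy sketch encodes a strongly simplified state graph and checks whether a sequence of segment labels is a valid path through it. It is an illustration only and not the model used by GeneTrain, which is the one shown in Figure 2; the state names and the `ALLOWED` table are hypothetical. The sketch reflects only the properties stated above: UTRs may be spliced, there is no poly-A state, and one gene cannot directly follow another without an intergenic segment (so no operons).

```python
# Hypothetical, simplified segment labels; the real model is depicted in Figure 2.
ALLOWED = {
    "intergenic": {"5utr_exon"},
    "5utr_exon": {"5utr_intron", "cds_exon"},   # 5' UTR may be spliced
    "5utr_intron": {"5utr_exon"},
    "cds_exon": {"cds_intron", "3utr_exon"},
    "cds_intron": {"cds_exon"},
    "3utr_exon": {"3utr_intron", "intergenic"},  # 3' UTR may be spliced
    "3utr_intron": {"3utr_exon"},
}


def is_valid_path(labels):
    """Check that labels start and end intergenic and use only allowed transitions."""
    if not labels or labels[0] != "intergenic" or labels[-1] != "intergenic":
        return False
    return all(b in ALLOWED.get(a, set()) for a, b in zip(labels, labels[1:]))


if __name__ == "__main__":
    gene_with_spliced_utr = [
        "intergenic", "5utr_exon", "5utr_intron", "5utr_exon",
        "cds_exon", "cds_intron", "cds_exon", "3utr_exon", "intergenic",
    ]
    print(is_valid_path(gene_with_spliced_utr))
```

In this toy graph, a label sequence in which a second gene starts immediately after a 3' UTR exon is rejected, mirroring the statement that operons are not modelled.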
Contact
In case of comments, problems, questions etc. feel free to contact
References
[1] Schweikert G, Behr J, Zien A, Zeller G, Ong CS, Sonnenburg S, Rätsch G (2009). mGene.web: a web service for accurate computational gene finding. Nucleic Acids Research, Web Server Issue.
[2] Giardine B, Riemer C, Hardison R, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Research, 15:1451–1455.
[3] Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, Philips P, De Bona F, Hartmann L, Bohlen A, Krüger N, Sonnenburg S, Rätsch G (2009). mGene: Accurate Computational Gene Finding with Application to Nematode Genomes. Genome Research, 19:2133–2143.