Examples
mGene.web can be used in three different ways: 1) If you simply want a black-box tool to annotate your sequence, use the monolithic tools mGeneTrain, mGenePredict and mGeneEval. 2) If you would like to be more involved, for example to inspect intermediate results such as transcription start site predictions, use the pre-defined workflows. 3) If you are interested not in the complete gene-finding process but only in a subtask, have a look at the individual mGene.web Modules, which you can also combine into your own workflows. The examples below explain each of these options in more detail.
Load the Supplied Example Data
- go to http://galaxy.fml.mpg.de
- in the toolbar on the left, open "mGene.web"
- click on "Examples and Instructions" => the organism "C. elegans" is already pre-selected
- press "execute" to load the example data => three new objects will appear in the object list on the right:
  - Information about the datasets
  - Genome Annotation in GFF3 format
  - Genome Sequence in FASTA format
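To get a feel for the example annotation before training, you can count the feature types it contains. The Python sketch below is not part of mGene.web; it only assumes the standard GFF3 layout of nine tab-separated columns with the feature type in column 3, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff3_lines: Iterable[str]) -> Counter:
    """Count GFF3 features by type (column 3), e.g. gene, mRNA, exon, CDS."""
    counts: Counter = Counter()
    for line in gff3_lines:
        line = line.rstrip("\r\n")
        # An optional ##FASTA directive ends the feature section of a GFF3 file.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, comments and directives such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        counts[columns[2]] += 1
    return counts
```

After downloading the annotation from your history, a call such as `count_feature_types(open("annotation.gff3"))` (file name hypothetical) returns a mapping from feature type to count.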
mGeneTrain: Train the Gene Finder
- from "mGene.web" in the toolbar, select "mGeneTrain"
- in the input box (beige, labeled "mGeneTrain") select the FASTA file and the GFF3 annotation from the C. elegans example
- press "execute"
- be patient: training a full gene finder is computationally demanding and will take a few hours
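Because training takes hours, it may be worth checking locally beforehand that the GFF3 annotation refers to the sequences in the FASTA file. The Python sketch below is a hypothetical helper, not part of mGene.web; it assumes that the GFF3 sequence IDs (column 1) match the first word of each FASTA header and that GFF3 coordinates are 1-based and inclusive, as the format specifies.

```python
from typing import Dict, Iterable, List


def fasta_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first word of the header) to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_annotation(gff3_lines: Iterable[str], lengths: Dict[str, int]) -> List[str]:
    """Return problems found when comparing GFF3 features with the FASTA sequences."""
    problems: List[str] = []
    for number, line in enumerate(gff3_lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        # Columns 1-5: seqid, source, type, start, end.
        seqid, _source, ftype, start, end = line.split("\t")[:5]
        if seqid not in lengths:
            problems.append(f"line {number}: unknown sequence {seqid!r}")
        elif not 1 <= int(start) <= int(end) <= lengths[seqid]:
            problems.append(f"line {number}: {ftype} {start}-{end} outside 1-{lengths[seqid]}")
    return problems
```

An empty list from `check_annotation` means every feature lies on a known sequence and within its bounds.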
mGenePredict: Find Genes in Other Genomic DNA
- use "Upload file" in the toolbar to upload a FASTA file
- select "mGene.web"->"mGenePredict" from the toolbar
- select your FASTA file and your previously trained mGene predictor, then press "execute"
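mGenePredict works on genomic DNA in FASTA format, so a quick local check for stray characters (for example from copy-and-paste, or a protein file selected by mistake) may save a failed job. The Python sketch below is a hypothetical helper, not part of mGene.web; it assumes the tool accepts upper- or lower-case IUPAC nucleotide codes, including ambiguity symbols such as N, which you should verify for your data.

```python
import re
from typing import Iterable, List

# IUPAC DNA codes including ambiguity symbols; accepting all of them is an assumption.
INVALID = re.compile(r"[^ACGTRYSWKMBDHVN]", re.IGNORECASE)


def find_invalid_characters(fasta_lines: Iterable[str]) -> List[str]:
    """Report FASTA lines with characters outside the IUPAC DNA alphabet."""
    problems: List[str] = []
    seen_header = False
    for number, line in enumerate(fasta_lines, start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seen_header = True
            continue
        if not seen_header:
            problems.append(f"line {number}: sequence data before the first '>' header")
            continue
        bad = sorted(set(INVALID.findall(line)))
        if bad:
            problems.append(f"line {number}: unexpected characters {''.join(bad)}")
    return problems
```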
Libraries
Via the Galaxy framework we offer pretrained signal, content and gene-structure classifiers for a growing number of organisms. To obtain one of these classifiers, follow these steps:
- click on "Libraries" in the top panel of your Galaxy environment => a list of all available libraries is shown
- click on one of the libraries from the list => a list of datasets and classifiers opens
- select a dataset of your choice and click on "go" to import it
Note that every classifier provides additional information once you have imported it, for example:
`Trained "don" classifier`
- based on 3381 labeled examples (272 positive, 3109 negative)
- using 5-fold cross-validation for model-selection from 1 models (inner cv loop)
- using 5-fold cross-validation for obtaining unbiased predictions (outer cv loop)
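The two cross-validation loops in this summary describe nested cross-validation: the inner loop selects model parameters using only the training part of each outer split, and the outer loop produces predictions on data that played no role in training or model selection. The Python sketch below only generates the index splits for such a scheme; it assumes simple strided folds without stratification by label and is an illustration, not how mGene.web implements it.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold(indices: List[int], k: int) -> Iterator[Split]:
    """Yield (train, test) pairs; fold f holds every k-th index starting at position f."""
    if not 1 < k <= len(indices):
        raise ValueError("k must be between 2 and the number of examples")
    for fold in range(k):
        test = indices[fold::k]
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


def nested_cv_splits(
    n_examples: int, k_outer: int = 5, k_inner: int = 5
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield each outer split together with the inner splits of its training part."""
    all_indices = list(range(n_examples))
    for outer_train, outer_test in k_fold(all_indices, k_outer):
        # Inner loop: model selection sees only the outer training part.
        inner_splits = list(k_fold(outer_train, k_inner))
        # Outer loop: the held-out fold is predicted by a model chosen without it.
        yield outer_train, outer_test, inner_splits
```

Because the outer test folds partition the data, every example receives exactly one held-out prediction, which is what the summary calls "unbiased predictions".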
`Performance`
- Average area under ROC curve on test splits: 0.995
- Average area under PRC curve on test splits: 0.940
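The counts above show that this training set is strongly imbalanced, which matters when reading the two performance numbers: a classifier that guesses at random has an expected area under the ROC curve of 0.5 regardless of class balance, whereas its expected area under the precision-recall curve is roughly the fraction of positive examples. The Python sketch below parses the summary line shown above (the regular expression assumes exactly that wording) and computes this fraction.

```python
import re
from typing import Tuple

SUMMARY = "based on 3381 labeled examples (272 positive, 3109 negative)"
PATTERN = re.compile(r"based on (\d+) labeled examples \((\d+) positive, (\d+) negative\)")


def class_balance(summary: str) -> Tuple[int, int, int, float]:
    """Return (total, positive, negative, positive fraction) parsed from a summary line."""
    match = PATTERN.search(summary)
    if match is None:
        raise ValueError(f"unrecognized summary line: {summary!r}")
    total, positive, negative = (int(group) for group in match.groups())
    if positive + negative != total:
        raise ValueError("positive and negative counts do not add up to the total")
    return total, positive, negative, positive / total


total, positive, negative, fraction = class_balance(SUMMARY)
# Random guessing: expected AUROC is 0.5, expected AUPRC is about the positive fraction.
print(f"positive fraction (random-guess AUPRC baseline): {fraction:.3f}")
# → positive fraction (random-guess AUPRC baseline): 0.080
```

Against a random-guess baseline of about 0.08, the reported area under the PRC curve of 0.940 is far more informative than the ROC value alone suggests.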
Using pre-defined Workflows
We provide a number of pre-defined workflows that combine different parts of our system to accomplish specific tasks. The workflows are described in detail here.
Note: To import one of these workflows, you have to log in to the Galaxy system.
To run a pre-defined workflow, follow these steps:
- click on "User" in the top panel of your Galaxy environment
- if you are already registered, log in; if not, register
- use one of the links provided here to import one of the workflows
- you should now find the imported workflow in the list "Workflows shared with you by others"
- click on the respective workflow
- choose the required input data files (if there is nothing to choose from, you first need to upload the data, see above)
- click on "Run workflow" at the bottom of the page
- go back to "Analyze Data" in the top panel of your Galaxy environment
- you should now be able to observe the progress of the called modules
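A workflow can be thought of as a directed graph of steps, where a step can only start once the steps producing its inputs have finished. The Python sketch below (Python 3.9+ for `graphlib`) groups the steps of a small workflow into stages in that order. The step names echo the classifier types mentioned in the Libraries section but are hypothetical and do not describe the actual mGene.web workflows, and this is not how Galaxy schedules jobs internally.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: every step maps to the steps whose outputs it consumes.
# These are NOT real mGene.web module names.
WORKFLOW: Dict[str, Set[str]] = {
    "signal_predictions": {"genome_fasta", "annotation_gff3"},
    "content_predictions": {"genome_fasta", "annotation_gff3"},
    "gene_structure_model": {"signal_predictions", "content_predictions"},
}


def execution_stages(workflow: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into stages; each stage only needs outputs of earlier stages."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError if the steps form a cycle
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, stage in enumerate(execution_stages(WORKFLOW), start=1):
    print(number, stage)
```

Steps in the same stage depend only on earlier stages, so they could in principle run side by side.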
Changing pre-defined Workflows
- click on the "Workflow" button in the top panel of the Galaxy system
- click on the arrow on the right-hand side of your imported workflow and select "clone"
- click on the arrow of your cloned workflow and select "edit"
- the workflow editor opens, and you can inspect and modify the workflow
- you might also want to look at the individual mGene.web Modules (left panel); you can add them to the workflow by clicking on them