Nine eukaryote proteomes have been aligned to the D. pulex genome, with help from the Daphnia Genomics Consortium (DGC), TeraGrid, and Generic Model Organism Database (GMOD) projects. The D. pulex genome is a 4x preliminary assembly, not for public release, provided by the Joint Genome Institute (JGI) to the Daphnia Genomics Consortium. This prerelease contains some 3804 scaffolds; the first 50 or so are large chunks of chromosomes (1 to 5 megabases each, of a 184-megabase genome total). The nine proteomes, with 217,006 protein sequences in total, are drawn from organism genome databases, Ensembl, and NCBI (see below). Alignment was done with NCBI tBLASTn, using a Grid-aware version of the NCBI software developed by Peng Wang at IU and run on the TeraGrid. The TeraGrid run took 12 hours on 64 processors. BLAST output is converted to scaffold locations and displayed for browsing and searching in GMOD GBrowse genome maps. This genome map is available to DGC members at the Daphnia genome database, http://wfleabase.org/prerelease/ (password required). These are sample map views:
Daphnia microsatellites and ESTs, available at http://wfleabase.org/genomics/, are also mapped here. Human and Rice tracks are not shown here, but are similar to Mouse and Arabidopsis, respectively. A copy of the proteomes used is available at ftp://eugenes.org/biomirror/eugenes/proteomes/
ensAG => "Mosquito", "Anopheles_gambiae.MOZ2a.dec.pep.fa.gz",
ncbAT => "Mustard_weed", "Arabidopsis_thaliana_NC_003070-76.fa.gz",
modCE => "Worm-e", "Caenorhabditis-elegans_WormBase_WS130_protein-reps.fa.gz",
ensDR => "Zebrafish", "Danio_rerio.ZFISH4.dec.pep.fa.gz",
modDM => "Fruitfly-m", "Drosophila-melanogaster_FlyBase_r4.0_protein-reps.fa.gz",
ensHS => "Human", "Homo_sapiens.NCBI35.dec.pep.fa.gz",
modMM => "Mouse", "Mus-musculus_MGI_01282005_protein-reps.fa.gz",
modOG => "Rice", "Oryza_Gramene_r16.0_protein-reps.fa.gz",
modSC => "Yeast-c", "Saccharomyces-cerevisiae_SGD_08272004_protein-reps.fa.gz",
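The step of converting BLAST output to scaffold locations, mentioned above, might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the script used for this annotation: it assumes the tBLASTn hits are in the standard 12-column tabular BLAST format and that the map features are written as GFF3 lines. The function name `blast_hit_to_gff`, the `protein_match` feature type, and the sample identifiers are hypothetical.

```python
def blast_hit_to_gff(line, source="tblastn"):
    """Convert one 12-column tabular BLAST hit into a GFF3 feature line.

    Assumed column order (standard tabular BLAST output):
    query, subject, %identity, length, mismatches, gap opens,
    qstart, qend, sstart, send, evalue, bitscore.
    For tBLASTn the query is a protein and the subject is a scaffold.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 12:
        raise ValueError("expected 12 tab-separated columns, got %d" % len(cols))

    query, scaffold = cols[0], cols[1]
    qstart, qend = int(cols[6]), int(cols[7])
    sstart, send = int(cols[8]), int(cols[9])
    bitscore = cols[11].strip()

    # A hit on the reverse strand is reported with sstart > send;
    # GFF requires start <= end plus an explicit strand column.
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send), max(sstart, send)

    # Record which part of the protein matched (hypothetical attribute choice).
    attributes = "Target=%s %d %d" % (query, qstart, qend)

    return "\t".join([
        scaffold, source, "protein_match",
        str(start), str(end), bitscore, strand, ".", attributes,
    ])


# Hypothetical hit: a fly protein matching the reverse strand of scaffold_1.
hit = "modDM_protein1\tscaffold_1\t45.2\t120\t60\t2\t10\t129\t5360\t5001\t1e-20\t98.6"
print(blast_hit_to_gff(hit))
```

Using the proteome key (for example `modDM`) as the `source` argument would be one way to keep each species in its own map track, though how the tracks here were actually built is not stated.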
This annotation was performed by Don Gilbert, gilbertd@indiana.edu, in July 2005, as part of an assessment of the TeraGrid as a shared computational resource for genome database projects under the GMOD umbrella. The Daphnia annotation data are available to DGC members now, and will be available to everyone at the first public release of this genome.
Thanks to Dick Repasky, Peng Wang, George Turner, Stephen Simms and others of the IU TeraGrid group (rats@indiana.edu, hpc@indiana.edu) for effective help in overcoming various cyberinfrastructure issues, and for providing a version of BLAST that works well with multiple genomes on Grid systems. Thanks also for support from NSF (TeraGrid and DGC support), NIH (GMOD support to D. Gilbert), and DOE (JGI's Daphnia genome data).