The Daphnia pulex official gene set is dpulex_jgi060905_JGI_V11
Name Last modified Size Description
About.txt 09-Dec-2009 15:47 6k
dpulex1_JGI_V11_annotatedgene.description_count.txt 07-Nov-2007 14:32 71k
dpulex1_JGI_V11_annotatedgene.function_count.txt 07-Nov-2007 14:32 42k
dpulex1_JGI_V11_annotatedgene.gff.gz 07-Nov-2007 14:05 1.3M
dpulex1_JGI_V11_annotatedgene.head 30-Aug-2007 17:05 1k
dpulex1_gnomon_annotatedgene.description_count.txt 28-Aug-2007 13:28 37k
dpulex1_gnomon_annotatedgene.flat.gz 02-Sep-2007 18:41 2.9M
dpulex1_gnomon_annotatedgene.function_count.txt 28-Aug-2007 13:28 127k
dpulex1_gnomon_annotatedgene.gff.gz 02-Sep-2007 16:27 2.9M
dpulex1_gnomon_annotatedgene.head 02-Sep-2007 18:40 2k
dpulex1_gnomon_annotatedgene.ugp.xml.gz 02-Sep-2007 22:53 3.8M
dpulex1_gnomon_go.tab 08-Aug-2007 14:04 197k
dpulex1_gnomon_paralog.tab 08-Aug-2007 14:05 371k
dpulex1_gnomon_paralog_mcl2ids.tab 08-Aug-2007 14:19 355k
dpulex1_gnomon_uniprot.tab 08-Aug-2007 14:05 751k
dpulex_jgi060905_DGIL_SNO.aa.gz 26-Sep-2006 09:34 8.2M
dpulex_jgi060905_DGIL_SNO.gff.gz 26-Sep-2006 09:34 3.4M
dpulex_jgi060905_DGIL_SNO.hmm.gz 25-Sep-2006 10:49 15k
dpulex_jgi060905_DGIL_SNO.tr.gz 26-Sep-2006 09:34 12.8M
dpulex_jgi060905_Gnomon.aa.gz 24-May-2007 15:10 7.1M
dpulex_jgi060905_Gnomon.gff.gz 24-May-2007 15:07 3.9M
dpulex_jgi060905_Gnomon.tr.gz 24-May-2007 15:10 11.7M
dpulex_jgi060905_JGI_FM5.aa.gz 07-Apr-2007 15:08 5.6M
dpulex_jgi060905_JGI_FM5.gff.gz 07-Apr-2007 15:02 3.1M
dpulex_jgi060905_JGI_FM5.info 07-Apr-2007 15:19 1k
dpulex_jgi060905_JGI_FM5.tr.gz 07-Apr-2007 15:08 9.2M
dpulex_jgi060905_JGI_V11.aa.gz 27-Jul-2007 15:31 6.0M
dpulex_jgi060905_JGI_V11.gff.gz 27-Jul-2007 15:24 3.1M
dpulex_jgi060905_JGI_V11.head 27-Jul-2007 15:24 1k
dpulex_jgi060905_JGI_V11.tr.gz 27-Jul-2007 15:31 10.0M
dpulex_jgiV11_annot2oneline.perl 07-Nov-2007 14:05 11k
dpulex_jgiV11_annotsubset.gff.gz 06-Oct-2007 13:29 279k
dpulex_jgiV11_annotsubset.txt.gz 06-Oct-2007 13:19 549k
dpulex_jgiV11_annotsubset.xls.gz 20-Sep-2007 10:54 2.5M
functiontable.perl 28-Aug-2007 13:28 4k
gnomon-uniprot-match.tab.gz 08-Aug-2007 14:47 65k
gnomon-uniprot-records.swiss.gz 08-Aug-2007 14:48 9.4M
gnomon-uniprot-summmary.tab.gz 08-Aug-2007 14:47 373k
gnomonstitch.pl 24-May-2007 15:08 19k
Annotated gene prediction sets:
dpulex1_JGI_V11_annotatedgene.gff.gz : gene (mRNA-only) features from JGI V11 official predictions,
with added annotations from homology, GO functions, expression, and cross-references to Gnomon
dpulex1_gnomon_annotatedgene.gff.gz : gene (mRNA-only) features from Gnomon predictions,
with added Uniprot, GO, Pfam IDs and descriptions, using Gnomon best protein_hit ID.
JGI v1.1 IDs are included (jgi= perfect match, jgiov= overlap matches)
Protein gene duplicates (paralog=) and tandem genes (tandy=) are also identified.
dpulex1_gnomon_annotatedgene.flat.gz : same as .gff data but in key: value lines flat file format
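As a rough illustration of reading the annotated gene files, the sketch below splits one GFF line into its nine tab-separated columns and turns the attribute column into a dictionary. It assumes GFF3-style key=value pairs separated by semicolons, which the jgi=, jgiov=, paralog= and tandy= keys above suggest but do not guarantee. The sample line is assembled by hand from values in the annotation key of this document, and parse_gff_line is a hypothetical helper, not one of the scripts in this directory.

```python
from urllib.parse import unquote


def parse_gff_line(line):
    """Split one GFF feature line into a dict (assumes GFF3-style attributes)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attr_text = cols

    # Attribute column: key=value pairs separated by ';' (assumption).
    attrs = {}
    for pair in attr_text.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attrs[key] = unquote(value)

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "attributes": attrs,
    }


# Hand-built sample line; real files may carry more or different attributes.
sample = (
    "scaffold_1\tNCBI_GNO\tmRNA\t173179\t177588\t280.246\t+\t.\t"
    "ID=NCBI_GNO_336014;Parent=gene336014;jgiov=JGI_V11_231974;"
    "paralog=Omcl83,24;tandy=td_s1c2g0"
)
rec = parse_gff_line(sample)
print(rec["attributes"]["ID"], rec["start"], rec["end"], rec["strand"])
```

To scan a whole file, the same function could be applied to each non-comment line read through gzip.open(path, "rt").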
Summary annotation tables
dpulex1_gnomon_go.tab : list of GO, Pfam IDs with gene counts
dpulex1_gnomon_paralog.tab : list of paralog (OrthoMCL, p<=1e-40) IDs, with Uniprot descriptions
dpulex1_gnomon_uniprot.tab : list of Uniprot IDs, and any associated paralogs.
dpulex1_gnomon_paralog_mcl2ids.tab : table of each Gnomon ID, OrthoMCL id (paralog= in gff)
Uniprot source records: gnomon-uniprot-records.swiss.gz and gnomon-uniprot-summary.tab
(the summary file appears in this directory as gnomon-uniprot-summmary.tab.gz).
gnomon-uniprot-match.tab has the uniprot.org lookup results, including some no_match
cases (7841 found, 242 missed).
Annotation key for dpulex1_gnomon_annotatedgene.flat (and .gff) files:
# ID: NCBI_GNO_336014 == mRNA gene prediction ID from dpulex_jgi060905_Gnomon.gff
# Location: scaffold_1:173179-177588:+ == location from gff
# Type: mRNA:NCBI_GNO == Type:Source field from gff
# Score: 280.246 == Gene quality score from Gnomon
# Dbxref: == All database IDs from Gnomon, EST and protein matches
# NP_000115.1,CAH70291.1,NP_001074690.1,NP_080815.1,WFes0149391,WFes0149392,WFes0162361
# Note: == Uniprot ID/Accessions/Species/Description from protein_hit
# ERCC6_HUMAN/Q03468,Q5W0L9/Homo sapiens/DNA excision repair protein ERCC-6,ATP-dependent ...
# Ontology_term: == Uniprot GO and Pfam cross-refs
# GO:0003678/F:DNA helicase activity,GO:0005515/F:protein binding,GO:0003702/F:RNA polyme...
# Parent: gene336014 == NCBI gene parent ID (a few alt-transcripts make this non-trivial)
# flags: EST,Prot,Start,Stop == NCBI prediction flags
# (only genes with Start+Stop codons are included)
# jgiov: JGI_V11_231974 == JGI v1.1 gene overlap IDs
# (not perfect match; FIXME: includes trivial overlaps)
# jgi: JGI_V11_nnnn == JGI v1.1 perfect gene match
# paralog: Omcl83,24 == Paralog ID,count from OrthoMCL of blastp of all Gnomon proteins
# maxCDS: 173179 177401 == Gnomon value (?)
# protCDS: 173368 177389 == Gnomon value (?)
# protein_hit: gi|4557565|ref|NP_000115.1| == best protein match from Gnomon pipeline
# tandy: td_s1c2g0 == Tandem genes id (FIXME: needs near/far duplicate flag)
# tilex: 39484 == Genome tiling expression maximum score
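The compound values in the key above (Location, Note, Ontology_term, paralog) can be unpacked with a few small functions. This is a minimal sketch based only on the example values shown in this key; the function names are hypothetical, and real records may contain variations that these examples do not cover.

```python
import re


def parse_location(text):
    """'scaffold_1:173179-177588:+' -> ('scaffold_1', 173179, 177588, '+')"""
    # Split from the right so a ':' inside a scaffold name would not break parsing.
    seqid, span, strand = text.rsplit(":", 2)
    start, end = span.split("-")
    return seqid, int(start), int(end), strand


def parse_paralog(text):
    """'Omcl83,24' -> ('Omcl83', 24): OrthoMCL group ID and member count."""
    group, count = text.split(",")
    return group, int(count)


def parse_note(text):
    """Split a Note value into Uniprot ID, accessions, species, description."""
    # Only the first three '/' are separators; the description may contain more.
    uniprot_id, accessions, species, description = text.split("/", 3)
    return {
        "id": uniprot_id,
        "accessions": accessions.split(","),
        "species": species,
        "description": description,
    }


def go_ids(text):
    """Pull GO identifiers out of an Ontology_term value."""
    return re.findall(r"GO:\d+", text)


print(parse_location("scaffold_1:173179-177588:+"))
print(parse_paralog("Omcl83,24"))
print(go_ids("GO:0003678/F:DNA helicase activity,GO:0005515/F:protein binding"))
```

The GO IDs are extracted with a pattern rather than a comma split because the term descriptions may themselves contain commas.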
Gene prediction set: dpulex_jgi060905_JGI_V11
.gff = feature annotations, locations.
.aa = amino acid translation (protein)
.tr = transcript
dpulex_jgi060905_JGI_V11_annotgene.gff = annotation of gene features with Gnomon matching gene,
JGI_FM5 matching gene, and tile expression
#species: Daphnia_pulex
#assembly-id: Dappu1 , dpulex_jgi060905
#annotation-group-id: JGI_V11
#algorithm: Filtered gene models as consensus (V11, version 1.1) with supporting evidence (EST, homology)
# of several gene predictors with versions: fgenesh, SNAP, NCBI Gnomon, GeneWise
# and curated gene models
#source: ftp://ftp.jgi-psf.org/pub/JGI_data/Daphnia_pulex/v1.0/FrozenGeneCatalog_2007_07_03.gff.gz
Gene prediction set: dpulex_jgi060905_DGIL_SNO
#species: Daphnia_pulex
#assembly-id: dpulex_jgi060905
#annotation-group-id: DGIL_SNO
#algorithm: DGIL_SNO = SNAP gene predictor + protein homology guidance, version 2006-05-18,
# : SNAP ref=http://www.biomedcentral.com/1471-2105/5/59/abstract
# : bootstrapped HMM predictor from Dmelanogaster.hmm on Daphnia_pulex assembly dna
# : and guided with -xdef Drosophila, Mouse and C.elegans protein gene matches (tblastn)
#authors: gilbertd AT indiana.edu
#more-info: http://wfleabase.org/docs/
#date: 20060926
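Header blocks like the one above can be read into a dictionary with a short routine. The sketch below assumes a convention inferred from these examples only: a line that starts with "#" immediately followed by text opens a new "key: value" field, and a line that starts with "# " (hash, then whitespace) continues the previous field. parse_head is a hypothetical name, not a script shipped in this directory.

```python
def parse_head(lines):
    """Collect '#key: value' header lines into a dict, folding continuation lines."""
    meta = {}
    last_key = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.startswith("#") or len(line) == 1:
            continue  # not a header line, or a bare '#'
        if not line[1].isspace():
            # New field: split on the first ':' only, so URLs in values survive.
            key, sep, value = line[1:].partition(":")
            if not sep:
                continue
            last_key = key.strip()
            meta[last_key] = value.strip()
        elif last_key is not None:
            # Continuation line: drop the leading '#', spaces and optional ':'.
            extra = line[1:].strip().lstrip(":").strip()
            if extra:
                meta[last_key] += " " + extra
    return meta


head = [
    "#species: Daphnia_pulex",
    "#assembly-id: dpulex_jgi060905",
    "#annotation-group-id: DGIL_SNO",
    "#algorithm: DGIL_SNO = SNAP gene predictor + protein homology guidance, version 2006-05-18,",
    "# : SNAP ref=http://www.biomedcentral.com/1471-2105/5/59/abstract",
    "# : and guided with -xdef Drosophila, Mouse and C.elegans protein gene matches (tblastn)",
    "#more-info: http://wfleabase.org/docs/",
    "#date: 20060926",
]
meta = parse_head(head)
print(meta["annotation-group-id"], meta["date"])
```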
Gene prediction set: dpulex_jgi060905_JGI_FM5
#species: Daphnia_pulex
#assembly-id: Dappu1 , dpulex_jgi060905
#annotation-group-id: JGI_FM5
#algorithm: Filtered gene models as consensus (FM5) with supporting evidence (EST, homology)
# of several gene predictors with versions: fgenesh, SNAP, GeneWise
#authors: Jeff Boore, Igor Grigoriev, Andrea Aerts and the Joint Genome Institute
#more-info: http://shake.jgi-psf.org/Dappu1/
#date: 20070212
#
# FM5 gene model predictor counts:
# 7063 fgenesh_pg
# 7003 SNAP
# 3727 estExt_fgenesh_pg
# 2472 e_gw
# 1915 estExt_GenewisePlus
# 1357 estExt_Genewise
# 1220 gw
# 295 estExt_fgenesh_pm
# 262 estExt_fgenesh_kg
# 131 fgenesh_pm
# 32 fgenesh_kg
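To see how much each predictor contributes to FM5, the counts above can be totalled and converted to percentages. This is a small sketch that copies the numbers from the table; parse_counts is a hypothetical helper.

```python
counts_text = """\
# 7063 fgenesh_pg
# 7003 SNAP
# 3727 estExt_fgenesh_pg
# 2472 e_gw
# 1915 estExt_GenewisePlus
# 1357 estExt_Genewise
# 1220 gw
# 295 estExt_fgenesh_pm
# 262 estExt_fgenesh_kg
# 131 fgenesh_pm
# 32 fgenesh_kg
"""


def parse_counts(text):
    """Read '# <count> <predictor>' lines into a {predictor: count} dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.lstrip("#").split()
        if len(parts) == 2 and parts[0].isdigit():
            counts[parts[1]] = int(parts[0])
    return counts


counts = parse_counts(counts_text)
total = sum(counts.values())

# Print predictors from most to least frequent with their share of the total.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s}{n:6d}{100 * n / total:6.1f}%")
print(f"{'total':22s}{total:6d}")
```

The eleven counts sum to 25477 models.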
Gene prediction set: dpulex_jgi060905_Gnomon
#species: Daphnia_pulex
#assembly-id: Dappu1,dpulex_jgi060905
#annotation-group-id: NCBI_GNO
#annotation group: NCBI
#authors: Alexander Souvorov, Yuri Kapustin, Boris Kiryutin, Vyacheslav Chetvernin,
# Tatiana Tatusova, Paul Kitts, Victor Sapojnikov and Jim Ostell
#algorithm: Gnomon, http://www.ncbi.nlm.nih.gov/genome/guide/gnomon.html
#date: 20070522
# Dbxref from support=nnn links in gnomon/aligns.gff.gz gnomon/chains_for_annotation.gff.gz
# 37329 gene
# 37466 mRNA