Index of /genome/Daphnia_pulex/dpulex_genes2017/genes

      Name                            Last modified       Size  Description

[DIR] Parent Directory                27-Nov-2017 23:01     -
[   ] daphplx17evigenes.aa.gz         14-Nov-2017 13:15  21.7M
[   ] daphplx17evigenes.attr.txt.gz   21-Nov-2017 15:26  10.8M
[   ] daphplx17evigenes.cds.gz        14-Nov-2017 13:14  62.9M
[TXT] daphplx17evigenes.cds.qual      14-Nov-2017 13:21  13.2M
[   ] daphplx17evigenes.gff.gz        14-Nov-2017 13:19  41.9M
[   ] daphplx17evigenes.mrna.gz       14-Nov-2017 13:15  94.2M
[TXT] daphplx17evigenes.mrna.ids      14-Nov-2017 13:10   2.0M
[TXT] daphplx17evigenes.mrna.qual     14-Nov-2017 13:11  13.2M
[TXT] daphplx17evigenes_summary.txt   28-Nov-2017 16:34    12k
[TXT] genesearch.html                 12-Nov-2017 22:33     4k


Daphnia pulex gene set 2017
http://eugenes.org/EvidentialGene/daphnia/daphnia_pulex/daphnia_pulex_genes2017/
public candidate: evg5daplx/map5ann/dpx7finp    1 Nov 2017

This file: daphnia_pulex_genes2017/genes/daphplx17evigenes_summary.txt
Compare to eugenes.org/EvidentialGene/daphnia/daphnia_magna/Genes/evg7finloc9_sum.txt
See also eugenes.org/EvidentialGene/about/evigene_plantsanimals_2017.txt

TABLE G1.  Daphnia pulex Gene set numbers, version Daplx7pEVm, 1 Nov 2017
           Dmag: values are daphnia_magna equivalent statistic
---------------------------------------------------------------------------
30081 gene loci, all supported by RNA-seq and/or protein homology evidence
  28498 (95%) are protein coding, 1583 are non-coding
  Dmag: 29121 pc loci
  1030 additional transposon loci are not counted as gene loci (have TE domains of CDD &/or DFam)
  2085 additional fragments are not counted as gene loci, some are unattached exons of nearby genes
  
27575 (92%) have RNA assemblies, 2506 are genome-modelled 
Dmag: 26825 (92%) are RNA assemblies, 2296 are genome-modelled 

28820 (96%) of loci have uniquely mapped RNA-seq, 99% have mapped RNA-seq
24693 of loci have >= 90% read coverage, 26900 have >= 60%, 1280 have < 20% read cover. 
26751 (98%) of loci have valid RNA introns, of 27362 with model introns and genome mapping
Of 189710 RNA-valid introns, 127921 (67.4%) are recovered in gene transcripts.

22785/28498 (80%) have homology to other species (blastp e<=1e-5 to proteins or conserved domains),
   17331 (60%) have conserved coding seq across Daphnia species (sig. Ka/Ks)
   12440 (44%) are orthologs to other species (OrthoMCL clustering), 
    4475 (16%) are inparalogs of orthologs (16915,59% ortholog or inparalog, of pc loci),  
    7968 (28%) have ortholog only in other Daphnia spp.
    8947 (31%) have non-daphnid ortholog
    ---
   11011 (39%) are species-unique (in OrthoMCL clusters)
    5298 of these have significant other-species homology
    5713 of these lack other-species homology, 2382 of these have D.pulex paralog
Dmag: 22063 (76%) have homology to other species
Dmag:   11770 (40%) are orthologs to other species, 4535 (16%) are inparalogs
Dmag:    5170 (18%) have homology only to other Daphnia
    
76257 alternate transcripts are at 15420 (51%) loci, with 4 median, 5.9 ave, transcripts per locus,
  including 185 noncoding loci, with 142 alts maximum, at hv-DSCAM locus, 
   26 loci have 50+ alts, 2390 have 10+ alts, 
Dmag: 84898 alternate transcripts are at 17473 loci (60%), ave. 5 transcripts per locus,
Dmag:  DSCAM has 123 alts, 56 loci have 50+, 2496 have 10+ alts.

25041 (88%) have complete proteins, 3416 have partial proteins, of 28498 PC genes

28516 (95%) are properly mapped to Dpx.2017 chromosome assembly (>=80% align, no splits), 
   1430 partial-mapped coverage ( 10% < align <80%), 
    127 are ~un-mapped genes ( align < 10% ), 46 loci have split-scaffold mapping 

21848 (73%) are properly mapped to Dpx.2007 chromosome assembly (>=80% align, no splits), 
    and 26830 (90%) align >= 50% (genes were validated on  Dpx.2017 chromosome assembly)

2231/29593 (7%) are single-exon loci of those mapping >= 50% to genome.
Dmag: 2860/20558 (14%) are single-exon loci, for good mapping (fewer w/ good Dmag asm map)

2370 loci have antisense or mixed sense mapping, including putative trans-spliced loci. 
Dmag: ~1000 with antisense or mixed sense (putative trans-splicing)
  Ortholog loci between species conserve bidirectional transcription,
  indicating that these mappings mostly reflect gene biology rather than artifacts.
  See daphnia_pulex_genes2017/docs/daphnia_pulex_magna_bidirectional_genes.html

Gene locus IDs: Daplx7pEVm000001t1 .. Daplx7pEVm065087t1,
Alternate transcripts have ID suffix t2 .. t100.
--------------------------------------------------------
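
The ID scheme above can be split into a locus ID and a transcript number.
A minimal Python sketch, assuming every ID has the form of the two examples
shown (prefix Daplx7pEVm, six-digit locus number, "t", transcript number);
the names split_gene_id and is_alternate are hypothetical, not part of any
EvidentialGene tool.

```python
import re

# Assumed pattern, inferred only from the example IDs in this summary:
# "Daplx7pEVm" + six-digit locus number + "t" + transcript number.
ID_PATTERN = re.compile(r"^(Daplx7pEVm\d{6})t(\d+)$")


def split_gene_id(transcript_id):
    """Return (locus_id, transcript_number); raise ValueError if malformed."""
    match = ID_PATTERN.match(transcript_id)
    if match is None:
        raise ValueError(f"unrecognized ID: {transcript_id!r}")
    return match.group(1), int(match.group(2))


def is_alternate(transcript_id):
    """True for alternate transcripts, i.e. suffix t2 and above."""
    return split_gene_id(transcript_id)[1] >= 2


print(split_gene_id("Daplx7pEVm000001t1"))
print(is_alternate("Daplx7pEVm065087t2"))
```

Grouping transcripts by the returned locus ID gives per-locus alternate counts.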

TABLE G2. Gene orthology categories (using OrthoMCL)
         ---------- GENES ---------------------   ------ GROUPS ---- 
         nGene  Orth1   Ordup   Uniq1   UDup      OrGrp OrMis1 UniGrp
         --------------------------------------   ------------------  
  Crustacea (Daphnia only)          
daphmag 28116   10632   7000    9346    1138      13056   8    415
daphplx 28496   10924   6561    6967    4044      12877   24   1107
daphgal 34529   10320   6232    12437   5540      12237   213  1262
daphsim 38523   11175   3209    21455   2684      12343   53   768
  Insects
tribcas 12861   7411    2241    1856    1353      8044    90   334
bemtab  13901   7058    2113    2472    2258      7731    181  523
drosmel 13916   6536    2202    3601    1577      7226    291  451
  Fish
zefish  26247   10172   10755   3549    1771      13279   119  326
guppy   22914   10380   8554    3010    970       13239   38   229
human   39357   7535    13463   12252   6107      12092   51   2099
--------------------------------------------------------------------
 source aaeval/omcldap/daph10_omcl/daph10omcla-orthomcl-gclass.tab
  daphmag = daphnia magna 2014, daphplx = daphnia pulex 2017,
  daphgal = daphnia galeata 2016, daphsim = daphnia similoides(sp) 2017,
  tribcas = Tribolium castaneum (beetle), drosmel = Drosophila melanogaster (fruit fly),
  bemtab = Bemisia tabaci (whitefly)(?),
  guppy = guppy fish Poe..., zefish = Danio rerio (zebrafish)
Key:
  nGene = count of input genes, excludes alternate isoforms/locus.
  Orth1 = single copy orthologous genes,
  Ordup = multi-copy old-ortholog genes (one-to-one matches among multicopies),
  Inpara= Inparalogs (recent ortholog duplicates) of orthologous genes
  Uniq1, UDup  = single-copy and duplicated species-unique genes
  OrMis1= groups missing in species that all other species have
  OrGrp, UniGrp = orthologous and species-unique groups
--------------------------------------------------------------------
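
In every row of TABLE G2 the four gene columns (Orth1, Ordup, Uniq1, UDup)
sum to nGene, so they can be read as a partition of the input genes.
A short Python sketch that checks this and converts a row to percentages;
the dictionary and function names are hypothetical, the numbers are copied
from the table.

```python
# species -> (nGene, Orth1, Ordup, Uniq1, UDup), copied from TABLE G2
G2_GENES = {
    "daphmag": (28116, 10632, 7000, 9346, 1138),
    "daphplx": (28496, 10924, 6561, 6967, 4044),
    "daphgal": (34529, 10320, 6232, 12437, 5540),
    "daphsim": (38523, 11175, 3209, 21455, 2684),
    "tribcas": (12861, 7411, 2241, 1856, 1353),
    "bemtab": (13901, 7058, 2113, 2472, 2258),
    "drosmel": (13916, 6536, 2202, 3601, 1577),
    "zefish": (26247, 10172, 10755, 3549, 1771),
    "guppy": (22914, 10380, 8554, 3010, 970),
    "human": (39357, 7535, 13463, 12252, 6107),
}


def category_percents(row):
    """Return Orth1, Ordup, Uniq1, UDup as percentages of nGene."""
    n_gene, *categories = row
    if sum(categories) != n_gene:
        raise ValueError("gene categories do not sum to nGene")
    return [round(100.0 * count / n_gene, 1) for count in categories]


for species, row in G2_GENES.items():
    print(species, category_percents(row))
```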


TABLE G3.  Water flea Daphnia pulex gene sets compared

 Daphnia magna  REFERENCE (nr=29127)
      Evigene17 Maker17  Evigene10b
 found  72.6%    59.6%    71.4% 
 align  91.3%    70.8%    87.3%
 tiny    0.2%    23.8%     1.5%
 best   75.4%     3.1%     --     equal  21.3%        
 
 Drosophila mel. REFERENCE (nr=10902)
      Evigene17 Maker17  Evigene10b
 found  71.4%    68.3%    70.6% 
 align  81.9%    74.6%    78.5%
 tiny    0.8%     6.5%     2.2%
 best   65.4%     5.2%     --     equal  29.2%        
 
 Highly conserved Drosophila REFERENCE (BUSCO subset, nr=3038)
      Evigene17 Maker17  Evigene10b
 found  98.1%    94.6%    97.4% 
 align  84.0%    76.7%    80.5%
 tiny    0.4%     5.6%     1.9%
 best   63.7%     4.7%     --     equal  31.5%        
 ----------------------------------------------

 Intron recovery for Daphnia pulex gene sets (ni=189710 of RNA-seq mapped to chrs)
             RNA-Introns
 Geneset      Found%  GeneTr validExons
 DpEvigene17   67.4    95855   127921 
 DpMaker17     46.7    18296    88747 
 DpEvigene10   42.9    34175    81455 
 ----------------------------------------------
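
 The Found% column above is consistent with validExons divided by ni
 (189710). A small Python check of that arithmetic, assuming this is how
 the column was derived; names are hypothetical, counts are from the table.

```python
TOTAL_VALID_INTRONS = 189710  # ni, from the table header above

# gene set -> validExons count, copied from the table above
RECOVERED_INTRONS = {
    "DpEvigene17": 127921,
    "DpMaker17": 88747,
    "DpEvigene10": 81455,
}


def found_percent(recovered, total=TOTAL_VALID_INTRONS):
    """Percent of RNA-valid introns recovered by a gene set."""
    if total <= 0:
        raise ValueError("total intron count must be positive")
    return 100.0 * recovered / total


for geneset, recovered in RECOVERED_INTRONS.items():
    print(f"{geneset:12s} {found_percent(recovered):.2f}")
```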

Daphnia pulex gene sets  
DpEvigene17 Dap.plx Evigene17 of 2017 from 
  http://arthropods.eugenes.org/EvidentialGene/daphnia/daphnia_pulex/daphnia_pulex_genes2017/
DpEvigene10 Dap.plx Evigene10b of 2010 from wfleabase.org and
  http://arthropods.eugenes.org/EvidentialGene/daphnia/daphnia_pulex/daphnia_pulex_genes2010/
DpMaker17 Dap.plx Maker17 genes of 2017 from report of doi:10.1534/g3.116.038638 
 "A New Reference Genome Assembly for the Microcrustacean Daphnia pulex"

Reference genes:
  Daphnia magna water flea,  Evigene 2015 set, primary isoforms n=29127
  Drosophila melanogaster fruit fly, NCBI RefSeq 2015, total primary isoforms n=13828
  Conserved Fruit fly, NCBI RefSeq 1-copy genes identified by BUSCO, total n=3055

Methods: BLASTP -query reference.aa -db dapplx_twogenesets.aa -evalue 1e-5
Statistics: 
  Found  = percent of reference genes with signif. align in target gene set
  Align  = average % alignment to found reference genes (align-aa/ref-aa)
  Tiny   = target genes with size < 50% of reference length, of found genes
  Best   = which target set has longest alignment per ref gene, of found genes
  Introns Found% = percent of evidence introns aligned to gene set exons,
       intron evidence from Illumina RNA-seq mapped to chromosome assemblies
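
The Found, Align and Tiny definitions above can be sketched in Python as
below. Assumptions: one best hit per reference gene has already been
selected from the BLASTP output (parsing is not shown), and the names
BestHit and summarize are hypothetical. This is an illustration of the
definitions, not the script used for the tables.

```python
from dataclasses import dataclass


@dataclass
class BestHit:
    ref_len: int     # reference protein length, aa
    target_len: int  # target protein length, aa
    align_len: int   # aligned reference residues, aa


def summarize(n_reference, best_hits):
    """best_hits: dict ref_id -> BestHit, only for genes with a significant hit."""
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    n_found = len(best_hits)
    if n_found == 0:
        return {"found": 0.0, "align": 0.0, "tiny": 0.0}
    hits = list(best_hits.values())
    # Found: percent of reference genes with a significant alignment
    found = 100.0 * n_found / n_reference
    # Align: average of align-aa / ref-aa over found genes
    align = sum(100.0 * h.align_len / h.ref_len for h in hits) / n_found
    # Tiny: found genes whose target is under 50% of the reference length
    tiny = 100.0 * sum(h.target_len < 0.5 * h.ref_len for h in hits) / n_found
    return {"found": found, "align": align, "tiny": tiny}


example = {
    "refA": BestHit(ref_len=100, target_len=100, align_len=90),
    "refB": BestHit(ref_len=200, target_len=80, align_len=70),
    "refC": BestHit(ref_len=400, target_len=400, align_len=400),
}
print(summarize(4, example))
```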

Intron Methods and Statistics  
  map RNA-seq (Illumina) to chromosome assembly with GSNAP, 
  extract splice-mapped reads and their intron locations, 
  tabulate gene-exon x rna-intron matches.
Statistics
  GeneTr  = gene transcripts total in gene set
  validExons = gene exons w/ validated intron
  Found%  = percent of all valid introns recovered between gene exons
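
The tabulation step can be sketched in Python as follows. Assumptions:
introns and exons use 1-based inclusive coordinates, and an evidence intron
counts as recovered only when it exactly spans the gap between two adjacent
exons of some transcript. The GSNAP mapping and splice-read extraction are
not shown, and all names are hypothetical.

```python
def transcript_introns(chrom, strand, exons):
    """Introns implied by adjacent exons of one transcript."""
    exons = sorted(exons)
    return {
        (chrom, strand, left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    }


def intron_recovery(evidence_introns, transcripts):
    """Return (n_recovered, percent) of evidence introns found in transcripts.

    evidence_introns: set of (chrom, strand, start, end)
    transcripts: iterable of (chrom, strand, [(exon_start, exon_end), ...])
    """
    if not evidence_introns:
        raise ValueError("no evidence introns given")
    model_introns = set()
    for chrom, strand, exons in transcripts:
        model_introns |= transcript_introns(chrom, strand, exons)
    recovered = evidence_introns & model_introns
    return len(recovered), 100.0 * len(recovered) / len(evidence_introns)


evidence = {
    ("chr1", "+", 101, 199),
    ("chr1", "+", 301, 399),
    ("chr2", "-", 51, 99),
}
models = [("chr1", "+", [(1, 100), (200, 300), (400, 500)])]
print(intron_recovery(evidence, models))
```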
===============================

What loci differ among gene sets? homology effects
Cross table of homology class x presence of another gene set's model (Maker17, Evigene10)

Hclass  Miss%  NoMkr    Maker17
------------------------------
CDD     24%     3759    11876    Conserved Domain Database of proteins
daphgal 13%     1625    10442    Daphnia
daphmag 25%     4250    12251    Daphnia
daphsim 11%     1375    10667    Daphnia
bemtab  13%     1185    7288     Whitefly
tribcas 9%      757     7464     Beetle
drosmel 6%      635     8751     Fruitfly
zefish  2%      152     5819     Fish
guppy   2%      147     5926     Fish
human   2%      150     5904     
inpar   38%     1743    2732     Inparalogs, with less recent ortholog above
unipar  61%     1619    1030     Species-unique paralog
unique  70%     3195    1357     Species-unique
------------------------------

Hclass  Miss%  NoEv10   Evigene10
------------------------------
CDD     3%      522     15113   Conserved Domain Database of proteins
daphgal 2%      341     11726   Daphnia
daphmag 4%      714     15787   Daphnia
daphsim 2%      253     11789   Daphnia
bemtab  1%      150     8323    Whitefly
tribcas 1%      103     8118    Beetle
drosmel 1%      95      9291    Fruitfly
zefish  0%      34      5937    Fish
guppy   0%      28      6045    Fish
human   0%      32      6022   
inpar   7%      315     4160    Inparalogs, with less recent ortholog above
unipar  31%     838     1811    Species-unique paralog
unique  7%      363     4189    Species-unique
------------------------------
Counts are of Evigene17 loci with or without a model from the other gene set at the locus;
presence does not imply equivalence (i.e., the models may differ substantially).
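
The Miss% values in both tables match missing / (missing + present),
truncated to a whole percent. A Python sketch of that arithmetic, assuming
this is how the column was derived; names are hypothetical, and the rows
are a sample copied from the Maker17 table.

```python
# (Hclass, n_missing, n_present) sample rows from the Maker17 table
MAKER17_ROWS = [
    ("CDD", 3759, 11876),
    ("daphmag", 4250, 12251),
    ("drosmel", 635, 8751),
    ("unipar", 1619, 1030),
    ("unique", 3195, 1357),
]


def miss_percent(n_missing, n_present):
    """Whole-percent share of loci lacking a model in the other gene set."""
    total = n_missing + n_present
    if total == 0:
        raise ValueError("no loci in this homology class")
    return (100 * n_missing) // total  # integer division truncates


for hclass, n_missing, n_present in MAKER17_ROWS:
    print(f"{hclass:8s} {miss_percent(n_missing, n_present)}%")
```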

The tables above indicate that reconstruction of Daphnia-clade genes, and of
genes with conserved protein domains, differs by method.  Paralogs, both
inparalogs and species-unique paralogs, are also less fully reconstructed by
the Maker prediction method.  In addition, gene reconstruction differs for
well-conserved orthologs of all size ranges (and complexity, below).

One special, large difference is alternate transcripts: the Maker models
include none.  Yet alternate transcripts often have greater homology to other
species than the longest, or the single modeled, form.  Also important, alternate
transcripts help much in validating gene loci, as they form partly independent
reconstructions of the same locus, and highlight exon-intron usage patterns 
consistent with complete gene loci.

What loci differ among gene sets? large vs small genes?

A. Daphnia magna aligns, A = Dp17Maker, B= Dp17Evigene
                Asmall  Bsmall          Amiss   Bmiss
  1000 largest,  923,   31,  46 equal ;  148       0   (ref 22000 aa .. 1300 aa)
  1000 middle,   827,   42, 131 equal ;  130       0   (ref ~400 aa)
  1000 smallest, 694,   65, 241 equal ;  330       8   (ref ~150 aa)

B1. Dros mel aligns, A = Dp17Maker, B= Dp17Evigene
                Asmall  Bsmall          Amiss   Bmiss
  1000 largest,  775,  79,  146 equal ;   31      1   (ref 15638 aa .. 1100 aa)
  1000 middle,   779,  74,  147 equal ;   38      2   (ref ~500 aa)
  1000 smallest, 555,  81,  364 equal ;   50      6   (ref ~200 aa)

Dros. mel. references of all sizes align consistently more poorly to Dp17Maker than to Dp17Evigene:
Dp17mk has the smaller alignment for about 75% (700-800/1000), Dp17evg for about 7% (70-80/1000), and about 18% (150-200/1000) are equal.
Dp17mk misses 31/1000 (67/2000) of the largest, vs 1/1000 (2/2000) for evg, and similarly for shorter genes.
----------------------------------------------------------------------------------------
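
The Asmall / Bsmall / equal / miss tallies can be sketched in Python as
below. Assumptions, not stated in the tables: "Asmall" means gene set A has
the shorter alignment to the reference gene, and a missing hit is treated as
alignment length 0 (so a miss also counts as small). The function name and
input layout are hypothetical.

```python
from collections import Counter


def compare_sets(alignments):
    """Tally which gene set aligns shorter per reference gene.

    alignments: iterable of (align_a, align_b) in aa; None means no hit.
    """
    tally = Counter()
    for align_a, align_b in alignments:
        if align_a is None:
            tally["Amiss"] += 1
        if align_b is None:
            tally["Bmiss"] += 1
        a = align_a or 0  # assumed: a miss counts as zero-length alignment
        b = align_b or 0
        if a < b:
            tally["Asmall"] += 1
        elif b < a:
            tally["Bsmall"] += 1
        else:
            tally["equal"] += 1
    return tally


example_pairs = [(100, 200), (None, 150), (300, 300), (250, 90)]
print(compare_sets(example_pairs))
```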