
Index of /release1/current_release/gene-duplicates

      Name                              Last modified       Size  Description

[DIR] Parent Directory                  09-Dec-2008 10:03      -
[DIR] prots/                            02-Oct-2008 16:19      -
[TXT] dpulex_genedupl_data_info.txt     13-Feb-2008 17:44     4k
[TXT] dpulex_tandy6v11_genepairs.txt    01-Aug-2007 17:13   373k
[   ] dpulex_tandy6v11.gff.gz           01-Aug-2007 15:54   5.1M

Data supporting tandem duplicate genes in Daphnia pulex.
For a summary, see http://wfleabase.org/genome-summaries/gene-duplicates/

The table dpulex_tandy6v11_genepairs.txt lists counts and gene IDs for
duplicate genes located by exon-match analysis; the underlying data are in
the file dpulex_tandy6v11.gff.gz.
The duplicate analysis identifies regions (clusters), 'gene' models within those
regions, and exon matches.  Both nearby and far matches are noted (see below).

These data draw exons from JGI Dappu1 release 1.1 gene models, NCBI Gnomon gene models
and DGIL SNAP-homology gene models.

An alternate analysis of protein similarity (blastp) for the NCBI Gnomon gene models
is in dpulex_jgi060905_Gnomon_full.aa.blastp.gz, summarized in
dpulex_jgi060905_Gnomon_full.blastp150.idchains (all matching genes at bitscore >= 150,
approx. E-value 1e-13).

For details of the exon and protein analyses, see
  http://wfleabase.org/release1/dpulex_jgi060905/gene-duplicates/

Summary table for tandem duplicate genes

           # tandem clusters (1)   # of tandem genes (2)
           (2+ genes, 3+ genes)    (nearby / total, percent)
Fruitfly       119, 71           1,500 / 13,500, 11% (exons ~ proteins)
Nematode       986, 555          3,000 / 20,000, 15% (exons ~ proteins)
Daphnia       1865, 892          5,400 / 27,000, 20% (exons) (4)
                                 3,900 / 32,000, 13% (proteins)
                                 ^^^^ this near/far count needs correction for scaffold
                                      size; about 15% nearby protein duplicates are found
                                      when only the big Daphnia scaffolds are used
                                  
(1) Tandem clusters are counted two ways: as 2+ duplicate genes, and as
    3+ genes, in a contiguous region with <= 15kb between nearest duplicates,
    based on exon-matched genes.  A published count for C. elegans gives
    400 clusters of 3+ genes, using different criteria for 'nearby'
    (http://www.wormbook.org/chapters/www_geneduplication/geneduplication.html),
    which compares to the 555 counted here.
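
The clustering rule in note (1) can be sketched as follows. This is only an
illustration, not the pipeline that produced the counts above: the Gene record,
the tandem_clusters function and the sample coordinates are hypothetical, and
the input is assumed to be genes already identified as duplicates of one another.

```python
from collections import namedtuple

# Hypothetical record; the real data are in dpulex_tandy6v11.gff.gz
Gene = namedtuple("Gene", "gene_id scaffold start end")

MAX_GAP = 15_000  # <= 15kb between nearest duplicates, per note (1)


def tandem_clusters(genes, max_gap=MAX_GAP, min_size=2):
    """Group duplicate genes into clusters of neighbours on one scaffold.

    Assumes `genes` are already known duplicates. A gene joins the
    current cluster when it lies on the same scaffold and starts within
    max_gap bases of the furthest end seen so far in that cluster.
    """
    clusters = []
    current = []
    current_end = 0
    for gene in sorted(genes, key=lambda g: (g.scaffold, g.start)):
        if (current
                and gene.scaffold == current[-1].scaffold
                and gene.start - current_end <= max_gap):
            current.append(gene)
            current_end = max(current_end, gene.end)
            continue
        if len(current) >= min_size:
            clusters.append(current)
        current = [gene]
        current_end = gene.end
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Made-up coordinates, for illustration only
genes = [
    Gene("a", "scaffold_1", 1_000, 2_000),
    Gene("b", "scaffold_1", 10_000, 12_000),
    Gene("c", "scaffold_1", 20_000, 21_000),
    Gene("d", "scaffold_1", 50_000, 51_000),
    Gene("e", "scaffold_2", 100, 900),
    Gene("f", "scaffold_2", 5_000, 6_000),
]
clusters_2plus = tandem_clusters(genes, min_size=2)
clusters_3plus = tandem_clusters(genes, min_size=3)
```

Calling the function with min_size=2 and min_size=3 mirrors the two cluster
counts given per species in the summary table.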

(2) Number of nearby (<= 15kb) duplicate genes from protein and exon matching,
    over total genes sampled from genomes.  Protein matches used bitscore >= 150
    (approx. E-value <= 1e-13) for reciprocal matching.  Exon matches used
    %identity >= 80% and filters to remove repetitive exons and poor quality
    gene matches.
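
A minimal sketch of the protein-match filter in note (2), under two assumptions
that this file does not confirm: that the blastp results are in standard
12-column BLAST tabular format (bitscore in the last column), and that
'reciprocal' means both A->B and B->A pass the bitscore cutoff. The function
name and the sample rows are hypothetical.

```python
MIN_BITSCORE = 150.0  # cutoff stated in note (2)


def reciprocal_pairs(rows, min_bitscore=MIN_BITSCORE):
    """Return unordered gene pairs that hit each other above the cutoff.

    Each row is a list of 12 fields as in BLAST tabular output:
    query id, subject id, ..., evalue, bitscore (an assumed layout).
    Self-hits are ignored.
    """
    hits = set()
    for row in rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query != subject and bitscore >= min_bitscore:
            hits.add((query, subject))
    # Keep a pair only when the reverse direction also passed the cutoff
    return {tuple(sorted(pair)) for pair in hits if (pair[1], pair[0]) in hits}


# Made-up rows, for illustration only
filler = ["90.0", "200", "20", "0", "1", "200", "1", "200", "1e-50"]
rows = [
    ["g1", "g2", *filler, "300"],
    ["g2", "g1", *filler, "295"],
    ["g1", "g3", *filler, "160"],  # passes, but only in one direction
    ["g3", "g1", *filler, "120"],  # below cutoff
    ["g1", "g1", *filler, "500"],  # self-hit, ignored
]
pairs = reciprocal_pairs(rows)
```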

(4) For Daphnia, a 27,000-gene subset from scaffolds >= 1 Mb was used
    for exon matching; the larger set of 32,000 genes was used for protein matching.

Total Protein duplicates (using reciprocal blastp, bitscore >= 150)
Species             Dupl. / Total   Percent
Daphnia pulex       13972 / 28093       50%
Cae. elegans        8674 / 19692        44%
Dros. melanogaster  4497 / 13391        34%
Dros. grimshawi     6278 / 15075        42%
Mus musculus        10244 / 18871       54%
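
The Percent column is the duplicate count divided by the total, rounded to a
whole percent. A short check using the counts from the table above (the
variable and function names are illustrative only):

```python
# (duplicates, total) pairs copied from the table above
PROTEIN_DUPLICATES = {
    "Daphnia pulex": (13972, 28093),
    "Cae. elegans": (8674, 19692),
    "Dros. melanogaster": (4497, 13391),
    "Dros. grimshawi": (6278, 15075),
    "Mus musculus": (10244, 18871),
}


def percent_duplicated(dupl, total):
    """Share of genes with a reciprocal duplicate, as a whole percent."""
    return round(100 * dupl / total)


for species, (dupl, total) in PROTEIN_DUPLICATES.items():
    print(f"{species:<20}{dupl:>6} / {total:<6}{percent_duplicated(dupl, total):>4}%")
```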
 
See here for exon and protein analysis details and summaries:
  http://wfleabase.org/release1/dpulex_jgi060905/gene-duplicates/
  http://wfleabase.org/genome-summaries/gene-duplicates/