Data supporting tandem duplicate genes in Daphnia pulex.
Please see this summary http://wfleabase.org/genome-summaries/gene-duplicates/
The table dpulex_tandy6v11_genepairs.txt (below) gives counts and gene IDs
drawn from duplicate genes located by exon match analyses,
in data file dpulex_tandy6v11.gff.gz
The duplicate analysis identifies regions or clusters, 'gene' models within those,
and exon matches. Nearby and far matches are noted (below).
These data draw exons from JGI Dappu1 release 1.1 gene models, NCBI Gnomon gene models
and DGIL SNAP-homology gene models.
An alternate analysis of protein similarity (blastp) for the NCBI Gnomon gene models
is in dpulex_jgi060905_Gnomon_full.aa.blastp.gz, summarized in
dpulex_jgi060905_Gnomon_full.blastp150.idchains (all matching genes at bitscore>=150,
approx. E-value 1e-13).
See here for the exon and protein analysis details
http://wfleabase.org/release1/dpulex_jgi060905/gene-duplicates/
Summary table for tandem duplicate genes

            # tandem clusters (1)     # of tandem genes (2)
            2+ genes    3+ genes      tandem / total      percent
Fruitfly       119          71        1,500 / 13,500      11%   (exons ~ proteins)
Nematode       986         555        3,000 / 20,000      15%   (exons ~ proteins)
Daphnia       1865         892        5,400 / 27,000      20%   (exons) (4)
                                      3,900 / 32,000      13%   (proteins)
Note: the Daphnia protein figure (13%) is a near/far count that still needs
correction for scaffold size; about 15% of proteins are nearby duplicates
when only the big Daphnia scaffolds are used.
(1) Tandem clusters are counted as 2+ duplicate genes, and as
3+ genes, in a contiguous region with <= 15kb between nearest duplicates,
based on exon-matched genes. There is a published
count for C. elegans of 400 for 3+ genes, using different criteria of
nearby (http://www.wormbook.org/chapters/www_geneduplication/geneduplication.html),
which compares to the 555 counted here.
(2) number of nearby (<=15kb) duplicate genes from protein and exon matching
over total genes sampled from genomes. Protein matches used bitscore >= 150
(p<= 1e-13) for reciprocal matching. Exon matches used %identity >= 80%
and filters to remove repetitive exons and poor quality gene matches.
(4) For Daphnia, a 27,000 gene subset from >=1MB scaffolds was used
for exon matching, and the larger set of 32,000 genes for protein matching.
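The cluster rule in note (1) can be illustrated with a minimal sketch. This is
not the pipeline used to build dpulex_tandy6v11.gff.gz; it assumes the input
genes were already identified as duplicates of one another by exon matching,
and the Gene record, the tandem_clusters function and the example coordinates
are all hypothetical. Genes on the same scaffold are chained while the gap to
the previous duplicate is <= 15kb, and clusters are then counted at the 2+ and
3+ gene levels.

```python
from itertools import groupby
from typing import NamedTuple

MAX_GAP = 15_000  # bp; the "<= 15kb between nearest duplicates" rule of note (1)


class Gene(NamedTuple):
    # Hypothetical record layout, not the columns of the released GFF file.
    scaffold: str
    start: int
    end: int
    gene_id: str


def tandem_clusters(genes, max_gap=MAX_GAP):
    """Chain duplicate genes on one scaffold while the gap stays <= max_gap.

    Assumes every gene passed in is already a known duplicate.
    Returns only clusters with 2 or more genes.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g.scaffold, g.start))
    for _, group in groupby(ordered, key=lambda g: g.scaffold):
        current = []
        reach = None  # rightmost end coordinate seen in the current cluster
        for gene in group:
            if current and gene.start - reach > max_gap:
                clusters.append(current)
                current = []
            current.append(gene)
            reach = gene.end if len(current) == 1 else max(reach, gene.end)
        clusters.append(current)
    return [c for c in clusters if len(c) >= 2]


# Made-up coordinates for illustration only.
genes = [
    Gene("scaffold_1", 1_000, 3_000, "g1"),
    Gene("scaffold_1", 10_000, 12_000, "g2"),
    Gene("scaffold_1", 20_000, 22_000, "g3"),
    Gene("scaffold_1", 60_000, 62_000, "g4"),  # too far from g3, left out
    Gene("scaffold_2", 500, 1_500, "g5"),
    Gene("scaffold_2", 9_000, 9_900, "g6"),
]

clusters = tandem_clusters(genes)
n_clusters_2plus = len(clusters)
n_clusters_3plus = sum(len(c) >= 3 for c in clusters)
n_tandem_genes = sum(len(c) for c in clusters)
```

The two cluster counts correspond to the two numbers per species in the
summary table, and n_tandem_genes to the numerator of the tandem gene column.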
Total protein duplicates (using reciprocal blastp, bitscore >= 150)

Species               Dupl. / Total     Percent
Daphnia pulex         13972 / 28093       50%
Cae. elegans           8674 / 19692       44%
Dros. melanogaster     4497 / 13391       34%
Dros. grimshawi        6278 / 15075       42%
Mus musculus          10244 / 18871       54%
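As a rough illustration of how the counts above could be derived, the sketch
below keeps a gene as a duplicate when it has a reciprocal blastp match at
bitscore >= 150, then computes the percent column. It is a sketch under
assumptions, not the script used for this release: the (query, subject,
bitscore) tuple layout, the function names and the example hits are
hypothetical, and the real blastp output file may be organized differently.

```python
MIN_BITSCORE = 150  # threshold stated in this README for protein matches


def reciprocal_duplicates(hits, min_bitscore=MIN_BITSCORE):
    """Return genes that have at least one reciprocal match.

    hits: iterable of (query_id, subject_id, bitscore) tuples (assumed layout).
    A pair counts only if A hits B and B hits A, both at >= min_bitscore;
    self hits are ignored.
    """
    strong = {(q, s) for q, s, bits in hits if q != s and bits >= min_bitscore}
    return {q for q, s in strong if (s, q) in strong}


def percent_duplicated(n_dupl, n_total):
    """Percent of genes with a duplicate, rounded to a whole number."""
    return round(100 * n_dupl / n_total)


# Made-up hits for illustration only.
hits = [
    ("a", "b", 200.0),
    ("b", "a", 180.0),
    ("a", "c", 160.0),
    ("c", "a", 120.0),  # reverse hit is below threshold, so a/c is not reciprocal
    ("d", "d", 500.0),  # self hit, ignored
]
dupl = reciprocal_duplicates(hits)

# Recomputing the percent column from the counts in the table above.
table = {
    "Daphnia pulex": (13972, 28093),
    "Cae. elegans": (8674, 19692),
    "Dros. melanogaster": (4497, 13391),
    "Dros. grimshawi": (6278, 15075),
    "Mus musculus": (10244, 18871),
}
percents = {sp: percent_duplicated(d, t) for sp, (d, t) in table.items()}
```

With the table counts, percents reproduces the published column (50, 44, 34,
42 and 54).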
See here for exon and protein analysis details and summaries:
http://wfleabase.org/release1/dpulex_jgi060905/gene-duplicates/
http://wfleabase.org/genome-summaries/gene-duplicates/