
Daphnia pulex: Protein gene duplicate analysis

The Daphnia pulex genome appears to have from 50% more to twice as many gene duplications as the duplicate-rich C. elegans genome. In these figures, gene duplicates (paralogs) are shown as a function of gene family size, and by distance between tandem duplicates, for the Daphnia, C. elegans, Drosophila and mouse genomes. Duplicate genes here are determined by BlastP comparison of all proteins identified in these genomes. The criterion used is reciprocal matches with bitscore >= 150 (e-value <~ 1e-50).
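The reciprocal-match criterion can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for these figures: the input is assumed to be (query, subject, bitscore) tuples already parsed from an all-against-all BlastP run, the function names are hypothetical, and grouping paralogs into families by single linkage is an assumption, since the page does not say how families were formed.

```python
from collections import defaultdict

MIN_BITSCORE = 150.0  # threshold stated in the text


def reciprocal_pairs(hits, min_bitscore=MIN_BITSCORE):
    """Return unordered gene pairs that match each other in both directions.

    hits: iterable of (query_id, subject_id, bitscore) tuples.
    """
    passing = set()
    for query, subject, bitscore in hits:
        # Ignore self-hits and anything below the bitscore threshold.
        if query != subject and bitscore >= min_bitscore:
            passing.add((query, subject))
    return {frozenset(p) for p in passing if (p[1], p[0]) in passing}


def families(pairs):
    """Group genes into families by single linkage (assumed, not from the source)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return list(groups.values())
```

A gene counts as a paralog here only if it appears in at least one reciprocal pair; a one-directional hit, however strong, is discarded.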

Figure A shows that the number of Daphnia duplicates (in green) increases in medium to larger clusters of similar genes. When this is compared as a percent of total genes (B), the Daphnia excess in clusters of 8 or more is apparent. Both of these figures plot the cumulative gene count with increasing gene family size. Figure C shows that duplicates in larger families (10..80 paralogs) occur at high frequency within 1 to 2 kilobases of each other in Daphnia, similar to but higher than in C. elegans. Figure D shows that for small clusters (2..5 paralogs) this nearby tandem effect is less common and shows little species effect. Figure E below extends Figs. C, D with distant duplicates.
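The cumulative curves in Figures A and B can be approximated as follows. This is a sketch under stated assumptions: the input is one integer per gene family (its number of paralogs), accumulation runs from the smallest family size upward, and the function name is hypothetical.

```python
from collections import Counter


def cumulative_by_family_size(family_sizes, total_genes):
    """Return [(size, cumulative_genes, cumulative_percent)] by increasing size.

    family_sizes: iterable with one integer per gene family.
    total_genes: all genes in the genome, duplicated or not.
    """
    genes_at_size = Counter()
    for size in family_sizes:
        genes_at_size[size] += size  # a family of n members contributes n genes

    rows, running = [], 0
    for size in sorted(genes_at_size):
        running += genes_at_size[size]
        rows.append((size, running, 100.0 * running / total_genes))
    return rows
```

The second column corresponds to the quantity plotted in Figure A and the third to Figure B, so a species with an excess of large families shows a curve that keeps rising at larger sizes.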

Species               Paralogs / Total   Percent
Daphnia pulex         13972 / 28093      50%
Cae. elegans           8674 / 19692      44%
Dros. melanogaster     4497 / 13391      34%
Dros. grimshawi        6278 / 15075      42%
Mus musculus          10244 / 18871      54%
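As a quick check, the Percent column follows directly from the two counts in each row. The sketch below recomputes it from the table values; the dictionary layout and function name are illustrative only.

```python
# Paralog and total gene counts copied from the table above.
COUNTS = {
    "Daphnia pulex": (13972, 28093),
    "Cae. elegans": (8674, 19692),
    "Dros. melanogaster": (4497, 13391),
    "Dros. grimshawi": (6278, 15075),
    "Mus musculus": (10244, 18871),
}


def percent_paralogs(paralogs, total):
    """Share of genes with at least one paralog, rounded to a whole percent."""
    return round(100 * paralogs / total)


PERCENTS = {name: percent_paralogs(p, t) for name, (p, t) in COUNTS.items()}
```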

A. Cumulative duplicate genes by family size
B. Cumulative % duplicates / total genes by family size
C. Tandem gene distance, large gene families
D. Tandem gene distance, small gene families

E. Duplicate gene counts by distance
Same data as above, shown as duplicate gene counts by distance class. The Far and Unlinked distance classes here are not shown in Figs. C, D. Inverted duplicate gene counts are indicated by cross-hatching.
Figure E extends Figs. C, D with distant duplicates (Far and Unlinked) and also shows the portion of inverted duplicates. Note the high percentage of inverted duplicates in the Far class. There is a suggestion that the nearby tandem duplicates in Daphnia show a lower inversion rate than those of other species, a finding consistent with recent evolution of these nearby duplicates.

Daphnia shows an excess in Unlinked (across-scaffold) duplicate genes as well as the very near 1-Kb tandem genes. As this draft genome assembly has many thousands of small scaffolds, the unlinked duplicates may be found to be nearby tandems with further assembly refinement. There is a suggestion the small scaffolds failed to assemble in part due to tandem duplicate gene regions.
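The distance classes discussed above can be illustrated with a small classifier. This is a sketch only: the bin edges are hypothetical (the page names the 1-Kb, Far and Unlinked classes but does not list exact limits), the Gene record is an assumed layout, and treating "inverted" as "on opposite strands" is an assumption about how the counts were made.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    scaffold: str
    start: int   # start <= end, in base pairs
    end: int
    strand: str  # "+" or "-"


# Hypothetical bin edges in base pairs; not taken from the source.
BINS = [(1_000, "1Kb"), (10_000, "10Kb"), (100_000, "100Kb")]


def distance_class(a, b):
    """Classify a duplicate pair by the gap between the two genes."""
    if a.scaffold != b.scaffold:
        return "Unlinked"  # across-scaffold duplicates
    # Gap between the nearer ends; overlapping genes get a gap of 0.
    gap = max(max(a.start, b.start) - min(a.end, b.end), 0)
    for limit, label in BINS:
        if gap <= limit:
            return label
    return "Far"


def is_inverted(a, b):
    """True when the two copies lie on opposite strands (assumed definition)."""
    return a.strand != b.strand
```

With a fragmented draft assembly, pairs labelled Unlinked by such a scheme could move into the nearby classes once their scaffolds are joined, which is the caveat raised in the paragraph above.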

Don Gilbert, Aug 2007