wFleaBase
Figures:

1. Eukaryote coding density (pdf)
2. Daphnia CDS density: singleton vs tandem regions (pdf)
The whole-genome average hides the distribution of gene density, including the higher density of some regions, and it does not account for the many more assembly gaps in Daphnia. The figures show that Daphnia's gene density is skewed toward the higher density of C. elegans and away from the lower density of insects. The second figure also shows that Daphnia's gene duplicates lie in regions of higher gene density (not a big surprise).
In Daphnia regions containing two or more genes, about 35% of the region is coding where those genes include duplicates, but only 10-20% is coding where they do not. Overall, the coding-density distribution peaks at about 23% in C. elegans, about 18% in Daphnia and about 10% in insects. The averages alone do not show this because these distributions have broad tails.
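The page does not say how the regions behind the figures were defined. As an assumption, this sketch uses fixed-size, non-overlapping windows on a single scaffold to show how a per-region coding fraction can be computed, and why a broad high-density tail can pull the mean away from the peak of the distribution. The function names (`window_coding_fractions`, `mean_and_peak`) are hypothetical, and any intervals you feed it in a test are toy values, not Daphnia data.

```python
"""Per-window coding fraction on one scaffold (illustrative sketch).

Assumption: fixed-size, non-overlapping windows; the regions used for the
figures may have been defined differently. Names are hypothetical.
"""
from collections import Counter
from typing import List, Sequence, Tuple


def window_coding_fractions(
    cds: Sequence[Tuple[int, int]], seq_len: int, window: int
) -> List[float]:
    """Fraction of bases covered by any CDS in each window (1-based, inclusive)."""
    covered = bytearray(seq_len + 1)  # covered[pos] == 1 if pos is coding; index 0 unused
    for start, end in cds:
        # Normalise orientation and clip the interval to the scaffold.
        lo, hi = max(1, min(start, end)), min(seq_len, max(start, end))
        if lo <= hi:
            covered[lo:hi + 1] = b"\x01" * (hi - lo + 1)
    fractions = []
    for w_start in range(1, seq_len + 1, window):
        w_end = min(w_start + window - 1, seq_len)  # last window may be shorter
        fractions.append(sum(covered[w_start:w_end + 1]) / (w_end - w_start + 1))
    return fractions


def mean_and_peak(fractions: Sequence[float], bins: int = 20) -> Tuple[float, float]:
    """Mean coding fraction, and the lower edge of the most populated histogram bin."""
    mean = sum(fractions) / len(fractions)
    # Bin index for each fraction; a fraction of exactly 1.0 goes in the top bin.
    counts = Counter(min(int(f * bins), bins - 1) for f in fractions)
    peak_bin = max(counts, key=counts.get)
    return mean, peak_bin / bins
```

For example, with two coding bases in each of eight 10-bp windows and two fully coding windows, the mean coding fraction is 0.36 while the histogram peaks at 0.2: a small high-density tail moves the average away from the typical window.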
Whole-genome coding sequence ratio (distinct CDS exons, including alternate transcripts):

Group | Genome | Annotation | ntr | n | m | cds | tb | c/t |
---|---|---|---|---|---|---|---|---|
nematodes | celegans | wb176 | 27049 | 124138 | 204.413 | 25375492 | 100241936 | 0.253 |
nematodes | cbriggsae | wb176 | 19525 | 114373 | 210.568 | 24083296 | 108443721 | 0.222 |
crustacean | daphnia | JGI_V11 | 30940 | 142754 | 211.28 | 30160786 | 174233412 | 0.173 |
crustacean | daphnia | Gnomon | 37466 | 151668 | 237.45 | 36014074 | 200738384 | 0.179 |
drosophila | drosmel | ncbi | 14560 | 55078 | 404.47 | 22277417 | 120231707 | 0.185 |
drosophila | drosmel | Gnomon | 20420 | 59534 | 385.79 | 22967701 | 129253983 | 0.178 |
drosophila | drossec | Gnomon | 25689 | 69851 | 356.17 | 24878808 | 138574395 | 0.180 |
drosophila | drossim | Gnomon | 19885 | 63858 | 350.95 | 22410664 | 142312176 | 0.157 |
drosophila | drosyak | Gnomon | 20302 | 67234 | 370.24 | 24892648 | 168514273 | 0.148 |
drosophila | drosere | Gnomon | 18662 | 61599 | 376.01 | 23161570 | 136287721 | 0.170 |
drosophila | drosana | Gnomon | 23784 | 74217 | 376.16 | 27917668 | 195136171 | 0.143 |
drosophila | drospse | Gnomon | 19259 | 65407 | 375.92 | 24587759 | 143281209 | 0.172 |
drosophila | drosper | Gnomon | 24696 | 75374 | 355.43 | 26790450 | 163411818 | 0.164 |
drosophila | droswil | Gnomon | 24920 | 73171 | 365.77 | 26763491 | 207912054 | 0.129 |
drosophila | drosmoj | Gnomon | 17950 | 63811 | 368.24 | 23497700 | 174752375 | 0.134 |
drosophila | drosvir | Gnomon | 18636 | 65573 | 368.18 | 24142814 | 179118823 | 0.135 |
drosophila | drosgri | Gnomon | 17922 | 64047 | 359.51 | 23025399 | 153703083 | 0.150 |
other insects | anogam | ncbi | 12444 | 48852 | 357.99 | 17488695 | 230175766 | 0.075 |
other insects | apismel | ncbi | 9429 | 70453 | 234.09 | 16492577 | 177733093 | 0.093 |
other insects | nasvit | fgenesh | 26115 | 115968 | 282.68 | 32782779 | 267327937 | 0.122 |
other insects | nasvit | Gnomon | 28998 | 118889 | 264.06 | 31394613 | 270065215 | 0.116 |
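Two of the columns are derived from the others: m is cds/n (the method computes `$m=$sb/$n`) and c/t is cds/tb. As a quick consistency check, this sketch recomputes both for a few rows copied from the table. The `ROWS` dictionary and the `derived` helper are illustrative names, not part of the original pipeline.

```python
"""Recompute the derived columns m and c/t from the table's raw counts.

ROWS holds values copied from the table; `derived` is a hypothetical helper.
"""
from typing import Tuple

ROWS = {
    # genome/annotation: (n, cds, tb, m as reported, c/t as reported)
    "celegans wb176": (124138, 25375492, 100241936, 204.413, 0.253),
    "daphnia JGI_V11": (142754, 30160786, 174233412, 211.28, 0.173),
    "daphnia Gnomon": (151668, 36014074, 200738384, 237.45, 0.179),
    "drosmel ncbi": (55078, 22277417, 120231707, 404.47, 0.185),
}


def derived(n: int, cds: int, tb: int) -> Tuple[float, float]:
    """Mean CDS-exon length (m) and coding fraction of the spanned bases (c/t)."""
    return cds / n, cds / tb


for name, (n, cds, tb, m_rep, ct_rep) in ROWS.items():
    m, ct = derived(n, cds, tb)
    print(f"{name:16s} m={m:.2f} (table {m_rep})  c/t={ct:.3f} (table {ct_rep})")
```

For these rows the recomputed values agree with the reported ones to within rounding of the last printed digit.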
Key:

- tb = total bases spanned by the gene CDSs (per scaffold, from position 1 to the end of the last CDS, summed over scaffolds)
- cds = sum of all non-overlapping CDS bases (including distinct exons from alternate transcripts of the same gene)
- m = mean CDS-exon length (cds/n)
- n = number of distinct CDS exons
- c/t = ratio of cds to tb bases
- ntr = number of transcripts

Method: sum the CDS bases from each genome's GFF file.

```bash
# $gf: gzipped GFF file for one genome
ntr=$(gunzip -c "$gf" | awk -F'\t' '$3 == "mRNA" {c++} END {print c+0}')
printf 'CDSbases %s : ntr=%s ' "$gf" "$ntr"
gunzip -c "$gf" | awk -F'\t' '$3 == "CDS"' | sort -k1,1 -k4,4n -k5,5nr |
  perl -ne '($r,$s,$t,$b,$e) = split;
    $tb += $le if ($lr and $lr ne $r);    # new scaffold: add span of the previous one
    unless ($r eq $lr and $b < $le and $e > $lb) {    # skip exons overlapping the previous record
      $n++; ($b,$e) = ($e,$b) if ($e < $b); $sb += 1 + $e - $b; }
    ($lr,$lb,$le) = ($r,$b,$e);
    END { $m = $sb/$n; $tb += $le; $cb = $sb/$tb;
      printf "\tn=$n,\t m=%.2f,\t cds=$sb,\t tb=$tb,\t c/t=%.3f\n", $m, $cb; }'
```
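For readers who prefer to step through the logic, here is an illustrative Python re-implementation of the same counting rules. It is a sketch, not the script that produced the table. It assumes tab-separated GFF lines, and the names `cds_stats` and `cds_stats_from_gz` are hypothetical. It sorts CDS records by scaffold, start ascending and end descending, counts a record only if it does not overlap the record immediately before it, and takes each scaffold's span as running from position 1 to the end of its last CDS.

```python
"""Python sketch of the CDS-density method (illustrative re-implementation).

Assumes tab-separated GFF lines; function names are hypothetical.
"""
import gzip
from typing import Dict, Iterable, List, Optional, Tuple


def cds_stats(gff_lines: Iterable[str]) -> Dict[str, float]:
    """Compute n, m, cds, tb and c/t using the Perl one-liner's rules."""
    records: List[Tuple[str, int, int]] = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 5 and cols[2] == "CDS":
            records.append((cols[0], int(cols[3]), int(cols[4])))

    # Same order as `sort -k1,1 -k4,4n -k5,5nr`: scaffold, start ascending,
    # end descending (scaffold names compare by code point, not by locale).
    records.sort(key=lambda r: (r[0], r[1], -r[2]))

    n = cds = tb = 0
    last_ref: Optional[str] = None
    last_b = last_e = 0
    for ref, b, e in records:
        if last_ref is not None and ref != last_ref:
            tb += last_e  # previous scaffold spans 1..end of its last CDS
        # Only the immediately preceding record is checked, and an
        # overlapping record is skipped entirely, as in the one-liner.
        if not (ref == last_ref and b < last_e and e > last_b):
            n += 1
            b, e = min(b, e), max(b, e)
            cds += 1 + e - b
        last_ref, last_b, last_e = ref, b, e
    if last_ref is not None:
        tb += last_e  # close the final scaffold

    return {
        "n": n,
        "m": cds / n if n else 0.0,
        "cds": cds,
        "tb": tb,
        "c/t": cds / tb if tb else 0.0,
    }


def cds_stats_from_gz(path: str) -> Dict[str, float]:
    """Read a gzipped GFF file (the role of `gunzip -c $gf` in the shell version)."""
    with gzip.open(path, "rt") as handle:
        return cds_stats(handle)


if __name__ == "__main__":
    demo = [
        "\t".join(["chr1", "src", "CDS", "1", "100", ".", "+", "0", "Parent=t1"]),
        "\t".join(["chr1", "src", "CDS", "50", "120", ".", "+", "0", "Parent=t2"]),
        "\t".join(["chr1", "src", "CDS", "201", "300", ".", "+", "0", "Parent=t1"]),
        "\t".join(["chr2", "src", "CDS", "11", "60", ".", "-", "0", "Parent=t3"]),
    ]
    print(cds_stats(demo))
```

For the demo input this gives n=3, cds=250 and tb=360 (300 on chr1 plus 60 on chr2). The exon at 50-120 overlaps the one at 1-100 and is dropped whole, so bases 101-120 are not counted; this mirrors the one-liner's "distinct exon" rule rather than a full interval union.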