Gene variation by Gene Ontology group in Drosophila genomes
Deviations in GO categories by species genomes for gene match counts.
These may indicate where species genes differ in functional categories.
Statistically significant deviations are brightly colored.
Low counts or 'missing genes' may reflect sequence divergence rather than true absence;
extra gene matches indicate that additional matching sequence is present.
The "gene match counts" here are High-scoring Segment Pair (HSP) groupings,
and include various events: gene duplications, alternate
splice exons within genes, new genes that appear composed of exons
from other genes, as well as computational artifacts (see notes
below). The detail pages provide links to GBrowse genome map views
showing all secondary HSPs.
Genome averages of gene count and other protein match statistics are
available here and as PDF.
GO-Slim groupings are used for Biological Process, Molecular Function, Cell Location (~125 categories).
Below this table is "euprot4go.tab.gz", which gives the correspondence between MOD gene ids, GO primary ids, and
the GO-slim grouping ids used here. Chris Mungall's GO map2slim software was used for this mapping, with the
GO associations current for the genes used in the BLAST analyses. See below for counts of the GO associations
available for each proteome, and thus of the genes identified with GO groupings.
All protein matches from tBLASTn with probability <= 1e-3 are included, duplicate matches among them.
Low score matches contained in the location of better matches are removed.
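
The two filtering rules above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the Match record, its field names, and the choice of bit score as the ranking for "better" are assumptions, and the example coordinates are invented.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    # Hypothetical record for one protein-to-genome match (not the original format).
    query: str      # query protein id
    scaffold: str   # target genome scaffold
    start: int      # genome start of the match
    end: int        # genome end of the match
    evalue: float   # tBLASTn probability / e-value
    bits: float     # bit score, used here to rank matches


def filter_matches(matches: List[Match], max_evalue: float = 1e-3) -> List[Match]:
    """Keep matches at or below max_evalue, then drop any match whose genome
    location lies entirely inside a better-scoring match already kept."""
    kept: List[Match] = []
    passing = [m for m in matches if m.evalue <= max_evalue]
    for m in sorted(passing, key=lambda m: -m.bits):  # best score first
        contained = any(
            k.scaffold == m.scaffold and k.start <= m.start and m.end <= k.end
            for k in kept
        )
        if not contained:
            kept.append(m)
    return sorted(kept, key=lambda m: (m.scaffold, m.start))


# Invented example: geneB sits inside the better geneA match; geneC fails the cutoff.
example = [
    Match("geneA", "s1", 100, 500, 1e-50, 200.0),
    Match("geneB", "s1", 150, 300, 1e-5, 40.0),
    Match("geneA", "s1", 900, 1200, 1e-10, 80.0),
    Match("geneC", "s1", 2000, 2100, 0.5, 20.0),
]
kept = filter_matches(example)
```

Note that the second geneA match at a different location is retained, in keeping with the statement that duplicate matches are included.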
Gene counts are based on High-scoring Segment Pair (HSP) groupings, where the
group is determined from overlap of query protein parts, and target genome overlaps.
Included are HSP groups that are distinct protein parts in the same gene region (alternate
exons), as well as protein parts found at distinct genome locations. The data include
computational artifacts, especially where paralogs exist: a secondary HSP group for paralog-A
can partially overlap the primary HSP matches to paralog-B.
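
One way to picture the grouping described above is the greedy sketch below. It is an assumption-laden illustration rather than the method actually used: the HSP record, the max_gap distance, and the rule "join a group only if the HSP covers a new part of the query protein and lies near the group on the same scaffold" are all hypothetical choices made to mirror the description.

```python
from typing import List, NamedTuple


class HSP(NamedTuple):
    # Hypothetical record for one High-scoring Segment Pair of a single query protein.
    scaffold: str
    g_start: int   # genome coordinates
    g_end: int
    q_start: int   # query protein coordinates
    q_end: int
    bits: float


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def group_hsps(hsps: List[HSP], max_gap: int = 10_000) -> List[List[HSP]]:
    """Greedy grouping: an HSP joins an existing group if it is near that group
    on the genome and covers a protein part the group does not already have;
    otherwise it starts a new group (counted as another "gene" match)."""
    groups: List[List[HSP]] = []
    for h in sorted(hsps, key=lambda h: -h.bits):  # strongest HSPs seed groups
        for g in groups:
            same_region = all(m.scaffold == h.scaffold for m in g) and any(
                overlaps(m.g_start - max_gap, m.g_end + max_gap, h.g_start, h.g_end)
                for m in g
            )
            new_part = not any(
                overlaps(m.q_start, m.q_end, h.q_start, h.q_end) for m in g
            )
            if same_region and new_part:
                g.append(h)
                break
        else:
            groups.append([h])
    return groups


# Invented example: two exons of one gene, an alternate exon in the same
# region, and a copy on another scaffold.
example_hsps = [
    HSP("s1", 1000, 1300, 1, 100, 150.0),
    HSP("s1", 1500, 1800, 101, 200, 140.0),
    HSP("s1", 1400, 1480, 101, 130, 30.0),
    HSP("s2", 500, 800, 1, 100, 90.0),
]
groups = group_hsps(example_hsps)
```

Under these assumptions the alternate exon and the copy on the second scaffold each form their own group, so this query would count three HSP groups.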
Proteome source subsets are those organisms with extensive
GO annotations: Dmel, Mouse, Worm, Yeast.
Target genomes analyzed include Drosophila species along with the outgroup species
Anopheles gambiae, Daphnia pulex and C. elegans.
Data tables used in this analysis, extracted from BLAST output, are below:
modDM.gob5stats.gz : fruitfly, 8109 GO genes of 13472 in proteome,
modMM.gob5stats.gz : mouse, 12732 GO genes of 18941 in proteome,
modCE.gob5stats.gz : worm, 8812 GO genes of 19764 in proteome,
modSC.gob5stats.gz : yeast, 5758 GO genes of 5777 in proteome.
Table fields match those in the genome-mean-wfmgenes8 figures:
species DB query : genome target and gene source,
align eval bits exonHSP intronGap len1 : values of best gene match,
nparalog : number of distinct "gene" matches (HSP groups),
dist12 len2 : distance to and length of 1st duplicate,
dist13 len3 : distance to and length of 2nd duplicate,
GOC GOID : GO class and GO-slim ID
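
As a rough sketch of how these fields might be read and summed per GO-slim group, the code below assumes each row is tab-delimited with one column per field in the order listed above; the actual file layout may differ. The function names and the sample rows are invented for illustration.

```python
import csv
import gzip
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, TextIO, Tuple

# Field order taken from the list above; one column per field is an assumption.
FIELDS = [
    "species", "DB", "query", "align", "eval", "bits", "exonHSP", "intronGap",
    "len1", "nparalog", "dist12", "len2", "dist13", "len3", "GOC", "GOID",
]


def open_table(path: str) -> TextIO:
    """Open a gzipped table such as modDM.gob5stats.gz as text."""
    return gzip.open(path, "rt", newline="")


def read_rows(handle: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, skipping blank, comment, and header lines."""
    for rec in csv.reader(handle, delimiter="\t"):
        if not rec or rec[0].startswith("#") or rec[0] == "species":
            continue
        yield dict(zip(FIELDS, rec))


def matches_per_go_group(rows: Iterable[Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """Sum nparalog (number of HSP groups) per (species, GO-slim ID)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in rows:
        totals[(row["species"], row["GOID"])] += int(row["nparalog"])
    return dict(totals)


# Invented sample rows, only to show the shape of the result.
sample = "\n".join("\t".join(r) for r in [
    ["dpse", "modDM", "geneA", "300", "1e-80", "310", "3", "120", "320",
     "2", "5000", "150", "0", "0", "P", "GO:0006810"],
    ["dpse", "modDM", "geneB", "200", "1e-40", "180", "2", "60", "210",
     "1", "0", "0", "0", "0", "P", "GO:0006810"],
    ["dvir", "modDM", "geneA", "290", "1e-75", "300", "3", "130", "320",
     "1", "0", "0", "0", "0", "P", "GO:0006810"],
])
totals = matches_per_go_group(read_rows(io.StringIO(sample)))
```

For a real file, read_rows(open_table("modDM.gob5stats.gz")) would take the place of the in-memory sample, provided the layout assumption holds.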