These are very preliminary working notes on locating
pseudogenes in Daphnia. We have started with the
Pseudopipe software, since it is available as open source and
sensible in its operation (using tBlastN of predicted
genes to locate non-predicted fragments, etc.). See
http://papers.gersteinlab.org/e-print/pseudopipe/
with software at
http://www.pseudogene.org/DOWNLOADS/pipeline_codes/
"These high-confidence pseudogenes are then classified as (1) retro-
transposed pseudogenes, (2) duplicated pseudogenes and (3) pseudogeneic
fragments. Retrotransposed pseudogenes lack introns, have small flanking
direct repeats and a 3' polyadenine tail."
================= notes ====================
Jeong-Hyeon,
One thing I've found very helpful for evaluating any new genome
analysis like this that marks locations is to view the other
evidence in genome maps.
This way, we could check by eye a few tens of cases for each
type of pseudogene call from Pseudopipe, and see whether we agree
with its results. For instance, if there is a lot of
gene tile expression, EST and similar data at the same location
as some of these calls, Pseudopipe may be making mistakes.
A quick way to view these is to make some html pages, with maybe
10 cases per page / Class, using gbrowse_img to show the
region:
Pseudogene case #1:
For gbrowse_img parameter help, see
http://wfleabase.org/cgi-bin/gbrowse_img/dpulex_jgi060905/
See here for a quick perl program to read your table and make
map views, and a few samples.
http://wfleabase.org/release1/dpulex_jgi060905/pseudogenes/
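
As a rough illustration of the page-building step described above (this
is not the perl program at that URL), here is a minimal Python sketch.
It assumes gbrowse_img takes `name` (the region) and `width` parameters
as listed on the help page, and that the scaffold names in the table
(e.g. chr4) match the landmark names in the dpulex_jgi060905 database;
both should be checked against the help page. The function names, the
1000 bp flank and the output file names are made up for illustration.

```python
from collections import defaultdict
from html import escape
from urllib.parse import urlencode

# Base URL taken from the notes; everything after "?" is an assumption.
GBROWSE_IMG = "http://wfleabase.org/cgi-bin/gbrowse_img/dpulex_jgi060905/"


def map_url(chrom, start, end, flank=1000, width=800):
    """Build a gbrowse_img URL for a region padded by `flank` bases."""
    lo = max(1, start - flank)
    hi = end + flank
    query = urlencode({"name": f"{chrom}:{lo}..{hi}", "width": width}, safe=":")
    return f"{GBROWSE_IMG}?{query}"


def make_pages(calls, per_page=10):
    """Group calls by class and return {file name: html text}.

    `calls` is an iterable of (pclass, chrom, start, end, label) tuples.
    """
    by_class = defaultdict(list)
    for call in calls:
        by_class[call[0]].append(call)

    pages = {}
    for pclass, items in by_class.items():
        for first in range(0, len(items), per_page):
            chunk = items[first:first + per_page]
            body = []
            for n, (_, chrom, start, end, label) in enumerate(chunk, first + 1):
                url = escape(map_url(chrom, start, end))
                body.append(
                    f"<p>Pseudogene case #{n}: {escape(label)}<br>\n"
                    f'<img src="{url}"></p>'
                )
            name = f"{pclass}_{first // per_page + 1}.html"
            pages[name] = (
                f"<html><body>\n<h2>Class {escape(pclass)}</h2>\n"
                + "\n".join(body)
                + "\n</body></html>\n"
            )
    return pages


if __name__ == "__main__":
    demo = [("DUP", "chr4", 6100, 43728, "JGI_V11_98703_chr4_5876")]
    for fname, html_text in make_pages(demo).items():
        print(fname)
        print(html_text)
```

Writing each entry of the returned dict to a file gives one page per
class per 10 cases; extra gbrowse_img options (tracks etc.) would be
added to the dict passed to urlencode once they are confirmed on the
help page.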
The sample map views suggest a mixed result, more or less what
I found from my earlier look for these: some look like pseudogenes,
others look like computational mistakes. Of the computational
mistakes, some, especially in the DUP class, again look like the
result of GeneWise prediction mistakes in tandem gene regions:
Pseudopipe errs by overlooking highly expressed regions that have a
good-looking Gnomon prediction and a poor GeneWise prediction.
Other Pseudopipe calls have a mix of high and low gene expression,
with good-looking or no other gene predictions.
So while we might use the quick answer from
Pseudopipe as a rough estimate of pseudogenes, I think this is
a longer-term project that needs more judgement over more of the
genome evidence to give a careful answer on Daphnia pseudogenes.
-- Don
.........
#chr pgene pgene band chrom_start chrom_end strand query gene_id unique_name query_start query_end query_len frac num_of_ins num_of_dels num_of_shifts num_of_stops expect ident polya disable num_of_exons exon_bound exon_len intron_bound intron_len class_old class_new short_ID comment comment2
chr4 pgene pgene NoBandData 6100 43728 - JGI_V11_98703 chr4_JGI_V11_98703.1 JGI_V11_98703_chr4_5876 76 75 198 0 16 2 5 0 6.00E-020 0.68 0 D 2 com(6100..6324 43587..43728) com(225 142) com(6325..43586) com(37262) DUP DUP
chr4 pgene pgene NoBandData 6964 8506 - JGI_V11_255566 chr4_JGI_V11_255566.1 JGI_V11_255566_chr4_6140.1 283 341 641 0.09 21 100 13 14 1.00E-046 0.47 0 D 3 com(6964..7324 7442..7742 7723..8506) com(361 301 784) com(7325..7441 7743..7722) com(117 -20) DUP DUP
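
A quick consistency check on rows like the two above can be sketched
in Python as follows. This is only an illustration under some
assumptions: that the columns follow the header order shown (so
unique_name is field 9, exon_bound field 23 and class_new field 28,
counting from 0), and that a com(...) group is always a single field
even where it contains spaces. The helper names are made up.

```python
import re

# A com(...) group counts as one field even if it contains spaces;
# everything else is split on whitespace (tabs or spaces).
TOKEN = re.compile(r"com\([^)]*\)|\S+")
SPAN = re.compile(r"(\d+)\.\.(\d+)")

# Assumed 0-based field positions, read off the header line.
UNIQUE_NAME, EXON_BOUND, CLASS_NEW = 9, 23, 28

ROWS = [
    "chr4 pgene pgene NoBandData 6100 43728 - JGI_V11_98703 "
    "chr4_JGI_V11_98703.1 JGI_V11_98703_chr4_5876 76 75 198 0 16 2 5 0 "
    "6.00E-020 0.68 0 D 2 com(6100..6324 43587..43728) com(225 142) "
    "com(6325..43586) com(37262) DUP DUP",
    "chr4 pgene pgene NoBandData 6964 8506 - JGI_V11_255566 "
    "chr4_JGI_V11_255566.1 JGI_V11_255566_chr4_6140.1 283 341 641 0.09 "
    "21 100 13 14 1.00E-046 0.47 0 D 3 "
    "com(6964..7324 7442..7742 7723..8506) com(361 301 784) "
    "com(7325..7441 7743..7722) com(117 -20) DUP DUP",
]


def exon_spans(field):
    """Turn 'com(a..b c..d)' into [(a, b), (c, d)]."""
    return [(int(a), int(b)) for a, b in SPAN.findall(field)]


def check_row(line):
    """Recompute exon and intron lengths and flag overlapping exons."""
    fields = TOKEN.findall(line)
    spans = exon_spans(fields[EXON_BOUND])
    # Gap between consecutive exons; negative means the exons overlap.
    gaps = [nxt[0] - prev[1] - 1 for prev, nxt in zip(spans, spans[1:])]
    return {
        "name": fields[UNIQUE_NAME],
        "class": fields[CLASS_NEW],
        "exon_len": [end - start + 1 for start, end in spans],
        "intron_len": gaps,
        "overlap": any(gap < 0 for gap in gaps),
    }


for row in ROWS:
    print(check_row(row))
```

On the two rows above this reproduces the exon_len and intron_len
columns, including the -20 in the second row, where the second and
third exons overlap by 20 bases. Whether such overlaps mark Pseudopipe
mistakes is not settled here; they are just cases that may be worth a
look in the map views.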