Daphnia genome assemblies, 2020 assessment, in-progress results
by Don Gilbert, http://wfleabase.org/genome/Daphnia_species_genomes/
=================================================================================

Rough taxa distances (cf. Ebert+ 2019 paper)

      |.... Daphnia magna (Europe)            : 3+ chr-assemblies, annotated genes
  ----|
      |  |.... Daphnia carinata (Australia)   : 1 chr-assembly
      |  |.. Daphnia similoides (China ?)     : annotated genes mapped to D.cari
      |  |........ Daphnia pulex (N. America) : 3 chr-assemblies, annotated genes

Genome size estimates from flow cytometry
  Daphnia pulex(es)        : 190-264 Mb (0.193-0.25 pg [1,2])
  Daphnia magna(s)         : 234-391 Mb (0.240-0.40 pg [2,3])
  Daphnia carinata         : 254 Mb (0.26 pg [3])
  Drosophila pseudoobscura : 161-180 Mb (male-female [4])

Refs:
  1. doi: 10.1111/j.1095-8312.2008.01185.x (2009)
  2. doi: 10.1186/1471-2164-15-1033 (2014)
  3. http://www.genomesize.com/ (Daphnia refs there)
  4. doi: 10.1534/g3.119.400560

=================================================================================

Genome-wide excess DNA depth on the Dcari20 and Dmagna19 assemblies, and to a
lesser degree on Dpulex16, suggests these assemblies are missing a large amount
of duplicate-gene and transposon sequence. Estimating the missing assembly from
DNA depth gives genome sizes near the flow cytometry estimates.

Table DC1b. Excess DNA coverage depths (XC) measured for Daphnia genome
assemblies, observed and estimated genome sizes, and portion with coding
sequences (CDS) and duplicated regions (Dup), including transposons.
--------------------------------------------------------------------
Assembly            CDS   Dup   XC    ObsMb  EstMb  FlowCy
-------------------------------------------
Fruitfly
  Dropse20           33    34   1.0    160    166   160-180 Mb
Daphnia magna
  Dmagna10           54    22   1.9    134    214   230+ Mb
  Dmagna14           78    79   1.4    180    227   230
  Dmagna19           59    27   1.7    123    192   230
  Dmagna20           86    88   1.0    247    237   230
Daphnia pulex
  Dpulex06           60    45   1.0    220    218   200+ Mb
  Dpulex16           60    37   1.1    156    170   200
  Dpulex19           80    89   1.0    190    180   200
Daphnia carinata
  Dcari20            59    34   1.3    132    162   250 Mb
---------------------------------------------
  CDS    = Observed megabases with CDS-mapped reads.
  Dup    = Observed megabases of multi-mapped reads.
  XC     = XCopy, measure of excess read depth relative to depth at
           conserved unique genes
  ObsMb  = Observed assembly size
  EstMb  = Estimated assembly size from observed size and XCopy excess read depth
  FlowCy = Flow cytometry measured size

Dropse20, Drosophila pseudoobscura, is a control case, from a 2020 PacBio-Canu
assembly (like Dcari20). The ratio of multi-mapping to unique gDNA reads is much
higher in Daphnia (55%-66% Dmagna, 52% Dpulex) than in Drosophila pseudoobscura
(23%).

The Dpse CDS cover of 33 Mb is less than half that of Daphnia. Because the same
methods were used, this appears to be a true and important difference: Daphnia
have much more coding sequence than Drosophila, for similar sized genomes.

These measures are made on 1 kb spans, averaged over a sliding window of 1 kb,
from samples made at 100 base intervals. The values are an average coverage per
1 kb of CDS, transposons, repeats, and gDNA reads at a given depth.

========================================================

Duplicate (paralog) gene coding sequence and transposons are significantly
positively correlated with gDNA read depth. CDS of unique ortholog genes and
simple repeats are negatively correlated (i.e. most at 1x depth, few at higher
depth).

Table DC2a. Correlation of evidence measures with DNA depth on Daphnia
assemblies. DNA depth is relative to 1 for unique gene CDS depth.
cdsuni is unique genes CDS spans, cdsdup is duplicate/paralog genes CDS spans,
repeats are simple repeats and low complexity sequence, transposons are
RepeatMasker transposon results.

            cdsuni   cdsdup   repeats  transposons
-------------------------------------------
Dpulex19    -0.089    0.074   -0.093    0.103
Dmagna19    -0.351    0.136   -0.111    0.349
Dcari20     -0.255    0.263   -0.040    0.285
-------------------------------------------

Table DC3c. gDNA missing from chr assembly but found in gene coding sequences.
gDNA reads were mapped to both chr-assembly and gene coding sequence. The
proportion of reads found in coding sequence missing from chr-assembly is
listed. Miss rates below 1% are the expected result; rates above ~1% of
same-sample read sets indicate incomplete assembly at coding sequence loci.

Species_Asm       CDS_miss
-------------------------
Daphnia magna
  Dmagna10nwb     0.17%
  Dmagna14bgi     0.12%
  Dmagna19sk      4.41%
  Dmagna20skma    0.002%
  Dmagna20ugma    1.04%
Daphnia pulex
  Dpulex16ml      0.6%
  Dpulex19ml      1.3%
Daphnia carinata
  Dcari20cn       0.2%
Drosophila pseudoobscura
  Dropse20        0.0000003% (1 in 3 million)
--------------------------------------
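
The ~1% rule stated for Table DC3c can be applied mechanically. The following is a minimal sketch, not the method used to produce the table: the values are copied from Table DC3c as listed, while the names CDS_MISS_PCT, THRESHOLD_PCT and flag_incomplete are hypothetical, and the "~1%" guideline is treated as a hard cutoff purely for illustration.

```python
# CDS_miss values copied from Table DC3c, in percent, as listed there.
CDS_MISS_PCT = {
    "Dmagna10nwb": 0.17,
    "Dmagna14bgi": 0.12,
    "Dmagna19sk": 4.41,
    "Dmagna20skma": 0.002,
    "Dmagna20ugma": 1.04,
    "Dpulex16ml": 0.6,
    "Dpulex19ml": 1.3,
    "Dcari20cn": 0.2,
    "Dropse20": 0.0000003,
}

# Assumption: the "~1%" guideline is used as an exact cutoff here.
THRESHOLD_PCT = 1.0


def flag_incomplete(miss_pct, threshold=THRESHOLD_PCT):
    """Return assembly names whose CDS miss rate exceeds the threshold,
    ordered from highest to lowest miss rate."""
    flagged = [name for name, pct in miss_pct.items() if pct > threshold]
    return sorted(flagged, key=lambda name: miss_pct[name], reverse=True)


if __name__ == "__main__":
    for name in flag_incomplete(CDS_MISS_PCT):
        print(f"{name}\t{CDS_MISS_PCT[name]}%")
```

Assemblies close to the cutoff, such as Dmagna20ugma at 1.04%, are borderline under the approximate threshold, so a flag from this sketch is a prompt to look closer, not a verdict.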