Improvement to the Interior White Spruce (PG29) genome sequence

We are continuing to refine the draft genome of the PG29 genotype of British Columbia interior white spruce (Picea engelmannii x glauca) and expect a fifth genome release soon. The improved assembly will be produced by scaffolding the version 3 assembly with 10x Genomics linked read data sequenced from a haploid (megagametophyte tissue) PG29 DNA sample. The long-range information provided by the 10x Genomics platform will be used for scaffolding with ARCS, a tool recently developed by the Spruce-Up Project team (Yeo et al 2018). To cite the BC interior white spruce PG29 genome sequence resource, please use Birol et al 2013 and Warren et al 2015. The PG29 organellar genome sequences have also been published and are described in Jackman et al 2016. The PG29 genome assemblies are available for download from the National Center for Biotechnology Information (NCBI): [BioProject PRJNA83435].

Improvement to the White Spruce (WS77111) genome sequence

A second version of the draft genome sequence of the WS77111 genotype of eastern Canada white spruce (Picea glauca) has been generated by ARCS scaffolding with 10x Genomics linked read data derived from the original diploid DNA source. Submission to NCBI is pending gene annotation, which began in fall 2017. To cite the WS77111 draft genome, please use Warren et al 2015. The genome sequence is available for download from NCBI: [BioProject PRJNA242552].

Sitka Spruce draft genome assembly (Q903)

We have recently sequenced the Sitka spruce (Picea sitchensis) Q903 genotype on the 10x Genomics linked read platform. An initial assembly has been generated on a single server with ABySS v2.0 (Jackman et al 2017). ABySS 2.0 is a major advance for the assembly of large genomes: instead of distributing the resource-hungry assembly task across many machines with the message passing interface (MPI), it relies on lightweight Bloom filter data structures that have a smaller overall compute footprint. This improvement makes large conifer genome sequences more tractable, because assembly parameters can be optimized in a timely manner on more manageable compute infrastructure.

We are refining the initial Q903 nuclear genome assembly by applying diverse scaffolding methods developed by the Spruce-Up team (Warren et al 2015; Yeo et al 2017). A novel genome assembler that makes use of linked read information is under active development. We are also generating long nanopore sequence reads (Oxford Nanopore Technologies), which we expect to improve both the accuracy and the contiguity of the Sitka spruce nuclear genome sequence.

Previously, we reported one of the first use cases in which 10x Genomics linked reads were used exclusively in a whole-genome sequencing project: the assembly of the Sitka spruce Q903 chloroplast genome. To refer to the manuscript describing this work, please cite Coombe et al 2016. The chloroplast genome is available for download from NCBI [KU215903].
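To illustrate why a Bloom filter keeps the memory footprint of ABySS 2.0 small, here is a minimal, generic Python sketch of a Bloom filter holding DNA k-mers. It is not the ABySS implementation, which is written in C++ and uses its own hashing scheme. The names `KmerBloomFilter`, `canonical` and `kmers`, the SHA-256-based double hashing, and the toy reads are all hypothetical and chosen only for illustration. The filter stores membership in a fixed-size bit array. It may return false positives but never false negatives, which is the trade-off that lets a k-mer set fit in far less memory than an explicit hash table.

```python
import hashlib
import math

# Complement table used to build reverse complements of DNA k-mers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class KmerBloomFilter:
    """Minimal Bloom filter for DNA k-mers (illustrative sketch, not ABySS code)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, h = (m / n) ln 2 hash functions.
        n = max(1, expected_items)
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, kmer: str):
        # Hypothetical choice: derive all bit positions from one SHA-256 digest
        # by double hashing (h1 + i * h2), a common Bloom filter technique.
        digest = hashlib.sha256(kmer.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, kmer: str) -> None:
        # Set every bit position associated with this k-mer.
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        # True if all positions are set: false positives possible, false negatives not.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer[::-1].translate(COMPLEMENT))


def kmers(seq: str, k: int):
    """Yield every k-mer (substring of length k) of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


# Toy usage: load the canonical 5-mers of two short, made-up reads.
reads = ["ACGTACGTGACCTTAGC", "TTGACCAGTACGATCGA"]
k = 5
bf = KmerBloomFilter(expected_items=100, false_positive_rate=0.01)
for read in reads:
    for km in kmers(read, k):
        bf.add(canonical(km))

print(canonical("ACGTA") in bf)  # True: inserted k-mers are always reported present
```

At a 1% false-positive rate the sizing formula gives roughly 9.6 bits per stored k-mer, independent of k. This is why a Bloom filter can represent a very large k-mer set on one machine, where storing the k-mer strings themselves would require far more memory.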