Genome Assemblies data

Improvement to the interior White Spruce (PG29) genome sequence

We are continuing to refine the PG29 genotype draft genome of British Columbia interior white spruce (Picea engelmannii x glauca x sitchensis), and have now released version 5 [ALWZ000000000]. The improved assembly is derived from the scaffolding of the version 4 assembly using 10x Genomics linked read data, sequenced from a haploid (megagametophyte tissue) PG29 DNA sample. The long-range information provided by the 10x Genomics sequencing platform was used for correction with Tigmint and scaffolding with ARCS, both tools developed by the Spruce-Up Project team (Jackman et al 2018, Yeo et al 2018, tools available from https://github.com/bcgsc/). To cite the BC interior white spruce PG29 genome sequence resource, please use Birol et al 2013 and Warren et al 2015. Organellar genome sequences of PG29 are also published and described in the following article: Jackman et al 2016. The PG29 genome assemblies are available for download through the National Center for Biotechnology Information (NCBI): [Bioproject PRJNA83435].

Improvement to the White Spruce (WS77111) genome sequence

A second version of the draft genome sequence of the WS77111 genotype of eastern Canada white spruce (Picea glauca) has been generated by Tigmint correction and ARCS scaffolding (Jackman et al 2018, Yeo et al 2018, both tools available from https://github.com/bcgsc/) using 10x Genomics linked read sequence data derived from the original diploid DNA source. The updated assembly’s annotation has been submitted to NCBI. To cite the WS77111 genome draft, please use the following article: Warren et al 2015. The genome sequence is available for download through NCBI: [Bioproject PRJNA242552, JZKD000000000]. We also present the complete chloroplast genome sequence of white spruce (Picea glauca), genotype WS77111 (Lin et al 2019). In addition to the linked reads, we have recently generated nanopore long reads (Oxford Nanopore Technologies, Inc.) for white spruce, and are working on an improved genome assembly using this sequencing data type.

Sitka Spruce draft genome assembly (Q903)

We have recently sequenced the Sitka spruce (Picea sitchensis) Q903 genotype on the 10x Genomics linked read platform. The initial assembly was generated on a single server with ABySS v2.0 (Jackman et al 2017). ABySS v2.0 constitutes a major advance for the assembly of large genomes: it departs from the message passing interface (MPI) used for parallel computing of the resource-hungry assembly task, in favour of lightweight Bloom filter data structures with a smaller overall compute footprint. This software improvement enables the determination of large conifer genome sequences, as it permits timely optimization of assembly parameters on a more manageable compute infrastructure. In addition to the linked read technology, we generated low-coverage nanopore long reads to further enhance the draft genome. We refined the initial Q903 nuclear genome sequence assembly by applying diverse scaffolding methodologies developed by the Spruce-Up team (Warren et al 2015; Yeo et al 2017) to utilize the long-range information provided by the linked and long read data. This first nuclear genome assembly for Sitka spruce is available from NCBI under accession SNQJ000000000. Previously, we reported one of the first use cases that utilized 10x Genomics linked reads exclusively in a whole-genome sequencing project, that of the Sitka spruce Q903 chloroplast genome. To refer to the manuscript describing this work, please cite: Coombe et al 2016. Furthermore, we recently assembled and annotated the mitochondrion genome of Sitka spruce, and reported on its complex physical structure (Jackman et al 2020). The chloroplast and mitochondrion genomes are available for download from NCBI [KU215903, MK697696-MK697708].
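
To illustrate why Bloom filters reduce the memory needed to track sequence k-mers, here is a minimal Python sketch. It is not the ABySS v2.0 implementation: the class name BloomFilter, the kmers helper, the filter size, the number of hash functions and the use of salted BLAKE2b hashing are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a fixed-size bit array probed by several hashes.

    Illustrative only; sizes and hashing scheme are assumptions and do not
    reflect how ABySS v2.0 is implemented.
    """

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight slots to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


bf = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in kmers("ACGTACGTGA", 4):
    bf.add(kmer)
```

The filter stores only bits, never the k-mers themselves, so its memory use is fixed by num_bits regardless of k-mer length. The trade-off is a tunable false-positive rate: a membership query can wrongly report a k-mer as present, but never wrongly report an inserted k-mer as absent.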

Engelmann Spruce draft genome assembly (Se404-851)

We report on the genome sequencing of the Se404-851 genotype of Engelmann spruce (Picea engelmannii) from Western Canada to 94X sequence coverage using a combination of sequencing data types, including paired-end and MPET (Illumina), linked (10x Genomics Chromium) and nanopore long reads (Oxford Nanopore Technologies Ltd). This first assembly was generated with ABySS v2.0 (Jackman et al 2017) and linked read scaffolding tools (Jackman et al 2018; Yeo et al 2017). Nanopore long reads were also used to improve the contiguity of the final assembly, whose contiguity is the highest yet for a spruce genome (NG50 length = 0.56 Mbp). The assembly is available for download from NCBI [WSFP000000000]. The Engelmann spruce chloroplast genome assembly is available for download from NCBI [MK241981] and has recently been described (Lin et al 2019).
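
For readers unfamiliar with the NG50 contiguity metric quoted above, the following Python sketch shows how it is commonly computed: sequences are sorted from longest to shortest, and NG50 is the length of the sequence at which the running total first reaches half of the estimated genome size. The function name ng50 and the toy lengths are assumptions for illustration, not values from the Engelmann spruce assembly.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of sequence lengths.

    lengths     -- iterable of scaffold (or contig) lengths
    genome_size -- estimated genome size, in the same units as lengths
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    # The assembly covers less than half of the estimated genome size.
    return 0


# Toy example with made-up lengths.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_ng50 = ng50(toy_lengths, genome_size=400)
# Passing the assembly's own total length instead gives the plain N50.
toy_n50 = ng50(toy_lengths, genome_size=sum(toy_lengths))
```

Because NG50 is measured against the estimated genome size rather than the assembly's own total length, it stays comparable between assemblies of the same genome that differ in how much sequence they contain.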

Black Spruce chloroplast genome assembly (40-10-1)

We present the chloroplast genome sequence of black spruce (Picea mariana), a conifer widely distributed throughout the North American boreal forests (genotype 40-10-1 was collected in Thunder Bay, Ontario). This complete and annotated chloroplast sequence is 123,961 bp long and will contribute to future studies on the genetic basis of evolutionary change and adaptation in spruces and conifers (Lo et al 2020). The complete chloroplast genome sequence of Picea mariana, genotype 40-10-1, is available from GenBank under accession number MT261462, and the raw sequencing reads are available from the SRA under SRX7890468 and SRR11284755.