ARCS is a new application that uses the barcoding information contained in linked reads to further organize draft genomes into highly contiguous assemblies. Using this tool, we have shown that the contiguity of an ABySS Homo sapiens genome assembly can be increased over six-fold using moderate-coverage (25-fold) Chromium data. We expect ARCS to have broad utility in harnessing the barcoding information contained in linked-read data for connecting high-quality sequences in draft genome assemblies. More detailed information can be found in Yeo et al. 2018. To access ARCS: https://github.com/bcgsc/ARCS/.
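To illustrate the general idea of using barcodes to connect sequences, here is a minimal Python sketch. It is not the ARCS implementation: the function name `barcode_links`, the `min_shared` threshold, and the toy input of (barcode, contig) pairs are all hypothetical, and the sketch simply assumes that two contigs hit by reads from the same barcodes are candidates for being joined.

```python
from collections import defaultdict
from itertools import combinations


def barcode_links(alignments, min_shared=2):
    """Count barcodes shared by each contig pair (hypothetical helper).

    alignments: iterable of (barcode, contig) pairs, one per aligned read.
    Returns {(contig_a, contig_b): shared_barcode_count} for pairs that
    share at least `min_shared` barcodes.
    """
    # Collect the set of contigs touched by each barcode.
    contigs_by_barcode = defaultdict(set)
    for barcode, contig in alignments:
        contigs_by_barcode[barcode].add(contig)

    # Every barcode that touches two contigs is one vote for linking them.
    shared = defaultdict(int)
    for contigs in contigs_by_barcode.values():
        for a, b in combinations(sorted(contigs), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy data (made up for illustration).
alignments = [
    ("BC01", "contig1"), ("BC01", "contig2"),
    ("BC02", "contig1"), ("BC02", "contig2"),
    ("BC03", "contig2"), ("BC03", "contig3"),
]
links = barcode_links(alignments)
print(links)
```

In this toy input only contig1 and contig2 share two barcodes, so they are the only pair retained. A real scaffolder would also need orientation and ordering evidence, which this sketch ignores.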
ChopStitch is a new algorithm for finding putative exons de novo and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-Seq data using a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also accounts for base substitutions in transcript sequences that may arise from sequencing or assembly errors, haplotype variations, or putative RNA editing events. More detailed information can be found in Khan et al. 2018. To access ChopStitch: https://github.com/bcgsc/ChopStitch.
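The Bloom filter idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not ChopStitch's code: the `BloomFilter` class, the `absent_kmer_positions` helper, the sequences, and the inference rule (a run of transcript k-mers missing from the genomic filter marks a junction) are all illustrative. It works on a single strand and ignores the base-substitution handling described above.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter for strings (illustrative, not tuned)."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def absent_kmer_positions(transcript, genomic_filter, k):
    """Start positions of transcript k-mers not found in the filter."""
    return [i for i, kmer in enumerate(kmers(transcript, k))
            if kmer not in genomic_filter]


# Made-up sequences: the genome holds an intron the transcript lacks.
exon1, intron, exon2 = "ATGGCCAAGT", "GTGAGCTTTTCAG", "CCGATTGACA"
genome_reads = [exon1 + intron + exon2]  # stand-in for WGSS reads
transcript = exon1 + exon2
k = 5

genomic_filter = BloomFilter()
for read in genome_reads:
    for kmer in kmers(read, k):
        genomic_filter.add(kmer)

absent = absent_kmer_positions(transcript, genomic_filter, k)
# In this idealized case the k-mers that span the junction are the ones
# missing from the genome, so the boundary follows the last absent start.
boundary = absent[-1] + 1 if absent else None
print(absent, boundary)
```

Because a Bloom filter can report false positives but never false negatives, a k-mer flagged as absent is definitely not in the genomic reads, while a rare junction-spanning k-mer could be missed. That trade-off is what buys the small memory footprint.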
ABySS 2.0 is the second version of our flagship sequence assembly algorithm. It improves on the resource efficiency of ABySS and adds support for emerging sequencing technologies, including those from 10x Genomics (Pleasanton, CA), Pacific Biosciences (PacBio, Menlo Park, CA), and Oxford Nanopore Technologies (ONT, Oxford, UK). We have demonstrated that ABySS 2.0 and its associated algorithms can assemble human genomes to chromosome-scale scaffolds using computational resources readily available in modern servers. More detailed information can be found in Jackman, Vandervalk et al. 2017. To access ABySS 2.0: https://github.com/bcgsc/abyss.
Kollector is an alignment-free targeted assembly approach for local assembly of sequences of interest. A typical use case is the assembly of genic loci of non-model organisms using a set of transcript sequences. The resulting sequences can be readily used for more focused biological research, for example to study cis-regulatory elements. More detailed information can be found in Kucuk et al. 2017. To access Kollector: https://github.com/bcgsc/kollector.
ntCard performs a fundamental bioinformatics function: analyzing the sequence content of large volumes of raw sequencing data. It provides statistics for estimating sequencing error frequency, genome size, and repeat content by profiling the k-mer spectrum of the input data. ntCard implements a computationally efficient algorithm that can process 90x coverage of a spruce mega-genome in 30 min using 500 MB of RAM. More detailed information can be found in Mohamadi, Khan, and Birol, 2017. To access ntCard: https://github.com/bcgsc/ntCard.
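For readers unfamiliar with the term, a k-mer spectrum is the histogram of how many distinct k-mers occur once, twice, three times, and so on. The Python sketch below computes it exactly on toy reads. It is only meant to show what is being profiled: it is a naive exact counter, not ntCard's algorithm, and would not scale to the data volumes quoted above. The function name `kmer_spectrum` and the reads are hypothetical, and reverse complements are not merged.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return {multiplicity: number of distinct k-mers seen that often}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram over the per-k-mer counts.
    return Counter(counts.values())


# Toy reads (made up for illustration).
reads = ["ACGTACGT", "CGTACGTA", "TTTTACGT"]
spectrum = kmer_spectrum(reads, 4)
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

In real data, k-mers seen only once are often sequencing errors, the main peak of the histogram reflects sequencing depth, and high-multiplicity k-mers point to repeats, which is why the spectrum supports the estimates listed above.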