Tigmint is a software tool that corrects assembly errors using linked reads from large DNA molecules, such as those generated by the 10x Genomics Chromium platform. Tigmint can correct assemblies produced by multiple assembly tools, and it can use Chromium reads to correct and scaffold assemblies of long single-molecule sequencing reads. More detailed information can be found in Jackman et al 2018. Access Tigmint here: https://github.com/bcgsc/tigmint.
The long-range sequencing information captured by linked reads, such as those available from 10x Genomics (10xG), helps resolve genome sequence repeats and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to further organize draft genome assemblies into more contiguous drafts. Unlike other linked read scaffolders, including our own ARCS, ARKS does not depend on read alignment; instead, it uses a k-mer-based mapping approach. The k-mer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, because it removes the need to format input sequences and to index the draft assembly. Relying on k-mers instead of read alignments to pair sequences relaxes the workflow requirements and drastically reduces the run time. More detailed information can be found in Coombe et al 2018. To access ARKS: https://github.com/bcgsc/arks
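To make the k-mer mapping idea concrete, here is a minimal Python sketch of alignment-free barcode-to-contig mapping, assuming reads arrive as (barcode, sequence) pairs. It indexes the canonical k-mers of each contig, records which contigs each barcode's reads share k-mers with, and counts the barcodes shared by each contig pair as evidence that the two contigs are neighbors in the genome. The function names (`build_kmer_index`, `barcode_contig_counts`, `link_contigs`) and the `min_hits` threshold are hypothetical; this is not the ARKS implementation, which uses its own data structures and filtering criteria.

```python
from collections import Counter, defaultdict

# Maps each base to its complement; used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq, k):
    """Yield every k-mer (length-k substring) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_index(contigs, k):
    """Map each canonical k-mer to the one contig it occurs in.

    K-mers found in more than one contig are ambiguous and dropped.
    """
    index, ambiguous = {}, set()
    for name, seq in contigs.items():
        for km in map(canonical, kmers(seq, k)):
            if km in ambiguous:
                continue
            if km in index and index[km] != name:
                del index[km]
                ambiguous.add(km)
            else:
                index[km] = name
    return index


def barcode_contig_counts(reads, index, k):
    """For (barcode, sequence) reads, count k-mer hits per contig for each barcode."""
    hits = defaultdict(Counter)
    for barcode, seq in reads:
        for km in kmers(seq, k):
            contig = index.get(canonical(km))  # one hash lookup, no alignment
            if contig is not None:
                hits[barcode][contig] += 1
    return hits


def link_contigs(hits, min_hits=1):
    """Count, for each unordered contig pair, the barcodes that hit both contigs."""
    links = Counter()
    for counter in hits.values():
        supported = sorted(c for c, n in counter.items() if n >= min_hits)
        for i, a in enumerate(supported):
            for b in supported[i + 1:]:
                links[(a, b)] += 1
    return links
```

No aligner or alignment index is involved here: each read k-mer costs a single dictionary lookup, and contig pairs sharing many barcodes become candidate scaffold joins.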
ARCS is a new application that uses the barcoding information contained in linked reads to further organize draft genomes into highly contiguous assemblies. Using this tool, we have shown that the contiguity of an ABySS Homo sapiens genome assembly can be increased more than six-fold using moderate-coverage (25-fold) Chromium data. We expect ARCS to have broad utility in harnessing the barcoding information in linked read data to connect high-quality sequences in draft genome assemblies. More detailed information can be found in Yeo et al 2018. To access ARCS: https://github.com/bcgsc/ARCS/.
ChopStitch is a new algorithm for finding putative exons de novo and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-Seq data with the help of a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also accounts for base substitutions in transcript sequences that may arise from sequencing or assembly errors, haplotype variations, or putative RNA editing events. More detailed information can be found in Khan et al 2018. To access ChopStitch: https://github.com/bcgsc/ChopStitch.
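The intuition behind using a Bloom filter of genomic k-mers can be sketched in Python under simplifying assumptions: reads and transcripts are on the same strand (no canonical k-mers) and there are no base substitutions. A transcript k-mer that spans an exon-exon junction is interrupted by an intron in the genome, so it is usually absent from the filter; a single junction therefore shows up as a run of k-1 consecutive absent k-mers. The `BloomFilter` class, its SHA-256-based hashing, and the `build_genome_filter` and `junction_candidates` functions are hypothetical illustrations, not ChopStitch's code.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: a bit array plus num_hashes hash functions."""

    def __init__(self, size_bits, num_hashes):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # May return false positives, never false negatives.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_genome_filter(reads, k, size_bits=1 << 20, num_hashes=3):
    """Insert every k-mer of the genomic (WGSS) reads into a Bloom filter."""
    bf = BloomFilter(size_bits, num_hashes)
    for read in reads:
        for i in range(len(read) - k + 1):
            bf.add(read[i:i + k])
    return bf


def junction_candidates(transcript, bf, k):
    """Return half-open (start, end) ranges of consecutive transcript k-mers absent from bf."""
    runs, start = [], None
    n = len(transcript) - k + 1
    for i in range(n):
        absent = transcript[i:i + k] not in bf
        if absent and start is None:
            start = i  # a run of missing k-mers begins
        elif not absent and start is not None:
            runs.append((start, i))  # the run ends just before k-mer i
            start = None
    if start is not None:
        runs.append((start, n))
    return runs
```

Because a Bloom filter can report false positives, a junction-spanning k-mer may occasionally look present; a production tool has to tolerate such noise as well as the substitutions described above.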
ABySS 2.0 is the second version of our flagship sequence assembly algorithm. It improves on the resource efficiency of the original ABySS and provides support for emerging sequencing technologies, including those from 10x Genomics (Pleasanton, CA), Pacific Biosciences (PacBio, Menlo Park, CA), and Oxford Nanopore Technologies (ONT, Oxford, UK). We have demonstrated that ABySS 2.0 and its associated algorithms can assemble human genomes into chromosome-scale scaffolds using computational resources readily available in modern servers. More detailed information can be found in Jackman, Vandevalk et al 2017. To access ABySS 2.0: https://github.com/bcgsc/abyss.
Kollector is an alignment-free targeted assembly approach that performs local assembly of sequences of interest. A typical use case is assembling the genic loci of non-model organisms using a set of transcript sequences. The resulting sequences can readily be used for more focused biological research, for example to study cis-regulatory elements. More detailed information can be found in Kucuk et al 2017. To access Kollector: https://github.com/bcgsc/kollector.
ntCard performs a fundamental bioinformatics task: analyzing the sequence content of large volumes of raw sequencing data. By profiling the k-mer spectrum of the input data, it provides statistics for estimating the sequencing error frequency, genome size, and repeat content. ntCard implements a computationally efficient algorithm that can process 90x coverage of the spruce mega-genome in 30 min using 500 MB of RAM. More detailed information can be found in Mohamadi, Khan, and Birol, 2017. To access ntCard: https://github.com/bcgsc/ntCard.
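The kind of k-mer spectrum statistics described above can be illustrated by counting k-mers exactly, as in the Python sketch below. It computes the k-mer frequency histogram (the number of distinct k-mers seen exactly i times), the number of distinct k-mers (often written F0), and the total number of k-mers (F1). ntCard itself is a streaming algorithm that estimates such values from hash values of a sample of the k-mers instead of storing every k-mer, which is what keeps its memory use low. The `estimate_genome_size` heuristic (skip the low-count error peak, then divide the remaining k-mer total by the coverage peak) is a common rule of thumb added here for illustration, not part of ntCard; function names are hypothetical and strand handling is omitted.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Count every k-mer exactly and summarize the k-mer spectrum.

    Returns (F0, F1, hist): F0 = number of distinct k-mers, F1 = total number
    of k-mers, hist[i] = number of distinct k-mers that occur exactly i times.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    hist = Counter(counts.values())
    return len(counts), sum(counts.values()), hist


def estimate_genome_size(hist):
    """Rough genome-size estimate from a k-mer histogram (illustrative heuristic).

    Advance from multiplicity 1 while counts keep falling to find the trough
    separating error k-mers (low counts) from genomic k-mers; take the most
    frequent multiplicity at or beyond it as the coverage peak; divide the
    k-mer occurrences at or above the trough by that peak.
    """
    max_mult = max(hist)
    trough = 1
    while trough < max_mult and hist.get(trough + 1, 0) <= hist.get(trough, 0):
        trough += 1
    peak = max(range(trough, max_mult + 1), key=lambda m: hist.get(m, 0))
    solid = sum(m * n for m, n in hist.items() if m >= trough)
    return solid / peak
```

Exact counting like this needs memory proportional to the number of distinct k-mers, which becomes prohibitive for data sets such as the spruce reads mentioned above.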