The ORCA bioinformatics environment is a Docker image that contains hundreds of bioinformatics tools and their dependencies. The ORCA image and accompanying server infrastructure provide a comprehensive bioinformatics environment for education and research. On a server, the ORCA environment is implemented using Docker containers but does not require users to interact directly with Docker, which makes it suitable for novices who may not yet be familiar with managing containers. ORCA has been used successfully to provide a private bioinformatics environment to external collaborators at a large genome institute, to teach an undergraduate class on bioinformatics targeted at biologists, and to provide a ready-to-go bioinformatics suite for a hackathon. Using ORCA eliminates time that would otherwise be spent debugging software installation issues, so that time may be better spent on education and research. More detailed information can be found in Jackman et al 2019. Access ORCA here: https://hub.docker.com/r/bcgsc/orca/
RapidACi is an R package for the batch treatment of Rapid carbon dioxide response curves (A-Ci) generated by the LI-COR® portable systems. It is a tool to accelerate photosynthesis phenotyping measurements. More detailed information can be found in Coursolle et al 2019. Access RapidACi here: https://github.com/ManuelLamothe/RapidACi
In the modern genomics era, genome sequence assemblies are routine practice. However, depending on the methodology, resulting drafts may contain considerable base errors. ntEdit is a scalable genomics application for polishing genome assembly drafts. ntEdit simplifies polishing and “haploidization” of gene and genome sequences with its re-usable Bloom filter design. We expect ntEdit to have additional applications in fast mapping of simple nucleotide variations between any two individuals or species’ genomes. We generated 17-fold coverage spruce sequence data from haploid sequence sources (seed megagametophytes), and used it to edit our pseudo-haploid assemblies of the 20 Gbp interior and white spruce genomes in <4 h and <5 h, respectively, making roughly 50M edits at a (substitution+indel) rate of 0.0024. More detailed information can be found in Warren et al 2019. Access ntEdit here: https://github.com/bcgsc/ntEdit.
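To make the Bloom filter idea concrete, here is a minimal sketch of k-mer-based polishing. It is a toy under stated assumptions, not ntEdit's actual algorithm: the `BloomFilter` class, the `polish` function, the filter size, and the rule of substituting only the last base of an unsupported k-mer are all hypothetical simplifications (indels and errors in the first k-1 bases are not handled).

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter (hypothetical parameters, for illustration only)."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def polish(draft, read_kmers, k):
    """Replace the last base of each unsupported k-mer when a single
    substitution yields a k-mer that is present in the filter."""
    seq = list(draft)
    for i in range(len(seq) - k + 1):
        if "".join(seq[i:i + k]) in read_kmers:
            continue
        prefix = "".join(seq[i:i + k - 1])
        for base in "ACGT":
            if prefix + base in read_kmers:
                seq[i + k - 1] = base
                break
    return "".join(seq)


# Toy data: the "reads" are error-free; the draft carries one substitution.
K = 5
truth = "ACGTTGCAAGGCTA"
draft = "ACGTTGCATGGCTA"  # position 8: A -> T

bf = BloomFilter()
for i in range(len(truth) - K + 1):
    bf.add(truth[i:i + K])

polished = polish(draft, bf, K)
```

Because the filter is built once from the read k-mers, the same filter could in principle be reused to polish several drafts, which is the kind of re-use the description above alludes to.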
Tigmint is a software tool that addresses assembly errors using large-molecule reads, such as those generated by the 10x Genomics Chromium platform. Tigmint is useful for correcting the assemblies of multiple assembly tools, as well as for using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing data. More detailed information can be found in Jackman et al 2018. Access Tigmint here: https://github.com/bcgsc/tigmint.
The long-range sequencing information captured by linked reads, such as those available from 10x Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a k-mer-based mapping approach. The k-mer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it removes the need for input sequence formatting and draft sequence assembly indexing. The reliance on k-mers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. More detailed information can be found in Coombe et al 2018. To access ARKS: https://github.com/bcgsc/arks
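The idea of pairing sequences through k-mers rather than alignments can be sketched as follows. This is an illustrative toy, not the ARKS implementation: the function names, the `min_fraction` threshold, and the rule of discarding k-mers shared by several contigs are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(contigs, k):
    """Map each k-mer to the contig it occurs in; k-mers seen in more
    than one contig are marked ambiguous (None)."""
    index = {}
    for name, seq in contigs.items():
        for km in kmers(seq, k):
            index[km] = name if index.get(km, name) == name else None
    return index


def assign_read(read, index, k, min_fraction=0.5):
    """Assign a read to the contig sharing the most k-mers with it,
    provided enough of the read's k-mers hit that contig."""
    read_kmers = kmers(read, k)
    hits = Counter(index[km] for km in read_kmers
                   if index.get(km) is not None)
    if not hits:
        return None
    contig, n = hits.most_common(1)[0]
    return contig if n / len(read_kmers) >= min_fraction else None


def barcode_links(reads, index, k):
    """Count, for each contig pair, the barcodes whose reads hit both."""
    contigs_by_barcode = defaultdict(set)
    for barcode, seq in reads:
        contig = assign_read(seq, index, k)
        if contig is not None:
            contigs_by_barcode[barcode].add(contig)
    links = Counter()
    for contigs in contigs_by_barcode.values():
        for pair in combinations(sorted(contigs), 2):
            links[pair] += 1
    return links


# Toy data: barcode bc1 has reads landing on both contigs.
K = 5
contigs = {"c1": "ACGTACGGTTAGC", "c2": "TTTGGGCCCAAAT"}
reads = [("bc1", "ACGTACGG"), ("bc1", "GGGCCCAA"), ("bc2", "GTACGGTT")]

index = build_index(contigs, K)
links = barcode_links(reads, index, K)
```

Contig pairs supported by many barcodes are candidates for being joined in a scaffold; no read alignment or alignment index is needed at any step.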
ARCS is an application that utilizes the barcoding information contained in linked reads to further organize draft genomes into highly contiguous assemblies. Using this tool, we have shown how the contiguity of an ABySS Homo sapiens genome assembly can be increased over six-fold, using moderate coverage (25-fold) Chromium data. We expect ARCS to have broad utility in harnessing the barcoding information contained in linked read data for connecting high-quality sequences in genome assembly drafts. More detailed information can be found in Yeo et al 2018. To access ARCS: https://github.com/bcgsc/ARCS/.
ChopStitch is a new algorithm for finding putative exons de novo and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-Seq data with the help of a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also accounts for base substitutions in transcript sequences that may be derived from sequencing or assembly errors, haplotype variations, or putative RNA editing events. More detailed information can be found in Khan et al 2018. To access ChopStitch: https://github.com/bcgsc/ChopStitch.
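The boundary-finding idea can be illustrated with a short sketch: k-mers of a transcript that span an exon-exon junction are absent from the genomic k-mer set, because the intron separates the two exons in the genome. This toy is not ChopStitch's implementation. A plain Python `set` stands in for the Bloom filter, the function names are hypothetical, and the rule that a junction produces a run of exactly k-1 missing k-mers (while a single base substitution would produce a run of k) is a simplifying assumption.

```python
def kmer_list(seq, k):
    """Return the k-mers of a sequence in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def putative_junctions(transcript, genomic_kmers, k):
    """Report transcript positions where a run of exactly k-1 consecutive
    k-mers is missing from the genomic k-mer set."""
    junctions = []
    run_start = None
    # A trailing None acts as a sentinel that closes any open run.
    for i, km in enumerate(kmer_list(transcript, k) + [None]):
        present = km is None or km in genomic_kmers
        if not present and run_start is None:
            run_start = i
        elif present and run_start is not None:
            if i - run_start == k - 1:
                # The first base of the downstream exon.
                junctions.append(run_start + k - 1)
            run_start = None
    return junctions


# Toy data: the genome contains an intron that the transcript lacks.
K = 6
exon1, intron, exon2 = "ATGGCCATTA", "GTAAGTCCCCTTTTAG", "CGTACGGATC"
genome = exon1 + intron + exon2
transcript = exon1 + exon2

# A set stands in for the Bloom filter of WGSS read k-mers.
genomic_kmers = set(kmer_list(genome, K))
junctions = putative_junctions(transcript, genomic_kmers, K)
```

In this example the single reported position splits the transcript back into its two exons. A real Bloom filter would add a small false-positive rate, which this sketch ignores.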
ABySS 2.0 is the second version of our flagship sequence assembly algorithm. It improves on the resource efficiency of ABySS, and provides support for the emerging sequencing technologies, including those from 10x Genomics (Pleasanton, CA), Pacific Biosciences (PacBio, Menlo Park, CA), and Oxford Nanopore Technologies (ONT, Oxford, UK). We have demonstrated that ABySS 2.0 and its associated algorithms can assemble human genomes to chromosome-scale scaffolds, using computational resources readily available in modern servers. More detailed information can be found in Jackman, Vandevalk et al 2017. To access ABySS 2.0: https://github.com/bcgsc/abyss.
Kollector is an alignment-free targeted assembly algorithm for local assembly of sequences of interest. A typical use case for the algorithm is the assembly of genic loci of non-model organisms using a set of transcript sequences. The resulting sequences can be readily utilized for more focused biological research, for example to study cis-regulatory elements. More detailed information can be found in Kucuk et al 2017. To access Kollector: https://github.com/bcgsc/kollector.
ntCard performs a fundamental bioinformatics function: analyzing the sequence content of large volumes of raw sequencing data. It provides statistics for estimating the sequencing error frequency, genome size, and the repeat content by profiling the k-mer spectrum of the input data. ntCard implements a computationally efficient algorithm that can process 90x coverage of the spruce mega-genome in 30 min using 500 MB of RAM. More detailed information can be found in Mohamadi, Khan, and Birol, 2017. To access ntCard: https://github.com/bcgsc/ntCard.
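For readers unfamiliar with the term, a k-mer spectrum is the histogram of how many distinct k-mers occur once, twice, and so on in the data. The sketch below computes it exactly with a hash table, which is only feasible for small inputs; it illustrates the quantity being profiled, not ntCard's estimation algorithm. The function name is hypothetical, and reverse complements are ignored for simplicity.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return a mapping: multiplicity -> number of distinct k-mers
    that occur that many times across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


# Toy data: ACGT occurs three times, five other 4-mers occur once each.
reads = ["ACGTACGT", "ACGTTT"]
hist = kmer_histogram(reads, 4)

distinct_kmers = sum(hist.values())                     # number of distinct k-mers
total_kmers = sum(mult * n for mult, n in hist.items())  # total k-mer occurrences
```

In real sequencing data, k-mers with very low multiplicity are typically dominated by sequencing errors, while the position of the main peak relates to coverage, which is why such a histogram supports estimates of error frequency, genome size, and repeat content.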