Comparative Genomics with SynExtend and DECIPHER • CompGenomicsBioc2022

The SynExtend and DECIPHER packages for R provide a wealth of easy-to-use functions for comparative genomics analyses. This interactive tutorial series introduces these packages by walking through a complete workflow for identifying co-evolving genes from a dataset of genome sequences. This webpage was created for presentation at Bioconductor 2022, but the content will remain freely available.

On this page, I’ve summarized the skills you can expect to learn by working through the tutorials on this site. When you’re ready to get started, check out the Overview page!

Note: When this tutorial was originally given, multiple steps of the pipeline used a function called ProtWeaver. It has since been renamed to EvoWeaver, and ProtWeb has likewise been renamed to EvoWeb. I’ve tried to update every place the old names appeared, but you may still encounter them in files available through the Docker image.

Topics Covered

Loading Genome Data with `DECIPHER`

The first step in analyzing genomics data is loading the data itself. Here we will download sequence data from NCBI as a FASTA file, load it into R, and then perform some basic operations on it. Users will learn to work efficiently with large-scale genomics data, including visualizing and aligning sequences.
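As a rough preview of what this looks like in practice, here is a minimal sketch. It assumes DECIPHER is installed, and it uses an example FASTA file bundled with the package as a stand-in for a file downloaded from NCBI. The variable names are my own, and the tutorial itself may organize these steps differently.

```r
library(DECIPHER)  # also attaches Biostrings, which provides readDNAStringSet()

# Stand-in for a FASTA file downloaded from NCBI:
# an example file that ships with DECIPHER
fasta_path <- system.file("extdata", "50S_ribosomal_protein_L2.fas",
                          package = "DECIPHER")

# Read the sequences into a DNAStringSet
dna <- readDNAStringSet(fasta_path)

# A few basic operations: count, lengths, and names of the sequences
length(dna)
head(width(dna))
head(names(dna))

# Align a small subset to keep the example fast
subset_dna <- dna[1:10]
aligned <- AlignSeqs(subset_dna, verbose = FALSE)

# View the alignment in a web browser (only in an interactive session)
if (interactive()) BrowseSeqs(aligned)
```

Every sequence in `aligned` has the same width after alignment, which makes for a quick sanity check.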

^{Function Reference}

Gene Calling and Annotation with `DECIPHER`

A natural next step is identifying which elements make up each genome in our dataset. Users will learn to programmatically identify coding and non-coding regions of genomes, and to annotate them with predicted KEGG orthology groups using IDTAXA.
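A hedged sketch of this step is below. It assumes DECIPHER is installed and uses the example genome and bacterial non-coding RNA models bundled with the package. The annotation call is left commented out because it needs a trained classifier, here given the hypothetical name `trainingSet`, which has to be obtained separately.

```r
library(DECIPHER)

# Example genome that ships with DECIPHER
genome_path <- system.file("extdata",
                           "Chlamydia_trachomatis_NC_000117.fas.gz",
                           package = "DECIPHER")
genome <- readDNAStringSet(genome_path)

# Predict protein-coding genes
genes <- FindGenes(genome, verbose = FALSE)

# Predict non-coding RNAs using the bacterial models bundled with DECIPHER
data("NonCodingRNA_Bacteria")
rnas <- FindNonCoding(NonCodingRNA_Bacteria, genome, verbose = FALSE)

# Extract the predicted genes as translated protein sequences
proteins <- ExtractGenes(genes, genome, type = "AAStringSet")

# Annotation with IdTaxa() needs a trained classifier.
# `trainingSet` is a hypothetical placeholder, not an object defined here.
# annotations <- IdTaxa(proteins, trainingSet)
```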

^{Function Reference}

Annotation of COGs with `SynExtend`

Annotated genetic regions can be mapped across organisms into clusters of orthologous genes (COGs). Users will learn how to identify COGs at scale using the data generated in the previous step.

^{Function Reference}

Constructing Gene Trees with `DECIPHER`

Each COG comprises a set of conserved orthologs across species. These data, combined with the sequence of each ortholog, allow us to reconstruct the evolutionary history of each COG. Users will learn how to construct, visualize, and save phylogenetic trees from sets of genomes using the TreeLine() function.
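To give a feel for tree building, here is a small sketch under a few assumptions: DECIPHER is installed, the bundled example FASTA file stands in for the orthologs of a single COG, and a quick neighbor-joining tree is enough for illustration. The tutorial may use a different `method`, and the available options can vary between DECIPHER versions.

```r
library(DECIPHER)

# Stand-in for the orthologs of one COG: ten sequences from a bundled example
fasta_path <- system.file("extdata", "50S_ribosomal_protein_L2.fas",
                          package = "DECIPHER")
orthologs <- readDNAStringSet(fasta_path)[1:10]

# These are protein-coding sequences, so align them via their translations
aligned <- AlignTranslation(orthologs, verbose = FALSE)

# Pairwise distances between the aligned sequences
d <- DistanceMatrix(aligned, verbose = FALSE)

# Build a neighbor-joining tree from the distance matrix
tree <- TreeLine(myDistMatrix = d, method = "NJ", verbose = FALSE)

# Visualize the tree, then save it in Newick format
plot(tree)
tree_file <- tempfile(fileext = ".nwk")
WriteDendrogram(tree, file = tree_file)
```

The result is a standard R `dendrogram`, so it works with the usual plotting and manipulation functions.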

^{Function Reference}

Identifying Co-evolving Gene Collectives with `SynExtend`

With these data, we can analyze patterns in evolutionary signal across COGs. Co-evolutionary signal between genes suggests functional association, so finding COGs under shared selective pressure helps us uncover the mechanisms of intracellular pathways. Users will learn to use the EvoWeaver class to tease out subtle evidence of correlated evolutionary pressure and build co-evolutionary networks from it.

^{Function Reference}