The SynExtend and DECIPHER packages for R provide a wealth of easy-to-use functions for comparative genomics analyses. This interactive tutorial series introduces users to these packages by walking through a complete workflow for identifying co-evolving genes from a dataset of genome sequences. This webpage was created for presentation at Bioconductor 2022, but the content will remain freely available.

I’ve summarized on this page all the skills you can expect to learn by working through the tutorials on this site. When you’re ready to get started, check out the Overview page!

Note: When this tutorial was originally given, multiple steps of the pipeline used a function called ProtWeaver. This has since been renamed to EvoWeaver, and ProtWeb has likewise been renamed to EvoWeb. I’ve attempted to correct every place where the old names occur, but you may still encounter references to the old naming scheme in files available through the Docker image.


Topics Covered

 

Loading Genome Data with DECIPHER

The first step in analyzing genomics data is loading the data itself. Here we will download sequence data from NCBI as a FASTA file, load it into R, then perform some basic operations on it. Users will learn to work efficiently with large-scale genomics data, including visualization and alignment of sequence data.
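As a rough sketch of the loading step described above, the snippet below writes three made-up toy sequences to a temporary FASTA file (standing in for a file downloaded from NCBI), reads them with readDNAStringSet(), and aligns them with AlignSeqs(). The sequences and the file path are illustrative assumptions, not data from the tutorial.

```r
# Assumes DECIPHER is installed from Bioconductor; loading it also loads Biostrings.
library(DECIPHER)

# Hypothetical toy sequences standing in for a FASTA file downloaded from NCBI.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(
  ">seq1", "ATGGCTAGCTAGGCTAACGTTAGCCGATTGACCTGAAGTC",
  ">seq2", "ATGGCTAGCAAGGCTAACGTAGCCGATTGACCTGAAGTC",
  ">seq3", "ATGGCAAGCTAGGCTACGTTAGCCGATTGTCCTGAAGTC"
), fasta_path)

# Read the FASTA file into a DNAStringSet.
seqs <- readDNAStringSet(fasta_path)

# Basic inspection: sequence names and lengths.
print(names(seqs))
print(width(seqs))

# Align the sequences; the result is a DNAStringSet in which all
# sequences have the same width (gaps are inserted as needed).
aligned <- AlignSeqs(seqs, verbose = FALSE)
print(aligned)
```

In an interactive session, calling BrowseSeqs(aligned) opens the alignment in a web browser for visual inspection.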

Function Reference

Gene Calling and Annotation with DECIPHER

A natural next step is identifying what elements comprise each genome in our dataset. Users will learn to programmatically identify coding and non-coding regions of genomes, and annotate them with predicted KEGG orthology groups using IDTAXA.

Function Reference

Annotation of COGs with SynExtend

Annotated genetic regions can be mapped across organisms into clusters of orthologous genes (COGs). Users will learn how to identify COGs at scale using the data generated in the previous step.

Function Reference

Constructing Gene Trees with DECIPHER

Each COG comprises a set of conserved orthologs across species. These data, combined with sequence data for each ortholog, allow us to reconstruct the evolutionary history of each COG. Users will learn how to construct, visualize, and save phylogenetic trees from sets of genomes using the TreeLine() function.

Function Reference

Identifying Co-evolving Gene Collectives with SynExtend

With these data, we can analyze patterns in evolutionary signal across COGs. Co-evolutionary signal between genes suggests functional association, so finding COGs under shared selective pressure helps us uncover the mechanisms of intracellular pathways. Users will learn to use the EvoWeaver class to tease out subtle evidence of correlated evolutionary pressure and build co-evolutionary networks.

Function Reference

Conclusion

After working through this website, users will be able to perform the following tasks in R:

  • Visualize sequence data
  • Work with big genomic data
  • Identify and annotate genes from sequence data
  • Identify COGs from a set of gene calls
  • Build phylogenies at the species and gene level
  • Analyze shared evolutionary pressure on COGs
  • Predict novel protein function from co-evolutionary signal