DECIPHER packages for R incorporate a wealth of easy-to-use functions for comparative genomics analyses. This interactive tutorial series introduces users to these packages by walking them through a complete workflow for identifying co-evolving genes from a dataset of genome sequences. This webpage was created for presentation at Bioconductor 2022, but the content will be freely available forever.
I’ve summarized on this page all the skills you can expect to learn by working through the tutorials on this site. When you’re ready to get started, check out the Overview page!
The first step in analyzing genomics data is loading the data itself. Here we will download sequencing data from NCBI as a .fasta, load it into R, then perform some basic operations with the data. Users will learn to work efficiently with large-scale genomics data, including visualization and alignment of sequencing data.
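As a rough sketch of this step, the snippet below writes a tiny made-up FASTA file, reads it with `readDNAStringSet()` (from Biostrings, which is attached when DECIPHER is loaded), and aligns it with `AlignSeqs()`. The sequences and the temporary file are placeholders, not the NCBI data used in the tutorial, and the example assumes DECIPHER is already installed from Bioconductor.

```r
# Assumes DECIPHER is installed from Bioconductor; loading it also attaches Biostrings.
library(DECIPHER)

# Toy sequences standing in for a downloaded .fasta file (hypothetical data)
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(
  ">seq1", "ATGGCTAGCTAGGCTTAACGATCGATCGGCTA",
  ">seq2", "ATGGCTAGCTGGCTTAACGATCGTCGGCTA",
  ">seq3", "ATGGCTACTAGGCTTACGATCGATCGGCTA"
), fasta_path)

# Read the file into a DNAStringSet, then build a multiple sequence alignment
seqs <- readDNAStringSet(fasta_path, format = "fasta")
aligned <- AlignSeqs(seqs, verbose = FALSE)

print(aligned)
# In an interactive session, BrowseSeqs(aligned) opens the alignment in a web browser.
```

After alignment, every sequence in `aligned` has the same width because gaps have been inserted where needed.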
A natural next step is identifying what elements comprise each genome in our dataset. Users will learn to programmatically identify coding and non-coding regions of genomes, and annotate them with predicted KEGG orthology groups using
Annotated genetic regions can be mapped across organisms into clusters of orthologous genes (COGs). Users will learn how to identify COGs at scale using the data generated in the previous step.
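To give a feel for what clustering orthologs means, here is a toy sketch in base R that groups genes into clusters whenever they are linked by a predicted orthologous pair (connected components on a pair list). The gene names, the pair table, and the `cluster_pairs()` helper are all hypothetical, and this is only an illustration of the idea, not the method the tutorial packages use.

```r
# Hypothetical pairwise ortholog predictions between genes from three genomes.
pairs <- data.frame(
  gene_a = c("A_1", "B_1", "A_2", "A_3"),
  gene_b = c("B_1", "C_1", "C_2", "B_3"),
  stringsAsFactors = FALSE
)

# Group genes into clusters by merging any two genes linked by a predicted pair.
cluster_pairs <- function(pairs) {
  genes <- unique(c(pairs$gene_a, pairs$gene_b))
  parent <- setNames(genes, genes)  # each gene starts as its own cluster root

  # Follow parent links until reaching a gene that is its own parent
  find_root <- function(g) {
    while (parent[[g]] != g) g <- parent[[g]]
    g
  }

  # Merge the clusters of the two genes in each pair
  for (i in seq_len(nrow(pairs))) {
    root_a <- find_root(pairs$gene_a[i])
    root_b <- find_root(pairs$gene_b[i])
    if (root_a != root_b) parent[[root_b]] <- root_a
  }

  # Collect genes that share a root into one cluster
  roots <- vapply(genes, find_root, character(1))
  unname(split(genes, roots))
}

cogs <- cluster_pairs(pairs)
print(cogs)
```

Here `A_1`, `B_1`, and `C_1` end up in one cluster even though `A_1` and `C_1` were never paired directly, because both are linked through `B_1`.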
Each COG comprises sets of conserved orthologs across species. These data, combined with sequencing data for each ortholog, allow us to reconstruct the evolutionary history of each COG. Users will learn how to construct, visualize, and save phylogenetic trees from sets of genomes using the
With these data, we can analyze patterns in evolutionary signal across COGs. Co-evolutionary signal between genes implies functional association, so finding COGs under shared selective pressure aids us in uncovering the mechanisms of intracellular pathways. Users will learn to use the ProtWeaver class to tease out subtle evidence of correlated evolutionary pressure in order to create co-evolutionary networks.
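One simple form of co-evolutionary signal is correlated presence and absence of COGs across genomes. The base R sketch below scores every pair of COGs by the Pearson correlation of their presence/absence profiles and keeps high-scoring pairs as network edges. The profile matrix and the 0.9 cutoff are invented for illustration, this is not the ProtWeaver interface, and a naive correlation like this ignores the shared ancestry between genomes that more careful methods account for.

```r
# Hypothetical presence/absence of four COGs across six genomes (1 = present).
profiles <- rbind(
  COG1 = c(1, 1, 0, 0, 1, 0),
  COG2 = c(1, 1, 0, 0, 1, 0),
  COG3 = c(0, 1, 1, 0, 0, 1),
  COG4 = c(0, 0, 1, 1, 0, 1)
)
colnames(profiles) <- paste0("genome", 1:6)

# Pearson correlation between every pair of COG profiles
scores <- cor(t(profiles))

# Keep pairs at or above an arbitrary illustrative cutoff as network edges,
# using the upper triangle so each pair is reported once
cutoff <- 0.9
edge_idx <- which(scores >= cutoff & upper.tri(scores), arr.ind = TRUE)
edges <- data.frame(
  from = rownames(scores)[edge_idx[, "row"]],
  to = rownames(scores)[edge_idx[, "col"]],
  score = scores[edge_idx]
)
print(edges)
```

With these toy profiles, only COG1 and COG2 share an identical pattern, so they form the single edge in the resulting network.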
By working through this website, users will be able to perform the following tasks in R:
- Visualize sequence data
- Work with big genomic data
- Identify and annotate genes from sequence data
- Identify COGs from a set of gene calls
- Build phylogenies at the species and gene level
- Analyze shared evolutionary pressure on COGs
- Predict novel protein function from coevolutionary signal
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.