The properly rendered version of this document can be found at Read The Docs.
This workshop was presented at the annual Bioconductor Developer’s Conference.
Google has some pretty amazing big data computational “hammers” that they have been applying to search and video data for a long time. In this workshop we take those same hammers and apply them to whole genome sequences.
We do all of this from the comfort of the R prompt, using common packages including VariantAnnotation, ggbio, ggplot2, dplyr, bigrquery, and the new Bioconductor package GoogleGenomics, which provides an R interface to Google’s implementation of the Global Alliance for Genomics and Health (GA4GH) API.
Sign up for Google Cloud Platform by clicking on this link: https://console.cloud.google.com/billing/freetrial
Enable all the Google Cloud Platform APIs we will use in this workshop by clicking on this link.
Follow the Windows, Mac OS X or Linux instructions to install gcloud on your local machine: https://cloud.google.com/sdk/
- Download and install the Google Cloud SDK by running this command in your shell or Terminal:
$ curl https://sdk.cloud.google.com | bash
- Restart your shell or Terminal.
- Authenticate:
$ gcloud auth login
- Configure the project:
$ gcloud config set project <YOUR_PROJECT_ID>
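The project-configuration step above can be wrapped in a small helper that fails early with a clear message. This is a minimal sketch, not part of the workshop material: the function name `configure_gcloud_project` is hypothetical, and it assumes you have already run `gcloud auth login`. It refuses to run unless a project ID is given and `gcloud` is on your `PATH`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the workshop): point gcloud at your project.
# Assumes `gcloud auth login` has already been run.
# Usage: configure_gcloud_project <YOUR_PROJECT_ID>

configure_gcloud_project() {
  project_id="${1:-}"

  # Require an explicit project ID rather than guessing one.
  if [ -z "$project_id" ]; then
    echo "usage: configure_gcloud_project <YOUR_PROJECT_ID>" >&2
    return 2
  fi

  # gcloud is typically only on PATH after restarting the shell post-install.
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found: install the Cloud SDK and restart your shell" >&2
    return 1
  fi

  # Make this project the default for subsequent gcloud commands.
  gcloud config set project "$project_id"
}
```

For example, `configure_gcloud_project my-project-id` (a placeholder ID) sets the default project, while calling it with no argument prints the usage line and returns 2.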
To further the goals of reproducibility, ease of use, and convenience, you can run this codelab in a Bioconductor Docker container deployed to Google Compute Engine. However, you can run this codelab from anywhere, since all the heavy lifting happens in the cloud regardless of where R is running.
Bioconductor maintains Docker containers with R, Bioconductor packages, and RStudio Server all ready to go! It's a great way to set up your R environment quickly and start working. The instructions are below; to learn more, see http://www.bioconductor.org/help/docker/.
- Click on click-to-deploy Bioconductor to navigate to the deployer page on the Cloud Platform Console.
- In field Docker Image choose item
- Click on More to display the additional form fields.
- In field Custom docker image paste in value
- Click on the Deploy Bioconductor button.
- Follow the post-deployment instructions to log into RStudio Server via your browser!
To run the Docker container locally:
- Install Docker for your platform.
- Run the command:
$ docker run gcr.io/bioc_2015/devel_sequencing
Note that the image is big, over 4GB, since it is derived from the Bioconductor Sequencing view and contains many annotation databases.
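The `docker run` command for the local container does not publish any ports. If, as with other Bioconductor images, the container serves RStudio Server on port 8787, you also need to map that port to reach it from your browser. The sketch below assumes port 8787; the function name `run_workshop_container` and the `DRY_RUN` switch are hypothetical conveniences, not part of the workshop.

```shell
#!/bin/sh
# Hypothetical helper: start the workshop image with RStudio Server reachable
# from the host. ASSUMPTION: RStudio Server listens on container port 8787.
IMAGE="gcr.io/bioc_2015/devel_sequencing"

run_workshop_container() {
  host_port="${1:-8787}"   # host port mapped onto the container's port 8787

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "docker run -d -p ${host_port}:8787 ${IMAGE}"
    return 0
  fi

  # -d runs the container in the background; -p publishes the port.
  docker run -d -p "${host_port}:8787" "$IMAGE"
}
```

Under that assumption, `run_workshop_container 8080` starts the container and RStudio Server should then be reachable at http://localhost:8080. Setting `DRY_RUN=1` first only prints the command.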
In R, install the workshop material:
# Install BiocInstaller.
source("http://bioconductor.org/biocLite.R")
# Switch to the devel version of Bioconductor; see http://www.bioconductor.org/developers/how-to/useDevel/
useDevel()
# Install devtools, which biocLite() needs to install a package from GitHub below.
biocLite("devtools")
# Install the workshop material.
biocLite("googlegenomics/bioconductor-workshop-r", build_vignettes=TRUE, dependencies=TRUE)
- View the workshop documentation.
- Click on “User guides, package vignettes and other documentation.”
- Early on in the workshop you will need an API_KEY. You can get this by clicking on this link: https://console.cloud.google.com/project/_/apiui/credential
- Click on vignette “Bioc2015Workshop” and follow the instructions there to run the vignettes line-by-line or chunk-by-chunk!
- To run line-by-line, put your cursor on the desired line and click the “Run” button or use keyboard shortcuts for Windows/Linux:
- To run chunk-by-chunk, put your cursor in the desired chunk and click the “Chunks -> Run Current Chunk” button or use keyboard shortcuts for Windows/Linux:
If you just want to read the rendered results of the four codelabs, here they are:
If you would like to pause your VM when not using it:
- Go to the Google Cloud Platform Console and select your project: https://console.cloud.google.com/project/_/compute/instances
- Click on the checkbox next to your VM.
- Click on Stop to pause your VM.
- When you are ready to use it again, select your VM and click Start. For more detail, see: https://cloud.google.com/compute/docs/instances/stopping-or-deleting-an-instance
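The stop and start steps above can also be done from your shell with `gcloud compute instances stop` and `gcloud compute instances start`. This is a minimal sketch: the function name `vm_power`, the instance name, and the zone are hypothetical placeholders you replace with your VM's values, and `DRY_RUN=1` only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: stop or start the workshop VM without the Cloud Console.
# INSTANCE and ZONE are placeholders for your own VM's name and zone.

vm_power() {
  action="${1:-}"; instance="${2:-}"; zone="${3:-}"

  # Only the two power actions described in the steps above are allowed.
  case "$action" in
    stop|start) ;;
    *) echo "usage: vm_power stop|start <INSTANCE> <ZONE>" >&2; return 2 ;;
  esac
  if [ -z "$instance" ] || [ -z "$zone" ]; then
    echo "usage: vm_power stop|start <INSTANCE> <ZONE>" >&2
    return 2
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "gcloud compute instances $action $instance --zone=$zone"
    return 0
  fi
  gcloud compute instances "$action" "$instance" --zone="$zone"
}
```

For example, `vm_power stop my-vm us-central1-f` pauses a VM named `my-vm`. Stopping a VM halts charges for its vCPUs and memory, but its persistent disks continue to be billed.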
If you want to delete your deployment:
- First copy any data off of the data disk that you wish to keep. The data disk will be deleted when the deployment is deleted.
- Click on Deployments to navigate to your deployment and delete it.
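The two deletion steps above must happen in order, because the data disk goes away with the deployment. The sketch below enforces that order from the shell. It assumes the deployment is managed by Cloud Deployment Manager (the service behind the console's Deployments page), a current Cloud SDK that provides `gcloud compute scp`, and that the data you want lives in a single directory on the VM. The function names and all arguments in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a directory off the workshop VM, then delete the
# deployment. ASSUMPTIONS: Deployment Manager owns the deployment, and
# REMOTE_DIR on the VM holds everything you want to keep.

run_or_print() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

backup_then_delete() {
  deployment="${1:-}"; instance="${2:-}"; zone="${3:-}"
  remote_dir="${4:-}"; local_dir="${5:-.}"

  if [ -z "$deployment" ] || [ -z "$instance" ] || [ -z "$zone" ] || [ -z "$remote_dir" ]; then
    echo "usage: backup_then_delete <DEPLOYMENT> <INSTANCE> <ZONE> <REMOTE_DIR> [LOCAL_DIR]" >&2
    return 2
  fi

  # 1. Copy the data first; stop if the copy fails so nothing is lost.
  run_or_print gcloud compute scp --recurse --zone="$zone" \
    "${instance}:${remote_dir}" "$local_dir" || {
    echo "copy failed; not deleting $deployment" >&2
    return 1
  }

  # 2. Only then delete the deployment (gcloud asks for confirmation).
  run_or_print gcloud deployment-manager deployments delete "$deployment"
}
```

For example, `DRY_RUN=1` followed by `backup_then_delete my-deployment my-vm us-central1-f /home/me/results ./backup` prints the copy command and then the delete command without running either.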