Annovar Annotation

The properly rendered version of this document can be found at Read The Docs.


If your source data are single-sample VCF, gVCF, or Complete Genomics masterVar files, this page outlines one way to annotate all variants found within the cohort using Annovar or similar tools.

  1. First, load your data into Google Genomics and export your variants to BigQuery. See Load Genomic Variants for details.
  2. Merging occurs during the import process, so each unique variant within the cohort is a separate record within the variant set, with all calls for that variant nested within that record. For more information, see Variant Import merge logic details.
  3. To create an export file similar to a VCF, run a query like the one at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs/blob/master/sql/multisample-vcf.sql and materialize the results to a new table.
  4. Export the table to Cloud Storage and then download it to a Compute Engine instance with sufficient disk space.
  5. Use sed or another file-editing tool to finish the transformation (see also https://github.com/StanfordBioinformatics/mvp_aaa_codelabs/blob/master/bin/bq-to-vcf.py). For example:
     • Add the #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT header line.
     • Convert commas to tabs.
  6. Then run Annovar or similar tools on the file(s).
  7. Lastly, import the result of the annotation back into BigQuery for use in your analyses.
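The reshaping in step 5 can also be sketched in Python as an alternative to sed. This is a minimal sketch, not the bq-to-vcf.py script linked above. It assumes (hypothetically) that the table was exported to Cloud Storage as CSV with a header row, and that its columns are already in VCF order: the nine fixed columns followed by one column per sample, whose header names are reused as sample IDs. The function name csv_export_to_vcf_body is illustrative only.

```python
import csv
import io
from typing import TextIO

# The nine fixed VCF columns, in order.
VCF_FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def csv_export_to_vcf_body(csv_in: TextIO, vcf_out: TextIO) -> int:
    """Rewrite a CSV table export as tab-separated, VCF-style lines.

    Hypothetical helper. Assumes a header row and columns already in VCF
    order (fixed columns, then one column per sample). Returns the number
    of data rows written.
    """
    reader = csv.reader(csv_in)
    header = next(reader)
    # Any columns after the nine fixed ones are treated as sample IDs.
    sample_ids = header[len(VCF_FIXED_COLUMNS):]
    vcf_out.write("\t".join(VCF_FIXED_COLUMNS + sample_ids) + "\n")

    rows = 0
    for row in reader:
        # csv.reader has already removed CSV quoting, so a quoted value
        # such as "A,T" stays in a single field instead of being split.
        vcf_out.write("\t".join(row) + "\n")
        rows += 1
    return rows


if __name__ == "__main__":
    example = io.StringIO(
        "chrom,pos,id,ref,alt,qual,filter,info,format,NA12878\n"
        '1,12345,.,G,"A,T",50,PASS,.,GT,1/2\n'
    )
    out = io.StringIO()
    csv_export_to_vcf_body(example, out)
    print(out.getvalue(), end="")
```

Parsing with the csv module rather than replacing every comma matters if ALT values are exported as a comma-joined string such as A,T: the CSV export quotes such fields, and a blanket comma-to-tab substitution would split them across two columns. When reading real files, open the input with open(path, newline="") as the csv module documentation recommends. The sketch writes only the #CHROM header and data lines; add any ## meta-information lines your annotation tool expects.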

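For step 7, Annovar's tab-delimited output usually needs light cleanup before it can be loaded into BigQuery. The sketch below is one hedged approach. It assumes a tab-delimited table with a header row, such as the *_multianno.txt file written by table_annovar.pl, and assumes missing values were written as "." (for example via -nastring .). It renames columns to letters, digits and underscores, because BigQuery's traditional column-name rules do not allow the dots in names such as Func.refGene, and it blanks the missing-value marker so that BigQuery's default CSV settings load those cells as NULL. The function names bigquery_safe_name and annovar_table_to_csv are hypothetical.

```python
import csv
import io
import re
from typing import Dict, List, TextIO


def bigquery_safe_name(name: str) -> str:
    """Map a column name onto letters, digits and underscores.

    Follows BigQuery's traditional column-name rules: only [A-Za-z0-9_],
    and the name must not start with a digit.
    """
    safe = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
    if not safe or safe[0].isdigit():
        safe = "_" + safe
    return safe


def annovar_table_to_csv(tsv_in: TextIO, csv_out: TextIO, na_string: str = ".") -> List[str]:
    """Convert a tab-delimited annotation table to a BigQuery-friendly CSV.

    Hypothetical helper. Returns the sanitized column names it wrote.
    """
    # QUOTE_NONE: treat any quote characters in the input literally.
    reader = csv.reader(tsv_in, delimiter="\t", quoting=csv.QUOTE_NONE)
    # The default QUOTE_MINIMAL quotes output fields that contain commas,
    # such as multi-transcript AAChange values.
    writer = csv.writer(csv_out, lineterminator="\n")

    names: List[str] = []
    seen: Dict[str, int] = {}
    for raw in next(reader):
        base = bigquery_safe_name(raw)
        count = seen.get(base, 0)
        seen[base] = count + 1
        # Suffix names that collide after sanitizing so they stay unique.
        names.append(base if count == 0 else f"{base}_{count}")
    writer.writerow(names)

    for row in reader:
        # Blank the missing-value marker so the cell loads as NULL.
        writer.writerow(["" if value == na_string else value for value in row])
    return names


if __name__ == "__main__":
    example = io.StringIO(
        "Chr\tStart\tFunc.refGene\t1000g2015aug_all\n"
        "1\t12345\texonic\t.\n"
    )
    out = io.StringIO()
    annovar_table_to_csv(example, out)
    print(out.getvalue(), end="")
```

The resulting file could then be loaded with bq load --source_format=CSV --skip_leading_rows=1 so the header row is not loaded as data. The collision handling is deliberately simple, and a real script may also need to cope with rows that have more fields than the header.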

Need more help? Please see https://cloud.google.com/genomics/support.