- First, load your data into Google Genomics and export your variants to BigQuery. See Load Genomic Variants for details.
- Note that merging occurs during the import process, so each unique variant within the cohort is a separate record within the variant set, with all calls for that variant nested within the record. For more information, see Variant Import merge logic details.
- To create an export file similar to a VCF, run a query like the following and materialize the results to a new table: https://github.com/StanfordBioinformatics/mvp_aaa_codelabs/blob/master/sql/multisample-vcf.sql
- Export the table to Cloud Storage and then download it to a Compute Engine instance with sufficient disk space.
- Use sed or another file editing tool to finish the transformation. See also https://github.com/StanfordBioinformatics/mvp_aaa_codelabs/blob/master/bin/bq-to-vcf.py For example:
  - Add the #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT header line.
  - Convert commas to tabs.
- Then run Annovar or similar tools on the file(s).
- Lastly, import the result of the annotation back into BigQuery for use in your analyses.
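
The header and comma-to-tab steps above can be sketched in Python as an alternative to sed. This is a minimal illustration, not the linked bq-to-vcf.py script. It assumes the exported CSV already has the fixed VCF columns in order, followed by one column per sample. The function name `csv_to_vcf` and its parameters are hypothetical.

```python
import csv
import io

# Fixed VCF columns named in the steps above; per-sample columns follow FORMAT.
VCF_FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT",
                     "QUAL", "FILTER", "INFO", "FORMAT"]


def csv_to_vcf(csv_in, vcf_out, sample_names=(), has_header_row=True):
    """Write a tab-delimited, VCF-like file from a comma-delimited export.

    Assumption: every CSV row holds the fixed VCF columns in order,
    followed by one column per sample. Returns the number of data rows.
    """
    reader = csv.reader(csv_in)
    if has_header_row:
        next(reader, None)  # drop the CSV column-name row, if the export has one
    # Add the #CHROM ... FORMAT header line, plus any sample names.
    vcf_out.write("\t".join(VCF_FIXED_COLUMNS + list(sample_names)) + "\n")
    rows = 0
    for row in reader:
        # Convert commas to tabs; quoted fields such as "C,T" keep their commas.
        vcf_out.write("\t".join(row) + "\n")
        rows += 1
    return rows


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file handles instead.
    exported = io.StringIO(
        'CHROM,POS,ID,REF,ALT,QUAL,FILTER,INFO,FORMAT,sample1\n'
        'chr1,100,.,A,"C,T",50,PASS,.,GT,0/1\n'
    )
    converted = io.StringIO()
    csv_to_vcf(exported, converted, sample_names=["sample1"])
    print(converted.getvalue(), end="")
```

Parsing with the csv module rather than a blanket comma replacement keeps multi-allelic ALT values such as "C,T" in a single column, provided the export quotes fields that contain commas.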