Perform Quality Control on Variants


A collection of quality control checks for variants is documented in the codelab Quality Control using Google Genomics. The methods include:

  • Sample Level
    • Genome Call Rate
    • Missingness Rate
    • Singleton Rate
    • Heterozygosity Rate
    • Homozygosity Rate
    • Inbreeding Coefficient
    • Sex Inference
    • Ethnicity Inference
    • Genome Similarity
  • Variant Level
    • Ti/Tv by Genomic Window
    • Ti/Tv by Alternate Allele Counts
    • Ti/Tv by Depth
    • Missingness Rate
    • Hardy-Weinberg Equilibrium
    • Heterozygous Haplotype
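To make two of the variant-level checks above concrete, here is a minimal sketch in Python of a Ti/Tv ratio and a Hardy-Weinberg chi-squared statistic. It is an illustration only, not the codelab's implementation (the codelab runs its checks as BigQuery queries). The function names `titv_ratio` and `hwe_chi_squared` are hypothetical, and the sketch assumes biallelic SNPs whose reference and alternate alleles are single, distinct bases.

```python
# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T).
# Every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) pairs.

    Assumes each pair is a biallelic SNP with ref != alt (hypothetical input format).
    """
    ti = tv = 0
    for ref, alt in snps:
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


def hwe_chi_squared(n_hom_ref, n_het, n_hom_alt):
    """Return the chi-squared statistic of observed genotype counts
    against Hardy-Weinberg expectations for one biallelic site."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p                              # alternate allele frequency
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    # Skip genotype classes with zero expected count to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    print(titv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
    print(hwe_chi_squared(25, 50, 25))
```

A large chi-squared value suggests that a site departs from Hardy-Weinberg equilibrium, and a Ti/Tv ratio far from the value expected for the data type may indicate call quality problems. Thresholds depend on the dataset and are not specified here.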

These methods were co-developed with researchers working on the Million Veteran Program data. For more detail, see the paper and the diagram of their full pipeline, which includes some additional quality control checks, on GitHub.

To use this codelab with your own data:

  1. Load your data into Google Genomics and export your variants to BigQuery. See Load Genomic Variants for details.
  2. Follow the instructions in each section of the codelab for running that part on your own data. For example, update the BigQuery table name in Part 1: Data Overview.
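As a rough illustration of step 2, the sketch below substitutes a table name into a query template. The template text, the `render_query` helper, and the table name `my-project.my_dataset.my_variants` are all hypothetical and are not taken from the codelab. The sketch also assumes BigQuery standard SQL table references of the form `project.dataset.table`.

```python
from string import Template

# Hypothetical query template; the codelab's actual queries are not reproduced here.
QUERY_TEMPLATE = Template("SELECT COUNT(*) AS variant_count FROM `$table`")


def render_query(table):
    """Substitute a fully qualified table name into the query template.

    Assumes the standard SQL form project.dataset.table.
    """
    parts = table.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected a table name of the form project.dataset.table")
    return QUERY_TEMPLATE.substitute(table=table)


if __name__ == "__main__":
    # Replace this placeholder with the table that holds your exported variants.
    print(render_query("my-project.my_dataset.my_variants"))
```

Keeping the table name in one place makes it easier to rerun every section of the codelab against a different dataset.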
