Personal Genome Project Data

This dataset comprises roughly 180 Complete Genomics genomes. See the Personal Genome Project and the publication for full details:

Ball MP, Thakuria JV, Zaranek AW, Clegg T, Rosenbaum AM, Wu X, Angrist M, Bhak J, Bobe J, Callow MJ, Cano C, Chou MF, Chung WK, Douglas SM, Estep PW, Gore A, Hulick P, Labarga A, Lee JH, Lunshof JE, Kim BC, Kim JI, Li Z, Murray MF, Nilsen GB, Peters BA, Raman AM, Rienhoff HY, Robasky K, Wheeler MT, Vandewege W, Vorhaus DB, Yang JL, Yang L, Aach J, Ashley EA, Drmanac R, Kim SJ, Li JB, Peshkin L, Seidman CE, Seo JS, Zhang K, Rehm HL, Church GM.
Published: July 24, 2012
DOI: 10.1073/pnas.1201904109

Google Cloud Platform data locations


Google Genomics variant set for dataset pgp_20150205: 9170389916365079788 contains:


Google is hosting a copy of the PGP Harvard data in Google Cloud Storage. All of the data is in this bucket: gs://pgp-harvard-data-public

If you wish to browse the data, you will need to install gsutil.

Once gsutil is installed, you can run the ls command on the PGP bucket:

$ gsutil ls gs://pgp-harvard-data-public
....lots more....
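
Because the listing is long, it can help to reduce it to a plain list of PGP IDs. The following is a minimal sketch, not part of the PGP or Google tooling: `pgp_ids_from_listing` is a hypothetical helper name, and it assumes that gsutil prints each top-level folder as a `gs://pgp-harvard-data-public/<ID>/` line with a trailing slash.

```shell
# Hypothetical helper: reads `gsutil ls` output on stdin (one URL per line)
# and prints the folder name (assumed to be a PGP ID) for each top-level folder.
pgp_ids_from_listing() {
  # Keep only lines of the form gs://pgp-harvard-data-public/<ID>/
  # and print just the <ID> part; anything else is dropped.
  sed -n 's#^gs://pgp-harvard-data-public/\([^/][^/]*\)/$#\1#p'
}

# Usage (requires gsutil and network access):
#   gsutil ls gs://pgp-harvard-data-public | pgp_ids_from_listing
```

Lines that are not top-level folders, such as any loose files in the bucket root, are silently skipped.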

The subfolders are named by PGP ID, so you can list the contents of a specific one:

$ gsutil ls gs://pgp-harvard-data-public/hu011C57/

If you keep descending through the directory structure, you can end up here:

$ gsutil ls gs://pgp-harvard-data-public/hu011C57/GS000018120-DID/GS000015172-ASM/GS01669-DNA_B05/ASM/
... and more ...
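
To see how deeply a path like the one above is nested, it can be split into its components. This is only an illustrative sketch: `gcs_path_components` is a hypothetical helper name, and it assumes nothing about what each level means beyond the first one being the PGP ID.

```shell
# Hypothetical helper: prints each folder level of a gs:// URL on its own line.
gcs_path_components() {
  # Drop the gs://<bucket>/ prefix and any trailing slash,
  # then turn the remaining slashes into newlines.
  printf '%s\n' "$1" | sed -e 's#^gs://[^/]*/##' -e 's#/$##' | tr '/' '\n'
}

# Usage:
#   gcs_path_components gs://pgp-harvard-data-public/hu011C57/GS000018120-DID/GS000015172-ASM/GS01669-DNA_B05/ASM/
#
# Instead of descending one level at a time, gsutil can also list a folder
# recursively (requires gsutil and network access; the output may be long):
#   gsutil ls -r gs://pgp-harvard-data-public/hu011C57/
```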

Your genome data is located at: gs://pgp-harvard-data-public/{YOUR_PGP_ID}
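
As a small sketch of how the location above can be used, the helper below builds the bucket URL from a PGP ID. `pgp_bucket_url` is a hypothetical name introduced here for illustration, and the ID `hu011C57` is simply the example used earlier in this document.

```shell
# Hypothetical helper: prints the bucket location for a given PGP ID.
pgp_bucket_url() {
  if [ -z "$1" ]; then
    echo "usage: pgp_bucket_url PGP_ID" >&2
    return 1
  fi
  printf 'gs://pgp-harvard-data-public/%s\n' "$1"
}

# Usage (requires gsutil and network access):
#   gsutil ls "$(pgp_bucket_url hu011C57)/"
#   gsutil -m cp -r "$(pgp_bucket_url hu011C57)" .   # copies the whole folder locally
```

Whole-genome folders can be large, so it is probably worth listing a folder before copying it.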

If you do not see the data you are looking for, you should contact PGP directly through your web profile.
