Instructions for converting NatGeno raw data for upload to GEDmatch

Disclaimer: I have no affiliation with GEDmatch and am therefore not responsible for anything related to that website (which has its own disclaimers). Please note that by using GEDmatch, you are uploading your personal genomic data to a third-party website, which carries inherent risk (as GEDmatch's own disclaimer also outlines). I also have no professional affiliation with, or financial interest in, National Geographic or its Genographic Project.

On a Mac or Linux computer, unzip the raw data you downloaded from the Genographic website. From the unzipped folder, you need only the file ending in .all.csv. Rename this file to "genome.csv".
Put this "genome.csv" file in the same folder as the two files downloaded from this website (the Script and the Supporting File). If you are unfamiliar with command-line interfaces, it is easiest to put all three files into a folder on the Desktop. Make sure the Script is named "Script_GEDmatch.sh" and the Supporting File is named "Genographic_SNP_Master_List".
Open the "Script_GEDmatch.sh" file in TextEdit, TextWrangler, Xcode, or any other plain-text editor. Avoid opening it in Word, which may add formatting that breaks the script.
In the "Script_GEDmatch.sh" file, use find-and-replace (ideally "Replace All") to change "123456" to another 6-digit number of your choosing; there are 4 instances in total. This number will be the "kit number" that identifies your data on GEDmatch. If your upload to GEDmatch fails, the number may already be taken by someone else's kit; if so, return to this step, choose a different 6-digit number, and repeat all of the steps that follow.
Open a command-line terminal. On a Mac, you can usually find it by clicking the Spotlight icon in the upper right-hand corner of the screen and typing "Terminal".
In the terminal, go to the directory that contains your three files. If you are unfamiliar with command-line interfaces and put the three files into a folder on the Desktop, enter "cd ~/Desktop/BLANK", where BLANK is the name of that folder.
Enter "bash Script_GEDmatch.sh" in the terminal. The script should finish within a minute, usually in a few seconds, and produce two output files, both ending in .gz. Your original three files will remain in the folder.
Go to www.GEDmatch.com and create a new account. After logging in, you will find the upload options in the upper right-hand corner. Upload your data as an FTDNA kit. This requires uploading both .gz files: one contains the autosomal data and the other the X-chromosome data (the file names indicate which is which). There are two separate upload links, and you must go through each one, uploading each file separately, for your data to be fully uploaded. Otherwise, the website should be fairly self-explanatory.
The website should then report that the upload finished successfully, at which point you can start exploring. The upload itself should take only a few minutes, depending on your internet speed. Batch processing takes several days, and you must wait for it to complete before using the family-finder functions, but admixture and other analyses should be available immediately.
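If you prefer to stay in the terminal, the command-line steps above (preparing genome.csv, setting the kit number, and running the script) can be sketched as a few bash functions. This is a minimal sketch, assuming bash on macOS or Linux; the function names prepare_genome, set_kit_number, check_inputs, and run_conversion are hypothetical helpers, not part of the downloaded files, and it assumes Script_GEDmatch.sh writes its two .gz outputs into the folder it is run from.

```shell
#!/usr/bin/env bash
# Sketch of the command-line steps, assuming bash on macOS or Linux.
# All function names here are hypothetical, not part of the downloaded files.

# Copy the single *.all.csv file from the unzipped Genographic folder into the
# working folder as genome.csv (copying keeps the original in place).
prepare_genome() {
  local unzipped_dir="$1" work_dir="$2"
  local matches=("$unzipped_dir"/*.all.csv)
  # With no match, bash leaves the literal pattern, which fails the -f test.
  if [ "${#matches[@]}" -ne 1 ] || [ ! -f "${matches[0]}" ]; then
    echo "expected exactly one .all.csv file in $unzipped_dir" >&2
    return 1
  fi
  cp "${matches[0]}" "$work_dir/genome.csv"
}

# Replace the kit number in the script. The old number defaults to the
# placeholder 123456; pass the previous number as the third argument if you
# are choosing a new one after a failed upload.
set_kit_number() {
  local script="$1" new="$2" old="${3:-123456}"
  local n count
  for n in "$new" "$old"; do
    if ! [[ "$n" =~ ^[0-9]{6}$ ]]; then
      echo "kit numbers must be exactly 6 digits: $n" >&2
      return 1
    fi
  done
  # Count occurrences (not lines); the script should contain exactly 4.
  count=$(grep -o "$old" "$script" | wc -l | tr -d ' ')
  if [ "$count" -ne 4 ]; then
    echo "expected 4 instances of $old in $script, found $count" >&2
    return 1
  fi
  # GNU and BSD (macOS) sed disagree on the -i flag, so edit via a temp file.
  sed "s/$old/$new/g" "$script" > "$script.tmp" && mv "$script.tmp" "$script"
}

# Check that the three input files are in the current folder.
check_inputs() {
  local f missing=0
  for f in genome.csv Script_GEDmatch.sh Genographic_SNP_Master_List; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run the conversion script and list the .gz files left in the folder.
run_conversion() {
  check_inputs || return 1
  bash Script_GEDmatch.sh || return 1
  ls -1 ./*.gz
}
```

For example, after loading the functions with "source helpers.sh" (a hypothetical file name), you might run "prepare_genome ~/Downloads/unzipped ~/Desktop/BLANK", then "cd ~/Desktop/BLANK", "set_kit_number Script_GEDmatch.sh 654321" (any 6-digit number works), and finally "run_conversion". A text editor's Replace All works just as well for the kit number; the point of the occurrence count is to catch the case where fewer or more than the 4 expected instances get changed.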

Xander Xue

Simons Center for Quantitative Biology

Cold Spring Harbor Laboratory

One Bungtown Road

Cold Spring Harbor, NY 11724

xanderxue at gmail dot com

Alexander T. Xue
