Sunday 17 November 2013

The Y-chromosome sequence interpretation service from YFull.com

This article is for advanced genetic genealogists who have had their Y-chromosome sequenced or who are interested in doing so.

With the forthcoming SNP tsunami, the analysis and interpretation of the Y-chromosome results provided by the various companies will be one of the key determining factors in the success of their products. Fortunately, within the genetic genealogy community we have a number of intrepid pioneers who have volunteered to serve as guinea pigs by testing at all the companies so that we will eventually be able to compare all the products.

David Hollister, who runs the Hollister one-name study and is the co-administrator of the Hollister DNA Project, is one of our brave guinea pigs. He has already had his Y-chromosome sequenced with Full Genomes Corporation. He has previously tested with the Genographic Project, and has had STR testing at Family Tree DNA. David is now waiting for his results from the Chromo 2 test from BritainsDNA and the BIG Y test from Family Tree DNA.

Another genetic genealogist, Itaï Perez, has already provided a comprehensive look at the Full Genomes Y-sequencing results in a guest post on CeCe Moore's blog, so I see no point in covering the same ground. However, David has recently submitted his Full Genomes data to another service, YFull.com, for an alternative interpretation. He was so "blown away" by the reports he received from YFull that I asked him if he might be able to share some screenshots so that other genetic genealogists might get a feel for what to expect from this service. David has very kindly agreed and has also obtained the consent of the YFull team for me to publish these screenshots. You will need to click on each image to see larger versions of the screenshots.

This is David's home page on his YFull account. Note that according to YFull there are 41,828 known Y-SNPs and 478 short tandem repeats (Y-STRs).

This report shows David's position on the Y-haplotree and his results for all the SNPs tested on his branch of the tree. Separate reports are available for "controversial" SNPs and no calls.

This report provides a list of private and unknown SNPs. 247 private and unknown SNPs were found in David's sequence: 66 were deemed to be of best quality, 10 were of acceptable quality, and 13 were of low quality. For 111 SNPs only one reading could be obtained. A temporary internal ID system is used to identify the private SNPs and they all bear the prefix YFS, an abbreviation for YFull Singleton.

This report shows results for the Indels. Indel is the term used to describe insertions and deletions - positions in the sequence where extra As, Cs, Ts and Gs have been inserted or where they are absent.
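To make the distinction concrete, here is a minimal Python sketch of how a variant might be labelled as an insertion or a deletion. It assumes the variant is written as a reference allele and an alternate allele, in the style of a VCF file. That representation and the function name `classify_variant` are assumptions for illustration only; the YFull report may well present indels differently.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a variant by comparing the reference and alternate alleles.

    Assumes VCF-style alleles, e.g. ref="A", alt="ATT" for an insertion.
    """
    if ref == alt:
        return "no change"
    if len(ref) == len(alt):
        # Same length: a single-base change is a SNP.
        return "SNP" if len(ref) == 1 else "substitution"
    # Extra bases in the alternate allele mean an insertion;
    # missing bases mean a deletion.
    return "insertion" if len(alt) > len(ref) else "deletion"


if __name__ == "__main__":
    print(classify_variant("A", "G"))    # single-base change
    print(classify_variant("A", "ATT"))  # two extra bases inserted
    print(classify_variant("ATT", "A"))  # two bases absent
```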

There is a handy SNP index that allows you to query your results by SNP name.

Here is the report showing results for the 478 STRs tested.

This pie chart shows the percentage of "good" and "uncertain" alleles. 90.2% of the alleles were classified as "good". Note that next generation sequencing with a read length of 100 base pairs (bp) does not pick up some of the longer STRs in the sequence.
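The read-length limitation comes down to simple arithmetic: to count the repeats in an STR, a single read has to cover the whole repeat region plus a little unique sequence on either side. The Python sketch below illustrates the idea. The 10 bp flank on each side is an illustrative assumption, as are the function names; the actual thresholds used by any particular analysis pipeline are not given in the reports.

```python
def max_callable_repeats(read_length: int, motif_length: int, flank: int = 10) -> int:
    """Largest repeat count that one read can span.

    `flank` is the number of unique bases assumed to be needed on each
    side of the repeat (10 is an illustrative guess, not a known setting).
    """
    usable = read_length - 2 * flank
    return max(usable // motif_length, 0)


def is_spannable(repeats: int, motif_length: int,
                 read_length: int = 100, flank: int = 10) -> bool:
    """True if a repeat region of this size fits inside a single read."""
    return repeats * motif_length + 2 * flank <= read_length


if __name__ == "__main__":
    # A tetranucleotide STR (4 bp motif) read with 100 bp reads.
    print(max_callable_repeats(100, 4))
    print(is_spannable(25, 4))
```

Under these assumptions, a marker with a 4 bp motif could be counted up to 20 repeats with 100 bp reads, while an allele with 25 repeats would be too long to fit in one read.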

YFull have recently introduced a group feature. There are currently groups available for haplogroups R1a and G2a.

YFull are based in Moscow in Russia. They are currently providing a free service for a limited period, but I understand that they will at some point start charging a small fee. They are able to use data for any Y-chromosome which has been sequenced at a minimum of 25X coverage and with a read length of at least 100 base pairs. Data needs to be provided in the form of a BAM file. If you have tested with Full Genomes they will provide you with your BAM file on request. Results are not yet available from Family Tree DNA's BIG Y test, but I understand that FTDNA will also make the BAM files available. It remains to be seen what level of analysis and interpretation FTDNA will provide.
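For anyone wondering whether their own data would qualify, the two stated requirements can be expressed as a quick check. This is only a rough Python sketch: the function names are hypothetical, the coverage figure uses the usual approximation (number of reads multiplied by read length, divided by the length of the region sequenced), and the region length in the example is a made-up round number. How YFull itself measures coverage is not stated.

```python
MIN_COVERAGE = 25.0    # "25X" minimum coverage, as stated above
MIN_READ_LENGTH = 100  # minimum read length in base pairs, as stated above


def mean_coverage(num_reads: int, read_length: int, region_length: int) -> float:
    """Approximate mean coverage: total bases sequenced / region length."""
    return num_reads * read_length / region_length


def meets_requirements(coverage: float, read_length: int) -> bool:
    """True if both the coverage and read-length minimums are met."""
    return coverage >= MIN_COVERAGE and read_length >= MIN_READ_LENGTH


if __name__ == "__main__":
    # Hypothetical numbers: 2.5 million reads of 100 bp over a 10 Mb region.
    cov = mean_coverage(2_500_000, 100, 10_000_000)
    print(cov, meets_requirements(cov, 100))
```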

We can expect the interpretation of Y-chromosome sequencing results to change over time as our knowledge improves, and as more comparative results become available. In the meantime YFull certainly provides an interesting complement to the service provided by Full Genomes. No doubt we can expect other similar services to appear on the scene in the coming months as more sequences become available.

See also
- ISOGG Y-DNA SNP testing chart
- The new Big Y test from Family Tree DNA
- A confusion of SNPs
- A simplified Y-tree and a common standard for Y-DNA haplogroup and SNP nomenclature 

© 2013 Debbie Kennett

2 comments:

Kelly said...

Debbie,
Super cool. I wonder whether this will ultimately be the place to compare cross-platform (like GEDMATCH) but for the Y. This could include all manner of Y data including Full Y, Big Y, GENO 2.0, CHROMO 2.0 etc.

I know that Justin and Greg are planning upgrades to their site but have been busy getting their labs processing smoothly.

It certainly is an exciting time in the Y world.
Kelly

Debbie Kennett said...

Kelly

It could well develop into a useful site to compare Y sequence results across different platforms.

The competition between Full Genomes and FTDNA can only benefit the consumer in the long run as they are both spurring each other on.

It is certainly a very exciting time in the Y world. We're entering completely uncharted territory. It's rather like the Y-chromosome equivalent of exploring the New World and not knowing what you will find there!