Wednesday, 14 March 2018

Farewell to Oxford Ancestors

Oxford Ancestors is to close down. There does not appear to have been any official announcement, but the following notice from Bryan Sykes, the founder of the company, appears on the website (see also the screenshot above):
Oxford Ancestors is closing down after 18 years. I have enjoyed those years immensely and it has been a rare privilege to have you send me your DNA from all over the world. We started because I wanted people to be able to share in the excitement of the research being done in university laboratories like my own in Oxford but rarely reaching beyond the halls of academe. That has all changed now and cheap DNA tests are widely available, even if their meaning is sometimes dubious. The popularity of ‘ethnic testing’ is a case in point, where even religious persuasion is given a genetic foundation by some companies. Have they never heard of the outrages of ‘racial purity’ and the eugenics movement or is it just one more business opportunity? 
But I digress. Thank you all for your patronage over the years. I am leaving Oxford this Summer to live abroad and write more books and I did not feel the company could be run well like that. 
In practical terms, all outstanding orders will be fulfilled in accordance with our Terms and Conditions and the databases will operate as usual for a few more months. 
Bryan Sykes MA PhD DSc
Oxford Ancestors was launched in May 2000 and was the first UK company to offer genetic ancestry tests direct to the consumer. Family Tree DNA and Gene Tree launched in the US at around the same time. Of these three founding companies, only Family Tree DNA is now still in business.

Oxford Ancestors initially offered a mitochondrial DNA test and later added a Y-chromosome DNA test along with a Male Match service. Both tests were low resolution  – an HVR1 mtDNA test and a 10-marker Y-STR test. Unlike their competitors, Oxford Ancestors did not upgrade their offerings and did not drop their prices as the technology improved.

The current Oxford Ancestors Matriline test costs £199 but still only covers HVR1 (400 bases of the 16569 bases on the mtDNA genome). Family Tree DNA now offers a full mitochondrial sequence test (sequencing all 16569 bases) for US $199 (£142). If you're lucky and you buy the test in a sale, and at a time when the exchange rate is favourable, it's possible to get a full sequence test at FTDNA for just over £100. A full mtDNA sequence test is also available from YSEQ for US $165 (£118) though without the benefit of a large matching database.

The current Oxford Ancestors Y-clan test covers just 26 markers (Y-STRs) which is still insufficient to distinguish between different surname lineages. Family Tree DNA began offering a 37-marker test in December 2003, a 67-marker test in August 2006 and a 111-marker test in April 2011. YSEQ also offers a range of Y-STR panels. It's now also possible to buy comprehensive Y-chromosome sequencing tests such as the BigY from Family Tree DNA and the Y-Elite from Full Genomes Corporation, though the cost of these tests is still beyond the reach of the average genealogist.

Because of the high prices and the low resolution of the Oxford Ancestors tests, the genealogists who had originally started surname projects at Oxford Ancestors gradually migrated their projects to other companies, and mostly to Family Tree DNA. Today FTDNA have a monopoly on surname projects. There are now 9,950 surname projects at FTDNA representing 559,646 unique surnames.

However, despite the limitations of the tests offered by Oxford Ancestors, the company has earned its rightful place in the history of genetic genealogy. Many of the pioneers of the genetic genealogy community were introduced to DNA testing by Oxford Ancestors. Ann Turner, co-author with Megan Smolenyak of Trace Your Roots With DNA, took an mtDNA test with Oxford Ancestors which inspired her to launch the Genealogy DNA list on Rootsweb, the first ever genetic genealogy mailing list.

The groundbreaking paper by Bryan Sykes and Catherine Irven on Surnames and the Y-chromosome  (Am J Hum Genet 2000 66(4): 1417-1419) inspired a number of pioneering genealogists to start DNA projects for their surname. Chris Pomery was the first person in the UK to set up a surname DNA project outside of academia. He started the Pomeroy DNA Project at Oxford Ancestors in September 2000, later transferring to DNA Heritage and then Family Tree DNA. I first heard about DNA testing for genealogy when I joined the Guild of One-Name Studies at the beginning of 2006. I set up my Cruwys DNA Project at Family Tree DNA after hearing Chris Pomery speak about DNA and surnames at a local family history meeting.

The demise of Oxford Ancestors is a timely reminder that nothing lasts forever. In the time I've been involved in genetic genealogy I've already witnessed the demise of three other British companies – Family Genetics, DNA Heritage and BritainsDNA. Many companies in other countries have also folded or been taken over. Although the market is now dominated by a few large companies there is no guarantee that any of them will still be here in ten or twenty years' time. Following the LOCKSS mantra (Lots of Copies Keep Stuff Safe), I always recommend getting your DNA in as many different databases as possible. If you've tested at Family Tree DNA make sure you fill out the beneficiary form. If you've tested elsewhere you can share your login details with a trusted friend or relative to ensure that your DNA record can continue working for you in the long term. It's also important to make sure that you download copies of your DNA results and your raw data. If you're running a DNA project make sure you have downloaded all the project data to your own computer for backup.

I think it's unlikely that anyone is now running a DNA project at Oxford Ancestors but, if you are, you will want to make sure you download all the available data while you have the chance. If you're a member of the Guild of One-Name Studies you can contact our DNA Advisor, Susan Meates, and she will help you to migrate your project to Family Tree DNA. See the DNA section on the Guild website for Susan's contact details.

Thanks to Andrew Millard for alerting us to the news in the ISOGG Facebook group.

Tuesday, 13 March 2018

DNA lectures from the 2017 Institute for Genetic Genealogy now online

The Institute for Genetic Genealogy (i4gg) is a two-day conference held annually in the US. The 2017 conference took place in San Diego, California, in December 2017. Many of the top names in American genetic genealogy, such as Blaine Bettinger and CeCe Moore, were presenting at this conference. There were also talks from representatives from the five major testing companies.

 All the sessions were recorded and these recordings are now available to purchase online. You can either purchase the videos individually for $10 each or pay $99 to access all 22 recordings. There are some very interesting talks and I've already bought access to the entire programme and am looking forward to watching them all. To see the full programme and purchase the videos click here.

Monday, 12 March 2018

Genetic Genealogy Ireland Belfast and Game of Thrones

I had a wonderful time in February in Belfast attending the first ever Genetic Genealogy Ireland/Back To Our Past conference in Northern Ireland. It was a great opportunity to meet up with my genetic genealogy friends and we took a few extra days in Belfast to see some of the sights.

The conference was held in the magnificent Titanic Centre which is a fantastic venue in its own right with first-class facilities. We were based in the Titanic Suite on the top floor which features a replica of the Titanic staircase. There were over forty exhibitors though it was a shame that the two biggest genealogy companies, Ancestry and Findmypast, were not represented. Family Tree DNA and MyHeritage DNA were the only two DNA companies present. The feedback from the show was very positive so I'm hoping we'll all be back again next year with an even bigger and better event.

We had an excellent series of genetic genealogy lectures spread out over two days. Family Tree DNA generously provided sponsorship for the DNA lecture area. The lectures were livestreamed in the Genetic Genealogy Ireland Facebook group. Recordings of the talks are now being uploaded to the Genetic Genealogy Ireland YouTube channel. Five talks are currently available and can be viewed from the links below or directly on YouTube. (If you're receiving this via e-mail unfortunately the embedded YouTube links do not work.)

My talk was on some of the mysteries of the Titanic that were solved by DNA.
(Direct YouTube link here.)

Donna Rutherford gave a fantastic presentation on the genetics of the characters in the Game of Thrones. (Direct YouTube link.)

Ed Gilbert gave us an update on the Irish DNA Atlas Project.
(Direct YouTube link.)

Martin McDowell told us about the successful DNA project run by the North of Ireland Family History Society. (Direct YouTube link.)

Michelle Leonard gave a very useful talk on the practical application of autosomal DNA testing featuring lots of case studies. (Direct YouTube link.)

Look out for more videos from the conference on the Genetic Genealogy Ireland YouTube channel over the next couple of weeks. See the Genetic Genealogy Ireland blog for the full lecture schedule.

Recordings of the presentations from the October 2017 Genetic Genealogy Ireland conference in Dublin will also be added once the Belfast lectures are all online.

While we were in Belfast a number of us signed up to go on two Game of Thrones tours which provided a great opportunity to see some of the stunning coastal scenery and countryside in Northern Ireland. Both tours were led by extras from the Game of Thrones. They were very entertaining hosts and gave us some fascinating insights into how the shows were made.

The Iron Islands Tour took us north of Belfast along the County Antrim coast up to the Giant's Causeway. The Winterfell Locations Trek allowed us to explore the countryside to the south of Belfast and took us into the Tollymore Forest at the foot of the Mourne Mountains. This location was used in the filming of the first ever episode of the first series of Game of Thrones. The forest also provided the inspiration for the Narnia books by C. S. Lewis. These were my all-time favourite books as a child. If you ever get the chance to visit Belfast I can highly recommend both of these tours.

I wasn't previously very interested in watching Game of Thrones but now, inspired by my trip, I'm determined to try and catch up on the DVDs. My husband is a great fan of the programme but had failed to convince me to watch it.

After the conference we had a free day to explore the exhibition at the Titanic Centre.

I've shared below a selection of the photographs I took during the trip which I hope will give you a sense of all the fun we had and what a wonderful country Northern Ireland is.

The magnificent Titanic Centre, the setting for Genetic Genealogy Ireland Belfast

The Titanic Centre viewed from the harbour.

The entrance to the Titanic Centre.

Carnlough Harbour was the location used for the Braavos canal in the Game of Thrones.

Daenerys Targaryen (aka Linda) and Catelyn Stark (aka Katherine) at Carnlough Harbour.
These steps feature in a famous scene in the Game of Thrones involving a character called Arya

The Caves of Cushendun. The cave on the left is now better known as Melisandre's Cave after the famous smoke baby scene in the Game of Thrones

The beautiful Antrim coast.

Carrick-a-Rede rope bridge

Catelyn Stark (aka Katherine), Daenerys Targaryen (aka Linda) and Melisandre (aka Donna)
braving the cold near Carrick-a-Rede rope bridge.

Rathlin Island can be seen in the distance. The first ever ancient DNA paper from Ireland included three Bronze Age samples from Rathlin Island (See Cassidy et al 2016.)
A disused quarry at Larrybane which was used as Renly's Camp in Game of Thrones

Genetic genealogists having fun dressing up on Ballintoy Beach
 which featured in a number of scenes in the Game of Thrones

The Giant's Causeway

Bregagh Road in Ballymoney now better known as the Dark Hedges from the Game of Thrones 

Melisandre (aka Donna), Daenerys Targaryen (aka Linda) and Catelyn Stark (aka Katherine) at the Dark Hedges

Martin McDowell tells us about the North of Ireland Family History Society's DNA Project

Michelle Leonard explains how autosomal DNA tests work

Brad Larkin on DNA clans and the monarchy

The Family Tree DNA stand at Back To Our Past

The MyHeritage stand at Back to Our Past

The magnificent replica Titanic staircase in the Titanic Suite

Professor Jim Mallory spoke about the Origins of the Irish

Ed Gilbert gave us an update on the Irish DNA Atlas Project

Donna Rutherford dressed up as Melisandre to talk about the genetics of the Game of Thrones

Melisandre (aka Donna), Daenerys Targaryen (aka Linda) and Catelyn Stark (aka Katherine)
helped to answer questions about DNA testing in our panel session!

A gathering of genetic genealogists on the Titanic staircase

Belfast city centre and the River Lagan at night


Castle Ward was transformed into Winterfell for the Game of Thrones with a lot of CGI wizardry

The Dire Wolves from the Game of Thrones

Inch Abbey was the setting for Robb Stark's camp in Game of Thrones

The view from Inch Abbey

Tracy, Linda, Debbie and Donna having fun dressing up at Inch Abbey

Maurice, Katherine and Linda in full battle dress 

The Tollymore Forest at the foot of the Mourne mountains
The SS Nomadic was specially built to transfer White Star passengers onto the Titanic at Cherbourg.
The fully restored ship is now on display in the Titanic Quarter in Belfast

A big fish spotted outside the Harbour Commissioner's Office

The oldest building in Belfast

Samson and Delilah, the giant Harland and Wolff cranes, dominate the Belfast skyline

Sunday, 4 March 2018

MyHeritage DNA updates announced at Rootstech

There has been a lot of exciting news from MyHeritage DNA at Rootstech in the last few days.

For genetic genealogists the most important announcement is the launch of a major upgrade to the chromosome browser and the ability to download our match lists and matching segment data.

MyHeritage very helpfully provides a country flag so that you can see where the person lives and this information is included in the downloaded match list. This will be a very useful way of filtering our matches so that we can focus on the matches who live in the countries where we stand the best chance of finding the genealogical connections. I currently have 2157 matches at MyHeritage. 1214 of those matches are in the US. However, I'm very encouraged to see that I have 232 matches from Great Britain, 72 from Australia, 14 from Ireland and 27 from New Zealand. More surprisingly, however, I also have matches from people living in countries where I wouldn't expect to have genealogical connections, such as Belgium, Germany, the Netherlands, Norway,  Portugal, Spain and Switzerland.

Many of the low confidence matches are likely to be false matches. Of my 2063 matches at MyHeritage 768 match my dad and 699 match my mum. This means that 71% of my matches match one of my parents, but conversely 29% of my matches don't match either of my parents and are therefore not likely to be true matches.
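The percentages quoted above can be checked with a few lines of arithmetic. This is only a sketch using the counts given in this post, and it assumes (as the figures imply) that no match appears on both my dad's list and my mum's list; any overlap would lower the combined share.

```python
# Counts quoted in the post.
total_matches = 2063
match_dad = 768
match_mum = 699

# Assumption: no single match is shared with both parents.
match_either_parent = match_dad + match_mum
share_with_parent = match_either_parent / total_matches
share_without_parent = 1 - share_with_parent

print(f"Match one parent: {share_with_parent:.0%}")
print(f"Match neither parent: {share_without_parent:.0%}")
```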

Of the matches that are real, I suspect that the vast majority of matches with Americans and continental Europeans are likely to be very distant and a reflection of our shared European ancestry within the last one thousand years rather than having any genealogical significance.

The upgraded chromosome browser now allows us to compare up to seven people at a time. The browser includes a feature that allows us to see which segments triangulate with each other. This is what is known as true triangulation. For segments to truly triangulate they must not just overlap in the browser but each person in the group must also match each other.
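The rule for true triangulation can be sketched in code. This is a minimal illustration and not how MyHeritage implements it: the function names, the data layout and the example positions are all hypothetical, and it simplifies by assuming each pair of people shares at most one segment.

```python
from itertools import combinations


def overlap(seg_a, seg_b):
    """Return the overlapping (chromosome, start, end) region, or None."""
    chrom_a, start_a, end_a = seg_a
    chrom_b, start_b, end_b = seg_b
    if chrom_a != chrom_b:
        return None
    start, end = max(start_a, start_b), min(end_a, end_b)
    return (chrom_a, start, end) if start < end else None


def triangulates(people, shared):
    """True if every pair in `people` matches each other on a common region.

    `shared` maps frozenset({person_a, person_b}) to the one segment that
    the pair shares (a simplifying assumption for this sketch).
    """
    common = None
    for a, b in combinations(people, 2):
        seg = shared.get(frozenset((a, b)))
        if seg is None:
            return False  # one pair does not match at all
        common = seg if common is None else overlap(common, seg)
        if common is None:
            return False  # the pairwise segments do not overlap
    return True


# Hypothetical example: positions are invented for illustration.
shared = {
    frozenset(("me", "dad")): ("3", 10_000_000, 25_000_000),
    frozenset(("me", "match")): ("3", 12_000_000, 24_000_000),
    frozenset(("dad", "match")): ("3", 12_000_000, 24_000_000),
}
print(triangulates(["me", "dad", "match"], shared))
```

If the entry for dad and the match is removed, the segments still overlap in a browser view for me, but the function returns False because not everyone in the group matches each other.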

Here's a screenshot of the new one-to-many chromosome browser feature. I am the focus person and I'm doing a comparison with my top match at MyHeritage and my dad.

If I scroll down and look at the match in the chromosome browser I can see the regions of each chromosome that we share in common, along with details of the amount of sharing.

You can add additional people to the group to see if they also triangulate on the same segment but you need to add people one at a time and check each triangulation individually because everyone in the group has to match each other. If just one person in the group doesn't match the others then the circle around the segment will not appear.

In my initial exploration of the new MyHeritage features I've noticed that a lot of my matches all seem to pile up on the same segment. A number of us have noticed big pile-ups at the start of chromosome 15 (see this discussion in the Genetic Genealogy Tips and Techniques group on Facebook). I've found that, while some of the people who appear to match on chromosome 15 don't triangulate and the matches are probably not real because of the low SNP count, there are other matches that do triangulate. I'm not certain at present if these matches will be worth pursuing.

I've also got another problematic area on chromosome 3 where 15 people triangulate on the same segment. Seven of these people triangulate with me and my dad on a 14.6 cM segment which contains 5162 SNPs.

The remaining eight people in this group triangulate on a smaller portion of this segment. One person in this triangulated group is in the Netherlands and the remaining 14 people are all in the US. As all my ancestry is from the British Isles this is clearly not a genealogically relevant match.

We will need to be very careful when drawing conclusions about triangulated segments viewed in the chromosome browser. It's not just the seven matches you can see in the chromosome browser that you need to consider but also how many other people share the segment. The more frequent the segment is in the population the less likely it is to fall within a meaningful genealogical timeframe. I hope that MyHeritage might consider adding an algorithm along the lines of AncestryDNA's Timber algorithm which would downweight matches on portions of the genome which are prone to over-matching.

For background reading on the subject of triangulation see my two previous blog posts:
For more details of the upgraded chromosome browser feature see the official blog post from MyHeritage DNA:

Other news from MyHeritage

MyHeritage has announced the launch of DNA Quest, an initiative which will provide 15,000 free DNA kits to adoptees. DNA Quest is an expansion of a previous project which provided free kits to reunite adoptees from the Israeli Yemenite community. The programme is currently restricted to US residents, but I'm hoping that it will one day be possible to expand the programme to other countries in need such as Ireland.
MyHeritage has also announced the release of a number of new data collections including the 1939 Register, which is an important resource for tracing twentieth century ancestors in England and Wales and serves as a census substitute. The 1939 Register was previously only available on Findmypast. It is available at MyHeritage with a data subscription.
MyHeritage scientists have published a groundbreaking paper in the prestigious journal Science. They used public family trees to explore migrations and longevity.
MyHeritage's Chief Scientific Officer Yaniv Erlich gave a very interesting presentation at Rootstech about the MyHeritage DNA test. The recording is now available on the Rootstech website:
101 is an American term for a lecture aimed at beginners but this was actually quite a technical lecture looking at how the MyHeritage matching process works and included useful explanations about phasing, imputation and stitching.

MyHeritage have been very responsive to feedback from the genetic genealogy community. I'm excited to see all these new developments and I look forward to many more updates in the coming year. If you're not yet in the MyHeritage DNA database but you've tested elsewhere you can currently do a free transfer using this link.

Update 6th March 2018
The lunchtime talk given by Gilad Japhet at Rootstech, "Perspectives on combining genealogy and genetics", is now available on the Legacy Family Tree webinar website.

Friday, 2 March 2018

DNA interviews at Rootstech

Updated 5th March 2018.

Jill Ball has been out and about at Rootstech interviewing some of the speakers and representatives from the various companies and sharing them on her YouTube channel. She has a very interesting interview with Jonny Perl who developed the wonderful new DNA Painter website. Jonny was the very worthy winner of Rootstech's Innovator Showdown Contest.  Remarkably he only took his first DNA test in December 2016! You can watch the interview below or direct on YouTube.


Jill Ball has also interviewed Hannah Morden of Living DNA at Rootstech. You can watch the interview below or on YouTube.

See also my blog post from yesterday on Living DNA's new Family Networks feature.

Here is an interview with CeCe Moore, the genetic genealogist on the US TV programme Finding Your Roots. The direct YouTube link can be found here.

Melanie McComb, the Shamrock Genealogist, has also done a very interesting interview with CeCe, touching on some of the ethical implications of adoption searches. The interview is available on Twitter via this link. (You shouldn't need a Twitter account to watch the interview.)

If you want to keep up with what's going on at Rootstech Randy Seaver is maintaining a useful compilation of blog posts from people who are at the event. He's also written a very helpful post on how to download the free handouts from the various talks.

Louis Kessler is doing an excellent job of keeping track of events from afar. Check out his posts:
You can watch the Rootstech livestream here.

If you missed the livestream you can watch the recordings here.

Thursday, 1 March 2018

New Family Networks feature from Living DNA

Yesterday at Rootstech, Living DNA provided a sneak preview of Family Networks, their long-anticipated relative-matching system. It is described as a "new DNA-driven matching system and family tree reconstruction method". You can find out more in the video below (also available on YouTube.)

Family Networks is now in private beta-testing and will be in open beta in the third quarter of this year when it will become available to all existing and new Living DNA users. I've been sent a few screenshots which I've reproduced below.

Here is a tree view.

This is the chromosome browser.

This is what the match list will look like.

Here is the official press release I received from Living DNA.
Innovative family tree and matching system will take the guesswork out of DNA relationships

Living DNA, the global consumer genetics company, has today publicly previewed its new ‘Family Networks’ platform for the first time – set to be the most precise DNA-driven matching service on the market.

Officially unveiled in Salt Lake City in Utah at RootsTech 2018, the world’s largest family-history technology conference, Living DNA’s Family Networks requires no prior user-generated family research, allowing users to build a detailed family tree based solely on their DNA, gender, and age.

Living DNA will analyse a user's unique motherline and fatherline DNA data (mtDNA and YDNA), on top of the family ancestry line (autosomal) to deliver matches – something no other company can do.

David Nicholson, managing director and co-founder at Living DNA, comments:

“With Family Networks, we will not only predict how users are related to direct matches, but we can also find and connect people to DNA matches going back up to 13 generations.

“The technology behind Family Networks automatically works out which genetic trees are possible to uncover relations. This new capability offers distinct benefits to a range of users, from avid genealogists and family history hobbyists through to adoptees and others searching for their family members. It will reduce the risk of human error and take away the tedious task of figuring out how each person in a user’s list are related to one another. We’re truly taking the guesswork out of DNA relationships.”

Living DNA’s Family Networks is scheduled to be made available to all existing and new Living DNA users by autumn 2018. The company states that the cutting-edge technology will give all customers – even those who upload from other DNA testing sites – a level of relationship prediction and accuracy that is beyond anything currently on the market.

David Nicholson adds:

“Living DNA’s precise and unique technology processes users’ DNA to identify relatives and define relationships deeper back in time. Through this rich experience, users will even be able to learn how they’re related to people with whom they share no DNA today.  
“As we don’t ask for Gedcom files or other user research to build a family tree, Family Networks can be especially useful for adoptees and family searchers who are trying to locate long-lost family members but who don’t have any information on their biological family. Just by using their gender and date of birth in conjunction with their DNA, we will be able to translate their matches into a potential family tree, giving them a clearer place to start from.”

Living DNA breaks down users’ DNA into 80 worldwide regions, including 21 in the UK, more than any other testing company. The company offers a 3-in-1 test as standard: from a simple mouth swab, Living DNA not only covers a user’s family line ancestry, but—unlike most other tests—it also includes the user’s motherline and (if male) fatherline ancestry.

Living DNA’s test itself is run on a custom-built Living DNA Orion Chip. It is one of the first bespoke DNA chips in the world to be built using the latest GSA technology from market leader Illumina, and tests over 656,000 autosomal (family) markers, 4,700 mitochondrial (maternal) markers and 22,000 Y-chromosomal (paternal) markers.
There are a few additional details in a slightly different press release which appears on the Living DNA website. The relevant text is reproduced below.
Free DNA-Driven Family Tree Reconstruction and Matching System Method Offers Greater Accuracy Than Competing Products, Takes Guesswork Out of DNA Relationships. 
SALT LAKE CITY, Utah – Feb. 28, 2018 – Living DNA, the global consumer genetics company, today announced it will preview “Family Networks”—a new DNA-driven matching system and family tree reconstruction method—at RootsTech 2018, the world’s largest family-history technology conference taking place Feb. 28 – March 3 in Salt Lake City, Utah. Requiring no prior user-generated family research, Living DNA’s family reconstruction tree method is based solely on users’ DNA, gender, and age. Unlike competing organisations, Living DNA’s Family Networks will provide the most precise matching service on the market by analysing a user's unique motherline and fatherline DNA data (mtDNA and YDNA), on top of the family ancestry line (autosomal). 
“With Family Networks, we not only predict how users are related to direct matches, but we can also infer through DNA up to 13 generations back to connect matches with whom they share no DNA with today,” said Living DNA co-founder and Managing Director David Nicholson. “The technology behind Family Networks runs through millions of ways in which users in the network are related and automatically works out which genetic trees are possible. This new capability offers distinct benefits to a range of users, from avid genealogists to family history hobbyists, to adoptees and others searching for their family members. It will reduce the risk of human error and support the task of figuring out how each person in a user’s list are related to one another.”
Family Networks will go into private beta in Q2 and open beta in Q3 2018 where it will be available to all existing and new Living DNA users. The unique computation this feature provides gives customers - even those who upload from other DNA testing sites - a level of relationship prediction and specificity beyond anything currently on the market. Where competing offerings rely solely on time-consuming and often error-prone user research, Living DNA’s amazing power tools process users’ DNA to identify relatives and define relationships deeper back in time. Through this extremely rich experience, users can even learn how they’re related to people with whom they share no DNA today. 
Users need to only provide their gender and birthdate for Living DNA to build a family tree that shows where their matches fit into their family tree, with no need of Gedcom files or any other user input. This can be especially useful for adoptees and family searchers who are trying to locate long-lost family members but who don’t have any information on their biological family, Living DNA can translate their matches into a potential family tree, giving them a clearer place to start from.
I strongly believe that genetic networks are the future of genetic genealogy so I'm excited to see that Living DNA have developed this new feature. It will be interesting to see how it works out in practice.


Monday, 22 January 2018

Small segments and pile-ups - a visualisation

We've recently been discussing the problem of pile-ups in the All Genetic Genealogy group on Facebook. A pile-up is a term used in genetic genealogy to describe multiple shared autosomal DNA segments that are stacked up on top of each other on the same part of the genome. The presence of a pile-up should be considered as a warning sign. For any shared segment to have genealogical significance we would expect it to be shared only with descendants of the common ancestral couple. If we share a segment with hundreds or thousands of people it is extremely unlikely that we share that section of DNA by virtue of a recent genealogical relationship within the last ten generations or so, and it is much more likely to be indicative of a false match or a more distant relationship.

Pile-ups can occur for a number of different reasons:
  • Lack of phasing. Phasing is the process of sorting the DNA letters (the As, Cs, Ts and Gs) onto the paternal and maternal chromosomes. AncestryDNA and MyHeritage now use phased matching which means that they phase our genotypes before trying to identify shared sections of DNA. 23andMe and Family Tree DNA use a process of half-identical matching. Our DNA is not phased but instead the algorithms zigzag backwards and forwards across two columns of unsorted DNA letters looking for consecutive runs of matching SNPs. Half-identical matching works well at identifying large shared segments of DNA but is less successful on smaller segments, and particularly segments under about 10 centiMorgans (cMs) in size. If a match does not survive phasing it is a false match.
  • SNP-poor regions. The autosomal DNA tests used for genetic genealogy provide information on between 630,000 and 700,000 genetic markers known as SNPs (single nucleotide polymorphisms) which are scattered across the genome. These SNPs are only a tiny fraction of the three billion letters which make up the human genome, but the SNPs are specially selected for being the most informative about variations within and between populations. When trying to identify shared regions of the genome the companies are looking for long runs of consecutive SNPs that are the same (identical by state or IBS) in two individuals. Segments which pass the companies' matching thresholds are declared to be identical by descent (IBD) and are possibly indicative of shared ancestry in a genealogical timeframe. Some companies will also apply additional algorithms to filter out known problematic regions which are unlikely to be IBD. However, because not all of our SNPs are being tested, the length of a segment can be falsely inflated. One hypothesis is that lots of small segments can become conflated into longer segments. (1) This problem is particularly likely to occur in sections of the genome which have poor coverage on the chips. (2) 
  • Excess IBD. This is a term used to describe sections of the genome which are known to be widely shared in humans or in certain populations. Such regions often offer some type of evolutionary advantage. For an overview of known excess IBD regions see the section on excess IBD sharing in the ISOGG Wiki article on IBD. In addition to looking at the size of a shared segment, some IBD detection algorithms will, therefore, also take into account the frequency of the segment. (3) The more people who share a segment, the older it is likely to be. AncestryDNA apply their proprietary Timber algorithm to phased segments and they downweight the cM count for segments that are widely shared in their database. (4)
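To make the idea of half-identical matching from the first bullet point more concrete, here is a minimal sketch in Python. It is only an illustration and not the algorithm used by any of the companies. The function names, the representation of each genotype as a two-letter string and the `min_snps` threshold are all assumptions made for the example. Real matching algorithms work in centiMorgans, allow for genotyping errors and apply much larger thresholds.

```python
def half_identical(genotype_a, genotype_b):
    """Return True if two unphased genotypes (e.g. "AG" and "GG")
    share at least one letter, i.e. they are not opposite homozygotes."""
    return bool(set(genotype_a) & set(genotype_b))


def half_identical_runs(person_a, person_b, min_snps=3):
    """Find runs of consecutive half-identical SNPs.

    person_a and person_b are lists of unphased genotypes in the same
    SNP order. Returns (first_index, last_index) pairs for every run
    that is at least min_snps long (a hypothetical threshold)."""
    runs = []
    start = None
    n = min(len(person_a), len(person_b))
    for i in range(n):
        if half_identical(person_a[i], person_b[i]):
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None  # the run is broken by opposite homozygotes
    # Close a run that extends to the last SNP.
    if start is not None and n - start >= min_snps:
        runs.append((start, n - 1))
    return runs


# Made-up genotypes purely for illustration.
person_a = ["AA", "AG", "CC", "TT", "AG", "CT"]
person_b = ["AG", "GG", "CT", "CC", "AA", "TT"]
print(half_identical_runs(person_a, person_b))
```

Note that the check never asks which letter came from which parent. A run can therefore be stitched together from a mixture of maternal and paternal letters, which is why short half-identical segments so often turn out to be false once the data are phased.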
Each individual has their own personal pile-ups. It can be instructive to map out your pile-ups so that you are aware of your own danger zones. I've previously used Don Worth's ADSA (autosomal DNA segment analyser) tool which is available from DNAGedcom to look at my pile-ups. I've also used the matching segment search at GEDmatch (this tool is available to Tier 1 subscribers). (5) These tools are very useful for identifying problems in specific regions but it's difficult to get a good idea of the bigger picture.

Following on from our discussion in the All Genetic Genealogy Facebook group, Dan Edwards has been working on an exciting tool to provide a new way of visualising pile-ups. It's possible that the tool will eventually be made available on the web but for the moment it is a bespoke service. Dan has been experimenting on some of my data. He has produced for me some charts showing the distribution of shared segments across my 22 autosomes and on the X-chromosome. Dan has kindly given me permission to share my charts which are reproduced below.

The charts are based on my Family Finder chromosome browser data from Family Tree DNA. FTDNA updated their match thresholds in May 2016, but they are still the only company that continues to include small segments under 6 cMs when inferring a relationship. It is generally accepted by genetic genealogists that the use of such small segments is problematical. (6)

The problem with small segments can be clearly seen in the charts below. Rather than being distributed evenly across my genome, the smaller shared segments form huge spires and skyscrapers. As the segment size increases the pile-ups are greatly reduced, but there are still some parts of my genome which have some quite sizeable pile-ups on segments over 10 cMs in size. Chromosomes 9, 14, 18 and 19, in particular, seem to have a few problem areas which it is probably best for me to avoid. As more matches come in, these spires and skyscrapers can be expected to grow even more. Remember too that FTDNA only reports "matches" on small segments if the match thresholds have already been met. If segments were reported for every match in the database down to 1 cM it's likely that the spires would be even more pronounced.
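For anyone who would like to experiment with their own data, the basic idea of counting pile-ups can be sketched in a few lines of Python. This is not Dan's tool and I do not know how his tool works internally. It is simply a rough illustration of how shared segments might be stacked up along a chromosome. The function name, the tuple layout (chromosome, start position, end position, cM) and the one-megabase bin size are all assumptions made for the example, and the segments shown are made up.

```python
from collections import Counter


def pile_up_counts(segments, min_cm=0.0, bin_size=1_000_000):
    """Count how many shared segments overlap each bin of a chromosome.

    segments is a list of (chromosome, start_bp, end_bp, cm) tuples
    (a hypothetical layout). Segments smaller than min_cm are ignored.
    Returns a Counter keyed by (chromosome, bin_number)."""
    counts = Counter()
    for chromosome, start_bp, end_bp, cm in segments:
        if cm < min_cm:
            continue  # skip segments below the size threshold
        first_bin = start_bp // bin_size
        last_bin = end_bp // bin_size
        for bin_number in range(first_bin, last_bin + 1):
            counts[(chromosome, bin_number)] += 1
    return counts


# Made-up segments purely for illustration.
segments = [
    ("9", 1_200_000, 3_400_000, 2.1),
    ("9", 2_000_000, 2_900_000, 1.5),
    ("9", 2_500_000, 15_000_000, 12.3),
]

all_sizes = pile_up_counts(segments)
large_only = pile_up_counts(segments, min_cm=10)
print(all_sizes[("9", 2)], large_only[("9", 2)])
```

Running the count at several different values of `min_cm` and comparing the results is one way to see how the spires shrink as the segment size increases.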

If Dan is able to develop his tool further and make it more widely available it will be interesting to see how other people's pile-ups compare with mine. I hope that we might also be able to identify a reason for some of the pile-ups. In the meantime I hope you enjoy looking at my pictures.


(1) See: Chiang CWK, Ralph P, Novembre J (2016). Conflation of short identity-by-descent segments bias their inferred length distribution. G3 Genes Genomes Genetics 6: 1287.

(2) For a useful overview of SNP coverage on the chips used by AncestryDNA and 23andMe see Rebekah Canada's series of articles on the subject of exploring microarray chips.

(3) For a good overview of the methodology of IBD detection see Browning and Browning (2012):  Identity by descent between distant relatives: detection and applications (Annual Review of Genetics 2012; 46: 617-33). The authors state: "The key idea behind IBD segment detection is haplotype frequency. If the frequency of a shared haplotype is very small, the haplotype is unlikely to be observed twice in independently sampled individuals, so one can infer the presence of an IBD segment. This criterion can be applied in several ways. The first is length of sharing, which is a proxy for frequency. If two densely genotyped haplotypes are identical at all or most (allowing for some genotyping error) assayed alleles over a very large segment of a chromosome, then the haplotypes are likely to be identical by descent across the whole segment. The second is direct use of haplotype frequency: Shared haplotypes with estimated frequency below some threshold are determined to be identical by descent. The third makes use of a population genetics model to infer probability of IBD. Given the frequency of the shared haplotype and a probability model for the IBD process along the chromosome, one can estimate the probability that the individuals are identical by descent at any position on the segment."

(4) For a good explanation of how the AncestryDNA algorithm works read the blog post by Julie Granka on Filtering DNA matches at AncestryDNA with Timber. Take a look in particular at the figure in that blog post. Although the majority of phased segments filtered out by Timber are smaller segments under 15 cMs, note that it also downweights some larger segments up to 50 cMs in size.

(5) Peter Alefounder has developed a tool known as the Geneal Segment Stacker but I've not yet had time to play around with it. There are further details in this thread in the ISOGG Facebook group.

(6) For an excellent summary on the current state of our knowledge on the subject of small segments see the blog post A small segment round up by Blaine Bettinger.

Further reading