Wednesday, January 17, 2018

Analyzing Ancestry DNA Matches on a Snowy Day in the South


What is the relationship between the number of matches at Ancestry DNA and the number of matches that are 4th-6th cousins or closer? It seems as if those numbers should correlate, but in my sample they don't. Rather, the variation seems to be related to ethnicity.




I'm learning a new (sort of free) data analysis tool and, during today's snow holiday, I took the opportunity to experiment with my own data instead of my employer's. Here are my results and thoughts, gathered while ten inches of snow fell and interrupted by a two-hour power outage.

A contact told me that very large page counts, meaning very large numbers of matches, tend to belong to testers who have heritage from the American South. My heritage is about 30% American South. Of the 14 tests to which I have access, only two have that high a percentage, yet some of the other tests have higher page counts. One of my in-laws has a normal page count but an absurd number of close matches, as shown above.

I used Excel to collect all the statistics. I gathered the ethnicity percentages for each test. The number of close matches (4th-6th cousins or closer) came from the main page. Then came the fun part: estimating the number of pages, typing that guess into the page number box, hitting enter and seeing what happened, then paging backward or forward to find the last page. Because the last page is usually only partly full, every page count is effectively rounded up.
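The guess-and-check hunt for the last page can be made systematic. The sketch below is only an illustration of the idea, not anything Ancestry provides: `page_exists` is a hypothetical stand-in for "type this page number into the box and see whether any matches come back," and the 137-page list is simulated. It doubles the guess until it overshoots, then halves the gap until the last page is pinned down.

```python
from typing import Callable


def find_last_page(page_exists: Callable[[int], bool]) -> int:
    """Return the highest page number for which page_exists(page) is True.

    Assumes page 1 exists and that pages are contiguous (no gaps).
    page_exists is a hypothetical check, e.g. "did this page show matches?"
    """
    # Phase 1: double the guess until we overshoot the end of the list.
    low, high = 1, 2
    while page_exists(high):
        low, high = high, high * 2

    # Phase 2: low is known to exist, high is known not to; halve the gap.
    while high - low > 1:
        mid = (low + high) // 2
        if page_exists(mid):
            low = mid
        else:
            high = mid
    return low


# Simulated match list with 137 pages (a made-up number for illustration).
SIMULATED_TOTAL = 137
print(find_last_page(lambda page: 1 <= page <= SIMULATED_TOTAL))  # prints 137
```

Done by hand, this takes roughly twice as many tries as the page count has binary digits, so a list of a few hundred pages needs well under twenty guesses.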

In the graph you can see that the number of pages falls within a fairly narrow range, but the number of close cousins does not always move in the same direction. Not what you would expect, is it?
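One way to put a number on how loosely two columns move together is a Pearson correlation coefficient: values near 1 or -1 mean the columns track each other closely, and values near 0 mean they barely do. This is a minimal sketch, assuming the spreadsheet has one row per test with a page count and a close-match count. The numbers in the example are made-up placeholders, not the figures from the 14 tests, and the function name `pearson_r` is just for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("a column with no variation has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Placeholder values only (NOT the author's data): one entry per test.
pages = [100, 110, 120, 130]
close_matches = [300, 150, 900, 200]
print(round(pearson_r(pages, close_matches), 2))
```

With only 14 tests, a single outlier such as the in-law with so many close matches can swing the coefficient a great deal, so it is worth looking at the graph alongside the number.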

What is the ethnic breakdown for the person with so many close cousins? That person is of Hispanic descent, as are some others in the graph. Choosing Europe South, Iberian Peninsula and Native American hits the high points for that person. Now there is a clearer relationship between the number of close cousins and the ethnicity. It seems that this could be due to endogamy in the Hispanic population coupled with a high birth rate and a curiosity about ethnicity within that population.





Just to round out the exercise, here's a similar look at how European ethnicity relates to match counts. Notice that higher Scandinavian ethnicity (darker blue) goes along with fewer close matches and fewer pages.





It was a good day to spend some time thinking about the characteristics of Ancestry DNA matches and, in the process, to learn more about the tool.
