Thursday, November 24, 2016

SQLite Queries for AncestryDNA Helper


All technical content can now be found on a new blog named SQLite for AncestryDNA Helper.

The previous post showed how to install and use the SQLite Manager plug-in for Firefox to access the AncestryDNA Helper database. Now that you've installed the plug-in, here are a couple of queries to use.

The first step is to look at the table named tests. Use the Browse & Search tab to look at the information. You need to know the exact spelling of each of your tests. I have four test kits in this particular database and for this post, I'll use only my kit as an example.

Now that I have the exact spelling for the kit name, let's do some data mining.




Click into the Execute SQL tab. There will be a starter SQL statement in the SQL box. Just clear it out.





Counting Surnames


In this first example, I'm counting all the A surnames in the ancestors of my matches, sorting them with the largest number at the top and exporting them to a CSV that can be opened in Excel.

Paste the SQL into the box. It doesn't need to all be showing. Click on Run SQL and wait for the result.

Using the Actions button you can then view the result (as shown) in a table format or in the raw CSV format. When you are happy with the result, you can save the result in CSV format to a file so that you can work with it further in Excel or your favorite spreadsheet program.

All other queries are run in the exact same way.




Here is the SQL query:




-- Match ancestor count by surname, one letter of the alphabet
-- Column headings: "Kit","Match Surname","Number of Ancestors"
SELECT    tests.name as "Kit",
    UPPER(surname) as "Match Surname",
    count(*) as "Number of Ancestors"
FROM ancestors
    join tests on ancestors.testid = tests.id
where tests.name= "Elizabeth L Richards"
   and surname like "A%"
  and matchid not in
    (select id from tests)
group by     tests.name,
    UPPER(surname)
order by    count(*) desc,
    UPPER(surname) asc



Notice the colored and bolded text. Replace my test name (in red) with your test name. The letter A can be any letter of the alphabet. If you remove the entire black bolded line, every surname will be counted instead of just one letter. The brown section prevents my other three tests from being included in the counts. That section can be removed if you want to count ancestors from other kits you have in your database.


Mining One Surname


Having chosen the Alexander surname to research further, I want to dig into those ancestors of matches. This query will return all the ancestors except those of my immediate family.





Here is the SQL query:



-- Match ancestors for one kit and one surname
--Column headings: "Kit","Match Surname","Match Full Name", "Born","Died","Metaphone","Predicted Relationship","Confidence","Hint","Private","cM","Segments","Match Admin","Match Name","Note","Ethnic Regions","Ethnic Trace Regions"

SELECT    tests.name as "Kit",
    UPPER(surname) as "Match Surname",
    fullname "Match Full Name",
    IFNULL(born,"") as "Born",
    IFNULL(died,"") as "Died",
    metaphone as "Metaphone",
    range as "Predicted Relationship",
    confidence as "Confidence",
    case hint when 1 then "Yes" else "" end as "Hint",
    case private when 1 then "Yes" else "" end as "Private",
    IFNULL(centimorgans,"") as "cM",
    IFNULL(segments,"") as "Segments",
    matches.admin as "Match Admin",
    matches.name as "Match Name",
    note as "Note",
    ethnicregions as "Ethnic Regions",
    ethnictraceregions as "Ethnic Trace Regions"
FROM ancestors
    join tests on ancestors.testid = tests.id
    join matches on ancestors.testid = matches.testid and ancestors.matchid = matches.matchid
where tests.name= "Elizabeth L Richards"
   and UPPER(surname) = UPPER("Alexander")
   and range not like "%mmediate family%"
order by     UPPER(surname),
    fullname,
    born,
    died



Notice the red and bolded text. Replace my name with your test name. Replace the surname Alexander with any surname. The bolded line causes ancestors of immediate and close family to be ignored. If you remove the entire bolded line, the ancestors of those immediate family members will be included in the list.

There are many more queries that can (and should) be constructed. These two are intended as a beginning to assist with the current problems with downloading ancestors.

4 comments:

  1. I'm trying to use these files with the DNAGedmatch triangulation tool. https://www.dnagedcom.com/Ancestry/LoadAncestry.aspx It wants a "Matches for" file and an "Ancestors for" file. The "Matches for" file works file when I put it in there, but I'm getting an error when I put the "Ancestors for" file that I'm generating using the first SQL query you have in your blog post. (I'm changing the kit name to my kit name, and I'm deleting the bolded line that restricts it only to ancestors beginning with the letter "A", so it should be creating a .csv of all ancestors.)

    Are you familiar with this particular triangulation tool? Is there a different query I should use when creating the .csv file?

    ReplyDelete
    Replies
    1. Excellent question. These queries don't support that tool. I'll work on creating those queries next.

      Delete
  2. Elizabeth,
    Thank you so much for this help and for creating and sharing your queries. Would you know how to create a query for the "Shared Matches" on Ancestry?
    Many thanks!
    Deb

    ReplyDelete
    Replies
    1. Deb, I sure wish that were possible. The AncestryDNA Helper database doesn't capture that information and also doesn't capture locations.

      Delete