Tip 12: Why Use Spreadsheets
When you first open up a kit and Ancestry shows you a thousand matches, it is a daunting task. My own habit is to start with the largest match who actually has a tree and go from there. Sometimes an answer jumps off the screen at you. But I don’t really ever see those anymore. People only come to me with the really hideous problems. Some are unsolvable because for some reason there are no good matches. Some have great matches, but everyone is related. In that case, clusters are only marginally helpful. The computer has to dump them into one cluster, but they actually match two clusters. You just need to be more specific.
It is at this point that I pull out my spreadsheets. Back in 2013 when I started with this, I had never SEEN a spreadsheet. Don’t laugh. I came out of the world of education, not the world of business. So I never needed one. When I took my first class in 2013, I had to learn all about how to use them along with staggering through the actual DNA. I remember sitting with my laptop and struggling. We had no other tools. DNAgedcom.com had the only tools in existence and were very limited.
To create a spreadsheet, you basically have to download everything that can be downloaded from any site that allows it. You then have to merge the spreadsheets. To do this, you have to move columns around so that the data for each column is the same. I think you need to show the source, i.e., where a particular match came from. There are two basic ways of doing it. You can color the background with a different color to show which company has that match. Or you can create a column and mark it there. A column is better because you can then use color coding for other things. But again these things always depend upon what your problem is. I like columns because if I am lucky enough to have a match on two sites, I can mark them that way.
Back in 2013, when I started, this did not take too much time because there were not that many matches, and we did not have as many sites. Now this spreadsheet can rapidly get out of control. For this reason, create a subdirectory for your backups and backup regularly. Do not skip this step. If you spend all afternoon working on it and then hit the wrong thing and the whole thing gets messed up, you will wish you had followed my advice. Don’t tell me your program backs up automatically. You can destroy a spreadsheet, and the program will save the destroyed one over the precious correct version.
Those of you who work with spreadsheets day in and day out are not as apt to totally destroy a spreadsheet. But remember, I had to learn spreadsheets, so I have destroyed my whole spreadsheet a couple of times through ignorance. Don’t ask what I did. If I had known what I was doing it would not have happened. Since anyone can make a mistake, SAVE A BACKUP. Keep separate backups. Do not save a backup over an old one.
I tend to have my columns in this order.
- Source of match, using 2, M, F, G. Getting your eye used to initials is better. The narrower your columns the better. 2. Name of match 3. Email. I make sure this is kept no matter what because matches can unexpectedly disappear out of the blue. But I usually make that column hidden. I unhide them to see if any one email address has a bunch of kits. Those people nearly always KNOW stuff and will help. 4. Chromosome number
- Starting point 6. Ending point 7. Total cM segment 8. Total of all segments 9. Surnames 10. Notes on match
I color code by ICW match groups. I use a separate color for different lines. When you start you are lucky if you have a clue as to what side the match is even on. I look at everything to get hints. As I figure things out, I add them to the spreadsheet. If you have two unknown sides and don’t know which is which, then color them yellow or green. You can change that later if you figure out that one side has to be the mother, etc.
You have to have a way to keep notes and I prefer them in my spreadsheet. That means I often add more columns on the right. On a match on chromosome 5, I might have a note that says, Mason, ch 17 so I know the match has more than one segment and that it is a larger match. Or I may put in 3C
I try not to make it more complicated than it has to be. And I don’t fill out every block for every match.
I start by going back to an earlier backup and I sort it by size of segments and pick out some of the good size segments. Once I have identified some that look likely, I save that sorted version in my backup directory. I go back to my main working spreadsheet and find that person with that segment. Look at everyone they appear to overlap. Look at where you got the match and go and check them to see if the company thinks they are icw. If so mark them that way. I only work with one set at one time. And I start with the largest segment of the set. I make sure everyone is ICW. Often that is a matter of A matches B, B is also on a different site where B matches C and D. Therefore if A and B match, they are all icw. This often takes time and energy and a great deal of persistence.
When I get a group of ICW individuals, I start writing messages to the matches. I say, You and I and Donna Jones and Mary Wolfe, and John Johnston seem to all match. Do you know how you match any of these people? Or I say, you appear to match Mary Watson who is on 23aandme. Do you know who she is? You both appear to be 3C on my father’s side, but I can’t be sure.
I started out not knowing that a lot of my matches were very convoluted. My father’s family is Craven out of North Carolina. I can’t tell you how many matches I had, and they all seemed like 3C because they were around a hundred. But you can’t assume that. When I started working the trees, I found out that every Craven family I could find married 2nd and 3C cousins. When you are up in the mountains of NC, there is no one else around to marry. My own set of Cravens did not do this. But the cousin matches did and that caused the matches to be too big.
After struggling through this, I realized that all these supposed 3C were actually double 4C or farther. Most of them were double 4C or 4C plus 5C or something else confusing. For this reason, you really have to build trees and when you see this, note that you have Mason who married Katz and Katz who married a Katz and Mason who married a Johnson who is different from the Johnson on the other side. And there may be four different Johnson families involved, none of whom seem to have the same origin. A spreadsheet can guide you through this kind of mess. If you do not have at least one segment where A Johnson and B Johnson overlap and match, you have to think maybe they are not the same
Johnson family. A lack of a match does not prove anything because of random inheritance. But having a shared match does absolutely prove you share a CA.
Look for a solid group with matching segments. Don’t waste a lot of time on small segments. I am not saying to throw them away, but small segments can be deceptive. I have a lot under 20 which don’t seem to be real. I match these people, but they overlap but don’t match people on either my father or mother’s line. I have a special color for these. I do not ever delete these because I will forget I deleted them, find them again and have to go through the whole process and then remember, oh yeah, I did this one before. After a hundred searches, I just can’t remember every weird 15 cM match.
Ancestry will tell you that A B C are icw D. There is a fallacy there. A could be the maternal half-brother of D. B could be a distant 20 cM match from D and maybe even a false match. C could be a match to D’s father. The only way to tell what is going on is to look at segments. This is why old timers like me are so disgusted with Ancestry. Why can’t they just give us a Chromosome Browser? I don’t think we will ever get one. It has taken years to even find out which ones are ICW. When they came out with that, it was a major game changer.
If an Ancestry person is not on another site, I try to get them onto FTDNA. It is only $19. I point out that FTDNA is where the oldest matches are. Many of them are now deceased so you can’t check them out any other way. I would prefer them on Gedmatch but getting them there is getting harder and harder to do. It is also easier to get them onto MyHeritage. If anyone has recent European ancestors, that is where most of the best matches are. I tailor my pitch to the situation. I recently talked to one of my own 4C and convinced her to test at MyHeritage because her other side is mostly recent German.
Remember that the ICW theory only means that two people have a match. It does not guarantee that they both match you in the same way they match each other. When everyone on my father’s side is up in the mountains of TN or NC, you can make no assumptions at all. I have a match to someone with a Craven ancestor but that is not how I match her. Craven is on my father’s side and her father’s side, and they are distantly related. We actually match through the Katz on my mother’s side and her mother’s side.
If you are going to do this, you need to have a very cynical and suspicious mind about what computer programs tell you. They are set up for nice straightforward situations. All they can tell you is that yes or no, two people are related. They cannot tell you how. And that is also all that a spreadsheet can tell you. But at least it tells you that three people overlap, and match. If so, they must have a Common Ancestral couple. It is up to you to build enough trees to know how they match.