Around the 4th of July, I started wondering: which of my ancestors served in the Revolutionary War? I began searching FamilySearch and Ancestry for records proving their service. As I clicked up and down my family tree, I wondered whether there was a way to read family history (GEDCOM) files in R. After an in-depth search, I couldn’t find a good GEDCOM parser for R that would turn a file into a CSV-like format (although a similar project was recently begun: https://www.r-bloggers.com/gedcom-reader-for-the-r-language-analysing-family-history/). So I decided to take a crack at it and start exploring my family’s data.
I decided to use the family tree from my FamilySearch account, mainly because much of the searching has already been done for you and its files come with latitudes and longitudes. One thing to keep in mind: the farther back you go in your tree, the less reliable the information may be. Luckily, someone has already built a tool for downloading your GEDCOM file from FamilySearch. It requires Python 3. You can visit the project page to learn more:
https://github.com/Linekio/getmyancestors
You can also download a GEDCOM file from Ancestry by following these instructions:
https://support.ancestry.com/s/article/Uploading-and-Downloading-Trees
Feel free to use my read_gedcom function, which you can install from GitHub:
# install.packages("devtools")
devtools::install_github("jjfitz/readgedcom")
It is still in development, but in its current form it can extract useful information such as names, birth and death dates, birth and death places, the family each person comes from, and the children they had. It does not capture everything a GEDCOM file stores, but I will be working on extracting more information shortly.
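For readers who have not seen the format, a GEDCOM file is plain text in which each line carries a level number, an optional @ID@ pointer, a tag, and an optional value. The sketch below is not the readgedcom implementation; it is a minimal base-R illustration, on an invented two-person fragment, of how such lines could be flattened into one row per person. The helper pick() and the output column names are hypothetical.

```r
# Invented GEDCOM fragment (not taken from a real tree)
ged_lines <- c(
  "0 @I1@ INDI",
  "1 NAME John /Smith/",
  "1 SEX M",
  "1 BIRT",
  "2 DATE 4 JUL 1750",
  "2 PLAC Boston, Massachusetts",
  "1 FAMC @F1@",
  "0 @I2@ INDI",
  "1 NAME Mary /Jones/",
  "1 SEX F",
  "1 FAMS @F1@"
)

# Split every line into level, optional @XREF@ pointer, tag, and value
level    <- as.integer(sub("^([0-9]+) .*$", "\\1", ged_lines))
rest     <- sub("^[0-9]+ ", "", ged_lines)
has_xref <- grepl("^@[^@]+@ ", rest)
xref     <- ifelse(has_xref, sub("^(@[^@]+@) .*$", "\\1", rest), NA_character_)
rest     <- ifelse(has_xref, sub("^@[^@]+@ ", "", rest), rest)
tag      <- sub(" .*$", "", rest)
value    <- ifelse(grepl(" ", rest), sub("^[^ ]+ ", "", rest), "")

# Each level-0 line opens a new record; carry its pointer down to its sub-lines
rec <- cumsum(level == 0)
id  <- xref[level == 0][rec]

# Remember the enclosing level-1 tag so a DATE or PLAC can be tied to BIRT or DEAT
context <- character(length(tag))
current <- ""
for (i in seq_along(tag)) {
  if (level[i] == 0) current <- ""
  if (level[i] == 1) current <- tag[i]
  context[i] <- current
}

ids <- unique(id[level == 0 & tag == "INDI"])

# Hypothetical helper: return one value per individual, NA where the tag is absent.
# If a tag repeats for one person (e.g. several FAMS), only the first is kept.
pick <- function(keep) {
  vals <- setNames(value[keep], id[keep])
  unname(vals[ids])
}

people <- data.frame(
  id         = ids,
  name       = pick(level == 1 & tag == "NAME"),
  sex        = pick(level == 1 & tag == "SEX"),
  birthdate  = pick(level == 2 & tag == "DATE" & context == "BIRT"),
  birthplace = pick(level == 2 & tag == "PLAC" & context == "BIRT"),
  FAMC       = pick(level == 1 & tag == "FAMC"),
  FAMS       = pick(level == 1 & tag == "FAMS"),
  stringsAsFactors = FALSE
)

print(people)
# write.csv(people, "people.csv", row.names = FALSE)  # optional: save as CSV
```

A real file also contains header, family, and source records, continuation lines, and repeated tags, none of which this sketch handles.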
As mentioned earlier, I will demonstrate this idea using my own line, albeit with changes to locations and other identifiable information. The following graphic is a density plot showing the ages of my direct ancestors.
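# Density
library(dplyr)
library(ggplot2)
# gd is the ancestor data frame (presumably the output of read_gedcom)
gd %>%
  filter(age < 120 & age > 0) %>%  # keep only plausible ages
  ggplot(aes(age, color = sex)) +
  geom_density()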
In this dataset, I went back 10 generations, which yielded 656 males and 630 females.
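Counts like these can be tallied directly from the parsed table. The following is a small base-R sketch on invented data, assuming a `sex` column coded M/F/U and a numeric `age` column as used in the density plot; the `toy` data frame is hypothetical.

```r
# Invented stand-in for the parsed data
toy <- data.frame(
  sex = c("M", "F", "M", "F", "M", "U"),
  age = c(72, 65, 0, 130, 48, 55),
  stringsAsFactors = FALSE
)

# Number of people recorded under each sex code
counts <- table(toy$sex)
print(counts)

# Apply the same plausibility filter as the density plot, then summarise age by sex
plausible <- toy[!is.na(toy$age) & toy$age > 0 & toy$age < 120, ]
med <- aggregate(age ~ sex, data = plausible, FUN = median)
print(med)
```

Comparing nrow(toy) with nrow(plausible) also shows how many records the age filter drops before plotting.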
Next, I built out a pedigree chart.
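library(dplyr)
library(igraph)
library(ggraph)
# Set the first person's FAMS to their own FAMC so that row is not dropped
gd$FAMS[1] <- gd$FAMC[1]
# First pass: edge list from FAMS (family as spouse) to FAMC (family as child)
fam_graph <- gd %>%
  mutate(FAMS = if_else(FAMS == "", FAMC, FAMS)) %>%
  filter(FAMC != "", generation < 5) %>%
  select(FAMS, FAMC, anc_title) %>%
  graph_from_data_frame()
# Same rows as a data frame, so the vertex names can be attached to them
g_name <- gd %>%
  mutate(FAMS = if_else(FAMS == "", FAMC, FAMS)) %>%
  filter(FAMC != "", generation < 5) %>%
  select(FAMS, FAMC, anc_title)
# Assumes the graph has exactly as many vertices as g_name has rows
g_name$V_verts <- V(fam_graph)$name
# Look up the person whose FAMC matches each vertex name
correct_name <- g_name %>%
  left_join(gd, by = c("V_verts" = "FAMC"))
# Second pass: rebuild the graph with the matched title as an edge attribute
fam_graph <- correct_name %>%
  mutate(FAMS.x = if_else(FAMS.x == "", FAMC, FAMS.x)) %>%
  filter(FAMC != "", generation < 5) %>%
  select(FAMS.x, FAMC, anc_title.y) %>%
  graph_from_data_frame()
# Copy the edge attribute onto the vertices for labelling (assumes equal counts)
V(fam_graph)$F_name <- E(fam_graph)$anc_title.y
# Draw the pedigree as a dendrogram with angled labels
ggraph(fam_graph, layout = "dendrogram") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(aes(label = F_name), vjust = 1, hjust = 1, angle = 35) +
  expand_limits(x = -.5, y = -.5)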
I was also curious about my ancestors’ migratory path, so here is a map displaying that path, with me as generation 0.
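library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe
library(leaflet)
library(forcats)   # provides as_factor()
# Keep generations 0 through 6 (this overwrites gd)
gd %<>%
  filter(generation < 7)
# One circle per birthplace, colored by generation
leaflet() %>%
  addTiles() %>%
  addCircles(lng = gd$birthlong,
             lat = gd$birthlat,
             popup = paste0("Id: ", gd$id, "<br />",
                            "Birth Place: ", gd$birthplace, "<br />"
             ),
             color = gd$gencol) %>%
  addLegend(colors = levels(as_factor(gd$gencol)), labels = levels(as_factor(gd$generation)))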
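The legend above takes its colors and its labels from two separate factor conversions, so the pairing depends on both returning their levels in the same order. One way to make the pairing explicit is to derive both from a single lookup table. This is a base-R sketch on invented data; the `gen_colors` palette and the `toy` data frame are hypothetical, and it assumes `gencol` holds one color per generation.

```r
# Invented stand-in: only the generation column matters here
toy <- data.frame(generation = c(2, 0, 1, 2, 1))

# Hypothetical palette: one fixed color per generation, keyed by generation number
gen_colors <- c("0" = "#1b9e77", "1" = "#d95f02", "2" = "#7570b3")
toy$gencol <- unname(gen_colors[as.character(toy$generation)])

# One row per generation, sorted, so colors and labels cannot drift apart
legend_key <- unique(toy[order(toy$generation), c("generation", "gencol")])
print(legend_key)

# With leaflet loaded, the legend could then be drawn from the same table:
# addLegend(map, colors = legend_key$gencol,
#           labels = as.character(legend_key$generation))
```

Because both vectors come from the same rows, reordering or filtering the data cannot mislabel a generation.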
While the general migratory path was interesting, I wanted to focus on the descendants of someone specific, so I found the first member of my family to be born in America on my paternal side, and mapped out his descendants. The following .gif portrays the migratory path of that ancestor (who, incidentally, did fight in the Revolutionary War!) and his descendants.
There are other questions that would be fun to explore, such as: Is there any intermingling between other trees in my line? Who had the most living descendants before they died? What is the average distance between where they were born and where they died?
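As a starting point for the last question, the great-circle distance between two coordinates can be computed with the haversine formula. This is a rough base-R sketch on invented coordinates: `birthlat` and `birthlong` match the columns used in the map, while `deathlat` and `deathlong` are assumed names for the corresponding death coordinates, and the Earth is treated as a sphere of radius 6371 km.

```r
# Haversine great-circle distance in kilometres (inputs in decimal degrees)
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Invented coordinates; the third person has no recorded birth latitude
toy <- data.frame(
  birthlat  = c(0, 10, NA),
  birthlong = c(0, 20, 5),
  deathlat  = c(1, 10, 3),
  deathlong = c(0, 20, 5)
)

toy$dist_km <- haversine_km(toy$birthlat, toy$birthlong,
                            toy$deathlat, toy$deathlong)

# Average over the people for whom both places are known
avg_km <- mean(toy$dist_km, na.rm = TRUE)
print(avg_km)
```

People with a missing birth or death location come out as NA and are skipped by na.rm = TRUE, so the average only reflects ancestors with both places recorded.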
I hope that this parser can aid others in learning more about their family history, as well as make it easier for those who are just beginning to find the gaps in their heritage.