Palaeontology is the study of the history of life on Earth. Whenever I get asked what I do, my answer always gets one of three predictable responses: “Oh, like Ross from Friends?”, “So, Jurassic Park?”, or “So you dig dinosaurs?”
None of these is close to what I, my colleagues, or the broader field actually do. Well, apart from digging up dinosaurs. We have to have some perks (not that I’ve ever actually been on a dig…).
What I want to highlight are a couple of recent developments that show palaeontology is just as technically advanced as any other major field of science. Both involve building and analysing large data sets, which we constantly use to test large-scale evolutionary patterns and processes through time, known as macroevolution. Deciphering the patterns and processes that produced the extant fauna we have today is key to predicting its future as we destroy the planet.
Diving deep into the literature
The first of these is a machine reading and learning system known as PaleoDeepDive (American spelling…). It aims to surpass manually created databases, such as the Paleobiology Database (PBDB, to which I am a contributor and which forms the cornerstone of my PhD analyses), by automating the extraction of data (e.g., species names and figures) from published articles.
This is a pretty neat idea, designed to validate, replicate, or refute results previously published using manually created databases. Beyond that, it could save PhD students and the like a hell of a lot of the time and effort that currently goes into building these large databases by hand. And, well, it works. In a series of comparisons, the new tool performed as well as human-generated databases. More importantly, it retains the context of the original facts. Instead of information being stripped out and parsed into a database, as we do with the PBDB, PaleoDeepDive creates a probabilistic database in which each fact stays linked to its original source and carries a probability of being correct.
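To make the idea of a probabilistic database a bit more concrete, here is a minimal Python sketch. It is not PaleoDeepDive’s actual design or code: the `Fact` record, its fields, the `query` helper and the example threshold are all hypothetical, and the records are invented. The point is simply that each extracted fact keeps its source document, its original sentence and a probability of being correct, so you can filter by confidence without throwing the context away.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One extracted statement kept alongside its provenance (hypothetical schema)."""
    taxon: str          # e.g. a species name found in the text
    relation: str       # what was asserted, e.g. "occurs_in"
    value: str          # e.g. a rock formation or time interval
    source_doc: str     # identifier of the publication it came from
    sentence: str       # the original sentence, kept for context
    probability: float  # estimated chance that the extraction is correct


def query(facts, taxon, min_probability=0.0):
    """Return facts about `taxon` at or above a probability threshold, most likely first."""
    hits = [f for f in facts if f.taxon == taxon and f.probability >= min_probability]
    return sorted(hits, key=lambda f: f.probability, reverse=True)


# Toy records, invented purely for illustration.
facts = [
    Fact("Taxon A", "occurs_in", "Formation X", "doc-001",
         "Specimens of Taxon A were recovered from Formation X.", 0.97),
    Fact("Taxon A", "occurs_in", "Formation Y", "doc-002",
         "Taxon A may also be present in Formation Y.", 0.41),
    Fact("Taxon B", "occurs_in", "Formation X", "doc-001",
         "Taxon B co-occurs with Taxon A.", 0.88),
]

if __name__ == "__main__":
    # Keep only high-confidence facts, but print the sentence each one came from.
    for f in query(facts, "Taxon A", min_probability=0.9):
        print(f.value, f.probability, f.source_doc, "|", f.sentence)
```

Lowering the threshold trades precision for recall. Because every fact still carries its sentence and source, a doubtful record can be checked against the original paper rather than being silently accepted or dropped.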
Anyway, the paper was published in PLOS ONE, so you can read it for free. It’ll certainly be interesting to watch how this develops, and to see the machines take over!
God, I think I’ve just been made redundant…
R, me ’earties…
One of the most successful compilations of the fossil record is the Paleobiology Database, mentioned above. I know I’m biased, but a database with over 1.2 million fossil occurrences in it is kind of a big deal. One of the cool things is that all of its data are openly licensed, and it comes with tools you can use to analyse and visualise them! Yay open science!
Round two of cool new things dragging palaeontology into 21st-century science involves linking this database to R, a programming language used by pretty much every scientist who does any form of stats these days.
Enter the paleobioDB package! This set of functions lets anyone query the database (i.e., extract data from it), then process and visualise the results in a multitude of ways (e.g., for analyses of biodiversity through time). Complete instructions for using it, from installing R to doing some pretty nifty science, are all available on GitHub.
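paleobioDB itself is an R package, and the sketch below neither uses nor reproduces its functions. It is a small Python illustration of the kind of “biodiversity through time” calculation mentioned above: counting distinct genera per time bin from fossil occurrence records. The records are invented, the field names (`genus`, `max_ma`, `min_ma`, with ages in millions of years ago) only loosely resemble a PBDB download, and the binning rule is one simple assumption among several used in practice.

```python
from collections import defaultdict

# Hypothetical occurrence records; values are invented for illustration.
occurrences = [
    {"genus": "GenusA", "max_ma": 72.0, "min_ma": 66.0},
    {"genus": "GenusA", "max_ma": 70.0, "min_ma": 68.0},
    {"genus": "GenusB", "max_ma": 80.0, "min_ma": 75.0},
    {"genus": "GenusC", "max_ma": 66.0, "min_ma": 61.0},
]


def richness_by_bin(records, oldest, youngest, bin_width):
    """Count distinct genera whose age range overlaps each time bin.

    Bins run from `oldest` towards `youngest` (both in Ma). A record is counted
    in every bin its [min_ma, max_ma] interval overlaps; records that merely
    touch a bin boundary are not counted in that bin.
    Returns a list of (bin_top, bin_base, genus_count) tuples, oldest first.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")

    # Build the bins from oldest to youngest; the last bin may be narrower.
    bins = []
    top = oldest
    while top > youngest:
        base = max(top - bin_width, youngest)
        bins.append((top, base))
        top = base

    genera = defaultdict(set)
    for rec in records:
        for top, base in bins:
            # Intervals overlap unless the record is entirely older or younger than the bin.
            if rec["max_ma"] > base and rec["min_ma"] < top:
                genera[(top, base)].add(rec["genus"])

    return [(top, base, len(genera[(top, base)])) for top, base in bins]
```

On the toy data, `richness_by_bin(occurrences, 80, 60, 5)` returns `[(80, 75, 1), (75, 70, 1), (70, 65, 2), (65, 60, 1)]`. Counting a record in every bin it overlaps tends to inflate richness for poorly dated occurrences, which is why real analyses often use other conventions, such as assigning each occurrence to the bin containing its midpoint.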
You know what this means? We have open data, open methods, and instructions on how to analyse and interpret all of it. Freely available. To anyone. You can do science! Become a digital palaeontologist – I’d recommend it, it’s pretty sweet.
So that’s a couple of neat things! Combined with other awesome developments like 3D modelling and printing of fossils, palaeontology really is so much more than dusty old museum specimens and fossil hunts. We still need these as the backbone of our research, but next time you meet a palaeontologist, remember that they are living and working at the cutting edge of 21st-century science.
Peters, S. E. et al. (2014). A machine reading system for assembling synthetic paleontological databases. PLOS ONE, 9, e113523. (Open access)
Varela, S. et al. (2015). paleobioDB: an R package for downloading, visualizing and processing data from the Paleobiology Database. Ecography, 38, 1–7. (Open access)