A substantial portion of my PhD involves contributing to the Pal[a]eobiology database, the largest current online database of global fossil occurrences, literature references, and taxonomic data. It’s a public resource, so anyone can access the information contained within the database (yay for open data!). The project is very much on-going, but currently over a million taxonomic occurrences have been input, from a range of international contributors (usually post-docs, research fellows and their minions, aka PhD students). Ultimately, the aim is to provide global, collection-based occurrence and taxonomic data for all groups of any geological age. Along with this, there is a selection of web-based software that anyone can use for the simple statistical analysis of whichever portion of the data they choose to access. According to the website, the long-term goal is also to encourage collaboration across the globe so that palaeontologists can help answer some of the large-scale questions that exist in terms of the evolutionary history of life on Earth. That’s a pretty decent thing to aim for.
Unfortunately, but also thankfully, only ‘professional researchers’ can actually input data. This is to impose a degree of rigour: if any old Tom, Dick or Harry could enter or modify data, there would be serious data validity issues. You can see what the various groups of researchers are working on here, and it’s pretty varied! Anything from calculating clade occurrence times and comparing/combining this with molecular data, to diversity trends of plants through the last few hundred million years.
Where I think the real strength of the database lies is that it can be analysed in conjunction with other datasets, such as various climatic variables, land area, and latitude, to see how individual lineages and groups have co-evolved with the planet and Earth systems through time. This can then be used as a predictor of future trends in biodiversity, something that is especially important given that we may be heading into a time of exceptional pressure on biological systems.
What I’m gonna be doing with the database in particular, is filling out the entire Upper Jurassic tetrapod occurrence bit, and combining this with the Early Cretaceous dataset, which is nearing completion thanks to a mammoth effort from many people, including my supervisor Phil Mannion, to look at extinction dynamics and selectivity patterns in terrestrial tetrapods throughout this period. More on this, when I actually have the data! Many exceptional publications have already come out of this project, the latest just a couple of days ago looking at the Cretaceous tetrapod record, which I’ll be writing about shortly on here.
Matt Herod
That is a pretty cool project! Both the database and your PhD. I wonder if there is something like this for the geochemistry world? It would be cool if such a thing existed as a global water or soil geochemistry database. To my knowledge there is nothing though. How did the paleo one start and get off the ground? I can tell it needs a lot of filling out though. I just checked out Ontario fossils and there are some major gaps.
tennant
Oh yeah, it’s very much work in progress – mostly it’s a select group of researchers who only fill the bits out they need for their particular research topic, with the over-arching hope that this will eventually lead to completion. No-one dedicated to filling it in stochastically.
The BGS in the UK have some pretty cool data – https://www.bgs.ac.uk/opengeoscience/home.html?Accordion2=3#data but I’m not sure if any other countries have such a thing. For primary data, it would be a monumental effort – this is just extracting info from papers mostly.
Ross Mounce (@rmounce)
Sorry Jon,
but I’m gonna have to pull you up on the logic behind this one:
I’m pretty sure there’s a small database of knowledge called Wikipedia that lets any Tom Dick or Harry enter or modify data. It works rather well and is rather popular I believe. I see no reason why PaleoDB can’t also operate a transparent, moderated wiki-style approach to contributing/correcting data with say a ‘real name-only’ contributor policy like Google+ & Facebook. If someone entered bad data, akin to Wiki-vandalism, the higher-up editors could clean and revert back to the old state; no problem.
Within the realms of science the RNA wikiproject has been successful. See also WikiProteins & GeneWiki.
I actually think a more open model would be beneficial – it’d be easier to see what the coverage & quality is like within PaleoDB. It’s a bit of a black box to me atm. If you shake it you get data out but sometimes I feel I have to rely on trust on what’s in there and I don’t think that’s good in the long term.
So called ‘amateurs’ have a long history of making excellent and worthwhile contributions to palaeontology. Long before the term ‘citizen science’ was invented. Why not harness this mass of interest for good to clean data in PaleoDB? By preventing this, I think the DB is missing out on a good thing perhaps(?)
tennant
I don’t know anything about the projects you mention – thanks for bringing them to my attention! (obviously I know of Wikipedia..)
I wonder though, do scientists cite them and use the data, with the same amount of legitimate trust as, you know, peer-reviewed data? If that actually even exists.
Feel free to take this up with John Alroy – I’m just a minion, he’s the master!
Ross Mounce (@rmounce)
[bugger, cocked-up my HTML tags first time around, reposting…]
Sorry Jon,
but I’m gonna have to pull you up on the logic behind this one:
I’m pretty sure there’s a small database of knowledge called Wikipedia that lets any Tom Dick or Harry enter or modify data. It works rather well and is rather popular I believe. I see no reason why PaleoDB can’t also operate a transparent, moderated wiki-style approach to contributing/correcting data with say a ‘real name-only’ contributor policy like Google+ & Facebook. If someone entered bad data, akin to Wiki-vandalism, the higher-up editors could clean and revert back to the old state; no problem.
Within the realms of science the RNA wikiproject has been successful. See also WikiProteins & GeneWiki.
I actually think a more open model would be beneficial – it’d be easier to see what the coverage & quality is like within PaleoDB. It’s a bit of a black box to me atm. If you shake it you get data out but sometimes I feel I have to rely on trust on what’s in there and I don’t think that’s good in the long term.
So called ‘amateurs’ have a long history of making excellent and worthwhile contributions to palaeontology. Long before the term ‘citizen science’ was invented. Why not harness this mass of interest for good to clean data in PaleoDB? By preventing this, I think the DB is missing out on a good thing perhaps(?)
Mike Taylor
For what it’s worth, several years back I was working on something that could have used PaleoDB data, and wanted to contribute the missing bits. At the time, I wasn’t able to make those contributions because I didn’t have a Ph.D. So I lost interest at that point. (Things might be better today.)
I strongly agree with Ross that this exclusivism is a real mistake. Even if the PaleoDB people want to retain control, they could still get a huge boost by allowing anyone to enter unauthenticated data, which is only stamped as OK when someone they trust has verified it.