EGU Blogs

Diving too deep?

A new initiative has just been announced that could help to revolutionise palaeontology. PaleoDeepDive is essentially an automated version of the Paleobiology Database (PaleobioDB), an online database of fossil occurrences pulled from the literature, which is crowd-sourced and curated by professionals.

They have a launch video here:



I have a couple of reservations about this. Firstly, how do they expect to mine data, legally at least, from articles that are mostly still locked behind paywalls?

I’m also a little concerned about the precision of their algorithms. Towards the end of the video, they mention that in a sample of 500 articles, they pick up 15000 species names, whereas the PaleobioDB picks up only 1100. In the PaleobioDB, however, these names are occurrences – explicit records of fossils in time and space. What the 15000 names represent is not clear – are they simply every name mentioned in the text, and therefore of little real use, or are palaeontologists really missing out on over 90% of the data when extracting it manually?

Additionally, I am concerned about the linking of metadata, such as the location and age of fossils, as well as data about the geology, environment of deposition, taphonomy and so on. When extraction is manual, all of this information has to be sifted out from a host of other information in each article. I’m not sure a machine will be able to distinguish, for example, the age of a fossil from a geological date in the text that is related to it, but is not directly the fossil’s age.

Anyway, these are just preliminary thoughts, and I am sure they have crossed the developers’ minds at some point. I look forward to seeing how this progresses, and how it undermines a lot of my work! 😉

Oops.


Also, I’d love to hear any thoughts or comments you have about it!

Jon began university life as a geologist, followed by a treacherous leap into the life sciences. He is now based at Imperial College London, investigating the extinction and biodiversity patterns of Mesozoic tetrapods – anything with four legs or flippers – to discover whether or not there is evidence for a ‘hidden’ mass extinction 145 million years ago. Alongside this, Jon researches the origins and evolution of ‘dwarf’ crocodiles called atoposaurids. Prior to this, there was a brief interlude where Jon was immersed in the world of science policy and communication, which has greatly shaped his view on the broader role that science can play, and in particular on the current ‘open’ debate. He tweets as @Protohedgehog.