Diving too deep?

A new initiative has just been announced that could help to revolutionise palaeontology. PaleoDeepDive is essentially an automated version of the Paleobiology Database, which is an online, professionally crowd-sourced and curated database of fossil occurrences pulled from the literature.

I have a couple of reservations about this. Firstly, how do they expect to mine data, at least legally, from articles that are mostly still locked behind paywalls?

I’m also a little concerned about the precision of their algorithms. Towards the end, they mention that in a sample of 500 articles they get 15,000 species names, whereas the PaleobioDB only picks up 1,100. In the PaleobioDB, however, these names are occurrences: explicit records of fossils in time and space. What the 15,000 represent is not clear. Are they just names that are mentioned in the text, and therefore of little real use, or are palaeontologists really missing out on over 90% of the data when extracting manually?

Additionally, I am concerned about the linking of metadata, such as the location and age of fossils, as well as data about the geology, environment of deposition, taphonomy etc. When extraction is manual, all of this information has to be sifted out from a host of other details in each article. I’m not sure a machine will be able to tell, for example, the age of a fossil apart from a related geological date in the text that isn’t actually the age of the fossil.

Anyway, these are just preliminary thoughts, and I’m sure they have crossed the developers’ minds at some point. I look forward to seeing how this progresses, and how it undermines a lot of my work! 😉



Also, I’d love to hear any thoughts or comments you have about it!

Partially sane; roads – many; and time

So I hit the 9-month barrier for my PhD the other day. Where ze hell did all that time go??

Well, you can actually see if you want: I’ve uploaded the 9-month report to Figshare, excluding the preliminary results (which are beginning to look cool btw). You can find it here, where it’s already had almost 200 hits. Figshare is so awesome it hurts.

Summary points:

  • The primary task is to assess biodiversity patterns over the Jurassic/Cretaceous interval
  • Primary data collection for this is now complete, and I have run some preliminary stats on it to account for imperfections in the fossil record
  • There is a hell of a lot to do

I’m actually in Munich at the moment, working on an alternative route to assessing the first point. I’m using a method called ‘phylogenetic diversity’, which essentially maps evolutionary trees onto time (stratigraphy). Based on the evolutionary relationships, you can then infer where species should have existed but haven’t been found, and add these inferred lineages to diversity counts through time. I’m doing this for about 500 species, so it’s taking a lot of time, but it’s looking pretty awesome so far – stay tuned! 🙂
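For anyone curious how the counting works, here’s a minimal Python sketch of the general idea, not the exact procedure used in this project. It assumes a fully resolved tree written as nested pairs, and first/last appearance dates in millions of years ago (Ma); the taxa A to D and all of their dates are hypothetical. Each clade is dated to the oldest first appearance inside it, each tip’s range is stretched back to its parent node, and internal branches with no fossils of their own are counted as extra ‘ghost’ lineages.

```python
from typing import Dict, List, Optional, Tuple, Union

# A tree is either a taxon name (a tip) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def node_age(tree: Tree, fad: Dict[str, float]) -> float:
    """Minimum age (Ma) of a clade: the oldest first appearance among its tips.

    Sister lineages must have split by the time either one first appears,
    so a clade is at least as old as its oldest member.
    """
    if isinstance(tree, str):
        return fad[tree]
    return max(node_age(child, fad) for child in tree)


def branches(tree: Tree, fad: Dict[str, float], lad: Dict[str, float],
             parent_age: Optional[float] = None) -> List[Tuple[float, float]]:
    """List every branch of the dated tree as (start, end) in Ma, start >= end."""
    if isinstance(tree, str):
        # A tip's range is extended back to its parent node; the stretch
        # before its first appearance is inferred, not sampled.
        start = parent_age if parent_age is not None else fad[tree]
        return [(start, lad[tree])]
    age = node_age(tree, fad)
    out: List[Tuple[float, float]] = []
    if parent_age is not None and parent_age > age:
        # An internal branch with no fossils of its own (a ghost lineage).
        out.append((parent_age, age))
    for child in tree:
        out.extend(branches(child, fad, lad, age))
    return out


def observed_at(t: float, fad: Dict[str, float], lad: Dict[str, float]) -> int:
    """Number of taxa actually sampled at time t (Ma)."""
    return sum(1 for taxon in fad if fad[taxon] >= t >= lad[taxon])


def diversity_at(t: float, tree: Tree, fad: Dict[str, float],
                 lad: Dict[str, float]) -> int:
    """Number of lineages the tree implies at time t, including inferred ones."""
    return sum(1 for start, end in branches(tree, fad, lad) if start >= t >= end)


# Hypothetical toy data, not real taxa or dates.
TREE: Tree = (("A", "B"), ("C", "D"))
FAD = {"A": 150.0, "B": 140.0, "C": 160.0, "D": 135.0}  # first appearances (Ma)
LAD = {"A": 148.0, "B": 130.0, "C": 155.0, "D": 125.0}  # last appearances (Ma)

if __name__ == "__main__":
    for t in (157.0, 149.0, 145.0):
        print(t, observed_at(t, FAD, LAD), diversity_at(t, TREE, FAD, LAD))
```

In this toy example only one taxon is sampled at 157 Ma and none at 145 Ma, but the tree implies three and two lineages at those times respectively. That gap is exactly what phylogenetic diversity tries to fill. Counts taken exactly at a node’s age are boundary cases, so it’s safest to sample between nodes.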

Oh, the title? Not a clue – I’ve only had one coffee. PhD research is tough – you work long hours, do difficult work, and get paid a pittance, so at times it can be a bit much, but it’s totally worth it; there are many paths the research could take; and thyme, never enough thyme...

Anyway, have a flick through and let me know what you think! If you think there’s something I’m missing, or an avenue in particular you’d like me to explore, drop a comment here (this is funded by UK taxpayers’ cashmoney after all) 🙂

Can fossil mammals help us with our conservation efforts?

How can the dead help the living? This is a question many fossil fanatics have devoted a lot of time to in recent years, partly out of a desire to make palaeontology ‘relevant’ as a modern science, and partly to help guide our efforts in conservation biology. A new series, edited by my supervisor Dr. Phil Mannion and others, focusses on how we interpret palaeobiodiversity (biodiversity in the fossil record) across different groups, and on the issues facing the field and their possible solutions. The final article in the volume struck me in particular.

How can fossils help us to protect these now and in the future? Source.

