Presentation skills – 2. Speech

Presenting: some people love it, some people hate it. I firmly place myself in the first category and apparently, this presentation joy translates itself into being a good – and confident – speaker. Over the years, quite a few people have asked me for my secrets to presenting (which – immediate full disclosure – I do not have) and this is the result: a running series on the EGU GD Blog that covers my own personal tips and experience in the hope that it will help someone (you?) become a better and – more importantly – more confident speaker. Last time, we discussed your presentation voice. In this second instalment, I discuss everything related to how you speak.

1. Get rid of ‘uh’

Counting the number of times a speaker says ‘uh’ during a presentation is a fun game, but ideally you would like your audience to focus on the non-uh segments of your talk. Therefore, getting rid of ‘uh’ (or any other filler word for that matter) is important. I have two main tips to get rid of ‘uh’:

Write down your speech and practise (but don’t hold on to it religiously)

Practise. Practise. And practise it again. Maybe a few more times. Almost… no: practise it again.
I am being serious here. If you know exactly what you want to say, you won’t hesitate and fill that moment of hesitation with a prolonged uuuuuhhh. The added benefit of writing down your presentation and practising it religiously is that it will help you with timing your presentation as well. I also find it helpful to read through it (instead of practising it out loud) when I am in a situation that doesn’t allow me to go into full presentation mode (on the plane to AGU for example). However, make sure to practise your presentation out loud even though you wrote it all down: thinking speed (or reading in your head) and talking speed are not the same!

If you write down your presentation, and you know exactly what you want to say, you have to take care to avoid another (new) pitfall for saying ‘uh’: now that you know exactly what you want to say and how to say it most efficiently, you start saying ‘uh’ when you can’t remember the exact wording. Let it go. Writing down your speech helps you to clarify the vocabulary needed for it, but if you can’t recall the exact sentences, just go with something else. You will have a well thought-out speech anyway. Just go with the flow and try not to say ‘uh’.

The second main tip for getting rid of ‘uh’ is to

Realise that it is okay to stay silent for a while

If you forget the word you wanted to say and you need some time to think, you can take a break. You can stay silent. You don’t need to fill up the silence with ‘uh’. In fact, a break often seems more natural. Realise that you forgot something, don’t panic, take a breath, take a break (don’t eat a KitKat at this point in your presentation), and then continue when you know what to say again. Even if you don’t forget the exact words or phrasings, taking a breath and pausing in your narrative can be helpful for your audience to take a breath as well. It will seem as if your presentation is relaxed: you are not rushing through 50 slides in 12 minutes. You are prepared, you are in control, you can even take a break to take a breath.

2. Speed

A lot of (conference) presentations will have a fixed time. At the big conferences, like EGU and AGU, you get 12 minutes and not a second more or less. Well, of course you can talk longer than 12 minutes, but this will result in less (if any) time for questions.

I don’t think the conveners will kill you, but don’t pin me down on it

And on top of that, everyone (well, me at the very least) will be annoyed at you for not sticking to the time.

So: sticking to your time limit is important!

But how can you actually do this? Well, there are a few important factors:
1. Preparation: know exactly what you want to say (we will cover this more in a later instalment of this series)
2. The speed at which you speak.

We will be discussing the latter point in this blog entry. For me (and many other people), the rule of “one slide per minute” works well, but I always build in a little buffer by counting the title slide as a slide as well. So, my 12-minute presentation would have 12 slides in total (including the title slide). This actually spreads my 12 minutes over 11 scientific slides, so I can talk a little bit longer about each slide. It also gives me peace of mind to know that I have a bit of extra time. However, the speed at which you talk might be completely different. Therefore, the most important rule about timing your presentations is:

Knowing how fast you (will) speak

I always practise my short presentations a lot. If a presentation is 30 minutes or longer, I like to freewheel using the one-slide-per-minute rule, but shorter presentations require a lot of practice. I always time every practice attempt and make a point of finishing each one (even if the first part goes badly). Otherwise, you run the risk of rehearsing the first part of your presentation very well and kind of forgetting about the second part. When I time my presentation during practice, I always speak too long. For a 12-minute presentation, I usually end up at the 13.5-minute mark. However, I know that when I speak in front of an audience, I (subconsciously?) speed up my speech, so when I time 13.5 minutes, I know that my actual presentation will be a perfect 12 minutes.

The only way to figure out how you change or start to behave in front of an audience is by simply giving a lot of presentations. Try to do that and figure out whether you increase or decrease the speed of your speech during your talk. Take note and remember it for the next time you time your presentation. In the end, presenting with skill and confidence is all about knowing yourself.

3. Articulation and accent

There are as many accents to be heard at a conference as there are scientists talking. Everyone has their own accent, articulation, (presentation) voice, etc. This means that

You should not feel self-conscious about your accent

Some accents are stronger than others and may be more difficult for others to follow. Native speakers are by no means necessarily better speakers and, depending on whom you ask, their accent might also not be better than anyone else’s.
Of course, your accent might become an issue if people can’t understand you. You can consider the following things to make yourself understandable to a big audience:
1. Articulate well.
2. Adapt the speed at which you talk.

Some languages are apparently faster than others. French, for example, is quite fast, whereas (British) English is a slower language. You have to take this into account when switching languages. If you match the pace of the language you are speaking, your accent will be less noticeable, because you avoid any ingrained rhythm patterns that are language-specific. Your accent might then still shine through in your pronunciation of the words, but it will not shine through in the rhythm of your speech.
In addition, you can consider asking a native speaker for help if you are unsure how to pronounce certain words. Listening to or watching English/American/Australian TV series, films, or YouTube videos will also help with your pronunciation.

And that, ladies and gentlemen, is about everything I have to say on the matter of speech. You should now have full control over your presentation voice and all the actual words you are going to say. Next time, we go one step further and discuss your posture during the presentation and your movements.

It’s just coding … – Scientific software development in geodynamics

The Spaghetti code challenge. Source: Wikimedia Commons, Plamen petkov 92, CC-BY-SA 4.0

As big software packages become commonplace in geodynamics, which skills should a geodynamicist aim to have in software development? Which techniques should be considered a minimum standard for our software? This week Rene Gassmöller, project scientist at the Computational Infrastructure for Geodynamics, UC Davis, shares his insights on the best practices that make scientific software better, and how we can work to translate these into our field. Enjoy the read!

Rene Gassmöller

Nowadays we often equate geodynamics with computational geodynamics. While there are still interesting analytical studies to be made, and important data to be gathered, it is increasingly common that PhD students in geodynamics are expected to work exclusively on data interpretation, computational models, and in particular the accompanying development of geodynamic software packages. But as it turns out, letting an unprepared PhD student (or unprepared postdoc or faculty member, for that matter) work on a big software package is a near guarantee that the project will develop into a sizeable bowl of spaghetti code (see the figure above for a representative illustration).

Note that I intentionally write about ‘software packages’ instead of ‘code’, as many of these packages — think of GPlates (Müller et al., 2018), ObsPy (Krischer et al., 2015), FEniCS (Alnaes et al., 2015), or the project I am working on, ASPECT (Heister et al., 2017) — have necessarily left the stage of a quickly written ‘code’ for a single purpose, and developed into multi-purpose tools with a complex internal structure. With this growing complexity, the activity of scientific ‘coding’ has evolved into ‘developing software’. However, when students enter the field of geophysics, they are rarely prepared for this challenge. Hannay et al. (2009) report that while researchers typically spend 30% or more of their time developing software, 90% of them are primarily self-taught, and only a few of them have received formal training in writing software, including tests and documentation. Nobody told them: programming and engineering software are two very different things. Many undergraduate and graduate geoscience curricula today include classes about the basics of programming (e.g. in Python, R, or Matlab), and also discuss numerical and computational methods. While these concepts are crucial for solving scientific problems, they are not sufficient for managing the complexity of growing scientific software. Writing a 50-line script is a very different task from contributing to an inherited and poorly documented PhD project of 1,000 lines, which again is very different from managing a multi-developer project of 100,000 lines of source code. A recurring theme is that these differences are only discovered when the damage has already been done. Hannay et al. (2009) note:

Codes often start out small and only grow large with time as the software proves its usefulness in scientific investigations. The demand for proper software engineering is therefore seldom visible until it is “too late”.

But what are these ‘proper software engineering techniques’?

Best practices vs. Best techniques in practice

In a previous blog post, Krister Karlsen already discussed the value of version control systems for the reproducibility of computational research. Needless to say, these systems (originally also termed source code control systems, e.g. Rochkind, 1975) are just as valuable for scientific software development as they are for the reproducibility of results. However, they are not sufficient for developing reliable scientific software. Wilson et al. (2014) summarize a list of 8 best practices that make scientific software better:

  1. Write programs for people, not computers.
    • A program should not require its readers to hold more than a handful of facts in memory at once.
    • Make names consistent, distinctive, and meaningful.
    • Make code style and formatting consistent.
  2. Let the computer do the work.
    • Make the computer repeat tasks.
    • Save recent commands in a file for re-use.
    • Use a build tool to automate workflows.
  3. Make incremental changes.
    • Work in small steps with frequent feedback and course correction.
    • Use a version control system.
    • Put everything that has been created manually in version control.
  4. Don’t repeat yourself (or others).
    • Every piece of data must have a single authoritative representation in the system.
    • Modularize code rather than copying and pasting.
    • Re-use code instead of rewriting it.
  5. Plan for mistakes (see the sketch after this list).
    • Add assertions to programs to check their operation.
    • Use an off-the-shelf unit testing library.
    • Turn bugs into test cases.
    • Use a symbolic debugger.
  6. Optimize software only after it works correctly.
    • Use a profiler to identify bottlenecks.
    • Write code in the highest-level language possible.
  7. Document design and purpose, not mechanics.
    • Document interfaces and reasons, not implementations.
    • Refactor code in preference to explaining how it works.
    • Embed the documentation for a piece of software in that software.
  8. Collaborate.
    • Use pre-merge code reviews.
    • Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems.
    • Use an issue tracking tool.
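
To make practice 5 a little more concrete, here is a minimal sketch of how an assertion and a unit test work together, and how a bug becomes a permanent test case. It is written in Python with a hypothetical function and values, purely for illustration:

```python
# A minimal sketch of practice 5 ("plan for mistakes") using Python's
# standard-library unittest. The function and its values are hypothetical.
import unittest


def average_density(masses, volumes):
    """Return the total mass divided by the total volume."""
    total_volume = sum(volumes)
    # Assertion: check the precondition instead of silently dividing by zero.
    assert total_volume > 0, "total volume must be positive"
    return sum(masses) / total_volume


class TestAverageDensity(unittest.TestCase):
    def test_single_body(self):
        self.assertEqual(average_density([10.0], [2.0]), 5.0)

    def test_empty_input_is_rejected(self):
        # Turning a bug into a test case: suppose empty input once slipped
        # through unnoticed; this test keeps the fix from regressing.
        with self.assertRaises(AssertionError):
            average_density([], [])


if __name__ == "__main__":
    unittest.main()
```

Once such a test exists, the computer can re-run it on every proposed change (practice 2), for example as part of a pre-merge code review (practice 8).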

There is a lot to be said about each of these techniques, but that would be beyond the scope of this blog post (please see Wilson et al.’s excellent and concise paper if you are interested). What I would like to emphasize here is that these techniques are often requested, but rarely taught. What are peer code reviews? How do I gradually introduce tests and refactor legacy code? How do I know whether it is better to use unit testing, integration testing, regression testing, or benchmarking for a given change to the code? And do I really need to know the difference? After all, a common argument against using software development techniques in applied computational science disciplines boils down to:

  • We cannot expect these software development techniques from geodynamicists.
  • We should not employ the same best practices as Google, Amazon, Apple, because they do not apply to us.
  • There is no time to learn/apply these techniques, because we have to conduct our research, write our publications, secure our funding.

While from a philosophical standpoint it is easy to dismiss these statements as not adhering to best practices, and possibly impacting the reliability of the created software, it is harder to tackle them from a practical perspective. Of course it is true that implementing a sophisticated testing infrastructure for a one-line shell command is neither useful nor necessary. Maybe the same is true for a 20-line script that is written specifically to convert one dataset into another, but in this case putting it under version control would already be useful in order to record your process and apply it to other datasets. And from my own experience it is extraordinarily easy to miss the threshold, at 40-100 lines, at which writing documentation and implementing first testing procedures become crucial to avoid cursing yourself in the future for not explaining what you did and why you did it. So why are there detailed instructions for lab notes and experimental procedures, but not for geodynamic software design and the reliability of scientific software? Geoscience, chemistry, and physics have established multi-semester lab and field exercises to drill students in careful scientific analysis. Should we develop comparable exercises for scientific software development (beyond numerical methods and basic programming)? What would an equivalent of these classes look like for computational methods? And is there a point where the skills of software development and geodynamics research grow so far apart that we have to consider them separately and establish a unique career track, such as the Research Software Engineer track that is becoming more popular in the UK?

In my personal opinion we have made great progress in recent years in defining best practices for scientific software (see e.g. https://software.ac.uk/resources/online-sustainability-evaluation, or https://geodynamics.org/cig/dev/best-practices/). However, it is still considered a personal task to acquire the necessary skills and to find the correct balance between careful engineering and overdesigning software. Establishing courses and resources that discuss these questions could greatly benefit our community, and allow for more reliable scientific progress in geodynamics.

Collaborative software development – The overlooked social challenge

The contributor funnel. The atmosphere and usability of a project influence how many users will join a project, how long they stick around, and whether they will take responsibility for the project by contributing to it or eventually becoming maintainers. Credit: https://opensource.guide/

Now that we have covered every topic a scientist can learn about scientific software development in a single blog post, what can go wrong when you put several of them together to work on a software package? Needless to say, a lot. Whether your software project is a closed-source, intra-workgroup project, or an open-source project with users and developers spread over different continents, things are going to get exponentially more complicated the more people work on your software. Not only do discussion and interaction take more time, there will also be conflicting ideas about computational methods, software design, or implementation. Using state-of-the-art tools like collaborative development platforms (GitHub, GitLab, Bitbucket, pick your favourite) and modern discussion channels like chats (Slack, Gitter), forums (Discourse), or video conferences (Skype, Hangouts, Zoom) can alleviate a part of the communication barriers. But ultimately, the social challenges remain. How does a project decide between the competing goals of flexibility and performance? Who is going to enforce a code of conduct in a project to keep the development environment open and friendly? Does a project create a welcoming atmosphere that invites new contributions, or does it repel newcomers with unrealistic standards and inappropriate behavior? How should maintainers of scientific software deal with unrealistic feature requests by users? How to encourage new users to become contributors and take responsibility for the software they benefit from? How to balance contributing improvements to the upstream project against publishing them as scientific papers? How to give credit to contributors?

In my opinion it is unfortunate that these questions about scientific software projects are discussed even less than the (now increasing) awareness of reproducibility. On the bright side, there is already a trove of experience in the open-source community. The same questions about attribution and credit, collaboration and community management, and correctness and security have been discussed over the past decades in open-source projects all over the world, and nowadays a good number of resources provide guidance, such as https://opensource.guide/, or the excellent book ‘Producing Open Source Software: How to Run a Successful Free Software Project’ (Fogel, 2017). Not all of it can be transferred to science, but we would waste time and energy if we dismissed these experiences and repeated the same mistakes.

Let us talk about engineering scientific software

I realize that in this blog post I have raised more questions than I have answered. Maybe that is because I am not aware of the answers that are already out there. But maybe it is also caused by a lack of attention that these questions receive. I feel that there are no established guidelines for which software development skills a geodynamicist should have, and which techniques should be considered a minimum standard for our software. If that is the case, I would invite you to have a discussion about it. Maybe we can agree on a set of guidelines and improve the state of software in geodynamics. But at the very least I hope I have inspired some thought about the topic, and provided some resources to learn more about a discussion that will likely grow more important over the coming years.

References:

Alnaes, M. S., Blechta, J., Hake, J., Johansson, A., Kehlet, B., Logg, A., Richardson, C., Ring, J., Rognes, M. E., & Wells, G. N. (2015). The FEniCS Project Version 1.5. Archive of Numerical Software, 3. http://dx.doi.org/10.11588/ans.2015.100.20553

Fogel, K. (2017). Producing Open Source Software: How to Run a Successful Free Software Project. O'Reilly Media, 2nd edition.

Hannay, J. E., MacLeod, C., Singer, J., Langtangen, H. P., Pfahl, D., & Wilson, G. (2009). How do scientists develop and use scientific software? In Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering (pp. 1-8). IEEE Computer Society.

Heister, T., Dannberg, J., Gassmöller, R., & Bangerth, W. (2017). High accuracy mantle convection simulation through modern numerical methods – II: Realistic models and problems. Geophysical Journal International, 210(2), 833-851.

Krischer, L., Megies, T., Barsch, R., Beyreuther, M., Lecocq, T., Caudron, C., & Wassermann, J. (2015). ObsPy: A bridge for seismology into the scientific Python ecosystem. Computational Science & Discovery, 8(1), 014003.

Müller, R.D., Cannon, J., Qin, X., Watson, R.J., Gurnis, M., Williams, S., Pfaffelmoser, T., Seton, M., Russell, S.H., & Zahirovic, S. (2018). GPlates: Building a Virtual Earth Through Deep Time. Geochemistry, Geophysics, Geosystems.

Open Source Guides. https://opensource.guide/. Accessed Oct 2018.

Rochkind, M. J. (1975). The source code control system. IEEE Transactions on Software Engineering, (4), 364-370.

Wilson, G., Aruliah, D.A., Brown, C.T., Hong, N.P.C., Davis, M., Guy, R.T., Haddock, S.H., Huff, K.D., Mitchell, I.M., Plumbley, M.D., & Waugh, B. (2014). Best practices for scientific computing. PLoS Biology, 12(1), e1001745.

Presentation skills – 1. Voice

Presenting: some people love it, some people hate it. I firmly place myself in the first category and apparently, this presentation joy translates itself into being a good – and confident – speaker. Over the years, quite a few people have asked me for my secrets to presenting (which – immediate full disclosure – I do not have) and this is the result: a running series on the EGU GD Blog that covers my own personal tips and experience in the hope that it will help someone (you?) become a better and – more importantly – more confident speaker. In this first instalment, I discuss everything regarding your voice.

Disregarding the content of your talk (I can’t really help you with that), mastering your voice is an important first step towards presenting well and presenting with (or feigning) confidence. An important thing to always remember is that your audience doesn’t know how you feel. If you come across as confident, people will perceive you as such, even though you are not necessarily feeling confident yourself. With time, I promise that you will in the end feel at ease and confident in front of an audience.
Using your voice optimally is obviously very important: it is the one thing people will have to listen to in order to get your message. Therefore, knowing how to use your voice is essential to presenting well. And note that your ‘presenting voice’ doesn’t necessarily need to match up with your ‘normal voice’.

1. Volume

First things first: make sure all people can hear you wherever they are in the room! This is a very basic tip, but one of the most important ones as well: if people can’t hear you, it doesn’t matter how well you present, they won’t understand what you’re talking about, because they literally won’t be able to hear it. Depending on your voice, this will result in one of the following adjustments to get into proper ‘presentation voice mode’:
• You will raise your voice to make sure everyone in the back can clearly hear you. I always do this myself, so my ‘presentation voice’ is always louder than my more natural, soft everyday-talking voice.
• You will lower your voice, so that the people in the first row don’t get blown away: you don’t want your voice to be so loud as to be a nuisance for people sitting close by.

Make sure your voice carries across the room

To test how loudly you need to speak, you can ‘scout’ the room beforehand with a friend. Ask them to stand at the back of the room while you walk up to the front and start talking in your ‘presentation voice’. Can your friend clearly hear everything you say? Then you are good to go. Otherwise, you can adjust the volume of your voice and test it again based on your friend’s feedback. No time/opportunity for a test round of your voice volume? Start your presentation with ‘Can everybody hear me?’ and you’ll soon find out how loud you need to speak.

Help! There is a microphone: now what?!

If there is a microphone available, you should refrain from using your loud presentation voice, because no one wants to go home after a conference with hearing damage. Often, you can test out the microphone shortly before your presentation. Make use of that opportunity, so that you don’t face any surprises! Also, if there is a stationary microphone (i.e., not a headset), make sure to always talk into the microphone. Adjust it to your height and make sure your voice is optimally picked up by the microphone. It is very tempting to start looking at your slides and turn your head, but that means your voice isn’t optimally picked up, which means people in the back can’t hear you! If you alternate between speaking into the microphone and turning your head, the sound of your voice during your presentation becomes a rollercoaster of soft-loud-soft-loud. This is very annoying to listen to, so try to avoid it! Having said that, I find this to be one of the hardest things ever, because I’m not used to talking into a stationary microphone… Let’s say practice makes perfect, right?

2. Tonality

It is incredibly boring to listen to someone who speaks in a dull, monotonous voice. No matter how interesting the content of your talk, if you can’t get the excitement and passion for your research across in your voice, chances are that people will start falling asleep during your presentation. And we all know how hard it is to stay awake during even the most animated of presentations, just because of irritating things like jet lag (or trying to finish your own presentation in the dead of night the previous evening). Therefore, I suggest practising the tonality of your voice.

Speak with emotion

If you want your audience to feel excited about your research or motivated to collaborate with you, you need to convey those emotions in your voice. Think about what you want your audience to feel and how you can convey that emotion with your voice. For example, if you want people to get excited, you can increase the pitch of your voice to indicate excitement.

Emphasise the right words

Another way of getting rid of a monotonous voice is putting emphasis on the right words to make your point. Obviously the effect is negated if you overuse this method, but in moderation, emphasising key words helps you get your message across more easily.
You can practise the tonality of your voice all the time: try reading a book out loud, tell a story about your weekend in an animated way, incorporate it in your day-to-day conversations, etc. Try to let your tonality come across as natural (and not over the top) and engaging. Recording your talks and listening back to them, or asking friends/family for comments, can help when you practise your presentation.

3. Pitch

The pitch of your voice should be pleasant for the audience. Now, of course you can’t (and shouldn’t) change your voice completely, but a very high-pitched, squeaky voice can be very annoying to listen to and a very deep voice can be hard to understand. So, depending on your voice and on what you think people find pleasant, you could consider slightly altering the pitch of your voice.

Don’t worry if your voice gets squeaky, because there is an easy way around it

My voice (and everyone else’s) gets really high-pitched and squeaky when I get excited, and presentations make me very excited. So, I always make sure that my presentation voice has an ever-so-slightly lower pitch than my normal speaking voice (and doesn’t get near the high-pitched excitement voice). By lowering the pitch of my voice I (think I) am more clearly understandable, and if I do get excited and my pitch increases due to the emotion in my voice, it is still at a very manageable and pleasant pitch, so no-one gets a headache on my watch.

Bearing these tips in mind, you can start honing your perfect presentation voice. Next time, we will start using our voice and tackle the subject of speech!

Reproducible Computational Science

Krister with his bat-signal shirt for reproducibility.

We’ve all been there: you’re reading through a great new paper, keen to get to the Data Availability section, only to find nothing listed, or the uninspiring “data provided on request”. This week Krister Karlsen, PhD student at the Centre for Earth Evolution and Dynamics (CEED), University of Oslo, shares some context and tips for increasing the reproducibility of your research from a computational science perspective. Spread the good word and reach for the “Gold Standard”!

Historically, computational methods and modelling have been considered the third avenue of the sciences, but they are now among the most important, paralleling experimental and theoretical approaches. Thanks to the rapid development of electronics and theoretical advances in numerical methods, mathematical models combined with strong computing power provide an excellent tool to study what is not available for us to observe or sample (Fig. 1). In addition to being able to simulate complex physical phenomena on computer clusters, these advances have drastically improved our ability to gather and examine high-dimensional data. For these reasons, computational science is in fact the leading tool in many branches of physics, chemistry, biology, and geodynamics.

Figure 1: Time–depth diagram presenting the availability of geodynamic data. Modified from Gerya (2014).

A side effect of the improvement of methods for simulation and data gathering is the availability of a vast variety of different software packages and huge data sets. This poses a challenge in terms of sufficient documentation that will allow a study to be reproduced. With great computing power comes great responsibility.

“Non-reproducible single occurrences are of no significance to science.” – Popper (1959)

Reproducibility is the cornerstone of cumulative science; the ultimate standard by which scientific claims are judged. With replication, independent researchers address a scientific hypothesis and build up evidence for, or against, it. This methodology represents the self-correcting path that science should take to ensure robust discoveries, separating science from pseudoscience. Reports indicate increasing pressure to publish manuscripts whilst applying for competitive grants and positions (Baker, 2016). Furthermore, a growing burden of bureaucracy takes away precious time for designing experiments and doing research. As the time available for actual research decreases, the number of articles that mention a “reproducibility crisis” is rising towards its present-day peak (Fig. 2). Does this mean we have become sloppy in terms of proper documentation?

Figure 2: Number of titles, abstracts, or keywords that contain one of the following phrases: “reproducibility crisis,” “scientific crisis,” “science in crisis,” “crisis in science,” “replication crisis,” or “replicability crisis,” found in the Web of Science records. Modified from Fanelli (2018).

Are we facing a reproducibility crisis?

A survey conducted by Nature asked 1,576 researchers this exact question, and reported that 52% responded with “Yes, a significant crisis,” and 38% with “Yes, a slight crisis” (Baker, 2016). Perhaps more alarming is that 70% report they have unsuccessfully tried to reproduce another scientist’s findings, and more than half have failed to reproduce their own results. To what degree these statistics apply to our own field of geodynamics is not clear, but it is nonetheless a timely reminder that reproducibility must remain at the forefront of our dissemination. Multiple journals have implemented policies on data and software sharing upon publication to ensure that computational science remains replicable and reproducible. But how well are they working? A recent empirical analysis of journal policy effectiveness for computational reproducibility sheds light on this issue (Stodden et al., 2018). The study randomly selected 204 papers published in Science after the implementation of its code and data sharing policy. Of these articles, 24 contained sufficient information, whereas for the remaining 180 publications the authors had to be contacted directly. Only 131 authors replied to the request; of these, 36% provided some of the requested material and 7% simply refused to share code and data. Apparently the implementation of policies was not enough, and there is still a lot of confusion among researchers when it comes to obligations related to data and code sharing, as some of the anonymized responses highlighted by Stodden et al. (2018) underline.

Putting aside for the moment that you are, in many cases, obliged to share your code and data to enhance reproducibility, are there any additional motivating factors in making your computational research reproducible? Freire et al. (2012) list a few simple benefits of reproducible research:

1. Reproducible research is well cited. A study (Vandewalle et al., 2009) found that published articles that reported reproducible results have higher impact and visibility.

2. Code and software comparisons. Well documented computational research allows software developed for similar purposes to be compared in terms of performance (e.g. efficiency and accuracy). This can potentially reveal interesting and publishable differences between seemingly identical programs.

3. Efficient communication of science between researchers. Newcomers to a field of research can more efficiently understand how to modify and extend an existing program, allowing them to more easily build upon recently published discoveries (this is simply the positive counterpart to the argument made against software sharing earlier).

“Replicability is not reproducibility: nor is it good science.” – Drummond (2009)

I have discussed reproducibility over quite a few paragraphs already, without yet giving it a proper definition. What precisely is reproducibility? Drummond (2009) proposes a distinction between reproducibility and replicability. He argues that reproducibility requires, at minimum, minor changes in the experiment or model setup, while replication is an identical setup. In other words, reproducibility refers to a phenomenon that can be predicted to recur with slightly different experimental conditions, while replicability describes the ability to obtain an identical result when an experiment is performed under precisely the same conditions. I think this distinction makes the utmost sense in computational science, because if all software, data, post-processing scripts, random number seeds, and so on are shared and reported properly, the results should indeed be identical. However, replicability does not ensure the validity of the scientific discovery. A robust discovery made using computational methods should be reproducible with different software (made for similar purposes, of course) and with small perturbations to the input data, such as initial conditions, physical parameters, etc. This is critical because we rarely, if ever, know the model inputs with zero error bars. A way for authors to address such issues is to include a sensitivity analysis of different parameters, initial conditions, and boundary conditions in the publication or the supplementary material, as sketched below.
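
As a rough illustration, here is a minimal Python sketch of such a sensitivity analysis; the model function, its diagnostic, and the parameter values are hypothetical stand-ins for a real simulation:

```python
# A minimal sketch of a parameter sensitivity check. The "model" below is
# a hypothetical stand-in for a real simulation; only the pattern matters.

def model(rayleigh_number):
    """Return a hypothetical scalar diagnostic, e.g. a mean surface velocity."""
    return 0.1 * rayleigh_number ** (2.0 / 3.0)

reference_input = 1.0e6  # hypothetical reference value of the input parameter
reference_output = model(reference_input)

# Perturb the input by a few percent and record the relative change in the
# output diagnostic; a robust result should respond smoothly and modestly.
for perturbation in [-0.05, -0.01, 0.01, 0.05]:
    output = model(reference_input * (1.0 + perturbation))
    relative_change = (output - reference_output) / reference_output
    print(f"input {perturbation:+.0%} -> output {relative_change:+.2%}")
```

A discovery that vanishes under a one percent perturbation of an uncertain input deserves extra scrutiny before it is reported.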

Figure 3: Illustration of the “spectrum of reproducibility”, ranging from not reproducible to the gold standard that includes code, data, and executable files that can directly replicate the reported results. Modified from Peng (2011).

However, the gold standard of reproducibility in computation-involved science, like geodynamics, is often described as what Drummond would classify as replication (Fig. 3). That is, making all data and code available for others to easily execute. Even though this does not ensure reproducibility (only replicability), it provides other researchers a level of detail regarding the workflow and analysis that is beyond what can usually be achieved using common language. And this deeper understanding can be crucial when trying to reproduce (and not replicate) the original results. Thus, replication is a natural step towards reproduction. Open-source community codes for geodynamics, like e.g. ASPECT (Heister et al., 2017), and more general FEM libraries like FEniCS (Logg et al., 2012), allow for friction-free replication of results. An input file describing the model setup provides a 1-to-1 relation to the actual results¹ (which in many cases is reasonable because the data are too large to be easily shared). Thus, sharing the post-processing scripts accompanied by the input file on e.g. GitHub will allow for complete replication of the results, at low cost in terms of data storage.
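
Since exact replication also depends on the computational environment (see footnote 1), it can help to record that environment automatically next to the model output. Here is a minimal Python sketch of the idea; the snapshot file name and recorded fields are arbitrary choices, not a standard:

```python
# A minimal sketch that writes a snapshot of the computational environment
# next to the model output, so a replication attempt knows what the
# results were produced with.
import json
import platform
import subprocess
import sys

def git_commit():
    """Return the current git commit hash, or 'unknown' outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True,
        )
        return result.stdout.strip() or "unknown"
    except FileNotFoundError:  # git is not installed
        return "unknown"

snapshot = {
    "python": sys.version,
    "os": platform.platform(),
    "machine": platform.machine(),
    "git_commit": git_commit(),  # exact state of the version-controlled code
}

# The file name is arbitrary; anything stored alongside the results works.
with open("environment_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```

For compiled software, the compiler name and version and the versions of linked libraries would belong in the same snapshot.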

Light at the end of the tunnel?

In order to improve practices for reproducibility, contributions will need to come from multiple directions. The community needs to develop, encourage, and maintain a culture of reproducibility. Journals and funding agencies can play an important role here. The American Geophysical Union (AGU) has shared a list of best practices regarding research data² associated with a publication:

• Deposit the data in support of your publication in a leading domain repository that handles such data.

• If a domain repository is not available for some or all of your data, deposit your data in a general repository such as Zenodo, Dryad, or Figshare. All of these repositories can assign a DOI to deposited data; alternatively, use your institution’s archive.

• Data should not be listed as “available from authors.”

• Make sure that the data are available publicly at the time of publication and available to reviewers at submission—if you are unable to upload to a public repository before submission, you may provide access through an embargoed version in a repository or in datasets or tables uploaded with your submission (Zenodo, Dryad, Figshare, and some domain repositories provide embargoed access.) Questions about this should be sent to journal staff.

• Cite data or code sets used in your study as part of the reference list. Citations should follow the Joint Declaration of Data Citation Principles.

• Develop and deposit software in GitHub, where it can be cited, or include simple scripts in a supplement. Code in GitHub can be archived separately and assigned a DOI through Zenodo for submission.

In addition to best practice guidelines, wonderful initiatives from other communities include research prizes. The European College of Neuropsychopharmacology offers an 11,800 USD award for negative results, more specifically for careful experiments that do not confirm an accepted hypothesis or previous result. Another example is the International Organization for Human Brain Mapping, which awards 2,000 USD for the best replication study, successful or not. Whilst not a prize per se, at recent EGU General Assemblies in Vienna the GD community have held sessions around the theme of failed models. Hopefully, similar initiatives will lead by example so that others in the community will follow.

¹ To obtain the exact same results, information about the software version, compilers, operating system, etc. would also typically be needed.

² AGU’s definition of data here includes all code, software, data, methods, and protocols used to produce the results.

References

AGU, Best Practices. https://publications.agu.org/author-resource-center/publication-policies/datapolicy/data-policy-faq/ Accessed: 2018-08-31.

Baker, Monya. Reproducibility crisis? Nature, 533:26, 2016.

Drummond, Chris. Replicability is not reproducibility: nor is it good science. 2009.

Fanelli, Daniele. Opinion: Is science really facing a reproducibility crisis, and do we need it to? Proceedings of the National Academy of Sciences, 115(11):2628–2631, 2018.

Freire, Juliana; Bonnet, Philippe, and Shasha, Dennis. Computational reproducibility: state-of-the-art, challenges, and database research opportunities. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 593–596. ACM, 2012.

Gerya, Taras. Precambrian geodynamics: concepts and models. Gondwana Research, 25(2):442–463, 2014.

Heister, Timo; Dannberg, Juliane; Gassmöller, Rene, and Bangerth, Wolfgang. High accuracy mantle convection simulation through modern numerical methods – II: Realistic models and problems. Geophysical Journal International, 210(2):833–851, 2017. doi: 10.1093/gji/ggx195. URL https://doi.org/10.1093/gji/ggx195.

Logg, Anders; Mardal, Kent-Andre; Wells, Garth N., et al. Automated Solution of Differential Equations by the Finite Element Method. Springer, 2012. ISBN 978-3-642-23098-1. doi: 10.1007/978-3-642-23099-8.

Peng, Roger D. Reproducible research in computational science. Science, 334(6060):1226–1227, 2011.

Popper, Karl Raimund. The Logic of Scientific Discovery. University Press, 1959.

Stodden, Victoria; Seiler, Jennifer, and Ma, Zhaokun. An empirical analysis of journal policy effectiveness for computational reproducibility. Proceedings of the National Academy of Sciences, 115(11):2584–2589, 2018.

Vandewalle, Patrick; Kovacevic, Jelena, and Vetterli, Martin. Reproducible research in signal processing. IEEE Signal Processing Magazine, 26(3), 2009.