How to increase reproducibility and transparency in your research

Contemporary science faces many challenges in publishing reproducible results, owing to the increased use of data and digital technologies as well as heightened demands on scholarly communication. These challenges have led to widespread calls from the science community for more research transparency, accessibility, and reproducibility. This article presents current findings and solutions to these problems, including new software that makes writing submission-ready manuscripts for journals of Copernicus Publications a lot easier.

While it can be debated whether science really faces a reproducibility crisis, the challenges of computer-based research have sparked numerous articles on new good practices in research and their evaluation. These challenges have also driven researchers to develop infrastructure and tools that help scientists write articles, publish data, share code for computations, and communicate their findings in a reproducible way, for example Jupyter, ReproZip, and research compendia.

Recent studies showed that the geosciences and geographic information science, like other domains, are not immune to problems with reproducibility. Consequently, more and more journals have adopted policies on sharing data and code. However, it is equally important that scientists foster an open research culture and teach researchers how to adopt more transparent and reproducible workflows, for example through skill-building workshops offered by fellow researchers at conferences (such as the EGU short courses), community-led non-profit organisations such as the Carpentries, open courses for students, small discussion groups at research labs, or individual self-learning. In light of the lack of a common definition of reproducibility, Philip Stark, a statistics professor and associate dean of mathematical and physical sciences at the University of California, Berkeley, recently coined the term preproducibility: “An experiment or analysis is preproducible if it has been described in adequate detail for others to undertake it.” The neologism is intended to reduce confusion and to encourage a positive attitude towards more openness, honesty, and helpfulness in scholarly communication.

In the spirit of these activities, this article describes a modern workflow made possible by recent software releases. The new features allow the EGU community to write preproducible manuscripts for submission to the large variety of academic journals published by Copernicus Publications. For some researchers, adopting the new workflow may take considerable effort, but it pays off through increased transparency and effectiveness, especially for early career scientists. An open and reproducible workflow enables researchers to build on their own and others' previous work and to collaborate better on solving today's societal challenges.

Reproducible research manuscripts

Open digital notebooks, which interweave text, data, and code and can be exported to different output formats such as PDF, are a powerful means of improving the transparency and preproducibility of research. Jupyter Notebook, Stencila, and R Markdown let researchers combine the long-form text of a publication and the source code for analysis and visualisation in a single document. Having text and code side by side makes both easier to grasp and ensures consistency, because each rendering of the document executes the whole workflow using the original data. Caching of long-running computations is possible, and researchers working with supercomputing infrastructures or huge datasets may limit the executed code to visualisation, using processed data as input. Authors can transparently expose specific code snippets to readers and also publish the complete source code of the document openly for collaboration and review.
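R Markdown documents are executed by the knitr package, and knitr's `cache` chunk option is one common way to get the caching described above. The sketch below assumes a document like the `MyArticle.Rmd` created later in this article and simply switches caching on for every chunk; it is an illustration, not the only or necessarily the best configuration.

```r
# A minimal sketch: place this line in the first (setup) code chunk of an
# R Markdown document such as MyArticle.Rmd. It sets the default for all
# following chunks, so knitr stores their results and re-runs a chunk only
# when its code or chunk options change.
knitr::opts_chunk$set(cache = TRUE)
```

Note that knitr does not notice on its own when an input data file changes, so a cached chunk that reads such a file may need its cache cleared (or additional chunk options) before re-rendering.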

The popular notebook formats are based on plain text, for example Markdown in the case of R Markdown. Therefore, an R Markdown document can be managed with version control software, i.e., programs for managing multiple versions of, and contributions to, the same documents, even by different people. Version control provides traceability of authorship, a time machine for going back to any previous “working” version, and online collaboration, for example on GitLab. This kind of workflow also stops the madness of using file names for versions, yet still lets authors use awesome file names and apply domain-specific guidelines for packaging research.

R Markdown supports different programming languages besides its popular namesake R, and it is a sensible solution even if you neither analyse data with scripts nor have any code in your scholarly manuscript. It is easy to write, allows you to manage your bibliography effectively, and can be used for websites, books, or blogs, but most importantly it does not fall short when it is time to submit a manuscript to a journal.

The rticles extension package for R provides a number of templates for popular journals and publishers. Since version 0.6 (published 9 October 2018), these templates include one that follows the Copernicus Publications manuscript preparation guidelines for authors. The Copernicus Publications staff were kind enough to give a test document a quick review and all seems to be in order, though of course any problems and questions should be directed to the software's vibrant community and not to the publisher.

The following code snippet and screenshot demonstrate the workflow. Lines starting with # are code comments and explain the steps. The code examples provided here are ready to use and only lack the installation commands for the required packages.

# load required R extension packages:
library("rticles")
library("rmarkdown")

# create a new document using a template:
rmarkdown::draft(file = "MyArticle.Rmd",
                 template = "copernicus_article",
                 package = "rticles", edit = FALSE)

# render the source of the document to the default output format:
rmarkdown::render(input = "MyArticle/MyArticle.Rmd")


These commands create a directory with the Copernicus Publications template's files, including an R Markdown (.Rmd) file ready for you to edit (left-hand side of the screenshot), a LaTeX (.tex) file for submission to the publisher, and a .pdf file for inspecting the final result and sharing it with your colleagues (right-hand side of the screenshot). You can see how simple it is to format text, insert citations, chemical formulas, or equations, and add figures, and how these are rendered into a high-quality output file.

All of these steps can also be completed with user-friendly forms in RStudio, a popular development and authoring environment available for all major operating systems. The left-hand side of the following screenshot shows the form for creating a new document based on a template, and the right-hand side shows the menu for rendering, which R Markdown calls “knitting” because code and text are combined into one document like threads in a garment.

And if you decide at the last minute to submit to a different journal, rticles supports many publishers, so you only have to switch the template while the content stays the same.

Sustainable access to supplemental data

Today, data should be published in appropriate research data repositories following the FAIR data principles, and cited properly. Journals require authors to follow these principles; see for example the Copernicus Publications data policy or a recent announcement by Nature. Other publishers required authors, and some still do, to store supplemental information (SI), such as dataset files, extra figures, or extensive descriptions of experimental procedures, as part of the article. Usually only the article itself receives a digital object identifier (DOI) for long-term identification and availability. The DOI minted by the publisher is not suitable for direct access to supplemental files, because it points to a landing page about the identified object. This landing page is designed to be read by humans, not by computers.

The R package suppdata closes this gap: it supports downloading supplemental information using the article's DOI. This way, suppdata enables long-term reproducible access to data that were published as SI in the past, or in exceptional cases today, for example if you write about a reproduction of a published article. In the latest version, available from GitHub (suppdata is on its way to CRAN), the supported publishers include Copernicus Publications. The following example code downloads a data file for the article “Divergence of seafloor elevation and sea level rise in coral reef ecosystems” by Yates et al., published in Biogeosciences in 2017. The code then creates the mostly meaningless plot shown below.

# load required R extension package:
library("suppdata")

# download a specific supplemental information (SI) file
# for an article using the article's DOI:
csv_file <- suppdata::suppdata(
  x = "10.5194/bg-14-1739-2017",
  si = "Table S1 v2 UFK FOR_PUBLICATION.csv")

# read the data and plot it (toy example!):
my_data <- read.csv(file = csv_file, skip = 3)
plot(x = my_data$NAVD88_G03, y = my_data$RASTERVALU,
     xlab = "Historical elevation (NAVD88 GEOID03)",
     ylab = "LiDAR elevation (NAVD88 GEOID03)",
     main = "A data plot for article 10.5194/bg-14-1739-2017",
     pch = 20, cex = 0.5)


Main takeaways

Authoring submission-ready manuscripts for journals of Copernicus Publications just got a lot easier. Everybody who can write manuscripts with a word processor can quickly learn R Markdown and benefit from a preproducible data science workflow. Digital notebooks not only improve day-to-day research habits; the same workflow is also suitable for authoring high-quality scholarly manuscripts and graphics. The interaction with the publisher is smooth thanks to the LaTeX submission format, but you never have to write any LaTeX. The workflow is based on an established Free and Open Source software stack and embraces the idea of preproducibility and the principles of Open Science. The software is maintained by an active, growing, and welcoming community of researchers and developers with a strong connection to the geospatial sciences. Because of the complete and consistent notebook, you, a colleague, or a student can easily pick up the work at a later time. The road to effective and transparent research begins with a first step – take it!


The software updates were contributed by Daniel Nüst from the project Opening Reproducible Research (o2r) at the Institute for Geoinformatics, University of Münster, Germany, but would not have been possible without the support of Copernicus Publications, the software maintainers, most notably Yihui Xie and Will Pearse, and the general awesomeness of the R, R-spatial, Open Science, and Reproducible Research communities. The blog text was greatly improved with feedback from EGU's Olivia Trani and Copernicus Publications' Xenia van Edig. Thank you!

By Daniel Nüst, researcher at the Institute for Geoinformatics, University of Münster, Germany

[This article is cross-posted on the Opening Reproducible Research project blog]


Preprint power: changing the publishing scene

Open access publishing has become common practice in the science community. In this guest post, David Fernández-Blanco, a contributor to the EGU Tectonics and Structural Geology Division blog, presents one facet of open access that is changing the publishing system for many geoscientists: preprints.

Open access initiatives confronting the publishing system

The idea of open access publishing and freely sharing research outputs is becoming widely embraced by the scientific community. The limitations of traditional publishing practices and the misuse of this system are some of the key drivers behind the rise of open access initiatives. Additionally, the open access movement has been pushed even further by current online capacities to widely share research as it is produced.

Efforts to make open access the norm in publishing have been under way for quite some time now. For example, almost two decades ago, the European Geosciences Union (EGU) launched its first open access journals, which hold research papers open for interactive online discussion. The EGU also allows manuscripts to be reviewed online by anyone in the community before they are finally published in its peer-reviewed journals.

This trend is also now starting to be reflected at an institutional level. For example, all publicly funded scientific papers in Europe could be free to access by 2020, thanks to a reform promoted in 2016 by Carlos Moedas, the European Union’s Commissioner for Research, Science and Innovation.

More recently, in late 2017, around 200 German universities and research organisations declined to renew their Elsevier subscriptions because their demands for lower prices and an open access policy were not met. Similarly, French institutions refused a new deal with Springer in early 2018. Now, Swedish researchers have followed suit, deciding to cancel their agreement with Elsevier. All these international initiatives are confronting the established publishing system.

The community-driven revolution

Within this context, it’s no surprise that the scientific community has come up with various exciting initiatives that promote open access, such as creating servers to share preprints. Preprints are scientific contributions ready to be shared with other scientists, but that are not yet (or are in the process of being) peer-reviewed. A preprint server is an online platform hosting preprints and making them freely available online.

Many journals that were slow to accept these servers are updating their policies to adapt to the steady growth in preprint usage across a wide range of scientific communities. Most journals now welcome manuscripts hosted on a preprint server. Even employers and funding agencies are changing their policies. For example, the European Research Council (ERC) Starting and Consolidator Grants now take applicants' preprints into consideration.

Preprints: changing the publishing system

arXiv is the oldest and most established preprint server. It was created in 1991 and was initially directed towards physics research. The server receives on average 10,000 submissions per month and now hosts over one million manuscripts. arXiv set a precedent for preprints, and servers covering other scientific fields have since emerged, such as bioRxiv and ChemRxiv.

EarthArXiv was the first to fill the preprint gap for the Earth sciences. It was launched in October 2017 by Tom Narock, an assistant professor at Notre Dame of Maryland University in Baltimore (US), and Christopher Jackson, a professor at Imperial College London (UK). In the first 24 hours after its online launch, this preprint server already had nine submissions from geoscientists.

The server now holds more than 400 preprints, approved for publication after moderation, and gets around 1,600 downloads monthly. The platform's policy may well contribute to its success: EarthArXiv is an independent preprint server strongly supported by the Earth sciences community and is now run by 125 volunteers. The logo, for example, was a crowdsourcing effort. Through social media, EarthArXiv asked the online community to send their designs; then a poll was held to decide which of the submitted logos would be selected. Additionally, the server's Diversity Statement and Moderation Policy were both developed communally.

In February 2018, some months after EarthArXiv went live, another platform serving the Earth sciences was born: the American Geophysical Union's Earth and Space Science Open Archive, ESSOAr. The two platforms take markedly different approaches: ESSOAr is partially supported by Wiley, a publishing company, while EarthArXiv is independent of any publisher. ESSOAr is gaining momentum by hosting conference posters, while EarthArXiv plans to focus on preprint manuscripts, at least for the near future. ESSOAr currently hosts 120 posters and nine preprints.

What is the power of preprints?

How can researchers benefit from these new online sources?

No delays:

Preprint servers allow rapid dissemination. Through preprints, new scientific findings are shared directly with other scientists. The manuscript is immediately available after being uploaded, meaning it is searchable right away. There is no delay for peer-review, editorial decisions, or lengthy journal production.


A DOI is assigned to the work, so it is citable as soon as it is uploaded. This is especially helpful to early career scientists seeking employment and funding opportunities, as they can show and prove their scholarly track record at any point.


Making research visible to the community can lead to helpful feedback and constructive, transparent discussions. Some servers and participating authors have promoted their preprints through social media, in many cases initiating productive conversations with fellow scientists. Hence, preprints not only promote healthy exchanges but may also lead to improvements to the initial manuscript. These exchanges, which occur outside the journal-led peer-review route, also make it possible to network and build collaborative links with fellow scientists.

No boundaries:

Preprints allow everyone to have access to science, making knowledge available across boundaries.

The servers are open to everyone, free of charge, forever. This also means taxpayers have free access to the science they pay for.


Preprint servers are a useful way to self-archive documents. Many preprint servers also host postprints, i.e., articles that have already been published (after the embargo period that some journals apply).

Given the difference between the publishing industry's current model and preprint practices, it is not surprising to find an increasing number of scientists driving the preprint movement. Many of these researchers may be motivated by a wish to contribute to a transparent process and to promote open science within their community and among the public. This motivation is indeed the true power of preprints.

Editor’s note: This is a guest blog post that expresses the opinion of its author, whose views may differ from those of the European Geosciences Union. We hope the post can serve to generate discussion and a civilised debate amongst our readers.

Geosciences Column: Landslide risk in a changing climate, and what that means for Europe’s roads

If your morning commute is already frustrating, get ready to buckle up. Our climate is changing, and that may increasingly affect some of central Europe’s major roads and railways, according to new research published in the EGU’s open access journal Natural Hazards and Earth System Sciences. The study found that, in the face of climate change, landslide-inducing rainfall events will increase in frequency over the century, putting central Europe’s transport infrastructure more at risk.  

How do landslides affect us?

Landslides that block off transportation corridors present many direct and indirect issues. Not only can these disruptions cause injuries and heavy delays, but in broader terms, they can negatively affect a region’s economic wellbeing.

One study, for instance, published in Procedia Engineering in 2016, examined the economic impact of four landslides on Scotland's road network and estimated that the direct cost of the hazards was between £400,000 and £1,700,000. Furthermore, the study estimated the consequential costs of the landslides at around £180,000 to £1,400,000.

Such landslides can have a societal impact on European communities as well, as disruptions to road and railway networks can impact access to daily goods, community services, and healthcare, the authors of the EGU study explain.

Modelling climate risk

To analyse climate patterns and how they might affect hazard risk in central Europe, the researchers first ran a set of global climate models, simulations that predict how the climate system will respond to different greenhouse gas emission scenarios. Specifically, the scientists ran climate projections based on the Intergovernmental Panel on Climate Change’s A1B socio-economic pathway, a scenario defined by rapid economic growth, technological advances, reduced cultural and economic inequality, a population peak by 2050, and a balanced reliance on different energy sources.

They then determined how often the conditions in their climate projections would trigger landslide events in central Europe, using a climate index that estimates landslide potential from the duration and intensity of rainfall events. The index, established by Fausto Guzzetti of the National Research Council of Italy and his colleagues, suggests that landslide activity is most likely to occur when a rainfall event satisfies the following three conditions: the event lasts more than three days, the total rainfall exceeds 37.3 mm, and at least one day of the rainfall period receives more than 25.6 mm.
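To make the three conditions concrete, here is a minimal R sketch that checks a single rainfall event against them. It is not the study's code: `meets_landslide_conditions` is a hypothetical helper, and it assumes the event is given as a vector of daily rainfall totals in mm, one value per day of the event, with the thresholds taken from the summary above.

```r
# Hypothetical helper, not the published index implementation: applies the
# three conditions summarised above to one rainfall event, given as daily
# rainfall totals in mm (one element per day of the event).
meets_landslide_conditions <- function(daily_mm,
                                       min_days = 3,        # event must last MORE than this many days
                                       min_total_mm = 37.3, # event total must EXCEED this
                                       min_peak_mm = 25.6) {# at least one day must EXCEED this
  stopifnot(is.numeric(daily_mm), all(daily_mm >= 0))
  lasts_long   <- length(daily_mm) > min_days
  enough_total <- sum(daily_mm) > min_total_mm
  intense_day  <- any(daily_mm > min_peak_mm)
  # All three conditions must hold at the same time
  lasts_long && enough_total && intense_day
}

# Usage examples:
meets_landslide_conditions(c(5, 30, 4, 2))   # TRUE: 4 days, 41 mm, one day of 30 mm
meets_landslide_conditions(c(10, 12, 9, 8))  # FALSE: 39 mm, but no day above 25.6 mm
meets_landslide_conditions(c(20, 30, 10))    # FALSE: only 3 days long
```

Keeping the thresholds as arguments makes it easy to see how sensitive the classification of an event is to each limit; counting such events per year in a projected rainfall series would then give the kind of frequency the researchers compared across time periods.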

The researchers also incorporated into their models data on central Europe’s road infrastructure as well as the region’s geology, including topography, sensitivity to erosion, soil properties and land cover.

Overview of a particularly risk-prone region along the lowlands of Alsace and the Black Forest mountain range: (a) location of the region in central Europe and median of the increase in landslide-triggering climate events for (b) the near future and (c) the remote future.

The fate of Europe’s roadways

The results of the researchers' models suggest that the number of landslide-triggering rainfall events will increase from now until 2100. Their simulations also suggest that while these hazardous rainfall events increase only slightly in frequency between 2021 and 2050, the increase will be more pronounced between 2050 and 2100.

While the flat, low-altitude areas of central Europe will only experience minor increases in landslide-inducing rainfall activity, regions with high elevation, like uplands and Alpine forests, are most at risk, their findings suggest.

The study found that many locations along the north side of the Alps in France, Germany, Austria and the Czech Republic may face up to seven additional landslide-triggering rainfall events as our climate changes. This includes the Vosges, the Black Forest, the Swabian Jura, the Bergisches Land, the Jura Mountains, the Northern Limestone Alps foothills, the Bohemian Forest, and the Austrian and Bavarian Alpine forestlands.

The researchers go on to explain that many of the main corridors of the Trans-European Transport Network will be more exposed to landslide-inducing rainfall activity, especially the Rhine-Danube, the Scandinavian-Mediterranean, the Rhine-Alpine, the North Sea-Mediterranean, and the North Sea-Baltic corridors.

The scientists involved with the study hope that their findings will help European policy makers make informed plans and strategies when developing and maintaining the continent's infrastructure.

The publication issue: the opinions of EGU early career scientists!

The EGU's General Assemblies have a long tradition of Great Debates: sessions of Union-wide interest which aim to discuss some of the greatest challenges faced by our discipline. Past topics have included the exploitation of mineral resources on the sea bed, water security for an ever-growing population, and climate geoengineering, to name but a few. This year's meeting saw the first Great Debate aimed specifically at an Early Career Scientist (ECS) audience, and it also boasted an innovative format: a set of group debates on the question “Should early career scientists be judged by their publication record?” Today's post, written by Mathew Stiller-Reeve, a convener of the session, summarises some of the main outcomes of the discussion.

We, early career scientists, are told that we need to become expert writers, presenters, and teachers if we are going to make it in the world of research. Many of us agree that such transferable skills are extremely important. But investing time in developing these skills sometimes feels like time wasted. When all is said and done, we only seem to be judged on our publication record and our h-index: how many papers have we published in high-impact journals, and how often have they been cited?

Early career scientists seem very clued up on transferable skills and want to invest in them. We therefore wanted to hear from them about whether ‘early career scientists [should] be judged mainly on their publication record’, and so we put this question to them (and others) at a Great Debate at the EGU's 2017 General Assembly. We also wanted to test a new format in which the audience had the opportunity to voice their opinions about important issues concerning modern academia. The publication issue affects us all, so we should have a say.

With only 8 people at each table and over 40 minutes to debate, everyone had an opportunity to speak their mind and contribute to developing solutions. The room was buzzing with over 100 early career and more established scientists discussing, agreeing, disagreeing, and finding compromises.

At the end, each table was tasked with boiling its thoughts down to one or two policy-type statements. These statements will be presented to the EGU Council to inform it of where EGU early career scientists stand on this matter.

So without further ado, here are the conclusions of the tables:

- We need more criteria. Quality is most important, measured by prizes, PhD results, and the involvement of the community via new media.

- More activities need to be taken into account in a measurable way, but according to scaled categories #notjustanumber.

- The current system is cheap, easy, and fast. A person should be judged on their broader contributions to society, to their colleagues, and to their disciplines. We should move beyond metrics.

- Because scientists are more than a list of publications, assess them individually. Talk to them and read their output, including publications, blogs, and chapter/book contributions.

- We should not be judged on publication record alone. We need a multi-faceted set of criteria for judging impact beyond academic publications.

- One suggestion is a weighted metric, depending on the position you're applying for, which considers other factors such as teaching, outreach, conference participation, etc.

- No, the h-index should not be the sole number, even though it is not a totally useless number.

- Quality should count for more than quantity, and the large number of authors on publications devalues the contributions of early career scientists.

- Publications are the accepted way of communication in science, but no single number describes the quality of an early career scientist, who, in our humble opinion, should be judged not only on the quantity of their papers but also on their quality, as part of a complete set of research skills that includes other contributions such as project development.

- We acknowledge the publication record as a reliable metric, but we suggest an additional step for assessing applications, based on video or audio presentations, to emphasise your other outstanding qualities.

- We doubt that we are mainly judged on our publication record, and we think that publications should be part of what we are judged on.

- When hiring, follow the example of the Medical Department at Utrecht University: only ask for the 3 papers, teaching or outreach experiences that applicants think are most important for the position they are applying for. We are more than numbers.

Should they be adopted? Do you agree? How can we adopt them?

The message in many of the statements from the Early Career Scientists at the European Geosciences Union is quite clear: We are more than numbers! Several suggestions arose from the debate: new metrics, video presentations, and even new application processes. Now the statements from the debate are recorded. This will hopefully inspire us (and others) to find better solutions. At the very least, the discussion has begun. Solutions are impossible if we don’t talk!

By Mathew Stiller-Reeve, co-founder of ClimateSnack and researcher at Bjerknes Centre for Climate Research, Norway

Editor’s note: This is a guest blog post that expresses the opinion of its author and those who participated at the Great Debate during the General Assembly, whose views may differ from those of the European Geosciences Union. We hope the post can serve to generate discussion and a civilised debate amongst our readers.