How to increase reproducibility and transparency in your research

Contemporary science faces many challenges in publishing results that are reproducible. This is due to the increased use of data and digital technologies as well as heightened demands for scholarly communication. These challenges have led to widespread calls from the science community for more research transparency, accessibility, and reproducibility. This article presents current findings and solutions to these problems, including recent software that makes writing submission-ready manuscripts for journals of Copernicus Publications a lot easier.

While it can be debated if science really faces a reproducibility crisis, the challenges of computer-based research have sparked numerous articles on new good research practices and their evaluation. The challenges have also driven researchers to develop infrastructure and tools to help scientists effectively write articles, publish data, share code for computations, and communicate their findings in a reproducible way, for example Jupyter, ReproZip and research compendia.

Recent studies showed that the geosciences and geographic information science are not immune to issues with reproducibility, just like other domains. Therefore, more and more journals have adopted policies on sharing data and code. However, it is equally important that scientists foster an open research culture and teach researchers how to adopt more transparent and reproducible workflows, for example at skill-building workshops at conferences offered by fellow researchers, such as the EGU short courses, through community-led non-profit organisations such as the Carpentries, in open courses for students, in small discussion groups at research labs, or through individual efforts of self-learning. Because there is still no common definition of reproducibility, Philip Stark, a statistics professor and associate dean of mathematical and physical sciences at the University of California, Berkeley, recently coined the term preproducibility: “An experiment or analysis is preproducible if it has been described in adequate detail for others to undertake it.” The neologism is intended to reduce confusion and to encourage a positive attitude towards more openness, honesty, and helpfulness in scholarly communication processes.

In the spirit of these activities, this article describes a modern workflow made possible by recent software releases. The new features allow the EGU community to write preproducible manuscripts for submission to the large variety of academic journals published by Copernicus Publications. The new workflow might require some effort to adopt, but it pays off through an increase in transparency and effectiveness. This is especially the case for early career scientists. An open and reproducible workflow enables researchers to build on their own and others’ previous work and to collaborate better on solving the societal challenges of today.

Reproducible research manuscripts

Open digital notebooks, which interweave data and code and can be exported to different output formats such as PDF, are powerful means to improve transparency and preproducibility of research. Jupyter Notebook, Stencila and R Markdown let researchers combine long-form text of a publication and source code for analysis and visualisation in a single document. Having text and code side-by-side makes them easier to grasp and ensures consistency, because each rendering of the document executes the whole workflow using the original data. Caching for long-lasting computations is possible, and researchers working with supercomputing infrastructures or huge datasets may limit the executed code to purposes of visualisation using processed data as input. Authors can transparently expose specific code snippets to readers but also publish the complete source code of the document openly for collaboration and review.
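To make the idea of text and code in one file concrete, here is a minimal sketch of what such a notebook can look like. It writes a tiny R Markdown document from R. The file name, the title, and the chunk label `speed-plot` are made up for illustration, and `cache=TRUE` is the knitr chunk option for caching long-running computations. The chunk delimiter is built with `strrep()` only so that the example can be displayed inside a code block.

```r
# Build the text of a minimal R Markdown document line by line.
# The chunk delimiter (three backticks) is generated with strrep() only so
# that this example can itself be shown inside a code block.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"A minimal notebook\"",
  "output: pdf_document",
  "---",
  "",
  # Prose with an inline R expression that is evaluated on each rendering:
  "The built-in `cars` dataset has `r nrow(cars)` observations.",
  "",
  # A code chunk; cache=TRUE asks knitr to reuse results of unchanged chunks:
  paste0(fence, "{r speed-plot, cache=TRUE}"),
  "plot(cars$speed, cars$dist)",
  fence
)

# Write the document to a temporary file (hypothetical file name).
rmd_file <- file.path(tempdir(), "minimal-notebook.Rmd")
writeLines(rmd_lines, rmd_file)

# Rendering needs the rmarkdown package and, for PDF output, a LaTeX
# installation, so it is left commented out here:
# rmarkdown::render(rmd_file)
```

Every rendering re-runs the chunk and the inline expression, which is why the numbers in the text cannot drift away from the data.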

The popular notebook formats are plain text-based, like Markdown in the case of R Markdown. Therefore an R Markdown document can be managed with version control software, that is, programs for managing multiple versions of and contributions to the same documents, even by different people. Version control provides traceability of authorship, a time machine for going back to any previous “working” version, and online collaboration such as on GitLab. This kind of workflow also stops the madness of using file names for versions yet still lets authors use awesome file names and apply domain-specific guidelines for packaging research.
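Why does plain text matter for version control? Because changes can be compared line by line. The following toy sketch uses only base R and two made-up versions of a manuscript section. It is not how Git or other version control programs compute differences, but it illustrates the kind of information they can show for text files.

```r
# Two versions of the same (hypothetical) manuscript section,
# stored as plain text with one sentence per line.
version_1 <- c("## Introduction", "Sea level is rising.", "## Methods")
version_2 <- c("## Introduction", "Global mean sea level is rising.", "## Methods")

# Lines that only exist in the old or in the new version:
removed <- setdiff(version_1, version_2)
added   <- setdiff(version_2, version_1)

# Print the change in a diff-like notation:
cat("- ", removed, "\n+ ", added, "\n", sep = "")
```

A binary word processor file does not lend itself to such a readable comparison, which is one reason the plain-text formats work so well with version control.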

R Markdown supports different programming languages besides its popular namesake R and is a sensible solution even if you do not analyse data with scripts or have any code in your scholarly manuscript. It is easy to write, allows you to manage your bibliography effectively, and can be used for websites, books or blogs. Most importantly, it does not fall short when it is time to submit a manuscript to a journal.

The rticles extension package for R provides a number of templates for popular journals and publishers. Since version 0.6 (published 9 October 2018) these templates include one that follows the Copernicus Publications manuscript preparation guidelines for authors. The Copernicus Publications staff was kind enough to give a test document a quick review and all seems in order, though of course any problems and questions should be directed to the software’s vibrant community and not to the publisher.

The following code snippet and screenshot demonstrate the workflow. Lines starting with # are code comments and explain the steps. Code examples provided here are ready to use and only lack the installation commands for required packages.

# load required R extension packages:
library("rticles")
library("rmarkdown")

# create a new document using a template:
rmarkdown::draft(file = "MyArticle.Rmd",
                 template = "copernicus_article",
                 package = "rticles", edit = FALSE)

# render the source of the document to the default output format:
rmarkdown::render(input = "MyArticle/MyArticle.Rmd")

The commands created a directory with the Copernicus Publications template’s files, including an R Markdown (.Rmd) file ready to be edited by you (left-hand side of the screenshot), a LaTeX (.tex) file for submission to the publisher, and a .pdf file for inspecting the final results and sharing with your colleagues (right-hand side of the screenshot). You can see how simple it is to format text, insert citations, chemical formulas or equations, and add figures, and how they are rendered into a high-quality output file.

All of these steps may also be completed with user-friendly forms when using RStudio, a popular development and authoring environment available for all operating systems. The left-hand side of the following screenshot shows the form for creating a new document based on a template, and the right-hand side shows the menu for rendering, called “knitting” with R Markdown because code and text are combined into one document like threads in a garment.

And in case you decide last minute to submit to a different journal, rticles supports many publishers so you only have to adjust the template while the whole content stays the same.
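As a rough sketch of what adjusting the template can mean in practice, the snippet below swaps the output format named in a document header. It assumes a simplified header in which the format sits on a single `output:` line. The headers generated by the real templates may be laid out differently and usually carry journal-specific metadata fields that need attention too. The helper `switch_output_format()` is made up for this illustration, while `copernicus_article` and `elsevier_article` are output formats provided by rticles.

```r
# A simplified (hypothetical) document header naming the output format:
header <- c(
  "---",
  "title: \"My article\"",
  "output: rticles::copernicus_article",
  "---"
)

# Illustrative helper: replace the format on the "output:" line,
# leaving all other lines untouched.
switch_output_format <- function(lines, new_format) {
  sub("^output:.*$", paste("output:", new_format), lines)
}

new_header <- switch_output_format(header, "rticles::elsevier_article")
print(new_header)
```

The body of the manuscript, including all text, code chunks and citations, stays exactly as it was.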

Sustainable access to supplemental data

Data published today should be published and properly cited using appropriate research data repositories following the FAIR data principles. Journals require authors to follow these principles, see for example the Copernicus Publications data policy or a recent announcement by Nature. Other publishers required, or still require, authors to store supplemental information (SI), such as dataset files, extra figures, or extensive descriptions of experimental procedures, as part of the article. Usually only the article itself receives a digital object identifier (DOI) for long-term identification and availability. The DOI minted by the publisher is not suitable for direct access to supplemental files, because it points to a landing page about the identified object. This landing page is designed to be read by humans but not by computers.
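To illustrate the point about landing pages, the sketch below turns a DOI into the address of the doi.org resolver. The helper `doi_to_url()` and its loose format check are assumptions made for this example, not part of any package. Opening the resulting address in a browser leads to the article's landing page, not to a data file.

```r
# Illustrative helper: build the resolver URL for a DOI.
doi_to_url <- function(doi) {
  # A loose sanity check of the DOI shape:
  # "10.", a registrant code, "/", and a non-empty suffix.
  if (!grepl("^10\\.[0-9]{4,9}/\\S+$", doi, perl = TRUE)) {
    stop("not a plausible DOI: ", doi)
  }
  paste0("https://doi.org/", doi)
}

# The resolver address for a DOI leads to a human-readable landing page:
landing_page <- doi_to_url("10.5194/bg-14-1739-2017")
print(landing_page)
```

Nothing in this address tells a script where the supplemental files are stored, which is the gap described above.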

The R package suppdata closes this gap. It supports downloading supplemental information using the article’s DOI. This way suppdata enables long-term reproducible data access when data was published as SI in the past or in exceptional cases today, for example if you write about a reproduction of a published article. In the latest version available from GitHub (suppdata is on its way to CRAN) the supported publishers include Copernicus Publications. The following example code downloads a data file for the article “Divergence of seafloor elevation and sea level rise in coral reef ecosystems” by Yates et al. published in Biogeosciences in 2017. The code then creates a mostly meaningless plot shown below.

# load required R extension package:
library("suppdata")

# download a specific supplemental information (SI) file
# for an article using the article's DOI:
csv_file <- suppdata::suppdata(
  x = "10.5194/bg-14-1739-2017",
  si = "Table S1 v2 UFK FOR_PUBLICATION.csv")

# read the data and plot it (toy example!):
my_data <- read.csv(file = csv_file, skip = 3)
plot(x = my_data$NAVD88_G03, y = my_data$RASTERVALU,
     xlab = "Historical elevation (NAVD88 GEOID03)",
     ylab = "LiDAR elevation (NAVD88 GEOID03)",
     main = "A data plot for article 10.5194/bg-14-1739-2017",
     pch = 20, cex = 0.5)

Main takeaways

Authoring submission-ready manuscripts for journals of Copernicus Publications just got a lot easier. Everybody who can write manuscripts with a word processor can quickly learn R Markdown and benefit from a preproducible data science workflow. Digital notebooks not only improve day-to-day research habits, but the same workflow is suitable for authoring high-quality scholarly manuscripts and graphics. The interaction with the publisher is smooth thanks to the LaTeX submission format, but you never have to write any LaTeX. The workflow is based on an established Free and Open Source software stack and embraces the idea of preproducibility and the principles of Open Science. The software is maintained by an active, growing, and welcoming community of researchers and developers with a strong connection to the geospatial sciences. Because of the complete and consistent notebook, you, a colleague, or a student can easily pick up the work at a later time. The road to effective and transparent research begins with a first step. Take it!


The software updates were contributed by Daniel Nüst from the project Opening Reproducible Research (o2r) at the Institute for Geoinformatics, University of Münster, Germany, but would not have been possible without the support of Copernicus Publications, the software maintainers, most notably Yihui Xie and Will Pearse, and the general awesomeness of the R, R-spatial, Open Science, and Reproducible Research communities. The blog text was greatly improved with feedback by EGU’s Olivia Trani and Copernicus Publications’ Xenia van Edig. Thank you!

By Daniel Nüst, researcher at the Institute for Geoinformatics, University of Münster, Germany

[This article is cross-posted on the Opening Reproducible Research project blog]


Geosciences Column: Using volcanoes to study carbon emissions’ long-term environmental effect

In a world where carbon dioxide levels are rapidly rising, how do you study the long-term effect of carbon emissions?

To answer this question, some scientists have turned to Mammoth Mountain, a volcano in California that’s been releasing carbon dioxide for years. Recently, a team of researchers found that this volcanic ecosystem could give clues to how plants respond to elevated levels of carbon dioxide over long periods of time. The scientists suggest that studying carbon-emitting volcanoes could give us a deeper understanding of how climate change will influence terrestrial ecosystems through the decades. The results of their study were published last month in EGU’s open access journal Biogeosciences.

Carbon emissions reached a record high in 2018, as fossil-fuel use contributed roughly 37.1 billion tonnes of carbon dioxide to the atmosphere. Emissions are expected to increase globally if left unabated, and ecologists have been trying to better understand how this trend will impact plant ecology. One popular technique, which involves exposing environments to increased levels of carbon dioxide, has been used since the 1990s to study climate change’s impact.

The method, also known as the Free-Air Carbon dioxide Enrichment (FACE) experiment, has offered valuable insight into this matter, but can only give a short-term perspective. As a result, it’s been more challenging for scientists to study the long-term impact that emissions have on plant communities and ecosystems, according to the new study.

FACE facilities, such as the Nevada Desert FACE Facility, create 21st century atmospheric conditions in an otherwise natural environment. Credit: National Nuclear Security Administration / Nevada Site Office via Wikimedia Commons

Carbon-emitting volcanoes, on the other hand, are often well-studied systems and have been known to emit carbon dioxide for decades to even centuries. For example, experts have been collecting data on gas emissions from Mammoth Mountain, a lava dome complex in eastern California, for almost twenty years. The volcano releases carbon dioxide at high concentrations through faults and fissures on the mountainside, subsequently leaving its forest environment exposed to the emissions. In short, the volcanic ecosystem essentially acts like a natural FACE experiment site.

“This is where long-term localized emissions from volcanic [carbon dioxide] can play a game-changing role in how to assess the long-term [carbon dioxide] effect on ecosystems,” wrote the authors in their published study. Research with longer study periods would also allow scientists to assess climate change’s effect on long-term ecosystem dynamics, including plant acclimation and species dominance shifts.

Through this exploratory study, the researchers sought to better understand whether the long-term ecological response to carbon-emitting volcanoes is actually representative of the ecological impact of increased atmospheric carbon dioxide.

Remotely sensed imagery acquired over Mammoth Mountain, showing (a) maps of soil CO2 flux simulated based on accumulation chamber measurements, shown overlaid on aerial RGB image, (b) above-ground biomass (c) evapotranspiration, and (d) normalized difference vegetation index (NDVI). Credit: K. Cawse-Nicholson et al.

To do so, the scientists analysed characteristics of the forest ecosystem situated on the Mammoth Mountain volcano. With the help of airborne remote-sensing tools, the team measured several ecological variables, including the forest’s canopy greenness, height and nitrogen concentrations, evapotranspiration, and biomass. Additionally they examined the carbon dioxide fluxes within actively degassing areas on Mammoth Mountain.

They used all this data to model the structure, composition, and function of the volcano’s forest, as well as how the ecosystem changes when exposed to increased carbon emissions. Their results revealed that the carbon dioxide fluxes from Mammoth Mountain’s soil were correlated with many of the ecological variables analysed. Additionally, the researchers discovered that parts of the observed environmental impact of the volcano’s emissions were consistent with outcomes from past FACE experiments.

Given the results, the study suggests that these kinds of volcanic systems could work as natural test environments for long-term climate research. “This methodology can be applied to any site that is exposed to elevated [carbon dioxide],” the researchers wrote. Given that some plant communities have been exposed to volcanic emissions for hundreds of years, this method could help paint a more comprehensive picture of our future environment as Earth’s climate changes.

By Olivia Trani, EGU Communications Officer


Cawse-Nicholson, K., Fisher, J. B., Famiglietti, C. A., Braverman, A., Schwandner, F. M., Lewicki, J. L., Townsend, P. A., Schimel, D. S., Pavlick, R., Bormann, K. J., Ferraz, A., Kang, E. L., Ma, P., Bogue, R. R., Youmans, T., and Pieri, D. C.: Ecosystem responses to elevated CO2 using airborne remote sensing at Mammoth Mountain, California, Biogeosciences, 15, 7403-7418, 2018.

Imaggeo on Mondays: Hole in a hole in a hole…

This photo, captured by drone about 80 metres above the ground, shows a nested sinkhole system in the Dead Sea. Such systems typically take form in karst areas, landscapes where soluble rock, such as limestone, dolomite or gypsum, is sculpted and perforated by dissolution and erosion. Over time, these deteriorating processes can cause the surface to crack and collapse.

The olive-green hued sinkhole, about 20 m in diameter, is made up of muddy material coated by a thin salt cover. When the structures collapse, they can form beautiful blocks and patterns; however, these sinkholes can form quite suddenly, often without any warning, and cause significant damage to roads and buildings. Sinkhole formations have been a growing problem in the region, especially within the last four decades, and scientists are working hard to better understand the phenomenon and the risks it poses to nearby communities and industries.

Some researchers are analysing aerial photos of Dead Sea sinkholes (taken by drones, balloons and satellites, for example) to get a better idea of how these depressions take shape.

“The images help to understand the process of sinkhole formation,” said Djamil Al-Halbouni, a PhD student at the GFZ German Research Centre for Geosciences in Potsdam, Germany and the photographer of this featured image. “Especially the photogrammetric method allows to derive topographic changes and possible early subsidence in this system.” Al-Halbouni was working at the sinkhole area of Ghor Al-Haditha in Jordan when he had the chance to snap this beautiful photo of one of the Dead Sea’s many sinkhole systems.

Recently, Al-Halbouni and his colleagues have employed a different kind of strategy to understand sinkhole formation: taking subsurface snapshots of Dead Sea sinkholes with the help of artificial seismic waves. The method, called shear wave reflection seismic imaging, involves generating seismic waves in sinkhole-prone regions; the waves then make their way through the sediments below. A seismic receiver is positioned to record the velocities of the waves, giving the researchers clues to what materials are present belowground and how they are structured. As one Eos article reporting on the study puts it, the records were essentially an “ultrasound of the buried material.”

The results of their study, recently published in EGU’s open access journal Solid Earth, give insight into what kind of underground conditions are more likely to give rise to sinkhole formation, allowing local communities to better pinpoint sites for future construction and to identify spots that are best left alone. This study and further work by Al-Halbouni and his colleagues have been published in a special issue organised by EGU journals: “Environmental changes and hazards in the Dead Sea region.”

By Olivia Trani, EGU Communications Officer

Imaggeo is the EGU’s online open access geosciences image repository. All geoscientists (and others) can submit their photographs and videos to this repository and, since it is open access, these images can be used for free by scientists for their presentations or publications, by educators and the general public, and some images can even be used freely for commercial purposes. Photographers also retain full rights of use, as Imaggeo images are licensed and distributed by the EGU under a Creative Commons licence. Submit your photos at

Preprint power: changing the publishing scene

Open access publishing has become common practice in the science community. In this guest post, David Fernández-Blanco, a contributor to the EGU Tectonics and Structural Geology Division blog, presents one facet of open access that is changing the publishing system for many geoscientists: preprints.

Open access initiatives confronting the publishing system

The idea of open access publishing and freely sharing research outputs is becoming widely embraced by the scientific community. The limitations of traditional publishing practices and the misuse of this system are some of the key drivers behind the rise of open access initiatives. Additionally, the open access movement has been pushed even further by current online capacities to widely share research as it is produced.

Efforts to make open access the norm in publishing have been active for quite some time now. For example, almost two decades ago, the European Geosciences Union (EGU) launched its first open access journals, which hold research papers open for interactive online discussion. The EGU also allows manuscripts to be reviewed online by anyone in the community before they are finally published in its peer-reviewed journals.

This trend is also now starting to be reflected at an institutional level. For example, all publicly funded scientific papers in Europe could be free to access by 2020, thanks to a reform promoted in 2016 by Carlos Moedas, the European Union’s Commissioner for Research, Science and Innovation.

More recently, in late 2017, around 200 German universities and research organisations cancelled the renewal of their Elsevier subscriptions due to unmet demands for lower prices and open access policies. Similarly, French institutions refused a new deal with Springer in early 2018. Now, Swedish researchers have followed suit, deciding to cancel their agreement with Elsevier. All these initiatives are confronting the established publishing system.

The community-driven revolution

Within this context, it’s no surprise that the scientific community has come up with various exciting initiatives that promote open access, such as creating servers to share preprints. Preprints are scientific contributions ready to be shared with other scientists, but that are not yet (or are in the process of being) peer-reviewed. A preprint server is an online platform hosting preprints and making them freely available online.

Many journals that were slow to accept these servers are updating their policies to adapt to the steadily growing use of preprints by a wide range of scientific communities. Most journals now welcome manuscripts hosted by a preprint server. Even job postings and funding agencies are changing their policies. For example, the European Research Council (ERC) Starting and Consolidator Grants are now taking applicant preprints into consideration.

Preprints: changing the publishing system

ArXiv is the oldest and most established preprint server. It was created in 1991, initially directed towards physics research. The server receives on average 10,000 submissions per month and now hosts over one million manuscripts. ArXiv set a precedent for preprints, and servers covering other scientific fields have since emerged, such as bioRxiv and ChemRxiv.

Credit: EarthArXiv

EarthArXiv was the first to fill the preprint gap for the Earth sciences. It was launched in October 2017 by Tom Narock, an assistant professor at Notre Dame of Maryland University in Baltimore (US), and Christopher Jackson, a professor at Imperial College London (UK). In the first 24 hours after its online launch, this preprint server already had nine submissions from geoscientists.

The server now holds more than 400 preprints, approved for publication after moderation, and gets around 1,600 downloads monthly. The platform’s policy may well contribute to its success: EarthArXiv is an independent preprint server strongly supported by the Earth sciences community, now run by 125 volunteers. The logo, for example, was a crowdsourcing effort. Through social media, EarthArXiv asked the online community to send their designs; then a poll was held to decide which one of the submitted logos would be selected. Additionally, the server’s Diversity Statement and Moderation Policy were both developed communally.

Credit: ESSOAr

In February 2018, some months after EarthArXiv went live, another platform serving the Earth sciences was born: the American Geophysical Union’s Earth and Space Science Open Archive, ESSOAr. The approaches of the two platforms differ markedly; ESSOAr is partially supported by Wiley, a publishing company, while EarthArXiv is independent of any publishers. The ESSOAr server is gaining momentum by hosting conference posters, while EarthArXiv plans to focus on preprint manuscripts, at least for the near future. The ESSOAr server currently hosts 120 posters and nine preprints.

What is the power of preprints?

How can researchers benefit from these new online sources?

No delays:

Preprint servers allow rapid dissemination. Through preprints, new scientific findings are shared directly with other scientists. The manuscript is immediately available after being uploaded, meaning it is searchable right away. There is no delay for peer-review, editorial decisions, or lengthy journal production.


A DOI is assigned to the work, so it is citable as soon as it is uploaded. This is especially helpful to early career scientists seeking employment and funding opportunities, as they can show and prove their scholarly track record at any point.


Making research visible to the community can lead to helpful feedback and constructive, transparent discussions. Some servers and participating authors have promoted their preprints through social media, in many cases initiating productive conversations with fellow scientists. Hence, preprints promote not only healthy exchanges, but they may also lead to improvements to the initial manuscript. Also, through these exchanges, which occur outside of the journal-led peer-review route, it is possible to network and build collaborative links with fellow scientists.

No boundaries:

Preprints allow everyone to have access to science, making knowledge available across boundaries.

The servers are open without cost to everyone forever. This also means tax payers have free access to the science they pay for.


Preprint servers are a useful way to self-archive documents. Many preprint servers also host postprints, which are already published articles (after the embargo period applicable to some journals).

Given the difference between the publishing industry’s current model and preprint practices, it is not surprising to find an increasing number of scientists stirring the preprint movement. It is possible that many such researchers are driven by a motivation to contribute to a transparent process and promote open science within their community and to the public. This motivation is indeed the true power of preprints.

Editor’s note: This is a guest blog post that expresses the opinion of its author, whose views may differ from those of the European Geosciences Union. We hope the post can serve to generate discussion and a civilised debate amongst our readers.