GeoLog

How to increase reproducibility and transparency in your research

Contemporary science faces many challenges in publishing results that are reproducible. This is due to the increased use of data and digital technologies as well as heightened demands on scholarly communication. These challenges have led to widespread calls from the science community for more research transparency, accessibility, and reproducibility. This article presents current findings and solutions to these problems, including recently released software that makes writing submission-ready manuscripts for journals of Copernicus Publications a lot easier.

While it can be debated if science really faces a reproducibility crisis, the challenges of computer-based research have sparked numerous articles on new good research practices and their evaluation. The challenges have also driven researchers to develop infrastructure and tools to help scientists effectively write articles, publish data, share code for computations, and communicate their findings in a reproducible way, for example Jupyter, ReproZip and research compendia.

Recent studies have shown that the geosciences and geographic information science, like other domains, are not immune to reproducibility issues. As a result, more and more journals have adopted policies on sharing data and code. However, it is equally important that scientists foster an open research culture and teach researchers how to adopt more transparent and reproducible workflows. This can happen at skill-building workshops offered by fellow researchers at conferences, such as the EGU short courses, through community-led non-profit organisations such as the Carpentries, in open courses for students, in small discussion groups at research labs, or through individual self-learning.

In light of the ongoing lack of a common definition of reproducibility, Philip Stark, a statistics professor and associate dean of mathematical and physical sciences at the University of California, Berkeley, recently coined the term preproducibility: “An experiment or analysis is preproducible if it has been described in adequate detail for others to undertake it.” The neologism is intended to reduce confusion and to encourage a positive attitude towards more openness, honesty, and helpfulness in scholarly communication.

In the spirit of these activities, this article describes a modern workflow made possible by recent software releases. The new features allow the EGU community to write preproducible manuscripts for submission to the large variety of academic journals published by Copernicus Publications. The new workflow might require some researchers to make hard-won adjustments, but it pays off through increased transparency and effectiveness. This is especially the case for early career scientists. An open and reproducible workflow enables researchers to build on their own and others’ previous work and to collaborate better on solving the societal challenges of today.

Reproducible research manuscripts

Open digital notebooks, which interweave data and code and can be exported to different output formats such as PDF, are a powerful means of improving the transparency and preproducibility of research. Jupyter Notebook, Stencila and R Markdown let researchers combine the long-form text of a publication and the source code for analysis and visualisation in a single document. Having text and code side by side makes them easier to grasp and ensures consistency, because each rendering of the document executes the whole workflow using the original data. Caching for long-running computations is possible, and researchers working with supercomputing infrastructures or huge datasets may limit the executed code to visualisation, using processed data as input. Authors can transparently expose specific code snippets to readers but also publish the complete source code of the document openly for collaboration and review.
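To make the idea of text and code in one document more concrete, here is a minimal sketch in R. It writes a tiny R Markdown file and renders it only if the rmarkdown package and pandoc happen to be available. The file name, chunk label, and the use of the built-in `cars` dataset are illustrative assumptions and not part of the workflow described in this article.

```r
# Build a minimal R Markdown document as a character vector.
# All names and contents here are illustrative assumptions.
fence <- strrep("`", 3)  # three backticks, the code chunk delimiter

rmd_lines <- c(
  "---",
  "title: \"A minimal notebook\"",
  "output: html_document",
  "---",
  "",
  "The mean speed in the built-in cars dataset is",
  "`r round(mean(cars$speed), 1)` mph, computed when the document is rendered.",
  "",
  paste0(fence, "{r speed-plot}"),
  "plot(cars$speed, cars$dist)",
  fence
)

# Write the document to a temporary directory.
rmd_file <- file.path(tempdir(), "notebook-sketch.Rmd")
writeLines(rmd_lines, rmd_file)

# Render only if the required tools are installed; otherwise just keep the source.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Both the number in the sentence and the plot are recomputed from the data on every rendering, so the text cannot drift out of sync with the analysis.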

The popular notebook formats are based on plain text, such as Markdown in the case of R Markdown. An R Markdown document can therefore be managed with version control software, that is, programs for managing multiple versions of and contributions to the same documents, even by different people. Version control provides traceability of authorship, a time machine for going back to any previous “working” version, and online collaboration, for example on GitLab. This kind of workflow also stops the madness of using file names for versions, yet still lets authors use awesome file names and apply domain-specific guidelines for packaging research.

R Markdown supports different programming languages besides its popular namesake R and is a sensible solution even if you do not analyse data with scripts or have any code in your scholarly manuscript. It is easy to write, allows you to manage your bibliography effectively, and can be used for websites, books or blogs. Most importantly, it does not fall short when it is time to submit a manuscript to a journal.

The rticles extension package for R provides a number of templates for popular journals and publishers. Since version 0.6 (published 9 October 2018) these templates include one that follows the Copernicus Publications manuscript preparation guidelines for authors. The Copernicus Publications staff was kind enough to give a test document a quick review and all seems in order, though of course any problems and questions should be directed to the software’s vibrant community and not to the publisher.

The following code snippet and screenshot demonstrate the workflow. Lines starting with # are code comments and explain the steps. Code examples provided here are ready to use and only lack the installation commands for required packages.
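For completeness, the omitted installation step could be sketched as follows. The helper name `install_if_missing` is a hypothetical convenience function introduced here for illustration; it simply wraps base R's `install.packages()` so that packages already present are not installed again.

```r
# Hypothetical helper: install only those packages that are not yet available.
install_if_missing <- function(pkgs) {
  # Check for each package whether it can be loaded.
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]

  # Install from the configured CRAN mirror if anything is missing.
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }

  # Return the names of the packages that had to be installed.
  invisible(missing_pkgs)
}
```

Calling `install_if_missing(c("rticles", "rmarkdown"))` once before running the snippet should then be enough, assuming a CRAN mirror is configured and reachable.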

# load required R extension packages:
library("rticles")
library("rmarkdown")

# create a new document using a template:
rmarkdown::draft(file = "MyArticle.Rmd",
                 template = "copernicus_article",
                 package = "rticles", edit = FALSE)

# render the source of the document to the default output format:
rmarkdown::render(input = "MyArticle/MyArticle.Rmd")

{: .language-r}

The commands created a directory with the Copernicus Publications template’s files, including an R Markdown (.Rmd) file ready to be edited by you (left-hand side of the screenshot), a LaTeX (.tex) file for submission to the publisher, and a .pdf file for inspecting the final results and sharing with your colleagues (right-hand side of the screenshot). You can see how simple it is to format text, insert citations, chemical formulas or equations, and add figures, and how they are rendered into a high-quality output file.

All of these steps may also be completed with user-friendly forms when using RStudio, a popular development and authoring environment available for all operating systems. The left-hand side of the following screenshot shows the form for creating a new document based on a template, and the right-hand side shows the menu for rendering, called “knitting” with R Markdown because code and text are combined into one document like threads in a garment.

And in case you decide last minute to submit to a different journal, rticles supports many publishers so you only have to adjust the template while the whole content stays the same.

Sustainable access to supplemental data

Data published today should be published and properly cited using appropriate research data repositories following the FAIR data principles. Journals require authors to follow these principles; see for example the Copernicus Publications data policy or a recent announcement by Nature. Other publishers required, or still require today, that supplemental information (SI), such as dataset files, extra figures, or extensive descriptions of experimental procedures, be stored as part of the article. Usually only the article itself receives a digital object identifier (DOI) for long-term identification and availability. The DOI minted by the publisher is not suitable for direct access to supplemental files, because it points to a landing page about the identified object. This landing page is designed to be read by humans but not by computers.
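The gap between a DOI and the actual files can be illustrated with a small sketch. The function name `doi_to_url` and its basic format check are assumptions made for this example; the only external fact relied upon is that https://doi.org/ is the public DOI resolver.

```r
# Hypothetical helper: turn a DOI into its resolver URL.
doi_to_url <- function(doi) {
  # Accept bare DOIs as well as ones written with a "doi:" prefix.
  doi <- sub("^doi:", "", trimws(doi), ignore.case = TRUE)

  # Basic shape check: "10.", a registrant code, a slash, and a suffix.
  stopifnot(grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", doi))

  paste0("https://doi.org/", doi)
}

landing_page <- doi_to_url("10.5194/bg-14-1739-2017")
print(landing_page)
```

The resulting URL resolves to the article's landing page, not to any individual supplemental file. Where those files live differs from publisher to publisher, which is why a script cannot derive their location from the DOI alone.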

The R package suppdata closes this gap. It supports downloading supplemental information using the article’s DOI. This way suppdata enables long-term reproducible data access when data was published as SI in the past or in exceptional cases today, for example if you write about a reproduction of a published article. In the latest version available from GitHub (suppdata is on its way to CRAN) the supported publishers include Copernicus Publications. The following example code downloads a data file for the article “Divergence of seafloor elevation and sea level rise in coral reef ecosystems” by Yates et al. published in Biogeosciences in 2017. The code then creates a mostly meaningless plot shown below.

# load required R extension package:
library("suppdata")

# download a specific supplemental information (SI) file
# for an article using the article's DOI:
csv_file <- suppdata::suppdata(
  x = "10.5194/bg-14-1739-2017",
  si = "Table S1 v2 UFK FOR_PUBLICATION.csv")

# read the data and plot it (toy example!):
my_data <- read.csv(file = csv_file, skip = 3)
plot(x = my_data$NAVD88_G03, y = my_data$RASTERVALU,
     xlab = "Historical elevation (NAVD88 GEOID03)",
     ylab = "LiDAR elevation (NAVD88 GEOID03)",
     main = "A data plot for article 10.5194/bg-14-1739-2017",
     pch = 20, cex = 0.5)

{: .language-r}

Main takeaways

Authoring submission-ready manuscripts for journals of Copernicus Publications just got a lot easier. Everybody who can write manuscripts with a word processor can quickly learn R Markdown and benefit from a preproducible data science workflow. Digital notebooks not only improve day-to-day research habits, but the same workflow is suitable for authoring high-quality scholarly manuscripts and graphics. The interaction with the publisher is smooth thanks to the LaTeX submission format, but you never have to write any LaTeX. The workflow is based on an established Free and Open Source software stack and embraces the idea of preproducibility and the principles of Open Science. The software is maintained by an active, growing, and welcoming community of researchers and developers with a strong connection to the geospatial sciences. Because of the complete and consistent notebook, you, a colleague, or a student can easily pick up the work at a later time. The road to effective and transparent research begins with a first step: take it!

Acknowledgements

The software updates were contributed by Daniel Nüst from the project Opening Reproducible Research (o2r) at the Institute for Geoinformatics, University of Münster, Germany, but would not have been possible without the support of Copernicus Publications, the software maintainers, most notably Yihui Xie and Will Pearse, and the general awesomeness of the R, R-spatial, Open Science, and Reproducible Research communities. The blog text was greatly improved with feedback by EGU’s Olivia Trani and Copernicus Publications’ Xenia van Edig. Thank you!

By Daniel Nüst, researcher at the Institute for Geoinformatics, University of Münster, Germany

[This article is cross-posted on the Opening Reproducible Research project blog]

EGU Photo Competition 2019: Now open for submissions!

If you are pre-registered for the 2019 General Assembly (Vienna, 7 – 12 April), you can take part in our annual photo competition! Winners receive a free registration to next year’s General Assembly!

The tenth annual EGU photo competition opened on 15 January. Up until 15 February, every participant pre-registered for the General Assembly can submit up to three original photos and one moving image on any broad theme related to the Earth, planetary, and space sciences.

Shortlisted photos will be exhibited at the conference, together with the winning moving image, which will be selected by a panel of judges. General Assembly participants can vote for their favourite photos and the winning images will be announced online on the last day of the meeting. 

If you submit your images to the photo competition, they will also be included in the EGU’s open access photo and video database, Imaggeo. You retain full rights of use for any photos or videos submitted to the database as they are licensed and distributed by EGU under a Creative Commons license.

You will need to register on Imaggeo so that the organisers can appropriately process your photos. For more information, please check the EGU Photo Competition page on Imaggeo.

Previous winning photographs from 2010 to 2018 can be seen on the previous winners’ pages.

In the meantime, get shooting!

EGU 2019 will take place from 07 to 12 April 2019 in Vienna, Austria. For more information on the General Assembly, see the EGU 2019 website and follow us on Twitter (#EGU19 is the official conference hashtag) and Facebook.

Imaggeo on Mondays: The best of imaggeo in 2018

Imaggeo, our open access image repository, is packed with beautiful images showcasing the best of the Earth, space and planetary sciences. Throughout the year we use the photographs submitted to the repository to illustrate our social media and blog posts.

For the past few years we’ve celebrated the end of the year by rounding up some of the best Imaggeo images. But it’s no easy task to pick which of the featured images are the best! Instead, we turned the job over to you! We compiled a Facebook album which included all the images we’ve used as header images across our social media channels and in Imaggeo on Mondays blog posts in 2018 and asked you to vote for your favourites.

Today’s blog post rounds up the best 12 images of Imaggeo in 2018, as chosen by you, our readers.

Of course, these are only a few of the very special images we highlighted in 2018, but take a look at our image repository, Imaggeo, for many other spectacular geo-themed pictures, including the winning images of the 2018 Photo Contest. The competition will be running again this year, so if you’ve got a flair for photography or have managed to capture a unique field work moment, consider uploading your images to Imaggeo and entering the 2019 Photo Competition.

A view of the southern edge of the Ladebakte mountain in the Sarek national park in northern Sweden. At this place the rivers Rahpajaka and Sarvesjaka meet to form the biggest river of the Sarek national park, the Rahpaädno. The rivers are fed by glaciers and carry a lot of rock material, which leads to distinct sedimentation and a fascinating river delta for which the Sarek park, lying west of the Kungsleden hiking trail, is famous.

 

Melt ponds. Credit: Michael Tjernström (distributed via imaggeo.egu.eu)

The February 2018 header image used across our social media channels. The photo features ponds of melted snow on top of sea ice in summer. The photo was taken from the Swedish icebreaker Oden during the “Arctic Summer Cloud Ocean Study” in 2008 as part of the International Polar Year.

 

Karstification in Chabahar Beach, IRAN. Credit: Reza Derakhshani (distributed via imaggeo.egu.eu)

The June 2018 header image used for our social media channels. The photo was taken on the Northern coast of the Oman Sea, where the subduction of Oman’s oceanic plate under the continental plate of Iran is taking place.

 

River in a Charoite Schist. Credit: Bernardo Cesare (distributed via imaggeo.egu.eu)

A polarized light photomicrograph of a thin section of a charoite-bearing schist. Charoite is a rare silicate found only at one location in Yakutia, Russia. For its beautiful and uncommon purple color it is used as a semi-precious stone in jewelry.

Under the microscope charoite-bearing rocks give an overall feeling of movement, with charoite forming fibrous mats that swirl and fold as a result of deformation during metamorphism. It may be difficult to conceive, but these microstructures tell us that solid rocks can flow!

 

Refuge in a cloudscape. Credit: Julien Seguinot (distributed via imaggeo.egu.eu)

The action of glaciers combined with the structure of the rock to form this little platform, probably once a small lake enclosed between a moraine at the mountain side and the ice in the valley.

Now it has become a green haven in the mountain landscape, a perfect place for an alp. In the Alps, stratus clouds opening up on autumn mornings often create gorgeous light displays.

 

Antarctic Fur Seal and columnar basalt. Credit: Etienne Pauthenet (distributed via imaggeo.egu.eu).

This female fur seal is sitting on hexagonal columns of basalt rock, which can be found at Pointe Suzanne at the extreme east of the Kerguelen Islands, near Antarctica. This photo was the November 2018 header image for our social media channels.

 

Silent swamp predator. Credit: Nikita Churilin (distributed via imaggeo.egu.eu).

A macro shot of a Drosera rotundifolia modified sundew leaf waiting for an insect at swamp Krugloe. This photo was the January 2018 header image and one of the finalists in the 2017 Imaggeo Photo Competition.

 

Once there was a road…the clay wall. Credit: Chiara Arrighi (distributed via imaggeo.egu.eu)

The badlands valley of Civita di Bagnoregio is a hidden natural gem in the province of Viterbo, Italy, just 100 kilometres from Rome. Pictured here is the ‘wall,’ one of the valley’s most peculiar features, where you can even find the wooden structural remains of a trail used for agricultural purposes in the 19th and 20th centuries.

 

New life on ancient rock. Credit: Gerrit de Rooij (distributed via imaggeo.egu.eu).

“After two days of canoeing in the rain on lake Juvuln in the western part of the middle of Sweden, the weather finally improved in the evening, just before we reached the small, unnamed, uninhabited but blueberry-rich island on which this picture was taken. The wind was nearly gone, and the ragged clouds were the remainder of the heavier daytime cloud cover,” said Gerrit de Rooij, who took this photograph and provided some information about the picture in this blog post. The picture features some of the oldest rocks in the world but is bursting with new life.

 

Cordillera de la Sal. Credit: Martin Mergili (distributed via imaggeo.egu.eu)

The photograph shows the Valle de la Luna, part of the amazing Cordillera de la Sal mountain range in northern Chile. Rising only 200 metres above the basin of the Salar de Atacama salt flat, the ridges of the Cordillera de la Sal represent a strongly folded sequence of clastic sediments and evaporites (salt can be seen in the left portion of the image), with interspersed volcanic material.

 

Robberg Peninsula – a home of seals. Credit: Elizaveta Kovaleva (distributed via imaggeo.egu.eu).

“This picture is taken from the Robberg Peninsula, one of the most beautiful places, and definitely one of my favorite places in South Africa. The Peninsula forms the Robberg Nature Reserve and is situated close to the Plettenberg Bay on the picturesque Garden Route. “Rob” in Dutch means “seal”, so the name of the Peninsula is translated as “the seal mountain”. This name was given to the landmark by the early Dutch mariners, who observed large colonies of these noisy and restless animals on the rocky cliffs of the Peninsula,” said Elizaveta Kovaleva in this blog post.

 

The great jump of the Tequendama. Credit: Maria Cristina Arenas Bautista (distributed via imaggeo.egu.eu)

Tequendama fall is a natural waterfall of Colombia. This blog post highlights a Colombian myth about the origins of the waterfall, which is tied to a real climate event.

 

If you pre-register for the 2019 General Assembly (Vienna, 07 – 12 April), you can take part in our annual photo competition! From 15 January until 15 February, every participant pre-registered for the General Assembly can submit up to three original photos and one moving image related to the Earth, planetary, and space sciences in competition for free registration to next year’s General Assembly! These can include fantastic field photos, a stunning shot of your favourite thin section, or what you’ve captured out on holiday or under the electron microscope. If it’s geoscientific, it fits the bill. Find out more about how to take part at http://imaggeo.egu.eu/photo-contest/information/.

Preprint power: changing the publishing scene

Open access publishing has become common practice in the science community. In this guest post, David Fernández-Blanco, a contributor to the EGU Tectonics and Structural Geology Division blog, presents one facet of open access that is changing the publishing system for many geoscientists: preprints.

Open access initiatives confronting the publishing system

The idea of open access publishing and freely sharing research outputs is becoming widely embraced by the scientific community. The limitations of traditional publishing practices and the misuse of this system are some of the key drivers behind the rise of open access initiatives. Additionally, the open access movement has been pushed even further by current online capacities to widely share research as it is produced.

Efforts to make open access the norm in publishing have been active for quite some time now. For example, almost two decades ago, the European Geosciences Union (EGU) launched its first open access journals, which hold research papers open for interactive online discussion. The EGU also allows manuscripts to be reviewed online by anyone in the community before they are finally published in its peer-reviewed journals.

This trend is also now starting to be reflected at an institutional level. For example, all publicly funded scientific papers in Europe could be free to access by 2020, thanks to a reform promoted in 2016 by Carlos Moedas, the European Union’s Commissioner for Research, Science and Innovation.

More recently, in late 2017, around 200 German universities and research organisations cancelled the renewal of their Elsevier subscriptions due to unmet demands for lower prices and open access policies. Similarly, French institutions refused a new deal with Springer in early 2018. Now, Swedish researchers have followed suit, deciding to cancel their agreement with Elsevier. All these international initiatives are confronting the established publishing system.

The community-driven revolution

Within this context, it’s no surprise that the scientific community has come up with various exciting initiatives that promote open access, such as creating servers to share preprints. Preprints are scientific contributions ready to be shared with other scientists, but that are not yet (or are in the process of being) peer-reviewed. A preprint server is an online platform hosting preprints and making them freely available online.

Many journals that were slow to accept these servers are updating their policies to adapt to the steady growth of preprint usage across a wide range of scientific communities. Most journals now welcome manuscripts hosted by a preprint server. Even employers and funding agencies are changing their policies. For example, the European Research Council (ERC) Starting and Consolidator Grants are now taking applicant preprints into consideration.

Preprints: changing the publishing system

ArXiv is the oldest and most established preprint server. It was created in 1991, initially directed towards physics research. The server receives on average 10,000 submissions per month and now hosts over one million manuscripts. ArXiv set a precedent for preprints, and servers covering other scientific fields have since emerged, such as bioRxiv and ChemRxiv.

Credit: EarthArXiv

EarthArXiv was the first to fill the preprint gap for the Earth sciences. It was launched in October 2017 by Tom Narock, an assistant professor at Notre Dame of Maryland University in Baltimore (US), and Christopher Jackson, a professor at Imperial College London (UK). In the first 24 hours after its online launch, this preprint server already had nine submissions from geoscientists.

The server now holds more than 400 preprints, approved for publication after moderation, and gets around 1,600 downloads monthly. The platform’s policy may well contribute to its success: EarthArXiv is an independent preprint server strongly supported by the Earth sciences community and now run by 125 volunteers. The logo, for example, was a crowdsourcing effort. Through social media, EarthArXiv asked the online community to send in their designs; then a poll was held to decide which of the submitted logos would be selected. Additionally, the server’s Diversity Statement and Moderation Policy were both developed communally.

Credit: ESSOAr

In February 2018, some months after EarthArXiv went live, another platform serving the Earth sciences was born: the American Geophysical Union’s Earth and Space Science Open Archive, ESSOAr. The two platforms take markedly different approaches; ESSOAr is partially supported by Wiley, a publishing company, while EarthArXiv is independent of any publisher. The ESSOAr server is gaining momentum by hosting conference posters, while EarthArXiv plans to focus on preprint manuscripts, at least for the near future. The ESSOAr server currently hosts 120 posters and nine preprints.

What is the power of preprints?

How can researchers benefit from these new online sources?

No delays:

Preprint servers allow rapid dissemination. Through preprints, new scientific findings are shared directly with other scientists. The manuscript is immediately available after being uploaded, meaning it is searchable right away. There is no delay for peer-review, editorial decisions, or lengthy journal production.

Visibility:

A DOI is assigned to the work, so it is citable as soon as it is uploaded. This is especially helpful to early career scientists seeking employment and funding opportunities, as they can show and prove their scholarly track record at any point.

Engagement:

Making research visible to the community can lead to helpful feedback and constructive, transparent discussions. Some servers and participating authors have promoted their preprints through social media, in many cases initiating productive conversations with fellow scientists. Hence, preprints not only promote healthy exchanges but may also lead to improvements to the initial manuscript. These exchanges, which occur outside of the journal-led peer-review route, also make it possible to network and build collaborative links with fellow scientists.

No boundaries:

Preprints allow everyone to have access to science, making knowledge available across boundaries.

The servers are open to everyone, without cost, forever. This also means taxpayers have free access to the science they pay for.

Backup:

Preprint servers are a useful way to self-archive documents.  Many preprint servers also host postprints, which are already published articles (after the embargo period applicable to some journals).

Given the difference between the publishing industry’s current model and preprint practices, it is not surprising to find an increasing number of scientists fuelling the preprint movement. It is possible that many such researchers are motivated by a wish to contribute to a transparent process and to promote open science within their community and to the public. This motivation is indeed the true power of preprints.

Editor’s note: This is a guest blog post that expresses the opinion of its author, whose views may differ from those of the European Geosciences Union. We hope the post can serve to generate discussion and a civilised debate amongst our readers.