GeoLog

How to increase reproducibility and transparency in your research

Contemporary science faces many challenges in publishing results that are reproducible. This is due to increased usage of data and digital technologies as well as heightened demands for scholarly communication. These challenges have led to widespread calls for more research transparency, accessibility, and reproducibility from the science community. This article presents current findings and solutions to these problems, including recently released software that makes writing submission-ready manuscripts for journals of Copernicus Publications a lot easier.

While it can be debated if science really faces a reproducibility crisis, the challenges of computer-based research have sparked numerous articles on new good research practices and their evaluation. The challenges have also driven researchers to develop infrastructure and tools to help scientists effectively write articles, publish data, share code for computations, and communicate their findings in a reproducible way, for example Jupyter, ReproZip and research compendia.

Recent studies showed that the geosciences and geographic information science are not immune to issues with reproducibility, just like other domains. Therefore, more and more journals have adopted policies on sharing data and code. However, it is equally important that scientists foster an open research culture and teach researchers how to adopt more transparent and reproducible workflows, for example at skill-building workshops at conferences offered by fellow researchers, such as the EGU short courses, community-led non-profit organisations such as the Carpentries, open courses for students, small discussion groups at research labs, or individual efforts of self-learning. Given the persistent difficulty of agreeing on a common definition of reproducibility, Philip Stark, a statistics professor and associate dean of mathematical and physical sciences at the University of California, Berkeley, recently coined the term preproducibility: “An experiment or analysis is preproducible if it has been described in adequate detail for others to undertake it.” The neologism is intended to reduce confusion and also to embrace a positive attitude for more openness, honesty, and helpfulness in scholarly communication processes.

In the spirit of these activities, this article describes a modern workflow made possible by recent software releases. The new features allow the EGU community to write preproducible manuscripts for submission to the large variety of academic journals published by Copernicus Publications. The new workflow might require hard-earned adjustments for some researchers, but it pays off because of an increase in transparency and effectiveness. This is especially the case for early career scientists. An open and reproducible workflow enables researchers to build on their own and others’ previous work and to collaborate better on solving the societal challenges of today.

Reproducible research manuscripts

Open digital notebooks, which interweave data and code and can be exported to different output formats such as PDF, are powerful means to improve transparency and preproducibility of research. Jupyter Notebook, Stencila and R Markdown let researchers combine long-form text of a publication and source code for analysis and visualisation in a single document. Having text and code side-by-side makes them easier to grasp and ensures consistency, because each rendering of the document executes the whole workflow using the original data. Caching for long-running computations is possible, and researchers working with supercomputing infrastructures or huge datasets may limit the executed code to purposes of visualisation using processed data as input. Authors can transparently expose specific code snippets to readers but also publish the complete source code of the document openly for collaboration and review.

The popular notebook formats are plain text-based, like Markdown in the case of R Markdown. Therefore an R Markdown document can be managed with version control software, that is, programs for managing multiple versions of, and contributions to, the same documents, even by different people. Version control provides traceability of authorship, a time machine for going back to any previous “working” version, and online collaboration such as on GitLab. This kind of workflow also stops the madness of using file names for versions yet still lets authors use awesome file names and apply domain-specific guidelines for packaging research.

R Markdown supports different programming languages besides the popular namesake R and is a sensible solution even if you do not analyse data with scripts or have any code in your scholarly manuscript. It is easy to write, allows you to manage your bibliography effectively, can be used for websites, books or blogs, but most importantly it does not fall short when it is time to submit a manuscript to a journal.

The rticles extension package for R provides a number of templates for popular journals and publishers. Since version 0.6 (published Oct 9 2018) these templates include one for the Copernicus Publications manuscript preparation guidelines for authors. The Copernicus Publications staff were kind enough to give a test document a quick review and all seems in order, though of course any problems and questions should be directed to the software’s vibrant community and not the publishers.

The following code snippet and screenshot demonstrate the workflow. Lines starting with # are code comments and explain the steps. Code examples provided here are ready to use and only lack the installation commands for required packages.

# load required R extension packages:
library("rticles")
library("rmarkdown")

# create a new document using a template:
rmarkdown::draft(file = "MyArticle.Rmd",
                 template = "copernicus_article",
                 package = "rticles", edit = FALSE)

# render the source of the document to the default output format:
rmarkdown::render(input = "MyArticle/MyArticle.Rmd")

{: .language-r}
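The snippet above assumes that both packages are already installed. A minimal sketch of the missing installation step follows; it only reports what is missing unless you opt in to installing. The helper name `ensure_packages` is made up for this illustration, while `requireNamespace()` and `install.packages()` are standard R functions.

```r
# Hypothetical helper (not part of rticles or rmarkdown): find out which of the
# required packages are missing and optionally install them from CRAN.
ensure_packages <- function(pkgs, install = FALSE) {
  # requireNamespace() returns FALSE instead of failing when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (install && length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # return the names of the packages that were missing, without printing them
  invisible(missing_pkgs)
}

# by default nothing is installed, the missing packages are only reported
missing_pkgs <- ensure_packages(c("rticles", "rmarkdown"))
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
}
```

Calling `ensure_packages(c("rticles", "rmarkdown"), install = TRUE)` would install whatever is missing from the CRAN mirror configured in your R session.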

The commands created a directory with the Copernicus Publications template’s files, including an R Markdown (.Rmd) file ready to be edited by you (left-hand side of the screenshot), a LaTeX (.tex) file for submission to the publisher, and a .pdf file for inspecting the final results and sharing with your colleagues (right-hand side of the screenshot). You can see how simple it is to format text, insert citations, chemical formulas or equations, and add figures, and how they are rendered into a high-quality output file.

All of these steps may also be completed with user-friendly forms when using RStudio, a popular development and authoring environment available for all operating systems. The left-hand side of the following screenshot shows the form for creating a new document based on a template, and the right-hand side shows the menu for rendering, called “knitting” with R Markdown because code and text are combined into one document like threads in a garment.

And in case you decide last minute to submit to a different journal, rticles supports many publishers so you only have to adjust the template while the whole content stays the same.

Sustainable access to supplemental data

Data published today should be published and properly cited using appropriate research data repositories following the FAIR data principles. Journals require authors to follow these principles; see for example the Copernicus Publications data policy or a recent announcement by Nature. Other publishers required, or still require today, that supplemental information (SI), such as dataset files, extra figures, or extensive descriptions of experimental procedures, be stored as part of the article. Usually only the article itself receives a digital object identifier (DOI) for long-term identification and availability. The DOI minted by the publisher is not suitable for direct access to supplemental files, because it points to a landing page about the identified object. This landing page is designed to be read by humans but not by computers.
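To make the landing-page issue concrete: a DOI becomes a link by prefixing it with the resolver address `https://doi.org/`, and that link leads to the publisher's article page rather than to any file. The small base R sketch below uses the DOI of the Biogeosciences article that serves as the example in this post. The helper name `doi_to_url` and the deliberately loose format check are assumptions for illustration.

```r
# Hypothetical helper: build the resolver URL for a DOI.
doi_to_url <- function(doi) {
  # loose sanity check: "10.", a numeric registrant code, a slash, a suffix
  stopifnot(grepl("^10\\.[0-9]{4,9}/.+$", doi))
  paste0("https://doi.org/", doi)
}

landing_page <- doi_to_url("10.5194/bg-14-1739-2017")
print(landing_page)
```

Opening the resulting URL in a browser shows the article page. The supplemental files themselves sit behind further, publisher-specific links.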

The R package suppdata closes this gap. It supports downloading supplemental information using the article’s DOI. This way suppdata enables long-term reproducible data access when data was published as SI in the past or in exceptional cases today, for example if you write about a reproduction of a published article. In the latest version available from GitHub (suppdata is on its way to CRAN) the supported publishers include Copernicus Publications. The following example code downloads a data file for the article “Divergence of seafloor elevation and sea level rise in coral reef ecosystems” by Yates et al. published in Biogeosciences in 2017. The code then creates a mostly meaningless plot shown below.

# load required R extension package:
library("suppdata")

# download a specific supplemental information (SI) file
# for an article using the article's DOI:
csv_file <- suppdata::suppdata(
  x = "10.5194/bg-14-1739-2017",
  si = "Table S1 v2 UFK FOR_PUBLICATION.csv")

# read the data and plot it (toy example!):
my_data <- read.csv(file = csv_file, skip = 3)
plot(x = my_data$NAVD88_G03, y = my_data$RASTERVALU,
     xlab = "Historical elevation (NAVD88 GEOID03)",
     ylab = "LiDAR elevation (NAVD88 GEOID03)",
     main = "A data plot for article 10.5194/bg-14-1739-2017",
     pch = 20, cex = 0.5)

{: .language-r}
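Before plotting downloaded data, it can help to check that the file has the expected shape. The sketch below does this in base R on a tiny inline stand-in rather than on the real file: the three leading lines and all numbers are invented for illustration, and only the two column names are taken from the plot call above. The helper name `check_columns` is likewise hypothetical.

```r
# Invented stand-in for a CSV file with three lines of preamble before the
# header row; the values are made up and are NOT from the article's data.
toy_csv <- c(
  "preamble line 1",
  "preamble line 2",
  "preamble line 3",
  "NAVD88_G03,RASTERVALU",
  "-1.2,-1.3",
  "-0.8,-0.7",
  "-2.5,-2.6"
)

# skip = 3 drops the preamble so that the fourth line becomes the header
toy_data <- read.csv(text = toy_csv, skip = 3)

# Hypothetical helper: stop early if an expected column is missing.
check_columns <- function(data, expected) {
  absent <- setdiff(expected, names(data))
  if (length(absent) > 0) {
    stop("Missing column(s): ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

check_columns(toy_data, c("NAVD88_G03", "RASTERVALU"))
str(toy_data)
```

With the real download, the same check would be `check_columns(my_data, c("NAVD88_G03", "RASTERVALU"))` right after the `read.csv()` call.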

Main takeaways

Authoring submission-ready manuscripts for journals of Copernicus Publications just got a lot easier. Everybody who can write manuscripts with a word processor can quickly learn R Markdown and benefit from a preproducible data science workflow. Digital notebooks not only improve day-to-day research habits, but the same workflow is suitable for authoring high-quality scholarly manuscripts and graphics. The interaction with the publisher is smooth thanks to the LaTeX submission format, but you never have to write any LaTeX. The workflow is based on an established Free and Open Source software stack and embraces the idea of preproducibility and the principles of Open Science. The software is maintained by an active, growing, and welcoming community of researchers and developers with a strong connection to the geospatial sciences. Because of the complete and consistent notebook, you, a colleague, or a student can easily pick up the work at a later time. The road to effective and transparent research begins with a first step – take it!

Acknowledgements

The software updates were contributed by Daniel Nüst from the project Opening Reproducible Research (o2r) at the Institute for Geoinformatics, University of Münster, Germany, but would not have been possible without the support of Copernicus Publications, the software maintainers, most notably Yihui Xie and Will Pearse, and the general awesomeness of the R, R-spatial, Open Science, and Reproducible Research communities. The blog text was greatly improved with feedback by EGU’s Olivia Trani and Copernicus Publications’ Xenia van Edig. Thank you!

By Daniel Nüst, researcher at the Institute for Geoinformatics, University of Münster, Germany

[This article is cross-posted on the Opening Reproducible Research project blog]

Open geoscience

Not so long ago I was in a meeting with EGU’s young scientist representatives, who had gathered online to discuss the issues facing those early in their academic careers. One member of this dedicated team put forward a compelling notion: that the future of open access is in the hands of today’s early-career researchers. This post aims to answer the question that followed: “how could EGU’s team of eager early-career researchers help their peers grab hold of the open opportunities out there?” by offering up a few routes to open science…

A lot of hard work, carefully created figures and data don’t make it to your publications, but they are still a useful part of the scientific process and can help other scientists if they can see what you found. A great way to share this sort of information is on Figshare – and it’s citable too.

The same goes for conference presentations – don’t let them gather dust on your desktop. The aim of a conference is to share your work more widely, so, when you’re done, put your slides up on sites like SlideShare to share it beyond the conference. Keep your contact details in the presentation and you could find yourself with new collaborators.

Open the doors to more collaborative geoscience. (Credit: Oxyman)

Posters can be made open too. After our annual General Assembly, we invite authors to upload their posters and presentations, but there’s no need to restrict your openness to the EGU conference. F1000 posters is an open access repository for posters in biology, so if your work bridges the biogeosciences, be sure to submit it there. If you’re in another field, try Figshare (despite the name, it’s not just for figures!).

The EGU offers a number of open access journals for the Earth, planetary and space sciences, but there are many more journals where you can publish your work, if the scope of EGU journals doesn’t quite cover your field. The American Geosciences Institute hosts a comprehensive list of open geo journals on their website, and the Directory of Open Access Journals is exactly what it says on the tin – a hub of high quality open access publications. The stringent criteria required to enter their database means that predatory open access journals are filtered out.

But what about impact? Going open doesn’t mean lower impact, in fact, with your paper being openly available to all, it’s more likely to be seen and cited, so the impact at the article level could well be higher than if it was in a subscription-based publication. You can track the impact of your research outputs using ImpactStory, or by using the Altmetric bookmarklet to keep tabs on more than just citations, from where it’s featured in news articles and blog posts to where it’s been mentioned on social media and more.

Don’t let your work gather dust, share it. (Credit: How Matters)

The European Research Council considers that providing free online access to publications is the most effective way of ensuring that the fruits of the research it funds can be accessed, read and used as the basis for further research. Many funders are also moving in this direction, providing further incentive to publish open access papers.

When your manuscript is ready, submit it to a preprint server (e.g., arxiv.org, peerj.com, or biorxiv.org). EGU papers have an open review process, which helps ensure the assessment of a submitted manuscript is thorough and fair, but it also means that the science is out in the open sooner – the merit of a preprint. This helps establish precedence, highlighting that you were working on something first, and can remove barriers to scientific progress (we all know peer review can take a while!). Some establishments aren’t fans of this though, so before you put a preprint online, check Sherpa/Romeo to make sure your institute, funding body and the journal(s) you’re interested in are on board with the benefits of preprints.

Models are near ubiquitous in the geosciences and their importance in assessing the impact of climate change goes without saying. But what if you couldn’t replicate the results of, say, an important climate model? You would need to go back to the model’s code and see where your calculations and the ones before differed. Sharing code is compulsory for journals like Geoscientific Model Development, but many don’t stipulate the need to share it. You can go one step further to help your community by sharing your code on GitHub, whether it’s compulsory for your latest article or not.

Free the work from your desktop folders. (Credit: opensource.com)

With all these opportunities to go open, wouldn’t it be great if you had an opportunity to keep track of all your outputs? There’s an answer for that too – ORCID. ORCID is a unique researcher identifier that links all your research outputs, from manuscripts and conference abstracts to grant submissions and research figures, ensuring you get credit for the work you do.

For something less formal, but perhaps more open in that you can go beyond the academic community, try blogging about your research – we readily welcome guest posts here on GeoLog, but there are many places you can set your science free. Try The Conversation, SciLogs, pitching your idea to another geoscience blogger or better yet, establishing your own blog to write on. You can also go further to promote your research and facts about your field on social media – a great way to form connections with other academics and put your work in the public eye.

These are just a few thoughts on open geoscience, but there are likely more ways to go open than could ever be summarised in a single post. Take this as a starting point, seek out more options for yourself, and, if you already have a few tips on how to make geoscience more open, spread the word.

By Sara Mynott, EGU Communications Officer

If you have any thoughts on other ways geoscientists can move towards open science, please add your thoughts to the comment thread below.