Getting data from the Arctic is often difficult and expensive – instead, stand on the shoulders of giants and investigate over 6000 datasets preserved for future download and reuse in the Arctic Data Center! Read on for more information about the Arctic Data Center and the data contained therein.
The Arctic Data Center is the primary data repository for the Arctic section of the Office of Polar Programs at the US National Science Foundation (NSF). We’re best known in the research community as a data archive: researchers upload their data to preserve them for the future and make them available for reuse. This isn’t the end of the data’s life, though. Other researchers can then download these data for different analyses or synthesis projects. In addition to being a data discovery portal, we also offer top-notch tools, support services, training opportunities, and data rescue services.
About the Data
Data reuse is at the core of our mission. We are committed to making sure our data are discoverable and reusable by researchers. Not only do we have raw datasets, but we also encourage researchers to upload software, code, and packages, as well as the provenance of data products. All of these associated products are also available for download and reuse. We really care about sharing!
As it stands right now, researchers have a great deal of data at their fingertips: over 56 TB and over 6000 individual datasets (each dataset is a single metadata record that describes one or more raw data files). We track the number of views, downloads, and citations for these 6000+ datasets, both at the catalog level and at the dataset level, to investigate how often the data are being looked at and used.
The data in the Arctic Data Center come from a wide variety of disciplines, reflecting the different programs funded by the US National Science Foundation (NSF). For example, the Arctic Observing Network supports scientific and community-based observations of biodiversity, ecosystems, human societies, land, ice, marine and freshwater systems, and the atmosphere, as well as their social, natural, and/or physical environments. That one program alone encompasses a lot. The data files are just as diverse, from tabular data in .csv to images in .jpg to applications in .R, so researchers can look at remote sensing images, listen to passive acoustic audio files, run applications, or find something else entirely.
The data we have are licensed at the time of upload as either CC-0 or CC-BY. Licensing data this way is what allows them to be downloaded and reused in new analyses and synthesis work. To make datasets easier to cite, we assign a Digital Object Identifier (DOI) to each published dataset. Not only does that help researchers be cited more often, but it also promotes a more transparent process of tracking the data used in papers. We just released a tool that enables researchers to log a citation associated with a particular dataset. Please don’t hesitate to share your Arctic data with us and the rest of the Arctic research community.
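As a rough illustration of why a DOI makes a dataset easier to cite, the sketch below assembles a citation string that ends in a resolvable DOI link. The function name, the citation layout, and the example dataset details (including the DOI) are all hypothetical and are not the Arctic Data Center’s own format; the only fixed fact it relies on is that DOIs resolve through https://doi.org/.

```python
def format_dataset_citation(creators, year, title, repository, doi):
    """Build a simple citation string ending in a resolvable DOI link.

    The layout is an illustrative assumption, not an official citation style.
    """
    # Accept either a bare DOI or one written with a "doi:" prefix
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    authors = ", ".join(creators)
    return f"{authors} ({year}). {title}. {repository}. https://doi.org/{cleaned}"


# Hypothetical dataset details, for illustration only
citation = format_dataset_citation(
    creators=["A. Researcher", "B. Scientist"],
    year=2021,
    title="Example sea ice thickness measurements",
    repository="Arctic Data Center",
    doi="doi:10.1234/EXAMPLE",
)
print(citation)
```

Because the identifier is persistent, a citation built this way keeps pointing at the same dataset even if the repository’s web pages are reorganized later.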
Tools & Services
In addition to the citation tool, we have a number of tools available to both data submitters and researchers who come to download data. We also partner with other organizations, like Make Data Count and DataONE, and leverage those partnerships to create a better data experience. Here are some examples:
- Metadata Quality Checks: We know that data quality is fundamental for researchers to find datasets and to trust them enough to use them in another analysis. The metadata for every submitted dataset is run through a quality check to increase the completeness of submitted metadata records. The results of these checks are visible to the submitter as well as to anyone who views the data, so submitters can see how comprehensive their metadata is before submission. That way, the metadata uploaded to the Arctic Data Center is as complete as possible and comes close to the guideline of being understandable to any reasonable scientist.
- Curation Support: Metadata quality checks are the automatic way that we ensure the quality of data in the repository, but the real quality control and curation support come from our curation team. The process by which data gets into the Arctic Data Center is iterative, meaning that our team works with the submitter to ensure that the data are complete and of good quality.
- Data Portals: Another tool we have available is our data portals service. Data portals allow researchers to create a collection of data on a customized website that showcases that data alongside other important project or lab information. Researchers can use a portal as a lab website or a project landing page; with custom pages, colors, and branding, the portal’s usefulness extends well beyond the Arctic Data Center. This additional information – captured on freeform text pages – is added to the portal with a user-friendly interface.
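To make the idea behind the metadata quality checks more concrete, here is a minimal sketch of a completeness check. It is not the Arctic Data Center’s actual check suite: the list of required fields, the function name, and the example record are all assumptions made up for illustration.

```python
# Hypothetical list of fields a complete record should have (an assumption,
# not the repository's real requirements)
REQUIRED_FIELDS = ["title", "abstract", "creators", "methods", "license"]


def check_metadata(record):
    """Return (score, missing) for a metadata record given as a dict.

    score is the fraction of required fields that are filled in;
    missing lists the required fields that are absent or empty.
    """
    missing = [
        field for field in REQUIRED_FIELDS
        if not record.get(field)  # absent, None, empty string, or empty list
    ]
    score = (len(REQUIRED_FIELDS) - len(missing)) / len(REQUIRED_FIELDS)
    return score, missing


# Hypothetical record with an empty abstract and no methods section
record = {
    "title": "Example permafrost temperature record",
    "abstract": "",
    "creators": ["A. Researcher"],
    "license": "CC-BY",
}
score, missing = check_metadata(record)
print(f"Completeness: {score:.0%}; missing: {missing}")
```

A report like this gives a submitter a concrete list of what to fill in before the curation team takes a closer look.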
In addition to the tools and support services, we also interact with the community. We hold trainings focused on reproducible research and open data science skills approximately twice a year, and they are open to everyone actively involved in Arctic research, from graduate students on up. For self-motivated learners, our instructional materials are all open source and available to download here. We’re invested in helping to train the Arctic science community in reproducible techniques, since doing so fosters a more open culture of data sharing and reuse.
We also strive to keep our fingers on the pulse of what researchers are looking for in terms of support. We’re active on Twitter to share Arctic updates, data science updates, and specifically Arctic Data Center updates, but we’re also happy to feature new papers or successes that you have had while working with the data. We can also take data science questions that come up in the course of your research, or questions about how to make a quality data management plan. Follow us on Twitter @arcticdatactr and interact with us – we love to be involved in your research as it’s happening as well as after it’s completed.
Feel free to reach out at any time to support@arcticdata.io if we can be of service!
This work was supported by NSF award #1546024
Further Reading
- Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). DOI: 10.1038/sdata.2016.18
- NSF 21-041 Dear Colleague Letter: Supporting Data and Sample Reuse in Polar Research: https://www.nsf.gov/pubs/2021/nsf21041/nsf21041.jsp
- Goodman, Alyssa, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman, Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, et al. “Ten Simple Rules for the Care and Feeding of Scientific Data.” PLOS Computational Biology 10, no. 4 (April 24, 2014): e1003542. DOI: 10.1371/journal.pcbi.1003542.
Edited by Giovanni Baccolo and Marie Cavitte
Erin McLean is the Community Engagement and Outreach Coordinator with the Arctic Data Center, headquartered at the National Center for Ecological Analysis and Synthesis (NCEAS) in Santa Barbara. She holds a bachelor of arts from Boston University in marine science and English literature and a master of science from the University of Rhode Island in biological and environmental sciences. A scientist, educator, and writer, she has built her career on making science more accessible to all. She can be reached at mclean@nceas.ucsb.edu.