Earth and Space Science Informatics

Research Software Engineers from the Geosciences assemble for the first time at EGU General Assembly 2018

On April 12th 2018, the first Research Software Engineers (RSEs) for geosciences meeting was held at the European Geosciences Union (EGU) General Assembly (GA) in Vienna, Austria. The EGU GA is a huge event with over 15,000 people from more than 100 countries. It has a diverse programme with thousands of posters and hundreds of sessions, but what it lacked was an event to bring together scientists who contribute to research software. Daniel Nüst from the Institute for Geoinformatics, Germany, proposed the idea of such an event to a group of regular EGU GA attendees from the German RSE chapter. He was joined by Martin Hammitzsch (GFZ eScience Centre, Germany), Bernadette Fritzsch (AWI, Computing and Data Centre), and David Topping (University of Manchester) as co-conveners for a Townhall Meeting “Research Software Engineers in the Geosciences”. Townhall Meetings, or “townhalls”, are union-wide events. They allow participants to take part in open discussions covering a variety of topics. Townhalls take place in the evening after a full day of regular conference and poster sessions, so the motivation of people showing up is unquestionably high.

For the RSE townhall, it was difficult to stand out in the programme: it was a first-time event at a large conference cutting across many domains and 24 divisions, from biogeosciences to seismology and paleontology, with over 600 sessions over the course of a whole week. But a search of the submissions for “software” some weeks before the assembly made it clear that software plays an important role for scientists attending EGU, as it does for any researcher, so the conveners were hopeful of welcoming a few people. And they did! About 40 people from 7 countries representing the wide range of EGU divisions joined the meeting. It was a good mix of seasoned RSE folks and early career scientists as well as senior researchers for whom the topic was relatively new.

First research software engineers meeting at the EGU just started. Great to see people engaging with the role behind a crucial part of science. #rse #RSEng @SoftwareSaved @RSE_de @nordic_rse @nl_rse #EGU18

— Daniel Nüst (@nordholmen) 12. April 2018

The meeting kicked off with a short welcome by the initiator, Daniel, followed by Jens Klump (Science Leader Earth Science Informatics at CSIRO, Australia), Deputy President of EGU’s ESSI Division and advisory board member of the Australian and New Zealand chapter RSE-AUNZ. Both pointed out the relevance of the contributions that software makes to science, and therefore of the people who contribute to that software in any way.

The motivations for national and international RSE activities were further detailed by three representatives from national chapters. David Topping from UK RSE, the oldest and largest RSE organisation, took a look at the definition of an RSE, at the community history, and its current state in the UK and beyond. Over 15 local groups already exist and more are forming at a high rate, sometimes even competing over members. Martin Hammitzsch presented the German chapter, de-RSE, which was initiated only 1.5 years ago. He shared the group’s objectives, how they work to build a community, the challenges they face and some lessons learned: a great resource for the attendees from countries without any organisational structure yet. Third up was Niels Drost (Netherlands eScience Center), who introduced the youngest European chapter, NL-RSE, and its core team, which has already generated considerable reach across the Netherlands.

4 national chapters, their history and plans, are introduced to ~40 participants, albeit meeting at the end of a long conference day. A good start for #EGU18RSE at

— Daniel Nüst (@nordholmen) 12. April 2018

After the short talks, we took advantage of the group size, had a complete round of short introductions, and engaged in a discussion. We found that one of the big challenges was to reach people engaged in Research Software Engineering (RSEng) activities at their work, especially outside of ESSI, and those for whom the “RSE” label does not ring a bell yet. The need for outreach also applies to office holders within the union, in order to achieve better acknowledgement of the needs of RSEs in universities, scientific unions, and at scientific conferences.

Some good ideas came up and we plan to reach out to EGU and ESSI leadership and share them. For example, EGU could increase recognition of RSEs and their work by awarding a medal, by offering a special poster track for contributions to scientific software (allowing an additional “software submission” per author), or by tagging abstracts as “RSE” similarly to the “ECS” labels. These are ideas for “top-down” activities that we would like to advocate within the organisation. The usefulness and overall potential of such domain-specific actions (taking all EGU members as a group for a moment), which are lateral to the national chapters, was commonly agreed on.

But there were also ideas for “bottom-up” activities which can support the RSE organisations’ causes, for example scientists organising sessions related to RSE roles and activities within their divisions, offering short courses specifically around RSEng capabilities (and also labeling the numerous existing courses as such), or organising software and data carpentry courses in the week before or after the conference. One participant suggested a session “Software development for professors – what do my students do with their computers?”, which seems funny at first, but at a second look goes to the heart of RSE outreach activities for raising awareness and teaching software-related skills. It was great to follow the lively discussions, which were enriched with a common understanding of the values and importance of diversity and openness. If you think about convening a session on scientific software yourself at an EGU GA, please get in touch!

Surveys are an important aspect of structured RSE activities, because the people self-identifying as RSEs are diverse and widespread and the roles that RSEs play in scientific research are manifold. We want to contribute to the process of understanding the needs and diversity of people involved in RSEng with the following survey of EGU attendees:

The townhall meeting was a good start to spreading the word about the goals of RSE organisations and the activities to put Research Software Engineering on the map of all stakeholders in science, such as researchers, publishers, funding agencies, and scientific unions. What can we do better next year? Hopefully we can do what more “established” townhalls can offer: snacks and drinks! We should also announce and prominently place a sticker table. Apart from that, this first townhall did an excellent job, just like other long-running townhall meetings at EGU GA: It provided a place for like-minded people to connect and for newcomers to dip their toes into a new topic and be introduced to members of an international friendly community of dedicated people who “do science with code”.

The slides (download all slides here) include many links to further resources on the history and state of RSE-related activities. Please let us know what you think on Twitter: #egurse. See you at the RSE Townhall Meeting at EGU 2019!

RSE Townhall Conveners

Daniel Nüst
Martin Hammitzsch
David Topping
Bernadette Fritzsch

Good practice in the evaluation of researchers

A new statement on good practice in the evaluation of researchers and research programmes has been posted by three national academies (Académie des Sciences, Leopoldina and Royal Society).

It states that “the use of bibliometric indicators for early career scientists must in particular be avoided. Such use will tend to push scientists who are building their career into well-established/fashionable research fields, rather than encouraging them to tackle new scientific challenges.” (p. 4). Read the full statement at:




Strengthening Early Career Scientists (ECS) in EGU ESSI

The number of presentations from ECS in the ESSI sessions has been low during recent EGU General Assemblies. We are hence currently trying to get greater involvement and recognition for the ECS in the ESSI Division.

How to get involved as an ECS in the ESSI division?

Join the ESSI ECS mailing list

We have a dedicated ESSI ECS mailing list that can be used for discussion between the ESSI ECSs. It is also used for announcements, such as dedicated meetings at the GA 2018.

Submit an abstract to an ESSI session

As mentioned above, we aim to have more ECS presenters in the ESSI sessions. The call for papers has just been published and contributions from ECSs are welcomed and encouraged. A description of how to submit an abstract is available here.

Apply for travel grants

ECSs can also apply for travel grants. Detailed information on travel grants is given here.

Become a mentor

Have you attended previous EGU GAs? Then you can volunteer as a mentor for ECSs who are attending the GA for the first time. Alternatively, if it will be your first EGU, why not sign up as a mentee and gain some insider knowledge on how to get the best out of the General Assembly while building your network of contacts?


Do you have some news on your own research and/or on topics relevant for other ESSI ECSs? You are welcome to contribute with your own posts to the EGU ESSI blog. If you’re interested please contact us directly.

Online Meetings

We are planning to organize online meetings for ESSI ECSs. These meetings are used to get to know each other and to discuss questions and answers. The online meetings are announced on the ESSI ECS mailing list.

General information

General information and background material for ECSs is available on the general EGU ECS website.

Questions? Ideas?

If you have ideas for the ESSI ECS community or have any questions, feel free to use the mailing list or contact us directly.

PICO in the picture

Like everyone else, in the beginning I was skeptical of the newly introduced presentation format at the EGU – the PICO sessions. PICO stands for Presenting Interactive Content. Half talk, half poster – this is a new design that demands a completely new and unfamiliar kind of preparation from the presenter, and yes, admittedly, at first this meant additional work. However, already during the creation of my first PICO presentation, I realized that the scientific content can be filled with much more life than in a poster, and with much more information than in a talk. This made me change my opinion quickly. Before the PICO is presented, the presenter has the chance to advertise it in the so-called ‘2-minute madness’. Usually these are less crazy than they sound, but the idea is great – to force scientists to reduce the story to the key facts. Let’s be honest, nobody reads all the abstracts in the program.

Why choose a PICO session at EGU 2015?

PICO presentations at EGU 2014. (Credit: EGU/Stephanie McClellan)

So with this short advertisement the scientists have the chance to draw more people who are unfamiliar with their work to their presentation. Later, during the PICO session, I noticed that most attendees are keen to hear more about your scientific adventure. This means that it starts like a regular oral presentation. However, pretty soon, people start to ask specific or sometimes very specific questions, as with a poster. Then you have the chance to steer the presentation in another direction, while still having your supporting tools at your fingertips – videos, animations, high-resolution graphics, large tables, etc. Next, the questions lead into a conversation and quickly turn into a discussion, which usually attracts other scientists. A big advantage is that questions are asked at the very moment they come up, while the respective slide is being presented, and do not need to wait until after the talk.

For me the PICO sessions are a valuable addition to the traditional formats. What is missing? Maybe the possibility to access websites as well. Maybe a session that mixes talks, posters, and PICOs. Anyhow, I think the ESSI division in particular should push for PICOs. I cannot think of a better format to show results as in posters or talks, but at the same time have the opportunity to dig much deeper into e.g. programming languages, IT infrastructures, visualization, etc. And it is a perfect showcase of how present-day practices – in this case scientific presentation types – are positively influenced by informatics. ESSI should be at the forefront of these developments as well.

Please watch the non-interactive PICO presentation here.  The interactive version was uploaded to the EGU site.

New ECS representative!

At the EGU ESSI division meeting, Christoph Stasch was elected as the new representative of the early career scientists in the ESSI division following Jennifer Roelens. Christoph works as research associate and consultant at 52°North, a non-profit research organisation in the field of applied geoinformatics. His focus is on simplifying the integration of sensors and processing modules (e.g. environmental simulation models) in spatial information infrastructures and GIS applications. As ESSI ECS representative, Christoph hopes to strengthen the network of ECS in the ESSI division. Interested in participating? Then get in touch with the network or Christoph directly.

Happy GISday!!

As many people within the ESSI division have at least once used GIS software, we would like to wish you a happy GIS day!

Picture by ESRI


Every day, millions of decisions are being powered by Geographic Information Systems (GIS) for education, government, non-profit organizations and businesses. ESSI deals with community-driven and multidisciplinary challenges. GIS plays an important role in developing data-driven solutions that help many organizations visualize, analyze, interpret and present data.


Boon of big data for geoscience investigations

The amount of digital data per person has been rising in a geometric progression since 2009. According to the latest report of Oyster IMS, the digital universe will grow by a factor of 300 between 2005 and 2020: from 130 Exabytes to 40,000 Exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every person in 2020). Earth sciences is one of the domains where huge volumes of data are collected.
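The figures quoted above can be cross-checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 Exabyte = 10^9 gigabytes), and the variable names are ours, not the report's.

```python
# Figures quoted in the text; decimal units are an assumption (1 EB = 10**9 GB).
EB_2005 = 130
EB_2020 = 40_000
GB_PER_EB = 10**9
GB_PER_PERSON_2020 = 5_200

# Ratio between the 2020 and 2005 volumes ("a factor of 300").
growth_factor = EB_2020 / EB_2005

# Total 2020 volume expressed in gigabytes ("40 trillion gigabytes").
total_gb_2020 = EB_2020 * GB_PER_EB

# World population implied by the per-person figure.
implied_population = total_gb_2020 / GB_PER_PERSON_2020

print(f"growth factor: {growth_factor:.0f}x")
print(f"2020 volume: {total_gb_2020 / 1e12:.0f} trillion GB")
print(f"implied population: {implied_population / 1e9:.1f} billion")
```

The numbers are consistent with each other: the ratio is roughly 300, and the per-person figure implies a world population of a little under 8 billion.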

Let’s take a database of geological samples of the Institute of Geology, Taras Shevchenko National University as an example of Big Data (BD). This database consists of tables that contain geochemical, petrological, mineralogical and petrophysical information on 11,800 samples of granitoids from the Ukrainian shield (crystalline massif). The data are loaded into MS Access and split into general tables (sample number, geographical coordinates of the sampling location, mineralogical and chemical composition, photos of thin sections of the sample, petrophysical information) and additional tables (characteristics of the region and the geological structures from which the sample was taken). Each data entry is assigned a unique identifier that links the data tables with each other. The database structure allows users to request information about a sample by entering its unique identification number, the type of rock or characteristics of its composition. This saves time and energy.
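The idea of linking tables through a unique sample identifier can be sketched as follows. This is not the Institute's actual schema: the example uses SQLite from the Python standard library instead of MS Access, and all table names, column names and sample values are made up for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables linked by sample_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (
    sample_id INTEGER PRIMARY KEY,   -- unique identifier of the sample
    rock_type TEXT,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE geochemistry (
    sample_id    INTEGER REFERENCES samples(sample_id),
    sio2_percent REAL
);
""")

# Made-up example rows, not real measurements.
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, ?)",
    [(1, "granite", 50.45, 30.52),
     (2, "granodiorite", 49.00, 32.00),
     (3, "granite", 48.50, 35.00)],
)
conn.executemany(
    "INSERT INTO geochemistry VALUES (?, ?)",
    [(1, 72.5), (2, 66.0), (3, 74.1)],
)


def lookup_sample(conn, sample_id):
    """Return (sample_id, rock_type, sio2_percent) for one sample, or None."""
    return conn.execute(
        "SELECT s.sample_id, s.rock_type, g.sio2_percent "
        "FROM samples AS s "
        "JOIN geochemistry AS g ON g.sample_id = s.sample_id "
        "WHERE s.sample_id = ?",
        (sample_id,),
    ).fetchone()


def samples_by_rock_type(conn, rock_type):
    """Return the identifiers of all samples of the given rock type."""
    rows = conn.execute(
        "SELECT sample_id FROM samples WHERE rock_type = ? ORDER BY sample_id",
        (rock_type,),
    ).fetchall()
    return [row[0] for row in rows]


print(lookup_sample(conn, 1))
print(samples_by_rock_type(conn, "granite"))
```

Because every table carries the same identifier, a single query can combine location, rock type and chemistry without duplicating data across tables.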


The biggest volumes of information in geophysics are primarily represented by 3D seismic data. Such huge amounts of data are stored because of the large areas, high density and high resolution of the acquired information. For example, “an area of 200 km² of 3D seismic off-shore acquisition data occupied 30–40 GB of information in 1999–2001; in 2004, with 968 channels per block, 100–130 GB of data was acquired. 220–250 GB of data was obtained with 1280 channels in 2010,” says seismic processor and interpreter P. Kuzmenko. “But nowadays, to investigate the structure of a reservoir precisely, 3-component wide-azimuth and full-azimuth seismic acquisition is applied, which is done with 7600–51200 channels, and as a result the amount of digital data rises to 1.5–10 TB. On land, this can take up a volume of up to 100 TB. Modern acquired data is stored compactly in electronic databases as digital data. In addition, there is a great amount of old geophysical information on paper (maps, well logs, reports), which should also be stored in electronic databases to prevent its loss. This means that the amount of data for interpretation will rise in a geometric progression if paper materials are converted into digital form.”


You might say, “Why do geoscientists need such amounts of raw data? They can analyze it and then delete it!” However, things are not that simple. Data from previous investigations may be useful for further stages of oil-field exploitation, scientific research and comparison with nearby territories.

To sum up, great resources are needed to store and analyze huge amounts of data, but thanks to BD storage and analysis techniques, important decisions can be made, and as a result technologies have developed very rapidly, especially in the past 10–20 years. Zettabytes of scientific data contain important information, which can help to develop a sustainable lifestyle and to predict and sometimes even prevent dangerous events.

Dear readers!

The team encourages you to send us your own thoughts about Big Data or other ESSI-related topics! We invite students, scientists, professionals and others interested in the geosciences to answer several questions:

  1. What is the boon and what is the bane of your research with Big Earth Science Data?
  2. What challenges do you face in your daily grind of data processing?
  3. What challenges of Big Earth Science data do you address with your research / current work?


Thanks to P. Kuzmenko (Ph.D. in geophysics) and O. Shabatura (Ph.D. in geophysics) for providing information for this article.


Big Earth Science Data – Boon or bane?

We are in the era of Big Data. Big Data is a ‘hot’ topic. It is a popular term often associated with an increase in volume, variety and velocity of data. The Copernicus programme, for example, the European Union’s flagship programme for monitoring the Earth’s environment using satellite and in-situ observations, anticipates a massive increase in satellite data volume. It is estimated that the Sentinel missions alone, Copernicus’ space component, will produce 4 TB of processed data each day (FDC 2016).

The European Centre for Medium-Range Weather Forecasts (ECMWF) hosts the Meteorological Archival and Retrieval System (MARS), which is the world’s largest archive of meteorological data. The archive currently holds more than 90 PB of data and continues to grow by an additional 3 PB every month.
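To get a feeling for these numbers, the growth of the archive can be projected with a very simple model. The sketch below assumes that the archive keeps growing linearly at the quoted 3 PB per month, which is a simplification; the function and constant names are hypothetical.

```python
# Figures quoted in the text; constant linear growth is an assumption.
ARCHIVE_PB = 90           # current archive size in petabytes
GROWTH_PB_PER_MONTH = 3   # monthly growth in petabytes


def projected_size_pb(months):
    """Projected archive size in PB after the given number of months."""
    return ARCHIVE_PB + GROWTH_PB_PER_MONTH * months


# Months needed to double the current size at a constant growth rate.
months_until_doubled = ARCHIVE_PB / GROWTH_PB_PER_MONTH

print(f"after 1 year: {projected_size_pb(12)} PB")
print(f"doubled after {months_until_doubled:.0f} months")
```

Under this assumption the archive would hold 126 PB after one year and would double in size within two and a half years.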

Big Data and an increase in data volume come along with an increase in computing and processing power. Or is it the other way around? When Gordon Moore, co-founder of Intel, published his observation in 1965 that the number of transistors on an integrated circuit doubles at a regular interval (later stated as every two years), he was not thinking of an associated exponential growth of data. Moore’s law is now often quoted with a doubling period of 18 months, and it is equally applicable to data growth.

The increase in data volume is partly due to new sensor technologies and new kinds of data. The variety and types of data have never been more diverse. Sensors and satellites continuously collect data and monitor the state of the Earth. The Internet of Things brings a constant flow of unstructured data content. The speed at which data is generated and moved around has increased tremendously. In 2013, IBM reported that 90% of all of the world’s data had been generated in the previous two years. This number has most likely been growing since.

DIK Pyramid - From raw data to value-added information and knowledge

The data, information, knowledge hierarchy – How raw data is turned into value-added information and knowledge for users and decision-makers (J. Wagemann)

To recapitulate: at first sight, more data, new data sources and a constant data flow sound like a true boon for every data scientist. However, it is vital, when talking about Big Data, to differentiate between raw and unstructured data and value-added information. Information is extracted from raw data. Information and insight are what is really needed. The challenge is to turn petabytes of raw and unstructured data into kilo- or megabytes of refined information (Rowley 2007). Decision-makers need refined and actionable information on which to base decisions, policies and recommended actions. The question is whether we can expect information to increase at the same speed as Big Data is generated. And there the bane of Big Data comes into play.

The bare presence of Big Data is not enough. Turning Big Data into information brings new challenges along the entire data value chain. We face challenges in data generation, where we have new data sources and types from social-media, citizen-empowered science, crowdsourcing and unmanned aerial vehicles. We face challenges in data storage and management, where questions related to high performance computing architectures, interoperability of data management systems and cloud computing have to be addressed. Data governance, data licensing and metadata are further essential areas that have to be dealt with.

We also face challenges in data analysis, where data mining and machine learning are ‘hot’ and popular topics. And we face challenges in data insights, especially related to data communication and visualization. The best research and findings are valueless if not communicated properly.

These challenges along the entire data value chain are well reflected in the four official subprogrammes of the Earth and Space Science Informatics (ESSI) division of the European Geosciences Union, which are: (i) Community-driven challenges and solutions dealing with Informatics, (ii) Infrastructures across the Earth and Space Sciences, (iii) Open Science 2.0 Informatics for Earth and Space Sciences and (iv) Visualization for scientific discovery and communication.

ESSI is a very interdisciplinary field and, compared to other geoscientific disciplines, a rather recent but important research field. The keen interest of the geoscientific community in ESSI was reflected at this year’s EGU, where the rooms for the ESSI sessions were often too small for the high number of people interested.

Coming back to the question in the headline: “Big Earth Data – Boon or bane?” Let’s find a compromise, maybe. What about considering Big Earth Science Data as a boon, as a start? The amount of freely available, high-resolution data products and current processing capabilities give us opportunities that have never been available before. And we gain hidden insights into the state of our Earth. Open data access is key to the great potential of Big Data. On the importance of open data policies, I recommend Barbara Ryan’s TEDx talk about Unleashing the Power of Earth Observations (TED 2014). Barbara Ryan is the General Secretary of the Group on Earth Observations (GEO) and vividly explains the positive outcomes of opening up the entire Landsat archive in 2008.

The era of big data forces us to rethink and disrupt our common data processing approach. Currently, a data scientist spends 80% of the time on managing and pre-processing the data and has only 20% left for the actual data evaluation. Every stakeholder along the data value chain, from data generator through data provider to data user, has to work on innovative approaches to tackle current challenges and to leverage the full potential of Big Earth Science Data. The bane comes into play if we continue generating and storing massive amounts of data and fail to turn it into value-added content.

What is the boon and what is the bane of your research with Big Earth Science Data? What challenges do you face in your daily grind of data processing? What challenges of Big Earth Science data do you address with your research / current work?
We would like to know about it. This blog post is the first of what we hope will be monthly contributions from the ESSI division, and we welcome contributions from anyone within the ESSI community.

Related to this blog post, the movie Big Earth Data is highly recommended.


FDC (2016): Data Volume | Copernicus. – (last access: 2016-06-29)

Rowley, J. (2007): The wisdom hierarchy: representations of the DIKW hierarchy. Journal of Information Science 33/2: 163-180.