The amount of digital data per person has been growing geometrically since 2009. According to the latest report from Oyster IMS, the digital universe will grow by a factor of about 300 between 2005 and 2020: from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every person in 2020). Earth science is one of the domains in which huge volumes of data are collected.
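The figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 exabyte = 10^9 gigabytes) and a 15-year span; the variable names are illustrative only, and the average yearly growth rate is a derived value, not a number taken from the report.

```python
# Sanity check of the quoted growth figures (assumes decimal units).
EB_TO_GB = 10**9            # 1 exabyte = 10^9 gigabytes

start_eb = 130              # digital universe in 2005, exabytes
end_eb = 40_000             # projected digital universe in 2020, exabytes
years = 2020 - 2005

growth_factor = end_eb / start_eb            # about 307.7, i.e. roughly 300
total_gb = end_eb * EB_TO_GB                 # 4e13 = 40 trillion gigabytes

# Average yearly multiplier if growth were perfectly geometric (derived value).
annual_growth = growth_factor ** (1 / years) # about 1.465, i.e. ~46% per year

# Population implied by "more than 5,200 GB per person" (an upper bound).
implied_population = total_gb / 5200         # about 7.7 billion

print(f"growth factor: {growth_factor:.1f}")
print(f"total: {total_gb:.2e} GB")
print(f"average yearly growth: {(annual_growth - 1) * 100:.1f}%")
print(f"implied population: {implied_population:.2e}")
```

Read this way, a factor of 300 over 15 years corresponds to the data volume growing by a little under half every year.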
Take the database of geological samples at the Institute of Geology, Taras Shevchenko National University, as an example of Big Data (BD). It consists of tables containing geochemical, petrological, mineralogical and petrophysical information on 11,800 samples of granitoids from the Ukrainian Shield (a crystalline massif). The data are stored in MS Access and split into general tables (sample number, geographical coordinates of the sampling point, mineralogical and chemical composition, photos of thin sections of the sample, petrophysical information) and additional tables (characteristics of the region and geological structures from which the sample was taken). Each data entry is assigned a unique identifier that links the tables to each other. This structure lets a user request information about a sample by entering its unique identification number, the rock type or characteristics of its composition, which saves time and effort.
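The idea of linking a general table and an additional table through a unique sample identifier can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MS Access, and every table name, column name and row below is a hypothetical placeholder, not the Institute's actual schema or data.

```python
import sqlite3

# In-memory database as a stand-in for the MS Access file (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (              -- hypothetical "general" table
    sample_id INTEGER PRIMARY KEY,  -- unique identifier linking the tables
    rock_type TEXT NOT NULL,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE regions (              -- hypothetical "additional" table
    sample_id INTEGER PRIMARY KEY REFERENCES samples(sample_id),
    region    TEXT,
    structure TEXT
);
""")

# Placeholder rows, invented purely for illustration.
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, ?)",
    [(1, "granite", 50.0, 30.0),
     (2, "granodiorite", 49.5, 29.0),
     (3, "granite", 48.0, 33.0)],
)
conn.executemany(
    "INSERT INTO regions VALUES (?, ?, ?)",
    [(1, "Region A", "Structure X"),
     (2, "Region B", "Structure Y"),
     (3, "Region A", "Structure Z")],
)

def find_by_id(conn, sample_id):
    """Return one sample joined with its regional information, or None."""
    return conn.execute(
        """SELECT s.sample_id, s.rock_type, r.region, r.structure
           FROM samples AS s
           LEFT JOIN regions AS r ON r.sample_id = s.sample_id
           WHERE s.sample_id = ?""",
        (sample_id,),
    ).fetchone()

def find_by_rock_type(conn, rock_type):
    """Return the identifiers of all samples of the given rock type."""
    rows = conn.execute(
        "SELECT sample_id FROM samples WHERE rock_type = ? ORDER BY sample_id",
        (rock_type,),
    ).fetchall()
    return [row[0] for row in rows]

print(find_by_id(conn, 1))
print(find_by_rock_type(conn, "granite"))
```

Because both tables share the same identifier, a single lookup by sample number or rock type can pull together the sample description and its regional context without scanning every table by hand.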
The largest volumes of information in geophysics are primarily 3D seismic data. Such huge amounts of data arise from the large areas covered and the high density and high resolution of the acquired information. For example, “an area of 200 km² of offshore 3D seismic acquisition occupied 30-40 GB in 1999-2001; in 2004, with 968 channels per block, 100-130 GB of data was acquired. In 2010, 220-250 GB of data was obtained with 1280 channels,” says seismic processor and interpreter P. Kuzmenko. “But nowadays, to investigate the structure of a reservoir precisely, 3-component wide-azimuth and full-azimuth seismic acquisition is applied, which uses 7600-51200 channels, and as a result the amount of digital data rises to 1.5-10 TB. On land, this can reach up to 100 TB. Newly acquired data is stored compactly in electronic databases as digital data. In addition, there is a great amount of old geophysical information on paper (maps, well logs, reports), which should also be stored in electronic databases to prevent its loss. This means that the amount of data for interpretation will grow geometrically if paper materials are converted into digital form.”
You might ask, “Why do geoscientists need such amounts of raw data? They can analyze it and then delete it!” However, things are not that simple. Data from previous investigations may be useful for later stages of oil-field exploitation, for scientific research and for comparison with nearby territories.
To sum up, storing and analyzing huge amounts of data requires great resources, but BD storage and analysis techniques support important decisions, and as a result technologies have developed very rapidly, especially in the past 10-20 years. Zettabytes of scientific data contain important information that can help to develop a sustainable lifestyle and to predict, and sometimes even prevent, dangerous events.
The team encourages you to send us your own thoughts about Big Data or other ESSI-related topics! We invite students, scientists, professionals and anyone else interested in geosciences to answer several questions:
- What is the boon and what is the bane of your research with Big Earth Science Data?
- What challenges do you face in your daily grind of data processing?
- What challenges of Big Earth Science Data do you address with your research / current work?
Thanks to P. Kuzmenko, Ph.D. in geophysics, and O. Shabatura, Ph.D. in geophysics, for providing information for this article.