Thoughts on geological modelling: an analogue perspective

In geodynamics we study the dynamics of the Earth (and other planets). We ground our studies in as much data as possible; however, we are constrained by the fact that almost all direct information we can collect from the interior of the Earth only shows its present-day state. The surface rock record gives us a glimpse into the past dynamics and evolution of our planet, but this record gets sparser as we go back in time. This is why it is common to use modelling in geodynamics to fill this knowledge gap. There are different types of modelling, and this week João Duarte writes about the importance of analogue modelling.

João Duarte. Researcher at Instituto Dom Luiz and Invited Professor at the Geology Department, Faculty of Sciences of the University of Lisbon. Adjunct Researcher at Monash University.

The first time I went to EGU, in 2004, I presented a poster with some new marine geology data and a few sets of analogue models. I was doing accretionary wedges in sandboxes. At the time, I was in the third year of my bachelor’s degree and I was completely overwhelmed by the magnitude of the conference. It was incredible to see the faces of all those scientists whose articles I used to read. But one thing impressed me the most: Earth Sciences were consolidating as a modern, theory-based science. The efforts to develop an integrated dynamic theory of plate tectonics as part of mantle convection were obvious. The new, emergent numerical models looked incredible, with all those colours, complex rheologies and stunning visualization that allowed us to “see” stresses, temperature gradients and non-linear viscosities. I was amazed.

Analogue modelling was at a relative peak in 2004; however, some also anticipated that it would quickly disappear (and indeed several analogue labs have closed since). It was with this mindset that I later did the experiments for my PhD, which I finished in 2012 (Duarte et al., 2011). But I was fortunate. My supervisors, Filipe Rosas and Pedro Terrinha, took me to state-of-the-art labs, namely in Toronto and Montpellier (led at the time by Sandy Cruden and Jacques Malavieille, respectively), and I started to develop a passion for this kind of modelling. When I moved to Monash for a post-doc position, in 2011, this turned out to be a great advantage. There, modellers such as Wouter Schellart, Louis Moresi, Fabio Capitanio, David Boutelier and Sandy Cruden (yes, I met Sandy again at Monash) were using analogue models to benchmark numerical models. Why? Because many times, even though numerical models produce spectacular results, they might not be physically consistent. And there is only one way to address this: make sure that whatever numerical code we are using can reproduce simple experiments that we can run in a lab. The classical example is the sinking of a negatively buoyant sphere in a viscous medium.

Sandbox analogue model of an accretionary wedge. Part of the same experiment as shown in the header figure. Here, a sliced section, cut after wetting, is shown. University of Lisbon, 2009. Experiments published in Duarte et al. (2011).

That was what we were doing at Monash. I worked with Wouter Schellart on the development of subduction experiments with an overriding plate, which were advancing step by step in both analogue and numerical schemes (see e.g. Duarte et al., 2013 and Chen et al., 2015, 2016 for the analogue models, and Schellart and Moresi, 2013 for numerical equivalents). The tricky bit was that we wanted self-consistent dynamic experiments in which we prescribed the forces (negative buoyancy of the slab, the viscosity of the upper mantle, etc.) and let the kinematics (i.e. the velocities) be an emergent phenomenon. So, no lateral push or active kinematic boundaries were applied to the plates. This is because we now recognize that, in general, it is the slab pull at subduction zones that mainly drives the plates and not the other way around. Therefore, if we want to investigate the fundamental physics and dynamics of subduction zones we need to use self-consistent models (both analogue and numerical). In order to carry out these models, we had to develop a new rheology for the subduction interface, which is a complex problem in both the analogue and the numerical approaches (Duarte et al., 2013, 2014, 2015). But this is another very long story that would lead to a publication by itself.

Analogue models of subduction with an overriding plate and an interplate rheology. Monash University, 2012. Adapted from Duarte et al. (2013)

But what is analogue modelling all about? Basically, analogue models are scaled models that we can develop in the laboratory using analogue materials (such as sand) that, at the scale of our models, have physical properties similar to those of natural materials (such as brittle rocks). But, as is widely known, under certain circumstances (at large time and space scales) rocks behave like fluids, and to simulate this we use analogue fluids, such as silicone putties, glucose and honey. We can also use fluids to simulate the interaction between subduction zones and mantle plumes in a fluid reservoir (see the figures below and the links to videos of scaled experiments using three different fluids to study slab-plume interaction; Meriaux et al., 2015a, 2015b, 2016). These are generally called geodynamic analogue models.

End of a slab-plume experiment in the upper mantle (see below). The tank is partially filled with glucose. The slab (lying at the analogue 660 discontinuity) is made of silicone mixed with iron powder. The plume is made of a water solution of glucose dyed with a red colorant. And that’s me on the left. Monash University, 2014.

I usually consider two main branches of analogue models. The first, which is the one mostly used by geologists, was started by Sir James Hall (1761–1832), who squeezed layers of clay to reproduce the patterns of folded rocks that he had observed in nature. This approach was later improved by King Hubbert (1937), who laid the groundwork for the development of the field by formulating a theory for the scaling of analogue models applied to geological processes.

The other branch is probably as old as humans. It began when we started to manipulate objects and use them to understand basic empirical laws, such as the fact that objects always fall. When Galileo used small spheres on inclined surfaces to extract the physical laws that describe the movement of bodies, from rocks to planets, he was in a certain way using analogue models. He understood that many laws are scale invariant. Still today, these techniques are widely used by physicists and engineers to understand, for example, the aerodynamics of airplanes, the stability of bridges, the dynamics of rivers or the resistance of dams. They use scaled models that reproduce, at suitable laboratory scales, the objects and processes that they are investigating.

What we did at Monash was a mixture of both approaches, though we were less interested in exactly reproducing nature from a purely geometric and kinematic point of view than in understanding the physics of the object we were investigating: subduction zones. Therefore, we had to guarantee that we were using the correct dynamical approach in order to be able to extract generic physical empirical laws, hoping that these laws would provide us with some insight into the dynamics of natural subduction zones. These empirical laws could readily be incorporated into numerical models, which would then help explore the space of controlling parameters in the system more efficiently.

Slab-Plume interaction in the upper mantle. Experiments published in Meriaux et al. (2015a, 2015b).

I want to finish with a question that I believe concerns all of us: are there still advantages in using analogue models? Yes, I believe so! One of the most important advantages is that analogue models are always three-dimensional and high-resolution. Furthermore, they allow good tracking of strain and of how it develops in discontinuous media, for example when investigating the localization of deformation or the propagation of cracks. Numerical schemes still struggle with these problems. It is very difficult to have an efficient code that can deal simultaneously with very high resolution and large-scale three-dimensional problems, as is required to investigate the process of subduction. Nevertheless, numerical models are of great help when it comes to tracking stresses and modelling complex rheologies and temperature gradients. To sum up: nowadays, we recognize that certain problems can only be tackled using self-consistent dynamic models that simulate the whole system in three dimensions, capturing different scales. For this, the combination of analogue and numerical models is still one of the most powerful tools we have. An interesting example of a field in which a combined approach is being used is the fascinating research on the seismic cycle (for example, see here).

Links to videos:

VIDEO 1: https://www.youtube.com/watch?v=U1TXC2XPbFA&feature=youtu.be
(Subduction with an overriding plate and an interplate rheology. Duarte et al., 2013)

VIDEO 2: https://www.youtube.com/watch?v=n5P2TzS6h_0&feature=youtu.be
(Slab-plume interaction at mantle scale. Side-view of the experiment on the top, and top-view of the experiment on the bottom. Meriaux et al., 2016)
References:

Chen, Z., Schellart, W.P., Strak, V., Duarte, J.C., 2016. Does subduction-induced mantle flow drive backarc extension? Earth and Planetary Science Letters 441, 200-210. https://doi.org/10.1016/j.epsl.2016.02.027

Chen, Z., Schellart, W.P., Duarte, J.C., 2015. Overriding plate deformation and variability of forearc deformation during subduction: Insight from geodynamic models and application to the Calabria subduction zone. Geochemistry, Geophysics, Geosystems 16, 3697–3715. https://doi.org/10.1002/2015GC005958

Duarte, J.C., Schellart, W.P., Cruden, A.R., 2015. How weak is the subduction zone interface? Geophysical Research Letters 41, 1-10. https://doi.org/10.1002/2014GL062876

Duarte, J.C., Schellart, W.P., Cruden, A.R., 2014. Rheology of petrolatum – paraffin oil mixtures: applications to analogue modelling of geological processes. Journal of Structural Geology 63, 1-11. https://doi.org/10.1016/j.jsg.2014.02.004

Duarte, J.C., Schellart, W.P., Cruden, A.R., 2013. Three-dimensional dynamic laboratory models of subduction with an overriding plate and variable interplate rheology. Geophysical Journal International 195, 47-66. https://doi.org/10.1093/gji/ggt257

Duarte, J.C., Rosas, F.M., Terrinha, P., Gutscher, M.-A., Malavieille, J., Silva, S., Matias, L., 2011. Thrust–wrench interference tectonics in the Gulf of Cadiz (Africa–Iberia plate boundary in the North-East Atlantic): Insights from analog models. Marine Geology 289, 135–149. https://doi.org/10.1016/j.margeo.2011.09.014

Hubbert, M.K., 1937. Theory of scale models as applied to the study of geologic structures. GSA Bulletin 48, 1459-1520. https://doi.org/10.1130/GSAB-48-1459

Meriaux, C., Meriaux, A-S., Schellart, W.P., Duarte, J.C., Duarte, S.S., Chen, Z., 2016. Mantle plumes in the vicinity of subduction zones. Earth and Planetary Science Letters 454, 166-177. https://doi.org/10.1016/j.epsl.2016.09.001

Mériaux, C.A., Duarte, J.C., Schellart, W.P., Mériaux, A-S., 2015. A two-way interaction between the Hainan plume and the Manila subduction zone. Geophysical Research Letters 42, 5796–5802. https://doi.org/10.1002/2015GL064313

Meriaux, C.A., Duarte, J.C., Duarte, S., Chen, Z., Rosas, F.M., Mata, J., Schellart, W.P., and Terrinha, P. 2015. Capture of the Canary mantle plume material by the Gibraltar arc mantle wedge during slab rollback. Geophysical Journal International 201, 1717-1721. https://doi.org/10.1093/gji/ggv120

Schellart, W.P., Moresi, L., 2013. A new driving mechanism for backarc extension and backarc shortening through slab sinking induced toroidal and poloidal mantle flow: Results from dynamic subduction models with an overriding plate. Journal of Geophysical Research: Solid Earth 118, 3221-3248. https://doi.org/10.1002/jgrb.50173

Inversion 101 through 201 – Part 3: Accounting for uncertainty – Bayes and friends

The Geodynamics 101 series serves to showcase the diversity of research topics and methods in the geodynamics community in an understandable manner. We welcome all researchers – PhD students to professors – to introduce their area of expertise in a lighthearted, entertaining manner and touch upon some of the outstanding questions and problems related to their fields. This time, Lars Gebraad, PhD student in seismology at ETH Zürich, Switzerland, talks about inversion and how it can be used in geodynamics! Due to the ambitious scope of this topic, we have a 3-part series on this! You can find the first post here. This week, we have our final post in the series, where Lars discusses probabilistic inversion.

An integral part of estimating parameters is uncertainty analysis. The aim of a general inverse problem is to find the value of a parameter, but it is often very helpful to also indicate how certain we are of that value. For example, in the last figure of my previous post, the measurement values at the surface are more strongly correlated with the uppermost blocks. Therefore, the result of an inversion in this set-up will most likely be more accurate for these parameters than for the bottom blocks.

In linear deterministic inversion, the eigenvalues of the matrix system provide an indication of the resolvability of parameters (as discussed in the aforementioned work by Andrew Curtis). There are classes of methods to compute exact parameter uncertainty in the solution.

From what I know, for non-linear models uncertainty analysis is limited to the computation of second derivatives of the misfit functional in parameter space. The second derivatives of X (the misfit function) are directly related to the standard deviations of the parameters. Thus, by computing all the second derivatives of X, a non-linear inverse problem can still be interrogated for its uncertainty. However, the problem with this is its linearisation; linearising the model and computing derivatives may not truly capture how the model behaves across model space. Also, for strongly non-linear models many trade-offs (correlations) exist which influence the final solution, and these correlations may vary strongly depending on the model being inverted.
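To make the link between curvature and uncertainty a bit more concrete, here is a minimal sketch (my own illustration, not part of the original post) that estimates a standard deviation from a numerically computed second derivative of a toy misfit; the quadratic misfit and its parameters are entirely hypothetical.

```python
import numpy as np

# Minimal sketch: estimate a standard deviation from the curvature (second
# derivative) of a 1D misfit X(m). The quadratic misfit below is a purely
# hypothetical stand-in; for a least-squares misfit, sigma = 1 / sqrt(X'').

def misfit(m, m_true=312.5, sigma_true=25.0):
    """Toy misfit, quadratic around m_true (not the bread model itself)."""
    return 0.5 * ((m - m_true) / sigma_true) ** 2

def second_derivative(f, m, h=1e-3):
    """Central finite-difference approximation of f''(m)."""
    return (f(m + h) - 2.0 * f(m) + f(m - h)) / h ** 2

m_min = 312.5                               # assume the minimum is already known
curvature = second_derivative(misfit, m_min)
sigma_estimate = 1.0 / np.sqrt(curvature)   # recovers ~25 for this toy misfit
print(f"estimated standard deviation: {sigma_estimate:.1f} grams")
```

For a least-squares misfit of the form X = (m − m₀)² / (2σ²), the curvature at the minimum is 1/σ², which is why the inverse square root of the second derivative recovers the standard deviation.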

Bayes’ rule

Enter reverend Thomas Bayes

This 18th-century part-time mathematician (he only ever published one mathematical work) was the first to formulate Bayes’ theorem, which combines knowledge on parameters. The mathematics behind it can easily be retrieved from our most beloved/hated Wikipedia, so I can avoid getting too caught up in it. What is important is that it allows us to combine two misfit functions or probabilities. Misfits and probabilities are directly interchangeable; a high probability of a model fitting our observations corresponds to a low misfit (and there are actual formulas linking the two). Combining two misfits allows us to accurately combine our pre-existing (or commonly: prior) knowledge of the Earth with the results of an experiment. The benefits of this are two-fold: we can use arbitrarily complex prior knowledge, and by using prior knowledge that is bounded (in parameter space) we can still invert underdetermined problems without extra regularisation. In fact, the prior knowledge acts as regularisation.

One probabilistic regularised bread, please

Let’s give our friend Bayes a shot at our non-linear 1D bread. We have to come up with our prior knowledge of the bread, and because we did not need that before, I’m just going to conjure something up! We suddenly find the remains of a 500 gram pack of flour…

This is turning into quite the detective story!

However, the kitchen counter that was used is also royally covered in flour. Therefore, we estimate that this pack was probably used; about 400 grams of it, with an uncertainty (standard deviation) of 25 grams. Mathematically, we can formulate our prior knowledge as a Gaussian distribution with the aforementioned mean and standard deviation and combine this with the misfit of the inverse problem (often called the likelihood). The result is given here:

Prior and original misfit

Combined misfit
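As a rough sketch of what this combination can look like in code (my own illustration: the non-linear forward model g and the noise levels below are hypothetical stand-ins for those in the toy script), combining the two pieces of information simply amounts to adding the two misfits:

```python
import numpy as np

# Rough sketch of combining a prior misfit with a data (likelihood) misfit.
# g() is a hypothetical non-linear flour-to-bread relationship standing in
# for the one in the toy script, and the noise levels are made up.

def g(flour):
    """Hypothetical non-linear forward model: grams of flour -> grams of bread."""
    return 1.6 * flour + 100.0 * np.sin(flour / 50.0)

def likelihood_misfit(flour, observed_bread=500.0, sigma_data=10.0):
    return 0.5 * ((g(flour) - observed_bread) / sigma_data) ** 2

def prior_misfit(flour, prior_mean=400.0, prior_sigma=25.0):
    return 0.5 * ((flour - prior_mean) / prior_sigma) ** 2

# Because misfits act like negative log-probabilities, combining the two
# pieces of information is simply a sum of the two misfit functions.
flour_range = np.linspace(0.0, 700.0, 1401)
combined = likelihood_misfit(flour_range) + prior_misfit(flour_range)

best = flour_range[np.argmin(combined)]
print(f"most plausible amount of flour: {best:.1f} grams")
```

The minimum of the combined misfit is the regularised estimate; note that the grid evaluation above is exactly the systematic search discussed below.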

One success and one failure!

First, we successfully combined the two pieces of information to make an inverse problem that is no longer non-unique (which was a happy coincidence of the prior: it is not guaranteed). However, we failed to make the problem more tractable in terms of computational requirements. To get the result of our combined misfit, we still have to do a systematic grid search, or at least arrive at a (local) minimum using gradient descent methods.

We can do the same in 2D. We combine our likelihood (the original inverse problem) with rather exotic prior information, an annulus in model space, to illustrate the power of Bayes’ theorem. The misfit functions used and the results are shown here:

Original misfit for a bread of 500 grams

Prior knowledge misfit

Combined misfit

This might also illustrate the need for non-linear uncertainty analysis. Trade-offs at the most probable regions of model space (last figure, at the intersections of the circle and the lines) distinctly show two correlation directions, which might not be fully accounted for by using only second-derivative approximations.

Despite this ‘non-progress’ of still requiring a grid search even after applying probability theory, we can go one step further by combining the application of Bayesian inference with the expertise of other fields in appraising inference problems…

Jump around!

Up to now, using a probabilistic (Bayesian) approach has only (apparently) made our life more difficult! Instead of one function, we now have to perform a grid search over the prior and the original problem. That doesn’t seem like a good deal. However, a much-used class of techniques in statistics deals with exactly the kind of problem we are facing here: given a very irregular and high-dimensional function,

How do we extract interesting information (preferably without completely blowing our budget on supercomputers)?

Let’s first say that by interesting information I mean minima (not necessarily restricted to global minima), correlations, and possibly other statistical properties (for our uncertainty analysis). One answer to this question was first applied in Los Alamos around 1950. The researchers at the famous institute developed a method to simulate equations of state, which has become known as the Metropolis-Hastings algorithm. The algorithm is able to draw samples from a complicated probability distribution. It became part of a class of methods called Markov chain Monte Carlo (MCMC) methods, which are often referred to as samplers (technically they would be a subset of all available samplers).
The reason that the Metropolis-Hastings algorithm (and MCMC algorithms in general) is useful is that a complicated distribution (e.g. the annulus in our last figure) does not easily allow us to generate points distributed according to it. These methods overcome this difficulty by starting at a certain point in model space and traversing a random path through it – jumping around – while visiting regions with a frequency proportional to their probability (i.e. low-misfit regions are visited most often). So far, we have only considered directly finding optima of misfit functions, but by generating samples from the probability distribution associated with the misfit function, we can readily compute these minima by calculating statistical modes. Uncertainty analysis subsequently comes virtually for free, as we can calculate any statistical property from the sample set.
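Just to give a flavour of how compact the core of such a sampler can be, here is a minimal random-walk Metropolis-Hastings sketch on a 1D toy misfit (my own illustration; the bimodal misfit, step size and number of samples are all hypothetical choices):

```python
import numpy as np

# Minimal random-walk Metropolis-Hastings sketch for a 1D misfit X(m).
# The bimodal misfit is a hypothetical stand-in; the probability of a model
# is taken as exp(-X), so low-misfit regions are visited most often.

rng = np.random.default_rng(42)

def misfit(m):
    """Hypothetical bimodal misfit with minima near m = 200 and m = 500."""
    return 0.5 * min((m - 200.0) ** 2, (m - 500.0) ** 2) / 30.0 ** 2

def metropolis_hastings(n_samples=50_000, step=25.0, m_start=350.0):
    samples = np.empty(n_samples)
    m_current, x_current = m_start, misfit(m_start)
    for i in range(n_samples):
        m_proposed = m_current + step * rng.standard_normal()
        x_proposed = misfit(m_proposed)
        # Accept with probability min(1, exp(-(X_proposed - X_current))).
        if np.log(rng.random()) < x_current - x_proposed:
            m_current, x_current = m_proposed, x_proposed
        samples[i] = m_current
    return samples

samples = metropolis_hastings()
print(f"sample mean: {samples.mean():.1f}, sample std: {samples.std():.1f}")
```

Histogramming the samples would reveal both modes, and means, standard deviations and (in higher dimensions) correlations follow directly from the sample set.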

I won’t try to illustrate any particular MCMC sampler in detail. Nowadays, many great tools for visualising MCMC samplers exist. This blog by Alex Rogozhnikov does a beautiful job of both introducing MCMC methods (in general, not just for inversion) and illustrating the Metropolis-Hastings random walk algorithm as well as the Hamiltonian Monte Carlo algorithm. Hamiltonian Monte Carlo also incorporates gradients of the misfit function, thereby accelerating the MCMC sampling even further. Another great tool is this applet by Chi Feng, in which different target distributions (misfit functions) can be sampled by different algorithms.

The field of geophysics has been using these methods for quite some time (Malcolm Sambridge writes in 2002 in a very interesting read that the first uses were 30 years ago), but they are becoming increasingly popular. However, strongly non-linear inversions and big numerical simulations are still very expensive to treat probabilistically, and success in inverting such a problem is strongly dependent on the appropriate choice of MCMC sampler.


Summary of probabilistic inversions

In this third part of the blog we saw how to combine prior knowledge with any non-linear model in a probabilistic framework, and how to sample the resulting complex misfit functions using MCMC samplers. The resulting sample sets can be used to solve an inversion and to compute statistical moments of the inverse problem.


Some final thoughts

If you reached this point while reading most of the text, you have very impressively worked your way through a huge amount of inverse theory! Inverse theory is a very diverse and large field, with many ways of approaching a problem. What’s discussed here is, to my knowledge, only a subset of what’s being used ‘in the wild’. These ramblings of an aspiring seismologist might sound uninteresting to the geodynamicists on the other side of the geophysics field. Inverse methods do not seem to be discussed nearly as much in geodynamics as they are in seismology. Maybe it is just the terminology that differs, and all these concepts are well known and studied under different names, so that you recognise some of the methods. Otherwise, I hope I have given an insight into the wonderful and sometimes ludicrous mathematical world of (some) seismologists.

Interested in playing around with inversion yourself? You can find a toy code about baking bread here.


Inversion 101 through 201 – Part 2: The inverse problem and deterministic inversion

The Geodynamics 101 series serves to showcase the diversity of research topics and methods in the geodynamics community in an understandable manner. We welcome all researchers – PhD students to professors – to introduce their area of expertise in a lighthearted, entertaining manner and touch upon some of the outstanding questions and problems related to their fields. This time, Lars Gebraad, PhD student in seismology at ETH Zürich, Switzerland, talks about inversion and how it can be used in geodynamics! Due to the ambitious scope of this topic, we have a 3-part series on this! You can find the first post here. This week, we have our second post in the series, where Lars discusses the inverse problem and deterministic inversion.

The idea of inversion is to literally invert the forward problem. For the aforementioned problems our knowns become unknowns, and vice versa. We wish to infer physical system parameters from measurements. Note that in our forward problem there are multiple classes of knowns; we have the forcing parameters, the material properties and the boundary conditions. All of these parameters could be inverted for, but typically one only inverts for one class. Most of the time, medium parameters are the target of inversions. As we go through the examples, I will gradually introduce some inversion methods. Every new method is marked in blue.

Estimating ingredients from a bread

Now, let’s consider our first example: the recipe for a bread. Let’s say we have a 0.5 kilogram bread. For our first case (case 1a), we will assume that the amount of water is ideal. In this case, we have one free variable to estimate (the used amount of flour), from one observable (the resulting amount of bread). We have an analytical relationship that is both invertible and one-to-one. Therefore, we can use the direct inversion of the relationship to compute the amount of flour. The process would go like this:

Direct inversion

1. Obtain the forward expression G(m) = d;
2. Solve this formula for G^-1 (d) = m;
3. Find m by plugging d into the inverse function.

Applying this direct inversion shows us that 312.5 grams of flour must have been used.
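In code, this direct inversion is a one-liner; the sketch below only uses the linear recipe introduced in the first post (500 grams of flour giving 800 grams of bread):

```python
# Direct inversion of the linear bread recipe (case 1a), where the forward
# model is bread = flour * 800 / 500.

def forward(flour):
    """Forward model G(m): grams of flour -> grams of bread."""
    return flour * 800.0 / 500.0

def inverse(bread):
    """Analytical inverse G^-1(d): grams of bread -> grams of flour."""
    return bread * 500.0 / 800.0

print(inverse(500.0))  # 312.5 grams of flour
```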

The two properties of the analytical relationship (invertible and one-to-one) are very important for our choice of this inversion method. If our relationship were sufficiently complex, we couldn’t solve the formula analytically (though the limitation might then lie with the implementer 😉 ). If a function is not one-to-one, then two different inputs could have the same output, so we cannot analytically invert for a unique solution.

In large linear forward models (as often obtained in finite difference and finite element implementations), we have a matrix formula similar to

A m = d.

In these systems, m is a vector of model parameters, and d a vector of observables. If the matrix is invertible, we can directly obtain the model parameters from our observables. The invertibility of the matrix is a very important concept, which I will come back to in a bit.
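A small sketch of what this looks like in practice (the 3 × 3 matrix and data vector are made-up numbers, just to show the mechanics):

```python
import numpy as np

# Sketch of directly inverting a small linear system A m = d. The 3 x 3
# matrix and data vector are made-up numbers; in a real finite-difference or
# finite-element model the system would be much larger.

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
d = np.array([1.0, 2.0, 3.0])

m = np.linalg.solve(A, d)   # cheaper and more stable than forming A^-1 explicitly
print(m)
```

In practice one solves the system rather than explicitly forming the matrix inverse, which is both cheaper and numerically more stable.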

Problems in direct inversion, misfits and gradient descents

Let’s see what happens when the first condition is not satisfied. I tried creating a non-linear function that at least I could not invert. Someone might be able to do it, but that is not the aim of this exercise. This new relationship is given here:

Complex flour to bread relationship, another 1D forward model.

The actual forward model is included in the script at the bottom of the post. When we cannot analytically solve the forward model, one approach would be to iteratively optimise the input parameters such that the end result becomes as close as possible to the observed real world. But, before we talk about how to propose inputs, we need a way to compare the outputs of our model to the observables!
When we talk about our bread, we see one end-product. It is very easy to compare the result of our simulation to the real world. But what happens when we make a few breads, or, similarly, when we have a lot of temperature observations in our volcanic model? One (common) way to combine observables and predictions into a single measure of discrepancy is by defining a misfit formula which takes into account all observations and predictions, and returns a single – you guessed it – misfit.
The choice of misfit directly influences which data are deemed more important in an inversion scheme, and many choices exist. One of the most intuitive is the L2 norm (or L2 misfit), which calculates the vector distance between the predicted data (from our forward model) and the observed data as if in Euclidean space. For our bread it would simply be

X = | predicted bread – observed bread |

Let’s try to create a misfit function for our 1D bread model. I use the relationship in the included scripts. Again, we have 500 grams of bread. By calculating the quantity | G(m) – 500 |, we can now make a graph of how the misfit varies as we change the amount of flour. I created a figure which shows the observed and predicted value:

Complex flour to bread relationship with observation superimposed

and a figure which shows the actual misfit at each value of m

Misfit between model and observation

Looking at the two previous figures may raise some questions and insights for the reader. First and foremost, it should be obvious that this problem does not have a unique solution. Different amounts of flour give exactly the same end result! It is thus impossible to say with certainty how much flour was used.
We have also recast our inversion problem as an optimisation problem. Instead of thinking about fitting observations, we can now think of our problem as the minimisation of some function X (the misfit). This is very important in inversion.
Iterative optimisation schemes such as gradient descent methods (and the plethora of methods derived from them) work in the following fashion:

Gradient descent inversion/optimization

1. Pick a starting point based on the best prior knowledge (e.g., a 500 gram bread could have been logically baked using 500 grams of flour);
2. Calculate the misfit (X) at our initial guess;
3. If X is not low enough (compared to some arbitrary criterion):
• Compute the gradient of X;
• Take a step in the direction of steepest descent (down the gradient);
• Recompute X;
• Repeat from 2.
4. If X is low enough:
• Finalise optimisation, with the last point as solution.
These methods are heavily influenced by the starting point of the inversion scheme. If I had started on the left side of the recipe domain (e.g. 100 grams of flour), I might well have ended up at a different solution. Gradient descent often gets ‘stuck’ in local solutions, which might not even be the optimal ones! A minimal code sketch of such a scheme is given below the figure. We will revisit this non-uniqueness in the 2D problem, and give some strategies to mitigate ending up with more than one solution. Extra material can be found here.

Simplistic gradient descent
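Here is a minimal sketch of such a scheme (my own illustration: the non-linear forward model G below is a hypothetical stand-in for the one in the toy script, and the gradient is approximated numerically):

```python
import numpy as np

# Minimal gradient descent on a 1D misfit X(m) = (G(m) - d_obs)^2. G is a
# hypothetical non-linear flour-to-bread relationship (a stand-in for the one
# in the toy script), and the gradient is approximated numerically.

def forward(flour):
    """Hypothetical non-linear forward model: grams of flour -> grams of bread."""
    return 1.6 * flour + 100.0 * np.sin(flour / 50.0)

def misfit(flour, observed_bread=500.0):
    return (forward(flour) - observed_bread) ** 2

def gradient(flour, h=1e-4):
    """Central finite-difference approximation of dX/dm."""
    return (misfit(flour + h) - misfit(flour - h)) / (2.0 * h)

def gradient_descent(m_start, learning_rate=0.01, max_iterations=500):
    m = m_start
    for _ in range(max_iterations):
        m -= learning_rate * gradient(m)   # step down the gradient
        if misfit(m) < 1e-6:               # arbitrary convergence criterion
            break
    return m

# Two different starting points can end up in different (local) minima.
for m_start in (350.0, 500.0):
    m_end = gradient_descent(m_start)
    print(f"start: {m_start:5.1f} g -> end: {m_end:5.1f} g, misfit: {misfit(m_end):.2e}")
```

Running the loop from two different starting points shows exactly the behaviour described above: one run converges to a low-misfit solution, while the other gets stuck in a local minimum with a large residual.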

Obvious solutions, grid searches, and higher dimensions

One thing that often bugged me about the aforementioned gradient descent methods is the seemingly complicated approach for such simple problems. Anyone looking at the figures could have said

Well, duh, there’s 3 solutions, here, here and here!

Why care about such a complicated way to get only one of them?
The important realisation to make here is that I have precomputed all possible solutions for this forward model in the 0 – 700 gram range. This precomputation on a 1D domain was very simple: at a regular interval, compute the predicted value of baked bread. Following this, I could have also programmed my Python routine to extract all the values with a sufficiently low misfit as solutions. This is the basis of a grid search.
Let’s perform a grid search on our second model (1b). Let’s find all predictions with 500 grams of bread as the end result, plus-minus 50 grams. This is the result:

Grid search for the 2D model

The original 312.5 grams of flour as input is part of the solution set. However, the model actually has infinitely many solutions (extending beyond the range of the current grid search)! The reason that a grid search might not be effective is the inherent computational burden. When the forward model is sufficiently expensive to simulate numerically, exploring a model space completely with adequate resolution might take very long. This burden increases with model dimension; if more model parameters are present, the ratio of relevant to irrelevant model space becomes very small. This is known as the curse of dimensionality (very well explained in Tarantola’s textbook).
Another reason one might want to avoid grid searches is our inability to appropriately process the results. Performing a 5-dimensional or higher grid search is sometimes possible on computational clusters, but visualizing and interpreting the resulting data is very hard for humans. This is partly why many supercomputing centres have in-house departments for data visualisation, as it is a very involved task to visualise complex data well.
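For completeness, here is what a brute-force grid search could look like for case 1b (my own sketch; the two-variable forward model is purely hypothetical and only mimics the idea that both flour and water control the outcome):

```python
import numpy as np

# Brute-force 2D grid search for case 1b. The forward model g(flour, water)
# is purely hypothetical (a stand-in for the one in the toy script): bread
# suffers when the amount of water deviates from some ideal ratio.

def forward(flour, water):
    return 1.6 * flour - 2.0 * np.abs(water - 0.65 * flour)

observed_bread = 500.0
tolerance = 50.0                                 # accept predictions within +/- 50 g

flour = np.linspace(0.0, 700.0, 701)             # 1 g spacing
water = np.linspace(0.0, 700.0, 701)
flour_grid, water_grid = np.meshgrid(flour, water)

predictions = forward(flour_grid, water_grid)
fits = np.abs(predictions - observed_bread) < tolerance

print(f"{fits.sum()} of {fits.size} grid points fit the observation")
```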

Now: towards solving our physical inversions!

Non-uniqueness, regularization and linear algebra (bleh!)

One big problem in inversion is non-uniqueness: the same result can be obtained from different inputs. The go-to way to combat this is to add extra information of any form to the problem. In our bread recipe we could think of adding extra observables, such as the consistency of the bread, its taste, color, etc. Another option could be to add constraints on the parameters, such as using the minimum amount of ingredients. This is akin to asking the question: given this amount of bread, how much flour and water was minimally used to make it?
Diffusion-type problems are notorious for their non-uniqueness. Many different subsurface heat conductivity distributions might result in the same observations (imagine differently inclined volcanic conduits). An often-used method of regularisation (not limited to diffusion-type studies!) is spatial smoothing. This method requires that, among equally likely solutions, the smoothest solutions are favoured, for it is more ‘natural’ to have smoothly varying parameters. Of course, in many geoscientific settings one would definitely expect sharp contrasts. However, in ‘underdetermined’ problems (i.e., you do not have enough observations to constrain a unique solution), we favour Occam’s razor and say

The simplest solution must be assumed

When dealing with more parameters than observables (non-uniqueness) in linear models, it is interesting to look at the forward problem again. If one were to parameterize our volcanic model using 9 parameters for the subsurface and combine that with the 3 measurements at the surface, the result would be an underdetermined inverse problem.

Rough parametrization for the heat conduction model

This forward model (the Laplace equation) can be discretised by using, for example, finite differences. The resulting matrix equation would be Am = d, with A a 3 x 9 matrix, m a 9 dimensional vector and d a 3 dimensional vector. As one might recall from linear algebra classes, for a matrix to have an inverse, it has to be square. This matrix system is not square, and therefore not invertible!

Aaaaahhh! But don’t panic: there is a solution

By adding either prior information on the parameters, smoothing, or extra datapoints (e.g., taking extra measurements in wells) we can make the 3 x 9 system a perfect 9 x 9 system. By doing this, we condition our system such that it is invertible. However, many times we end up overdetermining our system which could result in a 20 x 9 system, for example. Note that although neither the underdetermined nor the overdetermined systems have an exact matrix inverse, both do have pseudo-inverses. For underdetermined systems, I have not found these to be particularly helpful (but some geophysicists do consider them). Overdetermined matrix systems on the other hand have a very interesting pseudo-inverse: the least squares solution. Finding the least squares solution in linear problems is the same as minimising the L2 norm! Here, two views on inversion come together: solving a specific matrix equation is the same as minimising some objective functional (at least in the linear case). Other concepts from linear algebra play important roles in linear and weakly non-linear inversions. For example, matrix decompositions offer information on how a system is probed with available data, and may provide insights on experimental geophysical survey design to optimise coverage (see “Theory of Model-Based Geophysical Survey and Experimental Design Part A – Linear Problems” by Andrew Curtis).
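As a small illustration (my own sketch with made-up numbers, using simple damping rather than the spatial smoothing discussed above), regularisation can be added by appending extra rows to the matrix system before taking the least-squares solution:

```python
import numpy as np

# Regularised (damped) least squares for an underdetermined linear problem
# A m = d with 3 observations and 9 model parameters. All numbers are
# hypothetical; damping is used here instead of spatial smoothing.

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 9))       # hypothetical 3 x 9 forward operator
m_true = rng.standard_normal(9)
d = A @ m_true                        # synthetic observations

# Minimise |A m - d|^2 + eps^2 |m|^2 by appending eps * I to A and zeros to d.
eps = 0.1
A_reg = np.vstack([A, eps * np.eye(9)])
d_reg = np.concatenate([d, np.zeros(9)])

m_est, *_ = np.linalg.lstsq(A_reg, d_reg, rcond=None)
print(np.round(m_est, 2))
```

The damping term penalises large model values; replacing the identity matrix with a finite-difference operator would penalise roughness instead, which corresponds to the smoothing regularisation mentioned earlier.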
I would say it is common practice for many geophysicists to pose an inverse problem that is typically underdetermined, and keep adding regularization until the problem is solvable in terms of matrix inversions. I do not necessarily advocate such an approach, but it has its advantages over more agnostic approaches, as we will see in the post on probabilistic inversion next week!


Summary of deterministic inversions

We’ve seen how the forward model determines our inversion problem, and how many measurements can be combined into a single measure of fit (the misfit). Up to now, three inversion strategies have been introduced:

Direct inversion: analytically find a solution to the forward problem. This method is limited to very specific simple cases, but of course yields near perfect results.
Gradient descent methods: a very widely used class of algorithms that iteratively update solutions based on derivatives of the misfit function. Their drawbacks are mostly getting stuck in local minima, and medium computational cost.
Grid searches: a method that searches the entire parameter space systematically. Although they can map all the features of the inverse problem (by design), they are often much too computationally expensive.

What might be even more important is that we have seen how to reduce the number of possible solutions from infinitely many to at least a tractable number using regularisation. There is only one fundamental piece still missing… Stay tuned for the last blog post in this series for the reveal of this mysterious missing ingredient!

Interested in playing around with inversion yourself? You can find a toy code about baking bread here.


Inversion 101 to 201 – Part 1: The forward problem

The Geodynamics 101 series serves to showcase the diversity of research topics and methods in the geodynamics community in an understandable manner. We welcome all researchers – PhD students to professors – to introduce their area of expertise in a lighthearted, entertaining manner and touch upon some of the outstanding questions and problems related to their fields. This time, Lars Gebraad, PhD student in seismology at ETH Zürich, Switzerland, talks about inversion and how it can be used in geodynamics! Due to the ambitious scope of this topic, we have a 3-part series on this with the next part scheduled for publication next week! This week, however, we start with the basics that many geodynamicists will be familiar with: the forward problem.

Inversion methods are at the core of many physical sciences, especially geosciences. The term itself is used mostly by geophysicists, but the techniques employed can be found throughout investigative sciences. Inversion allows us to infer our surroundings, the physical world, from observations. This is especially helpful in geosciences, where one often relies on observations of physical phenomena to infer properties of the (deep) otherwise inaccessible subsurface. Luckily, we have many, many algorithms at our disposal! (…cue computer scientists rejoicing in the distance.)

Inverse theory in a nutshell
Credit: xkcd.

The process of moving between data and physical parameters goes by different names in different fields. In geophysics we mostly use inversion or inference. Inversion is intricately related to optimization in control theory and some branches of machine learning. In this small series of blog posts, I will try to shed some light on how these methods work, and how they can be used in geophysics and in particular geodynamics! In the end, I will focus a little on Bayesian inference, as it is an increasingly popular method in geophysics with growing potential. Be warned though:

Math ahead!

(although as little as possible)

First, for people who are serious about learning more about inverse theory, I strongly suggest Albert Tarantola’s “Inverse Problem Theory and Methods for Model Parameter Estimation”. You can buy it or download it (for free!) somewhere around here: http://www.ipgp.fr/~tarantola/.

I will discuss the following topics in this blog series (roughly in order of appearance):

1. Direct inversion
2. Gradient descent methods
3. Grid searches
4. Regularization
5. Bayesian inference
6. Markov chain Monte Carlo methods

The forward ‘problem’

Rather backwards, I will start the first blog post with a component of inversion which is needed about halfway through an inversion algorithm. However, this piece determines the physics we are trying to investigate. Choosing the forward ‘problem’ or model is posing the question of how a physical system will evolve, typically in a deterministic setting. This can be any relationship. The forward problem is roughly the same as asking:

If I start with these materials and those initial conditions, what will happen in this system?

In this post I will consider two examples of forward problems (one of which is chosen because I have applied it every day recently; hint: it is not the second example…).

1. The baking of a bread with known quantities of water and flour, but the mass of the end product is unknown. A recipe is available which predicts how much bread will be produced (let’s say under all ratios of flour to water).

2. Heat conduction through a volcanic zone where geological and thermal properties and base heat influx are known, but surface temperatures are unknown. A partial differential equation is available which predicts the surface temperature (heat diffusion, Laplace’s equation).

Both these systems are forward problems, as we have all the necessary ingredients to simulate the real world. Making a prediction from these ‘input’ variables is a forward model run, a forward simulation or simply a forward solution. These simulations are almost exclusively deterministic in nature, meaning that no randomness is present. Simulations that start exactly the same will always yield the same, predictable result.

Modelling our bread

For the first system, the recipe might indicate that 500 grams of flour produce 800 grams of bread (leaving aside the water for a moment, making case 1a), and that the relationship is linear in flour. The forward model becomes:

flour * 800 / 500 = bread

This is a plot of that relationship:

Simple flour to bread relationship: a 1D forward model.
I think I am getting hungry now. Does anyone else have the same problem?

Of course, if we add way too much water to our nice bread recipe, we mess it up! So, the correct amount of water is also important for making our bread. Hence, our bread is influenced by both quantities, and the forward model is a function of 2 variables:

bread = g(flour, water)

Let’s call this case 1b. This relationship is depicted here:

Simple flour and water to bread relationship: a 2D forward model.
Yes, I am definitely getting hungry.
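To make this concrete, here is a small code sketch of both forward models (case 1a follows the linear recipe above; the two-variable function for case 1b is purely hypothetical and merely stands in for the relationship shown in the figure):

```python
import numpy as np

# Sketches of the two bread forward models. Case 1a follows the linear recipe
# given above; the two-variable function for case 1b is purely hypothetical
# (the actual relationship is the one shown in the figure / toy script).

def forward_1a(flour):
    """Case 1a: linear recipe, 500 g of flour -> 800 g of bread."""
    return flour * 800.0 / 500.0

def forward_1b(flour, water):
    """Case 1b (hypothetical): too little or too much water spoils the bread."""
    ideal_water = 0.65 * flour
    return forward_1a(flour) * np.exp(-((water - ideal_water) / (0.3 * flour + 1.0)) ** 2)

print(forward_1a(500.0))           # 800.0 grams of bread
print(forward_1b(500.0, 325.0))    # ideal amount of water: also 800.0 grams
print(forward_1b(500.0, 150.0))    # far too dry: much less bread
```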

It is important to note that the distinction between these two cases creates two different physical systems, in terms of invertibility.

Modelling our volcanic zone (and other PDEs)

Modelling a specific physical system is a more complicated task. Typically, physical phenomena can be described by a set of (partial) differential equations. Other components that are required to predict the end result of a deterministic physical system are

1. The domain extent, suitable discretisation and parameter distribution in the area of investigation, and the relevant material properties (sometimes parameters) at every location in the medium
2. The boundary conditions, and when the system is time varying (transient), additional initial conditions
3. A simulation method

As an example, the relevant property that determines heat conduction is the heat conductivity at every location in the medium. The parameters control much of how the physical system behaves. The boundary and initial conditions, together with the simulation methods are very important parts of our forward model, but are not the focus of this blog post. Typically, simulation methods are numerical methods (e.g., finite differences, finite elements, etc.). The performance and accuracy of the numerical approximations are very important to inversion, as we’ll see later.
The result of the simulation of the volcanic heat diffusion is the temperature field throughout the medium. We might however be only interested in the temperature at real-world measurement points. We consider the following conceptual model:

Conceptual model of heat diffusion in a heterogeneous thermal conductivity model. Domain of interest is the black box. Under homogeneous heat influx, the deterministic simulation will result in a decreasing temperature in the three measurement points from left to right.

The resulting temperatures at the measurement locations will be different for each point, because they are influenced by the heterogeneous subsurface. Real-world observations would for example look something like this

‘Observations’ (made by me of course) from the volcanic model.
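To give an idea of what such a forward simulation can look like, here is a rough finite-difference sketch of steady-state heat conduction on a small 2D grid. Everything in it is hypothetical (conductivity structure, boundary conditions, measurement locations), and for simplicity the top and bottom temperatures are fixed instead of prescribing a basal heat influx:

```python
import numpy as np

# Rough sketch of a forward simulation in the spirit of the conceptual model
# above: steady-state heat conduction, div(k grad T) = 0, on a small 2D grid
# with heterogeneous conductivity k. Everything here is hypothetical, and for
# simplicity the top and bottom temperatures are fixed (instead of a basal
# heat influx); the "observations" are the temperatures just below the surface.

n = 20                                     # grid size (n x n cells)
k = np.ones((n, n))                        # background conductivity
k[3:17, 5:8] = 5.0                         # hypothetical conductive conduit
k[6:19, 12:15] = 0.2                       # hypothetical insulating body

T = np.zeros((n, n))
T[-1, :] = 100.0                           # hot bottom boundary (fixed)
T[0, :] = 0.0                              # cold surface boundary (fixed)

# Jacobi iterations; the side boundaries are treated as insulating by simply
# leaving out the missing neighbours in the conductivity-weighted average.
for _ in range(2000):
    T_new = T.copy()
    for i in range(1, n - 1):
        for j in range(n):
            weights, values = [], []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= nj < n:
                    weights.append(0.5 * (k[i, j] + k[ni, nj]))
                    values.append(T[ni, nj])
            T_new[i, j] = np.dot(weights, values) / np.sum(weights)
    T = T_new

measurement_columns = [3, 10, 16]          # three near-surface "stations"
print(np.round(T[1, measurement_columns], 2))
```

Because the conductivity structure is heterogeneous, the three near-surface temperatures come out different, which is exactly what makes them informative observables for the inverse problem discussed in the next posts.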

Now that you have a good grasp of the essence of the forward problem, I will discuss the inverse problem in the next blog post. Of course, the idea of inversion is to literally invert the forward problem, but if it were as easy as that, I wouldn’t be spending 3 blog posts on it, now would I?