Geophysical systems are usually described by a set of dynamical equations that are often non-linear and chaotic (Ghil and Lucarini, 2020). Errors in the initial state can grow, shrink, or stay constant with time, depending on how they project onto the unstable, stable, or neutral subspaces of the dynamical system. The properties of these subspaces can be measured by the Lyapunov exponents (Eckmann and Ruelle, 1985). Knowing the spectrum of Lyapunov exponents is thus immensely important, as it can guide prediction strategies or inform decision making (Kalnay, 2003).
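
To make the idea concrete, here is a minimal, self-contained sketch (not tied to any model in this post): the largest Lyapunov exponent of a one-dimensional chaotic map can be estimated by averaging the logarithm of the local stretching rate along a trajectory. For the logistic map at r = 4 the exact value is ln 2, i.e. nearby errors roughly double each step.

```python
import numpy as np

# Estimate the Lyapunov exponent of the logistic map x -> r x (1 - x)
# by averaging log|f'(x)| along a trajectory. For r = 4 the exact
# value is ln(2) ~ 0.6931.
r = 4.0
x = 0.2
n_steps = 10_000
log_stretch = 0.0
for _ in range(n_steps):
    log_stretch += np.log(abs(r * (1.0 - 2.0 * x)))  # |f'(x)| at current state
    x = r * x * (1.0 - x)
lyap = log_stretch / n_steps
print(f"estimated exponent: {lyap:.4f}, exact ln 2 = {np.log(2):.4f}")
```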

In particular, knowledge of the unstable modes can inform the choice of the number of model realizations in an ensemble forecast system, or the deployment of an efficient observation network. A notable example is the design of data assimilation (DA), which can be guided by the instability properties of the dynamical system into which data are assimilated. Unfortunately, computing the Lyapunov exponents is extremely costly, and the computational burden grows quickly with the system’s dimension.

In Chen et al. (2021), we took the opposite viewpoint and showed that it is possible to use the output of DA to infer some fundamental properties of the spectrum of the Lyapunov exponents. Building upon previous studies (Bocquet et al., 2017; Bocquet and Carrassi, 2017), we derived a relation involving the error of DA, the size of the unstable-neutral subspace and the largest Lyapunov exponent.

Our numerical analysis is based on the new Vissio and Lucarini (2020) model, an extension of the Lorenz (1996) system that is able to mimic the co-existence of wave-like and turbulent features in the atmosphere and the interplay between dynamical and thermodynamical variables, in such a way that the Lorenz (1955) energy cycle can be established. Our results demonstrate the robustness of the relation between the skill of DA and the instability properties for varying model parameters, especially, as expected, under strong observational constraint. We also look at the Kolmogorov-Sinai entropy, estimated as the sum of all positive Lyapunov exponents (Eckmann and Ruelle, 1985), which measures the rate at which information is lost, and relate it to the skill of DA. As shown in Figure 1, the first Lyapunov exponent and the Kolmogorov-Sinai entropy appear clearly linearly related to the RMSE of the analysis, as predicted by the theory. Deviations from the linear trend are seen in the weakly unstable cases (see Chen et al. (2021) for the rationale behind this behaviour).
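
For readers who want to experiment, the full Lyapunov spectrum and the Kolmogorov-Sinai entropy can be estimated with the standard QR (Benettin) re-orthonormalization algorithm. The sketch below uses Arnold's cat map rather than the Vissio-Lucarini model, purely because its constant Jacobian makes the exact answer known.

```python
import numpy as np

# QR (Benettin) algorithm for the full Lyapunov spectrum, illustrated on
# Arnold's cat map, whose Jacobian J is constant. The exact exponents are
# log((3 +/- sqrt(5))/2), so the Kolmogorov-Sinai entropy (sum of the
# positive exponents) is log((3 + sqrt(5))/2) ~ 0.9624.
J = np.array([[2.0, 1.0], [1.0, 1.0]])
Q = np.eye(2)
sums = np.zeros(2)
n_steps = 1000
for _ in range(n_steps):
    Q, R = np.linalg.qr(J @ Q)            # re-orthonormalize tangent vectors
    sums += np.log(np.abs(np.diag(R)))    # accumulate local growth rates
exponents = sums / n_steps
ks_entropy = exponents[exponents > 0].sum()
print("Lyapunov spectrum:", exponents)
print("KS entropy estimate:", ks_entropy)
```

Since the map is area-preserving (det J = 1), the two exponents must sum to zero, which is a useful sanity check on the implementation.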

From the linear relation, our approach provides an efficient way to infer the largest Lyapunov exponent and the Kolmogorov-Sinai entropy under varying model parameters. Although further investigation is needed for more complex scenarios, this study paves the way to exploring the sensitivity of model instability to model parameters even in complex, high-dimensional geophysical systems.

**References**

Bocquet, M. and Carrassi, A.: Four-dimensional ensemble variational data assimilation and the unstable subspace, Tellus A: Dynamic Meteorology and Oceanography, 69, 1304504, https://doi.org/10.1080/16000870.2017.1304504, 2017.

Bocquet, M., Gurumoorthy, K. S., Apte, A., Carrassi, A., Grudzien, C., and Jones, C. K. R. T.: Degenerate Kalman Filter Error Covariances and Their Convergence onto the Unstable Subspace, SIAM/ASA Journal on Uncertainty Quantification, 5, 304–333, https://doi.org/10.1137/16M1068712, 2017.

Chen, Y., Carrassi, A., and Lucarini, V.: Inferring the instability of a dynamical system from the skill of data assimilation exercises, Nonlinear Processes in Geophysics, 28, 633–649, https://doi.org/10.5194/npg-28-633-2021, 2021.

Eckmann, J. P. and Ruelle, D.: Ergodic theory of chaos and strange attractors, Reviews of Modern Physics, 57, 617–656, 1985.

Ghil, M. and Lucarini, V.: The physics of climate variability and climate change, Rev. Mod. Phys., 92, 035002, https://doi.org/10.1103/RevModPhys.92.035002, 2020.

Kalnay, E.: Atmospheric Modeling, Data Assimilation and Predictability, Cambridge University Press, Cambridge, 2003.

Lorenz, E.: Available potential energy and the maintenance of the general circulation, Tellus, 7, 157–167, 1955.

Lorenz, E. N.: Predictability – a problem partly solved, in: Predictability of Weather and Climate, edited by Palmer, T. and Hagedorn, R., pp. 40–58, Cambridge University Press, 1996.

Vissio, G. and Lucarini, V.: Mechanics and thermodynamics of a new minimal model of the atmosphere, EPJ Plus, 135, 807, https://doi.org/10.1140/epjp/s13360-020-00814-w, 2020.

Spectral characterization of atmospheric variability has led to the discovery of temporal scaling regimes extending from minutes to millions of years. Instrumental data have allowed us to characterize the scaling regimes well up to decadal timescales, showing a steep non-stationary turbulent regime up to weekly timescales in many atmospheric fields, then transitioning at longer timescales to a rather flat regime first termed the local spectral plateau. This description from steep to flat regimes corresponds to the so-called Hasselmann model by this year’s eponymous Nobel laureate. More recently, the turbulent regime and spectral plateau were termed weather and macroweather regimes by Lovejoy and Schertzer, and found to have spatially varying scaling exponents. The temporal scaling characteristics of a stochastic process are an indication of its autocorrelation, or memory, and inform us about the underlying dynamics.

The structure of climate variability at longer timescales is still a subject of debate, and it remains unclear whether another transition into a non-stationary regime happens due to internal dynamics, and at which timescales. The classical view held that climate variability could be described as a sum of oscillatory processes driven by the Milankovitch orbital forcing and a relatively white background noise. However, spectral analyses of paleoclimate archives over long timescales have invariably shown a scaling background continuum of variability. This implies the existence of nonlinear mechanisms able to redistribute the sharply peaked orbital forcing to other timescales.

The nature of paleoclimate archives poses challenges to spectral analysis as the data are often of irregular resolution. To apply spectral methods which assume regular sampling, it is thus generally necessary to use interpolation in order to regularize the data. Interpolation acts as a filter in the Fourier domain and can significantly bias estimates of scaling exponents. There are interpolation-free alternatives such as the Lomb-Scargle periodogram which can be calculated for arbitrary sampling times.

In this paper, we evaluated the precision and accuracy of three methods to estimate the scaling exponent of irregular surrogate data mimicking paleoclimate archives: the multitaper spectrum with linear interpolation, the Lomb-Scargle periodogram, and the first-order Haar structure function. The latter is a wavelet-based method performed in real space which, thanks to the simplicity of the Haar wavelet, can easily be adapted to take in irregular data. While all methods performed similarly for regularly sampled stationary timeseries, the interpolation-free methods gave more accurate estimates for irregular timeseries by utilizing the shorter timescales which were otherwise biased by interpolation. The Lomb-Scargle periodogram, however, was found to perform poorly for non-stationary timeseries, making it unsuitable for detecting the presence of a non-stationary low-frequency regime. The Haar structure function was relatively robust to irregularity over a wide range of scaling exponents, and is thus a safe choice for the analysis of irregular geophysical timeseries.
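
As a rough illustration of the two interpolation-free estimators (a toy sketch, not the paper's implementation), the snippet below computes a Lomb-Scargle periodogram with SciPy and a simple first-order Haar structure function for irregularly sampled white noise, whose Haar fluctuations should decay roughly as Δt^(-1/2).

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(42)

# Irregularly sampled white noise standing in for a paleoclimate record.
t = np.sort(rng.uniform(0.0, 1000.0, size=4000))
y = rng.standard_normal(t.size)

# Lomb-Scargle periodogram: no interpolation needed for irregular times.
freqs = np.linspace(0.01, 1.0, 200) * 2 * np.pi  # angular frequencies
pgram = lombscargle(t, y - y.mean(), freqs)

def haar_fluctuation(times, values, dt):
    """First-order Haar structure function at scale dt: mean absolute
    difference between the averages of the two halves of each window."""
    fluc = []
    t0 = times[0]
    while t0 + dt <= times[-1]:
        first = values[(times >= t0) & (times < t0 + dt / 2)]
        second = values[(times >= t0 + dt / 2) & (times < t0 + dt)]
        if first.size > 0 and second.size > 0:
            fluc.append(abs(second.mean() - first.mean()))
        t0 += dt
    return np.mean(fluc)

scales = np.array([4.0, 16.0, 64.0])
S1 = np.array([haar_fluctuation(t, y, dt) for dt in scales])
# For white noise the Haar fluctuation decays roughly as dt^(-1/2).
slope = np.polyfit(np.log(scales), np.log(S1), 1)[0]
print("Haar scaling exponent estimate:", slope)
```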

Achim obtained his PhD at the University of Nice (France), doing research on turbulence theory. He then moved to oceanography, working at UCLA (USA) and Geomar (Germany). Since 2005 he has held a permanent position at CNRS, working in the LEGI laboratory at the Université Grenoble Alpes (France). From 2014 to 2019 he was director of LEGI. His recent research is on applying methods of non-equilibrium thermodynamics to air-sea interaction.

Today, fluctuations are the focus of research in statistical mechanics, which was traditionally concerned with averages. Fluctuations in a thermodynamic system usually appear at spatial scales small enough that thermal, molecular motion leaves an imprint on the dynamics, as first considered by Albert Einstein. The importance of fluctuations is, however, not restricted to small systems. Fluctuations can leave their imprint on the dynamics at all scales when (not necessarily thermal) fluctuations are strong enough.

In non-equilibrium statistical mechanics, which describes forced-dissipative systems such as air-sea interaction and many other components of the climate system, there is no universal probability density function (pdf). Some such systems have recently been demonstrated to exhibit a symmetry called a fluctuation theorem (FT), which strongly constrains the shape of the pdf.

FTs have been established analytically for Langevin-type problems with thermal fluctuations. Most experimental data also come from micro-systems subject to thermal fluctuations. The thermodynamic framing of the quantities considered, such as entropy, heat and work, is however not necessary to establish FTs. Examples of non-thermal fluctuations are the experimental data of the drag force exerted by a turbulent flow and the local entropy production in turbulent Rayleigh-Bénard convection. For these non-Gaussian quantities the existence of an FT was suggested empirically. Our work, based on observations of atmospheric winds and oceanic currents, is strongly inspired by these investigations of the FT in data from laboratory experiments of turbulent flows.

Ocean dynamics is predominantly driven by the shear stress between atmospheric winds and ocean currents. The mechanical power input to the ocean fluctuates in space and time, and the atmospheric wind sometimes decelerates the ocean currents. Building on 24 years of global satellite observations, we analyse the input of mechanical power to the ocean. A fluctuation theorem (FT) holds when the logarithm of the ratio between the occurrence of positive events (when the ocean gains energy by air-sea interaction) and negative events (when the ocean loses energy by air-sea interaction) of a certain magnitude of the power input is a linear function of this magnitude and of the averaging period. The flux of mechanical power to the ocean shows evidence of an FT for regions within the recirculation area of the subtropical gyre, but not over extensions of western boundary currents. An FT puts a strong constraint on the temporal distribution of fluctuations of the power input, connects variables obtained with different lengths of temporal averaging, guides the temporal down- and up-scaling, and constrains the episodes of improbable events.
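
The linearity test behind an FT can be sketched on synthetic data (a toy example, not the satellite analysis): for Gaussian "power input" with mean mu and variance s², the log-ratio of positive to negative events of magnitude a is exactly (2 mu / s²) a, so a histogram-based estimate of the slope can be checked against theory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic power input with positive mean: for x ~ N(mu, s^2),
# ln[p(+a)/p(-a)] = (2*mu/s^2) * a exactly, i.e. a fluctuation theorem
# with slope 2*mu/s^2.
mu, s = 0.5, 1.0
x = rng.normal(mu, s, size=2_000_000)

bins = np.linspace(0.1, 1.5, 15)
centers = 0.5 * (bins[:-1] + bins[1:])
n_pos, _ = np.histogram(x, bins=bins)     # positive events of magnitude a
n_neg, _ = np.histogram(-x, bins=bins)    # negative events of magnitude a
log_ratio = np.log(n_pos / n_neg)

slope = np.polyfit(centers, log_ratio, 1)[0]
print("fitted FT slope:", slope, " theory:", 2 * mu / s**2)
```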

Abhirup is pursuing a doctoral degree in Theoretical Physics at the University of Potsdam. He is working at the Potsdam Institute for Climate Impact Research as a guest researcher as part of the DFG-funded NatRiskChange project. In this project, he uses recurrence analysis to compare the recurrence properties of different potential drivers underlying the temporal changes of flood hazards. He is working on further methodological developments of this technique to extend its capabilities to the study of event-like data (extreme events), data with uncertainties, and spatio-temporal recurrences. Abhirup has a Master’s degree in Physics from Bharathidasan University, Trichy, India.

Extreme events attract considerable attention in the scientific community across different disciplines due to their significant impact on the economy and human lives. Floods are examples of natural extreme events that cause substantial loss of economic assets and lives. Although extreme events seem stochastic, they are manifestations of complex dynamical systems and often have an inherent recurring behavior.

When studying such complex systems, linear methods seldom capture the whole picture as the system itself is nonlinear. Recurrence plot analysis is a robust nonlinear framework that helps us to visualize and quantify a system’s underlying dynamics.

The underlying dynamics of a system can be qualitatively evaluated by looking at the pattern of a recurrence plot. For a quantitative understanding, the line structure of a recurrence plot is used.

Conventional recurrence plot analysis applies the Euclidean or some other norm in the system’s phase space to identify recurrences. However, this standard approach is not suitable for analyzing extreme event-like data because the rarity of such events leads to significant gaps in the data. For such data, the edit distance method is better suited to identifying recurrences. The edit distance method was originally proposed in the context of neuroscience as a metric for studying similarity between spike-event patterns. In this method, the time series, or the series of events, is divided into small time segments. Then, the similarity between a pair of time segments is calculated using three elementary operations – shifting in time, and deletion and insertion of events.
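
A minimal sketch of the classic (unmodified) edit distance between two event sequences, in the spirit of the Victor-Purpura spike-train metric: unit cost for insertion or deletion, and a cost q per unit time for shifting an event. The nonlinear temporal tolerance introduced in the paper is not included here.

```python
import numpy as np

def edit_distance(t1, t2, q=1.0):
    """Edit distance between two sorted event-time sequences, computed by
    dynamic programming: delete/insert cost 1, shift cost q per unit time."""
    n, m = len(t1), len(t2)
    d = np.zeros((n + 1, m + 1))
    d[:, 0] = np.arange(n + 1)   # delete all events of t1
    d[0, :] = np.arange(m + 1)   # insert all events of t2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i, j] = min(
                d[i - 1, j] + 1,                                   # deletion
                d[i, j - 1] + 1,                                   # insertion
                d[i - 1, j - 1] + q * abs(t1[i - 1] - t2[j - 1]),  # shift
            )
    return d[n, m]

# two similar event sequences: matching by shifts costs 0.2 + 0.0 + 0.5
print(edit_distance([1.0, 3.0, 7.0], [1.2, 3.0, 7.5], q=1.0))
```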

In this new study, the existing edit distance method is improved for extreme event-like data by introducing a nonlinear function that incorporates a temporal tolerance to deal with the quasi-periodic nature of real-world extreme events.

The proposed modified measure is demonstrated on prototypical examples that mimic certain behaviors of extreme natural events. Finally, it is applied to study flood events of the Mississippi river and reveals a significant serial dependency of flood events. Such a finding suggests some critical implications, like the quasi-periodic occurrence of flood events due to the nonlinear interplay between their drivers.

Understanding the transport of tracers and particulates is an important topic in oceanography and in fluid dynamics in general. The trajectory of an individual fluid parcel will in many cases strongly depend on its initial condition, i.e. the flow is chaotic. At the same time, on a more macroscopic level, many flows possess some form of structure that is less sensitive to the initial conditions of the individual parcels. This structure is determined by the collective behaviour of groups of parcels for intermediate or long times.

An example of such macroscopic structure in geophysical flows is the eddy. In the ocean, mesoscale eddies (of the order of 10-100 km) are well known for capturing water masses while being transported by a background flow. For describing the pathway of a fluid parcel that is captured in an eddy, what really matters is the motion of the entire eddy in the background flow, and not so much where exactly that parcel is within the eddy. We can simplify the problem by saying that all parcels in the eddy follow approximately the same pathway, i.e. the parcels stay approximately coherent over a certain time interval. Such sets of fluid parcels (or fluid volumes) have therefore been termed “finite-time coherent sets” or “Lagrangian coherent structures” in the fluid dynamics community.

In our article, we explore a density-based clustering technique, the so-called OPTICS algorithm (Ordering Points To Identify the Clustering Structure), published by Ankerst et al. in 1999, for the detection of such finite-time coherent sets. The goal of density-based clustering is simple: find groups of points that are densely distributed, i.e. points that are all close to each other. We take modelled trajectories of fluid parcels and represent them as points in a high-dimensional Euclidean space. In this way, two points in that space that are very close in terms of their Euclidean distance correspond to parcels that stay close to each other along their entire trajectory. Once this is done, OPTICS does the rest. In the form we propose, the method does not need any sophisticated pre-processing of the trajectory data. What’s also nice about OPTICS is that it is available in the scikit-learn library of Python, so it is quite straightforward to use.

What OPTICS does is take the data and create a reachability plot. This is a quite condensed visualization of how similar fluid trajectories are – condensed because it is a one-dimensional graph defined on the trajectories. OPTICS creates an ordered list of the trajectories in such a way that densely populated regions are close to each other in this list. Finite-time coherent sets can then simply be identified by examining the “topography” of this plot, i.e. its troughs and crests. An example of a reachability plot for a model flow containing an atmospheric jet and vortices, the Bickley Jet model flow, can be seen in the first column of the figure above. One can obtain clustering results by thresholding the reachability value (the y-axis of that plot) at a specific value, and then identifying connected regions below the line as coherent sets. This method is also known as DBSCAN clustering, but what is special about OPTICS is that multiple DBSCAN clustering results (i.e. for different horizontal lines) can be obtained from one reachability plot.
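
A toy sketch of this workflow with scikit-learn (synthetic points standing in for trajectory data; sizes and parameter values are illustrative): OPTICS computes the reachability ordering once, and `cluster_optics_dbscan` then extracts a DBSCAN-style clustering for any chosen threshold.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

rng = np.random.default_rng(1)

# Toy stand-in for trajectory data: each row concatenates one parcel's
# positions at 10 times, so Euclidean distance in this space is small only
# for parcels that stay close along their whole trajectory. Two coherent
# bundles plus scattered background parcels.
bundle1 = rng.normal(0.0, 0.1, size=(50, 10))
bundle2 = rng.normal(5.0, 0.1, size=(50, 10))
background = rng.uniform(-5.0, 10.0, size=(20, 10))
X = np.vstack([bundle1, bundle2, background])

optics = OPTICS(min_samples=10).fit(X)

# One reachability plot supports many DBSCAN-style cuts; here one threshold:
labels = cluster_optics_dbscan(
    reachability=optics.reachability_,
    core_distances=optics.core_distances_,
    ordering=optics.ordering_,
    eps=1.0,
)
print("coherent sets found:", len(set(labels) - {-1}))
print("parcels labelled as noise:", int(np.sum(labels == -1)))
```

Note that the background parcels are labelled -1 (noise), illustrating the property that not every point has to belong to a coherent set.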

Two things are special about OPTICS that make it particularly usable for situations in fluid dynamics. First, it has an intrinsic notion of coherence hierarchies. We can see this by looking at the different rows in the figure, where the clustering results for different choices of the threshold are shown. For a large threshold (first row in the figure), we really only see the very large-scale structure of the jet separating the northern and southern parts of the fluid. Decreasing the threshold is then similar to using a magnifying glass: if we look closer, we identify smaller individual eddies in the northern and southern parts of the flow. The second useful property of OPTICS is that not every point has to be part of a cluster. In fact, in the second and third rows of the figure, the grey points are identified as noise, i.e. they do not belong to any coherent set. This is different from many recent approaches that rely on graph partitioning algorithms for cluster detection. There, every point has to be part of a coherent set, which strongly limits the applicability to realistic geophysical flows. In our article, we apply OPTICS also to modelled trajectories in the Agulhas region, and find, as expected, Agulhas rings.

We show in our paper that a 20-year-old algorithm can be very successful in detecting finite-time coherent sets, even in a purely data-driven form, i.e. with very little additional heuristics or pre-processing of the data. It might well be that there exist even better algorithms suited to research questions in fluid dynamics. The Lagrangian fluid dynamics community should therefore explore more of the existing methods and algorithms from data science, as these have the potential to greatly improve our understanding of fluid flows.

In geophysics, forecasting is based on solving the equations of physics with the help of a computer. To calculate a forecast we need an initial condition. Estimating this condition is difficult, however, because in general the available observations are few or heterogeneously distributed in space and time. A reference algorithm for this estimation is the Kalman filter. This algorithm is based on the temporal propagation of the state and its error statistics during the prediction stage, and on the updating of the prediction and analysis error covariance matrices during the analysis stage.

While the formalism of the Kalman filter is based on simple formulas of linear algebra, its practical implementation faces two pitfalls:

First, in the systems of interest to us, these matrices are very large, and it is impossible to compute their temporal propagation during forecasting. The ensemble Kalman filter, by approximating the covariance matrices with an ensemble estimation, offers a way to propagate the covariances through the forecast of each member of the ensemble.
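
A minimal stochastic (perturbed-observation) ensemble Kalman filter analysis step, with toy dimensions chosen purely for illustration, shows how the forecast covariance is estimated from ensemble anomalies rather than propagated explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal stochastic EnKF analysis step (toy sizes): the forecast covariance
# is never propagated at full size; it is estimated from ensemble anomalies.
n, m, N = 5, 2, 50                    # state dim, obs dim, ensemble size
H = np.zeros((m, n))
H[0, 0] = H[1, 3] = 1.0               # observe state components 0 and 3
R = 0.1 * np.eye(m)                   # observation-error covariance

truth = np.arange(n, dtype=float)
y = H @ truth + rng.multivariate_normal(np.zeros(m), R)

# biased forecast ensemble centred on truth + 2
Xf = (truth + 2.0)[:, None] + rng.standard_normal((n, N))
A = Xf - Xf.mean(axis=1, keepdims=True)          # ensemble anomalies
Pf = A @ A.T / (N - 1)                           # sample forecast covariance
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Kalman gain

# perturbed-observation update of each member
Yp = y[:, None] + rng.multivariate_normal(np.zeros(m), R, size=N).T
Xa = Xf + K @ (Yp - H @ Xf)

f_err = np.abs(Xf.mean(axis=1) - truth)
a_err = np.abs(Xa.mean(axis=1) - truth)
print("forecast error at observed components:", f_err[[0, 3]])
print("analysis error at observed components:", a_err[[0, 3]])
```

The analysis pulls the ensemble mean toward the observations at the observed components, reducing the error there.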

Secondly, while part of the forecast-error statistics can be explained by the propagation of initial uncertainties – the predictability error – another contribution, linked to defects in the numerical model – the model error – is much more difficult to characterize. To go further in understanding model error, the working hypothesis of a decorrelation between the predictability error and the model error is often introduced, which then leads to decomposing the prediction-error covariance matrix as the sum of the predictability-error and model-error covariance matrices. Even under this assumption, it is very difficult to characterize the model-error statistics; moreover, the predictability error is never truly decorrelated from the model error.

The objective of the paper is to characterize the model-error covariances related to the discretization of the physics equations, i.e. the errors that emerge during the transition from the mathematical formalism of the equations to their implementation on a computer and then to their numerical resolution. To achieve this, we relied on a new method: the parametric Kalman filter (PKF). The PKF is an implementation of the Kalman filter in which covariance matrices are approximated by a covariance model characterized by a set of parameters (Pannekoucke et al., 2016, 2018). Thus, by describing how the parameters evolve over time, we can describe, in an approximate way, the evolution of the full covariance matrix.

By revisiting the formalism of the model-error covariance matrix and using the PKF, we have characterized one of the defects observed in predictions of the chemical composition of the atmosphere, where the variances of the forecast error estimated by an ensemble decrease abnormally with time, a phenomenon called the loss of variance. We have shown that this loss of variance is related to a diffusive effect, whose origin appears when determining the modified equation, i.e. the differential equation whose solution is the numerical prediction, as shown in Figure 1. Panel (a) shows the true evolution of the concentration of a chemical compound transported by a heterogeneous wind, and panel (b) the numerical prediction calculated with a simple numerical scheme: the intensity of the numerical solution decreases abnormally compared with the theoretical solution. With the PKF equations, we have characterized the evolution of the model-error variance (panel c), which we have shown to be coupled with the evolution of the anisotropy of the correlation functions (characterized by the significant correlation scale, panel d). This is the first time that the properties of the model-error covariance matrix linked to the defects of the numerical resolution scheme have been characterized.
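
The loss of variance by numerical diffusion is easy to reproduce in a few lines (a generic illustration, not the PKF computation of the paper): a first-order upwind scheme advecting a Gaussian should preserve its shape exactly in the continuous equation, yet the discrete solution visibly loses amplitude.

```python
import numpy as np

# Numerical diffusion of a first-order upwind scheme: advecting a Gaussian
# at constant speed should preserve its shape, but the discretized solution
# loses amplitude -- the "loss of variance" discussed above.
nx, c, dx, dt = 200, 1.0, 1.0, 0.5     # CFL number c*dt/dx = 0.5
x = np.arange(nx) * dx
u = np.exp(-0.5 * ((x - 50.0) / 5.0) ** 2)   # initial Gaussian, max = 1
mass0 = u.sum()
for _ in range(200):                   # advect over a distance c*t = 100
    u = u - c * dt / dx * (u - np.roll(u, 1))   # upwind step, periodic grid
print("max after advection:", u.max(), "(exact solution keeps max = 1)")
print("mass conserved:", np.isclose(u.sum(), mass0))
```

The scheme conserves mass but spreads the pulse: exactly the diffusive behaviour captured by the modified equation.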

This article has not only enabled us to understand a behaviour observed in practice (here the loss of variance), but above all it has opened up a new theoretical avenue of exploration for the characterization of model error covariances.

**References**

O. Pannekoucke, S. Ricci, S. Barthelemy, R. Ménard, and O. Thual, “Parametric Kalman Filter for chemical transport model,” Tellus, vol. 68, p. 31547, 2016, doi: 10.3402/tellusa.v68.31547.

O. Pannekoucke, M. Bocquet, and R. Ménard, “Parametric covariance dynamics for the nonlinear diffusive Burgers’ equation,” Nonlinear Processes in Geophysics, pp. 1–21, 2018, doi: 10.5194/npg-2018-10.

Ensemble forecasting rose with the understanding of the limited predictability of weather. In a perfect ensemble system, the obtained ensemble of forecasts expresses the distribution of possible weather scenarios to be expected. However, operational forecasts of near-surface weather elements are often underdispersive and underestimate forecast errors.

At Deutscher Wetterdienst (DWD), a model output statistics (MOS) system has been developed that corrects for systematic errors of the numerical ensemble systems ECMWF-ENS and COSMO-D2-EPS. It calibrates probability forecasts to observed relative frequencies, with a focus on severe weather. The calibrated event probabilities can be used for qualified decisions in terms of cost-loss evaluations that relate forecasts of harmful weather to economic value.

The basic concept of the MOS system presented in the paper is to use ensemble mean and spread as predictors in multiple linear and logistic regressions. Using ensemble products as predictors, instead of processing each ensemble member individually, prevents difficulties with underdispersive statistical results and underestimated errors, especially for longer forecast horizons. During the multiple regressions, the system selects the most relevant predictors based on statistical tests and is therefore able to correct even for conditional biases. It is possible to use the latest available observations as predictors for short-term forecasts, or to use the previous statistical forecasts as predictors for the next time step in order to exploit the persistence of the weather.
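
A toy version of this setup (all data synthetic, names illustrative, not DWD's system): ensemble mean and spread serve as predictors in a logistic regression for the probability of exceeding a warning-relevant threshold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Synthetic training data: ensemble-mean and ensemble-spread predictors and
# "observed" gusts with a conditional bias and spread-dependent error.
n = 5000
ens_mean = rng.uniform(0.0, 30.0, n)
ens_spread = rng.uniform(1.0, 6.0, n)
obs = 0.8 * ens_mean + 2.0 + ens_spread * rng.standard_normal(n)
event = (obs > 20.0).astype(int)          # warning-relevant threshold event

# MOS-style logistic regression: calibrated exceedance probabilities
X = np.column_stack([ens_mean, ens_spread])
mos = LogisticRegression(max_iter=1000).fit(X, event)
p = mos.predict_proba(np.array([[28.0, 2.0], [10.0, 2.0]]))[:, 1]
print("P(gust > 20) for a strong and a weak forecast:", p)
```

A real MOS system would additionally screen many candidate predictors with statistical tests; here only the two ensemble products are used.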

Extreme events are most relevant for meteorological warnings. They are (fortunately) rare, however, and long time series are required to capture a sufficiently large number of observed events in order to derive statistically significant estimates. Currently, time series of 8 years of ensemble and observation data are used for training. Model changes have been found to be less harmful than insufficient data. Extreme events with 40 mm or 50 mm of precipitation per hour rarely appear at certain stations, which may result in statistical models that permanently predict 0% probability. Therefore, clusters of stations are gathered for a combined modelling of rare events. The clusters are defined according to the climatology of the stations. In order to compute statistical forecasts on a regular grid, the MOS equations of the clusters are evaluated at locations away from the training and observation sites. Figure 1 shows resulting gridded forecasts of wind-gust probabilities as an example. According to this forecast, strong gusts are most probable over sea and at higher elevations.

Dylan is a postdoctoral fellow within the Oceans and Atmosphere business unit of CSIRO (Australia). His current research focuses on methods for learning reduced-order models from data, and their applications in studying causal relationships in the climate system.

Terry is leader of the climate forecasting team at CSIRO. Along with Adam Scaife (UK Met Office), he is a current co-chair of the WCRP Grand Challenge in Near Term Climate Prediction. His current interests and research are in coupled data assimilation and ensemble prediction, climate dynamics and causality, and application of statistical dynamics to geophysical fluids.

A familiar challenge in climate science is the need to extract information from very high-dimensional datasets. To do so, the first step is usually the application of a method to reduce the dimension of the data down to a much smaller number of features – that is, combinations of the original variables – that are more amenable to study. The importance of identifying a small set of features that best capture the salient information in the data was recognized early on by Lorenz, among others, whose work on the use of so-called empirical orthogonal functions (EOFs) in statistical weather prediction provided the impetus for widespread adoption of the technique among meteorologists and climate scientists. Nowadays, EOF analysis is one of the most frequently used exploratory tools in the climate scientist’s toolbox.

In the intervening years since Lorenz considered the problem, an extensive literature has developed on a wide range of dimension reduction methods where typically some additional pre-filtering of the data is applied before targeting features relevant to the chosen spatio-temporal scales. Examples from this diverse set of methods include vector quantization, based on clustering methods such as k-means, which encodes a given datapoint by assigning it the label of the closest member in a set of a small number of prototypical observations. EOFs represent the data in terms of linear combinations of orthogonal basis vectors, while archetypal analysis (AA) uses basis vectors chosen to lie on the convex hull – the observed “extremes” – of the data. Although conceptually all of these methods are quite different, they may all be formulated in terms of finding a factorization of the observed design matrix into lower rank factors that optimizes a particular objective function, subject to different constraints on the optimal factors.
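
The common factorization view can be made concrete in a few lines (a sketch, not the paper's code): both a truncated EOF/SVD basis and a k-means vector quantization write the data matrix as codes times basis patterns, differing only in the constraints on the factors.

```python
import numpy as np

rng = np.random.default_rng(7)

# Both EOFs and vector quantization can be read as factorizations X ~ C @ B
# of the (samples x features) data matrix: B holds basis patterns and C the
# per-sample codes; the methods differ only in the constraints on C and B.
X = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 8))
X -= X.mean(axis=0)
k = 3

# EOFs via SVD: orthogonal basis, unconstrained codes (optimal rank-k fit)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eof_err = np.linalg.norm(X - (U[:, :k] * s[:k]) @ Vt[:k])

# Vector quantization (k-means): codes constrained to one-hot rows
centroids = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
for _ in range(25):  # Lloyd iterations
    labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for j in range(k):
        if np.any(labels == j):
            centroids[j] = X[labels == j].mean(axis=0)
km_err = np.linalg.norm(X - np.eye(k)[labels] @ centroids)

print("EOF error:", eof_err, " k-means error:", km_err)
```

Because the truncated SVD is the optimal rank-k factorization, the constrained (one-hot) k-means factorization can never reconstruct the data better; what it buys instead is interpretability of the basis as prototypical states.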

An important consideration in choosing among such matrix factorizations is how meaningful the resulting representation will be in context; for instance, can the results be directly mapped to distinct physical modes of variability? In “Applications of matrix factorization methods to climate data”, we highlight this through a set of case studies in which the relevant features manifest in dramatically different ways, meaning that certain methods tend to be more useful than others. In sea surface temperature (SST) data, key modes such as El Niño correspond to large temperature anomalies. As a result, describing an SST map in terms of a basis of extreme points, as provided by AA or by related convex codings, is effective in extracting recognizable physical modes. This is not the case when the features of interest do not lie on the boundaries of the observed data, as in the example of quasi-stationary weather patterns. Since these structures are characterized by their recurrent and persistent dynamics, vector quantization methods are more easily interpreted. The accompanying figure shows a four-dimensional basis that results from using k-means (first column), AA (second column), and two different convex codings (third and fourth columns) on Northern Hemisphere geopotential height anomalies. Whereas a prototypical blocking pattern is clearly evident (cluster 3) in the k-means basis, only by imposing some level of regularization (as in the fourth column) do the methods based on convex encodings yield a similarly direct identification of blocking events.

With the development of many alternative dimension reduction techniques, the climate scientist’s toolbox is increasingly well equipped for extracting and summarizing complex structural information from large, high-dimensional datasets. We emphasize that it is essential when selecting a method to take into account the nature of the representation that results, and how well this aligns with the features of interest. Or, in other words, to choose the right tool for the job.

The ability to know the future has long been sought after and coveted. Yet, in contrast to prophecies and crystal balls, modern methods of prediction are subject to scientific evaluation and verification. In ensemble forecasting of physical processes, diagnostic tools and metrics are employed to assess the performance of a forecast, as well as to establish a common point of comparison with other forecast methodologies. There are many well-established verification methods available for ensemble forecasts of univariate quantities, i.e. those which study each location and each forecast lead time separately. For example, the verification rank histogram discussed in Hamill (2001) is used to diagnose errors in an ensemble’s mean and spread. For physical processes like precipitation, however, it is also important that accumulations over a spatial domain are predicted accurately. This is an added difficulty, which requires adequate representation of the spatial structure by the ensemble.

In “Beyond univariate calibration: verifying spatial structure in ensembles of forecast fields” (https://doi.org/10.5194/npg-27-411-2020), we study the properties of the fraction of threshold exceedance (FTE) histogram, a new diagnostic tool designed for verification of spatially indexed ensemble forecast fields. The FTE is calculated for each ensemble member and the corresponding verification field separately as the fraction of grid points where a prescribed threshold (e.g. 10 mm) is exceeded. The idea is to establish a mapping from a multivariate quantity (i.e., a spatial field) to a univariate quantity (i.e., the FTE) which can then be studied using familiar tools like the verification rank histogram. Using a threshold approach makes the FTE straightforward to interpret and highly relevant to practitioners who are interested in studying the different levels of a weather element separately. Common examples include occurrence of precipitation, above average temperature, and high wind speeds. However, it is not obvious whether FTE histograms are sufficiently sensitive to misrepresentation of the spatial structure by the ensemble, and the goal here is to investigate this discrimination ability in detail.

The spatial structure of a field is an abstract quantity that cannot be measured or characterized directly. In simulations, spatial structure is modeled via the correlation length parameter of a covariance function. To study the discrimination ability of FTE histograms, we perform a study on simulated fields where we can control the degree to which the spatial structure of the ensemble is miscalibrated through the correlation length parameters of the simulated verification and ensemble mean. When the verification and ensemble fields have the same correlation length, the ensemble is said to be calibrated, and the FTE histogram should be uniform; in practice, such perfect calibration is rarely the case. Possible deviations from a uniform FTE histogram are listed in our Table 1, along with interpretations as to the type of spatial miscalibration in the ensemble. For instance, the top row of the accompanying Figure illustrates how the FTE histogram takes on a cup shape when the correlation length of the ensemble is too small, resulting in excessive spatial variability in the ensemble members. The reverse effect can be observed in the bottom row of the same Figure.
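As an illustration of how a correlation length controls spatial structure, the following sketch samples Gaussian random fields on a small grid via a Cholesky factorization of the covariance matrix. The exponential covariance family, grid size, and jitter term are illustrative choices of ours, not necessarily those used in the paper's simulation study.

```python
import numpy as np

def gaussian_field(n, corr_length, rng):
    """Sample a zero-mean Gaussian field on an n x n grid with
    exponential covariance C(d) = exp(-d / corr_length)."""
    xs, ys = np.meshgrid(np.arange(n), np.arange(n))
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    # pairwise distances between all grid points
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    cov = np.exp(-d / corr_length)
    # small jitter on the diagonal for numerical stability of the factorization
    L = np.linalg.cholesky(cov + 1e-8 * np.eye(n * n))
    return (L @ rng.standard_normal(n * n)).reshape(n, n)

rng = np.random.default_rng(1)
rough = gaussian_field(8, corr_length=1.0, rng=rng)   # short range: noisy field
smooth = gaussian_field(8, corr_length=5.0, rng=rng)  # long range: smooth field
```

Drawing the ensemble members with one correlation length and the verification with another is then a direct way to impose a known degree of spatial miscalibration.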

Looking at the ensemble members in the Figure, it’s likely that a practitioner would not need an FTE histogram to determine the type of spatial miscalibration in this case. The true utility of the FTE histogram is realized when the miscalibration is not visually obvious. Critically, we find that FTE histograms accurately identify even minor disagreements with the true correlation length (e.g., 10% miscalibration) in ensemble forecasts, and this conclusion was consistent across a range of thresholds and domain sizes. Having established the discriminative ability of the FTE histogram in our simulation study, we applied the method in a data example with downscaled precipitation forecast fields. There we found that the FTE metric pointed to some shortcomings of the underlying spatial disaggregation algorithm during the seasons where precipitation is driven by local convection, a result that was not apparent from univariate verification metrics alone.

Sebastian Lerch is a researcher at the Faculty of Mathematics of the Karlsruhe Institute of Technology (KIT). He has a background in mathematics and statistics; his research interests include probabilistic forecasting, forecast evaluation, and the development of statistical and machine learning models for applications in the environmental sciences. The paper presented here is joint work with Sándor Baran (University of Debrecen), Annette Möller (Technical University of Clausthal), Jürgen Groß (University of Hildesheim), Roman Schefzik (German Cancer Research Center), Stephan Hemri (Federal Office of Meteorology and Climatology MeteoSwiss) and Maximiliane Graeter (KIT).

Most weather forecasts are based on ensemble simulations of numerical weather prediction models. Despite continued improvements, these ensemble predictions often exhibit systematic errors that require correction via statistical post-processing methods. Most post-processing methods utilize statistical and machine learning techniques to produce probabilistic forecasts in the form of full probability distributions of the quantity of interest. The focus is usually on univariate approaches in which ensemble predictions for different locations, time steps and weather variables are treated independently. However, many practical applications of weather forecasts require accurately capturing spatial, temporal, or inter-variable dependencies. Important examples include hydrological applications, air traffic management, and energy forecasting. Such dependencies are present in the physically consistent raw ensemble predictions but are lost if standard univariate post-processing methods are applied separately in each margin.

Over the past years, a variety of multivariate post-processing methods has been proposed. Most of these methods follow a two-step strategy. In the first step, univariate post-processing methods are applied independently in all dimensions, and samples are generated from the obtained probability distributions. In the second step, the multivariate dependencies are restored by rearranging the univariate sample values according to the rank order structure of a specific multivariate dependence template. Popular established methods include ensemble copula coupling (ECC), where the dependence template is learned from the raw ensemble predictions; the Schaake shuffle (SSh), where the dependence template is learned from past observations; and the Gaussian copula approach (GCA), where a parametric dependence model is assumed. Several extensions of these approaches, in particular of ECC, have since been proposed; however, the literature lacks guidance on which approaches work best in which situations. Therefore, the overarching goal of our paper is to provide a systematic comparison of state-of-the-art methods for multivariate ensemble post-processing.
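The reordering step shared by these methods can be sketched in a few lines of numpy. Below, the dependence template is the raw ensemble, i.e. the basic ECC idea; swapping in a matrix of past observations as the template would give the Schaake shuffle. This is a minimal sketch with our own function names, not the authors' implementation, and it ignores tie-breaking details.

```python
import numpy as np

def ecc_reorder(post_samples, raw_ensemble):
    """Rearrange univariate post-processed samples in each dimension
    according to the rank order of the raw ensemble (ECC).

    Both inputs have shape (n_members, n_dims)."""
    out = np.empty_like(post_samples)
    for j in range(raw_ensemble.shape[1]):
        # 0-based ranks of the raw ensemble members in dimension j
        ranks = np.argsort(np.argsort(raw_ensemble[:, j]))
        # the member whose raw forecast had rank k receives the
        # k-th smallest post-processed value in that dimension
        out[:, j] = np.sort(post_samples[:, j])[ranks]
    return out

# toy example: 3 members, 2 dimensions
raw = np.array([[3.0, 1.0],
                [1.0, 2.0],
                [2.0, 3.0]])
samples = np.array([[10.0, 5.0],
                    [30.0, 15.0],
                    [20.0, 25.0]])
reordered = ecc_reorder(samples, raw)
```

In the toy example, member 0 has the largest raw value in dimension 0, so it receives the largest post-processed sample there; the marginal distributions are untouched, only the arrangement across members changes.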

To achieve this, we propose three simulation settings which are tailored to mimic different situations and challenges that arise in applications of post-processing methods. In contrast to case studies based on real-world datasets, simulation studies allow one to specifically tailor the multivariate properties of the ensemble forecasts and observations and to readily interpret the effects of different types of misspecifications on the forecast performance of the various post-processing methods.

Overall, we find that the Schaake shuffle constitutes a powerful benchmark that proves difficult to outperform, except for naive implementations in the presence of structural change (for example, time-varying correlation structures). Not surprisingly, variants of ensemble copula coupling typically perform better the more informative the ensemble forecasts are about the true multivariate dependence structure. A particular advantage is their ability to account for flow-dependent differences in the multivariate dependence structure if these are (at least approximately) present in the ensemble predictions. The results generally depend on the simulation setup, and there is no consistently best method across all settings and potential misspecifications of the ensemble predictions. Nonetheless, the simulation studies offer some guidance, in particular on which of the ECC variants perform best for general types of misspecifications.

The computational costs of all presented methods are negligible, not only in comparison to the generation of the raw ensemble forecasts but also compared to the univariate post-processing, since no numerical optimization is required. It may thus generally be advisable to compare multiple multivariate post-processing methods for the specific dataset and application at hand. An important direction for future work will be to complement the comparison of multivariate post-processing methods with studies based on real-world datasets of ensemble forecasts and observations.
