GeoLog

A leap of faith: Should we trust AI with a million-year problem?


Artificial intelligence (AI) has been here a while, and it isn’t going anywhere, not any time soon. It has become an integral part of many lives and businesses. When I speak of AI, I am not referring to GenAI (generative AI) that writes your emails for you: Think about the algorithms that suggest what movie you should watch next, the voice assistant that adds milk to your shopping list, and the powerful systems that help doctors diagnose diseases. They all feel like a natural part of our digital existence. For most of these things, we’re happy to use what we call “black box” AI. We don’t really need to know how Netflix decided we’d like that obscure documentary about a certain serial killer; we just care that the recommendation was good. We trust the output, even if the inner workings are somewhat of a mystery.

But what happens when the stakes are higher? What if the decision isn’t about a movie, but about the long-term safety of a community? What if the problem is not about diagnosing a disease, but about building a permanent home for something so dangerous it needs to be isolated from all life for a million years? When the consequences of a wrong guess are catastrophic and irreversible, the black box approach suddenly feels a lot less comfortable. “Trust me” is simply not an option.

This is the central, critical tension that a recent paper by Degen et al. (2025), published in EGU’s journal Solid Earth, seeks to resolve. The title, “About the trustworthiness of physics-based machine learning: considerations for geomechanical applications”, might sound intimidatingly academic, but I found after reading it that its core message is anything but! It’s a roadmap for how we can build a new kind of artificial intelligence, one that is not only brilliant but also transparent, accountable, and fundamentally trustworthy. It’s a necessary step forward for anyone involved in managing our planet’s biggest challenges, from policymakers to engineers, and for the public, who need to have confidence in those decisions.

The million-year problem

Let’s take a closer look at the problem at the heart of this research: the permanent disposal of high-level nuclear waste. This isn’t just about digging a hole and burying it. The task requires finding a geological formation deep underground, say hundreds of meters below Earth’s surface, that can remain stable for a period longer than human civilization has existed. The host rock, whether it’s salt, clay, or crystalline granite, must be able to withstand the heat generated by the radioactive waste, as well as the immense, fluctuating pressures from the surrounding earth, without cracking, shifting, or allowing groundwater to reach the canisters. Believe me, I got the chills thinking about all the ways this can go wrong! 

To find such a site, and to be absolutely sure of its long-term integrity, geoscientists and engineers rely on powerful computer simulations. These geomechanical models are, in essence, a digital twin of the subsurface. They simulate the forces, stresses, and strains on the rock over millennia. They predict how the rock will deform, how fractures might open, and how fluids might move through the system. This is the geoscientist’s crystal ball, peering into a future we can only imagine! But here’s the rub: The subsurface is a deeply uncertain place (yes, and you can read here about one of the reasons why I am using the term “uncertain”). We can’t know the precise stiffness of every rock layer, the exact location of every tiny fault, or the subtle variations in pressure that build up over time. To account for this uncertainty, a single simulation isn’t enough. Scientists have to run thousands of them, each with slightly different input parameters, to understand the full range of possible outcomes. This process, known as uncertainty quantification, is the gold standard for decision-making in a high-stakes environment. It’s the only way to move from a single best guess to a probabilistic understanding of the risks. As the authors write in their paper,

[…] many current studies consider best-case and worst-case scenarios only. A probabilistic uncertainty quantification approach allows not only the most likely scenario to be provided but also the associated range of uncertainties and their probability of being encountered. This is important for the planning of nuclear waste disposal sites since best- and worst-case analyses tend to estimate extreme values that have a low probability of being encountered.

Now, we don’t want that to be the “best” way to handle nuclear waste, do we?
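To make that point concrete, here is a minimal, purely illustrative sketch of probabilistic uncertainty quantification. The “simulator” is just the textbook lithostatic stress formula and all parameter values are invented; in a real study, each of the thousands of samples below would require a full geomechanical simulation, which is exactly why this approach becomes so expensive.

```python
# A minimal, illustrative sketch of probabilistic uncertainty quantification.
# The "simulator" and all parameter values below are invented for demonstration.
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10_000

# Hypothetical uncertain inputs: rock density (kg/m^3) and repository depth (m).
density = rng.normal(loc=2500.0, scale=100.0, size=n_samples)
depth = rng.normal(loc=600.0, scale=25.0, size=n_samples)

# Stand-in for an expensive simulation: vertical (lithostatic) stress = rho * g * z.
g = 9.81
vertical_stress_mpa = density * g * depth / 1e6

# Instead of a single best guess, we get a full probability distribution.
print(f"most likely value  : {vertical_stress_mpa.mean():.1f} MPa")
print(f"5th-95th percentile: {np.percentile(vertical_stress_mpa, [5, 95]).round(1)} MPa")
```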

A more nuanced approach to nuclear waste

The paper suggests moving beyond the old way of thinking, which often focuses on “best-case” and “worst-case” scenarios. These extreme values are often statistically improbable. The new approach presented by the authors, using probabilistic uncertainty quantification, provides a more realistic and nuanced picture, which allows for more efficient planning and a better use of resources without compromising safety. Thus, to showcase the capabilities and trustworthiness of their new method, the researchers developed and tested it across a series of detailed case studies below:

Case 1: Changing the boundary conditions

Imagine you have a block of concrete, and you’re pushing on its sides with a certain amount of force. The “boundary conditions” are these external forces. This case tested what happens when the model has to deal with uncertainty in these external forces.

The results were excellent. The AI model was extremely accurate and used very little computational power. The most important finding was that the model correctly showed that the larger external force had a much bigger impact on the internal stress of the rock. This proved that the model not only gets the right answer, but also understands the underlying physics of the problem.

Case 2: Changing the material properties

This time, the researchers tested a more complex scenario. Instead of just one type of concrete, imagine the block is made of a layered cake of different materials, with varying stiffness and density. The “material properties” are these inherent characteristics of each layer. This case tested how the model would handle uncertainty in these internal rock properties.

Again, the model was highly accurate. It required slightly more computational complexity than the first case, but handled it gracefully. The key insight from this test was that the model correctly identified which properties were most important. It showed that vertical stress is mainly controlled by density, which makes sense because gravity is pulling the rock down. For horizontal stress, the stiffest layer of rock had the most influence. This kind of insight is invaluable for engineers because it tells them exactly where to focus their efforts.

Case 3: Overcoming geometrical hurdles

This was the trickiest challenge. The researchers varied the thickness and depth of the rock layers, which caused the underlying structure of the data to change. This made the model struggle to accurately predict the stress at the points where the layers met.

The researchers solved this with a clever fix. They “re-formulated” the problem by making sure the model always had a consistent number of data points for each rock layer, regardless of its thickness. Once this was done, the model’s accuracy returned to the same high level as in the other cases. This proved that while the AI is powerful, how you set up the problem for the model is just as crucial as the model itself.
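To illustrate the spirit of that fix (this is not the authors’ code, and every number below is made up), one way to get a consistent layout is to resample each layer onto the same number of points, so the model always sees identical array shapes however thick or thin the layer is:

```python
# Hypothetical illustration of re-parameterizing layers onto a fixed grid,
# so every layer contributes the same number of points regardless of thickness.
import numpy as np

def resample_layer(depths, values, n_points=50):
    """Interpolate a layer's depth profile onto a fixed number of evenly spaced points."""
    uniform_depths = np.linspace(depths.min(), depths.max(), n_points)
    return uniform_depths, np.interp(uniform_depths, depths, values)

rng = np.random.default_rng(0)

# A thin layer (20 m) and a thick layer (280 m) end up with identical array shapes.
thin_z, thin_v = resample_layer(np.linspace(400, 420, 13), rng.random(13))
thick_z, thick_v = resample_layer(np.linspace(420, 700, 217), rng.random(217))
print(thin_v.shape, thick_v.shape)  # (50,) (50,)
```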

Case 4: Simultaneously changing boundary conditions and material properties

This case combined the challenges of the first two cases, testing how the model performs when both the external forces and the internal material properties vary at the same time.

Surprisingly, the combined problem didn’t significantly increase the model’s complexity compared to Case 2. The AI was still highly accurate and efficient. The key takeaway here was that the model was able to clearly show the relative importance of each parameter: it could determine, for instance, that the external forces influenced the stress to a similar degree as the material properties of the stiffest rock, a comparison that would have been impossible to make with a traditional model.

Case 5: Simultaneously changing all three sources of uncertainty

This was the ultimate test. It combined all three sources of uncertainty—boundary conditions, material properties, and geometry—into one complex problem.

The model once again performed with high accuracy. While the complexity of the problem slightly increased the model’s error for certain predictions, it was still able to identify which parameters mattered most. For the horizontal stress components, the material properties and boundary conditions had a much higher impact than the geometry. This shows that the model is robust and can handle the most realistic and challenging scenarios.

Case 6: Nördlich Lägern

This final section shows the AI model being applied to a real-world problem: the potential disposal site at Nördlich Lägern in Switzerland. Here, the analysis revealed something surprising and valuable: The stress predictions were less impacted by changes in the properties of the Opalinus Clay, the rock layer being considered for the repository. Instead, the model showed that the stiff rock units had the largest impact on the maximum stress component. This is a game-changing insight. It tells engineers that they should focus their efforts on collecting more precise data about these surrounding layers, not just the clay. This kind of targeted information can lead to a safer, more efficient, and more trustworthy repository design.

The solution? A hybrid of brains and brawn

This is where Degen et al. (2025) enter the picture with a brilliant solution. They didn’t try to build a faster supercomputer; they built a smarter one. Their paper introduces a new type of hybrid machine learning, specifically the non-intrusive reduced-basis method, designed to act as a “surrogate model” for these computationally expensive simulations.

Think of it like this: a traditional simulation is like a master architect painstakingly drawing a massive blueprint by hand, detailing every single beam and bolt. The Degen et al. model is like an apprentice who, after carefully studying a few of the master’s hand-drawn blueprints, learns the underlying rules and principles of architecture. This apprentice can then sketch a new, accurate blueprint in seconds, because they understand the physics of how the structure will behave.
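For readers who like to see the gears turning, here is a deliberately simplified, toy version of that apprenticeship in code. To be clear, the paper’s actual tool is the non-intrusive reduced-basis method; this sketch only captures the general recipe (learn a compact basis from a handful of expensive simulations, then a cheap map from input parameters to that basis), and every number and function in it is invented.

```python
# A toy sketch of the surrogate idea: learn a low-dimensional basis from a few
# "expensive" simulations, then a cheap map from parameters to that basis.
# Everything here is invented purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Pretend "expensive simulations": stress profiles for a handful of parameter sets.
n_train, n_grid = 20, 500
params = rng.uniform(1.0, 3.0, size=(n_train, 2))   # e.g. a stiffness and a load
grid = np.linspace(0.0, 1.0, n_grid)
snapshots = np.array([p[0] * grid + p[1] * grid**2 for p in params])  # (20, 500)

# Step 1: extract a low-dimensional basis from the snapshots (via SVD).
_, _, vt = np.linalg.svd(snapshots, full_matrices=False)
basis = vt[:2]                                       # keep the two dominant modes

# Step 2: fit a cheap map from parameters to basis coefficients (least squares).
coeffs = snapshots @ basis.T                         # (20, 2)
A = np.hstack([params, np.ones((n_train, 1))])
weights, *_ = np.linalg.lstsq(A, coeffs, rcond=None)

# A new parameter set now yields a full profile in microseconds,
# with no new run of the expensive solver.
p_new = np.array([2.2, 1.7])
profile = np.append(p_new, 1.0) @ weights @ basis
print(profile.shape)  # (500,)
```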

The power of trust

With this newfound speed, the team and other researchers can now do what was previously impossible, and this is where the real significance of the paper lies.

  • Comprehensive uncertainty quantification: They can now run tens of thousands of simulations in the time it would have taken to run just one. This allows for a complete, statistically rigorous assessment of all the geological uncertainties. Instead of a single number, they can produce a probability distribution of potential outcomes. This is not just a better prediction; it’s a completely different kind of knowledge. It’s the difference between saying, “The bridge will stand,” and saying, “The bridge has a 99.999% chance of standing under these conditions, but here is the 0.001% risk scenario and what would cause it.”
  • Global sensitivity analysis: A crucial next step is to understand which of the many uncertain parameters are the most important. Is the final stress prediction more sensitive to the porosity of the rock or the strength of a fault? Before Degen et al., figuring this out would have required an impossibly large number of simulations. Now, with the rapid-fire surrogate model, they can quickly test the influence of every parameter and identify the “weak links” in their model. This allows scientists to focus their limited resources on getting the best data for the most critical factors (a rough sketch of this idea follows the list below).
  • Building public and political trust: This might be the most important impact of all. In a society that is increasingly polarized and skeptical of large-scale government projects, especially those with such a long time horizon, transparency is paramount. The Degen et al. method allows scientists to not only make a prediction but to also show their work. They can explain exactly which uncertainties they quantified, which parameters matter most, and what the full range of possible outcomes looks like. This isn’t just a scientific advance; it’s a civic one. It provides the bedrock of verifiable, explainable science needed to build consensus and public trust for decisions that will affect our world for millennia.
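As promised above, here is a rough, purely illustrative sketch of why a fast surrogate makes global sensitivity analysis feasible. The “surrogate” below is a made-up linear function and the parameter names are invented; the point is simply that once evaluations are nearly free, you can sample every uncertain input tens of thousands of times and ask which one explains most of the variance in the predicted stress.

```python
# A rough, illustrative sketch of variance-based sensitivity analysis on a fast
# surrogate. The "surrogate" and all parameter ranges below are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

def surrogate(density, stiffness, boundary_load):
    """Stand-in for a trained surrogate model: essentially free to evaluate."""
    return 0.1 * density + 0.3 * stiffness + 0.6 * boundary_load

# Sample every uncertain input across its plausible (normalized) range.
density = rng.uniform(0.8, 1.2, n)
stiffness = rng.uniform(0.5, 1.5, n)
boundary_load = rng.uniform(0.7, 1.3, n)
stress = surrogate(density, stiffness, boundary_load)

# For independent inputs and a near-additive response, the squared correlation
# approximates each input's first-order share of the output variance.
for name, x in [("density", density), ("stiffness", stiffness), ("boundary load", boundary_load)]:
    share = np.corrcoef(x, stress)[0, 1] ** 2
    print(f"{name:13s} explains ~{share:.0%} of the stress variance")
```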

A universal blueprint

While the paper’s primary focus is on nuclear waste disposal, its underlying principles of physics-based machine learning offer a transformative blueprint for computational science with far-reaching applications. This technology could be a game-changer for geothermal energy, optimizing drilling and operations by rapidly simulating fluid and heat flow through complex rock formations. In the realm of carbon sequestration, it could predict the long-term stability of underground reservoirs, ensuring captured carbon dioxide remains securely stored. Civil engineers could leverage it to design safer and more resilient structures, such as tunnels and bridges, by quickly assessing their behavior under various geological and seismic conditions. Ultimately, this approach could also revolutionize resource extraction and enable more efficient and safer recovery of oil, gas, and minerals by providing a deeper understanding of the subsurface’s geomechanical responses.

In a world full of big, complex problems, we often feel like we’re navigating in the dark. This paper, in its quiet, academic way, has provided us with a new, powerful headlight. It has shown us that AI doesn’t have to be a mysterious black box. By integrating it with the fundamental laws of nature, we can create tools that are not only faster and more powerful but also more transparent and, ultimately, more trustworthy. It’s a leap of faith, but one grounded in the solid earth of verifiable science, exactly the kind of faith we need to solve the impossible.

Asmae Ourkiya (They/Them) is the Media and Communications Officer at EGU. They manage press releases, coordinate press participation and the press centre at the EGU General Assembly, and write and manage the EGU blogs. Asmae holds a Ph.D. in queer intersectional ecofeminism from MIC, University of Limerick in Ireland. Their research revolves around climate justice, and promotes inclusion and equality in climate governance.

