Machine learning has become increasingly popular across science, but do these algorithms actually “understand” the scientific problems they are trying to solve? In this article, we explain physics-informed neural networks, which are a powerful way of incorporating physical principles into machine learning.

#### A machine learning revolution in science

Machine learning has caused a fundamental shift in the scientific method. Traditionally, scientific research has revolved around theory and experiment: one hand-designs a well-defined theory, continuously refines it using experimental data, and analyses it to make new predictions.

But today, with rapid advances in the field of machine learning and dramatically increasing amounts of scientific data, data-driven approaches have become increasingly popular. Here, an existing theory is not required; instead, a machine learning algorithm can be used to analyse a scientific problem using data alone.

#### Learning to model experimental data

Let’s look at one way machine learning can be used for scientific research. Imagine we are given some experimental data points that come from some unknown physical phenomenon, e.g. the orange points in the animation below.

A common scientific task is to find a *model* which is able to accurately predict new experimental measurements given this data.

One popular way of doing this using machine learning is to use a neural network. Given the location of a data point as input (denoted $x$), a neural network can be used to output a prediction of its value (denoted $u$), as shown in the figure below:

To learn a model, we try to tune the network’s free parameters (denoted by the $\theta$s in the figure above) so that the network’s predictions closely match the available experimental data. This is usually done by minimising the mean-squared error between its predictions and the $N$ training points:

$$\min_{\theta}\ \frac{1}{N}\sum_{i=1}^{N}\left(u_{\mathrm{NN}}(x_i;\theta)-u_{\mathrm{true}}(x_i)\right)^2$$
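For readers who prefer code, here is a minimal sketch of such a network and its mean-squared-error loss in plain Python: one input $x$, one hidden layer of tanh units and one output $u$. The layer size, the tanh activation, the random initialisation, the made-up observations and the names `init_params`, `u_nn` and `mse_loss` are illustrative assumptions, not the set-up used for the animations. In practice a framework such as PyTorch or JAX would supply both the network and the gradients needed to minimise the loss.

```python
import math
import random

def init_params(n_hidden=16, seed=0):
    """Randomly initialise theta for a network with one input, one tanh hidden layer and one output."""
    rng = random.Random(seed)
    return {
        "w1": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],  # input -> hidden weights
        "b1": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],  # hidden biases
        "w2": [rng.gauss(0.0, 1.0) / math.sqrt(n_hidden) for _ in range(n_hidden)],  # hidden -> output
        "b2": 0.0,                                               # output bias
    }

def u_nn(x, theta):
    """Network prediction u_NN(x; theta) at a scalar input location x."""
    hidden = [math.tanh(w * x + b) for w, b in zip(theta["w1"], theta["b1"])]
    return theta["b2"] + sum(v * h for v, h in zip(theta["w2"], hidden))

def mse_loss(theta, xs, us):
    """Mean-squared error between u_NN(x_i; theta) and the observed values u_i."""
    if not xs or len(xs) != len(us):
        raise ValueError("need the same, non-zero number of inputs and observations")
    return sum((u_nn(x, theta) - u) ** 2 for x, u in zip(xs, us)) / len(xs)

# Usage with three made-up observations; training would adjust theta to reduce this value
theta = init_params()
print(mse_loss(theta, [0.0, 0.5, 1.0], [1.0, 0.0, -1.0]))
```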

The result of training such a neural network using the experimental data above is shown in the animation.

#### The “naivety” of purely data-driven approaches

The problem is, using a purely data-driven approach like this can have significant downsides. Have a look at the actual values of the unknown physical process used to generate the experimental data in the animation above (grey line).

You can see that whilst the neural network accurately models the physical process within the vicinity of the experimental data, it fails to *generalise* away from this training data. By only relying on the data, one could argue it hasn’t truly “understood” the scientific problem.

#### The rise of scientific machine learning (SciML)

What if I told you that we already knew something about the physics of this process? Specifically, that the data points are actually measurements of the position of a damped harmonic oscillator:

This is a classic physics problem, and we know that the underlying physics can be described by the following differential equation:

$$m\frac{d^2 u}{dx^2}+\mu\frac{du}{dx}+ku=0$$

where $m$ is the mass of the oscillator, $\mu$ is the coefficient of friction and $k$ is the spring constant.
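As a check on the physics, this equation can be solved in closed form. The sketch below assumes the under-damped case ($\mu^2 < 4mk$), the initial conditions $u(0)=1$ and $u'(0)=0$, and illustrative parameter values $m=1$, $\mu=4$, $k=400$ (our choice). With $\delta=\mu/2m$ and $\omega=\sqrt{k/m-\delta^2}$, the solution is $u(x)=e^{-\delta x}\left(\cos\omega x+\frac{\delta}{\omega}\sin\omega x\right)$. The second function confirms numerically, via central finite differences, that this $u$ makes the left-hand side of the equation vanish.

```python
import math

def oscillator_solution(x, m=1.0, mu=4.0, k=400.0):
    """Exact u(x) solving m u'' + mu u' + k u = 0 with u(0) = 1 and u'(0) = 0.

    Valid only in the under-damped regime, mu**2 < 4 * m * k.
    """
    delta = mu / (2.0 * m)          # decay rate
    omega0 = math.sqrt(k / m)       # undamped angular frequency
    if delta >= omega0:
        raise ValueError("this closed form assumes an under-damped oscillator")
    omega = math.sqrt(omega0**2 - delta**2)  # damped angular frequency
    return math.exp(-delta * x) * (math.cos(omega * x) + (delta / omega) * math.sin(omega * x))

def fd_residual(u, x, m=1.0, mu=4.0, k=400.0, h=1e-4):
    """Central finite-difference estimate of m u'' + mu u' + k u at x."""
    du = (u(x + h) - u(x - h)) / (2.0 * h)
    d2u = (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2
    return m * d2u + mu * du + k * u(x)

print(oscillator_solution(0.0))  # 1.0, the initial displacement
# Largest residual on a grid: small, limited only by finite-difference error
print(max(abs(fd_residual(oscillator_solution, 0.05 * i)) for i in range(1, 20)))
```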

Given the limitations of “naive” machine learning approaches like the one above, researchers are now looking for ways to incorporate this type of prior scientific knowledge into machine learning workflows, in the blossoming field of scientific machine learning (SciML).

#### So, what is a physics-informed neural network?

One way to do this for our problem is to use a *physics-informed neural network* [1,2]. The idea is very simple: add the known differential equations directly into the loss function when training the neural network.

This is done by sampling a set of input training locations ($\{x_j\}$) and passing them through the network. Next, gradients of the network’s output with respect to its input are computed at these locations (these are analytically available for most neural networks and can be easily computed using automatic differentiation). Finally, the residual of the underlying differential equation is computed using these gradients and added as an extra term in the loss function.
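To make these three steps concrete, here is a dependency-free sketch for a one-hidden-layer tanh network. Instead of automatic differentiation we differentiate the network by hand, a swapped-in approach that only scales to tiny networks: for $u=b_2+\sum_i w_{2,i}\tanh(w_{1,i}x+b_{1,i})$, the chain rule gives $u'$ and $u''$ in closed form. The network size, the $M=30$ evenly spaced locations on $[0,1]$, the parameter values $m=1$, $\mu=4$, $k=400$ and the function names are all our own assumptions.

```python
import math
import random

def make_network(n_hidden=8, seed=0):
    """Hypothetical helper: random parameters theta = (w1, b1, w2, b2) for a 1-hidden-layer tanh network."""
    rng = random.Random(seed)
    w1 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]  # input -> hidden weights
    b1 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]  # hidden biases
    w2 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]  # hidden -> output weights
    return w1, b1, w2, 0.0                               # output bias b2 = 0

def u_and_derivatives(x, theta):
    """Return u_NN(x), du/dx and d2u/dx2, differentiating the network by hand."""
    w1, b1, w2, b2 = theta
    u, du, d2u = b2, 0.0, 0.0
    for a, b, c in zip(w1, b1, w2):
        t = math.tanh(a * x + b)
        s = 1.0 - t * t                    # tanh'(z) = 1 - tanh(z)**2
        u += c * t
        du += c * a * s                    # chain rule, dz/dx = a
        d2u += c * a * a * (-2.0 * t * s)  # tanh''(z) = -2 tanh(z) (1 - tanh(z)**2)
    return u, du, d2u

def physics_residuals(theta, collocation_xs, m=1.0, mu=4.0, k=400.0):
    """Residual m u'' + mu u' + k u of the oscillator equation at each sampled location x_j."""
    residuals = []
    for x in collocation_xs:
        u, du, d2u = u_and_derivatives(x, theta)
        residuals.append(m * d2u + mu * du + k * u)
    return residuals

theta = make_network()
xs = [j / 29 for j in range(30)]       # M = 30 evenly spaced locations on [0, 1]
r = physics_residuals(theta, xs)
print(sum(v * v for v in r) / len(r))  # physics loss of the untrained network
```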

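Let’s do this for the problem above. This amounts to using the following loss function to train the network, where the first sum runs over the $N$ experimental data points $x_i$ and the second over the $M$ sampled locations $x_j$:

$$\min_{\theta}\ \frac{1}{N}\sum_{i=1}^{N}\left(u_{\mathrm{NN}}(x_i;\theta)-u_{\mathrm{true}}(x_i)\right)^2+\frac{1}{M}\sum_{j=1}^{M}\left(\left[m\frac{d^2}{dx^2}+\mu\frac{d}{dx}+k\right]u_{\mathrm{NN}}(x_j;\theta)\right)^2$$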

We can see that the additional “physics loss” in the loss function tries to ensure that the solution learned by the network is consistent with the known physics.
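Putting the data term and the physics term together, a sketch of this loss in plain Python might look as follows. `u`, `du` and `d2u` are any callables returning the network output and its first two derivatives with respect to $x$ (for a real network these would come from automatic differentiation); `lam` is an optional weight on the physics term that we add for illustration, with `lam=1.0` giving the plain sum of the two terms. As a sanity check we plug in the exact under-damped solution for the illustrative values $m=1$, $\mu=4$, $k=400$, for which both terms should vanish up to rounding.

```python
import math

def pinn_loss(u, du, d2u, data_xs, data_us, colloc_xs,
              m=1.0, mu=4.0, k=400.0, lam=1.0):
    """Data mean-squared error plus lam times the mean-squared residual of m u'' + mu u' + k u = 0."""
    data_term = sum((u(x) - y) ** 2 for x, y in zip(data_xs, data_us)) / len(data_xs)
    physics_term = sum((m * d2u(x) + mu * du(x) + k * u(x)) ** 2
                       for x in colloc_xs) / len(colloc_xs)
    return data_term + lam * physics_term

# Exact under-damped solution for m=1, mu=4, k=400 (delta = 2, omega0 = 20), with u(0)=1, u'(0)=0
delta, omega0 = 2.0, 20.0
omega = math.sqrt(omega0**2 - delta**2)

def u_exact(x):
    return math.exp(-delta * x) * (math.cos(omega * x) + (delta / omega) * math.sin(omega * x))

def du_exact(x):
    return -(omega0**2 / omega) * math.exp(-delta * x) * math.sin(omega * x)

def d2u_exact(x):
    return (omega0**2 / omega) * math.exp(-delta * x) * (
        delta * math.sin(omega * x) - omega * math.cos(omega * x))

data_xs = [0.0, 0.1, 0.2, 0.3]           # a few "measurements" near x = 0
data_us = [u_exact(x) for x in data_xs]
colloc_xs = [j / 49 for j in range(50)]  # M = 50 physics locations on [0, 1]
print(pinn_loss(u_exact, du_exact, d2u_exact, data_xs, data_us, colloc_xs))  # ~0 up to rounding
```

Minimising this over $\theta$, with $u$, $u'$ and $u''$ produced by the network, is what trains the physics-informed network; a gradient-based optimiser would normally do the minimisation.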

And here’s the result when we train the physics-informed network:

#### Remarks

The physics-informed neural network is able to predict the solution far away from the experimental data points, and thus performs much better than the naive network. One could argue that this network does indeed have some concept of our prior physical principles.

The naive network is performing poorly because we are “throwing away” our existing scientific knowledge; with only the data at hand, it is like trying to understand all of the data generated by a particle collider, without having been to a physics class!

Whilst we focused on a specific physics problem here, physics-informed neural networks can be easily applied to many other types of differential equations too, and are a general-purpose tool for incorporating physics into machine learning.

#### Conclusion

We have seen that machine learning offers a new way of carrying out scientific research, placing an emphasis on learning from data. By incorporating existing physical principles into machine learning we are able to create more powerful models that learn from data and build upon our existing scientific knowledge.

#### Our own work on physics-informed neural networks

We have carried out research on physics-informed neural networks! Read the following for more:

Moseley, B., Markham, A., & Nissen-Meyer, T. (2021). Finite Basis Physics-Informed Neural Networks (FBPINNs): a scalable domain decomposition approach for solving differential equations. *arXiv*.

Moseley, B., Markham, A., & Nissen-Meyer, T. (2020). Solving the wave equation with physics-informed deep learning. *arXiv*.

#### References

1. Lagaris, I. E., Likas, A., & Fotiadis, D. I. (1998). Artificial neural networks for solving ordinary and partial differential equations. *IEEE Transactions on Neural Networks*.

2. Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. *Journal of Computational Physics*.

Physics problem inspired by this blog post: https://beltoforion.de/en/harmonic_oscillator/

#### Comments

Hi Ben, an awesome attempt at simplifying the issue. I didn’t get the concept of simply adding the “physics” to the cost function… wouldn’t this imply that you know the answer upfront!? (In your example, the 1D wave equation.) Extending on that, would it be possible to give ML access to a huge library of all known physical problems in the literature (which are limited) and let ML add the “right physics” to the cost function? Maybe we should have another layer of optimization that takes care of this step…

Anyway, fascinating subject.

Hi Amine, yes in the blog post I simply used the PINN to solve the differential equation (which can be useful by itself; i.e. the experimental data points serve as the boundary condition of the PDE). PINNs are also frequently used for inversion, where parameters in the PDE are jointly optimised alongside the network parameters (e.g. in the harmonic oscillator, one could simultaneously learn the mass/spring constant/friction coefficient too); other work has done exactly as you have described, trying to learn the actual operators in the PDE by using bi-level (sparse) optimisation: (e.g. https://arxiv.org/abs/2005.03448). There is much more to explain about PINNs that I couldn’t cover at this level, perhaps I need some more blog posts!

Dear Ben,

Thanks for the amazing blog post. Very informative. I work in computational electromagnetics and I was wondering if it is possible (or feasible) to solve an inverse problem associated with these non-linear differential equations? For example, what if we want to estimate a parameter (which is a function of space) given the physical quantity as input? For the above example of the harmonic oscillator, can we estimate the coefficient of friction at different locations given a physical quantity such as speed as a function of space? Obviously, most inverse problems are ill-posed, so we have to assume we have fewer measurements of speed (at multiple locations) and we want to estimate the friction coefficient at a larger number of points.

Hi Amartansh,

Thanks a lot for your comment! Yes, PINNs can be (and frequently are) used for inverse problems too. This is very simple conceptually: one just optimises the unknown parameters of the PDE alongside the free parameters of the network when training the network. So in the harmonic oscillator example, we could easily learn the coefficient of friction too (assuming we have enough real data points so that the inverse problem is not ill-posed, as you mention). We could also learn an unknown function in the PDE that varies in space. One way would be to have another network which defines this function (and then we backpropagate through it to jointly optimise its weights and the PINN’s weights together).
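A heavily simplified sketch of the inverse idea, under loud assumptions: we skip the network entirely and let the exact oscillator solution (generated with $m=1$, $\mu=4$, $k=400$) stand in for a trained PINN’s $u$, $u'$ and $u''$. Treating the friction coefficient $\mu$ as unknown, we learn it by gradient descent on the physics loss alone. In a real PINN the same gradient step would be applied to $\mu$ and to the network weights together, and the data term would pin down the solution. The name `estimate_mu`, the learning rate and the step count are our own choices.

```python
import math

# Ground truth, used only to generate stand-in "network outputs" (hypothetical values)
M_TRUE, MU_TRUE, K_TRUE = 1.0, 4.0, 400.0
delta = MU_TRUE / (2.0 * M_TRUE)
omega0 = math.sqrt(K_TRUE / M_TRUE)
omega = math.sqrt(omega0**2 - delta**2)

def u(x):
    return math.exp(-delta * x) * (math.cos(omega * x) + (delta / omega) * math.sin(omega * x))

def du(x):
    return -(omega0**2 / omega) * math.exp(-delta * x) * math.sin(omega * x)

def d2u(x):
    return (omega0**2 / omega) * math.exp(-delta * x) * (
        delta * math.sin(omega * x) - omega * math.cos(omega * x))

def estimate_mu(xs, m=1.0, k=400.0, mu_init=0.0, lr=0.005, steps=200):
    """Gradient descent on the physics loss mean((m u'' + mu u' + k u)^2) with respect to mu only."""
    mu = mu_init
    for _ in range(steps):
        # d/dmu of the mean squared residual is mean(2 * residual * u')
        grad = sum(2.0 * (m * d2u(x) + mu * du(x) + k * u(x)) * du(x) for x in xs) / len(xs)
        mu -= lr * grad
    return mu

xs = [j / 49 for j in range(50)]
print(round(estimate_mu(xs), 6))  # 4.0, the friction coefficient used to generate u
```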

This is really great and thanks for the example – helps to understand the post.

I have another question: suppose I do not know the exact differential equation, but I know something about it, e.g. from dimensional reasoning. I know that this problem can be described using dimensional analysis and presented as 3 or 4 non-dimensional parameters. What would be the right way to discover those combinations of dimensional parameters?

Hi Alex,

That is a really interesting idea. PINNs have also been used for learning underlying equations themselves, e.g. https://arxiv.org/abs/2005.03448. This paper uses a PINN to estimate the gradients with respect to the input coordinates of the real data points, and then does a sparse optimisation over linear combinations of these gradients to try to discover the underlying differential equation. I think doing this, and then adding a dimensional analysis constraint during the optimisation would be really interesting!
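To give a feel for regression over gradients, here is a much simpler stand-in, not the sparse method of the linked paper: ordinary least squares over a two-term library $\{u, u'\}$ to fit $u''=a\,u+b\,u'$. Derivatives are estimated with central finite differences from densely sampled, noise-free synthetic data (generated with $m=1$, $\mu=4$, $k=400$, our choice) rather than with a PINN. For the oscillator, the fit should return $a\approx-k/m$ and $b\approx-\mu/m$.

```python
import math

def discover_coefficients(us, h):
    """Least-squares fit of u'' = a*u + b*u' from samples us spaced h apart.

    Derivatives are estimated with central finite differences; returns (a, b).
    """
    s_uu = s_ud = s_dd = s_u2 = s_d2 = 0.0
    for i in range(1, len(us) - 1):
        u = us[i]
        d1 = (us[i + 1] - us[i - 1]) / (2.0 * h)           # estimate of u'
        d2 = (us[i + 1] - 2.0 * us[i] + us[i - 1]) / h**2  # estimate of u''
        s_uu += u * u
        s_ud += u * d1
        s_dd += d1 * d1
        s_u2 += u * d2
        s_d2 += d1 * d2
    # Solve the 2x2 normal equations [[s_uu, s_ud], [s_ud, s_dd]] [a, b] = [s_u2, s_d2]
    det = s_uu * s_dd - s_ud * s_ud
    a = (s_u2 * s_dd - s_ud * s_d2) / det
    b = (s_uu * s_d2 - s_ud * s_u2) / det
    return a, b

# Synthetic, noise-free "measurements" of the under-damped solution (m=1, mu=4, k=400)
delta, omega = 2.0, math.sqrt(400.0 - 2.0**2)
h = 1e-3
us = [math.exp(-delta * x) * (math.cos(omega * x) + (delta / omega) * math.sin(omega * x))
      for x in (i * h for i in range(1001))]
print(discover_coefficients(us, h))  # close to (-400, -4), i.e. (-k/m, -mu/m)
```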

Thanks for the nice article. Just a correction: the original PINNs papers (two of them, for forward and inverse problems) were first published on arXiv in 2017. We did the work in the 2015-2016 time period, and all this work is documented in the DARPA EQUiPS reports.

Thanks for the great content! Just one question: how can we extend physics-informed machine learning models to experimental cases where we don’t have any differential or state equations? I’m working with machine learning models for experimental data obtained from machinery. Do I need to have a mathematical model of the entire system in order to employ a physics-informed ML model?

Hi Ali, yes for a standard PINN, one must know the underlying differential equations so they can be used in the loss function. However, some current work tries to simultaneously train the PINN and learn the underlying differential equations at the same time, e.g. see: https://arxiv.org/abs/2005.03448

Thanks for the nice comparison. However, since the case considered is a dynamical system, have you compared it with neural ODEs? In particular, the example considered is second-order, and there is an extension of neural ODEs to second-order systems as well (https://arxiv.org/abs/2006.07220). I would be more interested to see this comparison than a comparison with a classical NN for extrapolation.

Great idea Pawan. I am not an expert on neural ODEs, but to the best of my understanding, they use a neural network to describe terms in the underlying PDE, i.e., training the neural ODE would amount to learning terms in the PDE. Furthermore, a standard ODE solver is used to solve the equation during training. The difference with PINNs is that no ODE solver is required, i.e. we just directly train the weights of the network. Agreed, some more comparison work would be great!

What happens to the physics-informed NN when you give it fewer training points? Is there a reason 9 points need to be used?

Great question Philip. You can think of the PINN as a method for solving the underlying PDE. Thus, we need to provide appropriate boundary/initial conditions so that the learned solution is unique. In PINNs, this is usually done using the training data points. For the harmonic oscillator problem above, which is a 2nd order DE, the boundary conditions I want to impose are u(0)=1 and u'(0)=0. Thus we need “enough” training points to impose these. I chose 9 randomly (and for the purpose of comparing the PINN to the NN), but I think it would be enough just to have 2 training points for this problem (as the general solution to the DE has two constants of integration).
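The “two constants of integration” argument can be checked directly. A sketch, assuming the ODE parameters are known ($m=1$, $\mu=4$, $k=400$, illustrative values) and the oscillator is under-damped, so every solution has the form $u(x)=e^{-\delta x}(A\cos\omega x+B\sin\omega x)$ with $\delta=\mu/2m$ and $\omega=\sqrt{k/m-\delta^2}$: two exact measurements fix $A$ and $B$ through a $2\times2$ linear system, unless the two locations are spaced by a multiple of $\pi/\omega$, where the system becomes singular. This bypasses the network entirely, so it says nothing about how well a PINN trains from two noisy points; `fit_two_constants` is a hypothetical helper.

```python
import math

def fit_two_constants(x1, u1, x2, u2, m=1.0, mu=4.0, k=400.0):
    """Hypothetical helper: find A, B in u = exp(-delta x) (A cos(omega x) + B sin(omega x)) from two samples."""
    delta = mu / (2.0 * m)
    omega = math.sqrt(k / m - delta**2)  # assumes the under-damped case

    def basis_cos(x):
        return math.exp(-delta * x) * math.cos(omega * x)

    def basis_sin(x):
        return math.exp(-delta * x) * math.sin(omega * x)

    a11, a12 = basis_cos(x1), basis_sin(x1)
    a21, a22 = basis_cos(x2), basis_sin(x2)
    det = a11 * a22 - a12 * a21          # equals exp(-delta (x1 + x2)) * sin(omega (x2 - x1))
    if abs(det) < 1e-12:
        raise ValueError("these two locations do not determine A and B")
    A = (u1 * a22 - a12 * u2) / det      # Cramer's rule
    B = (a11 * u2 - a21 * u1) / det
    return A, B

# Usage: recover A = 1 and B = delta/omega (i.e. u(0) = 1, u'(0) = 0) from two exact samples
delta, omega = 2.0, math.sqrt(400.0 - 2.0**2)

def true_u(x):
    return math.exp(-delta * x) * (math.cos(omega * x) + (delta / omega) * math.sin(omega * x))

print(fit_two_constants(0.05, true_u(0.05), 0.12, true_u(0.12)))  # close to (1.0, delta / omega)
```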

The “naive” approach is trained for only 1000 steps, while the PINN is trained for 20,000 steps. Is the computational time per step much less in the case of the PINN? How can you compare these two models that are operating on different orders of magnitude in time?

Hi Alexander, this is a great point, the longer training times are definitely a downside for the PINN for this particular problem. I think one of the main reasons is that I had to use a smaller learning rate when training the PINN (1e-4) compared to the NN (1e-3). Using 1e-3 for the PINN made its training unstable. My hunch is that this is due to the potentially “competing” terms in the PINN loss function (one trying to match the training data, the other trying to satisfy the PDE). Related to this, the convergence of the PINN for this problem was very sensitive to the relative weighting between the two terms in the loss function. See the code for the training details: https://github.com/benmoseley/harmonic-oscillator-pinn. In terms of computational time per training step, the PINN is also slower, because the second order gradients of the network with respect to its inputs also need to be computed at each step.

Great effort.

How about using reinforcement learning to train the NN? In this case it will be the agent network.

I’m currently applying time-dependent source terms in the PDEs I’m trying to predict with a PINN. Does incorporating time-dependent source functions in the PDE require special considerations (a higher number of epochs? a smaller learning rate? etc.)?

Thank you for this great article and wish you the best of luck!