
Bayesian Updating

Inverse eigenvalue problem

The inverse eigenvalue problem is a classic engineering example. Here we will use Bayesian updating to sample from a bivariate posterior distribution describing the unknown parameters of the matrix

```math
\begin{bmatrix}
\theta_1 + \theta_2 & -\theta_2 \\
-\theta_2 & \theta_2
\end{bmatrix}
```

A matrix of this form can represent different problems, such as the stiffness matrix of a tuned mass damper system. In this example we assume the true values θ1=0.5 and θ2=1.5 for the data-generating process.

The eigenvalues λ1 and λ2 of this matrix represent a measurable physical property, corrupted by "noise" caused, for example, by environmental factors or measurement inaccuracy.

```math
\lambda_1^{\text{noisy}} = \frac{(\theta_1 + 2\theta_2) + \sqrt{\theta_1^2 + 4\theta_2^2}}{2} + \epsilon_1, \qquad
\lambda_2^{\text{noisy}} = \frac{(\theta_1 + 2\theta_2) - \sqrt{\theta_1^2 + 4\theta_2^2}}{2} + \epsilon_2
```

The "noise" terms ϵ1 and ϵ2 follow a Normal distribution with zero mean and standard deviations σ1=1.0 and σ2=0.1.
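As a quick sanity check in plain Julia (standard library only), we can verify that the closed-form expressions above really are the eigenvalues of the matrix, assuming the stiffness-type form with off-diagonal entries −θ2 and the true values θ1 = 0.5 and θ2 = 1.5:

```julia
using LinearAlgebra

θ1, θ2 = 0.5, 1.5

# Stiffness-type matrix from the problem statement (off-diagonals -θ2)
K = [(θ1 + θ2) (-θ2); (-θ2) θ2]

# Closed-form eigenvalues (the noise-free parts of the expressions above)
λmax = ((θ1 + 2θ2) + sqrt(θ1^2 + 4θ2^2)) / 2
λmin = ((θ1 + 2θ2) - sqrt(θ1^2 + 4θ2^2)) / 2

sort(eigvals(K))   # ≈ [0.2293, 3.2707]
```

Both eigenvalues agree with the numerical computation, so the noise-free model used below is consistent with the matrix.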

The synthetic "noisy" data used for the Bayesian updating procedure is given in the following table.

| λ1   | λ2   |
|------|------|
| 1.51 | 0.33 |
| 4.01 | 0.30 |
| 3.16 | 0.27 |
| 3.21 | 0.18 |
| 2.19 | 0.33 |
| 1.71 | 0.23 |
| 2.73 | 0.21 |
| 5.51 | 0.20 |
| 1.95 | 0.11 |
| 4.48 | 0.20 |
| 1.43 | 0.16 |
| 2.91 | 0.26 |
| 3.81 | 0.23 |
| 3.58 | 0.25 |
| 2.62 | 0.25 |

The a priori knowledge of θ1 and θ2 is that they take values between 0.01 and 4. The likelihood function used for this problem is a bivariate Gaussian with covariance matrix $\begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$; the off-diagonal terms are zero and the diagonal terms are the variances of the respective noise terms.

```math
P(\lambda \mid \theta) \propto \exp\left[-\frac{1}{2} \sum_{i=1}^{2} \sum_{n=1}^{15} \left(\frac{\lambda_{i,n}^{\text{data}} - \lambda_i^{\text{model}}}{\sigma_i}\right)^2\right]
```

To begin the Bayesian model updating procedure, we define the data, the models for the eigenvalues (without the noise terms), and the likelihood function.

```julia
Y = [
    1.51 0.33
    4.01 0.3
    3.16 0.27
    3.21 0.18
    2.19 0.33
    1.71 0.23
    2.73 0.21
    5.51 0.2
    1.95 0.11
    4.48 0.2
    1.43 0.16
    2.91 0.26
    3.81 0.23
    3.58 0.25
    2.62 0.25
]

λ1 = @. Model(df -> ((df.θ1 + 2 * df.θ2) + sqrt(df.θ1^2 + 4 * df.θ2^2)) / 2, :λ1)
λ2 = @. Model(df -> ((df.θ1 + 2 * df.θ2) - sqrt(df.θ1^2 + 4 * df.θ2^2)) / 2, :λ2)

σ = [1.0 0.1]
function likelihood(df)
    λ = [df.λ1 df.λ2]

    # Return the log-likelihood directly; wrapping it in log.(exp.(...)) is
    # mathematically a no-op but underflows to -Inf for strongly negative exponents.
    return [-0.5 * sum(((Y .- λ[n, :]') ./ σ) .^ 2) for n in axes(λ, 1)]
end
```

We will solve this problem using the TMCMC algorithm. The next step is to define the RandomVariable vector of the prior, followed by the TransitionalMarkovChainMonteCarlo sampler object. We also have to choose the number of samples and the burn-in for TMCMC.

```julia
prior = RandomVariable.(Uniform(0.01, 4), [:θ1, :θ2])

n = 1000
burnin = 0

tmcmc = TransitionalMarkovChainMonteCarlo(prior, n, burnin)
```
```
TransitionalMarkovChainMonteCarlo(RandomVariable{Uniform{Float64}}[RandomVariable{Uniform{Float64}}(Uniform{Float64}(a=0.01, b=4.0), :θ1), RandomVariable{Uniform{Float64}}(Uniform{Float64}(a=0.01, b=4.0), :θ2)], 1000, 0, 0.2, true)
```

With the prior, likelihood, models and MCMC sampler defined, the last step is to call the bayesianupdating method.

```julia
samples, evidence = bayesianupdating(likelihood, [λ1, λ2], tmcmc)

scatter(samples.θ1, samples.θ2; lim=[0, 4], label="TMCMC", xlabel="θ1", ylabel="θ2")
```

A scatter plot of the resulting samples shows convergence to two distinct regions. Unlike the transitional Markov chain Monte Carlo algorithm, a standard Metropolis-Hastings sampler would typically identify only one of the two regions.

Inverse eigenvalue problem with maximum likelihood and maximum a posteriori point estimation

The inverse eigenvalue problem can also be solved with point estimation schemes, i.e. the maximum likelihood estimate (MLE) and the maximum a posteriori (MAP) estimate. Both find a maximum via optimization, of the likelihood alone (MLE) or of the posterior (MAP). The main difference between the two is that MLE ignores the prior information and only estimates the most likely parameter values based on the measurements, while MAP weights the likelihood by the prior distribution. MAP can thus be seen as a regularized version of MLE.
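The relationship between the two estimates can be illustrated with a small, self-contained 1D sketch in plain Julia (toy numbers, not part of this example): the MAP objective is the MLE objective plus the log-prior, which pulls the estimate toward the prior mean.

```julia
# Toy data: five "measurements" of an unknown mean with known σ = 1
data = [2.1, 1.9, 2.3, 2.0, 1.7]

loglike(θ) = -0.5 * sum((data .- θ) .^ 2)   # Gaussian log-likelihood (σ = 1)
logprior(θ) = -0.5 * θ^2                    # standard normal prior on θ

θs = range(-1, 4; length=10_001)            # dense grid search
θ_mle = θs[argmax(loglike.(θs))]
θ_map = θs[argmax(loglike.(θs) .+ logprior.(θs))]

# MLE is the sample mean; MAP is shrunk toward the prior mean 0
θ_mle   # ≈ 2.0
θ_map   # ≈ 2.0 * 5 / 6 ≈ 1.67
```

With a flat (uniform) prior the log-prior term is constant over its support, which is why MLE and MAP coincide in the first example below.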

We will set up the problem the same way as in the MCMC example.

```julia
Y = [
    1.51 0.33
    4.01 0.3
    3.16 0.27
    3.21 0.18
    2.19 0.33
    1.71 0.23
    2.73 0.21
    5.51 0.2
    1.95 0.11
    4.48 0.2
    1.43 0.16
    2.91 0.26
    3.81 0.23
    3.58 0.25
    2.62 0.25
]

λ1 = @. Model(df -> ((df.θ1 + 2 * df.θ2) + sqrt(df.θ1^2 + 4 * df.θ2^2)) / 2, :λ1)
λ2 = @. Model(df -> ((df.θ1 + 2 * df.θ2) - sqrt(df.θ1^2 + 4 * df.θ2^2)) / 2, :λ2)

σ = [1.0 0.1]

function likelihood(df)
    λ = [df.λ1 df.λ2]

    # Return the log-likelihood directly; wrapping it in log.(exp.(...)) is
    # mathematically a no-op but underflows to -Inf for strongly negative exponents.
    return [-0.5 * sum(((Y .- λ[n, :]') ./ σ) .^ 2) for n in axes(λ, 1)]
end
```

For MLE and MAP we need to define the prior; note, however, that in MLE the defined RandomVariable is only used to tell the updating process which parameters to update. Its distribution does not affect the result, as only the likelihood is taken into account. For the optimization we also need to define starting point(s). Since we know that the problem is multi-modal, we define multiple starting points to find both modes. We also have to specify the optimization procedure, in this case LBFGS. To illustrate the results of MAP and MLE, we also solve the problem with TMCMC.
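Why multiple starting points matter can be sketched without any optimization package: a local search on a bimodal function converges to whichever mode is nearest to its start. A minimal plain-Julia illustration with a hypothetical toy function:

```julia
# Bimodal toy "log-posterior" with modes near x = -2 and x = 2
f(x) = log(exp(-(x + 2)^2) + 0.5 * exp(-(x - 2)^2))

# Naive local hill climbing with a shrinking step size
function hillclimb(f, x; step=0.5, tol=1e-8)
    while step > tol
        if f(x + step) > f(x)
            x += step
        elseif f(x - step) > f(x)
            x -= step
        else
            step /= 2   # no neighbor improves: refine the step
        end
    end
    return x
end

# Each start converges to its nearest mode
hillclimb(f, -1.0)   # ≈ -2.0
hillclimb(f, 1.0)    # ≈ 2.0
```

A gradient-based optimizer like LBFGS behaves the same way in this respect: it finds one local maximum per starting point, so covering both modes requires at least one start in each basin of attraction.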

```julia
prior = RandomVariable.(Uniform(.1, 10), [:θ1, :θ2])

burnin = 0
n = 1000

x0 = [[1., 1.], [3., .5]]

tmcmc = TransitionalMarkovChainMonteCarlo(prior, n, burnin)
MAP = MaximumAPosterioriBayesian(prior, x0)
MLE = MaximumLikelihoodBayesian(prior, x0)
```
```
MaximumLikelihoodBayesian(RandomVariable{Uniform{Float64}}[RandomVariable{Uniform{Float64}}(Uniform{Float64}(a=0.1, b=10.0), :θ1), RandomVariable{Uniform{Float64}}(Uniform{Float64}(a=0.1, b=10.0), :θ2)], [[1.0, 1.0], [3.0, 0.5]], true, [-Inf], [Inf])
```

With the prior, likelihood, models, and estimators defined, the last step is to call the bayesianupdating method for each of them.

```julia
samples, evidence = bayesianupdating(likelihood, [λ1, λ2], tmcmc)
MapEstimate = bayesianupdating(likelihood, [λ1, λ2], MAP)
MLEstimate = bayesianupdating(likelihood, [λ1, λ2], MLE)

scatter(samples.θ1, samples.θ2; lim=[0, 4], label="TMCMC", xlabel="θ1", ylabel="θ2")
scatter!((MapEstimate.θ1, MapEstimate.θ2), label="MAP")
scatter!((MLEstimate.θ1, MLEstimate.θ2), label="MLE")
```

A scatter plot of the resulting samples shows convergence to two distinct regions. Since we used a uniform prior distribution, the ML and MAP estimates coincide. With a different prior distribution, e.g. normal distributions centered near one of the modes, we obtain a different result:

```julia
priorθ1 = RandomVariable(Normal(.5, .5), :θ1)
priorθ2 = RandomVariable(Normal(1.5, .5), :θ2)

prior = [priorθ1, priorθ2]

burnin = 0
n = 1000

x0 = [[1., 1.], [3., .5]]

tmcmc = TransitionalMarkovChainMonteCarlo(prior, n, burnin)
MAP = MaximumAPosterioriBayesian(prior, x0)
MLE = MaximumLikelihoodBayesian(prior, x0)

samples, evidence = bayesianupdating(likelihood, [λ1, λ2], tmcmc)
MapEstimate = bayesianupdating(likelihood, [λ1, λ2], MAP)
MLEstimate = bayesianupdating(likelihood, [λ1, λ2], MLE)

scatter(samples.θ1, samples.θ2; lim=[0, 4], label="TMCMC", xlabel="θ1", ylabel="θ2")
scatter!((MapEstimate.θ1, MapEstimate.θ2), label="MAP")
scatter!((MLEstimate.θ1, MLEstimate.θ2), label="MLE")
```

Some things to note: the MLE results are the same as before, since the prior distribution is not taken into account. TMCMC does not find the second mode, because it is much less likely than the first one, so the Markov chains do not converge there. MAP does find it, since the optimization is able to locate the local maximum. A look at the relative values of both modes shows the difference in probability:

```julia
println(exp.(MLEstimate[!, :logMLE]))
println(exp.(MapEstimate[!, :logMAP]))
```
```
[5.846036389174841e-6, 5.84603638917482e-6]
[3.5711506271809677e-6, 1.0072485673125058e-10]
```

The second mode is more than four orders of magnitude less probable than the first, which explains why the Markov chains do not converge there.
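The "orders of magnitude" statement can be checked directly from the printed MAP values in plain Julia:

```julia
p1 = 3.5711506271809677e-6   # MAP posterior value at the first mode (printed above)
p2 = 1.0072485673125058e-10  # MAP posterior value at the second mode

ratio = p1 / p2   # ≈ 3.5e4
log10(ratio)      # ≈ 4.55, i.e. more than four orders of magnitude
```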

Beam Example: Comparison of TMCMC and Transport Maps

This example demonstrates Bayesian model updating for a beam deflection problem using two different approaches: Transitional Markov Chain Monte Carlo (TMCMC) and Transport Maps. Both methods are used to infer unknown parameters of a clamped beam model from noisy displacement measurements.

The clamped beam has a rectangular cross-section with width b=0.2 m, height h=0.1 m, and length L=10 m. The Young's modulus is E=210 GPa. A load F is applied at position aL along the beam, as seen in the figure below. This example is based on [42].

The displacement at the end of the beam, denoted as s, is obtained as a function of a and F through the following analytical expression:

```math
M(\theta) := s = \frac{F (aL)^2}{6EI} \left(3L - aL\right)
```

where $I = \frac{b h^3}{12}$ is the moment of inertia of the cross section and $\theta = [a, F]$.

To perform Bayesian updating, a set of 10 displacement measurements corrupted by Gaussian noise with standard deviation σ=0.01 m is given. The goal is to infer the unknown parameters a (relative load position along the beam) and F (applied force) from these measurements using Bayesian inference.

First, we define the beam parameters and the displacement model:

```julia
b = 0.2
h = 0.1
L = 10.
E = 210e9

A = b * h
I = b * h^3 / 12

s(a, F) = F .* (a * L) .^ 2 / (6 * E * I) .* (3 * L .- a * L)
```
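As a plausibility check (plain Julia, repeating the definitions above with scalar arithmetic; a = 0.75 and F = 1000 N are illustrative values, not part of the example), the model predicts a tip displacement of about 6 cm, consistent with the magnitude of the measurements below:

```julia
b, h, L, E = 0.2, 0.1, 10.0, 210e9
I = b * h^3 / 12   # moment of inertia of the cross section

# Scalar version of the deflection model
s(a, F) = F * (a * L)^2 / (6 * E * I) * (3 * L - a * L)

s(0.75, 1000.0)   # ≈ 0.0603 m
```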

Next, we define the prior distributions for the unknown parameters. We assume that a follows a Beta distribution and F follows a Normal distribution centered at 1000 N with standard deviation 300 N.

```julia
prior = [
    RandomVariable(Beta(10, 3), :a),
    RandomVariable(Normal(1000, 300), :F),
]
```
```
2-element Vector{RandomVariable{<:UnivariateDistribution}}:
 RandomVariable{Beta{Float64}}(Beta{Float64}(α=10.0, β=3.0), :a)
 RandomVariable{Normal{Float64}}(Normal{Float64}(μ=1000.0, σ=300.0), :F)
```

The measurement noise standard deviation and the observed data D are:

```julia
σ = 0.01

data = [
    0.04700676518380301
    0.05567472107255563
    0.06033689503633009
    0.035675890874077895
    0.02094933145952007
    0.044602949523632154
    0.04886405043326749
    0.0456339834330763
    0.04457204918859585
    0.05735639275860812
]
```

We define the forward model M(θ) as a Model. The likelihood assumes that the measurement errors are independent and normally distributed:

```math
P(D \mid \theta) = \prod_{i=1}^{10} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{M(\theta) - D_i}{\sigma}\right)^2\right].
```

For better stability, we use the log-likelihood for the updating.
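The numerical advantage of the log-likelihood is easy to demonstrate in plain Julia: the product of many small Gaussian densities underflows to zero, while the sum of their logarithms stays finite (toy numbers chosen only for illustration):

```julia
σ = 0.01
ϕ(x, μ) = exp(-0.5 * ((x - μ) / σ)^2) / (sqrt(2π) * σ)    # Gaussian pdf
logϕ(x, μ) = -0.5 * ((x - μ) / σ)^2 - log(sqrt(2π) * σ)   # its logarithm

# Pretend every residual is 4σ: each factor is harmless on its own,
# but the product of 400 of them underflows
xs = fill(4σ, 400)

p = prod(ϕ.(xs, 0.0))     # 0.0 due to underflow
ll = sum(logϕ.(xs, 0.0))  # finite
```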

```julia
M = Model(df -> s(df.a, df.F), :disp)
Like = df -> sum([logpdf.(Normal(y, σ), df.disp) for y in data])
```

Bayesian Updating with TMCMC

First, we apply the TMCMC algorithm to sample from the posterior distribution. We use these samples to compare with the results obtained with the transport map.

```julia
tmcmc = TransitionalMarkovChainMonteCarlo(prior, 1_000, 3)
samples, evidence = bayesianupdating(Like, [M], tmcmc)
```

Bayesian Updating with Transport Maps

Transport Maps provide an alternative approach to Bayesian inference. They construct a deterministic transformation that maps a standard normal distribution to the posterior distribution. This is achieved by learning a triangular map using polynomial basis functions.
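The core idea can be sketched in one dimension with plain Julia: a monotone map T pushes standard normal samples forward to the target, and the change-of-variables formula recovers the target density. Here the map is the exact affine one for a Gaussian target (hypothetical μ and σ; in practice the map is a polynomial whose coefficients are learned):

```julia
μ, σ = 2.0, 0.5
ρ(z) = exp(-z^2 / 2) / sqrt(2π)   # standard normal reference density

T(z) = μ + σ * z                  # monotone (here: affine) transport map
Tinv(x) = (x - μ) / σ

# Pushforward density via change of variables: π(x) = ρ(T⁻¹(x)) / |T'(T⁻¹(x))|
π_tm(x) = ρ(Tinv(x)) / σ

# Matches the N(μ, σ²) density exactly
π_exact(x) = exp(-((x - μ) / σ)^2 / 2) / (sqrt(2π) * σ)
π_tm(1.3), π_exact(1.3)
```

For non-Gaussian posteriors no exact affine map exists, which is why the triangular polynomial maps below are needed.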

We define a polynomial transport map of order 2 with 2 input dimensions. Additionally, we define a quadrature scheme used to compute the KL-divergence, i.e., the target of the optimization. Finally, we store these along with the prior in the TransportMapBayesian object:

```julia
T = PolynomialMap(2, 2)
quadrature = GaussHermiteWeights(3, 2)
transportmap = TransportMapBayesian(prior, T, quadrature)
```
```
TransportMapBayesian
  Variable names: [:a, :F], 
  Map: PolynomialMap(2-dimensional, degree=2, basis=LinearizedHermiteBasis, reference=Normal{Float64}(μ=0.0, σ=1.0), rectifier=Softplus, 9 total coefficients)
  Quadrature: GaussHermiteWeights(9 points, 2 dimensions)
  Gradient: AutoFiniteDiff{Val{:forward}, Val{:forward}, Val{:hcentral}, Nothing, Nothing, Bool}
  Optimizer: Optim.LBFGS{Nothing, LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Returns{Nothing}}
  Log-density: true
  Transform prior: true
```

The map we defined is optimized by calling the bayesianupdating function. A JointDistribution containing the map is returned.

```julia
tm = bayesianupdating(Like, [M], transportmap)
```
```
JointDistribution{MultivariateDistribution, Symbol}(TransportMap(map=PolynomialMap(2-dimensional, degree=2, basis=LinearizedHermiteBasis, reference=Normal{Float64}(μ=0.0, σ=1.0), rectifier=Softplus, 9 total coefficients), target=MapTargetDensity(backend=AutoFiniteDiff()), names=[:a, :F]), [:a, :F])
```

Once the map coefficients are optimized, we can sample from the transport map posterior and compare the samples to those obtained from TMCMC. The TM-based samples are obtained from sampling in the reference space (i.e., standard normal space) and applying the mapping. In the figure we see a good agreement of the samples obtained with TMCMC and from the transport map.

```julia
df = sample(tm, 1000)
scatter(df.a, df.F, alpha=0.5, label="TM Samples")
scatter!(samples.a, samples.F, alpha=0.5, label="TMCMC Samples")
xlabel!("a [-]")
ylabel!("F [N]")
title!("Comparison of TM and TMCMC samples")
```

Further, transport maps provide a formulation of the posterior density in terms of the reference density and the map, as also outlined in Variational Inference with Transport Maps. We can evaluate the density by calling pdf(tm::TransportMap, x::AbstractVector{<:Real}). In the figure below, the TM-approximated pdf is plotted over the samples generated with TMCMC. A good agreement of the samples and the pdf is observed.

```julia
x1_grid = range(0.3, 1, 100)
x2_grid = range(0, 2500, 100)

post = [pdf(tm, [x1, x2]) for x2 in x2_grid, x1 in x1_grid]
scatter(samples.a, samples.F, alpha=0.5, label="TMCMC Samples")
contour!(x1_grid, x2_grid, post)
xlabel!("a [-]")
ylabel!("F [N]")
title!("TM-posterior and TMCMC samples")
```

Finally, we can compute the variancediagnostic to assess the quality of the transport map approximation. The diagnostic measures the variance of the log-ratio between the pushforward density (mapping from target to reference density) and the reference density:

```math
\varepsilon_\sigma = \frac{1}{2} \operatorname{Var}\left[\log \pi(T(z)) + \log \left|\det \nabla T(z)\right| - \log \rho(z)\right].
```

A smaller variance indicates a better fit of the transport map.
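A quick plain-Julia sanity check of this diagnostic: for an exact map (here again the affine map for a Gaussian target, with hypothetical μ and σ; the helper names are illustrative, not package API), the log-ratio is constant in z, so its variance, and hence the diagnostic, is zero up to floating-point error.

```julia
μ, σ = 2.0, 0.5

logρ(z) = -z^2 / 2 - log(2π) / 2                        # reference (standard normal)
logπ(x) = -((x - μ) / σ)^2 / 2 - log(2π) / 2 - log(σ)   # target N(μ, σ²)
T(z) = μ + σ * z                                         # exact map; |det ∇T(z)| = σ

# Log-ratio from the formula above, evaluated on a grid
zs = range(-3, 3; length=101)
r = [logπ(T(z)) + log(σ) - logρ(z) for z in zs]

# Sample variance of the log-ratio, halved
εσ = 0.5 * sum((r .- sum(r) / length(r)) .^ 2) / (length(r) - 1)
# εσ ≈ 0 for the exact map
```

The small but nonzero value reported for the beam posterior below reflects the residual mismatch of a degree-2 polynomial map.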

First, we generate standard normal samples and then pass them to the function along with the TransportMap stored in the JointDistribution, i.e., tm.d.

```julia
df = sample(prior, 1000)
to_standard_normal_space!(prior, df) # generate standard normal samples
var_diag = variancediagnostic(tm.d, df)
println("Variance diagnostic: $var_diag")
```
```
Variance diagnostic: 0.005472865910032664
```

Note

Evaluating variancediagnostic repeatedly evaluates the target density. In Bayesian updating, this corresponds to repeated evaluations of the likelihood and hence Model evaluations.


This page was generated using Literate.jl.