Bayesian Updating
Inverse eigenvalue problem
The inverse eigenvalue problem is a classic engineering example. Here we will use Bayesian updating to sample from a bivariate posterior distribution describing the unknown quantities θ1 and θ2 of the matrix

```math
A = \begin{bmatrix} \theta_1 + \theta_2 & -\theta_2 \\ -\theta_2 & \theta_2 \end{bmatrix}.
```

A matrix of this form can represent different problems, like the stiffness matrix describing a tuned mass damper system. In this example we assume the fixed values θ1 = 0.5 and θ2 = 1.5. The eigenvalues λ1 and λ2 of the matrix are

```math
\lambda_{1,2} = \frac{(\theta_1 + 2\theta_2) \pm \sqrt{\theta_1^2 + 4\theta_2^2}}{2}.
```

The "noise" terms η1 and η2 are independent zero-mean Gaussian random variables with standard deviations σ1 = 1.0 and σ2 = 0.1; the observed data are the noisy eigenvalues λ1 + η1 and λ2 + η2.
The synthetic "noisy" data used for the Bayesian updating procedure is given in the following table.
| λ1 | λ2 |
| --- | --- |
| 1.51 | 0.33 |
| 4.01 | 0.30 |
| 3.16 | 0.27 |
| 3.21 | 0.18 |
| 2.19 | 0.33 |
| 1.71 | 0.23 |
| 2.73 | 0.21 |
| 5.51 | 0.20 |
| 1.95 | 0.11 |
| 4.48 | 0.20 |
| 1.43 | 0.16 |
| 2.91 | 0.26 |
| 3.81 | 0.23 |
| 3.58 | 0.25 |
| 2.62 | 0.25 |
The a priori knowledge of θ1 and θ2 is modeled by independent uniform distributions on the interval [0.01, 4].
To begin the Bayesian model updating procedure, we start by defining the data, the models for the eigenvalues (without the noise term), and the likelihood function.
```julia
Y = [
    1.51 0.33
    4.01 0.3
    3.16 0.27
    3.21 0.18
    2.19 0.33
    1.71 0.23
    2.73 0.21
    5.51 0.2
    1.95 0.11
    4.48 0.2
    1.43 0.16
    2.91 0.26
    3.81 0.23
    3.58 0.25
    2.62 0.25
]
```
```julia
# models for the two eigenvalues; @. broadcasts over the DataFrame columns
λ1 = Model(df -> @. ((df.θ1 + 2 * df.θ2) + sqrt(df.θ1^2 + 4 * df.θ2^2)) / 2, :λ1)
λ2 = Model(df -> @. ((df.θ1 + 2 * df.θ2) - sqrt(df.θ1^2 + 4 * df.θ2^2)) / 2, :λ2)

σ = [1.0 0.1]
```
```julia
function likelihood(df)
    λ = [df.λ1 df.λ2]
    # Gaussian log-likelihood of each sample, returned in log space for numerical stability
    return [-0.5 * sum(((Y .- λ[n, :]') ./ σ) .^ 2) for n in axes(λ, 1)]
end
```

```
likelihood (generic function with 1 method)
```

We will solve this problem using the TMCMC algorithm, as well as maximum a posteriori (MAP) and maximum likelihood (ML) point estimates, which are covered in the next section. The next step is to define the RandomVariable vector of the prior, followed by the sampler object (TransitionalMarkovChainMonteCarlo). We also have to choose the number of samples and the burn-in for TMCMC.
```julia
prior = RandomVariable.(Uniform(0.01, 4), [:θ1, :θ2])

n = 1000
burnin = 0

tmcmc = TransitionalMarkovChainMonteCarlo(prior, n, burnin)
```

```
TransitionalMarkovChainMonteCarlo(RandomVariable{Uniform{Float64}}[RandomVariable{Uniform{Float64}}(Uniform{Float64}(a=0.01, b=4.0), :θ1), RandomVariable{Uniform{Float64}}(Uniform{Float64}(a=0.01, b=4.0), :θ2)], 1000, 0, 0.2, true)
```

With the prior, likelihood, models, and MCMC sampler defined, the last step is to call the bayesianupdating method.
```julia
samples, evidence = bayesianupdating(likelihood, [λ1, λ2], tmcmc)

scatter(samples.θ1, samples.θ2; lim=[0, 4], label="TMCMC", xlabel="θ1", ylabel="θ2")
```

A scatter plot of the resulting samples shows convergence to two distinct regions. Unlike transitional Markov chain Monte Carlo, the standard Metropolis-Hastings algorithm would only have identified one of the two regions.
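To see where the two regions come from, we can evaluate the unnormalized log-likelihood on a grid. This is a minimal sketch, not part of the original example; it reuses Y and σ from above and assumes a Plots backend is loaded:

```julia
# closed-form eigenvalues for a single (θ1, θ2) pair
eig(θ1, θ2) = (((θ1 + 2θ2) + sqrt(θ1^2 + 4θ2^2)) / 2,
               ((θ1 + 2θ2) - sqrt(θ1^2 + 4θ2^2)) / 2)

# unnormalized Gaussian log-likelihood of the data Y
function loglike(θ1, θ2)
    l1, l2 = eig(θ1, θ2)
    return -0.5 * sum(((Y .- [l1 l2]) ./ σ) .^ 2)
end

θgrid = range(0.01, 4; length=200)
contour(θgrid, θgrid, loglike; xlabel="θ1", ylabel="θ2")  # two distinct ridges appear
```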
Inverse eigenvalue problem with maximum likelihood and maximum a posteriori point estimation
The inverse eigenvalue problem can also be solved with point estimation schemes, namely the maximum likelihood estimate (MLE) and the maximum a posteriori (MAP) estimate. Both find a maximum via optimization: MLE maximizes only the likelihood, while MAP maximizes the posterior. The main difference between the two is that MLE does not use the prior information; it only gives an estimate of the most likely parameter values based on the measurements. MAP, on the other hand, takes the prior distribution into account and gives a weighted estimate of the most likely parameters. MAP can thus also be seen as a regularization of MLE.
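In symbols, with log-likelihood log p(y | θ) and prior density p(θ), the two point estimates are:

```math
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \, \log p(y \mid \theta), \qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \, \left[\log p(y \mid \theta) + \log p(\theta)\right].
```

For a uniform prior, log p(θ) is constant on its support, so both estimates coincide; this is exactly what we will observe below.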
We will set up the problem the same way as in the MCMC example.
```julia
Y = [
    1.51 0.33
    4.01 0.3
    3.16 0.27
    3.21 0.18
    2.19 0.33
    1.71 0.23
    2.73 0.21
    5.51 0.2
    1.95 0.11
    4.48 0.2
    1.43 0.16
    2.91 0.26
    3.81 0.23
    3.58 0.25
    2.62 0.25
]
```
```julia
λ1 = Model(df -> @. ((df.θ1 + 2 * df.θ2) + sqrt(df.θ1^2 + 4 * df.θ2^2)) / 2, :λ1)
λ2 = Model(df -> @. ((df.θ1 + 2 * df.θ2) - sqrt(df.θ1^2 + 4 * df.θ2^2)) / 2, :λ2)

σ = [1.0 0.1]
```
```julia
function likelihood(df)
    λ = [df.λ1 df.λ2]
    return [-0.5 * sum(((Y .- λ[n, :]') ./ σ) .^ 2) for n in axes(λ, 1)]
end
```

```
likelihood (generic function with 1 method)
```

For MLE and MAP we need to define the prior. Note, however, that in MLE the defined RandomVariable is only used to inform the updating process of which parameters to update; the distribution will not affect the updating, as only the likelihood is taken into account. For the optimization we also need to define starting point(s). Since we know that the problem is multi-modal, we can define multiple starting points to find both modes. We also have to specify the optimization procedure; in this case we use LBFGS. To illustrate the results of MAP and MLE, we will also solve the problem with TMCMC.
```julia
prior = RandomVariable.(Uniform(0.1, 10), [:θ1, :θ2])

burnin = 0
n = 1000
x0 = [[1.0, 1.0], [3.0, 0.5]]

tmcmc = TransitionalMarkovChainMonteCarlo(prior, n, burnin)
MAP = MaximumAPosterioriBayesian(prior, x0)
MLE = MaximumLikelihoodBayesian(prior, x0)
```

```
MaximumLikelihoodBayesian(RandomVariable{Uniform{Float64}}[RandomVariable{Uniform{Float64}}(Uniform{Float64}(a=0.1, b=10.0), :θ1), RandomVariable{Uniform{Float64}}(Uniform{Float64}(a=0.1, b=10.0), :θ2)], [[1.0, 1.0], [3.0, 0.5]], true, [-Inf], [Inf])
```

With the prior, likelihood, models, and estimators defined, the last step is to call the bayesianupdating method for each of them.
```julia
samples, evidence = bayesianupdating(likelihood, [λ1, λ2], tmcmc)
MapEstimate = bayesianupdating(likelihood, [λ1, λ2], MAP)
MLEstimate = bayesianupdating(likelihood, [λ1, λ2], MLE)

scatter(samples.θ1, samples.θ2; lim=[0, 4], label="TMCMC", xlabel="θ1", ylabel="θ2")
scatter!((MapEstimate.θ1, MapEstimate.θ2), label="MAP")
scatter!((MLEstimate.θ1, MLEstimate.θ2), label="MLE")
```

A scatter plot of the resulting samples shows convergence to two distinct regions. Since we used a uniform prior distribution, the ML and MAP estimates coincide. With a different prior distribution, e.g. normal distributions centered on one of the modes, we obtain a different result:
```julia
priorθ1 = RandomVariable(Normal(0.5, 0.5), :θ1)
priorθ2 = RandomVariable(Normal(1.5, 0.5), :θ2)
prior = [priorθ1, priorθ2]

burnin = 0
n = 1000
x0 = [[1.0, 1.0], [3.0, 0.5]]

tmcmc = TransitionalMarkovChainMonteCarlo(prior, n, burnin)
MAP = MaximumAPosterioriBayesian(prior, x0)
MLE = MaximumLikelihoodBayesian(prior, x0)

samples, evidence = bayesianupdating(likelihood, [λ1, λ2], tmcmc)
MapEstimate = bayesianupdating(likelihood, [λ1, λ2], MAP)
MLEstimate = bayesianupdating(likelihood, [λ1, λ2], MLE)

scatter(samples.θ1, samples.θ2; lim=[0, 4], label="TMCMC", xlabel="θ1", ylabel="θ2")
scatter!((MapEstimate.θ1, MapEstimate.θ2), label="MAP")
scatter!((MLEstimate.θ1, MLEstimate.θ2), label="MLE")
```

Some things to note:

- The MLE results are the same as before, since the prior distribution is not taken into account.
- TMCMC does not find the second mode: it is much less likely than the first one, so the Markov chains do not converge there.
- MAP does find the second mode, since it uses optimization and is therefore able to locate the local maximum.

A look at the relative values of the two modes shows the difference in probability:
```julia
println(exp.(MLEstimate[!, :logMLE]))
println(exp.(MapEstimate[!, :logMAP]))
```

```
[5.846036389174862e-6, 5.846036389174862e-6]
[3.5711506271809486e-6, 1.0072485673125022e-10]
```

The second mode is more than four orders of magnitude less probable than the first mode, which explains why the Markov chains do not converge there.
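To verify this claim, the ratio of the two MAP posterior values printed above can be computed directly (a quick check, not part of the original example):

```julia
# ratio of the posterior values at the two modes
ratio = 3.5711506271809486e-6 / 1.0072485673125022e-10
println(log10(ratio))  # ≈ 4.55, i.e. more than four orders of magnitude
```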
Beam Example: Comparison of TMCMC and Transport Maps
This example demonstrates Bayesian model updating for a beam deflection problem using two different approaches: Transitional Markov Chain Monte Carlo (TMCMC) and Transport Maps. Both methods are used to infer unknown parameters of a clamped beam model from noisy displacement measurements.
The clamped beam has a rectangular cross-section with width b = 0.2 m and height h = 0.1 m, a length of L = 10 m, and a Young's modulus of E = 210 GPa. A point load of magnitude F acts at a distance aL from the clamped end; the relative position a and the magnitude F are the unknown parameters to be inferred.

The displacement at the end of the beam, denoted as s, is given by

```math
s = \frac{F (aL)^2}{6EI} \left( 3L - aL \right),
```

where I = b h³ / 12 is the moment of inertia of the rectangular cross-section.

To perform Bayesian updating, a set of 10 displacement measurements corrupted by Gaussian noise with standard deviation σ = 0.01 m is used.
First, we define the beam parameters and the displacement model:
```julia
b = 0.2      # width [m]
h = 0.1      # height [m]
L = 10.0     # length [m]
E = 210e9    # Young's modulus [Pa]

A = b * h         # cross-section area
I = b * h^3 / 12  # moment of inertia

# displacement at the free end for a load F applied at position a * L
s(a, F) = F .* (a * L) .^ 2 / (6 * E * I) .* (3 * L .- a * L)
```

Next, we define the prior distributions for the unknown parameters. We assume that a follows a Beta(10, 3) distribution and that F is normally distributed with mean 1000 N and standard deviation 300 N.
```julia
prior = [
    RandomVariable(Beta(10, 3), :a),
    RandomVariable(Normal(1000, 300), :F),
]
```

```
2-element Vector{RandomVariable{<:UnivariateDistribution}}:
 RandomVariable{Beta{Float64}}(Beta{Float64}(α=10.0, β=3.0), :a)
 RandomVariable{Normal{Float64}}(Normal{Float64}(μ=1000.0, σ=300.0), :F)
```

The measurement noise standard deviation and the observed data are defined next:
```julia
σ = 0.01

data = [
    0.04700676518380301
    0.05567472107255563
    0.06033689503633009
    0.035675890874077895
    0.02094933145952007
    0.044602949523632154
    0.04886405043326749
    0.0456339834330763
    0.04457204918859585
    0.05735639275860812
]
```

We define the forward model as a Model. The likelihood assumes that the measurement errors are independent and normally distributed:

```math
p(y \mid a, F) = \prod_{i=1}^{10} \mathcal{N}\!\left(y_i \mid s(a, F), \sigma^2\right).
```

For better numerical stability, we use the log-likelihood for the updating.
```julia
M = Model(df -> s(df.a, df.F), :disp)

Like = df -> sum([logpdf.(Normal(y, σ), df.disp) for y in data])
```

Bayesian Updating with TMCMC
First, we apply the TMCMC algorithm to sample from the posterior distribution. We use these samples to compare with the results obtained with the transport map.
```julia
tmcmc = TransitionalMarkovChainMonteCarlo(prior, 1_000, 3)
samples, evidence = bayesianupdating(Like, [M], tmcmc)
```

Bayesian Updating with Transport Maps
Transport Maps provide an alternative approach to Bayesian inference. They construct a deterministic transformation that maps a standard normal distribution to the posterior distribution. This is achieved by learning a triangular map using polynomial basis functions.
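To make this concrete, here is a brief summary of the standard transport map identities (not part of the original page): if T maps a reference variable z ~ ρ, with ρ the standard normal density, to the target, the posterior density π can be evaluated through the change of variables

```math
\pi(x) = \rho\!\left(T^{-1}(x)\right)\left|\det \nabla T^{-1}(x)\right|,
```

and the map coefficients are found by minimizing the KL divergence between the pushforward of ρ and π,

```math
D_{\mathrm{KL}}\!\left(T_\sharp \rho \,\|\, \pi\right)
= \mathbb{E}_{z \sim \rho}\!\left[\log \rho(z) - \log\left|\det \nabla T(z)\right| - \log \pi(T(z))\right],
```

an expectation that the Gauss-Hermite quadrature defined below approximates.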
We define a polynomial transport map of order 2 with 2 input dimensions. Additionally, we define a quadrature scheme used to compute the KL-divergence, i.e., the target of the optimization. Finally, we store these along with the prior in the TransportMapBayesian object:
```julia
T = PolynomialMap(2, 2)
quadrature = GaussHermiteWeights(3, 2)

transportmap = TransportMapBayesian(prior, T, quadrature)
```

```
TransportMapBayesian
Variable names: [:a, :F],
Map: PolynomialMap(2-dimensional, degree=2, basis=LinearizedHermiteBasis, reference=Normal{Float64}(μ=0.0, σ=1.0), rectifier=Softplus, 9 total coefficients)
Quadrature: GaussHermiteWeights(9 points, 2 dimensions)
Gradient: AutoFiniteDiff{Val{:forward}, Val{:forward}, Val{:hcentral}, Nothing, Nothing, Bool}
Optimizer: Optim.LBFGS{Nothing, LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Returns{Nothing}}
Log-density: true
Transform prior: true
```

The map we defined is optimized by calling the bayesianupdating function. A JointDistribution containing the map is returned.
```julia
tm = bayesianupdating(Like, [M], transportmap)
```

```
JointDistribution{MultivariateDistribution, Symbol}(TransportMap(map=PolynomialMap(2-dimensional, degree=2, basis=LinearizedHermiteBasis, reference=Normal{Float64}(μ=0.0, σ=1.0), rectifier=Softplus, 9 total coefficients), target=MapTargetDensity(backend=AutoFiniteDiff()), names=[:a, :F]), [:a, :F])
```

Once the map coefficients are optimized, we can sample from the transport map posterior and compare the samples to those obtained from TMCMC. The TM-based samples are obtained by sampling in the reference space (i.e., standard normal space) and applying the map. In the figure we see good agreement between the samples obtained with TMCMC and those from the transport map.
```julia
df = sample(tm, 1000)

scatter(df.a, df.F, alpha=0.5, label="TM Samples")
scatter!(samples.a, samples.F, alpha=0.5, label="TMCMC Samples")
xlabel!("a [-]")
ylabel!("F [N]")
title!("Comparison of TM and TMCMC samples")
```

Further, transport maps provide a formulation of the posterior density in terms of the reference density and the map, as also outlined in Variational Inference with Transport Maps. We can evaluate the density by calling pdf(tm::TransportMap, x::AbstractVector{<:Real}). In the figure below, the TM-approximated pdf is plotted over the samples generated with TMCMC. A good agreement between the samples and the pdf is observed.
```julia
x1_grid = range(0.3, 1, 100)
x2_grid = range(0, 2500, 100)

post = [pdf(tm, [x1, x2]) for x2 in x2_grid, x1 in x1_grid]

scatter(samples.a, samples.F, alpha=0.5, label="TMCMC Samples")
contour!(x1_grid, x2_grid, post)
xlabel!("a [-]")
ylabel!("F [N]")
title!("TM-posterior and TMCMC samples")
```

Finally, we can compute the variancediagnostic to assess the quality of the transport map approximation. The diagnostic measures the variance of the log-ratio between the pushforward density (mapping from the target to the reference density) and the reference density:

```math
\operatorname{Var}_{z \sim \rho}\!\left[\log \frac{q(z)}{\rho(z)}\right],
```

where ρ denotes the reference density and q the pushforward of the target density to the reference space. A smaller variance indicates a better fit of the transport map.
First, we generate standard normal samples and then pass these to the function along with the TransportMap stored in the JointDistribution, i.e., tm.d.

```julia
df = sample(prior, 1000)
to_standard_normal_space!(prior, df) # generate standard normal samples

var_diag = variancediagnostic(tm.d, df)
println("Variance diagnostic: $var_diag")
```

```
Variance diagnostic: 0.005472863314750117
```

Note
Evaluating variancediagnostic repeatedly evaluates the target density. In Bayesian updating, this corresponds to repeated evaluations of the likelihood and hence Model evaluations.
This page was generated using Literate.jl.