Regression splines tutorial
This tutorial describes the spline basis and smoothing techniques which are based on splines.
using Plots
pyplot()
Here is a simple function and noisy data. The object of this tutorial is to estimate the sinuosoidal curve given the noisy observations.
x = linspace(0, 2*pi, 100)[1:end-1]
y = sin(4*x)
yn = y + randn(size(y)) *.2
plot(x,y)
scatter!(x,yn)
To do this we will use cubic splines. The spline basis with knots at $ \xi_i $ is given by $ (x-\xi_i)^3_i $ . This can be expressed in terms of so-called B-splines by the formula,
The B-splines are a nice basis for the space of cubic splines, because they are smooth at the knots $\xi_i$. They are defined recursively in the following fashion:
This formula, and the corresponding recursive formula for the derivatives of beta splines are implemented by the following pair of functions. B-splines are typically used because they have compact support, which can speed up evaluation. I do not take advantage of this here.
function B(i, k, knots)
e = eps(0.0)
L = knots[end] - knots[1]
# pad the knots vector for periodic case
knots = [knots..., (L+knots[2:2+k]-knots[1])...]
if k == 0
x-> float( knots[i] - e .<= x .< knots[i+1] + e)
else
x->(B(i,k-1, knots)(x) .* (x- knots[i])/(knots[i+k]-knots[i])
+ B(i+1,k-1, knots)(x) .* (knots[i+k+1]-x)/(knots[i+k+1]-knots[i+1]))
end
end
function Bder(i, k, knots, p)
e = eps(0.0)
L = knots[end] - knots[1]
# pad the knots vector for periodic case
knots = [knots..., (L+knots[2:2+k]-knots[1])...]
if p > k
x -> zero(x)
elseif p == 0
x -> B(i,k,knots)(x)
else
x->((Bder(i,k-1, knots,p)(x) .* (x- knots[i]) + Bder(i,k-1, knots,p-1)(x))/(knots[i+k]-knots[i])
+ (Bder(i+1,k-1, knots,p)(x) .* (knots[i+k+1]-x) - Bder(i+1,k-1,knots,p-1)(x))/(knots[i+k+1]-knots[i+1]))
end
end
Bder (generic function with 1 method)
This is what the B-splines for orders 0-3 look like.
knots = collect(0:.1:1.0)
xx = linspace(0,1,200)
plot(xx,[B(4, k, knots)(xx) for k in 0:3])
And these are the first through third derivatives of the 3-rd order B-spline.
plot([Bder(4, 3, knots, p)(xx) for p in 0:2])
Smoothing using splines
Now I return to the original problem, and use a spline basis to smooth the noisy observations.
n = 20
p = 3
knots = collect(-p:n-1)*(2*pi)/n;
sp3basis = hcat([B(i,p,knots)(x) for i in 1:length(knots)]...);
yhat = sp3basis * (sp3basis \ yn);
plot(yhat)
plot!(y)
# scatter!(yn)
While this looks okay, we are not taking advantage of the boundary conditions. For periodic boundaries, we demand that the first and second derivatives be continuous at the boundary.
periodic_constraints = [Bder(i,3,knots, p)(x[1]) - Bder(i,3,knots, p)(x[end]) for p=0:2, i=1:length(knots)]
3×23 Array{Float64,2}:
0.333333 1.33333 0.333333 0.0 … -0.558076 -0.164105 -0.00423442
-3.1831 0.0 3.1831 0.0 -0.0909283 -0.674302 -0.0506727
10.1321 -20.2642 10.1321 0.0 1.42003 -0.40682 -0.202131
This is a projection operator which applies the constraints.
$y = A (I - B B^{\dagger}) c + \epsilon$
P = eye(size(periodic_constraints,2)) - pinv(periodic_constraints)*periodic_constraints;
Indeed the constraints are satisfied for random vectors of cofficents
periodic_constraints*P * rand(n+3)
3-element Array{Float64,1}:
1.70889e-15
1.08728e-15
-2.96689e-15
Let’s perform a least squares fit
A = sp3basis * P
beta = A\yn
yhat = A* beta
yhat = yhat;
The spline does a decent job at fitting the data.
plot(yhat)
plot!(y)
scatter!(yn)