Saddle Surface and Saddle Point
Saddle Surface
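A saddle surface is a hyperbolic paraboloid, given by the equation:
\[z(x,y) = x^2-y^2\label{SaddleSurface}\]It is a particular surface that is neither convex nor concave [1]. Its graph is shown below:
Note: the plotting script is given in Appendix A at the end of this post.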
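A saddle surface can be viewed as "formed by a parabola that hangs on another parabola and slides along it, with its vertex staying on that curve" [2]. For example, fix the upper parabola in the figure above (the blue curve: $z=x^2-y^2,\ y=0$) and slide the lower parabola (the red curve) along it; this sweeps out a saddle surface:
Note: the plotting script is given in Appendix B at the end of this post.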
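Equivalently, one can fix the lower parabola (the red curve: $z=x^2-y^2,\ x=0$) and slide the upper parabola (the blue curve) along it, which again sweeps out a saddle surface:
Note: the plotting script is given in Appendix C at the end of this post.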
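Both views can be read off equation $\eqref{SaddleSurface}$ by holding one variable at a constant $c$ (a symbol introduced here only for illustration):
\[\begin{split} &x=c:\quad z = c^2-y^2\\ &y=c:\quad z = x^2-c^2 \end{split}\]The first slice is a downward-opening parabola (a copy of the red curve) whose vertex $(c,0,c^2)$ lies on the blue curve $z=x^2,\ y=0$. The second slice is an upward-opening parabola (a copy of the blue curve) whose vertex $(0,c,-c^2)$ lies on the red curve $z=-y^2,\ x=0$.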
Combining the two views gives the following figure:
Note: the plotting script is given in Appendix D at the end of this post.
Saddle Point
This saddle surface has a very special point, $(0,0,0)$, called the saddle point. A saddle point is also called a minimax point (a name that is fairly common in machine learning). At a saddle point the slopes (derivatives) in orthogonal directions are all zero, so it is a stationary point (critical point), but it is not a local extremum of the function.
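A typical example of a saddle point is a critical point that is a relative minimum along one axial direction (between peaks) and a relative maximum along the crossing axis. The saddle point of the saddle surface is exactly this case. Taking the partial derivatives of function $\eqref{SaddleSurface}$ with respect to $x$ and $y$ gives:
\[\begin{split} &z'_x=2x\\ &z'_y=-2y \end{split}\]Both partial derivatives vanish only at $(0,0)$, so $(0,0)$ is a stationary point. Along the $x$ direction (with $y=0$) the function first decreases and then increases, so $(0,0,0)$ is a minimum along $x$; along the $y$ direction (with $x=0$) the function first increases and then decreases, so $(0,0,0)$ is a maximum along $y$.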
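The analysis above can be checked numerically. The following is a minimal MATLAB sketch in the same language as the appendix scripts; the names `gradZ`, `zAlongX`, and `zAlongY` are introduced here for illustration and do not appear in the original scripts.

```matlab
% Saddle surface z = x^2 - y^2 and its analytic gradient [2x; -2y]
func  = @(x,y) x.^2 - y.^2;
gradZ = @(x,y) [2*x; -2*y];   % hypothetical helper, not in the appendix scripts

% The gradient vanishes at the origin, so (0,0) is a stationary point
g0 = gradZ(0,0);

% Slices through the origin along each axis
t = -1:0.25:1;
zAlongX = func(t, zeros(size(t)));   % z = t.^2, with y fixed at 0
zAlongY = func(zeros(size(t)), t);   % z = -t.^2, with x fixed at 0

% Locate the minimum along x and the maximum along y
[zMinX, iMin] = min(zAlongX);
[zMaxY, iMax] = max(zAlongY);

fprintf('gradient norm at origin: %g\n', norm(g0));
fprintf('min along x: z = %g at x = %g\n', zMinX, t(iMin));
fprintf('max along y: z = %g at y = %g\n', zMaxY, t(iMax));
```

On this grid the minimum of the $x$ slice and the maximum of the $y$ slice are both attained at the origin, which matches the sign analysis of $z'_x$ and $z'_y$.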
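Simply put, a stationary point is not necessarily an extremum, and the saddle point of the saddle surface is a very typical example.
Note: conversely, an extremum is not necessarily a stationary point either. For example, the function $y=\vert x\vert$ attains its minimum at $x=0$, but that point is not a stationary point because the function is not differentiable there.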
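From the figures it is easy to see that $(0,0,0)$ is not an extremum, and mathematically this can also be established with the sufficient condition for extrema of functions of two variables [4], [5]. Optimization algorithms that rely only on gradient information, however, cannot tell the difference. For example, as mentioned in blog post [6], in deep learning the loss function is usually optimized with the gradient descent (GD) algorithm. If the loss, as a function of the weights, resembles a saddle surface, then once the iterate reaches the saddle point the gradient is zero, and GD treats the saddle point as the optimum (or a local optimum) and stops making progress. This is clearly undesirable. The saddle point problem, that is, the gradient vanishing at a point that is not an optimum, is therefore an issue that receives close attention in neural network training.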
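Both claims in the paragraph above can be illustrated with a short MATLAB sketch. It first applies the second-derivative test using the Hessian of $z=x^2-y^2$, then runs plain gradient descent from two starting points. The step size `eta`, the iteration count `nIter`, the starting points, and the helper name `gradZ` are assumptions chosen for illustration, not values from the original post.

```matlab
% Second-derivative test at the stationary point (0,0) of z = x^2 - y^2
H = [2 0; 0 -2];              % Hessian: [z_xx z_xy; z_yx z_yy]
D = det(H);                   % D = z_xx*z_yy - z_xy^2
if D < 0
    disp('D < 0: (0,0) is a saddle point, not an extremum')
end

% Plain gradient descent: p <- p - eta * grad z(p)
gradZ = @(p) [2*p(1); -2*p(2)];   % hypothetical helper
eta   = 0.1;                      % assumed step size
nIter = 100;                      % assumed number of iterations

pA = [3; 0];       % starts exactly on the x-axis (y-gradient is always zero)
pB = [3; 1e-6];    % starts slightly off the x-axis
for k = 1:nIter
    pA = pA - eta*gradZ(pA);
    pB = pB - eta*gradZ(pB);
end

fprintf('start on x-axis: |grad| = %.3g, p = (%.3g, %.3g)\n', ...
    norm(gradZ(pA)), pA(1), pA(2));
fprintf('start perturbed: |grad| = %.3g, p = (%.3g, %.3g)\n', ...
    norm(gradZ(pB)), pB(1), pB(2));
```

Started on the $x$-axis, the iterate approaches the saddle point, where the gradient vanishes, so GD stalls there even though $(0,0,0)$ is not a minimum. With a tiny perturbation in $y$, the iterate instead moves away along the $y$ direction, where $z$ decreases without bound. On this particular function the stall only occurs for starting points exactly on the $x$-axis, so the example is an idealization of the behavior described above.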
Appendix
Appendix A
clc,clear,close all
figure('Units','pixels','Position',[717.67,170.33,1364.67,953.33])
tiledlayout(2,2,'TileSpacing','compact')
% 3-D view
nexttile
view(3)
helperPlotSaddle
% z-y plane
nexttile
view(90,0)
helperPlotSaddle
% z-x plane
nexttile
view(180,0)
helperPlotSaddle
% y-x plane
nexttile
view(2)
helperPlotSaddle
function helperPlotSaddle
x = -7:0.2:7;
y = -7:0.2:7;
func = @(x,y) x.^2-y.^2;
[X,Y] = meshgrid(x,y);
Z = func(X,Y);
box(gca,"on")
grid(gca,"on")
hold(gca,"on")
LineWidth = 3;
% Plot saddle surface
surf(X,Y,Z,...
'DisplayName','Saddle surface',...
'FaceAlpha',0.5,'EdgeColor','none');
% Plot z = x^2-y^2, x=0
x0 = zeros(1,numel(y));
plot3(x0,y,func(x0,y), ...
'LineWidth',LineWidth,'Color','r', ...
'DisplayName','$z = x^2-y^2, x=0$')
% Plot z = x^2-y^2, y=0
y0 = zeros(1,numel(x));
plot3(x,y0,func(x,y0), ...
'LineWidth',LineWidth,'Color','b',...
'DisplayName','$z = x^2-y^2, y=0$')
% Plot saddle point
scatter3(0,0,func(0,0),200, ...
'filled','MarkerFaceColor','k',...
'DisplayName','Saddle point','Marker','hexagram');
legend('Interpreter','latex','Location','best')
xlim([-7,7])
ylim([-7,7])
colorbar
set(gca,'FontSize',13)
title("Saddle surface")
xlabel("$x$","Interpreter","latex")
ylabel("$y$","Interpreter","latex")
zlabel("$z$","Interpreter","latex")
end
Appendix B
clc,clear,close all
figure('Units','pixels','Position',[717.67,170.33,1364.67,953.33])
tiledlayout(2,2,'TileSpacing','compact')
% 3-D view
nexttile
view(3)
helperPlotSaddle
% z-y plane
nexttile
view(90,0)
helperPlotSaddle
% z-x plane
nexttile
view(180,0)
helperPlotSaddle
% y-x plane
nexttile
view(2)
helperPlotSaddle
function helperPlotSaddle
x = -7:0.2:7;
y = -7:0.2:7;
func = @(x,y) x.^2-y.^2;
[X,Y] = meshgrid(x,y);
Z = func(X,Y);
box(gca,"on")
grid(gca,"on")
hold(gca,"on")
LineWidth = 1.5;
% Plot saddle surface
surf(X,Y,Z,...
'DisplayName','Saddle surface',...
'FaceAlpha',0.5,'EdgeColor','none');
% Plot z = x^2-y^2, y=0
y0 = zeros(1,numel(x));
plot3(x,y0,func(x,y0), ...
'LineWidth',LineWidth,'Color','b',...
'DisplayName','$z = x^2-y^2, y=0$')
for i = -7:1:7
x0 = i*ones(1,numel(y));
plot3(x0,y,func(x0,y), ...
'LineWidth',LineWidth,'Color','r',...
"handlevisibility", 'off')
end
% Plot saddle point
scatter3(0,0,func(0,0),200, ...
'filled','MarkerFaceColor','k',...
'DisplayName','Saddle point','Marker','hexagram');
legend('Interpreter','latex','Location','best')
xlim([-7,7])
ylim([-7,7])
colorbar
set(gca,'FontSize',13)
title("Saddle surface")
xlabel("$x$","Interpreter","latex")
ylabel("$y$","Interpreter","latex")
zlabel("$z$","Interpreter","latex")
end
Appendix C
clc,clear,close all
figure('Units','pixels','Position',[717.67,170.33,1364.67,953.33])
tiledlayout(2,2,'TileSpacing','compact')
% 3-D view
nexttile
view(3)
helperPlotSaddle
% z-y plane
nexttile
view(90,0)
helperPlotSaddle
% z-x plane
nexttile
view(180,0)
helperPlotSaddle
% y-x plane
nexttile
view(2)
helperPlotSaddle
function helperPlotSaddle
x = -7:0.2:7;
y = -7:0.2:7;
func = @(x,y) x.^2-y.^2;
[X,Y] = meshgrid(x,y);
Z = func(X,Y);
box(gca,"on")
grid(gca,"on")
hold(gca,"on")
LineWidth = 1.5;
% Plot saddle surface
surf(X,Y,Z,...
'DisplayName','Saddle surface',...
'FaceAlpha',0.5,'EdgeColor','none');
% Plot z = x^2-y^2, x=0
x0 = zeros(1,numel(y));
plot3(x0,y,func(x0,y), ...
'LineWidth',LineWidth,'Color','r',...
'DisplayName','$z = x^2-y^2, x=0$')
for i = -7:1:7
y0 = i*ones(1,numel(x));
plot3(x,y0,func(x,y0), ...
'LineWidth',LineWidth,'Color','b',...
"handlevisibility", 'off')
end
% Plot saddle point
scatter3(0,0,func(0,0),200, ...
'filled','MarkerFaceColor','k',...
'DisplayName','Saddle point','Marker','hexagram');
legend('Interpreter','latex','Location','best')
xlim([-7,7])
ylim([-7,7])
colorbar
set(gca,'FontSize',13)
title("Saddle surface")
xlabel("$x$","Interpreter","latex")
ylabel("$y$","Interpreter","latex")
zlabel("$z$","Interpreter","latex")
end
Appendix D
clc,clear,close all
figure('Units','pixels','Position',[717.67,170.33,1364.67,953.33])
tiledlayout(2,2,'TileSpacing','compact')
% 3-D view
nexttile
view(3)
helperPlotSaddle
% z-y plane
nexttile
view(90,0)
helperPlotSaddle
% z-x plane
nexttile
view(180,0)
helperPlotSaddle
% y-x plane
nexttile
view(2)
helperPlotSaddle
function helperPlotSaddle
x = -7:0.2:7;
y = -7:0.2:7;
func = @(x,y) x.^2-y.^2;
[X,Y] = meshgrid(x,y);
Z = func(X,Y);
box(gca,"on")
grid(gca,"on")
hold(gca,"on")
LineWidth = 1.5;
% Plot saddle surface
surf(X,Y,Z,...
'DisplayName','Saddle surface',...
'FaceAlpha',0.5,'EdgeColor','none');
for i = -7:1:7
x0 = i*ones(1,numel(y));
plot3(x0,y,func(x0,y), ...
'LineWidth',LineWidth,'Color','r',...
"handlevisibility", 'off')
end
for i = -7:1:7
y0 = i*ones(1,numel(x));
plot3(x,y0,func(x,y0), ...
'LineWidth',LineWidth,'Color','b',...
"handlevisibility", 'off')
end
% Plot saddle point
scatter3(0,0,func(0,0),200, ...
'filled','MarkerFaceColor','k',...
'DisplayName','Saddle point','Marker','hexagram');
legend('Interpreter','latex','Location','best')
xlim([-7,7])
ylim([-7,7])
colorbar
set(gca,'FontSize',13)
title("Saddle surface")
xlabel("$x$","Interpreter","latex")
ylabel("$y$","Interpreter","latex")
zlabel("$z$","Interpreter","latex")
end
References