Condition Monitoring and Prognostics based on Vibration Signals in MATLAB

Sep. 21, 2022

Introduction

本博客中的示例来自MATLAB Predictive Maintenance Toolbox,Condition Monitoring and Prognostics Using Vibration Signals1。该示例使用振动信号的频域特征(mean peak frequency)进行轴承的故障诊断和预测,使用的模型是一个时间序列二阶模型,本质上也是拟合参数的问题,即不断使用新数据进行参数估计,估计完模型参数后进行预测。但实际上,这个例子更像是系统辨识的例子:

This example uses functionality from Signal Processing Toolbox™ and System Identification Toolbox™, and does not require Predictive Maintenance Toolbox™.1

另外需要强调的是,该示例使用的训练数据也是从健康状态到故障状态全过程的数据。


Work flow

Data visualization

pdmBearingConditionMonitoringData.mat文件中的数据内容:

1
2
3
4
Name                  Size               Bytes  Class     Attributes
  data                600x1             96062400  cell                
  defectDepthVec      600x1                 4800  double              
  expTime             600x1                 4800  double

其中,

  • 变量data中包含着多个不同health conditions的震动信号片段,每个元素对应1秒的数据,采样率为20kHz;
  • 变量defectDepthVec为相应的故障程度,为逐渐增大的数值;
  • 变量expTime为对应的时间,单位是minutes。

绘制故障程度随时间的变化曲线:

1
2
3
plot(expTime, defectDepthVec);
xlabel('Time (min)');
ylabel('Defect depth (m)');

image-20220918154729742

绘制健康状态的震动信号(第一条信号)和故障状态的震动信号(最后一条信号):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
time = linspace(0, 1, fs)';

% healthy bearing signal
subplot(2, 1, 1);
plot(time,data{1});
xlabel('Time(s)');
ylabel('Acceleration(m/s^2)');
legend('Healthy bearing signal');

% faulty bearing signal
subplot(2, 1, 2);
plot(time,data{end});
xlabel('Time(s)');
ylabel('Acceleration(m/s^2)');
legend('Faulty bearing signal');

image-20220918155029014


Feature extraction

用于轴承诊断的典型特征包括时域特征(方均根值、峰值、信号峭度等等)或者频域特征(峰频率,平均频率等等)。

在数据探索阶段,可以绘制震动信号的谱,看一看震动信号在时域、频域或者时频域的特点,这有利于探索能够反映故障发展的特征。

首先,计算健康数据的频谱,窗口长度为500,重叠度为90%(即450各数据点),并且设置FFT的点数为512。

1
[~, fvec, tvec ,P0] = spectrogram(data{1}, 500, 450, 512, fs);

其中,P0为频谱,fvec为频率向量,tvec为时间向量。

绘制轴承健康状态下的震动信号的频谱图:

1
2
3
4
5
6
7
clf;
imagesc(tvec, fvec, P0)
xlabel('Time(s)');
ylabel('Frequency(Hz)');
title('Healthy Bearing Signal Spectrogram');
colorbar
axis xy

image-20220918160505064

之后,计算并绘制轴承故障状态下震动信号的频谱图:

1
2
3
4
5
6
7
[~, fvec, tvec, Pfinal] = spectrogram(data{end}, 500, 450, 512, fs);
imagesc(tvec, fvec, Pfinal)
xlabel('Time(s)');
ylabel('Frequency(Hz)');
title('Faulty Bearing Signal Spectrogram');
colorbar
axis xy

image-20220918160655675

可以看到,此时的信号能量集中的更高的频率。

因此,在健康状态下和故障状态下震动信号的频谱是不同的,可以用于故障诊断。在本例中,从频谱中提取mean peak frequencies作为health indicator。

设频谱为$P(t,\omega)$,peak frequency可以表示为:

\[PeakFreq(t)=\arg\max_\omega P(t,\omega)\notag\]

则mean peak frequency为peak frequency的均值:

\[meanPeakFreq=\dfrac1T\int_0^TPeakFreq(t)\mathrm{d}t\notag\]

计算健康震动信号的mean peak frequency:

1
2
[~, I0] = max(P0);               % Find out where the peak frequencies are located.
meanPeakFreq0 = mean(fvec(I0))  % Calculate mean peak frequency.
1
2
meanPeakFreq0 =
  666.4602

之后计算故障震动信号的mean peak frequency:

1
2
[~, Ifinal] = max(Pfinal);
meanPeakFreqFinal = mean(fvec(Ifinal))
1
2
meanPeakFreqFinal =
   2.8068e+03

可以看到mean peak frequency从大概650Hz增加到了将近3kHz。

之后,可以再探索一下中间阶段震动信号的mean peak frequency:

1
2
3
4
5
6
7
[~, fvec, tvec, Pmiddle] = spectrogram(data{end/2}, 500, 450, 512, fs);
imagesc(tvec, fvec, Pmiddle)
xlabel('Time(s)');
ylabel('Frequency(Hz)');
title('Bearing Signal Spectrogram');
colorbar
axis xy

image-20220918161757316

可以看到,图像中布满了高频噪声。这个现象是由原始震动和由小缺陷引起的震动所共同引起的。为了计算mean peak frequency,需要将数据进行滤波处理,移除这些高频噪声,再次进行可视化:

1
2
3
4
5
6
7
8
9
dataMiddleFilt = medfilt1(data{end/2}, 3);

[~, fvec, tvec, Pmiddle] = spectrogram(dataMiddleFilt, 500, 450, 512, fs);
imagesc(tvec, fvec, Pmiddle)
xlabel('Time(s)');
ylabel('Frequency(Hz)');
title('Filtered Bearing Signal Spectrogram');
colorbar
axis xy

image-20220918162333303

之后,对所有的震动信号进行相同的特征提取步骤:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
% Define a progress bar.
h = waitbar(0, 'Start to extract features');
% Initialize a vector to store the extracted mean peak frequencies.
meanPeakFreq = zeros(numSamples,1);
for k = 1:numSamples
    % Get most up-to-date data.
    curData = data{k};
    % Apply median filter.
    curDataFilt = medfilt1(curData,3);
    % Calculate spectrogram.
    [~,fvec,tvec,P_k] = spectrogram(curDataFilt,500,450,512,fs);
    % Calculate peak frequency at each time instance.
    [~,I] = max(P_k);
    meanPeakFreq(k) = mean(fvec(I));
    % Show progress bar indicating how many samples have been processed.
    waitbar(k/numSamples,h,'Extracting features');
end
close(h);

并绘制mean peak frequencies随时间变化的趋势:

1
2
3
4
plot(expTime, meanPeakFreq);
xlabel('Time(min)');
ylabel('Mean peak frequency (Hz)');
grid on;

image-20220918162622715

Condition monitoring and prognostics

把mean peak frequency当作time series,我们可以估计一个关于mean peak frequency的时间序列模型,使用这个模型预测未来值。使用前200个mean peak frequency值创建一个初始时间序列模型,之后一旦有有了10个新值,就使用后100个值更新时间序列模型。这种更新时间序列模型的batch mode可以捕捉instantaneous trends。更新的时间序列模型用于预测后面10个步长的数据。

1
2
3
4
tStart = 200;               % Start Time
timeSeg = 100;              % Length of data for building dynamic model
forecastLen = 10;           % Define forecast time horizon
batchSize = 10;             % Define batch size for updating the dynamic model

为了进行预测性维修,我们需要一个阈值来决定何时停掉机器。在这个例子中,我们基于仿真得到的轴承健康情况和故障情况的统计数据(保存在pdmBearingConditionMonitoringStatistics.mat文件中)得到这个阈值。

pdmBearingConditionMonitoringStatistics.mat文件的数据内容如下所示:

1
2
3
4
5
6
7
Name               Size            Bytes  Class     Attributes

  dataFaulty       100x1               800  double              
  dataNormal       100x1               800  double              
  pFaulty         3001x1             24008  double              
  pFreq           3001x1             24008  double              
  pNormal         3001x1             24008  double

绘制健康轴承和故障轴承震动信号的mean peak frequency的概率分布:

1
2
3
4
plot(pFreq, pNormal, 'g--', pFreq, pFaulty, 'r', LineWidth=1.5);
xlabel('Mean peak frequency(Hz)');
ylabel('Probability Distribution');
legend('Normal Bearing','Faulty Bearing');

image-20220918223647977

基于图像,我们将mean peak frequency的阈值设置为2000 Hz,以分辨轴承的正常状态和故障状态,同时最大化轴承的使用。

1
threshold = 2000;

之后,计算采样间隔,并将其单位转换为秒(原本的单位是分钟)。

1
samplingTime = 60*(expTime(2)-expTime(1));                  % unit: seconds

之后使用iddata函数将meanPeakFreqtStart(200个)数据转换为iddata类型的变量tsFeature

1
tsFeature = iddata(meanPeakFreq(1:tStart), [], samplingTime);

iddata函数2将数据转换为可以在时域和频域内进行system identification(系统辨识)的数据类型。

之后绘制出这200个initial mean peak frequency data:

1
plot(tsFeature.y)

image-20220919091520424

图像表明,初始数据大致是常数加上一定的噪声。这是预期内的,因为在初始时刻轴承是健康的,mean peak frequency不会大幅度地变化。

然后,使用前200个数据去拟合一个second-order state-space model:

1
past_sys = ssest(tsFeature, 2, 'Ts', samplingTime, 'Form', 'canonical')
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
past_sys =
  Discrete-time identified state-space model:
    x(t+Ts) = A x(t) + K e(t)
       y(t) = C x(t) + e(t)
 
  A = 
            x1       x2
   x1        0        1
   x2   0.9605  0.03942
 
  C = 
       x1  x2
   y1   1   0
 
  K = 
              y1
   x1  -0.003899
   x2   0.003656
 
Sample time: 600 seconds
  
Parameterization:
   CANONICAL form with indices: 2.
   Disturbance component: estimate
   Number of free coefficients: 4
   Use "idssdata", "getpvec", "getcov" for parameters and their uncertainties.
   
Status:                                               
Estimated using SSEST on time domain data "tsFeature".
Fit to estimation data: 0.2763% (prediction focus)    
FPE: 640, MSE: 602.7  

ssest函数的官方文档见参考3

可以看到,这个initial estimated dynamic model的拟合效果不好。拟合效果的评判指标是normalized root mean square error (NRMSE)(即上面的Fit to estimation data: 0.2763%(prediction focus),计算公式为:

\[NRMSE=1-\dfrac{\vert\vert x_{true}-x_{pred}\vert\vert}{\vert\vert x_{true}-mean(x_{true})\vert\vert}\notag\]

这个数值越大越好。而初始的前200个数据点大致是常数和噪声的叠加,此时$x_{pred}\approx mean(x_{true})$,此时NRMSE接近于0。

为了验证这个模型,绘制autocorrelation of residuals:

1
resid(tsFeature, past_sys)

image-20220919094941951

图像表明,残差是不相关的,得到的模型是有效的。

之后,使用该模型past_sys预测未来10个点并计算标准差:

1
[yF, ~, ~, yFSD] = forecast(past_sys, tsFeature, forecastLen);

并绘制图像(过去值,预测值,故障阈值,95%置信区间):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
tHistory = expTime(1:tStart); % The initial 200 time stamps
forecastTimeIdx = (tStart+1):(tStart+forecastLen); 
tForecast = expTime(forecastTimeIdx);% 200~210 time stamps

% Plot historical data, forecast value and 95% confidence interval.
plot(tHistory, meanPeakFreq(1:tStart), 'b',...
     tForecast, yF.OutputData, 'kx',...
     [tHistory; tForecast], threshold*ones(1,length(tHistory)+forecastLen), 'r--',...
     tForecast, yF.OutputData+1.96*yFSD, 'g--',...
     tForecast, yF.OutputData-1.96*yFSD, 'g--');

ylim([400, 1.1*threshold]);
ylabel('Mean peak frequency (Hz)');
xlabel('Time (min)');
legend({'Past Data', 'Forecast', 'Failure Threshold', '95% C.I'},...
    'Location', 'northoutside', 'Orientation', 'horizontal');
grid on

gif1

注:这里在计算95%置信区间时使用的方式是直接乘上1.96,这是查正态分布表得到的。这里默认的区间估计的对象是什么?具体是如何查找的?目前还不是很清楚。

之后,随着出现新的观测点(10个)而更新模型参数,并且重新估计预测值,并且创建一个alarm,如果信号或者预测值超过了故障阈值(2000 Hz)就报警:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
figure
ax = subplot(1, 1, 1);

for tCur = tStart:batchSize:numSamples % batchSize = 10;

    %  latest features into iddata object.
    tsFeature = iddata(meanPeakFreq((tCur-timeSeg+1):tCur), [], samplingTime);

    % Update system parameters when new data comes in. Use previous model
    % parameters as initial guesses.
    sys = ssest(tsFeature, past_sys);
    past_sys = sys;

    % Forecast the output of the updated state-space model. Also compute
    % the standard deviation of the forecasted output.
    [yF, ~, ~, yFSD]  = forecast(sys, tsFeature, forecastLen);

    % Find the time corresponding to historical data and forecasted values.
    tHistory = expTime(1:tCur);
    forecastTimeIdx = (tCur+1):(tCur+forecastLen);
    tForecast = expTime(forecastTimeIdx);

    cla(ax)
    hold(ax, 'on')
    % Plot historical data, forecasted mean peak frequency value and 95%
    % confidence interval.
    plot(ax, tHistory, meanPeakFreq(1:tCur),'b',...
        tForecast, yF.OutputData,'kx',...
        [tHistory; tForecast], threshold*ones(1,length(tHistory)+forecastLen), 'r--',...
        tForecast, yF.OutputData+1.96*yFSD,'g--',...
        tForecast, yF.OutputData-1.96*yFSD,'g--');

    ylim([400, 1.1*threshold]);
    ylabel('Mean peak frequency (Hz)');
    xlabel('Time (min)');
    legend({'Past Data', 'Forecast', 'Failure Threshold', '95% C.I'},...
        'Location','northoutside','Orientation','horizontal');
    grid on;

    % Display an alarm when actual monitored variables or forecasted values exceed
    % failure threshold.
    if(any(meanPeakFreq(tCur-batchSize+1:tCur)>threshold))
        disp('Monitored variable exceeds failure threshold');
        break;
    elseif(any(yF.y>threshold))
        % Estimate the time when the system will reach failure threshold.
        tAlarm = tForecast(find(yF.y>threshold,1));
        disp(['Estimated to hit failure threshold in ' num2str(tAlarm-tHistory(end)) ' minutes from now.']);
        break;
    end
    %     pause(0.1)
end

gif1

此时的时间序列模型:

1
sys
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
sys =
  Discrete-time identified state-space model:
    x(t+Ts) = A x(t) + K e(t)
       y(t) = C x(t) + e(t)
 
  A = 
           x1      x2
   x1       0       1
   x2  0.2624   0.746
 
  C = 
       x1  x2
   y1   1   0
 
  K = 
           y1
   x1  0.3902
   x2  0.3002
 
Sample time: 600 seconds
  
Parameterization:
   CANONICAL form with indices: 2.
   Disturbance component: estimate
   Number of free coefficients: 4
   Use "idssdata", "getpvec", "getcov" for parameters and their uncertainties.

Status:                                               
Estimated using SSEST on time domain data "tsFeature".
Fit to estimation data: 92.53% (prediction focus)     
FPE: 499.3, MSE: 442.7    

可以看到,模型的预测拟合优度达到92.53%,并且正确地捕捉了趋势。用训练集训练出的最终的这个模型,就可以进行实际的预测。


In closing

在这个例子中,时间序列预测模型实际上就是一个second-order state-space model,其中的核心步骤是根据数据使用ssest函数拟合出这个动态模型的参数,实际上就是拟合出一个二阶函数。


References