Similarity-Based Remaining Useful Life Estimation in MATLAB

Sep. 21, 2022 • Updated Oct. 05, 2024

Introduction

本文的示例来自MATLAB Predictive Maintenance Toolbox，Similarity-Based Remaining Useful Life Estimation¹，目的是对设备进行RUL估计，使用的数据集为：PHM2008 challenge dataset。

该示例具有以下特点：

数据集包含了从健康状态到故障状态整个发展过程的数据；
假设设备的退化过程为线性退化过程，不考虑累积退化(cumulative degradation)；

Work flow

Data preparation

使用自定义函数degradationData读取数据文件train_FD002.txt，将每个机器的数据都保存在一个table中，最终将所有机器的table组合成一个cell。

helperLoadData函数：

function degradationData = helperLoadData(filename)
% Copyright 2017-2018 The MathWorks, Inc.

% Load the text file as a table
t = readtable(filename);

% Add variable names to the table
VarNames = {...
    'id', 'time', 'op_setting_1', 'op_setting_2', 'op_setting_3', ...
    'sensor_1', 'sensor_2', 'sensor_3', 'sensor_4', 'sensor_5', ...
    'sensor_6', 'sensor_7', 'sensor_8', 'sensor_9', 'sensor_10', ...
    'sensor_11', 'sensor_12', 'sensor_13', 'sensor_14', 'sensor_15', ...
    'sensor_16', 'sensor_17', 'sensor_18', 'sensor_19', 'sensor_20', ...
    'sensor_21'};
t.Properties.VariableNames = VarNames;

% Split the signals from each machine ID into individual tables 
% and collect the tables in a cell array.
IDs = t{:,1};
nID = unique(IDs);
degradationData = cell(numel(nID),1);
for ct=1:numel(nID)
    idx = IDs == nID(ct);
    degradationData{ct} = t(idx,:);
end
end

读取训练数据文件train_FD002.txt：

% Load train data
degradationData = helperLoadData('train_FD002.txt');

>> degradationData
ans =
  260×1 cell array
    {149×26 table}
    {269×26 table}
    {206×26 table}
    ...
    {205×26 table}
    {316×26 table}

其中，cell变量degradationData中的第$i$个元素代表第$i$个机器的数据集，每个元素的行数是不一样的，表示每个机器的数据集数量是不一样的，但是列数是一样的，都是26列，第1列表示machine ID，第2列是time stamp，第3~5列表示3个operating conditions，第6~26列表示21个sensor measurements。

之后，将数据集分割为训练集和验证集：

% Split the degradation data into a training data set 
% and a validation data set for later performance evaluation
rng('default')  % For repeatablility
numEnsemble = length(degradationData);
numFold = 5;
cv = cvpartition(numEnsemble, 'KFold', numFold);
trainData = degradationData(training(cv, 1));
validationData = degradationData(test(cv, 1));

此时，trainData为208x1 cell，validationData为52x1 cell。

之后，将数据表的列名，分为时间列、condition列、变量列，供后续使用：

% Specify groups of variables of interest
varNames = string(degradationData{1}.Properties.VariableNames);
timeVariable = varNames{2};
conditionVariables = varNames(3:5);
dataVariables = varNames(6:26);

timeVariable =
    'time'

conditionVariables = 
  1×3 string array
  Column 1
    "op_setting_1"
  Column 2
    "op_setting_2"
  Column 3
    "op_setting_3"

dataVariables = 
  1×21 string array
  Columns 1 through 2
    "sensor_1"    "sensor_2"
  Columns 3 through 4
    "sensor_3"    "sensor_4"
    ...
  Column 21
    "sensor_21"

使用自定义函数helperPlotEnsemble随机选取10个机器的数据，并且绘制这10个机器op_setting_1列、op_setting_2列、sensor_1列、sensor_2列随时间变化的曲线（每张图像有10条曲线）。

helperPlotEnsemble函数：

function helperPlotEnsemble(ensemble, TimeVariable, DataVariables, nsample, varargin)
% Copyright 2017-2018 The MathWorks, Inc.
% ensemble, 208x1 cell

% Randomly select 10(nsample) machines
randIdx = randperm(length(ensemble));
ensembleSample = ensemble(randIdx(1:nsample));

% Plot profiles
numData = length(DataVariables);
for i = 1:numData
    data2plot = DataVariables(i);
    subplot(numData, 1, i)
    hold on
    for j = 1:length(ensembleSample)
        ensembleMember = ensembleSample{j};
        if isa(ensembleMember, 'double') % Determine if input is 'double' class
            % TimeVariable and DataVariables are doubles
            if isempty(TimeVariable)
                plot(ensembleMember(:, DataVariables), varargin{:})
            else
                plot(ensembleMember(:, TimeVariable), ensembleMember(:, DataVariables), varargin{:})
            end
            xlblstr = "time";
            ylblstr = "Var " + data2plot;
        else
            % ensembleMember is a table
            plot(ensembleMember.(char(TimeVariable)), ensembleMember.(char(data2plot)), varargin{:})
            xlblstr = TimeVariable;
            ylblstr = data2plot;
        end
    end
    hold off
    ylabel(ylblstr, 'Interpreter', 'none')
end
xlabel(xlblstr)
end

绘制图像：

% Visualize a sample of the ensemble data
nsample = 10;
figure('Units', 'pixels', 'Position', [299, 101, 1100, 789])
helperPlotEnsemble(trainData, timeVariable, [conditionVariables(1:2) dataVariables(1:2)], nsample)

为了看得清楚一点，我们可以修改nsample，只可视化2个机器的情况：

可以看到，这些数据随时间变化是杂乱无章的，没有明显的退化趋势。但是处理后的曲线就可能会表现出一种趋势。

Working regime clustering

把208x1 cell的trainData展开成43352x26 table的trainDataUnwrap，之后选中3个operating conditions列，并使用自定义函数helperPlotClusters将它们绘制在三维空间上：

helperPlotClusters函数：

function helperPlotClusters(X, idx, C)
% Copyright 2017-2018 The MathWorks, Inc.

if(nargin>1)
    hold on
    for i = 1:max(idx)
        scatter3(X{idx==i,1}, X{idx==i,2}, X{idx==i,3});
    end
    scatter3(C(:,1),C(:,2),C(:,3), 'x', 'MarkerFaceColor', [0 0 0], ...
        'MarkerEdgeColor', [0 0 0], 'LineWidth', 2);
    legendStr = ["Cluster "+(1:6), "Centroids"];
    legend(cellstr(legendStr), 'Location', 'NW');
    hold off
    view(-30,30)
    grid on
else
    scatter3(X{:,1}, X{:,2}, X{:,3});
end
xlabel(X.Properties.VariableNames{1}, 'Interpreter', 'none')
ylabel(X.Properties.VariableNames{2}, 'Interpreter', 'none')
zlabel(X.Properties.VariableNames{3}, 'Interpreter', 'none')
end

绘制图像：

% Notice that each ensemble member contains 3 operating conditions: "op_setting_1", "op_setting_2", and "op_setting_3". 
% Firstly, extract the table from each cell and concatenate them into a single table.
trainDataUnwrap = vertcat(trainData{:});
opConditionUnwrap = trainDataUnwrap(:, cellstr(conditionVariables));

% Visualize all operating points on a 3D scatter plot.
% It clearly shows 6 regimes and the points in each regime are in very close proximity.
figure(2)
helperPlotClusters(opConditionUnwrap)

可以看到，所有数据的operating conditions列聚集在三维空间中的6个regime。

之后使用多次k-means找到聚类中心点，并绘制效果图：

% Use clustering techniques to locate the 6 clusters automatically. 
% Repeat the K-means clustering algorithm several times with different initial conditions and pick the results with the lowest cost
opts = statset('Display', 'final');
[clusterIndex, centers] = kmeans(table2array(opConditionUnwrap), 6, ... 
    'Distance', 'sqeuclidean', 'Replicates', 5, 'Options', opts);
% Visualize the clustering results and the identified cluster centroids
figure(3)
helperPlotClusters(opConditionUnwrap, clusterIndex, centers)

Working regime normalization

当聚类工作完成后，就可以计算每条数据的3个operating conditions列距离6个中心点的距离，距离哪个中心点的距离最小就将数据分为哪一类，并按照类别计算均值和标准差，之后使用regimeNormalization函数对每一类数据的传感器数据进行标准化。

regimeNormalization函数：

function data = regimeNormalization(data, centers, centerstats)
% Perform normalization for each observation (row) of the data
% according to the cluster the observation belongs to.
conditionIdx = 3:5;
dataIdx = 6:26;

% Perform row-wise operation
data{:, dataIdx} = table2array(...
    rowfun(@(row) localNormalize(row, conditionIdx, dataIdx, centers, centerstats), ...
    data, 'SeparateInputs', false));
end

其中的localNormalize函数：

function data = regimeNormalization(data, centers, centerstats)
% Perform normalization for each observation (row) of the data
% according to the cluster the observation belongs to.
conditionIdx = 3:5;
dataIdx = 6:26;

% Perform row-wise operation
data{:, dataIdx} = table2array(...
    rowfun(@(row) localNormalize(row, conditionIdx, dataIdx, centers, centerstats), ...
    data, 'SeparateInputs', false));
end

实现代码：

%% Working Regime Normalization
centerstats = struct('Mean', table(), 'SD', table());
for v = dataVariables
    centerstats.Mean.(char(v)) = splitapply(@mean, trainDataUnwrap.(char(v)), clusterIndex);
    centerstats.SD.(char(v))   = splitapply(@std,  trainDataUnwrap.(char(v)), clusterIndex);
end

trainDataNormalized = cellfun(@(data) regimeNormalization(data, centers, centerstats), ...
    trainData, 'UniformOutput', false);

之后，绘制标准化后的前四个传感器监测量随时间变化曲线：

figure(4)
helperPlotEnsemble(trainDataNormalized, timeVariable, dataVariables(1:4), nsample)

标准化后，一些传感器值就表现一定的随时间变化的趋势，之后就针对每一个传感器数据进行趋势分析。

Trendability analysis `linearDegradationModel`

将训练集中所有机器的同一种传感器随时间变化数据都挑选出来赋给变量tmp，因此变量tmp是一个208x1 cell变量，其中第i个元素为nx1 table的变量，n为机器i的数据数量（即时间长度）。使用数据集tmp拟合线性模型mdl = linearDegradationModel()，之后将参数mdl.Theta保存在变量signalSlope中：

numSensors = length(dataVariables);
signalSlope = zeros(numSensors, 1);
warn = warning('off');
for ct = 1:numSensors
    tmp = cellfun(@(tbl) tbl(:, cellstr(dataVariables(ct))), trainDataNormalized, 'UniformOutput', false);
    mdl = linearDegradationModel(); % create model
    fit(mdl, tmp); % train mode
    signalSlope(ct) = mdl.Theta;
end
warning(warn);

最终，斜率参数signalSlope是一个21x1 double变量。

之后，将斜率参数的绝对值进行降序排序，选取斜率最大的前8个传感器作为特征量：

[~, idx] = sort(abs(signalSlope), 'descend');
sensorTrended = sort(idx(1:8));

并绘制其中排序为3~6的传感器检测值随时间变化曲线：

figure(5)
helperPlotEnsemble(trainDataNormalized, timeVariable, dataVariables(sensorTrended(3:6)), nsample)

Construct health condition

由于这里的数据集中的数据都是run-to-failure数据，每一个机器最开始都是健康状态，最后都是故障状态。我们需要构造一个随时间线性变化的Health condition，向所有数据添加health_condition列，初始状态时Health condition为1，故障状态时Health condition为0。

for j=1:numel(trainDataNormalized)
    data = trainDataNormalized{j};
    rul = max(data.time)-data.time;
    data.health_condition = rul / max(rul);
    trainDataNormalized{j} = data;
end

figure(6)
helperPlotEnsemble(trainDataNormalized, timeVariable, "health_condition", nsample)

Construct health indicator

之后，我们需要将8个传感器值融合成一个Health indicator，并寻找它和health_condition列之间的关系，这里假设是线性关系：

Health Condition ~ 1 + Sensor2 + Sensor3 + Sensor4 + Sensor7 + Sensor11 + Sensor12 + Sensor15 + Sensor17

这实际上就是一个参数估计问题，使用的是fitlm函数：

trainDataNormalizedUnwrap = vertcat(trainDataNormalized{:});

sensorToFuse = dataVariables(sensorTrended);
X = trainDataNormalizedUnwrap{:, cellstr(sensorToFuse)};
y = trainDataNormalizedUnwrap.health_condition;
regModel = fitlm(X, y);
bias = regModel.Coefficients.Estimate(1);
weights = regModel.Coefficients.Estimate(2:end);

bias =
    0.5000

weights =
   -0.0323
   -0.0300
   -0.0527
    0.0057
   -0.0646
    0.0054
   -0.0431
   -0.0379

之后，基于bias和weights融合传感器信息，使用degradationSensorFusion函数构造一个Health indicator列（但是并没有添加到数据表中，这点和Health condition不同）。

degradationSensorFusion函数：

function dataFused = degradationSensorFusion(data, sensorToFuse, weights)
% Combine measurements from different sensors according 
% to the weights, smooth the fused data and offset the data
% so that all the data start from 1

% Fuse the data according to weights
dataToFuse = data{:, cellstr(sensorToFuse)};
dataFusedRaw = dataToFuse*weights;

% Smooth the fused data with moving mean
stepBackward = 10;
stepForward = 10;
dataFused = movmean(dataFusedRaw, [stepBackward stepForward]);

% Offset the data to 1
dataFused = dataFused + 1 - dataFused(1);
end

实现代码：

% Construct a single health indicator by multiplying the sensor measurements with their associated weights
trainDataFused = cellfun(@(data) degradationSensorFusion(data, sensorToFuse, weights), trainDataNormalized, ...
    'UniformOutput', false);% trainDataFused, 208x1 cell

并可视化10个机器的Health indicator：

% Visualize the fused health indicator for training data
figure(7)
helperPlotEnsemble(trainDataFused, [], 1, nsample)
xlabel('Time')
ylabel('Health Indicator')
title('Training Data')

注：在degradationSensorFusion函数中可以看到，这些数据（即曲线）是经过了movmean滑动平均的，并且考虑到了滑动平均技术可能引起的offset。

之后，使用验证集进行验证Health Indicator的趋势性：

validationDataNormalized = cellfun(@(data) regimeNormalization(data, centers, centerstats), ...
    validationData, 'UniformOutput', false);
validationDataFused = cellfun(@(data) degradationSensorFusion(data, sensorToFuse, weights), ...
    validationDataNormalized, 'UniformOutput', false);

figure(8)
helperPlotEnsemble(validationDataFused, [], 1, nsample)
xlabel('Time')
ylabel('Health Indicator')
title('Validation Data')

Build and fit similarity RUL model `residualSimilarityModel`

现在，我们基于Health condition构造出了Health indicator曲线，Health indicator曲线就代表了整个退化趋势和过程。接下来就是构建并拟合RUL模型，这里采用residualSimilarityModel函数²构建：

mdl = residualSimilarityModel(...
    'Method', 'poly2',...
    'Distance', 'absolute',...
    'NumNearestNeighbors', 50,...
    'Standardize', 1);

fit(mdl, trainDataFused);

mdl = 
  residualSimilarityModel with properties:
                 Models: {208×1 cell}
               ModelMSE: [208×1 double]
               LifeSpan: [208×1 double]
    NumNearestNeighbors: 50
                 Method: "poly2"
               Distance: "absolute"
            IncludeTies: 1
            Standardize: 1
       LifeTimeVariable: ""
           LifeTimeUnit: ""
          DataVariables: ""
               UserData: []
            UseParallel: 0

这里的拟合实际上就是拟合一个二阶多项式来表示一个机器的Health indicator的曲线。

我们可以验证一下，首先查看mdl.Models元素的参数：

>> mdl.Models{1}
ans =
   -0.0000    0.0023    0.9812

其中，三个值分别表示常数项系数，一阶系数和二阶系数。将该多项式曲线和真实曲线做对比：

truedata = trainDataFused{1, 1};
coefficients = mdl.Models{1};
time = 1:numel(truedata);
fitdata = coefficients(3)+ coefficients(2)*time + coefficients(1)*time.^2;

figure
hold on
plot(truedata)
plot(fitdata)
legend('True data', 'Fit data');
hold off
grid on
box on

Performance evaluation

当我们构建并使用训练数据拟合了一个RUL模型后，就需要使用验证集测试模型的效果，即使用predictRUL函数进行预测。

首先选取验证集中的一个Health indicator曲线，并且截取其前50%，70%，90%的数据进行预测：

% To evaluate the similarity RUL model, use 50%, 70% and 90% of a sample validation data to predict its RUL.
breakpoint = [0.5, 0.7, 0.9];
validationDataTmp = validationDataFused{3}; % use one validation data for illustration

Use 50% of a sample validation data

% Use the validation data before the first breakpoint, 50% of the lifetime
bpidx = 1;
validationDataTmp50 = validationDataTmp(1:ceil(end*breakpoint(bpidx)),:);
trueRUL = length(validationDataTmp) - length(validationDataTmp50);
[estRUL, ciRUL, pdfRUL] = predictRUL(mdl, validationDataTmp50);
% Visualize the validation data truncated at 50% 
% and its nearest neighbors.
figure(9)
compare(mdl, validationDataTmp50);

estRUL =
  100.9366

ciRUL =
   65.0099  137.9562

pdfRUL =
  68×2 table
     RUL      ProbabilityDensity
    ______    __________________

    49.436         0.0015362    
      :               :         
    177.56        0.00065188    

之后，使用自定义函数helperPlotRULDistribution对比真实RUL与预测RUL。

helperPlotRULDistribution函数：

function helperPlotRULDistribution(trueRUL, estRUL, pdfRUL, ciRUL)
% HELPERPLOTRULDISTRIBUTION Plot RUL distribution
%
% This function supports the Similarity-based Remaining Useful Life
% Estimation example. It may change in a future release.

%  Copyright 2017-2018 The MathWorks, Inc.

hold on
plot(pdfRUL.RUL, pdfRUL.ProbabilityDensity, 'b');
idx = find(pdfRUL.RUL > trueRUL, 1, 'first');
if isempty(idx)
    y = pdfRUL.ProbabilityDensity(end);
else
    y = pdfRUL.ProbabilityDensity(idx);
end
plot([trueRUL, trueRUL], [0, y], 'r');
idx = find(pdfRUL.RUL > estRUL, 1, 'first');
if isempty(idx)
    y = pdfRUL.ProbabilityDensity(end);
else
    y = pdfRUL.ProbabilityDensity(idx);
end
plot([estRUL, estRUL], [0, y], 'g');
idx = pdfRUL.RUL >= ciRUL(1) & pdfRUL.RUL<=ciRUL(2);
area(pdfRUL.RUL(idx), pdfRUL.ProbabilityDensity(idx), ...
    'FaceAlpha', 0.2, 'FaceColor', 'y', 'EdgeColor', 'none');
hold off
legend('Probability Density Function', 'True RUL', 'Estimated RUL', '90% Confidence Inteval');
xlabel('Cycle')
ylabel('Probability Density')
title('RUL Estimation')
end

实现函数：

% Visualize the estimated RUL compared to the true RUL and the probability distribution of the estimated RUL.
figure(10)
helperPlotRULDistribution(trueRUL, estRUL, pdfRUL, ciRUL)

对于得到的结果需要有几点说明。

(1) 'NumNearestNeighbors'

上图中蓝色的曲线是训练集中和validationDataTmp50最接近的50条曲线，这个曲线的个数是之前在residualSimilarityModel函数中使用'NumNearestNeighbors'进行设置的，如果属性值改为10：

mdl = residualSimilarityModel(...
    'Method', 'poly2',...
    'Distance', 'absolute',...
    'NumNearestNeighbors', 50,...
    'Standardize', 1);

estRUL =
  111.9934

ciRUL =
   89.4586  129.1621

pdfRUL =
  38×2 table
     RUL      ProbabilityDensity
    ______    __________________

    86.517        0.0081863     
    87.732        0.0086341       
      :               :           
    131.48         0.015314

可以看到，修改了'NumNearestNeighbors'属性会影响estRUL、ciRUL和pdfRUL的值。

(2) predictRUL function

上面的代码预测用的是predictRUL函数³：

[estRUL, ciRUL, pdfRUL] = predictRUL(mdl, validationDataTmp50);

estRUL =
  100.9366

ciRUL =
   65.0099  137.9562

pdfRUL =
  68×2 table
     RUL      ProbabilityDensity
    ______    __________________

    49.436         0.0015362    
      :               :         
    177.56        0.00065188

其中，返回参数：

estRUL，表示mdl预测的设备剩余使用寿命，是一个标量；
ciRUL，表示变量estRUL的置信区间，是一个二元行向量，用户可以通过指定predictRUL函数的Alpha参数来更改置信度；
pdfRUL，表示RUL的概率密度方程。

至于是如何计算出置信区间和概率密度函数的，我不是很清楚。

Given one ensemble member in the validation data set, the model will find the nearest 50 ensemble members in the training data set, fit a probability distribution based on the 50 ensemble members, and use the median of the distribution as an estimate of RUL.¹

(3) Custom function helperPlotRULDistribution

function helperPlotRULDistribution(trueRUL, estRUL, pdfRUL, ciRUL)
% HELPERPLOTRULDISTRIBUTION Plot RUL distribution
%
% This function supports the Similarity-based Remaining Useful Life
% Estimation example. It may change in a future release.

%  Copyright 2017-2018 The MathWorks, Inc.

hold on
plot(pdfRUL.RUL, pdfRUL.ProbabilityDensity, 'b');
idx = find(pdfRUL.RUL > trueRUL, 1, 'first');
if isempty(idx)
    y = pdfRUL.ProbabilityDensity(end);
else
    y = pdfRUL.ProbabilityDensity(idx);
end
plot([trueRUL, trueRUL], [0, y], 'r');
idx = find(pdfRUL.RUL > estRUL, 1, 'first');
if isempty(idx)
    y = pdfRUL.ProbabilityDensity(end);
else
    y = pdfRUL.ProbabilityDensity(idx);
end
plot([estRUL, estRUL], [0, y], 'g');
idx = pdfRUL.RUL >= ciRUL(1) & pdfRUL.RUL<=ciRUL(2);
area(pdfRUL.RUL(idx), pdfRUL.ProbabilityDensity(idx), ...
    'FaceAlpha', 0.2, 'FaceColor', 'y', 'EdgeColor', 'none');
hold off
legend('Probability Density Function', 'True RUL', 'Estimated RUL', '90% Confidence Inteval');
xlabel('Cycle')
ylabel('Probability Density')
title('RUL Estimation')
end

helperPlotRULDistribution函数将预测结果与真实结果进行对比并可视化，这是一个比较漂亮的呈现方式。

Use 70% of a sample validation data

% Use the validation data before the second breakpoint, 70% of the lifetime
bpidx = 2;
validationDataTmp70 = validationDataTmp(1:ceil(end*breakpoint(bpidx)), :);
trueRUL = length(validationDataTmp) - length(validationDataTmp70);
[estRUL, ciRUL, pdfRUL] = predictRUL(mdl, validationDataTmp70);
figure(11)
compare(mdl, validationDataTmp70);
figure(12)
helperPlotRULDistribution(trueRUL, estRUL, pdfRUL, ciRUL)

Use 90% of a sample validation data

% Use the validation data before the second breakpoint, 90% of the lifetime
bpidx = 3;
validationDataTmp90 = validationDataTmp(1:ceil(end*breakpoint(bpidx)), :);
trueRUL = length(validationDataTmp) - length(validationDataTmp90);
[estRUL, ciRUL, pdfRUL] = predictRUL(mdl, validationDataTmp90);
figure(13)
compare(mdl, validationDataTmp90);
figure
helperPlotRULDistribution(trueRUL, estRUL, pdfRUL, ciRUL)

Summary

三次RUL估计真实值与估计值之间的差值分别是：23.0634，13.8251，10.0474，单位是小时。我们可以得出以下的结论：

该模型可以在真正故障前做出预警，可以达到预测检修的效果（只是针对这一条曲线）
There is a relatively large error between the estimated RUL and the true RUL when the machine is in an intermediate health stage. In this example, the most similar 10 curves are close at the beginning, but bifurcate when they approach the failure state, resulting in roughly two modes in the RUL distribution.
When more data is observed, the RUL estimation is enhanced.
When the machine is close to failure, the RUL estimation is even more enhanced in this example.

Repeat the same evaluation procedure for the whole validation data set

对所有的测试集数据重复上述步骤，并且计算预测误差：

% Repeat the same evaluation procedure fo the whole validation data set
numValidation = length(validationDataFused);
numBreakpoint = length(breakpoint);
error = zeros(numValidation, numBreakpoint);
for dataIdx = 1:numValidation
    tmpData = validationDataFused{dataIdx};
    for bpidx = 1:numBreakpoint
        tmpDataTest = tmpData(1:ceil(end*breakpoint(bpidx)), :);
        trueRUL = length(tmpData) - length(tmpDataTest);
        [estRUL, ~, ~] = predictRUL(mdl, tmpDataTest);
        error(dataIdx, bpidx) = estRUL - trueRUL;
    end
end

使用核函数估计误差的分布，并绘制图像：

numValidation = length(validationDataFused);
numBreakpoint = length(breakpoint);
error = zeros(numValidation, numBreakpoint);
for dataIdx = 1:numValidation
    tmpData = validationDataFused{dataIdx};
    for bpidx = 1:numBreakpoint
        tmpDataTest = tmpData(1:ceil(end*breakpoint(bpidx)), :);
        trueRUL = length(tmpData) - length(tmpDataTest);
        [estRUL, ~, ~] = predictRUL(mdl, tmpDataTest);
        error(dataIdx, bpidx) = estRUL - trueRUL;
    end
end
[pdf50, x50] = ksdensity(error(:, 1));
[pdf70, x70] = ksdensity(error(:, 2));
[pdf90, x90] = ksdensity(error(:, 3));
figure(14)
ax(1) = subplot(3,1,1);
hold on
histogram(error(:, 1), 'BinWidth', 5, 'Normalization', 'pdf')
plot(x50, pdf50)
hold off
title('RUL Prediction Error using first 50% of each validation ensemble member')
ax(2) = subplot(3,1,2);
hold on
histogram(error(:, 2), 'BinWidth', 5, 'Normalization', 'pdf')
plot(x70, pdf70)
hold off
title('RUL Prediction Error using first 70% of each validation ensemble member')
ax(3) = subplot(3,1,3);
hold on
histogram(error(:, 3), 'BinWidth', 5, 'Normalization', 'pdf')
plot(x90, pdf90)
hold off
xlabel('Prediction Error')
title('RUL Prediction Error using first 90% of each validation ensemble member')
linkaxes(ax, 'x')

并且绘制误差的箱线图，查看四分位点：

figure
boxplot(error, 'Labels', {'50%', '70%', '90%'})
ylabel('Prediction Error')
title('Prediction error using different percentages of each validation ensemble member')

之后，计算预测误差的均值和标准差，并绘制errorbar：

errorMean = mean(error)
errorMedian = median(error)
errorSD = std(error)

errorMean =
   -1.1217    9.5186    9.6540

errorMedian =
    1.3798   11.8172   10.3457

errorSD =
   21.7315   13.5194   12.3083

In closing

综上，对本示例进行总结：

数据集包含了从健康状态到故障状态整个发展过程的传感器数据；
本实例假设退化过程为线性的，不考虑累积退化，这表现在：在拟合标准化后的传感器量的趋势时，采用的是linearDegradationModel，并且体现在Health indicator的构建上；
本示例在融合传感器信息时，即在构造Health indicator时，采用的是比较简单的技术：即挑选8个趋势性最明显的传感器数据，之后采用线性模型，参数估计出出它们与Health condition之间的关系（即线性系数）。除此之外，还可以使用更为丰富的技术，MATLAB官方提供了几个参考文献⁴⁵⁶⁷。

另外，在这个示例中，还存在一些我不是很清楚的部分：

predictRUL函数根据residualSimilarityModel函数构建出的模型mdl计算出RUL估计的置信区间和PDF的方法；
核函数估计的ksdensity函数；

References

Similarity-Based Remaining Useful Life Estimation. ˄ ˄²
residualSimilarityModel: Residual comparison-based similarity model for estimating remaining useful life. ˄
predictRUL: Estimate remaining useful life for a test component. ˄
Roemer, Michael J., Gregory J. Kacprzynski, and Michael H. Schoeller. “Improved diagnostic and prognostic assessments using health management information fusion.” AUTOTESTCON Proceedings, 2001. IEEE Systems Readiness Technology Conference. IEEE, 2001. ˄
Goebel, Kai, and Piero Bonissone. “Prognostic information fusion for constant load systems.” Information Fusion, 2005 8th International Conference on. Vol. 2. IEEE, 2005. ˄
Wang, Peng, and David W. Coit. “Reliability prediction based on degradation modeling for systems with multiple degradation measures.” Reliability and Maintainability, 2004 Annual Symposium-RAMS. IEEE, 2004. ˄
Jardine, Andrew KS, Daming Lin, and Dragan Banjevic. “A review on machinery diagnostics and prognostics implementing condition-based maintenance.” Mechanical systems and signal processing 20.7 (2006): 1483-1510. ˄