Generate Machine Learning Model `.m` Code Files Automatically in MATLAB

Aug. 02, 2022 • Updated Aug. 02, 2022

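The MATLAB Classification Learner app offers a convenient, graphical way to train machine learning models, and it can automatically turn the whole training process into function code or model code.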

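In this article I export a support vector machine (SVM) model with a linear kernel, which produces the function file trainClassifier.m. I then use that file as an example to analyze the overall structure of the generated code.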

(1) Function file header comments

function [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% Returns a trained classifier and its accuracy. This code recreates the
% classification model trained in Classification Learner app. Use the
% generated code to automate training the same model with new data, or to
% learn how to programmatically train models.
%
%  Input:
%      trainingData: A table containing the same predictor and response
%       columns as those imported into the app.
%
%  Output:
%      trainedClassifier: A struct containing the trained classifier. The
%       struct contains various fields with information about the trained
%       classifier.
%
%      trainedClassifier.predictFcn: A function to make predictions on new
%       data.
%
%      validationAccuracy: A double containing the accuracy as a
%       percentage. In the app, the Models pane displays this overall
%       accuracy score for each model.
%
% Use the code to train the model with new data. To retrain your
% classifier, call the function from the command line with your original
% data or new data as the input argument trainingData.
%
% For example, to retrain a classifier trained with the original data set
% T, enter:
%   [trainedClassifier, validationAccuracy] = trainClassifier(T)
%
% To make predictions with the returned 'trainedClassifier' on new data T2,
% use
%   yfit = trainedClassifier.predictFcn(T2)
%
% T2 must be a table containing at least the same predictor columns as used
% during training. For details, enter:
%   trainedClassifier.HowToPredict

% Auto-generated by MATLAB on 2022-08-02 21:36:27

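The comment block mainly describes the function's inputs and outputs, along with related usage information.

Input: trainingData, a table containing the feature columns (predictor columns) and the label column (response column).
Output:

trainedClassifier, a struct that holds a trained classification model together with related information and functions.

validationAccuracy, the validation accuracy (here, the cross-validation accuracy). Although the generated comment calls it a percentage, the code returns a fraction between 0 and 1 (1 - kfoldLoss(...)); the app's Models pane shows the same value as a percentage.

In addition, trainedClassifier.predictFcn can be used to predict the labels of new, unseen data.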

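⭐ Note:

trainedClassifier is a struct: besides the trained classification model, it stores related information and functions (for example, trainedClassifier.predictFcn). In this example, the trained model itself is trainedClassifier.ClassificationSVM, whose data type is 1x1 ClassificationECOC, because fitcecoc combines several binary SVM learners into one multiclass ECOC model.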
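To make the struct layout concrete, here is a minimal sketch that rebuilds a struct of the same shape from MATLAB's fisheriris sample data (shipped with the Statistics and Machine Learning Toolbox) instead of the article's fault data. The column names SL, SW, PL, PW and Label are hypothetical, and only two of the generated fields are reproduced.

```matlab
% Sketch: a struct shaped like the one trainClassifier returns.
% Assumes the Statistics and Machine Learning Toolbox (fisheriris, fitcecoc).
load fisheriris                                  % meas: 150x4 double, species: 150x1 cell
T = array2table(meas, 'VariableNames', {'SL', 'SW', 'PL', 'PW'});  % hypothetical names
T.Label = grp2idx(species);                      % numeric labels 1..3, like FaultCode

predictorNames = {'SL', 'SW', 'PL', 'PW'};
template = templateSVM('KernelFunction', 'linear', 'Standardize', true);
mdl = fitcecoc(T(:, predictorNames), T.Label, ...
    'Learners', template, 'Coding', 'onevsone');

% The struct is only a container: one field holds the model,
% another holds the prediction helper.
trainedClassifier = struct();
trainedClassifier.ClassificationSVM = mdl;
trainedClassifier.predictFcn = @(t) predict(mdl, t(:, predictorNames));

disp(class(trainedClassifier.ClassificationSVM))   % ClassificationECOC
yfit = trainedClassifier.predictFcn(T(1:5, :));    % labels for the first 5 rows
```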

(2) Separate the predictor columns from the response column

% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'FirstPeakValue', 'ValleyValue', 'SecondPeakValue', 'stage1', 'stage2', 'stage3', 'stage4', 'duration', 'BeginningVoltage', 'MaxVoltage', 'EndingVoltage', 'BeginningTime', 'MaxVoltageTime', 'Stroke', 'Velocity'};
predictors = inputTable(:, predictorNames);
response = inputTable.FaultCode;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];
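Parenthesis indexing versus dot indexing is what separates the predictors from the response in this step. A minimal sketch with a hypothetical three-row table (two of the article's predictor names plus FaultCode, with values made up for illustration); the varfun line is an assumption that shows one way to derive the flags the generated code hard-codes.

```matlab
% Sketch of the extraction step on a hypothetical 3-row table
% (values are made up for illustration).
trainingData = table([1.2; 0.8; 1.5], [10; 12; 9], [1; 2; 1], ...
    'VariableNames', {'FirstPeakValue', 'ValleyValue', 'FaultCode'});

predictorNames = {'FirstPeakValue', 'ValleyValue'};
predictors = trainingData(:, predictorNames);   % () indexing returns a sub-table
response   = trainingData.FaultCode;            % dot indexing returns a column vector

% One flag per predictor column. The generated code hard-codes these;
% this line derives them instead (assumption, not the app's code).
isCategoricalPredictor = varfun(@iscategorical, predictors, 'OutputFormat', 'uniform');
```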

(3) Set the classifier hyperparameters and train the model

% Train a classifier
% This code specifies all the classifier options and trains the classifier.
template = templateSVM(...
    'KernelFunction', 'linear', ...
    'PolynomialOrder', [], ...
    'KernelScale', 'auto', ...
    'BoxConstraint', 1, ...
    'Standardize', true);
classificationSVM = fitcecoc(...
    predictors, ...
    response, ...
    'Learners', template, ...
    'Coding', 'onevsone', ...
    'ClassNames', [1; 2; 3; 4; 5; 6; 7]);

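templateSVM: creates an SVM learner template that holds the hyperparameters (linear kernel, automatic kernel scale, box constraint 1, standardized predictors)
fitcecoc: trains the model. With 'Coding', 'onevsone', it trains one binary SVM for every pair of classes and combines them into a multiclass error-correcting output codes (ECOC) model; for the 7 classes here that is 7×6/2 = 21 binary learners.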
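The pairwise structure can be checked directly on a trained model. A minimal sketch, assuming the 3-class fisheriris sample data rather than the article's 7-class data; the templateSVM options mirror the generated call.

```matlab
% Sketch: count the binary learners that 'onevsone' coding produces.
% Uses fisheriris (3 classes), not the article's 7-class FaultCode data.
load fisheriris
X = meas;
y = grp2idx(species);                  % classes 1, 2, 3
rng(0)                                 % 'KernelScale','auto' uses random subsampling

template = templateSVM( ...
    'KernelFunction', 'linear', ...
    'KernelScale', 'auto', ...
    'BoxConstraint', 1, ...
    'Standardize', true);
mdl = fitcecoc(X, y, 'Learners', template, 'Coding', 'onevsone', ...
    'ClassNames', [1; 2; 3]);

K = numel(mdl.ClassNames);
nLearners = numel(mdl.BinaryLearners);           % K*(K-1)/2 for one-vs-one
fprintf('%d classes -> %d binary SVMs\n', K, nLearners);
disp(mdl.CodingMatrix)                            % rows: classes, columns: learners
```

This prints "3 classes -> 3 binary SVMs"; with the article's 7 classes the same formula gives 21.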

(4) Define the predictFcn field of trainedClassifier, which predicts the labels of new data

% Create the result struct with predict function
predictorExtractionFcn = @(t) t(:, predictorNames);
svmPredictFcn = @(x) predict(classificationSVM, x);
trainedClassifier.predictFcn = @(x) svmPredictFcn(predictorExtractionFcn(x));
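The two-stage composition is what lets predictFcn accept a whole table. A minimal sketch on fisheriris with hypothetical column names (SL, SW, PL, PW, Label): the new table has its columns shuffled plus an extra text column, and predictFcn still works because predictorExtractionFcn selects the predictors by name before predict is called.

```matlab
% Sketch of the predictFcn composition on fisheriris (hypothetical column names).
load fisheriris
T = array2table(meas, 'VariableNames', {'SL', 'SW', 'PL', 'PW'});
T.Label = grp2idx(species);

predictorNames = {'SL', 'SW', 'PL', 'PW'};
mdl = fitcecoc(T(:, predictorNames), T.Label, 'Coding', 'onevsone');

% Same two-stage pattern as the generated code
predictorExtractionFcn = @(t) t(:, predictorNames);   % pick and order the columns
svmPredictFcn = @(x) predict(mdl, x);                  % run the model
predictFcn = @(x) svmPredictFcn(predictorExtractionFcn(x));

% New table: columns shuffled, response kept, one extra column added
Tnew = T(1:10, {'PW', 'SL', 'Label', 'PL', 'SW'});
Tnew.Notes = repmat("unused", 10, 1);

yfit = predictFcn(Tnew);   % extra columns are dropped before predict runs
```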

(5) Store additional information in the struct

% Add additional fields to the result struct
trainedClassifier.RequiredVariables = {'BeginningTime', 'BeginningVoltage', 'EndingVoltage', 'FirstPeakValue', 'MaxVoltage', 'MaxVoltageTime', 'SecondPeakValue', 'Stroke', 'ValleyValue', 'Velocity', 'duration', 'stage1', 'stage2', 'stage3', 'stage4'};
trainedClassifier.ClassificationSVM = classificationSVM;
trainedClassifier.About = 'This struct is a trained model exported from Classification Learner R2022a.';
trainedClassifier.HowToPredict = sprintf('To make predictions on a new table, T, use: \n  yfit = c.predictFcn(T) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nThe table, T, must contain the variables returned by: \n  c.RequiredVariables \nVariable formats (e.g. matrix/vector, datatype) must match the original training data. \nAdditional variables are ignored. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appclassification_exportmodeltoworkspace'')">How to predict using an exported model</a>.');
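HowToPredict states that a new table must contain the variables listed in RequiredVariables (the same names as predictorNames, sorted with uppercase first). A minimal sketch of a pre-flight check with setdiff; the struct and the table T2 are hypothetical stand-ins, not output from the article's model.

```matlab
% Sketch: check a new table against RequiredVariables before predicting.
% trainedClassifier and T2 are hypothetical stand-ins for illustration.
trainedClassifier = struct();
trainedClassifier.RequiredVariables = {'BeginningTime', 'MaxVoltage', 'Stroke'};

% New data that lacks the 'Stroke' column (values made up)
T2 = table([0.1; 0.2], [5.0; 5.3], 'VariableNames', {'BeginningTime', 'MaxVoltage'});

missingVars = setdiff(trainedClassifier.RequiredVariables, T2.Properties.VariableNames);
if isempty(missingVars)
    disp('T2 has every required variable; call trainedClassifier.predictFcn(T2).')
else
    fprintf('T2 is missing: %s\n', strjoin(missingVars, ', '));
end
```

Here the check prints "T2 is missing: Stroke"; additional variables in T2 would not matter, since predictFcn ignores them.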

(6) Separate the predictor columns from the response column (again)

% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'FirstPeakValue', 'ValleyValue', 'SecondPeakValue', 'stage1', 'stage2', 'stage3', 'stage4', 'duration', 'BeginningVoltage', 'MaxVoltage', 'EndingVoltage', 'BeginningTime', 'MaxVoltageTime', 'Stroke', 'Velocity'};
predictors = inputTable(:, predictorNames);
response = inputTable.FaultCode;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];

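This part is identical to the code in part (2). This is probably because I enabled cross-validation for the model: the block belongs to the code generated for the cross-validation step, and MATLAB does not remove duplicated code.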

(7) Cross-validation

% Perform cross-validation
partitionedModel = crossval(trainedClassifier.ClassificationSVM, 'KFold', 5);

% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);

% Compute validation accuracy
validationAccuracy = 1 - kfoldLoss(partitionedModel, 'LossFun', 'ClassifError');

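crossval takes the trained model, splits the training data into 5 folds ('KFold', 5) and retrains the model 5 times, each time holding out one fold. It returns a partitioned model (here a ClassificationPartitionedECOC) that stores the 5 fold models. For details, see: crossval - MATLAB Documentation
kfoldPredict predicts every observation with the fold model that was trained without it, giving out-of-fold labels (validationPredictions) and scores (validationScores)
kfoldLoss computes the cross-validated classification error ('ClassifError'), so 1 - kfoldLoss(...) is the overall validation accuracy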
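The roles of the three calls can be seen side by side in a minimal sketch on fisheriris (an assumption standing in for the article's data). It also recomputes the accuracy by hand from the out-of-fold labels; with equal-sized folds and no observation weights, as here, the two numbers should agree.

```matlab
% Sketch: what crossval, kfoldPredict and kfoldLoss each produce.
% fisheriris stands in for the article's data.
load fisheriris
X = meas;
y = grp2idx(species);
rng(0)                                          % reproducible fold assignment

mdl = fitcecoc(X, y, 'Coding', 'onevsone');
partitionedModel = crossval(mdl, 'KFold', 5);   % the 5 fold models are trained here
disp(class(partitionedModel))                   % ClassificationPartitionedECOC
numFoldModels = numel(partitionedModel.Trained);

% Out-of-fold labels: each row comes from the fold model that never saw it
validationPredictions = kfoldPredict(partitionedModel);

accFromLoss = 1 - kfoldLoss(partitionedModel, 'LossFun', 'ClassifError');
accManual   = mean(validationPredictions == y);
fprintf('from kfoldLoss: %.4f, by hand: %.4f\n', accFromLoss, accManual);
```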

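This part of the code depends on the validation scheme chosen at the start; the code above, for example, corresponds to the k-fold cross-validation option shown in the figure below.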

If you switch to a different validation scheme, for example holdout validation,

the corresponding code is:

...
% Set up holdout validation
cvp = cvpartition(response, 'Holdout', 0.25);
trainingPredictors = predictors(cvp.training, :);
trainingResponse = response(cvp.training, :);
trainingIsCategoricalPredictor = isCategoricalPredictor;

% Train a classifier
% This code specifies all the classifier options and trains the classifier.
template = templateSVM(...
    'KernelFunction', 'linear', ...
    'PolynomialOrder', [], ...
    'KernelScale', 'auto', ...
    'BoxConstraint', 1, ...
    'Standardize', true);
classificationSVM = fitcecoc(...
    trainingPredictors, ...
    trainingResponse, ...
    'Learners', template, ...
    'Coding', 'onevsone', ...
    'ClassNames', [1; 2; 3; 4; 5; 6; 7]);

% Create the result struct with predict function
svmPredictFcn = @(x) predict(classificationSVM, x);
validationPredictFcn = @(x) svmPredictFcn(x);

% Add additional fields to the result struct


% Compute validation predictions
validationPredictors = predictors(cvp.test, :);
validationResponse = response(cvp.test, :);
[validationPredictions, validationScores] = validationPredictFcn(validationPredictors);

% Compute validation accuracy
correctPredictions = (validationPredictions == validationResponse);
isMissing = isnan(validationResponse);
correctPredictions = correctPredictions(~isMissing);
validationAccuracy = sum(correctPredictions)/length(correctPredictions);
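The cvpartition object is what drives the holdout split above. A minimal sketch, assuming the fisheriris labels as the response, that inspects the partition; training(cvp) and test(cvp) are the function-call forms of the cvp.training and cvp.test used in the generated code.

```matlab
% Sketch: inspect a stratified holdout partition like the one in the
% generated code. fisheriris labels stand in for the FaultCode response.
load fisheriris
response = grp2idx(species);
rng(1)                                          % reproducible split
cvp = cvpartition(response, 'Holdout', 0.25);   % grouping variable => stratified

isTrain = training(cvp);   % same logical index as cvp.training in the generated code
isTest  = test(cvp);       % same as cvp.test
fprintf('training rows: %d, test rows: %d\n', cvp.TrainSize, cvp.TestSize);

% Stratification keeps each class's share roughly equal in the test set
disp(accumarray(response(isTest), 1)')          % test rows per class
```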

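If training and validation use the same data set (resubstitution validation), the model is scored on observations it has already seen, so the resulting accuracy is usually optimistic.

The code is then: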

% Compute resubstitution predictions
[validationPredictions, validationScores] = predict(trainedClassifier.ClassificationSVM, predictors);

% Compute resubstitution accuracy
validationAccuracy = 1 - resubLoss(trainedClassifier.ClassificationSVM, 'LossFun', 'ClassifError');
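To compare resubstitution accuracy with a cross-validated estimate, here is a minimal sketch on fisheriris (an assumption, not the article's data). It computes the resubstitution accuracy both with resubLoss and by hand, then prints it next to a 5-fold accuracy; the exact numbers depend on the data and the random fold assignment.

```matlab
% Sketch: resubstitution accuracy versus 5-fold cross-validated accuracy.
% fisheriris stands in for the article's data.
load fisheriris
X = meas;
y = grp2idx(species);
rng(0)                                          % reproducible folds

mdl = fitcecoc(X, y, 'Coding', 'onevsone');

resubAccuracy = 1 - resubLoss(mdl, 'LossFun', 'ClassifError');
resubManual   = mean(predict(mdl, X) == y);     % same quantity, computed by hand

cvAccuracy = 1 - kfoldLoss(crossval(mdl, 'KFold', 5), 'LossFun', 'ClassifError');
fprintf('resubstitution: %.4f, 5-fold CV: %.4f\n', resubAccuracy, cvAccuracy);
```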

Finally, the complete function file:

function [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% Returns a trained classifier and its accuracy. This code recreates the
% classification model trained in Classification Learner app. Use the
% generated code to automate training the same model with new data, or to
% learn how to programmatically train models.
%
%  Input:
%      trainingData: A table containing the same predictor and response
%       columns as those imported into the app.
%
%  Output:
%      trainedClassifier: A struct containing the trained classifier. The
%       struct contains various fields with information about the trained
%       classifier.
%
%      trainedClassifier.predictFcn: A function to make predictions on new
%       data.
%
%      validationAccuracy: A double containing the accuracy as a
%       percentage. In the app, the Models pane displays this overall
%       accuracy score for each model.
%
% Use the code to train the model with new data. To retrain your
% classifier, call the function from the command line with your original
% data or new data as the input argument trainingData.
%
% For example, to retrain a classifier trained with the original data set
% T, enter:
%   [trainedClassifier, validationAccuracy] = trainClassifier(T)
%
% To make predictions with the returned 'trainedClassifier' on new data T2,
% use
%   yfit = trainedClassifier.predictFcn(T2)
%
% T2 must be a table containing at least the same predictor columns as used
% during training. For details, enter:
%   trainedClassifier.HowToPredict

% Auto-generated by MATLAB on 2022-08-02 21:36:27


% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'FirstPeakValue', 'ValleyValue', 'SecondPeakValue', 'stage1', 'stage2', 'stage3', 'stage4', 'duration', 'BeginningVoltage', 'MaxVoltage', 'EndingVoltage', 'BeginningTime', 'MaxVoltageTime', 'Stroke', 'Velocity'};
predictors = inputTable(:, predictorNames);
response = inputTable.FaultCode;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];

% Train a classifier
% This code specifies all the classifier options and trains the classifier.
template = templateSVM(...
    'KernelFunction', 'linear', ...
    'PolynomialOrder', [], ...
    'KernelScale', 'auto', ...
    'BoxConstraint', 1, ...
    'Standardize', true);
classificationSVM = fitcecoc(...
    predictors, ...
    response, ...
    'Learners', template, ...
    'Coding', 'onevsone', ...
    'ClassNames', [1; 2; 3; 4; 5; 6; 7]);

% Create the result struct with predict function
predictorExtractionFcn = @(t) t(:, predictorNames);
svmPredictFcn = @(x) predict(classificationSVM, x);
trainedClassifier.predictFcn = @(x) svmPredictFcn(predictorExtractionFcn(x));

% Add additional fields to the result struct
trainedClassifier.RequiredVariables = {'BeginningTime', 'BeginningVoltage', 'EndingVoltage', 'FirstPeakValue', 'MaxVoltage', 'MaxVoltageTime', 'SecondPeakValue', 'Stroke', 'ValleyValue', 'Velocity', 'duration', 'stage1', 'stage2', 'stage3', 'stage4'};
trainedClassifier.ClassificationSVM = classificationSVM;
trainedClassifier.About = 'This struct is a trained model exported from Classification Learner R2022a.';
trainedClassifier.HowToPredict = sprintf('To make predictions on a new table, T, use: \n  yfit = c.predictFcn(T) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nThe table, T, must contain the variables returned by: \n  c.RequiredVariables \nVariable formats (e.g. matrix/vector, datatype) must match the original training data. \nAdditional variables are ignored. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appclassification_exportmodeltoworkspace'')">How to predict using an exported model</a>.');

% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'FirstPeakValue', 'ValleyValue', 'SecondPeakValue', 'stage1', 'stage2', 'stage3', 'stage4', 'duration', 'BeginningVoltage', 'MaxVoltage', 'EndingVoltage', 'BeginningTime', 'MaxVoltageTime', 'Stroke', 'Velocity'};
predictors = inputTable(:, predictorNames);
response = inputTable.FaultCode;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];

% Perform cross-validation
partitionedModel = crossval(trainedClassifier.ClassificationSVM, 'KFold', 5);

% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);

% Compute validation accuracy
validationAccuracy = 1 - kfoldLoss(partitionedModel, 'LossFun', 'ClassifError');