Generate Machine Learning Model .m Code Files Automatically in MATLAB
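The MATLAB Classification Learner app provides a convenient graphical way to train machine learning models, and it can automatically generate function code or model code for the entire training process.
This post exports a support vector machine model with a linear kernel, obtaining the function file trainClassifier.m, and uses that file as an example to analyze the overall structure of the generated code.
(1) Function file comments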
function [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% Returns a trained classifier and its accuracy. This code recreates the
% classification model trained in Classification Learner app. Use the
% generated code to automate training the same model with new data, or to
% learn how to programmatically train models.
%
% Input:
% trainingData: A table containing the same predictor and response
% columns as those imported into the app.
%
% Output:
% trainedClassifier: A struct containing the trained classifier. The
% struct contains various fields with information about the trained
% classifier.
%
% trainedClassifier.predictFcn: A function to make predictions on new
% data.
%
% validationAccuracy: A double containing the accuracy as a
% percentage. In the app, the Models pane displays this overall
% accuracy score for each model.
%
% Use the code to train the model with new data. To retrain your
% classifier, call the function from the command line with your original
% data or new data as the input argument trainingData.
%
% For example, to retrain a classifier trained with the original data set
% T, enter:
% [trainedClassifier, validationAccuracy] = trainClassifier(T)
%
% To make predictions with the returned 'trainedClassifier' on new data T2,
% use
% yfit = trainedClassifier.predictFcn(T2)
%
% T2 must be a table containing at least the same predictor columns as used
% during training. For details, enter:
% trainedClassifier.HowToPredict
% Auto-generated by MATLAB on 2022-08-02 21:36:27
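The comment block describes the function's inputs and outputs, along with other related information.
- Input: the variable trainingData, a table containing the predictor columns (features) and the response column (labels)
- Output:
  - trainedClassifier, a struct containing the trained classification model together with related information and function handles
  - validationAccuracy, the cross-validation accuracy (the code in part (7) computes it as 1 - kfoldLoss, so it is a fraction between 0 and 1)
In addition, trainedClassifier.predictFcn can be used to predict the labels of unseen data.
⭐ Note: trainedClassifier is a struct. Besides the trained classification model, it also stores other related information and functions (such as trainedClassifier.predictFcn). In this example, trainedClassifier.ClassificationSVM is the actual trained model, and its data type is 1x1 ClassificationECOC.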
(2) Separate the predictor columns and the response column
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'FirstPeakValue', 'ValleyValue', 'SecondPeakValue', 'stage1', 'stage2', 'stage3', 'stage4', 'duration', 'BeginningVoltage', 'MaxVoltage', 'EndingVoltage', 'BeginningTime', 'MaxVoltageTime', 'Stroke', 'Velocity'};
predictors = inputTable(:, predictorNames);
response = inputTable.FaultCode;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];
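To make the two indexing forms used here concrete, the following is a minimal sketch on a made-up three-row table (the names `demo`, `Temp`, `Load` and `Label` are hypothetical and do not come from the exported file). Parenthesis indexing with a list of names returns a table, while dot indexing returns the column's underlying array.

```matlab
% Hypothetical toy table standing in for trainingData
demo = table([20; 25; 30], [0.1; 0.4; 0.9], [1; 2; 1], ...
    'VariableNames', {'Temp', 'Load', 'Label'});

names = {'Temp', 'Load'};

% Parenthesis indexing keeps the table container
predictorsDemo = demo(:, names);

% Dot indexing extracts the column as a plain array
responseDemo = demo.Label;

disp(class(predictorsDemo))   % table
disp(class(responseDemo))     % double
```

This is why `predictors` stays a table in the generated code while `response` becomes a numeric vector.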
(3) Set the classifier hyperparameters and train the model
% Train a classifier
% This code specifies all the classifier options and trains the classifier.
template = templateSVM(...
'KernelFunction', 'linear', ...
'PolynomialOrder', [], ...
'KernelScale', 'auto', ...
'BoxConstraint', 1, ...
'Standardize', true);
classificationSVM = fitcecoc(...
predictors, ...
response, ...
'Learners', template, ...
'Coding', 'onevsone', ...
'ClassNames', [1; 2; 3; 4; 5; 6; 7]);
- templateSVM: sets the classifier hyperparameters
- fitcecoc: trains the model
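As a self-contained illustration of how these two calls fit together, the sketch below trains the same kind of model on the fisheriris sample data that ships with Statistics and Machine Learning Toolbox. Using that data set is an assumption made for the example; the post's own fault data is not available here. With one-vs-one coding, K classes yield K(K-1)/2 binary SVM learners, so 3 learners for the 3 iris classes and 21 for the 7 fault classes above.

```matlab
% Assumes Statistics and Machine Learning Toolbox is installed.
load fisheriris            % provides meas (150x4 double) and species (150x1 cell)

% templateSVM only describes the binary learner; nothing is trained yet
t = templateSVM('KernelFunction', 'linear', 'Standardize', true);

% fitcecoc trains one binary SVM per pair of classes
mdl = fitcecoc(meas, species, 'Learners', t, 'Coding', 'onevsone');

disp(class(mdl))                    % ClassificationECOC
disp(numel(mdl.BinaryLearners))     % 3 binary learners for 3 classes
disp(nchoosek(7, 2))                % 21 learners for a 7-class problem
```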
(4) Set the predictFcn function of the struct trainedClassifier, used to predict the labels of unseen data
% Create the result struct with predict function
predictorExtractionFcn = @(t) t(:, predictorNames);
svmPredictFcn = @(x) predict(classificationSVM, x);
trainedClassifier.predictFcn = @(x) svmPredictFcn(predictorExtractionFcn(x));
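The wrapping pattern can be tried without any trained model. In the sketch below a hypothetical threshold rule (`fakeModelFcn`) stands in for `predict(classificationSVM, x)`, and the column names are made up. The point is only that the anonymous functions capture `names` when they are created, so extra columns in the input table are dropped before prediction.

```matlab
names = {'a', 'b'};                       % hypothetical predictor names

% Keep only the predictor columns, in training order
extractFcn = @(t) t(:, names);

% Stand-in for predict(model, x): label 1 if a + b > 1, else 0
fakeModelFcn = @(x) double(x.a + x.b > 1);

% Same composition as trainedClassifier.predictFcn
predictFcn = @(x) fakeModelFcn(extractFcn(x));

% New data with an extra, unused column and a different column order
T2 = table([9; 9], [0.8; 0.1], [0.5; 0.2], ...
    'VariableNames', {'extra', 'b', 'a'});

yfit = predictFcn(T2);
disp(yfit')    % 1 0
```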
(5) Save other information in the struct
% Add additional fields to the result struct
trainedClassifier.RequiredVariables = {'BeginningTime', 'BeginningVoltage', 'EndingVoltage', 'FirstPeakValue', 'MaxVoltage', 'MaxVoltageTime', 'SecondPeakValue', 'Stroke', 'ValleyValue', 'Velocity', 'duration', 'stage1', 'stage2', 'stage3', 'stage4'};
trainedClassifier.ClassificationSVM = classificationSVM;
trainedClassifier.About = 'This struct is a trained model exported from Classification Learner R2022a.';
trainedClassifier.HowToPredict = sprintf('To make predictions on a new table, T, use: \n yfit = c.predictFcn(T) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nThe table, T, must contain the variables returned by: \n c.RequiredVariables \nVariable formats (e.g. matrix/vector, datatype) must match the original training data. \nAdditional variables are ignored. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appclassification_exportmodeltoworkspace'')">How to predict using an exported model</a>.');
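Before calling predictFcn on a new table, the RequiredVariables field can be used to check that no column is missing. A minimal sketch, assuming a hypothetical two-name requirement list and a made-up table rather than the real exported struct:

```matlab
% Stand-in for trainedClassifier.RequiredVariables (hypothetical subset)
required = {'duration', 'Stroke'};

% Made-up new data that lacks the 'Stroke' column
newData = table([1.2; 0.9], [3; 4], ...
    'VariableNames', {'duration', 'Velocity'});

% Names that are required but absent from the new table
missingVars = setdiff(required, newData.Properties.VariableNames);

if isempty(missingVars)
    disp('All required variables are present.')
else
    fprintf('Missing variables: %s\n', strjoin(missingVars, ', '));
end
```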
(6) Separate the predictor columns and the response column
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'FirstPeakValue', 'ValleyValue', 'SecondPeakValue', 'stage1', 'stage2', 'stage3', 'stage4', 'duration', 'BeginningVoltage', 'MaxVoltage', 'EndingVoltage', 'BeginningTime', 'MaxVoltageTime', 'Stroke', 'Velocity'};
predictors = inputTable(:, predictorNames);
response = inputTable.FaultCode;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];
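This part is identical to the code in part (2). This is probably because I enabled cross-validation for the model, so the block is emitted again as part of the code generated for the cross-validation step; MATLAB does not simplify the repeated code.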
(7) Cross-validation
% Perform cross-validation
partitionedModel = crossval(trainedClassifier.ClassificationSVM, 'KFold', 5);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validationAccuracy = 1 - kfoldLoss(partitionedModel, 'LossFun', 'ClassifError');
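- The crossval function takes the trained model together with the cross-validation options (here the number of folds, 'KFold', 5) and returns a partitioned model that holds one model trained per fold. This partitioned model is then passed to kfoldPredict. See: crossval - MATLAB Documentation
- The kfoldPredict function returns, for each observation, the prediction made by the fold model that did not see that observation during training
- The kfoldLoss function is used to compute the overall validation accuracy, as 1 minus the classification error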
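The three calls can be traced on the fisheriris sample data (an assumption made for the example, since the post's own data set is not available). The sketch shows that crossval returns a partitioned model holding one trained model per fold, and that the accuracy is derived from the classification error.

```matlab
% Assumes Statistics and Machine Learning Toolbox is installed.
load fisheriris
rng(0)                                  % make the random fold assignment repeatable

t   = templateSVM('KernelFunction', 'linear', 'Standardize', true);
mdl = fitcecoc(meas, species, 'Learners', t, 'Coding', 'onevsone');

% crossval retrains the model on each of the 5 training folds
cvMdl = crossval(mdl, 'KFold', 5);
disp(numel(cvMdl.Trained))              % 5 fold models

% Each observation is predicted by the fold model that never saw it
pred = kfoldPredict(cvMdl);

% Accuracy = 1 - misclassification rate
acc = 1 - kfoldLoss(cvMdl, 'LossFun', 'ClassifError');
fprintf('5-fold accuracy: %.3f\n', acc);
```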
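This part of the code depends on the validation settings chosen at the start. The code above, for example, uses the k-fold cross-validation method shown in the figure below.
If the validation strategy is changed, for example to holdout validation,
the corresponding code is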
...
% Set up holdout validation
cvp = cvpartition(response, 'Holdout', 0.25);
trainingPredictors = predictors(cvp.training, :);
trainingResponse = response(cvp.training, :);
trainingIsCategoricalPredictor = isCategoricalPredictor;
% Train a classifier
% This code specifies all the classifier options and trains the classifier.
template = templateSVM(...
'KernelFunction', 'linear', ...
'PolynomialOrder', [], ...
'KernelScale', 'auto', ...
'BoxConstraint', 1, ...
'Standardize', true);
classificationSVM = fitcecoc(...
trainingPredictors, ...
trainingResponse, ...
'Learners', template, ...
'Coding', 'onevsone', ...
'ClassNames', [1; 2; 3; 4; 5; 6; 7]);
% Create the result struct with predict function
svmPredictFcn = @(x) predict(classificationSVM, x);
validationPredictFcn = @(x) svmPredictFcn(x);
% Add additional fields to the result struct
% Compute validation predictions
validationPredictors = predictors(cvp.test, :);
validationResponse = response(cvp.test, :);
[validationPredictions, validationScores] = validationPredictFcn(validationPredictors);
% Compute validation accuracy
correctPredictions = (validationPredictions == validationResponse);
isMissing = isnan(validationResponse);
correctPredictions = correctPredictions(~isMissing);
validationAccuracy = sum(correctPredictions)/length(correctPredictions);
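The last four lines are plain array arithmetic and can be checked by hand. In this sketch the predictions and true labels are made-up values; one true label is NaN, so it is excluded and the accuracy is 2 correct out of the 3 remaining observations.

```matlab
validationPredictions = [1; 2; 3; 1];     % made-up predicted labels
validationResponse    = [1; 2; NaN; 2];   % made-up true labels, one missing

% Element-wise comparison; anything compared with NaN is false
correctPredictions = (validationPredictions == validationResponse);  % [1; 1; 0; 0]
isMissing = isnan(validationResponse);                               % [0; 0; 1; 0]

% Drop observations whose true label is missing
correctPredictions = correctPredictions(~isMissing);                 % [1; 1; 0]

validationAccuracy = sum(correctPredictions) / length(correctPredictions);
fprintf('%.4f\n', validationAccuracy);    % 0.6667
```

Without the isMissing step, the NaN row would be counted as a wrong prediction and the result would be 2/4 instead of 2/3.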
If the training set and the validation set are the same data set (resubstitution validation),
the code is
% Compute resubstitution predictions
[validationPredictions, validationScores] = predict(trainedClassifier.ClassificationSVM, predictors);
% Compute resubstitution accuracy
validationAccuracy = 1 - resubLoss(trainedClassifier.ClassificationSVM, 'LossFun', 'ClassifError');
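For comparison, here is a sketch of resubstitution on the fisheriris sample data (an assumption made for the example, as the post's own data is not available). Because the model is scored on the very observations it was trained on, this figure tends to be optimistic compared with k-fold or holdout estimates.

```matlab
% Assumes Statistics and Machine Learning Toolbox is installed.
load fisheriris
t   = templateSVM('KernelFunction', 'linear', 'Standardize', true);
mdl = fitcecoc(meas, species, 'Learners', t, 'Coding', 'onevsone');

% Predict the training observations themselves
pred = predict(mdl, meas);

% Accuracy computed directly and via resubLoss
accDirect = mean(strcmp(pred, species));
accResub  = 1 - resubLoss(mdl, 'LossFun', 'ClassifError');
fprintf('direct: %.3f, resubLoss: %.3f\n', accDirect, accResub);
```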
Finally, the complete code of the function file:
function [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% Returns a trained classifier and its accuracy. This code recreates the
% classification model trained in Classification Learner app. Use the
% generated code to automate training the same model with new data, or to
% learn how to programmatically train models.
%
% Input:
% trainingData: A table containing the same predictor and response
% columns as those imported into the app.
%
% Output:
% trainedClassifier: A struct containing the trained classifier. The
% struct contains various fields with information about the trained
% classifier.
%
% trainedClassifier.predictFcn: A function to make predictions on new
% data.
%
% validationAccuracy: A double containing the accuracy as a
% percentage. In the app, the Models pane displays this overall
% accuracy score for each model.
%
% Use the code to train the model with new data. To retrain your
% classifier, call the function from the command line with your original
% data or new data as the input argument trainingData.
%
% For example, to retrain a classifier trained with the original data set
% T, enter:
% [trainedClassifier, validationAccuracy] = trainClassifier(T)
%
% To make predictions with the returned 'trainedClassifier' on new data T2,
% use
% yfit = trainedClassifier.predictFcn(T2)
%
% T2 must be a table containing at least the same predictor columns as used
% during training. For details, enter:
% trainedClassifier.HowToPredict
% Auto-generated by MATLAB on 2022-08-02 21:36:27
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'FirstPeakValue', 'ValleyValue', 'SecondPeakValue', 'stage1', 'stage2', 'stage3', 'stage4', 'duration', 'BeginningVoltage', 'MaxVoltage', 'EndingVoltage', 'BeginningTime', 'MaxVoltageTime', 'Stroke', 'Velocity'};
predictors = inputTable(:, predictorNames);
response = inputTable.FaultCode;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];
% Train a classifier
% This code specifies all the classifier options and trains the classifier.
template = templateSVM(...
'KernelFunction', 'linear', ...
'PolynomialOrder', [], ...
'KernelScale', 'auto', ...
'BoxConstraint', 1, ...
'Standardize', true);
classificationSVM = fitcecoc(...
predictors, ...
response, ...
'Learners', template, ...
'Coding', 'onevsone', ...
'ClassNames', [1; 2; 3; 4; 5; 6; 7]);
% Create the result struct with predict function
predictorExtractionFcn = @(t) t(:, predictorNames);
svmPredictFcn = @(x) predict(classificationSVM, x);
trainedClassifier.predictFcn = @(x) svmPredictFcn(predictorExtractionFcn(x));
% Add additional fields to the result struct
trainedClassifier.RequiredVariables = {'BeginningTime', 'BeginningVoltage', 'EndingVoltage', 'FirstPeakValue', 'MaxVoltage', 'MaxVoltageTime', 'SecondPeakValue', 'Stroke', 'ValleyValue', 'Velocity', 'duration', 'stage1', 'stage2', 'stage3', 'stage4'};
trainedClassifier.ClassificationSVM = classificationSVM;
trainedClassifier.About = 'This struct is a trained model exported from Classification Learner R2022a.';
trainedClassifier.HowToPredict = sprintf('To make predictions on a new table, T, use: \n yfit = c.predictFcn(T) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nThe table, T, must contain the variables returned by: \n c.RequiredVariables \nVariable formats (e.g. matrix/vector, datatype) must match the original training data. \nAdditional variables are ignored. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appclassification_exportmodeltoworkspace'')">How to predict using an exported model</a>.');
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'FirstPeakValue', 'ValleyValue', 'SecondPeakValue', 'stage1', 'stage2', 'stage3', 'stage4', 'duration', 'BeginningVoltage', 'MaxVoltage', 'EndingVoltage', 'BeginningTime', 'MaxVoltageTime', 'Stroke', 'Velocity'};
predictors = inputTable(:, predictorNames);
response = inputTable.FaultCode;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];
% Perform cross-validation
partitionedModel = crossval(trainedClassifier.ClassificationSVM, 'KFold', 5);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validationAccuracy = 1 - kfoldLoss(partitionedModel, 'LossFun', 'ClassifError');