Create Word Cloud in MATLAB
Example 1: Visualize Text Data Using Word Clouds
This example is adapted from Visualize Text Data Using Word Clouds - MathWorks [1]. It shows how to read text data from a .csv file and create word clouds from it.
First, read the .csv file:
filename = "factoryReports.csv";
% 'TextType','string' imports text columns as string arrays
tbl = readtable(filename, 'TextType', 'string');
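A minimal sketch of what the 'TextType','string' option changes: by default readtable imports text columns as cell arrays of character vectors, while with this option they arrive as string arrays. The temporary file, its contents, and the column names Description, Category, and Cost are hypothetical stand-ins chosen only to mimic the shape of factoryReports.csv:

```matlab
% Write a tiny hypothetical CSV to a temporary file
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Description,Category,Cost\n');
fprintf(fid, 'Coolant is leaking from the pump,Leak,45\n');
fprintf(fid, 'Motor bearing failed,Mechanical Failure,1200\n');
fclose(fid);

% Default import: text columns become cell arrays of character vectors
tDefault = readtable(fname);
% With 'TextType','string': text columns become string arrays
tString = readtable(fname, 'TextType', 'string');

class(tDefault.Description)   % 'cell'
class(tString.Description)    % 'string'

delete(fname)                 % remove the temporary file
```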
The structure and contents of the table variable tbl are shown below:

This example creates word clouds of the tbl.Description column under different conditions:
textData = tbl.Description;
(1) Create a word cloud of the tbl.Description column directly:
figure
wordcloud(textData);
title("Factory Reports")

(2) Create a separate word cloud of tbl.Description for each label in tbl.Category:
figure
labels = tbl.Category;
subplot(1,2,1)
idx = labels == "Leak";
wordcloud(textData(idx), 'Color', 'blue');
title("Leak")
subplot(1,2,2)
idx = labels == "Mechanical Failure";
wordcloud(textData(idx), 'Color', 'magenta');
title("Mechanical Failure")
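The two panels above hard-code the labels "Leak" and "Mechanical Failure". The sketch below instead loops over every distinct value in the label column. It keeps this example's setup of passing raw string arrays to wordcloud, which, as far as I know, relies on Text Analytics Toolbox; the textData and labels values are hypothetical toy data that only make the block runnable on its own:

```matlab
% Hypothetical toy data; in the example these are tbl.Description and tbl.Category
textData = ["Coolant is leaking from the pump seal"; ...
            "Oil leak under the pump housing"; ...
            "Motor bearing failure during startup"; ...
            "Gearbox failure with loud grinding noise"];
labels = ["Leak"; "Leak"; "Mechanical Failure"; "Mechanical Failure"];

labelSet = unique(labels);       % distinct labels, sorted
nLabels = numel(labelSet);
charts = cell(nLabels, 1);       % one WordCloudChart handle per label

figure
tiledlayout(1, nLabels)
for k = 1:nLabels
    nexttile
    idx = labels == labelSet(k); % logical mask selecting this label's reports
    charts{k} = wordcloud(textData(idx));
    title(labelSet(k))
end
```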

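Here, the Color option of wordcloud sets the color of the words that are not highlighted, that is, the less frequent ones. To set the color of the highlighted words, which are drawn from the most frequent ones, use the HighlightColor option instead. For example, the following word clouds show the reports whose cost exceeds $100 and $1,000: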
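cost = tbl.Cost;                 % Cost column of the table
figure
tiledlayout(1,2)
nexttile
idx = cost > 100;                % reports costing more than $100
wordcloud(textData(idx), 'HighlightColor', 'blue');
title("Cost > $100")
idx = cost > 1000;               % reports costing more than $1,000
nexttile
wordcloud(textData(idx), 'HighlightColor', 'red');
title("Cost > $1,000")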
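Both options accept either a color name or an RGB triplet, and they can be set in the same call. A sketch with hypothetical toy words; it passes a categorical array, the input type Example 2 below uses, so it should not need Text Analytics Toolbox:

```matlab
% Hypothetical toy words; wordcloud sizes each category by its count
words = categorical(["pump" "pump" "pump" "valve" "valve" "motor" "seal"]);

figure
wc = wordcloud(words, ...
    'Color', [0.4 0.4 0.4], ...        % words that are not highlighted
    'HighlightColor', [0.85 0.1 0.1]); % highlighted (larger) words
title("Color vs. HighlightColor")
```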

(3) WordCloudChart properties
The wordcloud function can return a WordCloudChart object:
figure
wc = wordcloud(textData);
title("Factory Reports")
wc =
WordCloudChart (Factory Reports) with properties:
WordData: [1×315 string]
SizeData: [1×315 double]
MaxDisplayWords: 100
Show all properties
Box: off
Color: [1×3 double]
FontName: 'Helvetica'
HandleVisibility: 'on'
HighlightColor: [1×3 double]
InnerPosition: [1×4 double]
Layout: [0×0 matlab.ui.layout.LayoutOptions]
LayoutNum: 1
MaxDisplayWords: 100
OuterPosition: [1×4 double]
Parent: [1×1 Figure]
Position: [1×4 double]
PositionConstraint: 'outerposition'
Shape: 'oval'
SizeData: [1×315 double]
SizePower: 0.5000
SizeVariable: ''
SourceTable: [0×0 table]
Title: 'Factory Reports'
TitleFontName: 'Helvetica'
Units: 'normalized'
Visible: on
WordData: [1×315 string]
WordVariable: ''
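The listed properties can also be read from code. SizeData holds one size value per entry of WordData, so sorting it recovers the most prominent words. A sketch with hypothetical toy words passed as a categorical array, in which case SizeData should hold the number of occurrences of each word:

```matlab
% Hypothetical toy words
words = categorical(["pump" "pump" "pump" "valve" "valve" "motor" "seal"]);
figure
wc = wordcloud(words);

% Sort the words by their size value, largest first
[sortedSizes, order] = sort(wc.SizeData, 'descend');
topWords = wc.WordData(order(1:2))
topSizes = sortedSizes(1:2)
```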
Properties can be set with dot notation, for example:
figure
wc = wordcloud(textData);
wc.MaxDisplayWords = 3;   % display only the 3 largest words
title("Factory Reports")
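Other properties from the listing can be changed the same way, or passed as name-value arguments when the chart is created. This sketch (hypothetical toy words, categorical input) changes Shape and SizePower from the defaults shown above ('oval' and 0.5). SizePower is the exponent wordcloud applies to SizeData, so larger values should widen the size gap between frequent and rare words:

```matlab
% Hypothetical toy words
words = categorical(["pump" "pump" "pump" "valve" "valve" "motor" "seal"]);

% Option 1: dot notation after creation
figure
wc1 = wordcloud(words);
wc1.Shape = 'rectangle';    % default is 'oval'
wc1.SizePower = 1;          % default is 0.5

% Option 2: the same settings as name-value arguments
figure
wc2 = wordcloud(words, 'Shape', 'rectangle', 'SizePower', 1, ...
    'MaxDisplayWords', 3);
```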

Example 2: Create Word Cloud With String Arrays
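This example is adapted from Create Word Cloud With String Arrays - MathWorks. It shows how to extract text data from a .txt file and create a word cloud, and it covers some of the most basic text-processing steps.
The file sonnets.txt contains Shakespeare's sonnets: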

First, read sonnets.txt with the fileread function:
sonnets = fileread('sonnets.txt');
>> whos sonnets
Name Size Bytes Class Attributes
sonnets 1x100266 200532 char
Next, convert sonnets to the string data type:
sonnets = string(sonnets);
>> whos sonnets
Name Size Bytes Class Attributes
sonnets 1x1 200678 string
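The two whos outputs reflect how the types store text: a character vector has one element per character (2 bytes each, hence 200532 = 2 × 100266 bytes), while a string scalar holds the whole text as a single element whose length strlength reports. A small sketch on a short, hypothetical fragment:

```matlab
c = 'Shall I compare thee';   % character vector
s = string(c);                % string scalar

size(c)        % [1 20]: one element per character
size(s)        % [1 1]: the whole text is one element
strlength(s)   % 20: number of characters in the string
```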
At this point, sonnets contains many newline characters:

Split it into lines with the splitlines function:
sonnets = splitlines(sonnets);
>> whos sonnets
Name Size Bytes Class Attributes
sonnets 2625x1 320806 string
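splitlines breaks a string at its newline characters and returns a column with one element per line; a blank line comes back as an empty string. A sketch using the opening lines of Sonnet 1, with an extra blank line inserted for illustration:

```matlab
txt = "From fairest creatures we desire increase," + newline + ...
      "That thereby beauty's rose might never die," + newline + ...
      newline + ...
      "But as the riper should by time decease,";
verses = splitlines(txt)   % 4-by-1 string array; verses(3) is ""
```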

Next, replace some punctuation marks with spaces:
p = ["." "?" "!" "," ";" ":"];       % punctuation marks to strip
sonnets = replace(sonnets, p, " ");  % replace each mark with a space
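replace accepts an array of patterns and substitutes every occurrence of any of them with the new text. Characters that are not in p, such as the apostrophe, are kept, so a possessive like summer's stays one word. A sketch on a single line:

```matlab
p = ["." "?" "!" "," ";" ":"];
verse = "Shall I compare thee to a summer's day?";
cleaned = replace(verse, p, " ")
% cleaned is "Shall I compare thee to a summer's day " (note the trailing space)
```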
Then join all lines into a single string and split it at whitespace, so that each element of the resulting string array is an individual word:
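sonnets = join(sonnets);   % combine all lines into one string, separated by spaces
sonnets = split(sonnets);  % split at whitespace into a column of words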
>> whos sonnets
Name Size Bytes Class Attributes
sonnets 17712x1 976416 string
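join places a space between consecutive elements, and split with no delimiter argument splits the result at whitespace. Blank lines and the spaces left by the punctuation step can produce empty strings in the output, which the length filter in the next step removes as well. A sketch on three short, hypothetical lines, dropping the empty pieces explicitly:

```matlab
verses = ["Shall I compare thee"; ""; "to a summer's day "];
joined = join(verses)     % one string; a space separates consecutive elements
words = split(joined);    % column of pieces split at whitespace
words(words == "") = []   % drop any empty pieces
```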

Remove words with fewer than five characters:
sonnets(strlength(sonnets) < 5) = [];   % also removes empty strings (length 0)
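strlength returns the number of characters in each element, so strlength(sonnets) < 5 is a logical index of the short words to delete. A sketch on a hypothetical word list, showing that the empty string is removed too:

```matlab
words = ["Shall"; "I"; "compare"; "thee"; ""; "summer's"; "day"];
strlength(words)'                  % 5 1 7 4 0 8 3
words(strlength(words) < 5) = []   % keeps "Shall", "compare", "summer's"
```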
Finally, convert sonnets to a categorical array and plot the word cloud:
C = categorical(sonnets);   % each distinct word becomes a category
figure
wordcloud(C);
title("Sonnets Word Cloud")
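wordcloud(C) sizes each category of C by how often it occurs. A sketch that computes the same frequencies explicitly with categories and countcats and passes them to the wordcloud(words, sizeData) syntax, which can be handy for inspecting or filtering the counts before plotting; the word list is hypothetical:

```matlab
words = ["summer"; "beauty"; "summer"; "sweet"; "beauty"; "summer"];
C = categorical(words);

names = categories(C);     % unique words as a cell array, sorted alphabetically
counts = countcats(C);     % number of occurrences of each category

figure
wc = wordcloud(string(names), counts);   % sized by counts, as wordcloud(C) would be
title("Word Cloud from Explicit Counts")
```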

Reference
[1] Visualize Text Data Using Word Clouds - MathWorks.