Create Word Cloud in MATLAB
Example 1: Visualize Text Data Using Word Clouds
This example is from Visualize Text Data Using Word Clouds - MathWorks and shows how to read text data from a .csv file and create word clouds from it.
First, read the .csv file:
filename = "factoryReports.csv";
tbl = readtable(filename, 'TextType', 'string');
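The result, tbl, is a table variable; the columns used in this example are Description, Category, and Cost.
This example creates word clouds from the tbl.Description column under various conditions: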
textData = tbl.Description;
(1) Create a word cloud directly from the tbl.Description column:
figure
wordcloud(textData);
title("Factory Reports")
(2) Create a separate word cloud of the tbl.Description column for each label in the tbl.Category column:
figure
labels = tbl.Category;
subplot(1,2,1)
idx = labels == "Leak";
wordcloud(textData(idx), 'Color', 'blue');
title("Leak")
subplot(1,2,2)
idx = labels == "Mechanical Failure";
wordcloud(textData(idx), 'Color', 'magenta');
title("Mechanical Failure")
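When there are more than two labels, the same per-label filtering can be written as a loop. The following is a minimal sketch rather than part of the MathWorks example: the four reports and their labels are invented stand-in data, passing a string array directly to wordcloud assumes that Text Analytics Toolbox is installed, and tiledlayout assumes R2019b or later.

```matlab
% Hypothetical stand-in data (not taken from factoryReports.csv).
textData = ["pump leaks oil"; "belt snapped"; ...
            "oil leak near valve"; "motor belt worn"];
labels   = ["Leak"; "Mechanical Failure"; "Leak"; "Mechanical Failure"];

cats = unique(labels);            % one word cloud per distinct label

figure
tiledlayout(1, numel(cats))
for k = 1:numel(cats)
    nexttile
    idx = labels == cats(k);      % logical mask for the current label
    wordcloud(textData(idx));
    title(cats(k))
end
```

With the real data, textData and labels would be tbl.Description and tbl.Category.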
Here, the Color property of the wordcloud function sets the color of the less frequent words. To set the color of the most frequent words, set the HighlightColor property instead, for example:
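cost = tbl.Cost;     % cost recorded for each report
idx = cost > 100;    % reports costing more than $100
figure
tiledlayout(1,2)
nexttile
wordcloud(textData(idx), 'HighlightColor', 'blue');
title("Cost > $100")
idx = cost > 1000;   % reports costing more than $1,000
nexttile
wordcloud(textData(idx), 'HighlightColor', 'red');
title("Cost > $1,000")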
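To see the two color properties side by side, both can be set in a single call. This is a small sketch under stated assumptions: the word list and counts are invented for illustration, and the wordcloud(words, sizeData) form is used so that the snippet does not depend on the .csv file.

```matlab
% Hypothetical words and counts, invented for illustration only.
words  = ["pump" "leak" "valve" "noise" "fuse" "belt"];
counts = [40 35 12 9 5 3];

figure
wc = wordcloud(words, counts, ...
    'Color', [0.5 0.5 0.5], ...    % gray for ordinary words
    'HighlightColor', 'red');      % red for the largest words
title("Color vs. HighlightColor")
```

The largest words should appear in red and the remaining words in gray.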
(3) WordCloudChart properties
The wordcloud function can return a WordCloudChart object:
figure
wc = wordcloud(textData);
title("Factory Reports")
wc =
WordCloudChart (Factory Reports) with properties:
WordData: [1×315 string]
SizeData: [1×315 double]
MaxDisplayWords: 100
Show all properties
Box: off
Color: [1×3 double]
FontName: 'Helvetica'
HandleVisibility: 'on'
HighlightColor: [1×3 double]
InnerPosition: [1×4 double]
Layout: [0×0 matlab.ui.layout.LayoutOptions]
LayoutNum: 1
MaxDisplayWords: 100
OuterPosition: [1×4 double]
Parent: [1×1 Figure]
Position: [1×4 double]
PositionConstraint: 'outerposition'
Shape: 'oval'
SizeData: [1×315 double]
SizePower: 0.5000
SizeVariable: ''
SourceTable: [0×0 table]
Title: 'Factory Reports'
TitleFontName: 'Helvetica'
Units: 'normalized'
Visible: on
WordData: [1×315 string]
WordVariable: ''
Properties can be set with dot notation, for example:
figure
wc = wordcloud(textData);
wc.MaxDisplayWords = 3;
title("Factory Reports")
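Dot notation works for reading properties as well as setting them. The sketch below uses invented words and counts (not the factory data) to change the chart shape and to read WordData and SizeData back in order to list the three largest words. The variable names order and topWords are illustrative only.

```matlab
% Hypothetical words and counts, invented for illustration only.
words  = ["pump" "leak" "valve" "noise" "fuse" "belt"];
counts = [40 35 12 9 5 3];

figure
wc = wordcloud(words, counts);

% Set properties with dot notation.
wc.Shape = 'rectangle';      % the default shape is 'oval'
wc.MaxDisplayWords = 4;      % draw only the four largest words

% Read properties with dot notation: rank the words by size.
[~, order] = sort(wc.SizeData, 'descend');
topWords = wc.WordData(order(1:3));
disp(topWords)
```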
Example 2: Create Word Cloud With String Arrays
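This example is from Create Word Cloud With String Arrays - MathWorks. It shows how to extract text data from a .txt file and create a word cloud, and it includes some basic text-processing steps.
The file sonnets.txt contains Shakespeare's sonnets.
First, read sonnets.txt with the fileread function: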
sonnets = fileread('sonnets.txt');
>> whos sonnets
Name Size Bytes Class Attributes
sonnets 1x100266 200532 char
Next, convert sonnets to the string data type:
sonnets = string(sonnets);
>> whos sonnets
Name Size Bytes Class Attributes
sonnets 1x1 200678 string
At this point, sonnets is a single string that contains many newline characters. Use the splitlines function to split it into lines:
sonnets = splitlines(sonnets);
>> whos sonnets
Name Size Bytes Class Attributes
sonnets 2625x1 320806 string
Next, replace some punctuation characters with spaces:
p = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets, p, " ");
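The effect of replace with several patterns at once is easier to see on a tiny input. In this sketch the two lines of text are made up for illustration; p is the same punctuation list as above.

```matlab
% Two made-up lines standing in for the contents of sonnets.txt.
lines = ["Shall I compare thee, to a day?"; ...
         "Rough winds: do shake!"];

p = ["." "?" "!" "," ";" ":"];
cleaned = replace(lines, p, " ");   % every match of any pattern becomes a space

disp(cleaned)
```

Each element of lines is processed independently, so the result has the same size as the input.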
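Then split sonnets into a string array in which each element is an individual word. To do this, first join all lines into a single string, then split that string at whitespace: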
sonnets = join(sonnets);
sonnets = split(sonnets);
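The join-then-split step can be checked on a small input. This is an illustrative sketch with two made-up lines, showing that join collapses a column of lines into one string and that split then returns one word per element.

```matlab
% Two made-up lines standing in for the lines of sonnets.txt.
lines = ["from fairest creatures"; "we desire increase"];

joined = join(lines);     % 1-by-1 string, lines separated by a space
words  = split(joined);   % column string array, one word per element

disp(words)
```

Joining first is what lets a single split call handle lines that contain different numbers of words.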
>> whos sonnets
Name Size Bytes Class Attributes
sonnets 17712x1 976416 string
Remove words with fewer than five characters:
sonnets(strlength(sonnets) < 5) = [];
Finally, convert sonnets to a categorical array and plot the word cloud:
C = categorical(sonnets);
figure
wordcloud(C);
title("Sonnets Word Cloud")
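To inspect the frequencies behind a word cloud built from a categorical array, the counts can be computed explicitly. The sketch below is an assumption-labeled illustration: the six words are invented, and the names counts, order, and topWords are hypothetical. It uses categories and countcats to build a frequency table, then passes that table to wordcloud.

```matlab
% Made-up word list standing in for the filtered sonnets array.
words = ["beauty"; "sweet"; "beauty"; "heart"; "sweet"; "beauty"];
C = categorical(words);

names  = categories(C);      % distinct words (cell array of char vectors)
counts = countcats(C);       % number of occurrences of each word

% Sort from most to least frequent and collect the result in a table.
[counts, order] = sort(counts, 'descend');
topWords = table(string(names(order)), counts, ...
    'VariableNames', {'Word', 'Count'});
disp(topWords)

% The same table can be plotted by naming the word and size variables.
figure
wordcloud(topWords, 'Word', 'Count');
title("Word Cloud From a Frequency Table")
```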
Reference
[1] Visualize Text Data Using Word Clouds - MathWorks.