Create Word Cloud in MATLAB

Oct. 02, 2022

Example 1: Visualize Text Data Using Word Clouds

This example comes from "Visualize Text Data Using Word Clouds - MathWorks" [1] and shows how to read text data from a .csv file and create word clouds from it.

First, read the .csv file:

filename = "factoryReports.csv";
tbl = readtable(filename, 'TextType', 'string');
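The effect of the 'TextType' option can be checked without factoryReports.csv by using a small stand-in file. The following is only a minimal sketch: the report rows and the file name demoReports.csv are made up for illustration and do not come from the original example.

```matlab
% Hypothetical stand-in rows; not taken from factoryReports.csv.
Description = ["Coolant is pooling underneath sorter"; "Sorter blows fuses at start up"];
Category    = ["Leak"; "Electronic Failure"];
Cost        = [120; 1500];
demoTbl  = table(Description, Category, Cost);
demoFile = fullfile(tempdir, "demoReports.csv");   % hypothetical file name
writetable(demoTbl, demoFile);

% Default import: text columns come back as cell arrays of character vectors.
tblChar = readtable(demoFile);

% With 'TextType','string' the same columns are string arrays, which support
% element-wise comparisons such as labels == "Leak".
tblStr = readtable(demoFile, 'TextType', 'string');

disp(class(tblChar.Description))
disp(class(tblStr.Description))
```

Importing text as string arrays is what makes the later comparisons against tbl.Category work with the == operator.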

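The structure and contents of the table variable tbl are as follows:

[Figure: preview of the table tbl, which includes the Description and Category columns used below]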

The example builds word clouds from the tbl.Description column under various conditions:

textData = tbl.Description;

(1) Create a word cloud directly from the tbl.Description column:

figure
wordcloud(textData);
title("Factory Reports")


(2) Create separate word clouds of tbl.Description for individual labels in the tbl.Category column:

figure
labels = tbl.Category;

subplot(1,2,1)
idx = labels == "Leak";
wordcloud(textData(idx), 'Color', 'blue');
title("Leak")

subplot(1,2,2)
idx = labels == "Mechanical Failure";
wordcloud(textData(idx), 'Color', 'magenta');
title("Mechanical Failure")

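The per-category selection relies on logical indexing: comparing a string array with a scalar string yields a logical mask that picks the matching rows. The sketch below shows the same idea in a loop over all distinct labels. The labels and report texts are hypothetical stand-ins for tbl.Category and tbl.Description.

```matlab
% Hypothetical labels and report texts standing in for tbl.Category and tbl.Description.
labels   = ["Leak"; "Mechanical Failure"; "Leak"; "Electronic Failure"];
textData = ["Coolant leak under sorter"; "Belt slipping on conveyor"; ...
            "Oil leaking from arm"; "Fuse blown in controller"];

categoryNames = unique(labels);          % one entry per distinct label
for k = 1:numel(categoryNames)
    idx = labels == categoryNames(k);    % logical mask, same size as labels
    subset = textData(idx);              % reports belonging to this category
    fprintf("%s: %d report(s)\n", categoryNames(k), numel(subset));
end
```

Inside the loop, subset could be passed to wordcloud in its own subplot or tile, in the same way as the two hand-written cases above.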

Here, the Color property of wordcloud sets the color of the less frequent words. To set the color of the most frequent (largest) words, set the HighlightColor property instead, for example:

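figure
tiledlayout(1,2)
cost = tbl.Cost;       % Cost column of the reports table

nexttile
idx = cost > 100;      % reports costing more than $100
wordcloud(textData(idx), 'HighlightColor', 'blue');
title("Cost > $100")

nexttile
idx = cost > 1000;     % reports costing more than $1,000
wordcloud(textData(idx), 'HighlightColor', 'red');
title("Cost > $1,000")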

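The difference between Color and HighlightColor can be tried on a few hand-made words. This is a sketch under stated assumptions: the words and counts are hypothetical, and the call uses the wordcloud(words, sizeData) form, so it does not depend on the factory data.

```matlab
words  = ["leak" "coolant" "fuse" "sorter" "noise"];   % hypothetical words
counts = [12 9 4 3 1];                                   % hypothetical frequencies

figure
wc = wordcloud(words, counts, 'Color', 'blue', 'HighlightColor', 'red');
title("Color vs. HighlightColor")

% Both properties are stored as RGB triplets on the chart object.
disp(wc.Color)
disp(wc.HighlightColor)
```

The largest words are drawn in the highlight color and the remaining words in the regular color.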

(3) WordCloudChart properties

The wordcloud function can return a WordCloudChart object:

figure
wc = wordcloud(textData);
title("Factory Reports")

Displaying wc in the Command Window lists its properties:

wc = 

  WordCloudChart (Factory Reports) with properties:

           WordData: [1×315 string]
           SizeData: [1×315 double]
    MaxDisplayWords: 100

  Show all properties

                   Box: off
                 Color: [1×3 double]
              FontName: 'Helvetica'
      HandleVisibility: 'on'
        HighlightColor: [1×3 double]
         InnerPosition: [1×4 double]
                Layout: [0×0 matlab.ui.layout.LayoutOptions]
             LayoutNum: 1
       MaxDisplayWords: 100
         OuterPosition: [1×4 double]
                Parent: [1×1 Figure]
              Position: [1×4 double]
    PositionConstraint: 'outerposition'
                 Shape: 'oval'
              SizeData: [1×315 double]
             SizePower: 0.5000
          SizeVariable: ''
           SourceTable: [0×0 table]
                 Title: 'Factory Reports'
         TitleFontName: 'Helvetica'
                 Units: 'normalized'
               Visible: on
              WordData: [1×315 string]
          WordVariable: ''
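The WordData and SizeData properties listed above can also be read back, for instance to find the words with the largest sizes. A minimal sketch with hypothetical words and counts:

```matlab
words  = ["leak" "coolant" "fuse" "sorter" "noise"];   % hypothetical words
counts = [12 9 4 3 1];                                   % hypothetical frequencies

figure
wc = wordcloud(words, counts);

% Sort the sizes in descending order and keep the three largest words.
[~, order] = sort(wc.SizeData, 'descend');
topWords = wc.WordData(order(1:3));
disp(topWords)
```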

Properties can be set with dot notation, for example:

figure
wc = wordcloud(textData);
wc.MaxDisplayWords = 3;
title("Factory Reports")

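Several properties can be changed on the same object in this way. The sketch below uses hypothetical words and counts and sets three of the properties that appear in the listing above (MaxDisplayWords, Shape, and Title).

```matlab
words  = ["leak" "coolant" "fuse" "sorter" "noise"];   % hypothetical words
counts = [12 9 4 3 1];                                   % hypothetical frequencies

figure
wc = wordcloud(words, counts);
wc.MaxDisplayWords = 3;          % show only the three largest words
wc.Shape = 'rectangle';          % the default is 'oval'
wc.Title = 'Top three words';    % same effect as calling title(...)
```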


Example 2: Create Word Cloud With String Arrays

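This example comes from "Create Word Cloud With String Arrays - MathWorks" [2] and shows how to extract text data from a .txt file and create a word cloud, including some basic text preprocessing steps.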

The file sonnets.txt contains Shakespeare's sonnets.

First, read sonnets.txt with the fileread function:

sonnets = fileread('sonnets.txt');
>> whos sonnets
  Name         Size                 Bytes  Class    Attributes

  sonnets      1x100266            200532  char

Next, convert sonnets to the string data type:

sonnets = string(sonnets);
>> whos sonnets
  Name         Size             Bytes  Class     Attributes

  sonnets      1x1             200678  string
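The change in Size from 1x100266 to 1x1 reflects how the two types store text: a char array has one element per character, while a string scalar holds the whole text as a single element. A small sketch with a made-up phrase:

```matlab
c = 'From fairest creatures';   % char row vector: one element per character
s = string(c);                  % string scalar: the whole text is one element

disp(size(c))        % 1 22
disp(size(s))        % 1 1
disp(strlength(s))   % 22, the character count is still available
```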

At this point, sonnets is a single string that contains many newline characters.

Use the splitlines function to split it into lines:

sonnets = splitlines(sonnets);
>> whos sonnets
  Name            Size             Bytes  Class     Attributes
  sonnets      2625x1             320806  string

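What splitlines does can be seen on a short string. In this sketch the text is made up, and the variable name textLines is chosen only for illustration. Note that an empty line in the input becomes an empty string in the output.

```matlab
% Three lines of text, the middle one empty.
txt = "THE SONNETS" + newline + newline + "by William Shakespeare";

textLines = splitlines(txt);   % column vector with one string per line
disp(size(textLines))          % 3 1
disp(textLines(2) == "")       % the blank line is kept as an empty string
```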

Next, replace some punctuation marks with spaces:

p = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets, p, " ");
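Replacing punctuation with a space, rather than deleting it, keeps words apart when the punctuation mark is the only thing between them. The sketch below compares replace with erase on a made-up line in which the space after the comma is deliberately missing.

```matlab
p = ["." "?" "!" "," ";" ":"];

% Hypothetical line with no space after the comma.
verse = "Thou art thy mother's glass,and she in thee";

withSpaces = replace(verse, p, " ");   % punctuation becomes a space
withoutGap = erase(verse, p);          % punctuation is simply deleted

disp(withSpaces)   % Thou art thy mother's glass and she in thee
disp(withoutGap)   % Thou art thy mother's glassand she in thee
```

The apostrophe is not in the list p, so "mother's" stays intact in both cases.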

Then split sonnets into a string array in which each element is an individual word:

sonnets = join(sonnets);
sonnets = split(sonnets);
>> whos sonnets
  Name             Size             Bytes  Class     Attributes

  sonnets      17712x1             976416  string

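The join-then-split step can be followed on two short lines. In this sketch the text and the variable names are made up: join merges the column of lines into one string separated by spaces, and split then cuts that string at whitespace.

```matlab
verseLines = ["to be"; "or not to be"];   % hypothetical 2-by-1 string array

joined = join(verseLines);   % 1-by-1 string: "to be or not to be"
tokens = split(joined);      % 6-by-1 string array, one word per element

disp(joined)
disp(size(tokens))           % 6 1
```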

Remove words with fewer than five characters:

sonnets(strlength(sonnets) < 5) = [];
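The filter combines strlength with logical indexing and deletion. A short sketch on a made-up word list shows that it also discards empty strings, since their length is 0.

```matlab
tokens = ["thou"; "beauty"; "love"; "sweet"; ""];   % hypothetical words

tooShort = strlength(tokens) < 5;   % logical mask: true for short or empty entries
tokens(tooShort) = [];              % delete the marked elements

disp(tokens)   % "beauty" and "sweet" remain
```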

Finally, convert sonnets to a categorical array and plot the word cloud:

C = categorical(sonnets);
figure
wordcloud(C);
title("Sonnets Word Cloud")

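Converting to categorical works here because wordcloud(C) sizes each word by how often its category occurs. Those frequencies can be inspected directly with categories and countcats. A minimal sketch on a made-up word list:

```matlab
tokens = ["beauty"; "sweet"; "beauty"; "summer"; "beauty"; "sweet"];   % hypothetical words
C = categorical(tokens);

names  = categories(C);   % distinct words as a cell array, in category order
counts = countcats(C);    % number of occurrences, in the same order as names

% Sort from most to least frequent.
[counts, order] = sort(counts, 'descend');
names = names(order);

for k = 1:numel(names)
    fprintf("%s: %d\n", names{k}, counts(k));
end
```

The same names and counts could also be passed to wordcloud(words, sizeData) if the frequencies need to be adjusted before plotting.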


Reference

[1] Visualize Text Data Using Word Clouds - MathWorks.

[2] Create Word Cloud With String Arrays - MathWorks.

[3] wordcloud - MathWorks.