Redirect Figure Links in Markdown Files using MATLAB

Oct. 02, 2023

A few days ago, I started using a public GitHub repository as the image host for my personal website [1]. Before that, I had used the Tencent Cloud COS (Cloud Object Storage) service for a long time, so all the pictures I had uploaded were stored there. Today I migrated these pictures to my repository [1]. That part is easy, but I also need to redirect the picture links in all the .md files, which would be slow and painful to do by hand. So I decided to use a MATLAB script to do it automatically.

There are three subfolders in my Tencent Cloud COS space, namely DeLLLaptop, img and imgpersonal. I downloaded them and put them directly into the migration subdirectory of my new GitHub repository [1]:

[Screenshot: image-20231002162321331]

So all I need to do is change the leading part of each original picture link, that is, convert links from:

https://blogimages-1309804558.cos.ap-nanjing.myqcloud.com/imgpersonal/image-20220707190620188.png

to

https://github.com/HelloWorld-1017/blog-images/blob/main/migration/imgpersonal/image-20220707190620188.png?raw=true

N.B.: The ?raw=true after the file extension .png seems to be necessary; otherwise the picture does not display properly. Note also that the r in raw must be lowercase.
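For a single link, the conversion can be sketched as follows. This is only an illustration of the prefix swap described above, using the example link from this post; the variable names are hypothetical.

```matlab
% Hypothetical variable names; the URLs are the ones quoted in this post.
oldPrefix = "https://blogimages-1309804558.cos.ap-nanjing.myqcloud.com/";
newPrefix = "https://github.com/HelloWorld-1017/blog-images/blob/main/migration/";

oldLink = oldPrefix + "imgpersonal/image-20220707190620188.png";

% Swap the prefix, then append ?raw=true (lowercase r, as noted above).
newLink = replace(oldLink, oldPrefix, newPrefix) + "?raw=true";
disp(newLink)
```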

The .md source files of my website are all in the _posts folder under the root directory (this is the convention when using Jekyll to deploy a website), so I make a copy of it and name the copy _posts_backup. After that, I write the following code to batch-process the links. Basically, the code first reads the content of each original .md file and matches every link that starts with https://blogimages-1309804558.cos.ap-nanjing.myqcloud.com/ and ends with .png (or .jpg, .svg, .gif); then the leading part of each link is replaced with https://github.com/HelloWorld-1017/blog-images/blob/main/migration/ and the suffix ?raw=true is appended to the link. Finally, the code writes the new content to the new .md file.

clc,clear,close all

mdFiles = dir("_posts_backup/*.md");

for i = 1:numel(mdFiles)
    fileName = mdFiles(i).name;

    % Read the backup copy of the post as a string scalar
    content = fileread(fullfile(pwd,"_posts_backup",fileName));
    content = string(content);

    % Rewrite the links one image extension at a time
    newContent = helperReplaceContent(content,".jpg");
    newContent = helperReplaceContent(newContent,".png");
    newContent = helperReplaceContent(newContent,".svg");
    newContent = helperReplaceContent(newContent,".gif");

    % Write the updated content to the file of the same name in _posts
    fileID = fopen(fullfile(pwd,"_posts",fileName),'w');
    fwrite(fileID,newContent);
    fclose(fileID);
    clear newContent
    disp(i) % progress indicator
end

function newContent = helperReplaceContent(content,fmt)
    oldPrefix = "https://blogimages-1309804558.cos.ap-nanjing.myqcloud.com/";
    newPrefix = "https://github.com/HelloWorld-1017/blog-images/blob/main/migration/";
    % Full old links (boundaries included) and their paths without prefix or extension
    text_withBoundaries = extractBetween(content,oldPrefix,fmt,"Boundaries","inclusive");
    text_withoutBoundaries = extractBetween(content,oldPrefix,fmt,"Boundaries","exclusive");
    % Scalar strings expand over the string array, so repmat is not needed
    newText = newPrefix+text_withoutBoundaries+fmt+"?raw=true";
    newContent = replace(content,text_withBoundaries,newText);
end
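
To see what the two extractBetween calls in the helper return, here is a small sketch on a single Markdown image line. The file name img/a.png and the variable names are made up for illustration.

```matlab
% Sample Markdown line with a made-up file name (img/a.png), for illustration.
startStr = "https://blogimages-1309804558.cos.ap-nanjing.myqcloud.com/";
sample = "![fig](" + startStr + "img/a.png)";

% "inclusive" keeps both boundaries: the whole old link.
oldLink = extractBetween(sample, startStr, ".png", "Boundaries", "inclusive");

% "exclusive" drops both boundaries: only the path without the extension.
stem = extractBetween(sample, startStr, ".png", "Boundaries", "exclusive");

% Rebuild the link the same way the helper does, then substitute it.
newLink = "https://github.com/HelloWorld-1017/blog-images/blob/main/migration/" + stem + ".png?raw=true";
disp(replace(sample, oldLink, newLink))
```

The inclusive result is the text that gets replaced, and the exclusive result is the part that is carried over into the new link.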

This script looks easy, and it actually is, but I still spent hours on it because I have little experience with processing text using regular expressions. Besides, MATLAB has many built-in text functions that were unfamiliar to me, such as regexp [2], pattern [3], regexprep [4], strfind [5], strrep [6], extract [7], and the two I finally adopted, extractBetween [8] and replace [9], so there may be various ways to achieve my goal. All these functions belong to MATLAB Language Fundamentals (not Text Analytics Toolbox, by the way) and operate on characters and strings. They are basic tools for text analysis in fields such as natural language processing and web crawling.

Anyway, the MATLAB script basically works as expected, although it is kind of ugly (I believe there is a better and more concise way to do this; I just don't know it yet).
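
One candidate for a more concise version is regexprep, one of the functions listed above. The following is a minimal sketch, not the script that was actually run on the posts: the variable names and the sample file names are hypothetical, and the pattern assumes that a link contains no whitespace or closing parenthesis.

```matlab
% Hypothetical variable names; the two prefixes are the ones used in this post.
oldPrefix = 'https://blogimages-1309804558.cos.ap-nanjing.myqcloud.com/';
newPrefix = 'https://github.com/HelloWorld-1017/blog-images/blob/main/migration/';

% Escape regex metacharacters (such as ".") in the old prefix, then capture
% the path up to the first supported image extension. The character class
% [^\s)] keeps a match from running past the end of a Markdown link.
expr = [regexptranslate('escape', oldPrefix), '([^\s)]+?\.(?:png|jpg|svg|gif))'];

% Sample text with made-up file names, for illustration only.
content = ['![a](', oldPrefix, 'img/a.png) and ![b](', oldPrefix, 'img/b.jpg)'];

% $1 is the captured path; all four extensions are handled in a single pass.
newContent = regexprep(content, expr, [newPrefix, '$1?raw=true']);
disp(newContent)
```

Because the old prefix no longer appears in the output, running the replacement a second time on the same text should leave it unchanged.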


References