Running Deep Learning Model Inference with the Pretrained ResNet-101
Jul. 26, 2024
Stevens et al.’s book, Deep Learning with PyTorch [1] (Subchapter 2.1, A pretrained network that recognizes the subject of an image), provides a simple example showing how to run model inference with the pretrained ResNet-101. The complete script is as follows:
import torch
from torchvision import models
from torchvision.models import ResNet101_Weights
from PIL import Image
from torchvision import transforms
# Obtain pretrained ResNet-101
resnet101 = models.resnet101(weights=ResNet101_Weights.IMAGENET1K_V1)
# Define image preprocess pipeline
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
# Load an image and preprocess it
img = Image.open("bobby.jpg") # img.size: (1280, 720)
# img # Show the image inline
# img.show() # Show the image in a new pop-up viewer window
img_t = preprocess(img)  # img_t.shape: torch.Size([3, 224, 224])
batch_t = torch.unsqueeze(img_t, 0)  # batch_t.shape: torch.Size([1, 3, 224, 224])
# NOTE: Put the network in `eval()` mode to do inference
resnet101.eval()
out = resnet101(batch_t) # out.shape: torch.Size([1, 1000])
# Load labels from `.txt` file (1,000 labels)
with open('imagenet_classes.txt') as f:
    labels = [line.strip() for line in f.readlines()]
# Determine the index corresponding to the maximum score in the `out` tensor
_, index = torch.max(out, 1)  # index: tensor([207])
percentage = torch.nn.functional.softmax(out, dim=1)[0]*100 # percentage.shape: torch.Size([1000])
(labels[index[0]], percentage[index[0]].item()) # ('golden retriever', 96.57185363769531)
# Determine the indexes corresponding to the top-5 maximum score in the `out` tensor
_, indices = torch.sort(out, descending=True)
[(labels[idx], percentage[idx].item()) for idx in indices[0][:5]]
# [('golden retriever', 96.57185363769531),
# ('Labrador retriever', 2.6082706451416016),
# ('cocker spaniel, English cocker spaniel, cocker', 0.2699621915817261),
# ('redbone', 0.17958936095237732),
# ('tennis ball', 0.10991999506950378)]
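
The percentages in the script come from applying softmax to the raw scores in `out`. As a rough illustration of that arithmetic, here is a minimal pure-Python sketch. The helper names `softmax_percent` and `top_k`, and the toy labels and scores, are made up for this example; they are not part of the book's code or of PyTorch.

```python
import math


def softmax_percent(scores):
    """Convert raw scores (logits) to percentages that sum to 100."""
    # Subtract the maximum first so exp() cannot overflow; the result is unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(labels, scores, k):
    """Return the k (label, percentage) pairs with the highest scores."""
    percentages = softmax_percent(scores)
    # Order the indices by score, highest first, much like
    # torch.sort(out, descending=True) does in the script.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], percentages[i]) for i in order[:k]]


# Hypothetical scores for three made-up classes (not real ResNet-101 output).
toy_labels = ["cat", "dog", "ball"]
toy_scores = [1.0, 3.0, 0.5]
toy_top2 = top_k(toy_labels, toy_scores, k=2)
print(toy_top2)
```

Because softmax preserves the ordering of the scores, sorting by raw score or by percentage gives the same ranking, which is why the script can sort `out` directly and look the percentages up afterwards.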
Required files
- bobby.jpg: dlwpt-code/data/p1ch2/bobby.jpg at master · deep-learning-with-pytorch/dlwpt-code.
- imagenet_classes.txt: dlwpt-code/data/p1ch2/imagenet_classes.txt at master · deep-learning-with-pytorch/dlwpt-code.
Notes
- Image preprocessing step: We have to preprocess the input images so they are the right size and so that their values (colors) sit roughly in the same numerical range. To do that, the torchvision module provides transforms, which allow us to quickly define pipelines of basic preprocessing functions [1].
- torch.unsqueeze: Returns a new tensor with a dimension of size one inserted at the specified position [2].
- Inference: The process of running a trained model on new data is called inference in deep learning circles [1].
- labels = [line.strip() for line in f.readlines()]: Reads every line of the file and strips the leading and trailing whitespace (including the newline), giving a list of 1,000 label strings.
- torch.max: Returns a namedtuple (values, indices), where values is the maximum value of each row of the input tensor in the given dimension dim, and indices is the index location of each maximum value found (argmax) [5].
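
The notes above can be tried out on small synthetic tensors, without downloading the model or the image. The following is a minimal sketch, assuming PyTorch is installed. The tensor sizes and values are hypothetical stand-ins, and the normalization is written out by hand as (x - mean) / std to show the arithmetic rather than by calling transforms.Normalize.

```python
import torch

# Stand-in for a preprocessed image: 3 channels, 4x4 pixels, every value 0.5.
# (Hypothetical data; the real img_t in the script has shape [3, 224, 224].)
img_t = torch.full((3, 4, 4), 0.5)

# Per-channel normalization written out by hand: (x - mean) / std.
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])
normalized = (img_t - mean[:, None, None]) / std[:, None, None]

# Insert a batch dimension of size one at position 0.
batch_t = torch.unsqueeze(normalized, 0)
batch_shape = tuple(batch_t.shape)  # (1, 3, 4, 4)

# Stand-in for the network output: a batch of one with four class scores.
out = torch.tensor([[0.1, 2.5, -1.0, 0.7]])
values, indices = torch.max(out, 1)  # maximum along dimension 1, one per row
best_index = indices[0].item()       # position of the largest score in row 0
best_value = values[0].item()        # the largest score itself
print(batch_shape, best_index, best_value)
```

The index returned by torch.max is what the script uses to look up the predicted label in the labels list.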
References
1. Deep Learning with PyTorch, Eli Stevens, Luca Antiga, and Thomas Viehmann, 2020, GitHub repository: deep-learning-with-pytorch, pp. 22-27.