Convert a 4-Channel (RGBA) Image to a Normal 3-Channel (RGB) Image

Jul. 29, 2024

In a previous blog1, I used the following images to test the image-captioning model ImageCaptioning.pytorch2:

(Figure: the original set of test images)

However, when I added the image below, R66.png, to the set of images to be captioned:

(Figure: R66.png)

I got an error saying:

Traceback (most recent call last):
  File "G:\...\ImageCaptioning.pytorch-master\eval.py", line 136, in <module>
    loss, split_predictions, lang_stats = eval_utils.eval_split(
                                          ^^^^^^^^^^^^^^^^^^^^^^
  File "G:\...\ImageCaptioning.pytorch-master\eval_utils.py", line 84, in eval_split
    data = loader.get_batch(split)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\...\ImageCaptioning.pytorch-master\dataloaderraw.py", line 115, in get_batch
    img = Variable(preprocess(img))
                   ^^^^^^^^^^^^^^^
  File "G:\...\venv\Lib\site-packages\torchvision\transforms\transforms.py", line 95, in __call__
    img = t(img)
          ^^^^^^
  File "G:\...\venv\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\...\venv\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\...\venv\Lib\site-packages\torchvision\transforms\transforms.py", line 277, in forward
    return F.normalize(tensor, self.mean, self.std, self.inplace)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\...\venv\Lib\site-packages\torchvision\transforms\functional.py", line 350, in normalize
    return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\...\venv\Lib\site-packages\torchvision\transforms\_functional_tensor.py", line 926, in normalize
    return tensor.sub_(mean).div_(std)
           ^^^^^^^^^^^^^^^^^
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0

This indicates that the image array has four channels: besides the RGB channels, this image file carries an additional alpha (A) channel (used to specify opacity)3, so the three-element mean and std of the Normalize transform cannot be broadcast over its channel dimension. We can verify this by printing the array shape of each input image:

from PIL import Image
import numpy as np
import os

folder_path = '.'
for file_name in os.listdir(folder_path):
    image_path = os.path.join(folder_path, file_name)

    # Only inspect image files
    if file_name.endswith(('.png', '.jpg')):
        image = Image.open(image_path)
        # Print the file name and the shape of its pixel array
        print("%-19s: %s" % (file_name, np.array(image).shape))
Camus.jpg          : (900, 1321)
horse.jpg          : (1220, 1500, 3)
moon.jpg           : (1800, 2880, 3)
QingFengChuiFu.jpg : (1080, 1920, 3)
R66.png            : (1800, 2880, 4)
sunset.jpg         : (1200, 1920, 3)
Tommy.jpg          : (3412, 5118, 3)
YourName.jpg       : (1200, 1920, 3)
zebra.jpg          : (256, 316, 3)
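
We can also reproduce the error in isolation. In the sketch below, the mean/std values are the common ImageNet statistics, assumed here only for illustration; any three-element mean/std triggers the same mismatch on a 4-channel tensor:

import torch
from torchvision import transforms

# Normalize with 3-channel statistics (ImageNet values, assumed for illustration)
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

normalize(torch.rand(3, 224, 224))      # 3-channel tensor: works

try:
    normalize(torch.rand(4, 224, 224))  # 4-channel tensor: fails
except RuntimeError as e:
    print(e)  # The size of tensor a (4) must match the size of tensor b (3) ...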

To avoid the above error, we can use PIL's convert function4 to convert the RGBA image to an RGB image:

image = Image.open("R66.png").convert('RGB')
print(np.array(image).shape)
image.save("R66_RGB.png")
(1800, 2880, 3)
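
Incidentally, if there are many such files, the same conversion can be applied in a loop (a sketch: it overwrites matching files in place, and the folder path is hypothetical):

from PIL import Image
import os

folder_path = '.'  # hypothetical: the folder holding the images
for file_name in os.listdir(folder_path):
    if file_name.endswith('.png'):
        image_path = os.path.join(folder_path, file_name)
        image = Image.open(image_path)
        # Convert only images that actually carry an alpha channel
        if image.mode == 'RGBA':
            image.convert('RGB').save(image_path)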

After conversion, the image-captioning model can caption it as usual:

".\data\images\new_images\R66_RGB.png": a close up of a yellow and orange cat

But the only correct part of this caption is “yellow” 😂 Still, the image at least becomes a valid input for the model.
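
Alternatively, the alpha channel could be dropped where the image is loaded, around the dataloaderraw.py line shown in the traceback, so that no files need converting beforehand. The following is only a sketch of the idea, assuming the loader holds the image as an (H, W, C) NumPy array at that point; the repository's actual loading code may differ:

# Sketch only (not the repo's actual code): drop the alpha channel
# before preprocessing, assuming img is an (H, W, C) NumPy array here
if img.ndim == 3 and img.shape[2] == 4:
    img = img[:, :, :3]

For an RGBA image this has the same effect as PIL's convert('RGB'), which simply discards the alpha channel rather than compositing against a background.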


By the way, since R66.png was the only image with a .png extension among these figures, I guessed that a PNG file must always have RGBA channels, but the result above (R66_RGB.png is a 3-channel PNG) shows that I was wrong.
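
In fact, PIL exposes the color mode directly via the mode attribute, which is a quicker check than inspecting array shapes; a small sketch (the expected output is noted in the comments):

from PIL import Image

for name in ("R66.png", "R66_RGB.png"):
    print(name, Image.open(name).mode)
# R66.png RGBA
# R66_RGB.png RGB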


References