Data Augmentation Tips

1. Albumentations Library

Features

  • An open-source library that offers a wide range of augmentation techniques and lets you build fast augmentation pipelines

  • Written by Kaggle Masters and actively used in many competitions on Kaggle and topcoder and at CVPR and MICCAI

  • Supports a variety of pixel-level and spatial-level transforms; each augmentation can be applied with a given probability, and a OneOf block picks one augmentation out of several

Benchmarking

  • Benchmarked on an Intel Xeon Platinum 8168 CPU using 2,000 images from the ImageNet validation set

  • The table below shows the number of images processed per second on a single core; Albumentations is more than twice as fast on most transforms

Code Snippets

How to use

  • Very similar to PyTorch's torchvision (it takes 5 to 10 minutes to pick up)

  • Documentation: https://albumentations.readthedocs.io/en/latest/

  • Easy to try out in Colab: https://colab.research.google.com/drive/1JuZ23u0C0gx93kV0oJ8Mq0B6CBYhPLXy#scrollTo=GwFN-In3iagp&forceEdit=true&offline=true&sandboxMode=true

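import numpy as np
import albumentations
from albumentations.pytorch import ToTensorV2
from torchvision import transforms

torchvision_transform = transforms.Compose([
    transforms.ToPILImage(),  # torchvision's Resize/RandomCrop expect a PIL image (or tensor), not a NumPy array
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Same spatial transforms as torchvision_transform
albumentation_transform = albumentations.Compose([
    albumentations.Resize(256, 256),
    albumentations.RandomCrop(224, 224),
    albumentations.HorizontalFlip(),  # Same as transforms.RandomHorizontalFlip()
    # ToTensorV2 replaces the older albumentations.pytorch.transforms.ToTensor;
    # unlike torchvision's ToTensor, it does not rescale pixel values to [0, 1].
    ToTensorV2(),
])

# img is assumed to be a 2-D grayscale uint8 array of shape (H, W);
# add a channel axis and repeat it to obtain an (H, W, 3) image.
img = img[:, :, np.newaxis]
img = np.repeat(img, 3, axis=2)
torchvision_img = torchvision_transform(img)
albumentation_img = albumentation_transform(image=img)['image']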
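
To make the spatial steps in the two pipelines above more concrete, here is a minimal NumPy sketch of a random crop followed by a horizontal flip applied with some probability. The function name `random_crop_and_flip` and its parameters are hypothetical; the sketch only illustrates the idea and is not how torchvision or Albumentations implement these transforms.

```python
import numpy as np

def random_crop_and_flip(img, crop_h, crop_w, flip_p=0.5, rng=None):
    """Randomly crop an (H, W, C) image, then flip it left-right with probability flip_p."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed image size")
    # Top-left corner of the crop window, drawn uniformly over all valid positions.
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    out = img[top:top + crop_h, left:left + crop_w]
    if rng.random() < flip_p:
        out = out[:, ::-1]  # reverse the width axis
    return np.ascontiguousarray(out)

demo_rng = np.random.default_rng(0)
demo_img = demo_rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
cropped = random_crop_and_flip(demo_img, 224, 224, rng=demo_rng)
print(cropped.shape)  # (224, 224, 3)
```
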

Probability Calculation

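# Note: IAAAdditiveGaussianNoise is an imgaug-backed transform from older
# Albumentations releases and may not be available in recent versions.
from albumentations import (
   RandomRotate90, IAAAdditiveGaussianNoise, GaussNoise, Compose, OneOf
)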
import numpy as np

def aug(p1, p2, p3):
   return Compose([
       RandomRotate90(p=p2),
       OneOf([
           IAAAdditiveGaussianNoise(p=0.9),
           GaussNoise(p=0.6),
       ], p=p3)
   ], p=p1)

image = np.ones((300, 300, 3), dtype=np.uint8)
mask = np.ones((300, 300), dtype=np.uint8)
whatever_data = "my name"
augmentation = aug(p1=0.9, p2=0.7, p3=0.3)
data = {"image": image, "mask": mask, "whatever_data": whatever_data, "additional": "hello"}
augmented = augmentation(**data)
image, mask, whatever_data, additional = augmented["image"], augmented["mask"], augmented["whatever_data"], augmented["additional"]

  • $p_1$: decides whether the augmentation pipeline is applied at all (if 1, augmentation is always applied)

  • $p_2$: decides whether the 90-degree rotation is applied

  • $p_3$: decides whether the OneOf block is applied (the probabilities inside the block are normalized to sum to 1, and one augmentation is then chosen according to the normalized probabilities)

    • Example: with a probability of 0.9 for IAAAdditiveGaussianNoise and 0.6 for GaussNoise, the normalized probabilities are 0.6 and 0.4 respectively

      • $0.6 = 0.9 / (0.9 + 0.6), \quad 0.4 = 0.6 / (0.9 + 0.6)$

  • Each augmentation is therefore applied with the following probability

    • RandomRotate90: $p_1 \cdot p_2$

    • IAAAdditiveGaussianNoise: $p_1 \cdot p_3 \cdot 0.6$

    • GaussNoise: $p_1 \cdot p_3 \cdot 0.4$
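
The arithmetic above can be checked with a few lines of plain Python. This is a sketch that follows the rule described in the list (normalize the probabilities inside OneOf, then multiply down the nesting); the helper name `effective_probabilities` is hypothetical and is not part of Albumentations.

```python
def effective_probabilities(p1, p2, p3, one_of_probs):
    """Return (normalized OneOf weights, overall probability that each transform fires).

    one_of_probs maps a transform name to the p passed to it inside the OneOf block.
    """
    total = sum(one_of_probs.values())
    normalized = {name: p / total for name, p in one_of_probs.items()}
    effective = {"RandomRotate90": p1 * p2}
    for name, weight in normalized.items():
        # Outer Compose fires (p1), the OneOf block fires (p3), then this transform is picked.
        effective[name] = p1 * p3 * weight
    return normalized, effective

normalized, effective = effective_probabilities(
    p1=0.9, p2=0.7, p3=0.3,
    one_of_probs={"IAAAdditiveGaussianNoise": 0.9, "GaussNoise": 0.6},
)
for name, p in effective.items():
    print(f"{name}: {p:.3f}")
# RandomRotate90: 0.630
# IAAAdditiveGaussianNoise: 0.162
# GaussNoise: 0.108
```
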

2. CutMix

Background

  • Cutout: removes part of a training image by overlaying a patch of black pixels or random noise → works well in some cases, but loses a lot of information about small objects and important regions

  • Mixup: blends two samples through linear interpolation of the images and their ground-truth labels → the excessive smoothing effect makes it less suitable for object detection

  • CutMix: aims to preserve the information from both images
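
For comparison with CutMix, the two earlier techniques can be sketched in NumPy as follows. This is a simplified illustration: the function names `cutout` and `mixup`, the fixed patch position, and the value of `lam` are assumptions made for the example, whereas in practice the patch location and mixing weight are drawn at random.

```python
import numpy as np

def cutout(img, top, left, size):
    """Return a copy of img with a size x size square replaced by black pixels."""
    out = img.copy()
    out[top:top + size, left:left + size] = 0
    return out

def mixup(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two images and their one-hot labels with weight lam."""
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y

x_a = np.ones((8, 8, 3), dtype=np.float32)   # stand-in for an image of class A
x_b = np.zeros((8, 8, 3), dtype=np.float32)  # stand-in for an image of class B
y_a = np.array([1.0, 0.0])
y_b = np.array([0.0, 1.0])

erased = cutout(x_a, top=2, left=2, size=4)
mixed_x, mixed_y = mixup(x_a, y_a, x_b, y_b, lam=0.7)

print(erased.mean())  # share of x_a's pixels that survive Cutout
print(mixed_y)        # soft label after Mixup
```

Cutout discards the erased pixels outright, and Mixup changes every pixel of the image; CutMix replaces the erased region with pixels from the second image instead.
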

Algorithm

  • First, define the training image and label, the classes, and the training samples belonging to each class as follows:

$$
(x, y): \text{Training image, label} \\ (A, B): \text{Training class} \\ (x_A, y_A), (x_B, y_B): \text{Training sample}
$$

  • A new sample $(\tilde{x}, \tilde{y})$ can then be generated from the two samples $(x_A, y_A)$ and $(x_B, y_B)$.

    ($\odot$: element-wise multiplication)

$$
\tilde{x} = \mathrm{M} \odot x_A + (1 - \mathrm{M}) \odot x_B \\ \tilde{y} = \lambda y_A + (1 - \lambda) y_B
$$
  • $\mathrm{M}$: a binary mask of size $W \times H$ whose entries are 0 or 1; it determines which part of the image is mixed

    • The region covered by $\mathrm{M}$ is determined by the parameter $\lambda$: a bounding box $\mathrm{B}$ is sampled that marks the region to cut out of both images

    • Given two images $x_A$ and $x_B$, the chosen bounding box region inside $x_A$ is filled with the same region taken from $x_B$

    • In other words, the bounding box region $\mathrm{B}$ of $x_A$ is removed (crop) and replaced by the patch cut from the bounding box $\mathrm{B}$ of $x_B$ (paste).

  • $\lambda$: the mixing ratio, sampled from a beta distribution

  • Expressed as formulas:

$$
\mathrm{B}: \text{Bounding box coordinates } (r_x, r_y, r_w, r_h) \\ r_x \sim \text{Unif}(0, W), \quad r_w = W\sqrt{1-\lambda} \\ r_y \sim \text{Unif}(0, H), \quad r_h = H\sqrt{1-\lambda}
$$

  • $r_x, r_y$ are the coordinates of the bounding box center and are drawn from a uniform distribution

  • $r_w, r_h$ are the width and height of the bounding box; with these definitions the cropped area ratio $1-\lambda$ follows as shown below.

  • $\dfrac{r_w r_h}{WH} = 1 - \lambda$
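
The formulas above can be traced with a small NumPy sketch that builds the mask $\mathrm{M}$ from a box and mixes one pair of samples. The names `box_mask` and `cutmix_pair` are hypothetical, and the box center is fixed at the middle of the image so the result is reproducible, whereas the algorithm draws it uniformly. With $\lambda = 0.75$ on a 100 x 100 image the box is 50 x 50, so the cropped area ratio equals $1 - \lambda = 0.25$.

```python
import numpy as np

def box_mask(height, width, center_x, center_y, lam):
    """Build the binary mask M: 0 inside the bounding box B, 1 everywhere else."""
    r_w = int(width * np.sqrt(1.0 - lam))
    r_h = int(height * np.sqrt(1.0 - lam))
    x1, x2 = max(center_x - r_w // 2, 0), min(center_x + r_w // 2, width)
    y1, y2 = max(center_y - r_h // 2, 0), min(center_y + r_h // 2, height)
    mask = np.ones((height, width), dtype=np.float32)
    mask[y1:y2, x1:x2] = 0.0
    return mask

def cutmix_pair(x_a, y_a, x_b, y_b, mask, lam):
    """x_tilde = M * x_a + (1 - M) * x_b,  y_tilde = lam * y_a + (1 - lam) * y_b."""
    m = mask[..., None]  # broadcast the mask over the channel axis
    x_tilde = m * x_a + (1.0 - m) * x_b
    y_tilde = lam * y_a + (1.0 - lam) * y_b
    return x_tilde, y_tilde

H, W, lam = 100, 100, 0.75
x_a = np.ones((H, W, 3), dtype=np.float32)   # stand-in image of class A
x_b = np.zeros((H, W, 3), dtype=np.float32)  # stand-in image of class B
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

M = box_mask(H, W, center_x=50, center_y=50, lam=lam)
x_tilde, y_tilde = cutmix_pair(x_a, y_a, x_b, y_b, M, lam)

print(1.0 - M.mean())  # cropped area ratio: 0.25
print(y_tilde)         # [0.75 0.25]
```
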

Code Snippets

Generating the bounding box coordinates

import numpy as np

def rand_bbox(size, lam):
    '''
    CutMix Helper function.
    Retrieved from https://github.com/clovaai/CutMix-PyTorch/blob/master/train.py
    '''
    # size is the shape of the input batch; W and H are simply its last two
    # dimensions, in the same order that the caller later uses for slicing.
    W = size[2]
    H = size[3]
    # The patch width and height are derived from the image width and height
    # using the lambda drawn from a beta distribution.
    cut_rat = np.sqrt(1. - lam)

    # The patch w and h are the original image w and h multiplied by np.sqrt(1 - lambda).
    # (np.int was removed in NumPy 1.24, so the built-in int is used instead.)
    cut_w = int(W * cut_rat)
    cut_h = int(H * cut_rat)

    # The patch center is drawn uniformly.
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    # Clip the box to the image so it never extends past the border.
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    return bbx1, bby1, bbx2, bby2

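Example call (Kaggle Bengali.AI Handwritten Grapheme Classification)

# inputs, targets_gra, model and loss_fn are assumed to come from the surrounding training loop.
# The next three lines are not in the original snippet; they follow the usual
# CutMix recipe, and the Beta(1, 1) parameters are an assumption.
lam = np.random.beta(1.0, 1.0)
rand_index = torch.randperm(inputs.size(0)).to(inputs.device)
shuffled_targets_gra = targets_gra[rand_index]

bbx1, bby1, bbx2, bby2 = rand_bbox(inputs.size(), lam)
inputs[:, :, bbx1:bbx2, bby1:bby2] = inputs[rand_index, :, bbx1:bbx2, bby1:bby2]
# Adjust lambda so that it exactly matches the pixel ratio of the (possibly clipped) box
lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (inputs.size()[-1] * inputs.size()[-2]))

# The model emits one logit vector that is split into the three prediction heads.
logits = model(inputs)
grapheme = logits[:,:168]
vowel = logits[:, 168:179]
cons = logits[:, 179:]

# Mix the loss with the same weights as the pixels (shown for the grapheme head only).
loss1 = loss_fn(grapheme, targets_gra) * lam + loss_fn(grapheme, shuffled_targets_gra) * (1. - lam)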
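
To see why lambda is recomputed after the box is drawn, the following self-contained NumPy sketch mirrors the training-loop steps above without a deep learning framework. It is only an illustration under stated assumptions: `cutmix_batch` and `cross_entropy` are hypothetical helpers, random logits stand in for the model output, and Beta(1, 1) is assumed for sampling lambda.

```python
import numpy as np

def cutmix_batch(inputs, lam, rng):
    """Paste one random box from a shuffled copy of an (N, C, H, W) batch.

    Returns the mixed batch, the permutation used, and lambda adjusted to the real box area.
    """
    n, _, h, w = inputs.shape
    perm = rng.permutation(n)
    cut_rat = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_rat), int(w * cut_rat)
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = inputs.copy()
    mixed[:, :, y1:y2, x1:x2] = inputs[perm][:, :, y1:y2, x1:x2]
    # The box may have been clipped at the border, so recompute lambda from its actual area.
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return mixed, perm, lam_adj

def cross_entropy(logits, targets):
    """Mean cross-entropy of integer class targets under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

rng = np.random.default_rng(0)
inputs = rng.random((4, 1, 32, 32)).astype(np.float32)
targets = np.array([0, 1, 2, 3])
lam = float(rng.beta(1.0, 1.0))

mixed, perm, lam_adj = cutmix_batch(inputs, lam, rng)
logits = rng.normal(size=(4, 4))  # stand-in for model(mixed)
loss = lam_adj * cross_entropy(logits, targets) + (1.0 - lam_adj) * cross_entropy(logits, targets[perm])
print(f"sampled lam={lam:.3f}, adjusted lam={lam_adj:.3f}, loss={loss:.3f}")
```

Because clipping and integer truncation can only shrink the box, the adjusted lambda is never smaller than the sampled one, and using it keeps the label weights equal to the true pixel ratio.
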

References

  • Paper

    • Albumentations: https://arxiv.org/pdf/1809.06839.pdf

    • CutMix: https://arxiv.org/pdf/1905.04899.pdf

  • Official

    • Albumentations: https://github.com/albumentations-team/albumentations

    • CutMix: https://github.com/clovaai/CutMix-PyTorch

  • Blog

    • https://towardsdatascience.com/cutmix-a-new-strategy-for-data-augmentation-bbc1c3d29aab

  • Video Clip (highly recommended)

    • Albumentations: https://www.youtube.com/watch?v=n_f6d4bPFME

    • CutMix: https://www.youtube.com/watch?v=Haj-SRL72LY
