Introduction

The class distribution of the MNIST dataset is close to balanced, but in real-world data a particular class is often heavily under-represented. This is especially common in tabular data, the data type most widely used in Predictive Analytics. Take churn prediction, a binary classification problem aimed at preventing customer attrition: the share of customers who actually churn is very small compared with those who do not (roughly 1:10 to 1:100).

If such data is used for training as-is, the model mostly learns the distribution of the majority class. It then overfits to the majority class and becomes more likely to misclassify samples from the minority class.

To address this problem, one option is undersampling, which draws fewer samples from the majority class. Another is oversampling, which learns the patterns of the minority-class data and uses them to increase the number of minority samples.
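As a rough illustration of the two sampling ideas, the sketch below balances a toy dataset with plain random undersampling and random oversampling. It is a simplification: the oversampler here only duplicates minority rows at random rather than learning their patterns, and the function names, the toy rows, and the fixed seed are assumptions made for this example.

```python
import random

def random_undersample(majority, minority, seed=0):
    """Randomly keep only as many majority rows as there are minority rows."""
    rng = random.Random(seed)
    return rng.sample(majority, k=len(minority)), list(minority)

def random_oversample(majority, minority, seed=0):
    """Randomly duplicate minority rows (with replacement) up to the majority size."""
    rng = random.Random(seed)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra

# Toy data: 1,000 "stay" rows vs. 20 "churn" rows (hypothetical records).
majority = [("stay", i) for i in range(1000)]
minority = [("churn", i) for i in range(20)]

under_major, under_minor = random_undersample(majority, minority)
over_major, over_minor = random_oversample(majority, minority)
print(len(under_major), len(under_minor))  # 20 20
print(len(over_major), len(over_minor))    # 1000 1000
```

Undersampling discards information from the majority class, while naive oversampling repeats the same minority rows, so in practice both are usually applied to the training split only.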

Other options include weighting, which assigns larger weights to minority-class samples; cost-sensitive learning, which imposes a larger penalty when a minority-class sample is misclassified; and ensemble sampling, which repeatedly draws subsets of the majority class with replacement, pairs each subset with the minority-class data, and ensembles the models trained on them.
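The weighting idea can be sketched as follows, assuming the common "balanced" heuristic in which a class weight is inversely proportional to the class frequency. The helper names and the toy labels are hypothetical, and the weighted log loss is only one simple form of cost-sensitive objective.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(labels, probs, weights):
    """Binary cross-entropy where each sample is scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1 - p)
        total += weights[y] * loss
    return total / sum(weights[y] for y in labels)

# Toy labels: 90 negatives and 10 positives (assumed 9:1 imbalance).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights[1])  # 5.0
```

With these weights, a mistake on one of the 10 positives costs nine times as much as a mistake on one of the 90 negatives, so both classes contribute equally to the total loss.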

Let's briefly look at the metrics commonly used for class-imbalanced datasets. This is very basic material, so feel free to skip it if you already know it.

Metrics

ROC Curve

The odd-sounding term Receiver Operating Characteristic can be confusing, so here is a brief note on its origin. The concept was first used in Britain during World War II as part of the "Chain Home" radar system, where it served to distinguish enemy aircraft from signal noise (e.g., birds).

Birds as well as enemy aircraft often entered radar range. If a radar operator treated every alert as an aircraft, the chance of a false alarm went up; if the operator dismissed alerts as unimportant, the moments that really mattered were missed. The ROC curve plots this trade-off in two dimensions, with TPR (True Positive Rate, $\text{TP} / (\text{TP} + \text{FN})$) on the y-axis and FPR (False Positive Rate, $\text{FP} / (\text{FP} + \text{TN})$) on the x-axis. Each operator applied a different decision criterion, so their individual results differed, but once the operators' data were combined the shape of the curve turned out to change very little. The curve therefore became one of the stable indicators of model performance.
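A minimal sketch of how the curve is traced: each distinct score is used in turn as the decision threshold, and the resulting (FPR, TPR) pair is recorded. The function name and the six toy labels and scores are made up for illustration.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold over all scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Toy example: 1 = enemy aircraft, 0 = noise; scores are hypothetical model outputs.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold moves along the curve toward the upper right: more true positives are caught, at the price of more false alarms.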

PR(Precision-Recall) Curve

The ROC curve is still widely used as an indicator of overall model performance, but when the dataset is highly imbalanced, or when results on a specific test set matter, the PR curve should be considered as well. The paper "The Relationship Between Precision-Recall and ROC Curves" explains why the PR curve is needed as follows.

Consequently, a large change in the number of false positives can lead to a small change in the false positive rate used in ROC analysis. Precision, on the other hand, by comparing false positives to true positives rather than true negatives, captures the effect of the large number of negative examples on the algorithm's performance.

In other words, when there are many TNs (True Negatives), i.e., when the majority class is large, a change in the number of FPs (False Positives) barely changes the FPR, because the FPR divides FP by the large total $\text{FP} + \text{TN}$.

As a simple example, suppose we trained two models to detect cancer patients on a dataset containing 1 million healthy people and 100 cancer patients.

  • Model 1: flags 100 people as cancer patients, of whom 90 actually have cancer

  • Model 2: flags 2,000 people as cancer patients, of whom 90 actually have cancer

Even without any calculation, Model 1 is clearly the better model, right? Let's now work through the numbers under the ROC and PR criteria.

  • Evaluated by the ROC criterion:

    • Model 1: $\text{TPR} = 0.9, \; \text{FPR} = (100 - 90) / 1000000 = 0.00001$

    • Model 2: $\text{TPR} = 0.9, \; \text{FPR} = (2000 - 90) / 1000000 = 0.00191$

    • The difference in FPR between the two models is $0.00191 - 0.00001 = 0.0019$.

  • Evaluated by the PR criterion:

    • Model 1: $\text{Recall} = 0.9, \; \text{Precision} = 90/100 = 0.9$

    • Model 2: $\text{Recall} = 0.9, \; \text{Precision} = 90/2000 = 0.045$

    • The difference in Precision between the two models is $0.9 - 0.045 = 0.855$.

  • An FPR gap of 0.0019 is barely visible on an ROC plot, whereas a Precision gap of 0.855 is hard to miss. To see the performance difference between two models clearly on a class-imbalanced dataset, the PR curve is needed as well.
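The arithmetic for the two cancer models can be reproduced with a few lines of Python. The helper `rates` and its argument names are assumptions introduced only for this sketch; the counts are the ones from the example.

```python
def rates(predicted_pos, true_pos, actual_pos, actual_neg):
    """Derive TPR (recall), FPR and precision from raw counts."""
    fp = predicted_pos - true_pos  # flagged as cancer but actually healthy
    return {
        "tpr": true_pos / actual_pos,           # identical to recall
        "fpr": fp / actual_neg,
        "precision": true_pos / predicted_pos,
    }

# 1,000,000 healthy people and 100 cancer patients, as in the example.
model_1 = rates(predicted_pos=100, true_pos=90, actual_pos=100, actual_neg=1_000_000)
model_2 = rates(predicted_pos=2000, true_pos=90, actual_pos=100, actual_neg=1_000_000)

print(f"FPR gap:       {model_2['fpr'] - model_1['fpr']:.4f}")              # 0.0019
print(f"Precision gap: {model_1['precision'] - model_2['precision']:.3f}")  # 0.855
```

Both models share the same TPR, so the ROC view separates them only through a tiny shift in FPR, while precision drops from 0.9 to 0.045.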

AUROC (Area Under a ROC Curve, aka ROC AUC, AUC)

The area under the ROC curve, i.e., the area beneath the plot of TPR against FPR; its value ranges from 0 to 1. Because it quantifies a model's predictive performance independently of any particular threshold, it is widely used as a metric for classification problems.
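One way to compute this area without drawing the curve is the pairwise-ranking view: AUROC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below uses that formulation; the function name and toy data are assumptions, and the quadratic loop is meant for illustration rather than large datasets.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels and scores: 8 of the 9 positive/negative pairs are ordered correctly.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auroc(y_true, scores), 3))  # 0.889
```

Under this view a value of 0.5 corresponds to a ranking no better than chance, and 1.0 to a ranking that places every positive above every negative.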

AUPRC (Area Under a PR Curve, aka PR AUC)

The area under the PR curve, i.e., the area beneath the plot of Precision against Recall; its value ranges from 0 to 1.
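A common way to summarize this area is average precision, which adds up the precision at each rank where a positive sample is retrieved, weighted by the recall gained at that step. The sketch below assumes there are no tied scores; the function name and toy data are hypothetical.

```python
def average_precision(y_true, scores):
    """Step-wise area under the PR curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    total_pos = sum(y_true)
    tp = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            # Each positive raises recall by 1 / total_pos; precision here is tp / rank.
            ap += (tp / rank) / total_pos
    return ap

# Hypothetical labels and scores; positives land at ranks 1, 2 and 4.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(average_precision(y_true, scores), 3))  # 0.917
```

Unlike AUROC, the score expected from random ranking is not fixed at 0.5 but is roughly the share of positives in the data, which is worth keeping in mind when comparing values across datasets.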

MCC (Matthews correlation coefficient)

The F1 score ignores TN, whereas MCC takes all four values of the confusion matrix into account, so a model receives a high score only when it predicts well on all four.

$$
\text{MCC} = \frac{\text{TP}\cdot\text{TN}-\text{FP}\cdot\text{FN}}{\sqrt{(\text{TP}+\text{FP})\cdot(\text{TP}+\text{FN})\cdot(\text{TN}+\text{FP})\cdot(\text{TN}+\text{FN})}}
$$
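The formula above translates directly into code. In this sketch the function name is an assumption, and returning 0 when the denominator is zero is a convention chosen for the example, not something the formula itself prescribes. The counts reuse the cancer-screening example with 1 million healthy people and 100 patients.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Model 1 flags 100 people (90 correct); Model 2 flags 2,000 people (90 correct).
model_1 = mcc(tp=90, tn=999_990, fp=10, fn=10)
model_2 = mcc(tp=90, tn=998_090, fp=1_910, fn=10)
print(f"Model 1 MCC: {model_1:.3f}")
print(f"Model 2 MCC: {model_2:.3f}")
```

Because the false positives enter both the numerator and the denominator, the extra 1,900 false alarms of Model 2 pull its score down sharply even though TN stays close to 1 million.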

MCC takes values between -1 and 1, where 1 means Perfect Prediction, 0 means Random Prediction, and -1 means Worst Prediction. For a detailed comparison of the results of Accuracy, the F1 score, and MCC, see the link below.
