AgentCore Observability

0. Observability and OpenTelemetry (OTEL)


What is Observability?

Observability lets you understand a system from the outside by letting you ask questions about it without knowing its inner workings. It also makes it easier to troubleshoot and to handle novel problems, that is, "unknown unknowns", and it helps answer the question "Why is this happening?"

To ask these questions, the application must be properly instrumented: the application code must emit signals such as traces, metrics, and logs. An application is properly instrumented when developers already have all the information they need to troubleshoot an issue, without adding more instrumentation.

Three basic elements

  • Traces: Show the end-to-end journey of a single request through the system. In GenAI, a trace follows the whole flow from the user's question to the final response.

  • Metrics: Aggregated numeric data that measures system performance, such as token usage, response time, and cost.

  • Logs: Structured, detailed event data. In GenAI, logs record prompt content, model responses, error messages, and similar events.

Key concepts

  • Spans: The individual units of work inside a trace (traces are collections of spans). For example, "prompt preprocessing", "LLM call", and "response postprocessing" would each be one span.

  • Baggage: The OpenTelemetry mechanism for propagating context information that needs to be carried across spans.

    • Automatic propagation: accessible from every span without being passed explicitly

    • Key-value pairs: stored as strings

    • Global access: the current baggage value can be read from anywhere in the code
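
The automatic-propagation property can be illustrated without OpenTelemetry itself. The sketch below is a toy model built on Python's standard contextvars module. The names _baggage, set_baggage, get_baggage, call_llm, and handle_request are hypothetical and only mimic the behavior described above; they are not the OpenTelemetry API.

```python
import contextvars

# Toy stand-in for baggage: a dict held in a context variable (never mutated in place).
_baggage = contextvars.ContextVar("baggage", default={})


def set_baggage(key: str, value: str) -> contextvars.Token:
    """Store a string key-value pair in the current context and return a reset token."""
    updated = {**_baggage.get(), key: str(value)}
    return _baggage.set(updated)


def get_baggage(key: str):
    """Read a baggage value from anywhere in the call stack (None if missing)."""
    return _baggage.get().get(key)


def call_llm(prompt: str) -> str:
    # No session argument is passed in, yet the value is still reachable here.
    return f"[session={get_baggage('session.id')}] answer to: {prompt}"


def handle_request(session_id: str, prompt: str) -> str:
    token = set_baggage("session.id", session_id)
    try:
        return call_llm(prompt)
    finally:
        _baggage.reset(token)  # restore the previous context


print(handle_request("s-123", "hello"))
```

The point of the design is that call_llm never receives the session ID as a parameter, which mirrors how baggage spares intermediate functions from threading context through their signatures.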

Monitoring vs. Observability

GenAI and agentic AI systems exhibit a lot of hard-to-predict behavior, which makes observability even more important. Answering a question like "Why does only this prompt produce strange answers?" requires going beyond simple metric monitoring and being able to trace the entire request flow.

  • Monitoring detects problems quickly: a reactive approach that detects predefined problems through aggregated metrics

    • Alarm when LLM API response time > 5 seconds

    • Notify when token usage exceeds 80% of the daily limit

    • Warn when the prompt processing failure rate > 1%

  • Observability explains the cause of a problem in depth: a proactive approach that uses raw data and context to debug even unexpected problems

    • Follow the full request trace for the affected user or session

    • Analyze patterns in how prompts change

    • Find correlations among model version, temperature setting, context length, and so on
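
The three monitoring rules above are simple threshold checks, which is what makes monitoring reactive: each rule has to be written before the problem occurs. A minimal sketch follows, assuming a hypothetical WindowMetrics record; the field names and rule names are illustrative and not part of any AWS or OpenTelemetry API.

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    """Aggregated metrics for one monitoring window (hypothetical shape)."""
    latency_s: float          # LLM API response time in seconds
    tokens_used: int          # tokens consumed so far today
    daily_token_limit: int    # configured daily token budget
    failed_prompts: int
    total_prompts: int


def evaluate_alerts(m: WindowMetrics) -> list[str]:
    """Return the names of the predefined rules that fire for this window."""
    alerts = []
    if m.latency_s > 5.0:
        alerts.append("latency_alarm")
    if m.tokens_used > 0.8 * m.daily_token_limit:
        alerts.append("token_budget_notice")
    if m.total_prompts and m.failed_prompts / m.total_prompts > 0.01:
        alerts.append("failure_rate_warning")
    return alerts


window = WindowMetrics(6.2, 850_000, 1_000_000, 3, 1_000)
print(evaluate_alerts(window))
```

Nothing in this function can explain why latency rose; that question needs the traces and context that observability adds.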

Comparison by category (Monitoring vs. Observability)

Core purpose

  • Monitoring: Detect predefined problems and send alerts.

  • Observability: Understand and debug the internal state of the system, including problems nobody anticipated.

Approach

  • Monitoring: A reactive approach that judges system state from known metrics and thresholds.

  • Observability: A proactive approach that collects and analyzes a wide range of data to understand overall system behavior.

Type of question

  • Monitoring: Answers the binary question "Is the system working normally?"

  • Observability: Designed to answer the more complex question "Why is the system behaving this way?"

Data

  • Monitoring: Mainly aggregated metrics such as CPU utilization, memory usage, and response time.

  • Observability: Raw data, including traces, logs, and metrics, together with rich context.

Problem scope

  • Monitoring: Specialized in detecting known failure scenarios that were anticipated and defined in advance.

  • Observability: Can analyze new, unexpected problems and issues caused by complex interactions.

Time perspective

  • Monitoring: Focuses on the current system state in real time.

  • Observability: Can reconstruct the situation at a specific point in the past and analyze how patterns change over time.

GenAI example

  • Monitoring: Raise an alarm when LLM API response time exceeds 5 seconds or token usage passes 80% of the daily limit.

  • Observability: When a user reports that "the AI gives strange answers", follow the full trace of that request and analyze the prompt, model settings, and context together.

Tooling

  • Monitoring: A configuration-centric approach: thresholds and alarm rules are defined in advance and dashboards are built around them.

  • Observability: Flexible tools and interfaces for querying and exploring whatever information a given situation requires.

Advantages

  • Monitoring: Fast failure detection and immediate response, with a lower workload for the operations team.

  • Observability: Root-cause analysis in complex systems, and the insight needed to solve new kinds of problems.

Limitations

  • Monitoring: Cannot detect problems that were not defined in advance, and may stop at detecting symptoms rather than causes.

  • Observability: Requires collecting and storing more data, and analysis demands expertise and tool proficiency.

What is OpenTelemetry?

OpenTelemetry is an open-source framework for observing and monitoring the performance and behavior of applications. In GenAI systems it is especially useful for tracing LLM calls, prompt processing, and response generation. Typical GenAI uses include:

  • Tracking LLM API call latency and token cost

  • Measuring the effect of prompt engineering

  • Monitoring the whole RAG pipeline, from document retrieval to answer generation

  • Comparing model performance and running A/B tests

Just as wandb or mlflow track model training in machine learning, OpenTelemetry makes it possible to observe the real-time behavior of GenAI applications in production. This helps locate performance bottlenecks, optimize cost, and improve the user experience.

OTLP (OpenTelemetry Protocol)

Of the standard data formats for structuring and transmitting telemetry data (traces, metrics, logs) in OpenTelemetry, OTLP (OpenTelemetry Protocol) is the most widely used.

  • OpenTelemetry's native protocol

  • Transported over gRPC (binary, high performance), HTTP/JSON (easy to debug), or HTTP/Protobuf

  • The most efficient option, with the most complete feature support
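
To make the HTTP/JSON flavor concrete, the sketch below assembles a minimal OTLP/JSON trace export body with the standard library only. The function name build_otlp_trace_payload, the service name, and the scope name are hypothetical; the field names follow my reading of the OTLP JSON encoding and should be checked against the specification before use. Nothing is sent over the network.

```python
import json
import secrets
import time


def build_otlp_trace_payload(service_name: str, span_name: str, duration_ms: int) -> dict:
    """Build a minimal OTLP/JSON trace export body containing a single span."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-example"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes -> 32 hex characters
                    "spanId": secrets.token_hex(8),    # 8 bytes -> 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


payload = build_otlp_trace_payload("demo-agent", "llm_call", duration_ms=120)
body = json.dumps(payload)
# A real exporter would POST `body` to <collector>/v1/traces with Content-Type: application/json.
print(body[:80])
```

In practice an exporter library produces this payload; building it by hand is mainly useful for seeing what a collector receives.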

OpenTelemetry Distro

An OpenTelemetry Distro is a distribution built on OpenTelemetry's standard functionality that a company or vendor has extended or customized for its own needs (for example log collection or automated configuration). By analogy:

  • OpenTelemetry = the Linux kernel

  • OpenTelemetry Distro = a distribution such as Ubuntu or CentOS

With conventional OpenTelemetry, the SDK setup is configured by hand in the application.

An OpenTelemetry Distro automates this setup.

AWS Distro for OpenTelemetry(ADOT)

  • The official distro optimized for AWS environments. Once installed, it is configured to work with AWS services such as X-Ray, CloudWatch, and OpenSearch.

  • Easy to deploy on Lambda, ECS, EC2, EKS, and similar environments.

Comparison by item (AWS X-Ray vs. Amazon CloudWatch)

  • Purpose: X-Ray: distributed trace analysis. CloudWatch: metrics, logs, and alarm management.

  • What is analyzed: X-Ray: request flow and calls between services. CloudWatch: CPU, memory, response time, and so on.

  • Visualization: X-Ray: call tree (Trace Map) and timeline. CloudWatch: dashboards (graphs, log search).

  • Where it is used: X-Ray: microservices, Lambda, containers, and so on. CloudWatch: across AWS services in general.

  • Alarms: X-Ray: none (use CloudWatch). CloudWatch: yes (alarms can be configured automatically).

  • Logs: X-Ray: none (can be integrated with CloudWatch). CloudWatch: yes (CloudWatch Logs).

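ADOT Python installation and run example: install the distro with pip install aws-opentelemetry-distro, then start the application under opentelemetry-instrument (the same commands appear under Option A in Getting Started).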

OpenTelemetry Distro vs. ADOT

Comparison by category (OpenTelemetry Distro vs. ADOT)

  • Provider: OpenTelemetry Distro: the OpenTelemetry community. ADOT: AWS.

  • Target environment: OpenTelemetry Distro: all environments (AWS, GCP, Azure, on-premises). ADOT: optimized for AWS.

  • AWS service integration: OpenTelemetry Distro: basic HTTP instrumentation only. ADOT: native AWS SDK instrumentation.

  • Trace ID format: OpenTelemetry Distro: standard OpenTelemetry format. ADOT: OpenTelemetry format plus X-Ray-compatible format.

  • Metadata: OpenTelemetry Distro: basic resource information. ADOT: automatic collection of AWS resource information.

  • Backend support: OpenTelemetry Distro: all OTLP-compatible backends. ADOT: AWS services plus OTLP backends.
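
The trace ID difference can be made concrete. An OpenTelemetry trace ID is a single 32-hex-digit value, while X-Ray writes trace IDs as 1-<8 hex digits of epoch seconds>-<24 hex digits>. The sketch below is an illustration, not ADOT code: the helper name to_xray_trace_id is hypothetical, and it assumes the first 8 hex digits of the input already encode a timestamp, as an X-Ray-compatible ID generator would arrange.

```python
import re

_HEX32 = re.compile(r"^[0-9a-f]{32}$")


def to_xray_trace_id(otel_trace_id: str) -> str:
    """Render a 32-hex-character trace ID in X-Ray's 1-<8 hex>-<24 hex> layout."""
    trace_id = otel_trace_id.lower()
    if not _HEX32.match(trace_id):
        raise ValueError("expected 32 hexadecimal characters")
    return f"1-{trace_id[:8]}-{trace_id[8:]}"


print(to_xray_trace_id("5759e988bd862e3fe1be46a994272793"))
# 1-5759e988-bd862e3fe1be46a994272793
```

Both renderings carry the same 128 bits, which is why one trace can be addressed in either format.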

Differences at the code level

OpenTelemetry Distro:

ADOT

1. Overview


Amazon Bedrock AgentCore Observability provides a comprehensive set of tools for monitoring and analyzing the performance, usage, and behavior of AI agents. The service helps increase visibility into agent operations, diagnose problems quickly, and optimize the user experience.

AgentCore Observability gives real-time visibility into agent operational performance through Amazon CloudWatch-based dashboards and telemetry for key metrics such as session count, latency, duration, token usage, and error rate. Rich metadata tagging and filtering simplify issue investigation and quality maintenance at scale. AgentCore emits telemetry in a standardized OpenTelemetry (OTEL)-compatible format, so it integrates easily with existing monitoring and observability stacks.

AgentCore Observability is also designed to integrate with existing monitoring systems. Because it uses standard telemetry (and CloudWatch), the observability data can be exported or forwarded to other tools, or combined with the application's broader monitoring, when needed. Developers can see in real time how an agent is performing and why, which helps not only with troubleshooting but also with optimizing agent design (for example, adjusting prompts or tool usage based on observed behavior).

Key features

🎯 Real-time monitoring

  • Tracks key metrics: latency, session count, token usage, error rate

  • Built-in CloudWatch dashboards with rich visualization

🔧 Deep debugging

  • Trace the full agent execution path

  • Inspect intermediate outputs and decision points

  • Identify performance bottlenecks quickly

📊 Production-ready

  • OpenTelemetry compatible - integrates with existing monitoring stacks

  • Rich metadata tagging for easy filtering and investigation

  • Scalable monitoring for enterprise deployments

🛡️ Compliance and auditing

  • Full visibility into agent workflows for compliance requirements

  • Audit trails for agent decisions and outputs

Integration with AWS services

AgentCore Observability integrates with the following AWS services to provide comprehensive monitoring and analysis:

  • Amazon CloudWatch: Agent performance metrics and logs are sent to CloudWatch automatically, so real-time monitoring and alerts can be configured.

  • AWS X-Ray: Traces agent execution to identify performance bottlenecks and errors.

  • Amazon S3: Stores detailed logs and trace data in S3 for long-term retention and analysis.

  • AWS Cost Explorer: Analyzes and tracks the cost associated with agent usage.

Additional guides

2. Getting Started


To use AgentCore Observability, Transaction Search must be enabled first. To enable it automatically from the AgentCore console, choose the 'Enable Observability' button. Transaction Search can also be enabled from CloudWatch.

Transaction Search is Amazon CloudWatch's distributed transaction analysis feature. It stores every span collected by X-Ray in CloudWatch Logs so that spans can be queried immediately with Logs Insights and, where needed, adds Trace Summary indexing for advanced analysis that shows the cause of errors and latency at a glance.

  • 100% span collection and storage: All of the application's span data is stored as structured logs in the aws/spans log group in CloudWatch Logs. This makes it possible to investigate even large traces (up to 10,000 spans). (Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Transaction-Search.html)

    • Why 10,000 spans?: A distributed trace in a very complex system can reach tens of thousands of spans. The limit is a value chosen so that even such complex traces can be visualized within an analyzable range.

  • Trace summaries through indexing: Indexing is required for advanced trace analysis through Trace Summary and its related features (Trace Summary Search/Analytics/Insights). By default, AWS indexes only 1% (adjustable) of the spans received by X-Ray, making them searchable with CloudWatch Logs Insights queries.

    • Large services produce so many traces that indexing all of them is inefficient and likely to include a lot of information that is never used.

    • Fast search: quickly find traces by a specific user ID or request ID

    • APM dashboard features: analyze error rate and latency in Application Signals

    • Integration with CloudWatch Logs Insights: analyze indexed trace data with SQL-like queries

  • When a performance issue occurs, raise the indexing rate immediately. During an A/B test, raise it only for the test period.

  • Example cost optimization strategy: initial phase (1 week): 10% indexing to learn the patterns / stabilization (2 weeks to 1 month): 5% indexing for monitoring / steady-state operation: 1% indexing to save cost

  • Visual search and analysis UI: The visual editor in CloudWatch Application Signals supports filtering on span attributes (for example service name, status code, business ID), group analysis, and time-series analysis.
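
The phased indexing strategy above is straightforward arithmetic, sketched here for a hypothetical workload. The figure of 2,000,000 spans per day and the helper name indexed_spans are assumptions for illustration; the percentages come from the example strategy. No pricing is modeled.

```python
def indexed_spans(total_spans: int, indexing_percent: float) -> int:
    """Number of spans that get indexed at a given indexing percentage."""
    if not 0 <= indexing_percent <= 100:
        raise ValueError("indexing_percent must be between 0 and 100")
    return round(total_spans * indexing_percent / 100)


# Hypothetical workload: 2,000,000 spans per day.
SPANS_PER_DAY = 2_000_000
phases = [("initial", 10), ("stabilization", 5), ("operations", 1)]

for name, percent in phases:
    count = indexed_spans(SPANS_PER_DAY, percent)
    print(f"{name:>13}: {percent:>2}% -> {count:,} indexed spans/day")
```

Moving from 10% to 1% cuts the indexed volume tenfold while all spans remain stored in the aws/spans log group.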

After Transaction Search is enabled, agent invocations are stored in the aws/spans log group in CloudWatch Logs and can be found easily by searching for the keyword spans.

Model invocation logging (Optional)

  • Model invocation logging: In the Amazon Bedrock console, choose Settings at the bottom of the left-hand navigation, then enable Model invocation logging

  • Select the data types to include with logs: choose the data types that must be included in the logs

  • Select the logging destinations: send logs to CloudWatch Logs only, or to both Amazon S3 and CloudWatch Logs

  • CloudWatch Logs configuration: create a log group name and select an appropriate service role
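
The same console choices (data types, destinations, log group, and role) can be expressed as a configuration object. The sketch below only builds the dictionary. The key names follow my understanding of the Bedrock PutModelInvocationLoggingConfiguration request and should be verified against the current API reference; the log group, role ARN, and key prefix are hypothetical placeholders.

```python
from typing import Optional


def build_logging_config(log_group: str, role_arn: str,
                         s3_bucket: Optional[str] = None) -> dict:
    """Assemble a loggingConfig dict: CloudWatch Logs always, S3 only if a bucket is given."""
    config = {
        "cloudWatchConfig": {"logGroupName": log_group, "roleArn": role_arn},
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
    if s3_bucket:
        config["s3Config"] = {"bucketName": s3_bucket, "keyPrefix": "bedrock-logs"}
    return config


config = build_logging_config(
    log_group="/bedrock/model-invocations",                    # hypothetical name
    role_arn="arn:aws:iam::123456789012:role/BedrockLogging",  # hypothetical role
)
# With credentials configured, this could then be applied with boto3 (not run here):
#   boto3.client("bedrock").put_model_invocation_logging_configuration(loggingConfig=config)
print(sorted(config))
```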

Option A: Runtime-Hosted Agents (AgentCore)

Reference: https://github.com/awslabs/amazon-bedrock-agentcore-samples/tree/main/01-tutorials/06-AgentCore-observability/01-Agentcore-runtime-hosted

  1. Install the SDK: pip install aws-opentelemetry-distro boto3

  2. Restart with monitoring: run opentelemetry-instrument python my_runtime_agent.py

    • With the Starter Toolkit, running the opentelemetry-instrument python command starts the Runtime Agent automatically.

  3. View data: open the CloudWatch GenAI Dashboard and select the Bedrock AgentCore tab

After these steps, deploying the agent to AgentCore Runtime makes its logs, traces, and metrics available in the GenAI Observability dashboard.

Option B: Local Agents (Non-Runtime Hosted Agents)

How to set up logging in non-AgentCore environments such as a local machine, Lambda, EC2, or EKS - Reference: https://github.com/awslabs/amazon-bedrock-agentcore-samples/tree/main/01-tutorials/06-AgentCore-observability/02-Agent-not-hosted-on-runtime

  • Add the OTEL environment variables to the project's .env as shown below, then change a few lines of code.

Environment variables (.env)

.env.template (<agent-name> and <agent-id> must be replaced)

.env example

With OTEL_EXPORTER_OTLP_LOGS_HEADERS=x-aws-log-group=agents/strands-agent-logs,x-aws-log-stream=default,x-aws-metric-namespace=agents set, running the agent locally with the opentelemetry-instrument command:

With OTEL_RESOURCE_ATTRIBUTES=service.name=custom-span-agent set, running the agent locally with the opentelemetry-instrument command:
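
Both variables above use the same comma-separated key=value layout. The sketch below splits them into dictionaries so that each setting is easy to see; parse_otel_pairs is a hypothetical helper written for this illustration, not part of the OpenTelemetry SDK.

```python
def parse_otel_pairs(raw: str) -> dict[str, str]:
    """Split an OTEL-style 'k1=v1,k2=v2' string into a dict."""
    pairs = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


headers = parse_otel_pairs(
    "x-aws-log-group=agents/strands-agent-logs,"
    "x-aws-log-stream=default,"
    "x-aws-metric-namespace=agents"
)
resource = parse_otel_pairs("service.name=custom-span-agent")

print(headers["x-aws-log-group"])   # agents/strands-agent-logs
print(resource["service.name"])     # custom-span-agent
```

Read this way, the headers value names the log group, log stream, and metric namespace, while the resource attributes carry the service name.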

Session ID Support

  • baggage.set_baggage("session.id", session_id)

Code difference with and without a session applied

Creating a Custom Span Agent

A span carries the following attributes, which are essential for agent observability:

  • An operation name that identifies the specific function or process being executed

  • Timestamps that mark the exact start and end time of the operation

  • Parent-child relationships that show how operations nest within a larger process

  • Tags and attributes that provide contextual metadata about the operation

  • Events that mark significant occurrences within the span's lifetime

  • Status information that indicates success, failure, or another completion state

  • Resource usage metrics specific to the operation

Defining custom spans makes it possible to trace specific operations or sections within the agent's execution flow.

  • Track specific operations: create spans for important work such as tool calls, data processing, and decision points

  • Add custom attributes: enrich spans with business-specific metadata for filtering and analysis

  • Record events: mark significant moments within the span lifecycle

  • Track errors: capture and report errors with detailed context

  • Establish relationships: create parent-child relationships between spans to model the execution flow

This gives much finer control over what appears in the CloudWatch GenAI Observability dashboard.

Code snippet

GenAI Observability screens

Model Invocations Tab

  • Invocation count - number of successful requests to the Converse, ConverseStream, InvokeModel, and InvokeModelWithResponseStream API operations

  • Invocation latency - latency of the invocations

  • Token Counts by Model - token counts per model, split into input and output tokens

  • Daily Token Counts by ModelID - total daily token count per model ID

  • InputTokenCount, OutputTokenCount - total input and output tokens for the account across the selected models

  • Requests, grouped by input tokens - number of requests by input token count, divided into six ranges. Each line shows the number of requests that fall into one range

  • Invocation Throttles - number of invocations throttled by the system. The count shown depends on the SDK's retry settings. (Reference: https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html)

  • Invocation Error Count - number of invocations that resulted in server-side or client-side errors

Bedrock AgentCore Tab

  • Runtime sessions - number of sessions created by agents running on AgentCore Runtime. A session is similar to a conversation and carries the broader context of the whole interaction flow. Useful for monitoring overall platform usage, capacity planning, and understanding user engagement patterns.

  • Runtime invocations - total number of requests to the data plane API. Each API call counts as one invocation regardless of request payload size or response status

  • Runtime error - number of system and user errors

  • Runtime throttles - number of requests throttled by the service for exceeding the allowed TPS (transactions per second); these return ThrottlingException with HTTP status code 429
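
Throttled requests are normally retried by the client, which is also why the throttle counts shown depend on retry settings. The sketch below illustrates one common policy, exponential backoff with full jitter, using only the standard library. ThrottlingError, call_with_backoff, and flaky_invoke are hypothetical stand-ins, not AWS SDK classes, and the AWS SDKs ship their own retry modes.

```python
import random
import time


class ThrottlingError(Exception):
    """Stand-in for a throttled (HTTP 429) response; not an AWS SDK class."""


def call_with_backoff(fn, max_attempts: int = 5, base_delay_s: float = 0.2,
                      sleep=time.sleep):
    """Retry fn() on ThrottlingError using exponential backoff with full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ThrottlingError:
            if attempt == max_attempts:
                raise
            # Wait a random time up to base_delay_s * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))


attempts = {"n": 0}


def flaky_invoke():
    """Fails with a throttle twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottlingError("429")
    return "ok"


print(call_with_backoff(flaky_invoke, sleep=lambda s: None), attempts["n"])
```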

Sessions View & Traces View

3. Conclusion


Amazon Bedrock AgentCore Observability provides a comprehensive set of tools for monitoring and analyzing the performance, usage, and behavior of AI agents. With it, you can increase visibility into agent operations, diagnose problems quickly, and optimize the user experience.

Through its integration with AWS services such as CloudWatch, X-Ray, and S3, it provides real-time monitoring, detailed logging, tracing, and alerting that help keep agents reliable and performant.

Used together with the other AgentCore services (Runtime, Memory, Code Interpreter, and so on), it supports continuous improvement of performance and user experience across the agent's entire lifecycle.
