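Tool Calling vs. Agent-to-Agent Communication

Function calling, tool calling, and agent-to-agent calls are the same idea, raised one level of abstraction at a time. The more the boundaries between the three blur, the freer the design becomes.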

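Three stages of evolution

Stage	Who calls what	Abstraction level
Function calling	The LLM names a function via JSON	Lowest
Tool calling	The LLM picks one tool from a tool set and calls it	Middle
Agent-to-agent	An agent calls a peer agent	Highest

The structure is the same in all three: a callable unit defined by “name + input schema”. Only the order of magnitude differs. If a function call is a single byte, an agent call is a page-long job.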
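
To make the “name + input schema” claim concrete, here is a minimal sketch in which a plain function and a stand-in for a peer agent share one shape and one invocation path. `CallableUnit`, `invoke`, and both example units are hypothetical names introduced for illustration; they are not part of any SDK.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CallableUnit:
    """Hypothetical shape shared by all three stages: a name plus an input schema."""
    name: str
    input_schema: dict[str, Any]
    run: Callable[[dict[str, Any]], Any]


def invoke(unit: CallableUnit, args: dict[str, Any]) -> Any:
    """Check the required arguments listed in the schema, then run the unit."""
    missing = [k for k in unit.input_schema.get("required", []) if k not in args]
    if missing:
        raise ValueError(f"{unit.name}: missing required arguments {missing}")
    return unit.run(args)


# A one-line function: the "one byte" end of the scale.
add = CallableUnit(
    name="add",
    input_schema={
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    },
    run=lambda args: args["a"] + args["b"],
)

# A stand-in for a peer agent: same shape, a much larger job behind it.
researcher = CallableUnit(
    name="delegate_to_researcher",
    input_schema={
        "type": "object",
        "properties": {"topic": {"type": "string"}},
        "required": ["topic"],
    },
    run=lambda args: f"report on {args['topic']}",
)

print(invoke(add, {"a": 1, "b": 2}))
print(invoke(researcher, {"topic": "tool routing"}))
```

The caller cannot tell the two units apart from the outside, which is the point the three stages share.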

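function-calling: the basics

The smallest unit exposed by the OpenAI and Anthropic SDKs. A tool definition is a JSON Schema; a call is a tool_use block. In the end, function calling is just the model declaring “I want to call this function with these arguments”; the actual execution is up to the client.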

# Verified against: https://platform.claude.com/docs/en/docs/build-with-claude/tool-use
# Verified at: 2026-06-02
from anthropic import Anthropic

client = Anthropic()
tools = [{
  "name": "get_weather",
  "description": "Look up the current weather for a city",
  "input_schema": {
      "type": "object",
      "properties": {"city": {"type": "string"}},
      "required": ["city"],
  },
}]

r = client.messages.create(
  model="claude-sonnet-4-6",
  max_tokens=1024,
  tools=tools,
  messages=[{"role": "user", "content": "Tell me the weather in Seoul"}],
)
print(r.stop_reason)  # "tool_use"
print(r.content)

// Verified against: https://platform.claude.com/docs/en/docs/build-with-claude/tool-use
// Verified at: 2026-06-02
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()
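// Typed as Anthropic.Tool[]: an `as const` array is readonly and does not
// type-check against the mutable `tools` parameter.
const tools: Anthropic.Tool[] = [{
  name: 'get_weather',
  description: 'Look up the current weather for a city',
  input_schema: {
    type: 'object',
    properties: { city: { type: 'string' } },
    required: ['city'],
  },
}]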

const r = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  tools,
  messages: [{ role: 'user', content: 'Tell me the weather in Seoul' }],
})
console.log(r.stop_reason)
console.log(r.content)
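
Since execution is the client's job, the step after receiving a tool_use block is to look up a local function and run it. The sketch below shows one way to do that, assuming blocks expose `type`, `id`, `name`, and `input` attributes as the Anthropic SDK's content blocks do. The `get_weather` implementation, the `HANDLERS` table, and `run_tool_uses` are hypothetical, and `fake_content` stands in for `r.content` so the example runs without a network call.

```python
from types import SimpleNamespace


def get_weather(city: str) -> str:
    # Hypothetical local implementation; a real one would call a weather API.
    return f"{city}: 18°C, clear"


# Hypothetical registry mapping tool names to local functions.
HANDLERS = {"get_weather": get_weather}


def run_tool_uses(content) -> list[dict]:
    """Run every tool_use block in a response and build tool_result blocks."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue  # text blocks need no client-side action
        handler = HANDLERS[block.name]
        output = handler(**block.input)
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
        })
    return results


# Stand-in for r.content from the request shown earlier.
fake_content = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="tool_use", id="toolu_01",
                    name="get_weather", input={"city": "Seoul"}),
]

print(run_tool_uses(fake_content))
```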

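Structured output vs. tool use

OpenAI offers a slightly different answer to the same problem: structured outputs. A schema is enforced, but the result is the model's output itself, not a tool call. The same structure is guaranteed even when the model has no intention of calling a tool.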

# Verified against: https://platform.openai.com/docs/guides/structured-outputs
# Verified at: 2026-06-02
from openai import OpenAI
from pydantic import BaseModel

class Weather(BaseModel):
  city: str
  temperature_c: float
  summary: str

client = OpenAI()
r = client.beta.chat.completions.parse(
  model="gpt-4o",
  messages=[{"role": "user", "content": "Give me the weather in Seoul as an object"}],
  response_format=Weather,
)
print(r.choices[0].message.parsed)

// Verified against: https://platform.openai.com/docs/guides/structured-outputs
// Verified at: 2026-06-02
import OpenAI from 'openai'
import { z } from 'zod'
import { zodResponseFormat } from 'openai/helpers/zod'

const Weather = z.object({
  city: z.string(),
  temperatureC: z.number(),
  summary: z.string(),
})

const client = new OpenAI()
const r = await client.beta.chat.completions.parse({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Give me the weather in Seoul as an object' }],
  response_format: zodResponseFormat(Weather, 'weather'),
})
console.log(r.choices[0].message.parsed)

Rule: use a tool when you need a call; use structured output when you only need a structured answer.

The tool-calling loop

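Tool calling does not finish in a single round trip. At its core it is a loop.

Pass messages + tools to the LLM
The LLM responds with stop_reason="tool_use"
The caller runs the tool and receives the result
Append a tool_result to the message history
Call the LLM again, which yields either an answer or the next tool_use

This loop is the SDK-level form of the agent loop.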
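
The five steps above can be sketched as a short driver function. This is a minimal sketch, not the SDK's own implementation: `run_loop`, `fake_llm`, the plain-dict message shape, and the `max_turns` guard are all assumptions made so the example runs offline. In real use, `call_llm` would wrap a `client.messages.create(...)` call.

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_loop(
    call_llm: Callable[[list[Message]], dict[str, Any]],
    handlers: dict[str, Callable[..., str]],
    messages: list[Message],
    max_turns: int = 5,
) -> str:
    """Drive the loop until the model stops asking for tools."""
    for _ in range(max_turns):
        reply = call_llm(messages)  # steps 1 and 5: call the LLM
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply["stop_reason"] != "tool_use":  # final answer reached
            return "".join(
                b["text"] for b in reply["content"] if b["type"] == "text"
            )
        results = []
        for b in reply["content"]:  # steps 2 and 3: run each requested tool
            if b["type"] == "tool_use":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": b["id"],
                    "content": handlers[b["name"]](**b["input"]),
                })
        # step 4: feed the results back as the next user turn
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")


def fake_llm(messages: list[Message]) -> dict[str, Any]:
    """Scripted stand-in for the model: one tool request, then an answer."""
    last = messages[-1]
    if isinstance(last["content"], str):
        return {"stop_reason": "tool_use", "content": [{
            "type": "tool_use", "id": "toolu_01",
            "name": "get_weather", "input": {"city": "Seoul"},
        }]}
    result_text = last["content"][0]["content"]
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": f"Weather report: {result_text}"}]}


history: list[Message] = [{"role": "user", "content": "Weather in Seoul?"}]
answer = run_loop(fake_llm, {"get_weather": lambda city: f"{city}: 18C, clear"}, history)
print(answer)
```

The `max_turns` cap is a design choice worth keeping in any real loop, since nothing else stops a model that keeps requesting tools.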

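tool-result and tool-error

A tool must be able to return both success and failure. The standard approach is to send tool-result and tool-error through the same block structure. The message block shape in the Anthropic SDK: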

# Verified against: https://platform.claude.com/docs/en/docs/build-with-claude/tool-use
# Verified at: 2026-06-02
def tool_result(tool_use_id: str, content, is_error=False):
  return {
      "type": "tool_result",
      "tool_use_id": tool_use_id,
      "content": content,
      "is_error": is_error,
  }

ok = tool_result("toolu_01", [{"type": "text", "text": "Seoul: 18°C, clear"}])
err = tool_result("toolu_02", "rate limited", is_error=True)

// Verified against: https://platform.claude.com/docs/en/docs/build-with-claude/tool-use
// Verified at: 2026-06-02
type Block = { type: 'text'; text: string }
function toolResult(id: string, content: Block[] | string, isError = false) {
  return {
    type: 'tool_result' as const,
    tool_use_id: id,
    content,
    is_error: isError,
  }
}

const ok = toolResult('toolu_01', [{ type: 'text', text: 'Seoul: 18°C, clear' }])
const err = toolResult('toolu_02', 'rate limited', true)

When the LLM sees is_error: true, it typically switches to an alternative tool or to asking the user.
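
For the model to see is_error at all, the caller has to turn exceptions into result blocks instead of letting them crash the loop. A minimal sketch of that wrapper follows; `safe_execute` and the two demo tools are hypothetical names, and formatting the error as `ExceptionName: message` is just one reasonable choice.

```python
from typing import Any, Callable


def safe_execute(tool_use_id: str, fn: Callable[..., Any], **kwargs: Any) -> dict:
    """Run a tool and always return a tool_result block, even on failure."""
    try:
        output = fn(**kwargs)
    except Exception as exc:
        # Failure travels in the same block shape, flagged with is_error.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": str(output),
        "is_error": False,
    }


def flaky_weather(city: str) -> str:
    # Hypothetical tool that always fails, to exercise the error path.
    raise RuntimeError("rate limited")


def steady_weather(city: str) -> str:
    # Hypothetical tool that always succeeds.
    return f"{city}: 18C, clear"


ok_block = safe_execute("toolu_01", steady_weather, city="Seoul")
err_block = safe_execute("toolu_02", flaky_weather, city="Seoul")
print(ok_block)
print(err_block)
```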

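Retry, fallback, escalation

Failure is routine. When a tool-error occurs, the retry logic should already be built into the code. If you leave the LLM to handle transient failures while it is reasoning, you only waste tokens.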

# Verified against: https://docs.python.org/3/library/asyncio.html
# Verified at: 2026-06-02
import asyncio, random

async def call_with_retry(fn, *, retries=3, base=0.5):
  for i in range(retries):
      try:
          return await fn()
      except Exception as e:
          if i == retries - 1:
              raise
          # exponential backoff + jitter
          await asyncio.sleep(base * (2 ** i) + random.random() * 0.1)

// Verified against: https://nodejs.org/api/timers.html
// Verified at: 2026-06-02
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms))

export async function callWithRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  base = 500,
): Promise<T> {
  for (let i = 0; i < retries; i++) {
    try {
      return await fn()
    } catch (e) {
      if (i === retries - 1) throw e
      // exponential backoff + jitter
      await sleep(base * 2 ** i + Math.random() * 100)
    }
  }
  throw new Error('unreachable')
}

Rule: retries are only safe when the call is idempotent. For tools with side effects, send an idempotency key along with the call.
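
One way to picture the idempotency key: the caller generates a key once per logical call and reuses it on every retry, and the tool replays the stored result instead of repeating the side effect. The sketch below assumes an in-memory store. `PaymentTool`, `charge`, and the receipt shape are hypothetical and stand in for whatever side-effecting tool you have.

```python
import uuid


class PaymentTool:
    """Hypothetical side-effecting tool that deduplicates by idempotency key."""

    def __init__(self) -> None:
        self._seen: dict[str, dict] = {}  # key -> stored result
        self.charges = 0                  # number of real side effects

    def charge(self, amount: int, idempotency_key: str) -> dict:
        if idempotency_key in self._seen:
            # Replay: return the stored result without charging again.
            return self._seen[idempotency_key]
        self.charges += 1
        receipt = {"charge_no": self.charges, "amount": amount}
        self._seen[idempotency_key] = receipt
        return receipt


tool = PaymentTool()
key = str(uuid.uuid4())  # generated once per logical call, reused on retries

first = tool.charge(1000, idempotency_key=key)
retry = tool.charge(1000, idempotency_key=key)  # e.g. after a timeout
print(first == retry, tool.charges)
```

A production version would persist the key store and expire old entries; the in-memory dict only illustrates the contract.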

tool-router: taming tool sprawl

Once the number of tools passes 50, the LLM starts to get confused. The fix is a tool-router.

# Verified against: https://platform.claude.com/docs/en/docs/build-with-claude/tool-use
# Verified at: 2026-06-02
from typing import Callable

TOOLS: dict[str, dict] = {...}  # full catalog

def route(query: str, k: int = 5) -> list[dict]:
  """Expose only the k tools most relevant to the query."""
  # In practice this would be embedding similarity; the demo uses keywords.
  scored = sorted(TOOLS.items(),
                  key=lambda kv: score(query, kv[1]["description"]),
                  reverse=True)
  return [v for _, v in scored[:k]]

def score(q: str, desc: str) -> float:
  return sum(w in desc for w in q.split())

// Verified against: https://platform.claude.com/docs/en/docs/build-with-claude/tool-use
// Verified at: 2026-06-02
type Tool = { name: string; description: string; input_schema: object }
const TOOLS: Record<string, Tool> = {}

export function route(query: string, k = 5): Tool[] {
  const score = (desc: string) =>
    query.split(' ').reduce((acc, w) => acc + (desc.includes(w) ? 1 : 0), 0)
  return Object.values(TOOLS)
    .map((t) => [score(t.description), t] as const)
    .sort((a, b) => b[0] - a[0])
    .slice(0, k)
    .map(([, t]) => t)
}

agent-as-tool: exposing an agent as a tool

This is the big jump in abstraction. With agent-as-tool, the caller treats another agent like an ordinary tool.

# Verified against: https://platform.claude.com/docs/en/docs/build-with-claude/tool-use
# Verified at: 2026-06-02
import httpx

researcher_as_tool = {
  "name": "delegate_to_researcher",
  "description": "Peer agent that takes a topic and produces a report",
  "input_schema": {
      "type": "object",
      "properties": {"topic": {"type": "string"}},
      "required": ["topic"],
  },
}

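async def call_researcher(topic: str) -> str:
  body = {
      "jsonrpc": "2.0", "id": 1, "method": "message/send",
      "params": {"message": {"messageId": "m1", "role": "user",
                              "parts": [{"text": topic}]}},
  }
  async with httpx.AsyncClient() as c:
      r = await c.post("https://example.com/a2a/v1", json=body)
      r.raise_for_status()  # surface HTTP errors instead of parsing an error body
      # Assumes the reply is a task whose status.message is a message object;
      # join its text parts so the declared str return type holds.
      msg = r.json()["result"]["status"]["message"]
      return "".join(p.get("text", "") for p in msg["parts"])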

// Verified against: https://platform.claude.com/docs/en/docs/build-with-claude/tool-use
// Verified at: 2026-06-02
export const researcherAsTool = {
  name: 'delegate_to_researcher',
  description: 'Peer agent that takes a topic and produces a report',
  input_schema: {
    type: 'object',
    properties: { topic: { type: 'string' } },
    required: ['topic'],
  },
} as const

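export async function callResearcher(topic: string): Promise<string> {
  const r = await fetch('https://example.com/a2a/v1', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      jsonrpc: '2.0', id: 1, method: 'message/send',
      params: { message: { messageId: 'm1', role: 'user', parts: [{ text: topic }] } },
    }),
  })
  if (!r.ok) throw new Error(`A2A request failed: ${r.status}`)
  const j = await r.json()
  // Assumes the reply is a task whose status.message is a message object;
  // join its text parts so the declared string return type holds.
  const parts: { text?: string }[] = j.result.status.message.parts
  return parts.map((p) => p.text ?? '').join('')
}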

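The calling LLM just sees a single function, even though a whole agent loop is running inside.

Same capability, three ways to expose it

Take document search as an example. How you expose it changes the shape of the system.

Approach	Call interface	Trade-offs
1. Function call	Direct `search(query)`	Fast, one-shot
2. MCP tool	`tools/call` over MCP	Reusable across host apps
3. agent-as-tool	A search agent owns the whole RAG pipeline	Coarse-grained delegation, stateful

Same problem, three decisions. The smaller the system, the better the first approach fits; the more complex the domain, the better the third fits.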

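How to propagate errors upward

When a tool-error occurs, there are three paths.

Retry (same tool): transient failures. Exponential backoff + jitter.
Fallback (different tool): substitute a tool with similar functionality. The LLM has to choose it.
Escalation (upward): if neither of the above works, the agent hands off to its parent or to a human.

The LLM decides which of the three paths to take. Put the rule into the system prompt: “After 3 failures of the same tool, try another tool; after 5, report to the user.”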
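
The quoted rule lives in the system prompt, but it can also be mirrored as a small guard in code so the thresholds are enforced even if the model ignores them. The sketch below is one possible reading, not the source's prescription: `Action`, `next_action`, and the choice to count the "5" as total failures across all tools are assumptions.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    FALLBACK = "fallback"
    ESCALATE = "escalate"


def next_action(same_tool_failures: int, total_failures: int,
                has_fallback: bool) -> Action:
    """Pick a path from failure counts (thresholds 3 and 5 from the rule)."""
    if total_failures >= 5:
        return Action.ESCALATE  # report to the user or parent agent
    if same_tool_failures >= 3:
        # Try a different tool if one exists; otherwise there is nothing
        # left to try, so escalate early.
        return Action.FALLBACK if has_fallback else Action.ESCALATE
    return Action.RETRY  # still within the retry budget for this tool


print(next_action(1, 1, has_fallback=True))
print(next_action(3, 3, has_fallback=True))
print(next_action(2, 5, has_fallback=True))
```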

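On to the next chapter

From here, a pattern in which one agent directs many comes naturally into view. The next chapter covers the supervisor pattern: the most common structure, in which one supervisor dispatches several agents and merges their results.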