Programmatic Tool Calling

Published

Tags: AI

In early 2026, Claude gained a new way of calling tools, called programmatic tool calling. The key difference between programmatic tool calling and the previous approach (code_execution_20250825) is that Claude can now write a Python script that wraps calls to user-defined tools. The script runs server-side in a sandbox.

Traditional tool calling only lets Claude request a call to a function the user defines. Claude can only take the results and try to produce a final response for you; it cannot write its own code to process them. With programmatic tool calling, Claude can not only request calls to user-defined functions but also feed the results into Python code it wrote to produce the final response. There are a few typical scenarios that benefit from this ability:

I created a simple code example to illustrate how programmatic tool calling can be used. In this example, I have a CLI (credit limit increase) database, which contains two tables, cli_applications and cli_rules.

I want to answer this question: what are the top CLI application decline reasons for the last 7 days?

This is a perfect data analysis scenario, which requires Claude to:

Now let's dive into the code.

Step 1: create a mock-up query_database function. This function takes a SQL query and returns the database result as a pandas DataFrame. In this example, I just hard-coded some dummy records to return. In the real world, this would be a generic function that takes any SQL query, runs it against the database, and returns the data.

import pandas as pd

def query_database(sql):
    # Placeholder function to execute the SQL query against the database
    # In a real implementation, this would connect to the database and execute the query
    print(f"Executing SQL query: {sql}")
    # Simulate a response from the database
    df = pd.DataFrame({
        "decline_reason": ["Reason 1", "Reason 2", "Reason 3"],
        "occurrence_count": [10, 7, 5]
    })
    return df
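For comparison, a non-mock version could look roughly like the sketch below. It is only an illustration under several assumptions: it uses SQLite (the post does not say which database engine sits behind the tool), the helper names make_demo_connection and query_database_sqlite are hypothetical, the sample rows are invented, and the table layout follows the schema given in the system prompt in Step 2. Unlike the mock, it returns a JSON string directly instead of a DataFrame.

```python
import json
import sqlite3


def make_demo_connection():
    """Create an in-memory SQLite database with the two tables (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can be converted to dicts
    conn.executescript("""
        CREATE TABLE cli_applications(
            account_id TEXT, application_id TEXT,
            application_date TEXT, application_status TEXT);
        CREATE TABLE cli_rules(
            application_id TEXT, rule_id TEXT,
            description TEXT, rule_status INTEGER);
        INSERT INTO cli_applications VALUES
            ('a1', 'app1', date('now', '-1 days'), 'declined'),
            ('a2', 'app2', date('now', '-2 days'), 'declined'),
            ('a3', 'app3', date('now', '-30 days'), 'declined');
        INSERT INTO cli_rules VALUES
            ('app1', 'r1', 'Reason 1', 1),
            ('app2', 'r1', 'Reason 1', 1),
            ('app2', 'r2', 'Reason 2', 1),
            ('app3', 'r3', 'Reason 3', 1);
    """)
    return conn


def query_database_sqlite(conn, sql):
    """Run a query and return the rows as a JSON array of objects."""
    # Naive guard (assumption): only allow statements that start with SELECT/WITH.
    if not sql.lstrip().lower().startswith(("select", "with")):
        return json.dumps({"error": "Only SELECT queries are allowed."})
    rows = conn.execute(sql).fetchall()
    return json.dumps([dict(row) for row in rows])


# Assumption: cli_rules.description holds the human-readable decline reason.
TOP_REASONS_SQL = """
    SELECT r.description AS decline_reason, COUNT(*) AS occurrence_count
    FROM cli_rules r
    JOIN cli_applications a ON a.application_id = r.application_id
    WHERE a.application_status = 'declined'
      AND r.rule_status = 1
      AND a.application_date >= date('now', '-7 days')
    GROUP BY r.description
    ORDER BY occurrence_count DESC
"""

conn = make_demo_connection()
print(query_database_sqlite(conn, TOP_REASONS_SQL))
```

The prefix check is deliberately simple and is not a substitute for running the tool with a read-only database account.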

Step 2: prepare the Claude API call parameters.

import json

import anthropic

client = anthropic.Anthropic()

model = "claude-opus-4-6"

system_prompt = '''
You have the complete database schema below. Treat it as authoritative.
Do not inspect or request schema metadata.
Do not query sqlite_master, pragma_table_info, information_schema, or similar tables.
Write SQL only against these tables/columns:

cli_applications(account_id, application_id, application_date, application_status)
cli_rules(application_id, rule_id, description, rule_status)

Semantics:
- cli_applications.application_status: approved, declined, pending
- cli_rules.rule_status: 0=referred, 1=declined, 2=passed, 3=ignored
'''


tools = [
    {"type": "code_execution_20260120", "name": "code_execution"},
    {
        "name": "query_database",
        "description": (
            "Execute a SQL query against the CLI applications database. "
            "Returns a JSON string: an array of row objects, "
            "e.g. '[{\"decline_reason\": \"...\", \"occurrence_count\": 5}]'. "
            "Parse with json.loads() to get a list of dicts."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "sql": {"type": "string", "description": "The SQL query to execute."}
            },
            "required": ["sql"],
        },
        "allowed_callers": ["code_execution_20260120"],
    },
]

messages = [
    {
        "role": "user",
        "content": "Get me the top CLI declined reasons for the last 7 days.",
    }
]

Notice that the tools list contains the code execution tool ({"type": "code_execution_20260120", "name": "code_execution"}), followed by my query_database tool. Inside the query_database definition, I set allowed_callers to ["code_execution_20260120"]. This declaration means that only the code execution sandbox can call my function; Claude cannot call it directly. Why is this important? Because without this restriction, Claude may call query_database() directly, read the return value, and produce a response from it. With the restriction, I force Claude to write a server-side script to process the returned data, so Claude cannot reason directly from the raw return value.
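The effect of allowed_callers can be made explicit with a small helper that builds tool definitions. This is only a sketch: make_tool is a hypothetical helper, and the "direct" caller value is an assumption based on the DirectCaller(type="direct") that appears in the response printed later in this post.

```python
CODE_EXECUTION_CALLER = "code_execution_20260120"  # tool version used in this post


def make_tool(name, description, properties, required, programmatic_only=True):
    """Build a user-defined tool definition (hypothetical helper)."""
    tool = {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }
    if programmatic_only:
        # Only code running in the sandbox may call this tool.
        tool["allowed_callers"] = [CODE_EXECUTION_CALLER]
    else:
        # Assumption: "direct" additionally lets Claude call the tool itself.
        tool["allowed_callers"] = ["direct", CODE_EXECUTION_CALLER]
    return tool


query_tool = make_tool(
    name="query_database",
    description="Execute a SQL query against the CLI applications database.",
    properties={"sql": {"type": "string", "description": "The SQL query to execute."}},
    required=["sql"],
)
print(query_tool["allowed_callers"])
```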

Also notice that, to teach Claude how to write the query, I provide the database schema in the system prompt. Without it, Claude may first run queries against the database's master schema to learn the table layout. The system prompt saves those extra discovery steps.
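If the schema changes often, the schema part of the system prompt could be generated instead of typed by hand. The sketch below assumes the schema is available as a plain dictionary; build_system_prompt, SCHEMA, and SEMANTICS are hypothetical names, and the instruction wording is shortened from the prompt shown above.

```python
SCHEMA = {
    "cli_applications": ["account_id", "application_id", "application_date", "application_status"],
    "cli_rules": ["application_id", "rule_id", "description", "rule_status"],
}

SEMANTICS = {
    "cli_applications.application_status": "approved, declined, pending",
    "cli_rules.rule_status": "0=referred, 1=declined, 2=passed, 3=ignored",
}


def build_system_prompt(schema, semantics):
    """Render table/column lists and value semantics into a system prompt."""
    table_lines = [f"{table}({', '.join(columns)})" for table, columns in schema.items()]
    semantic_lines = [f"- {column}: {meaning}" for column, meaning in semantics.items()]
    parts = [
        "You have the complete database schema below. Treat it as authoritative.",
        "Do not inspect or request schema metadata.",
        "Write SQL only against these tables/columns:",
        "",
        *table_lines,
        "",
        "Semantics:",
        *semantic_lines,
    ]
    return "\n".join(parts)


system_prompt_text = build_system_prompt(SCHEMA, SEMANTICS)
print(system_prompt_text)
```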

In this example, I ask Claude to get me the top credit limit increase (CLI) decline reasons for the last 7 days.

response = client.messages.create(
    model=model,
    max_tokens=4096,
    system=system_prompt,
    messages=messages,
    tools=tools,
)

print(response)

messages.append({"role": "assistant", "content": response.content})

container = getattr(response, 'container', None)
container_id = container.id if container else None

The response looks like this:

Message(
    id="xxx",
    container=Container(
        id="srvtoolu_01hq2Jg6BeyErjP7VbtjW5T",
        expires_at=datetime.datetime(yyyy,mm,dd,hh,mm,ss)
    ),
    content=[
        TextBlock(text="I'll help you find the top CLI declined reasons for the last 7 days. Let me query the database for that information."),
        ServerToolUseBlock(
            id="xxx",
            caller=DirectCaller(type="direct"),
            input={
                "code": 'import json\n\nresult = await query_database({"sql": "select ..."})\n\nrows = json.loads(result)...'
            },
            name="code_execution",
            type="server_tool_use"
        ),
        ToolUseBlock(
            id="xxx",
            caller=ServerToolCaller20260120(
                tool_id="srvtoolu_01hq2Jg6BeyErjP7VbtjW5T",
                type="code_execution_20260120"
            ),
            input={
                "sql": "select ..."
            },
            name="query_database",
            type="tool_use"
        )
    ],
    model="xxx",
    role="assistant",
    stop_reason="tool_use",
    stop_sequence=None,
    type="message",
    usage=Usage(...),
    server_tool_use=ServerToolUsage(web_fetch_requests=0, web_search_requests=0),
    input_tokens=xxx,
    output_tokens=xxx,
    ...
)

In this response, the first important piece of information is the container. The container ID is what we need to save for subsequent messages. The container will expire; if it expires before the code finishes running, you will get a container-expired error and can re-run the request.
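Since the response carries an expires_at timestamp, the client can also check it before reusing a container. The following is a minimal sketch, assuming expires_at is a datetime as in the printed response; container_still_usable is a hypothetical helper, the 30-second safety margin is arbitrary, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def container_still_usable(expires_at, now=None, margin=timedelta(seconds=30)):
    """Return True if the container has more than `margin` left before expiry."""
    if expires_at is None:
        return False
    now = now or datetime.now(timezone.utc)
    if expires_at.tzinfo is None:
        # Assumption: naive timestamps are in UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return expires_at - now > margin


current_time = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(container_still_usable(current_time + timedelta(minutes=5), now=current_time))
print(container_still_usable(current_time + timedelta(seconds=10), now=current_time))
```

A check like this only reduces the chance of hitting an expired container; the error handling in the loop is still needed because the container can expire while a request is in flight.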

The content contains 3 elements:

  1. The TextBlock, where Claude tells you that it will run a server-side script to get the answer.
  2. The ServerToolUseBlock, which contains the code Claude wrote to be run in the container.
  3. The ToolUseBlock, which is the call to the user-defined function. Its caller references the ServerToolUseBlock's ID, which means that the result of this tool call will be consumed by the script in the ServerToolUseBlock.
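The three block types can be separated programmatically. The sketch below uses SimpleNamespace objects as stand-ins for the SDK block objects, shaped like the response above; split_blocks is a hypothetical helper and the IDs "srv_1" and "tool_1" are made-up placeholders.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Group response content blocks by their `type` attribute."""
    groups = {"text": [], "server_tool_use": [], "tool_use": []}
    for block in content:
        groups.setdefault(block.type, []).append(block)
    return groups


# Stand-ins for the SDK objects in response.content.
content = [
    SimpleNamespace(type="text", text="Let me query the database."),
    SimpleNamespace(type="server_tool_use", id="srv_1", name="code_execution",
                    input={"code": "..."}),
    SimpleNamespace(type="tool_use", id="tool_1", name="query_database",
                    input={"sql": "select ..."},
                    caller=SimpleNamespace(type="code_execution_20260120", tool_id="srv_1")),
]

groups = split_blocks(content)

# Link each tool_use block back to the server_tool_use block that issued it.
script_ids = {block.id for block in groups["server_tool_use"]}
linked = [
    block.name
    for block in groups["tool_use"]
    if getattr(block.caller, "tool_id", None) in script_ids
]
print(linked)
```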

You may ask: what if the server-side script makes multiple tool calls? How does Claude know which result belongs to which function call inside its own code? In our example, the script only calls a single user-defined function, so there is no ambiguity. But what if there is more than one tool call? It turns out that, on the server side, Claude creates placeholder IDs that link each tool call to its call site in the script, so each result is mapped back without confusion. This server-side setup is not visible in the response JSON.
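The idea of matching results to call sites by ID can be illustrated with a toy model. This is not Anthropic's implementation, which is not visible from the outside; ToolBridge, the "call_N" IDs, and the echoed results are all invented for the illustration. Each awaited tool call gets its own ID and future, and the host resolves them by ID in any order.

```python
import asyncio
import itertools
import json


class ToolBridge:
    """Toy sandbox-side bridge: every tool call gets its own ID and future."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.pending = {}  # call_id -> (tool name, tool input, future)

    async def call(self, name, tool_input):
        call_id = f"call_{next(self._counter)}"
        future = asyncio.get_running_loop().create_future()
        self.pending[call_id] = (name, tool_input, future)
        return await future  # suspended until the host supplies a result

    def resolve(self, call_id, content):
        _, _, future = self.pending.pop(call_id)
        future.set_result(content)


async def script(bridge):
    """Stand-in for a script Claude might write: two concurrent tool calls."""
    declined, pending = await asyncio.gather(
        bridge.call("query_database", {"sql": "select 'declined'"}),
        bridge.call("query_database", {"sql": "select 'pending'"}),
    )
    return json.loads(declined), json.loads(pending)


async def main():
    bridge = ToolBridge()
    task = asyncio.create_task(script(bridge))
    while len(bridge.pending) < 2:  # wait until both calls are outstanding
        await asyncio.sleep(0)
    # Answer in reverse order to show that matching is by ID, not arrival order.
    for call_id in reversed(list(bridge.pending)):
        _, tool_input, _ = bridge.pending[call_id]
        bridge.resolve(call_id, json.dumps({"echo": tool_input["sql"]}))
    return await task


results = asyncio.run(main())
print(results)
```

Even though the second call is answered first, each variable in the script receives the result of its own call.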

The stop_reason is tool_use, so we will call our function and return the results to Claude. This is usually implemented as a loop:

from anthropic import APIStatusError

while response.stop_reason == "tool_use":
    tool_calls = [b for b in response.content if b.type == 'tool_use']
    tool_results = []
    for tool_call in tool_calls:
        if tool_call.name == "query_database":
            sql_query = tool_call.input.get("sql")
            if sql_query:
                result = query_database(sql_query)
                content = result.to_json(orient="records") if hasattr(result, "to_json") else json.dumps(result)
            else:
                content = json.dumps({"error": "No SQL query provided in tool call."})
        else:
            content = json.dumps({"error": f"Unknown tool: {tool_call.name}"})

        tool_results.append({
            "type": "tool_result",
            "tool_use_id": tool_call.id,
            "content": content
        })

    messages.append({"role": "user", "content": tool_results})
    print(f"Message ==> {messages}")

    try:
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            system=system_prompt,
            messages=messages,
            tools=tools,
            extra_body={"container": container_id} if container_id else {},
        )
    except APIStatusError as e:
        if "container_expired" in str(e):
            print("Container expired, restarting without container")
            container_id = None  # drop it, next call creates a fresh one
            response = client.messages.create(
                model=model,
                max_tokens=4096,
                system=system_prompt,
                messages=messages,
                tools=tools,
            )
        else:
            raise

    print(response)

    container = getattr(response, 'container', None)
    if container:
        container_id = container.id

    messages.append({"role": "assistant", "content": response.content})

The loop continues as long as stop_reason is tool_use. In each iteration, it goes through all the tool_use blocks, runs the database query, serializes the result to a JSON string, and appends the results to the messages list:

tool_results.append({
    "type": "tool_result",
    "tool_use_id": tool_call.id,
    "content": content
})

messages.append({"role": "user", "content": tool_results})
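When more than one user-defined tool is registered, the if/else chain can be replaced with a dispatch table. The sketch below is one possible shape, not the post's code: build_tool_results and fake_query_database are hypothetical, send_email is an invented tool name used to trigger the unknown-tool branch, and SimpleNamespace objects stand in for the SDK's tool_use blocks.

```python
import json
from types import SimpleNamespace


def build_tool_results(tool_calls, handlers):
    """Run each tool_use block through its handler and build tool_result blocks."""
    results = []
    for call in tool_calls:
        handler = handlers.get(call.name)
        if handler is None:
            content = json.dumps({"error": f"Unknown tool: {call.name}"})
        else:
            try:
                content = handler(**call.input)
            except Exception as exc:
                # Report the failure back instead of crashing the loop.
                content = json.dumps({"error": str(exc)})
        results.append({
            "type": "tool_result",
            "tool_use_id": call.id,  # ties the result to the originating call
            "content": content,
        })
    return results


def fake_query_database(sql):
    """Stand-in handler that returns a JSON string of row objects."""
    return json.dumps([{"decline_reason": "Reason 1", "occurrence_count": 10}])


demo_calls = [
    SimpleNamespace(type="tool_use", id="toolu_a", name="query_database",
                    input={"sql": "select ..."}),
    SimpleNamespace(type="tool_use", id="toolu_b", name="send_email", input={}),
]

demo_results = build_tool_results(demo_calls, {"query_database": fake_query_database})
print(json.dumps(demo_results, indent=2))
```

Because every result carries the tool_use_id of the block that requested it, the order in which handlers run does not matter.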

After the first iteration, the messages list looks like:

[
    {
        "role": "user",
        "content": "Get me the top CLI declined reasons for the last 7 days"
    },
    {
        "role": "assistant",
        "content": [
            TextBlock(...),
            ServerToolUseBlock(...),
            ToolUseBlock(...)
        ]
    },
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "",
                "content": '[{"decline_reason":"Reason 1","occurrence_count":10},{"decline_reason":"Reason 2","occurrence_count":7},{"decline_reason":"Reason 3","occurrence_count":5}]'
            }
        ]
    }
]

You can also see that on each subsequent Claude API call, I provide the container ID from the previous call:

extra_body={"container": container_id} if container_id else {},

This is required; otherwise Claude loses the container information and cannot tell which container to run the code in.

The code also handles the case where the container expires before the code finishes: we catch the exception and call the Claude API again without the container ID, so that a fresh container is created.
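The container handling can be pulled into a small wrapper so the loop body stays short. This is a sketch under assumptions: create_with_container_retry is a hypothetical helper, the create argument stands for client.messages.create, and the wrapper catches a generic Exception to stay self-contained, whereas the loop above catches APIStatusError. The fake_create function only simulates the expiry error.

```python
def create_with_container_retry(create, base_kwargs, container_id):
    """Call `create` with the container ID; retry once without it on expiry."""
    kwargs = dict(base_kwargs)
    if container_id:
        kwargs["extra_body"] = {"container": container_id}
    try:
        return create(**kwargs)
    except Exception as exc:
        # Same string check as the loop above; anything else is re-raised.
        if "container_expired" not in str(exc):
            raise
        return create(**base_kwargs)  # fresh container on the retry


seen_requests = []


def fake_create(**kwargs):
    """Simulated API call: fails when asked to reuse the expired container."""
    seen_requests.append(kwargs)
    if kwargs.get("extra_body", {}).get("container") == "expired-id":
        raise RuntimeError("container_expired")
    return {"stop_reason": "end_turn"}


retry_response = create_with_container_retry(
    fake_create, {"model": "demo-model", "max_tokens": 4096}, "expired-id"
)
print(retry_response, len(seen_requests))
```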

In our example, one round of communication is enough for Claude to provide the final response, so the loop runs for a single iteration, and that iteration contains only one tool call.

After the loop, the final response looks like:

Message(
    id="xxx",
    container=None,
    content=[
        CodeExecutionToolResultBlock(
            content=CodeExecutionResultBlock(
                content=[],
                return_code=0,
                stderr="",
                stdout="...",
                type="code_execution_result",
                abort_reason=None
            ),
            tool_use_id="xxx",
            type="code_execution_tool_result"
        ),
        TextBlock(
            text="..."
        ),
    ],
    model="",
    role="assistant",
    stop_reason="end_turn",
    stop_sequence=None,
    type="message",
    ...
)

You can extract the final response:

text = next(b.text for b in response.content if b.type == "text")
print("Final response:", text)

The output would be something like this:

Final response: Here are the **top CLI declined reasons for the last 7 days**:

| Rank | Decline Reason | Count |
|------|---------------|-------|
| 1 | Reason 1 | 10 |
| 2 | Reason 2 | 7 |
| 3 | Reason 3 | 5 |

### Key Takeaways:
- **Reason 1** is the most frequent decline reason, accounting for the highest number of declined applications.
- **Reason 2** follows as the second most common cause.
- **Reason 3** rounds out the top 3.

These results reflect rules with a `rule_status` of **1 (declined)** tied to applications that were **declined** within the past 7 days. Would you like me to drill deeper into any specific decline reason or break this down further (e.g., by day or account)?
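The one-line extraction shown earlier raises StopIteration if the final message contains no text block, and it returns only the first text block. A slightly more defensive variant is sketched below; final_text is a hypothetical helper, joining all text blocks is a design choice rather than something the post prescribes, and SimpleNamespace objects stand in for the SDK blocks.

```python
from types import SimpleNamespace


def final_text(content, default=""):
    """Join all text blocks in a response; return `default` if there are none."""
    texts = [block.text for block in content if getattr(block, "type", None) == "text"]
    return "\n".join(texts) if texts else default


demo_content = [
    SimpleNamespace(type="code_execution_tool_result", tool_use_id="xxx"),
    SimpleNamespace(type="text", text="Here are the top CLI declined reasons."),
]

print("Final response:", final_text(demo_content))
print("Empty case:", repr(final_text([], default="(no text returned)")))
```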

Summary