Programmatic Tool Calling

Claude gained a new way of calling tools in early 2026, called programmatic tool calling. The key difference from the previous approach (code_execution_20250825) is that Claude can now write a Python script that wraps calls to user-defined tools, and that script runs server-side in a sandbox. Traditional tool calling only lets Claude request a call to a function you define; Claude then takes the result and works toward a final response, with no ability to run its own code over that result. With programmatic tool calling, Claude can both request calls to your functions and feed the results into Python code it wrote to produce the final response. A few typical scenarios benefit from this ability:

In some cases, Claude needs to call multiple user-defined functions to reach the final response. For example, if a user asks for the total of her kids' ages, Claude needs to call get_all_kids_names() to get the names, then call get_age_by_name(name) once per kid, and finally sum the results. Without programmatic tool calling, Claude has to raise a separate function call request for each kid, so there are multiple rounds of communication and each round carries the full session context. With programmatic tool calling, Claude can write one Python script that fetches all the kids' ages and sums them. The function calls are driven by that single script instead of by a new model turn per call, which greatly reduces input token cost.
Programmatic tool calling is particularly useful for data analysis tasks, which typically take two steps: run a SQL query against a database to get the data, then use the Python pandas package to analyze it. You only need to define a tool that runs a SQL query and returns the raw data. Claude can then build a Python script around the raw data to complete the requested analysis. A big benefit is that intermediate tool results are never loaded into Claude's context window; only the final code execution output is.
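
To make the first scenario concrete, here is a rough sketch of the kind of script Claude might write inside the sandbox. The tool names get_all_kids_names and get_age_by_name come from the example above; the local async stubs, the made-up data, and the total_age wrapper are assumptions for illustration only.

```python
import asyncio

# Hypothetical stand-ins for the two user-defined tools. In the real flow each
# call would be routed to your application as a tool_use block.
_KIDS = {"Ann": 4, "Ben": 7, "Cal": 9}  # made-up data


async def get_all_kids_names():
    return list(_KIDS)


async def get_age_by_name(name):
    return _KIDS[name]


async def total_age():
    # One script covers what would otherwise be 1 + N separate model turns.
    names = await get_all_kids_names()
    ages = [await get_age_by_name(name) for name in names]
    return sum(ages)


total = asyncio.run(total_age())
print(total)
```

Only the printed total has to reach Claude; the individual names and ages stay inside the script.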

I created a simple code example to illustrate how programmatic tool calling can be used. In this example, I have a CLI (credit limit increase) database that contains two tables:

cli_applications, where each row is a credit limit increase application record. The columns are: account_id, application_id, application_date, application_status (approved, declined, pending)
cli_rules, which contains the rules and their evaluation results for a given limit increase application. The columns are: application_id, rule_id, description, rule_status (0=referred, 1=declined, 2=passed, 3=ignored)
Each application has multiple rules, and each rule is evaluated.

I want to answer this question: what are the top reasons CLI applications were declined in the last 7 days?
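
For reference, one possible SQL formulation of this question is sketched below against an in-memory SQLite copy of the two tables. The database engine, column types, sample rows, and the choice to treat a rule's description as the decline reason are all assumptions; the original setup does not specify them.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cli_applications (
    account_id TEXT, application_id TEXT PRIMARY KEY,
    application_date TEXT, application_status TEXT
);
CREATE TABLE cli_rules (
    application_id TEXT, rule_id TEXT, description TEXT, rule_status INTEGER
);
""")

# Made-up sample rows: three recent applications and one old one.
recent = (date.today() - timedelta(days=2)).isoformat()
old = (date.today() - timedelta(days=30)).isoformat()
conn.executemany("INSERT INTO cli_applications VALUES (?, ?, ?, ?)", [
    ("acc1", "app1", recent, "declined"),
    ("acc2", "app2", recent, "declined"),
    ("acc3", "app3", recent, "approved"),
    ("acc4", "app4", old, "declined"),
])
conn.executemany("INSERT INTO cli_rules VALUES (?, ?, ?, ?)", [
    ("app1", "r1", "Low credit score", 1),
    ("app1", "r2", "Income check", 2),
    ("app2", "r1", "Low credit score", 1),
    ("app2", "r3", "Recent delinquency", 1),
    ("app3", "r1", "Low credit score", 2),
    ("app4", "r3", "Recent delinquency", 1),
])

# Count declined rules (rule_status = 1) on applications declined in the last 7 days.
TOP_DECLINE_REASONS_SQL = """
SELECT r.description AS decline_reason, COUNT(*) AS occurrence_count
FROM cli_rules AS r
JOIN cli_applications AS a ON a.application_id = r.application_id
WHERE a.application_status = 'declined'
  AND r.rule_status = 1
  AND a.application_date >= date('now', '-7 days')
GROUP BY r.description
ORDER BY occurrence_count DESC, decline_reason
"""

rows = conn.execute(TOP_DECLINE_REASONS_SQL).fetchall()
print(rows)
```

The date arithmetic uses SQLite syntax; another engine would need its own equivalent.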

This is a perfect data analysis scenario, which requires Claude to:

Create the query based on the database schema
Call the user-defined tool query_database(sql: string) to fetch the raw data
Wrap the tool call in a server-side Python script. In this example the script's job is just to print the data in tabular format, but in a real-world case the script would do the heavy lifting of the analysis
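
The script in the last step might look roughly like the sketch below. Only the `await query_database({"sql": ...})` call and the json.loads() step mirror what Claude produces later in this article; the local stub, its dummy rows, and the format_table helper are assumptions added so the sketch runs on its own.

```python
import asyncio
import json


async def query_database(args):
    # Local stub (assumption): inside the sandbox this call is routed to your
    # application, which returns a JSON string of row objects.
    return json.dumps([
        {"decline_reason": "Reason 1", "occurrence_count": 10},
        {"decline_reason": "Reason 2", "occurrence_count": 7},
    ])


def format_table(rows):
    # Render the rows as a left-aligned, fixed-width text table.
    headers = ["decline_reason", "occurrence_count"]
    width = max([len(headers[0])] + [len(r["decline_reason"]) for r in rows])
    lines = [f"{headers[0]:<{width}}  {headers[1]}"]
    for r in rows:
        lines.append(f"{r['decline_reason']:<{width}}  {r['occurrence_count']}")
    return "\n".join(lines)


async def main():
    result = await query_database({"sql": "SELECT ..."})  # SQL elided on purpose
    rows = json.loads(result)
    return format_table(rows)


table = asyncio.run(main())
print(table)
```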

Now let's dive into the code.

Step 1: create a mock query_database function. This function takes a SQL query and returns the database result as a pandas DataFrame. In this example, I just hard-coded some dummy records to return. In the real world, this would be a generic function that takes any SQL query, runs it against the database, and returns the data.

import pandas as pd

def query_database(sql):
    # Placeholder function to execute the SQL query against the database
    # In a real implementation, this would connect to the database and execute the query
    print(f"Executing SQL query: {sql}")
    # Simulate a response from the database
    df = pd.DataFrame({
        "decline_reason": ["Reason 1", "Reason 2", "Reason 3"],
        "occurrence_count": [10, 7, 5] 
    })
    return df
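
In a real deployment the function body would actually run the SQL. A minimal sketch using the standard-library sqlite3 module is shown below; the engine, the name run_query_as_json, and the SELECT-only guard are assumptions, not part of the original setup. It returns a JSON string directly, which is the shape the tool description in Step 2 promises.

```python
import json
import sqlite3


def run_query_as_json(conn, sql):
    """Run a query and return the rows as a JSON array of row objects."""
    # Rough guard (assumption): the SQL is model-written, so accept SELECT only.
    if not sql.lstrip().lower().startswith("select"):
        return json.dumps({"error": "Only SELECT statements are allowed."})
    try:
        cursor = conn.execute(sql)
    except sqlite3.Error as exc:
        # Return the failure as data so the calling script can react to it.
        return json.dumps({"error": str(exc)})
    columns = [col[0] for col in cursor.description]
    return json.dumps([dict(zip(columns, row)) for row in cursor.fetchall()])


# Usage with a throwaway in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (decline_reason TEXT, occurrence_count INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("Reason 1", 10), ("Reason 2", 7)])
print(run_query_as_json(conn, "SELECT * FROM t ORDER BY occurrence_count DESC"))
```

A prefix check like this is not a real security boundary; a read-only database user would be the sturdier choice.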

Step 2: prepare the Claude API call parameters.

import json

import anthropic

client = anthropic.Anthropic()

model = "claude-opus-4-6"

system_prompt = '''
You have the complete database schema below. Treat it as authoritative.
Do not inspect or request schema metadata.
Do not query sqlite_master, pragma_table_info, information_schema, or similar tables.
Write SQL only against these tables/columns:

cli_applications(account_id, application_id, application_date, application_status)
cli_rules(application_id, rule_id, description, rule_status)

Semantics:
- cli_applications.application_status: approved, declined, pending
- cli_rules.rule_status: 0=referred, 1=declined, 2=passed, 3=ignored
'''

tools = [
    {"type": "code_execution_20260120", "name": "code_execution"},
    {
        "name": "query_database",
        "description": (
            "Execute a SQL query against the CLI applications database. "
            "Returns a JSON string — an array of row objects, "
            "e.g. '[{\"decline_reason\": \"...\", \"occurrence_count\": 5}]'. "
            "Parse with json.loads() to get a list of dicts."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "sql": {"type": "string", "description": "The SQL query to execute."}
            },
            "required": ["sql"],
        },
        "allowed_callers": ["code_execution_20260120"],
    },
]

messages = [
    {
        "role": "user",
        "content": "Get me the top CLI declined reasons for the last 7 days.",
    }
]

Notice that the tools list contains the code execution tool ({"type": "code_execution_20260120", "name": "code_execution"}), followed by my query_database tool. On query_database I set allowed_callers to the code execution tool. This declaration means only the code execution sandbox can call my function; Claude cannot call it directly. Why is this important? Without the restriction, Claude may call query_database() directly, read the return value, and produce a response from it. With the restriction, I force Claude to write a server-side script to process the returned data instead of reasoning over the raw return value.
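
One way to avoid forgetting this restriction is to build tool definitions through a small helper. The sketch below is only a convenience layer: the helper names programmatic_tool and sandbox_only are hypothetical, while the field names and the code_execution_20260120 identifier are taken from the tools list above.

```python
CODE_EXECUTION_TYPE = "code_execution_20260120"  # tool type used in this article


def programmatic_tool(name, description, properties, required):
    """Build a tool definition that only the code execution sandbox may call."""
    return {
        "name": name,
        "description": description,
        "input_schema": {"type": "object", "properties": properties, "required": required},
        "allowed_callers": [CODE_EXECUTION_TYPE],
    }


def sandbox_only(tool):
    """True if the tool is callable from the sandbox and nowhere else."""
    return tool.get("allowed_callers") == [CODE_EXECUTION_TYPE]


tools = [
    {"type": CODE_EXECUTION_TYPE, "name": "code_execution"},
    programmatic_tool(
        "query_database",
        "Execute a SQL query and return a JSON array of row objects.",
        {"sql": {"type": "string", "description": "The SQL query to execute."}},
        ["sql"],
    ),
]
print([t["name"] for t in tools if sandbox_only(t)])
```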

Also notice that, to teach Claude how to write the query, I provide the database schema in the system prompt. Without it, Claude may first try to query the database's master schema to learn the table layout. The system prompt saves those extra discovery steps.

In this example, I ask Claude to get me the top credit limit increase declined reasons for the last 7 days.

response = client.messages.create(
    model=model,
    max_tokens=4096,
    system=system_prompt,
    messages=messages,
    tools=tools,
)

print(response)

messages.append({"role": "assistant", "content": response.content})

container = getattr(response, 'container', None)
container_id = container.id if container else None

The response looks like this:

Message(
    id="xxx",
    container=Container(
        id="srvtoolu_01hq2Jg6BeyErjP7VbtjW5T",
        expires_at=datetime.datetime(yyyy,mm,dd,hh,mm,ss)
    ),
    content=[
        TextBlock(text="I'll help you find the top CLI declined reasons for the last 7 days. Let me query the database for that information."),
        ServerToolUseBlock(
            id="xxx",
            caller=DirectCaller(type="direct"),
            input={
                "code": 'import json\n\nresult = await query_database({"sql": select ...})\n\nrows = json.loads(result)...'
            },
            name="code_execution",
            type="server_tool_use"
        ),
        ToolUseBlock(
            id="xxx",
            caller=ServerToolCaller20260120(
                tool_id="srvtoolu_01hq2Jg6BeyErjP7VbtjW5T",
                type="code_execution_20260120"
            ),
            input={
                "sql": "select ..."
            },
            name="query_database",
            type="tool_use"
        )
    ],
    model="xxx",
    role="assistant",
    stop_reason="tool_use",
    stop_sequence=None,
    type="message",
    usage=Usage(...),
    server_tool_use=ServerToolUsage(web_fetch_requests=0, web_search_requests=0),
    input_tokens=xxx,
    output_tokens=xxx,
    ...
)

The first important piece of information in this response is the container info. The container ID is what we need to save for subsequent requests. The container will eventually expire. If the code is still running when the container expires, you will get a container expired error, and you can re-run the request.
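
Since the response carries expires_at, a small helper can decide up front whether a saved container is still worth reusing. This is a sketch only: the function name and the 30-second safety margin are arbitrary, and it assumes expires_at is a datetime, treating a naive value as UTC.

```python
from datetime import datetime, timedelta, timezone


def container_is_usable(expires_at, now=None, margin=timedelta(seconds=30)):
    """Return True if the container will still be alive after `margin`."""
    if expires_at is None:
        return False
    if expires_at.tzinfo is None:
        # Assumption: treat naive timestamps as UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return expires_at - now > margin


# Usage with a fixed clock so the result is deterministic.
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(container_is_usable(now + timedelta(minutes=4), now))
print(container_is_usable(now + timedelta(seconds=10), now))
```

A check like this only reduces how often the expiry error shows up; the error still has to be handled, because the container can expire mid-request.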

The content contains 3 elements:

The TextBlock, where Claude tells you that it will run a server-side script to get the answer.
The ServerToolUseBlock, which contains the code Claude wrote to run in the container.
The ToolUseBlock, which is the call to the user-defined function. Its caller references the ServerToolUseBlock's ID, which means the result of this tool call will be consumed by the script in the ServerToolUseBlock.
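
The three block kinds can be separated by their type field. The sketch below uses SimpleNamespace objects as stand-ins for the SDK's block classes so it runs without an API call; the IDs and text are made up, and split_content is a hypothetical helper name.

```python
from types import SimpleNamespace


def split_content(content):
    """Group response content blocks by their role in the programmatic flow."""
    texts = [b.text for b in content if b.type == "text"]
    scripts = [b.input.get("code", "") for b in content if b.type == "server_tool_use"]
    pending_calls = [b for b in content if b.type == "tool_use"]
    return texts, scripts, pending_calls


# Stand-in blocks shaped like the response above (all values are made up).
content = [
    SimpleNamespace(type="text", text="Let me query the database."),
    SimpleNamespace(type="server_tool_use", id="srvtoolu_demo", name="code_execution",
                    input={"code": "result = await query_database({...})"}),
    SimpleNamespace(type="tool_use", id="toolu_demo", name="query_database",
                    input={"sql": "SELECT ..."},
                    caller=SimpleNamespace(type="code_execution_20260120",
                                           tool_id="srvtoolu_demo")),
]

texts, scripts, pending_calls = split_content(content)
for call in pending_calls:
    print(call.name, "requested by", call.caller.tool_id)
```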

You may ask: what if the server-side script makes multiple tool calls? How does Claude know which result belongs to which function call inside its own code? In our example the script calls a single user-defined function, so there is no ambiguity. But what if there is more than one tool call? It turns out that, on the server side, placeholder IDs link each tool call to its place in the script, so each result is mapped back without confusion. This server-side bookkeeping is not visible in the response JSON.
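
Whatever bookkeeping happens on the server, the part your code controls is the tool_use_id on each tool_result. Below is a sketch of answering several pending calls in one user message; the HANDLERS registry, the stand-in blocks, and the IDs are hypothetical and reuse the kids' ages scenario from the introduction.

```python
import json
from types import SimpleNamespace

# Hypothetical registry mapping tool names to local handlers.
HANDLERS = {
    "get_all_kids_names": lambda args: ["Ann", "Ben"],
    "get_age_by_name": lambda args: {"Ann": 4, "Ben": 7}[args["name"]],
}


def build_tool_results(pending_calls):
    """Answer each tool_use block, echoing its id so results stay matched to calls."""
    results = []
    for call in pending_calls:
        handler = HANDLERS.get(call.name)
        if handler is None:
            payload = {"error": f"Unknown tool: {call.name}"}
        else:
            try:
                payload = handler(call.input)
            except Exception as exc:  # report failures as data instead of crashing
                payload = {"error": str(exc)}
        results.append({
            "type": "tool_result",
            "tool_use_id": call.id,
            "content": json.dumps(payload),
        })
    return results


# Two calls to the same tool, told apart only by their ids (made up here).
pending = [
    SimpleNamespace(type="tool_use", id="toolu_a", name="get_age_by_name", input={"name": "Ann"}),
    SimpleNamespace(type="tool_use", id="toolu_b", name="get_age_by_name", input={"name": "Ben"}),
]
results = build_tool_results(pending)
print(results)
```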

The stop_reason is tool_use, so we call our function and return the results to Claude. This is usually implemented as a loop:

from anthropic import APIStatusError

while response.stop_reason == "tool_use":
    tool_calls = [b for b in response.content if b.type == 'tool_use']
    tool_results = []
    for tool_call in tool_calls:
        if tool_call.name == "query_database":
            sql_query = tool_call.input.get("sql")
            if sql_query:
                result = query_database(sql_query)
                content = result.to_json(orient="records") if hasattr(result, "to_json") else json.dumps(result)
            else:
                content = json.dumps({"error": "No SQL query provided in tool call."})
        else:
            content = json.dumps({"error": f"Unknown tool: {tool_call.name}"})


        tool_results.append({
            "type": "tool_result",
            "tool_use_id": tool_call.id,
            "content": content
        })
   
    messages.append({"role": "user", "content": tool_results})
    print(f"Message ==> {messages}")

    try:
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            system=system_prompt,
            messages=messages,
            tools=tools,
            extra_body={"container": container_id} if container_id else {},
        )
    except APIStatusError as e:
        if "container_expired" in str(e):
            print("Container expired; restarting without container")
            container_id = None  # drop it, next call creates a fresh one
            response = client.messages.create(
                model=model,
                max_tokens=4096,
                system=system_prompt,
                messages=messages,
                tools=tools,
            )
        else:
            raise

    print(response)

    container = getattr(response, 'container', None)
    if container:
        container_id = container.id

    messages.append({"role": "assistant", "content": response.content})

The loop continues as long as stop_reason is tool_use. On each pass it goes through all the tool_use blocks, gets the database query result, serializes it to a JSON string, and appends the result to the messages list:

tool_results.append({
    "type": "tool_result",
    "tool_use_id": tool_call.id,
    "content": content
})

messages.append({"role": "user", "content": tool_results})

After the first iteration, the messages list looks like:

[
    {
        "role": "user",
        "content": "Get me the top CLI declined reasons for the last 7 days"
    },
    {
        "role": "assistant",
        "content": [
            TextBlock(...),
            ServerToolUseBlock(...),
            ToolUseBlock(...)
        ]
    },
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "",
                "content": '[{"decline_reason": "ABC", "occurrence_count": 10},{"decline_reason": "OPQ", "occurrence_count": 7},{"decline_reason": "XYZ", "occurrence_count": 6}]'
            }
        ]
    }
]

You can also see that on each subsequent Claude API call, I pass the container ID from the previous call:

extra_body={"container": container_id} if container_id else {},

This is required; otherwise Claude loses track of the container and cannot tell which one to run the code in.

The code also handles the case where the container expires before the code finishes: we catch the exception and call the Claude API again without the container ID.
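
The duplicated create call in the except branch could be folded into one helper. The sketch below is written against a generic callable so it can be exercised without network access: `create` stands in for client.messages.create, the helper name is hypothetical, and matching on the "container_expired" text simply mirrors the loop above. Real code would catch anthropic.APIStatusError as the loop does, not a bare Exception.

```python
def create_with_container(create, base_kwargs, container_id):
    """Call `create` with the container if we have one; retry once without it on expiry.

    Returns (response, container_was_dropped).
    """
    kwargs = dict(base_kwargs)
    if container_id:
        kwargs["extra_body"] = {"container": container_id}
    try:
        return create(**kwargs), False
    except Exception as exc:
        # Same detection as the loop above: look for the expiry marker in the message.
        if container_id and "container_expired" in str(exc):
            return create(**base_kwargs), True
        raise


# Usage with a fake `create` that rejects any request carrying a container.
calls = []


def fake_create(**kwargs):
    calls.append(kwargs)
    if "extra_body" in kwargs:
        raise RuntimeError("container_expired")
    return "fresh response"


response, dropped = create_with_container(fake_create, {"model": "demo-model"}, "container_123")
print(response, dropped)
```

When the second element is True, the caller should forget the old container ID and pick up the new one from the returned response.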

In our example, Claude can produce the final response after one round of communication, so the loop runs for a single iteration, and that iteration contains only one tool call.

After the loop, the final response looks like:

Message(
    id="xxx",
    container=None,
    content=[
        CodeExecutionToolResultBlock(
            content=CodeExecutionResultBlock(
                content=[],
                return_code=0,
                stderr="",
                stdout="...",
                type="code_execution_result",
                abort_reason=None
            ),
            tool_use_id="xxx",
            type="code_execution_tool_result"
        ),
        TextBlock(
            text="..."
        ),
    ],
    model="",
    role="assistant",
    stop_reason="end_turn",
    stop_sequence=None,
    type="message",
    ...
)

You can extract the final response:

text = next(b.text for b in response.content if b.type == "text")
print("Final response:", text)
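
The one-liner above raises StopIteration when the response has no text block and ignores any additional ones. A slightly more defensive sketch follows; it uses SimpleNamespace stand-ins shaped like the final response shown earlier, and the helper names final_text and sandbox_stdout are hypothetical.

```python
from types import SimpleNamespace


def final_text(content):
    """Join every text block; returns an empty string when there is none."""
    return "\n".join(b.text for b in content if b.type == "text")


def sandbox_stdout(content):
    """Collect stdout from each code execution result block."""
    return [
        b.content.stdout
        for b in content
        if b.type == "code_execution_tool_result" and hasattr(b.content, "stdout")
    ]


# Stand-in blocks (made-up values) mirroring the final response structure.
content = [
    SimpleNamespace(
        type="code_execution_tool_result",
        content=SimpleNamespace(type="code_execution_result",
                                stdout="Reason 1  10\n", stderr="", return_code=0),
    ),
    SimpleNamespace(type="text", text="Here are the top reasons."),
]
print(final_text(content))
print(sandbox_stdout(content))
```

Reading the sandbox stdout alongside the text is handy when you want the raw numbers and not only Claude's summary of them.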

The output would be something like this:

Final response: Here are the **top CLI declined reasons for the last 7 days**:

| Rank | Decline Reason | Count |
|------|---------------|-------|
| 1 | Reason 1 | 10 |
| 2 | Reason 2 | 7 |
| 3 | Reason 3 | 5 |

### Key Takeaways:
- **Reason 1** is the most frequent decline reason, accounting for the highest number of declined applications.
- **Reason 2** follows as the second most common cause.
- **Reason 3** rounds out the top 3.

These results reflect rules with a `rule_status` of **1 (declined)** tied to applications that were **declined** within the past 7 days. Would you like me to drill deeper into any specific decline reason or break this down further (e.g., by day or account)?

Summary

If you want the Claude API to do programmatic tool calling, you need to restrict your functions so that Claude can only call them from the container sandbox. Otherwise Claude may read the results and produce a response directly.
The main loop is similar to normal tool calling. The only extra work is to track the container ID from the previous response and attach it via extra_body in subsequent calls.