Responses API
Build your own computer-use loop where you control every step.
The Responses API lets you build a computer-use loop — a pattern where Northstar looks at a screenshot of a computer screen, decides the next action (click, type, scroll), and you execute it — repeating until the work is done. Unlike Tasks where the model operates autonomously, the Responses API gives you control at every step.
How it works
- Send input — provide a text instruction and optionally a screenshot
- Get output — the model returns either a text `message` or a `computer_call` action
- Execute the action — perform the click, type, or scroll on the computer
- Feed back the result — send a screenshot of the new state as `computer_call_output`
- Repeat until the model returns a `message` (it’s done) or you decide to stop
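The steps above can be sketched as a minimal loop. This is an illustrative skeleton, not SDK code: `create_response`, `execute_action`, and `take_screenshot` are hypothetical wrappers you would implement with the API calls shown later on this page, and output items are modeled as plain dicts.

```python
def run_cua_loop(create_response, execute_action, take_screenshot, max_steps=20):
    """Minimal computer-use loop: act on computer_call items until a message arrives.

    The three callables are hypothetical wrappers around your own SDK calls;
    output items are plain dicts like {"type": "computer_call", "action": {...}}.
    """
    response = create_response(screenshot=None)  # send the initial instruction
    for _ in range(max_steps):
        call = next(
            (item for item in response["output"] if item["type"] == "computer_call"),
            None,
        )
        if call is None:
            return response  # got a message instead of an action: done
        execute_action(call["action"])  # perform the click/type/scroll
        response = create_response(screenshot=take_screenshot())  # feed back new state
    return response  # give up after max_steps
```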
Input format
The `input` field accepts either a string or an array of message objects:
- String — simplest form, for text-only instructions
- Array — when you need to include images (screenshots) or structured multi-turn input
Both formats work identically for text-only requests. Use the array format when you need to attach a screenshot with `input_image`.
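A tiny helper (hypothetical, not part of the SDK) can pick the right shape depending on whether a screenshot is attached:

```python
def build_input(text, screenshot_url=None):
    """Return the string form for text-only requests, or the array form
    with an input_image block when a screenshot URL is provided."""
    if screenshot_url is None:
        return text
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": text},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }
    ]
```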
Create a response
The simplest form — pass a string as `input`:

```python
from tzafon import Lightcone

client = Lightcone()

response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Open the terminal and check disk usage",
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "desktop",
        },
    ],
)
```

```typescript
import Lightcone from "@tzafon/lightcone";

const client = new Lightcone();

const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Open the terminal and check disk usage",
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "desktop",
    },
  ],
});
```

When you need to include a screenshot (common in computer-use loops), use the array format:

```python
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Click the search button"},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        },
    ],
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "desktop",
        },
    ],
)
```

```typescript
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "Click the search button" },
        { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      ],
    },
  ],
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "desktop",
    },
  ],
});
```

Process the output
The response `output` is an array of items. Each item has a `type`:
| Type | Meaning |
|---|---|
| `computer_call` | The model wants to perform an action (click, type, scroll, and more) |
| `message` | The model is responding with text (work may be done) |
| `reasoning` | Internal reasoning (if available) |
When you get a `computer_call`, the `action` field tells you what to do:
```python
for item in response.output:
    if item.type == "computer_call":
        action = item.action
        print(f"Action: {action.type}")  # e.g., "click", "type", "navigate"
        print(f"Coordinates: ({action.x}, {action.y})")
        print(f"Text: {action.text}")
    elif item.type == "message":
        for block in item.content:
            print(block.text)
```

```typescript
for (const item of response.output ?? []) {
  if (item.type === "computer_call") {
    console.log(`Action: ${item.action?.type}`);
    console.log(`Coordinates: (${item.action?.x}, ${item.action?.y})`);
    console.log(`Text: ${item.action?.text}`);
  } else if (item.type === "message") {
    for (const block of item.content ?? []) {
      console.log(block.text);
    }
  }
}
```

Action types
The model returns actions with coordinates in the 0–999 model space. Denormalize to pixel coordinates before passing to `computer.click()` or other methods: `x = int(action.x / 1000 * display_width)`.
Here’s what the raw JSON looks like on the wire — useful if you’re calling the API directly without an SDK:
```json
{
  "output": [
    {
      "type": "computer_call",
      "call_id": "call_abc123",
      "action": {
        "type": "click",
        "x": 450,
        "y": 320,
        "button": "left"
      }
    }
  ]
}
```

Note that the action kind is always `action.type` (e.g., `"click"`, `"type"`, `"scroll"`) and coordinates are separate `x` and `y` fields — not a tuple.
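If you are parsing the wire format yourself, the standard library is enough. This sketch pulls the `computer_call` out of the JSON above and applies the 0–999 denormalization for a 1280×720 display:

```python
import json

raw = """
{
  "output": [
    {
      "type": "computer_call",
      "call_id": "call_abc123",
      "action": { "type": "click", "x": 450, "y": 320, "button": "left" }
    }
  ]
}
"""

payload = json.loads(raw)
call = next(item for item in payload["output"] if item["type"] == "computer_call")
action = call["action"]

# Denormalize from the 0-999 model space to a 1280x720 display
x_px = int(action["x"] / 1000 * 1280)
y_px = int(action["y"] / 1000 * 720)

print(action["type"], x_px, y_px)  # click 576 230
```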
The model can request these actions:
| Action | Fields | Description |
|---|---|---|
| `click` | `x`, `y`, `button` | Click at coordinates |
| `double_click` | `x`, `y` | Double-click |
| `triple_click` | `x`, `y` | Triple-click (select a line) |
| `right_click` | `x`, `y` | Right-click |
| `type` | `text` | Type text |
| `key` / `keypress` | `keys` | Press key combination |
| `key_down` / `key_up` | `keys` | Hold / release key |
| `scroll` | `x`, `y`, `scroll_y` | Scroll vertically |
| `hscroll` | `x`, `y`, `scroll_x` | Scroll horizontally |
| `navigate` | `url` | Go to a URL (browser mode) |
| `drag` | `x`, `y`, `end_x`, `end_y` | Drag between two points |
| `wait` | — | Wait for the screen to settle |
| `terminate` | `status`, `result` | Work is complete (`status`: `"success"` or `"failure"`) |
| `answer` | `result` | Answer a question with findings |
| `done` | `text` | Work is complete (alias) |
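One way to consume this table is a dispatcher that denormalizes coordinates and forwards each action to your computer object, returning whether the loop should continue. This is a sketch under assumptions: the `click`, `type`, and `scroll` method names are not a confirmed SDK surface (only `computer.click()` and `computer.navigate()` appear on this page), so substitute whatever your computer client actually exposes.

```python
def execute_action(computer, action, display_width=1280, display_height=720):
    """Forward one model action (a dict) to the computer; return False on a
    terminal action. Method names on `computer` are assumptions, not SDK fact."""
    def px(x, y):
        # Model space is 0-999; convert to pixels for this display
        return int(x / 1000 * display_width), int(y / 1000 * display_height)

    kind = action["type"]
    if kind == "click":
        x, y = px(action["x"], action["y"])
        computer.click(x, y, button=action.get("button", "left"))
    elif kind == "type":
        computer.type(action["text"])
    elif kind == "scroll":
        x, y = px(action["x"], action["y"])
        computer.scroll(x, y, action["scroll_y"])
    elif kind == "navigate":
        computer.navigate(action["url"])
    elif kind in ("terminate", "done", "answer"):
        return False  # terminal action: stop the loop
    else:
        raise ValueError(f"unhandled action type: {kind}")
    return True  # keep looping
```

Extend the `elif` chain with the remaining rows of the table as your client supports them.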
Extracting information
When `tools` includes `computer_use`, the model strongly prefers returning actions over text. To extract data from a page — for scraping, research, or monitoring — use a two-phase pattern:

- Explore (with `tools`) — the agent navigates, scrolls, dismisses popups
- Extract (without `tools`) — the model reads the screen and responds with text
```python
TOOL = {
    "type": "computer_use",
    "display_width": 1280,
    "display_height": 720,
    "environment": "browser",
}

with client.computer.create(kind="browser") as computer:
    computer.navigate("https://example.com/pricing")
    computer.wait(3)

    # --- Phase 1: Explore WITH tools (agent scrolls, dismisses popups, etc.) ---
    screenshot_url = computer.get_screenshot_url(computer.screenshot())
    response = client.responses.create(
        model="tzafon.northstar-cua-fast",
        tools=[TOOL],
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Scroll down slowly. Dismiss any popups. Stop when you can see pricing details."},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }],
    )

    for step in range(10):
        computer_call = next(
            (o for o in (response.output or []) if o.type == "computer_call"), None
        )
        if not computer_call:
            break
        action = computer_call.action
        if action.type in ("terminate", "done", "answer"):
            break

        # Execute the action (click, scroll, type, etc.)
        # ... see the CUA protocol guide for the full execute_action helper ...

        computer.wait(1)
        screenshot_url = computer.get_screenshot_url(computer.screenshot())
        response = client.responses.create(
            model="tzafon.northstar-cua-fast",
            previous_response_id=response.id,
            tools=[TOOL],
            input=[{
                "type": "computer_call_output",
                "call_id": computer_call.call_id,
                "output": {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            }],
        )

    # --- Phase 2: Extract WITHOUT tools (forces a text response) ---
    screenshot_url = computer.get_screenshot_url(computer.screenshot())
    extraction = client.responses.create(
        model="tzafon.northstar-cua-fast",
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is the price shown on this page? Reply with just the dollar amount."},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }],
        # No tools — the model MUST respond with text, not actions
    )

    for item in extraction.output or []:
        if item.type == "message":
            for block in item.content or []:
                print(block.text)  # e.g., "$29.99"
```

```typescript
const tool = {
  type: "computer_use" as const,
  display_width: 1280,
  display_height: 720,
  environment: "browser",
};

const computer = await client.computers.create({ kind: "browser" });
const id = computer.id!;

try {
  await client.computers.navigate(id, { url: "https://example.com/pricing" });
  await new Promise((r) => setTimeout(r, 3000));

  // --- Phase 1: Explore WITH tools ---
  let screenshot = await client.computers.screenshot(id);
  let screenshotUrl = screenshot.result?.screenshot_url as string;

  let response = await client.responses.create({
    model: "tzafon.northstar-cua-fast",
    tools: [tool],
    input: [{
      role: "user",
      content: [
        { type: "input_text", text: "Scroll down slowly. Dismiss any popups. Stop when you can see pricing details." },
        { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      ],
    }],
  });

  for (let step = 0; step < 10; step++) {
    const computerCall = response.output?.find((item) => item.type === "computer_call");
    if (!computerCall) break;

    const action = computerCall.action!;
    if (["terminate", "done", "answer"].includes(action.type!)) break;

    // Execute the action (click, scroll, type, etc.)
    // ... see the CUA protocol guide for the full action switch ...

    await new Promise((r) => setTimeout(r, 1000));
    screenshot = await client.computers.screenshot(id);
    screenshotUrl = screenshot.result?.screenshot_url as string;

    response = await client.responses.create({
      model: "tzafon.northstar-cua-fast",
      previous_response_id: response.id!,
      tools: [tool],
      input: [{
        type: "computer_call_output",
        call_id: computerCall.call_id!,
        output: { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      }],
    });
  }

  // --- Phase 2: Extract WITHOUT tools ---
  screenshot = await client.computers.screenshot(id);
  screenshotUrl = screenshot.result?.screenshot_url as string;

  const extraction = await client.responses.create({
    model: "tzafon.northstar-cua-fast",
    input: [{
      role: "user",
      content: [
        { type: "input_text", text: "What is the price shown on this page? Reply with just the dollar amount." },
        { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      ],
    }],
    // No tools — the model MUST respond with text, not actions
  });

  for (const item of extraction.output ?? []) {
    if (item.type === "message") {
      for (const block of item.content ?? []) {
        console.log(block.text); // e.g., "$29.99"
      }
    }
  }
} finally {
  await client.computers.delete(id);
}
```

For a complete working example of this pattern — extracting structured data from multiple websites — see `competitor_research.py` in the examples directory.
Multi-turn chaining
Use `previous_response_id` to chain conversations without resending the full history:

```python
# First turn
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Open the file manager",
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "desktop"}],
)

# Execute the action, take a screenshot, then continue
followup = client.responses.create(
    model="tzafon.northstar-cua-fast",
    previous_response_id=response.id,
    input=[
        {
            "type": "computer_call_output",
            "call_id": response.output[0].call_id,
            "output": {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
        },
    ],
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "desktop"}],
)
```

```typescript
// First turn
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Open the file manager",
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "desktop" }],
});

// Execute the action, take a screenshot, then continue
const followup = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  previous_response_id: response.id!,
  input: [
    {
      type: "computer_call_output",
      call_id: response.output![0].call_id!,
      output: { type: "input_image", image_url: screenshotUrl, detail: "auto" },
    },
  ],
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "desktop" }],
});
```

System instructions
Use `instructions` to set a system prompt that guides the model’s behavior across turns:

```python
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    instructions="You are operating a desktop computer. Be careful and verify each action before proceeding.",
    input="Find the system settings and check the display resolution",
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "desktop"}],
)
```

```typescript
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  instructions: "You are operating a desktop computer. Be careful and verify each action before proceeding.",
  input: "Find the system settings and check the display resolution",
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "desktop" }],
});
```

The instructions persist across chained turns when using `previous_response_id`.

Set `store: false` to skip response persistence. Responses created with `store: false` cannot be referenced via `previous_response_id`.
Manage responses
Each response has a `status` field:
| Status | Meaning |
|---|---|
| `in_progress` | The model is still generating output |
| `completed` | The response is ready |
| `failed` | Something went wrong |
| `cancelled` | You cancelled it via `cancel()` |
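Because an `in_progress` response eventually settles into one of the terminal statuses, a small polling wrapper around the retrieve call is often handy. This is a sketch: the poll interval and timeout defaults are arbitrary choices, not API requirements.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_for_response(client, response_id, poll_interval=1.0, timeout=60.0):
    """Poll client.responses.retrieve until the response reaches a terminal
    status, or raise TimeoutError. Interval/timeout defaults are arbitrary."""
    deadline = time.monotonic() + timeout
    while True:
        response = client.responses.retrieve(response_id)
        if response.status in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"response {response_id} not done after {timeout}s")
        time.sleep(poll_interval)
```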
```python
# Retrieve a response
response = client.responses.retrieve("resp_abc123")
print(response.status)  # "completed", "in_progress", etc.

# Cancel an in-progress response
client.responses.cancel("resp_abc123")

# Delete a response
client.responses.delete("resp_abc123")
```

```typescript
const response = await client.responses.retrieve("resp_abc123");
console.log(response.status); // "completed", "in_progress", etc.

await client.responses.cancel("resp_abc123");
await client.responses.delete("resp_abc123");
```

Models
| Model | Best for |
|---|---|
| `tzafon.northstar-cua-fast` | Computer-use — optimized for operating desktops and browsers |
| `tzafon.sm-1` | Low-level Rust programming |
See also
- Computer-use loop — full implementation guide
- Tasks — fully managed alternative where Northstar operates autonomously
- Computers — computer lifecycle and all available actions
- How Lightcone works — how the API layers relate