Responses API
Build your own computer-use loop where you control every step.
The Responses API lets you build a computer-use loop — a pattern where Northstar looks at a screenshot of a computer screen, decides the next action (click, type, scroll), and you execute it — repeating until the work is done. Unlike Tasks where the model operates autonomously, the Responses API gives you control at every step.
How it works
- Send input — provide a text instruction and optionally a screenshot
- Get output — the model returns either a text `message` or a `computer_call` action
- Execute the action — perform the click, type, or scroll on the computer
- Feed back the result — send a screenshot of the new state as `computer_call_output`
- Repeat until the model returns a `message` (it’s done) or you decide to stop
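The steps above can be sketched as a minimal loop. This is an illustrative skeleton, not SDK code: `create_response`, `execute_action`, and `take_screenshot` are hypothetical wrappers you would implement with the API calls shown later on this page, and output items are modeled as plain dicts.

```python
def run_cua_loop(create_response, execute_action, take_screenshot, max_steps=20):
    """Minimal computer-use loop: act on computer_call items until a message arrives.

    The three callables are hypothetical wrappers around your own SDK calls;
    output items are plain dicts like {"type": "computer_call", "action": {...}}.
    """
    response = create_response(screenshot=None)  # send the initial instruction
    for _ in range(max_steps):
        call = next(
            (item for item in response["output"] if item["type"] == "computer_call"),
            None,
        )
        if call is None:
            return response  # got a message instead of an action: done
        execute_action(call["action"])  # perform the click/type/scroll
        response = create_response(screenshot=take_screenshot())  # feed back new state
    return response  # give up after max_steps
```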
Input format
The `input` field accepts either a string or an array of message objects:
- String — simplest form, for text-only instructions
- Array — when you need to include images (screenshots) or structured multi-turn input
Both formats work identically for text-only requests. Use the array format when you need to attach a screenshot with `input_image`.
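A tiny helper (hypothetical, not part of the SDK) can pick the right shape depending on whether a screenshot is attached:

```python
def build_input(text, screenshot_url=None):
    """Return the string form for text-only requests, or the array form
    with an input_image block when a screenshot URL is provided."""
    if screenshot_url is None:
        return text
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": text},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }
    ]
```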
Create a response
The simplest form — pass a string as `input`:

```python
from tzafon import Lightcone

client = Lightcone()

response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Open the terminal and check disk usage",
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "desktop",
        },
    ],
)
```

```typescript
import Lightcone from "@tzafon/lightcone";

const client = new Lightcone();

const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Open the terminal and check disk usage",
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "desktop",
    },
  ],
});
```

When you need to include a screenshot (common in computer-use loops), use the array format:

```python
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Click the search button"},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        },
    ],
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "desktop",
        },
    ],
)
```

```typescript
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "Click the search button" },
        { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      ],
    },
  ],
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "desktop",
    },
  ],
});
```

Process the output
The response `output` is an array of items. Each item has a `type`:
| Type | Meaning |
|---|---|
| `computer_call` | The model wants to perform an action (click, type, scroll, and more) |
| `message` | The model is responding with text (work may be done) |
| `reasoning` | Internal reasoning (if available) |
When you get a `computer_call`, the `action` field tells you what to do:
```python
for item in response.output:
    if item.type == "computer_call":
        action = item.action
        print(f"Action: {action.type}")  # e.g., "click", "type", "navigate"
        print(f"Coordinates: ({action.x}, {action.y})")
        print(f"Text: {action.text}")
    elif item.type == "message":
        for block in item.content:
            print(block.text)
```

```typescript
for (const item of response.output ?? []) {
  if (item.type === "computer_call") {
    console.log(`Action: ${item.action?.type}`);
    console.log(`Coordinates: (${item.action?.x}, ${item.action?.y})`);
    console.log(`Text: ${item.action?.text}`);
  } else if (item.type === "message") {
    for (const block of item.content ?? []) {
      console.log(block.text);
    }
  }
}
```

Action types
The model returns actions with coordinates in the 0–999 model space. Denormalize to pixel coordinates before passing to `computer.click()` or other methods: `x = int(action.x / 1000 * display_width)`.
Here’s what the raw JSON looks like on the wire — useful if you’re calling the API directly without an SDK:
```json
{
  "output": [
    {
      "type": "computer_call",
      "call_id": "call_abc123",
      "action": {
        "type": "click",
        "x": 450,
        "y": 320,
        "button": "left"
      }
    }
  ]
}
```

Note that the action kind is always `action.type` (e.g., `"click"`, `"type"`, `"scroll"`) and coordinates are separate `x` and `y` fields — not a tuple.
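If you are parsing the wire format yourself, the standard library is enough. This sketch pulls the `computer_call` out of the JSON above and applies the 0–999 denormalization for a 1280×720 display:

```python
import json

raw = """
{
  "output": [
    {
      "type": "computer_call",
      "call_id": "call_abc123",
      "action": { "type": "click", "x": 450, "y": 320, "button": "left" }
    }
  ]
}
"""

payload = json.loads(raw)
call = next(item for item in payload["output"] if item["type"] == "computer_call")
action = call["action"]

# Denormalize from the 0-999 model space to a 1280x720 display
x_px = int(action["x"] / 1000 * 1280)
y_px = int(action["y"] / 1000 * 720)

print(action["type"], x_px, y_px)  # click 576 230
```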
The model can request these actions:
| Action | Fields | Description |
|---|---|---|
| `click` | `x`, `y`, `button` | Click at coordinates |
| `double_click` | `x`, `y` | Double-click |
| `triple_click` | `x`, `y` | Triple-click (select a line) |
| `right_click` | `x`, `y` | Right-click |
| `type` | `text` | Type text |
| `key` / `keypress` | `keys` | Press key combination |
| `key_down` / `key_up` | `keys` | Hold / release key |
| `scroll` | `x`, `y`, `scroll_y` | Scroll vertically |
| `hscroll` | `x`, `y`, `scroll_x` | Scroll horizontally |
| `navigate` | `url` | Go to a URL (browser mode) |
| `drag` | `x`, `y`, `end_x`, `end_y` | Drag between two points |
| `wait` | — | Wait for the screen to settle |
| `terminate` | `status`, `result` | Work is complete (`status`: `"success"` or `"failure"`) |
| `answer` | `result` | Answer a question with findings |
| `done` | `text` | Work is complete (alias) |
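One way to consume this table is a dispatcher that denormalizes coordinates and forwards each action to your computer object, returning whether the loop should continue. This is a sketch under assumptions: the `click`, `type`, and `scroll` method names are not a confirmed SDK surface (only `computer.click()` and `computer.navigate()` appear on this page), so substitute whatever your computer client actually exposes.

```python
def execute_action(computer, action, display_width=1280, display_height=720):
    """Forward one model action (a dict) to the computer; return False on a
    terminal action. Method names on `computer` are assumptions, not SDK fact."""
    def px(x, y):
        # Model space is 0-999; convert to pixels for this display
        return int(x / 1000 * display_width), int(y / 1000 * display_height)

    kind = action["type"]
    if kind == "click":
        x, y = px(action["x"], action["y"])
        computer.click(x, y, button=action.get("button", "left"))
    elif kind == "type":
        computer.type(action["text"])
    elif kind == "scroll":
        x, y = px(action["x"], action["y"])
        computer.scroll(x, y, action["scroll_y"])
    elif kind == "navigate":
        computer.navigate(action["url"])
    elif kind in ("terminate", "done", "answer"):
        return False  # terminal action: stop the loop
    else:
        raise ValueError(f"unhandled action type: {kind}")
    return True  # keep looping
```

Extend the `elif` chain with the remaining rows of the table as your client supports them.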
Extracting information
When `tools` includes `computer_use`, the model strongly prefers returning actions over text. To extract data from a page — for scraping, research, or monitoring — use a two-phase pattern:

- Explore (with `tools`) — the agent navigates, scrolls, dismisses popups
- Extract (without `tools`) — the model reads the screen and responds with text
```python
TOOL = {
    "type": "computer_use",
    "display_width": 1280,
    "display_height": 720,
    "environment": "browser",
}

with client.computer.create(kind="browser") as computer:
    computer.navigate("https://example.com/pricing")
    computer.wait(3)

    # --- Phase 1: Explore WITH tools (agent scrolls, dismisses popups, etc.) ---
    screenshot_url = computer.get_screenshot_url(computer.screenshot())
    response = client.responses.create(
        model="tzafon.northstar-cua-fast",
        tools=[TOOL],
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Scroll down slowly. Dismiss any popups. Stop when you can see pricing details."},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }],
    )

    for step in range(10):
        computer_call = next(
            (o for o in (response.output or []) if o.type == "computer_call"), None
        )
        if not computer_call:
            break
        action = computer_call.action
        if action.type in ("terminate", "done", "answer"):
            break

        # Execute the action (click, scroll, type, etc.)
        # ... see the CUA protocol guide for the full execute_action helper ...

        computer.wait(1)
        screenshot_url = computer.get_screenshot_url(computer.screenshot())
        response = client.responses.create(
            model="tzafon.northstar-cua-fast",
            previous_response_id=response.id,
            tools=[TOOL],
            input=[{
                "type": "computer_call_output",
                "call_id": computer_call.call_id,
                "output": {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            }],
        )

    # --- Phase 2: Extract WITHOUT tools (forces a text response) ---
    screenshot_url = computer.get_screenshot_url(computer.screenshot())
    extraction = client.responses.create(
        model="tzafon.northstar-cua-fast",
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is the price shown on this page? Reply with just the dollar amount."},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }],
        # No tools — the model MUST respond with text, not actions
    )

    for item in extraction.output or []:
        if item.type == "message":
            for block in item.content or []:
                print(block.text)  # e.g., "$29.99"
```

```typescript
const tool = {
  type: "computer_use" as const,
  display_width: 1280,
  display_height: 720,
  environment: "browser",
};

const computer = await client.computers.create({ kind: "browser" });
const id = computer.id!;

try {
  await client.computers.navigate(id, { url: "https://example.com/pricing" });
  await new Promise((r) => setTimeout(r, 3000));

  // --- Phase 1: Explore WITH tools ---
  let screenshot = await client.computers.screenshot(id);
  let screenshotUrl = screenshot.result?.screenshot_url as string;

  let response = await client.responses.create({
    model: "tzafon.northstar-cua-fast",
    tools: [tool],
    input: [{
      role: "user",
      content: [
        { type: "input_text", text: "Scroll down slowly. Dismiss any popups. Stop when you can see pricing details." },
        { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      ],
    }],
  });

  for (let step = 0; step < 10; step++) {
    const computerCall = response.output?.find((item) => item.type === "computer_call");
    if (!computerCall) break;

    const action = computerCall.action!;
    if (["terminate", "done", "answer"].includes(action.type!)) break;

    // Execute the action (click, scroll, type, etc.)
    // ... see the CUA protocol guide for the full action switch ...

    await new Promise((r) => setTimeout(r, 1000));
    screenshot = await client.computers.screenshot(id);
    screenshotUrl = screenshot.result?.screenshot_url as string;

    response = await client.responses.create({
      model: "tzafon.northstar-cua-fast",
      previous_response_id: response.id!,
      tools: [tool],
      input: [{
        type: "computer_call_output",
        call_id: computerCall.call_id!,
        output: { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      }],
    });
  }

  // --- Phase 2: Extract WITHOUT tools ---
  screenshot = await client.computers.screenshot(id);
  screenshotUrl = screenshot.result?.screenshot_url as string;

  const extraction = await client.responses.create({
    model: "tzafon.northstar-cua-fast",
    input: [{
      role: "user",
      content: [
        { type: "input_text", text: "What is the price shown on this page? Reply with just the dollar amount." },
        { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      ],
    }],
    // No tools — the model MUST respond with text, not actions
  });

  for (const item of extraction.output ?? []) {
    if (item.type === "message") {
      for (const block of item.content ?? []) {
        console.log(block.text); // e.g., "$29.99"
      }
    }
  }
} finally {
  await client.computers.delete(id);
}
```

For a complete working example of this pattern — extracting structured data from multiple websites — see `competitor_research.py` in the examples directory.
Multi-turn chaining
Use `previous_response_id` to chain conversations without resending the full history:

```python
# First turn
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Open the file manager",
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "desktop"}],
)

# Execute the action, take a screenshot, then continue
followup = client.responses.create(
    model="tzafon.northstar-cua-fast",
    previous_response_id=response.id,
    input=[
        {
            "type": "computer_call_output",
            "call_id": response.output[0].call_id,
            "output": {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
        },
    ],
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "desktop"}],
)
```

```typescript
// First turn
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Open the file manager",
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "desktop" }],
});

// Execute the action, take a screenshot, then continue
const followup = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  previous_response_id: response.id!,
  input: [
    {
      type: "computer_call_output",
      call_id: response.output![0].call_id!,
      output: { type: "input_image", image_url: screenshotUrl, detail: "auto" },
    },
  ],
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "desktop" }],
});
```

System instructions
Use `instructions` to set a system prompt that guides the model’s behavior across turns:

```python
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    instructions="You are operating a desktop computer. Be careful and verify each action before proceeding.",
    input="Find the system settings and check the display resolution",
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "desktop"}],
)
```

```typescript
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  instructions: "You are operating a desktop computer. Be careful and verify each action before proceeding.",
  input: "Find the system settings and check the display resolution",
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "desktop" }],
});
```

The instructions persist across chained turns when using `previous_response_id`.

Set `store: false` to skip response persistence. Responses created with `store: false` cannot be referenced via `previous_response_id`.
Manage responses
Each response has a `status` field:
| Status | Meaning |
|---|---|
| `in_progress` | The model is still generating output |
| `completed` | The response is ready |
| `failed` | Something went wrong |
| `cancelled` | You cancelled it via `cancel()` |
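Because an `in_progress` response eventually settles into one of the terminal statuses, a small polling wrapper around the retrieve call is often handy. This is a sketch: the poll interval and timeout defaults are arbitrary choices, not API requirements.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_for_response(client, response_id, poll_interval=1.0, timeout=60.0):
    """Poll client.responses.retrieve until the response reaches a terminal
    status, or raise TimeoutError. Interval/timeout defaults are arbitrary."""
    deadline = time.monotonic() + timeout
    while True:
        response = client.responses.retrieve(response_id)
        if response.status in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"response {response_id} not done after {timeout}s")
        time.sleep(poll_interval)
```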
```python
# Retrieve a response
response = client.responses.retrieve("resp_abc123")
print(response.status)  # "completed", "in_progress", etc.

# Cancel an in-progress response
client.responses.cancel("resp_abc123")

# Delete a response
client.responses.delete("resp_abc123")
```

```typescript
const response = await client.responses.retrieve("resp_abc123");
console.log(response.status); // "completed", "in_progress", etc.

await client.responses.cancel("resp_abc123");
await client.responses.delete("resp_abc123");
```

Models
| Model | Best for |
|---|---|
| `tzafon.northstar-cua-fast` | Computer-use — optimized for operating desktops and browsers |
| `tzafon.sm-1` | Low-level Rust programming |
See also
- Computer-use loop — full implementation guide
- Tasks — fully managed alternative where Northstar operates autonomously
- Computers — computer lifecycle and all available actions
- How Lightcone works — how the API layers relate