Responses API

Build your own computer-use loop where you control every step.

The Responses API lets you build a computer-use loop: Northstar looks at a screenshot of a computer screen and decides the next action (click, type, scroll), you execute it, and the cycle repeats until the work is done. Unlike Tasks, where the model operates autonomously, the Responses API gives you control at every step.

  1. Send input — provide a text instruction and optionally a screenshot
  2. Get output — the model returns either a text message or a computer_call action
  3. Execute the action — perform the click, type, or scroll on the computer
  4. Feed back the result — send a screenshot of the new state as computer_call_output
  5. Repeat until the model returns a message (it’s done) or you decide to stop
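The five steps above can be sketched as a plain Python loop. This is a minimal sketch rather than SDK code: `run_loop`, `request_next`, and `execute` are hypothetical names, the two callables stand in for your own wrappers around `client.responses.create(...)` and your action executor, and output items are modeled as plain dicts.

```python
from typing import Callable, List, Optional

# Hypothetical callables (assumptions, not part of the SDK):
#   request_next(call_id, screenshot) -> list of output items as dicts
#   execute(action)                   -> screenshot of the new screen state
def run_loop(
    request_next: Callable[[Optional[str], Optional[str]], List[dict]],
    execute: Callable[[dict], str],
    max_steps: int = 20,
) -> Optional[str]:
    """Loop until the model stops requesting actions or max_steps is reached."""
    call_id: Optional[str] = None
    screenshot: Optional[str] = None
    for _ in range(max_steps):
        # Steps 1-2: send input (or the previous result) and read the output.
        output = request_next(call_id, screenshot)
        call = next((o for o in output if o["type"] == "computer_call"), None)
        if call is None:
            # Step 5: no action requested, so collect any message text and stop.
            texts = [
                block["text"]
                for item in output
                if item["type"] == "message"
                for block in item["content"]
            ]
            return " ".join(texts)
        # Step 3: perform the action; step 4: keep the result for the next turn.
        screenshot = execute(call["action"])
        call_id = call["call_id"]
    return None  # stopped by the step limit, not by the model
```

The step limit is a design choice: it gives you a hard stop if the model keeps requesting actions.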

The input field accepts either a string or an array of message objects:

  • String — simplest form, for text-only instructions
  • Array — when you need to include images (screenshots) or structured multi-turn input

Both formats work identically for text-only requests. Use the array format when you need to attach a screenshot with input_image.
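As a rough illustration of the two formats, the helper below returns a plain string for text-only input and the array form when a screenshot is attached. `build_input` is a hypothetical name, not part of the SDK; the message shape mirrors the array example on this page.

```python
from typing import List, Optional, Union

def build_input(text: str, screenshot_url: Optional[str] = None) -> Union[str, List[dict]]:
    """Hypothetical helper: choose the string or array input format."""
    if screenshot_url is None:
        # Text-only instruction: the string form is enough.
        return text
    # A screenshot is attached: use the array form with input_image.
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": text},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }
    ]
```

The return value can be passed as `input` to `client.responses.create(...)` in either case.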

The simplest form — pass a string as input:

from tzafon import Lightcone

client = Lightcone()
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Open the terminal and check disk usage",
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "desktop",
        },
    ],
)

import Lightcone from "@tzafon/lightcone";

const client = new Lightcone();
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Open the terminal and check disk usage",
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "desktop",
    },
  ],
});

When you need to include a screenshot (common in computer-use loops), use the array format:

response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Click the search button"},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        },
    ],
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "desktop",
        },
    ],
)

const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "Click the search button" },
        { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      ],
    },
  ],
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "desktop",
    },
  ],
});

The response output is an array of items. Each item has a type:

| Type | Meaning |
| --- | --- |
| computer_call | The model wants to perform an action (click, type, scroll, and more) |
| message | The model is responding with text (work may be done) |
| reasoning | Internal reasoning (if available) |

When you get a computer_call, the action field tells you what to do:

for item in response.output:
    if item.type == "computer_call":
        action = item.action
        print(f"Action: {action.type}")  # e.g., "click", "type", "navigate"
        print(f"Coordinates: ({action.x}, {action.y})")
        print(f"Text: {action.text}")
    elif item.type == "message":
        for block in item.content:
            print(block.text)

for (const item of response.output ?? []) {
  if (item.type === "computer_call") {
    console.log(`Action: ${item.action?.type}`);
    console.log(`Coordinates: (${item.action?.x}, ${item.action?.y})`);
    console.log(`Text: ${item.action?.text}`);
  } else if (item.type === "message") {
    for (const block of item.content ?? []) {
      console.log(block.text);
    }
  }
}

The model returns actions with coordinates in the 0–999 model space. Denormalize them to pixel coordinates before passing them to computer.click() or other methods: x = int(action.x / 1000 * display_width), and likewise y = int(action.y / 1000 * display_height).
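The conversion can be wrapped in a small helper. This is a sketch under assumptions: `to_pixels` is a hypothetical name, it uses integer arithmetic (equivalent to the formula above for non-negative inputs, without float rounding), and the final clamp to the display bounds is an added safety measure rather than documented behavior.

```python
from typing import Tuple

MODEL_SPACE = 1000  # coordinates arrive in the 0-999 model space

def to_pixels(x: int, y: int, display_width: int, display_height: int) -> Tuple[int, int]:
    """Convert model-space coordinates to pixel coordinates."""
    px = x * display_width // MODEL_SPACE
    py = y * display_height // MODEL_SPACE
    # Clamp defensively so an out-of-range value never leaves the display.
    px = min(max(px, 0), display_width - 1)
    py = min(max(py, 0), display_height - 1)
    return px, py

# A model-space click at (450, 320) on a 1280x720 display lands at pixel (576, 230).
print(to_pixels(450, 320, 1280, 720))
```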

Here’s what the raw JSON looks like on the wire — useful if you’re calling the API directly without an SDK:

{
  "output": [
    {
      "type": "computer_call",
      "call_id": "call_abc123",
      "action": {
        "type": "click",
        "x": 450,
        "y": 320,
        "button": "left"
      }
    }
  ]
}

Note that the action kind is always action.type (e.g., "click", "type", "scroll") and coordinates are separate x and y fields — not a tuple.
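If you call the API without an SDK, the body can be parsed with the standard `json` module. The sketch below uses the example payload shown above; `extract_actions` is a hypothetical helper name, and a real response may carry more fields than this sample.

```python
import json
from typing import List, Tuple

RAW_BODY = """
{
  "output": [
    {
      "type": "computer_call",
      "call_id": "call_abc123",
      "action": {"type": "click", "x": 450, "y": 320, "button": "left"}
    }
  ]
}
"""

def extract_actions(body: str) -> List[Tuple[str, dict]]:
    """Return (call_id, action) pairs for every computer_call item."""
    payload = json.loads(body)
    return [
        (item["call_id"], item["action"])
        for item in payload.get("output", [])
        if item.get("type") == "computer_call"
    ]

for call_id, action in extract_actions(RAW_BODY):
    # The action kind is action["type"]; x and y are separate fields.
    print(call_id, action["type"], action["x"], action["y"])
```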

The model can request these actions:

| Action | Fields | Description |
| --- | --- | --- |
| click | x, y, button | Click at coordinates |
| double_click | x, y | Double-click |
| triple_click | x, y | Triple-click (select a line) |
| right_click | x, y | Right-click |
| type | text | Type text |
| key / keypress | keys | Press key combination |
| key_down / key_up | keys | Hold / release key |
| scroll | x, y, scroll_y | Scroll vertically |
| hscroll | x, y, scroll_x | Scroll horizontally |
| navigate | url | Go to a URL (browser mode) |
| drag | x, y, end_x, end_y | Drag between two points |
| wait | | Wait for the screen to settle |
| terminate | status, result | Work is complete (status: "success" or "failure") |
| answer | result | Answer a question with findings |
| done | text | Work is complete (alias) |
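One way to route these actions is a small dispatcher. The sketch below covers only a few action types and is illustrative: `RecordingComputer` and its method names are hypothetical stand-ins that just log calls, not the SDK's computer object, and coordinates are passed through as-is (a real executor would convert them to pixels first).

```python
# Action types that end the loop, per the table above.
TERMINAL_ACTIONS = {"terminate", "answer", "done"}

class RecordingComputer:
    """Hypothetical stand-in for a computer session; it only records calls."""

    def __init__(self):
        self.log = []

    def click(self, x, y, button="left"):
        self.log.append(("click", x, y, button))

    def type_text(self, text):
        self.log.append(("type", text))

    def scroll(self, x, y, dy):
        self.log.append(("scroll", x, y, dy))

def dispatch(computer, action: dict) -> bool:
    """Route one action to the computer. Returns False when the loop should stop."""
    kind = action["type"]
    if kind in TERMINAL_ACTIONS:
        return False
    if kind == "click":
        computer.click(action["x"], action["y"], action.get("button", "left"))
    elif kind == "type":
        computer.type_text(action["text"])
    elif kind == "scroll":
        computer.scroll(action["x"], action["y"], action["scroll_y"])
    elif kind == "wait":
        pass  # nothing to execute; the caller takes a fresh screenshot afterwards
    else:
        # Fail loudly on action types this sketch does not cover.
        raise ValueError(f"unhandled action type: {kind}")
    return True
```

Raising on unknown types, rather than silently skipping them, keeps the screen state and the model's expectations from drifting apart unnoticed.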

When tools includes computer_use, the model strongly prefers returning actions over text. To extract data from a page — for scraping, research, or monitoring — use a two-phase pattern:

  1. Explore (with tools) — the agent navigates, scrolls, dismisses popups
  2. Extract (without tools) — the model reads the screen and responds with text

TOOL = {
    "type": "computer_use",
    "display_width": 1280,
    "display_height": 720,
    "environment": "browser",
}

with client.computer.create(kind="browser") as computer:
    computer.navigate("https://example.com/pricing")
    computer.wait(3)

    # --- Phase 1: Explore WITH tools (agent scrolls, dismisses popups, etc.) ---
    screenshot_url = computer.get_screenshot_url(computer.screenshot())
    response = client.responses.create(
        model="tzafon.northstar-cua-fast",
        tools=[TOOL],
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Scroll down slowly. Dismiss any popups. Stop when you can see pricing details."},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }],
    )
    for step in range(10):
        computer_call = next(
            (o for o in (response.output or []) if o.type == "computer_call"), None
        )
        if not computer_call:
            break
        action = computer_call.action
        if action.type in ("terminate", "done", "answer"):
            break
        # Execute the action (click, scroll, type, etc.)
        # ... see the CUA protocol guide for the full execute_action helper ...
        computer.wait(1)
        screenshot_url = computer.get_screenshot_url(computer.screenshot())
        response = client.responses.create(
            model="tzafon.northstar-cua-fast",
            previous_response_id=response.id,
            tools=[TOOL],
            input=[{
                "type": "computer_call_output",
                "call_id": computer_call.call_id,
                "output": {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            }],
        )

    # --- Phase 2: Extract WITHOUT tools (forces a text response) ---
    screenshot_url = computer.get_screenshot_url(computer.screenshot())
    extraction = client.responses.create(
        model="tzafon.northstar-cua-fast",
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is the price shown on this page? Reply with just the dollar amount."},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }],
        # No tools: the model MUST respond with text, not actions
    )
    for item in extraction.output or []:
        if item.type == "message":
            for block in item.content or []:
                print(block.text)  # e.g., "$29.99"

const tool = {
  type: "computer_use" as const,
  display_width: 1280,
  display_height: 720,
  environment: "browser",
};

const computer = await client.computers.create({ kind: "browser" });
const id = computer.id!;
try {
  await client.computers.navigate(id, { url: "https://example.com/pricing" });
  await new Promise((r) => setTimeout(r, 3000));

  // --- Phase 1: Explore WITH tools ---
  let screenshot = await client.computers.screenshot(id);
  let screenshotUrl = screenshot.result?.screenshot_url as string;
  let response = await client.responses.create({
    model: "tzafon.northstar-cua-fast",
    tools: [tool],
    input: [{
      role: "user",
      content: [
        { type: "input_text", text: "Scroll down slowly. Dismiss any popups. Stop when you can see pricing details." },
        { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      ],
    }],
  });
  for (let step = 0; step < 10; step++) {
    const computerCall = response.output?.find((item) => item.type === "computer_call");
    if (!computerCall) break;
    const action = computerCall.action!;
    if (["terminate", "done", "answer"].includes(action.type!)) break;
    // Execute the action (click, scroll, type, etc.)
    // ... see the CUA protocol guide for the full action switch ...
    await new Promise((r) => setTimeout(r, 1000));
    screenshot = await client.computers.screenshot(id);
    screenshotUrl = screenshot.result?.screenshot_url as string;
    response = await client.responses.create({
      model: "tzafon.northstar-cua-fast",
      previous_response_id: response.id!,
      tools: [tool],
      input: [{
        type: "computer_call_output",
        call_id: computerCall.call_id!,
        output: { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      }],
    });
  }

  // --- Phase 2: Extract WITHOUT tools ---
  screenshot = await client.computers.screenshot(id);
  screenshotUrl = screenshot.result?.screenshot_url as string;
  const extraction = await client.responses.create({
    model: "tzafon.northstar-cua-fast",
    input: [{
      role: "user",
      content: [
        { type: "input_text", text: "What is the price shown on this page? Reply with just the dollar amount." },
        { type: "input_image", image_url: screenshotUrl, detail: "auto" },
      ],
    }],
    // No tools: the model MUST respond with text, not actions
  });
  for (const item of extraction.output ?? []) {
    if (item.type === "message") {
      for (const block of item.content ?? []) {
        console.log(block.text); // e.g., "$29.99"
      }
    }
  }
} finally {
  await client.computers.delete(id);
}

For a complete working example of this pattern — extracting structured data from multiple websites — see competitor_research.py in the examples directory.

Use previous_response_id to chain conversations without resending the full history:

# First turn
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Open the file manager",
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "desktop"}],
)

# Execute the action, take a screenshot, then continue.
# Look up the computer_call item instead of assuming it is output[0],
# since the output can also contain reasoning or message items.
computer_call = next(o for o in response.output if o.type == "computer_call")
followup = client.responses.create(
    model="tzafon.northstar-cua-fast",
    previous_response_id=response.id,
    input=[
        {
            "type": "computer_call_output",
            "call_id": computer_call.call_id,
            "output": {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
        },
    ],
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "desktop"}],
)

// First turn
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Open the file manager",
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "desktop" }],
});

// Execute the action, take a screenshot, then continue.
// Look up the computer_call item instead of assuming it is output[0].
const computerCall = response.output?.find((item) => item.type === "computer_call");
const followup = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  previous_response_id: response.id!,
  input: [
    {
      type: "computer_call_output",
      call_id: computerCall!.call_id!,
      output: { type: "input_image", image_url: screenshotUrl, detail: "auto" },
    },
  ],
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "desktop" }],
});
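When chaining turns, the follow-up input always has the same shape, so it can be built by a helper. This is a sketch on plain dicts (the wire form); `first_computer_call` and `call_output_item` are hypothetical names, not SDK functions.

```python
from typing import List, Optional

def first_computer_call(output: List[dict]) -> Optional[dict]:
    """Return the first computer_call item, or None if the model sent only text."""
    return next((item for item in output if item.get("type") == "computer_call"), None)

def call_output_item(call_id: str, screenshot_url: str) -> dict:
    """Build the computer_call_output item that answers a given call_id."""
    return {
        "type": "computer_call_output",
        "call_id": call_id,
        "output": {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
    }

# Example: the reasoning item comes first, so indexing output[0] would be wrong here.
output = [
    {"type": "reasoning"},
    {"type": "computer_call", "call_id": "call_abc123", "action": {"type": "wait"}},
]
call = first_computer_call(output)
if call is not None:
    followup_input = [call_output_item(call["call_id"], "https://example.com/shot.png")]
```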

Use instructions to set a system prompt that guides the model’s behavior across turns:

response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    instructions="You are operating a desktop computer. Be careful and verify each action before proceeding.",
    input="Find the system settings and check the display resolution",
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "desktop"}],
)

const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  instructions: "You are operating a desktop computer. Be careful and verify each action before proceeding.",
  input: "Find the system settings and check the display resolution",
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "desktop" }],
});

The instructions persist across chained turns when using previous_response_id.

Set store: false to skip response persistence. Responses created with store: false cannot be referenced via previous_response_id.

Each response has a status field:

| Status | Meaning |
| --- | --- |
| in_progress | The model is still generating output |
| completed | The response is ready |
| failed | Something went wrong |
| cancelled | You cancelled it via cancel() |

# Retrieve a response
response = client.responses.retrieve("resp_abc123")
print(response.status)  # "completed", "in_progress", etc.

# Cancel an in-progress response
client.responses.cancel("resp_abc123")

# Delete a response
client.responses.delete("resp_abc123")

// Retrieve a response
const response = await client.responses.retrieve("resp_abc123");
console.log(response.status); // "completed", "in_progress", etc.

// Cancel an in-progress response
await client.responses.cancel("resp_abc123");

// Delete a response
await client.responses.delete("resp_abc123");
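If you need to wait for a response to leave in_progress, a simple polling loop is one option. This is a sketch under assumptions: `wait_for_terminal_status` is a hypothetical helper, the polling interval and poll limit are arbitrary, and the status source is passed in as a callable so the loop does not depend on the SDK.

```python
import time
from typing import Callable

# Every status from the table above except in_progress.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_for_terminal_status(
    get_status: Callable[[], str],
    interval: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Poll get_status() until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"response still in progress after {max_polls} polls")
```

With the SDK, the callable could be something like `lambda: client.responses.retrieve("resp_abc123").status`.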

| Model | Best for |
| --- | --- |
| tzafon.northstar-cua-fast | Computer-use, optimized for operating desktops and browsers |
| tzafon.sm-1 | Low-level Rust programming |