# Responses API
OpenAI-compatible computer-use agent interface for building your own agent loops.
The Responses API is an OpenAI-compatible interface for building computer-use agent (CUA) loops. A CUA loop is a pattern where a vision model looks at a screenshot of a computer screen, decides the next action (click, type, scroll), and you execute it — repeating until the task is complete. Unlike Agent Tasks, where the agent runs autonomously, the Responses API gives you control at every step.
## How it works

- Send input — provide a text instruction and optionally a screenshot
- Get output — the model returns either a text `message` or a `computer_call` action
- Execute the action — perform the click, type, or scroll on a computer session
- Feed back the result — send a screenshot of the new state as `computer_call_output`
- Repeat until the model returns a `message` (it's done) or you decide to stop
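The steps above can be sketched as a small driver function. This is a sketch, not the SDK's API: `create_response`, `execute_action`, and `screenshot` are hypothetical callables standing in for the client and session calls shown later on this page, while the item shapes (`type`, `action`, `call_id`) follow the output format described below.

```python
def run_cua_loop(task, create_response, execute_action, screenshot, max_steps=25):
    """Drive the send/act/feed-back loop until the model returns a message.

    Hypothetical callables (not part of the SDK):
      create_response(input, previous_response_id) -> response
      execute_action(action) -> None
      screenshot() -> image URL or data URI of the new screen state
    """
    response = create_response(input=task, previous_response_id=None)
    for _ in range(max_steps):
        call = next((i for i in response.output if i.type == "computer_call"), None)
        if call is None:
            # No action requested: the model finished with a text message.
            return next((i for i in response.output if i.type == "message"), None)
        execute_action(call.action)  # click / type / scroll on the session
        response = create_response(
            input=[{
                "type": "computer_call_output",
                "call_id": call.call_id,
                "output": {"type": "input_image", "image_url": screenshot()},
            }],
            previous_response_id=response.id,
        )
    return None  # step budget exhausted before the model finished
```

Capping the loop with `max_steps` is a practical safeguard so a stuck model cannot run forever.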
## Input format

The `input` field accepts either a string or an array of message objects:
- String — simplest form, for text-only instructions
- Array — when you need to include images (screenshots) or structured multi-turn input
Both formats work identically for text-only requests. Use the array format when you need to attach a screenshot with `input_image`.
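The equivalence can be made concrete with a small helper. `as_message_array` is a hypothetical function (not part of the SDK) that normalizes a string into the array form used in the examples below:

```python
def as_message_array(user_input):
    """Normalize string input into the equivalent message-array form.

    A string becomes a single user message with one input_text block;
    an already-structured array is passed through unchanged.
    """
    if isinstance(user_input, str):
        return [
            {
                "role": "user",
                "content": [{"type": "input_text", "text": user_input}],
            }
        ]
    return user_input
```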
## Create a response

The simplest form — pass a string as `input`:
```python
from tzafon import Lightcone

client = Lightcone()

response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Go to wikipedia.org and search for 'Alan Turing'",
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "browser",
        },
    ],
)
```

```typescript
import Lightcone from "@tzafon/lightcone";

const client = new Lightcone();

const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Go to wikipedia.org and search for 'Alan Turing'",
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "browser",
    },
  ],
});
```

When you need to include a screenshot (common in CUA loops), use the array format:

```python
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Click the search button"},
                {"type": "input_image", "image_url": screenshot_url},
            ],
        },
    ],
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "browser",
        },
    ],
)
```

```typescript
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "Click the search button" },
        { type: "input_image", image_url: screenshotUrl },
      ],
    },
  ],
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "browser",
    },
  ],
});
```

## Process the output
The response `output` is an array of items. Each item has a `type`:
| Type | Meaning |
|---|---|
| `computer_call` | The model wants to perform an action (click, type, scroll, and more) |
| `message` | The model is responding with text (task may be done) |
| `reasoning` | Internal reasoning (if available) |
When you get a `computer_call`, the `action` field tells you what to do:
```python
for item in response.output:
    if item.type == "computer_call":
        action = item.action
        print(f"Action: {action.type}")  # e.g., "click", "type", "navigate"
        print(f"Coordinates: ({action.x}, {action.y})")
        print(f"Text: {action.text}")
    elif item.type == "message":
        for block in item.content:
            print(block.text)
```

```typescript
for (const item of response.output ?? []) {
  if (item.type === "computer_call") {
    console.log(`Action: ${item.action?.type}`);
    console.log(`Coordinates: (${item.action?.x}, ${item.action?.y})`);
    console.log(`Text: ${item.action?.text}`);
  } else if (item.type === "message") {
    for (const block of item.content ?? []) {
      console.log(block.text);
    }
  }
}
```

## Action types
The model returns actions with coordinates already scaled to match your `display_width` and `display_height`. You can pass `action.x` and `action.y` directly to `computer.click()` or other session methods without any conversion.
The model can request these actions:
| Action | Fields | Description |
|---|---|---|
| `click` | `x`, `y`, `button` | Click at coordinates |
| `double_click` | `x`, `y` | Double-click |
| `triple_click` | `x`, `y` | Triple-click (select a line) |
| `right_click` | `x`, `y` | Right-click |
| `type` | `text` | Type text |
| `key` / `keypress` | `keys` | Press a key combination |
| `key_down` / `key_up` | `keys` | Hold / release a key |
| `scroll` | `x`, `y`, `scroll_y` | Scroll vertically |
| `hscroll` | `x`, `y`, `scroll_x` | Scroll horizontally |
| `navigate` | `url` | Go to a URL (browser only) |
| `drag` | `x`, `y`, `end_x`, `end_y` | Drag between two points |
| `wait` | — | Wait for the page to settle |
| `terminate` | `status`, `result` | Task is complete (`status`: `"success"` or `"failure"`) |
| `answer` | `result` | Answer a question with findings |
| `done` | `text` | Task is complete (alias) |
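A dispatcher over the table above might look like the following sketch. The method names on `computer` (`click`, `double_click`, `type`, `press`, `scroll`, `navigate`, `drag`, `wait`, and so on) are assumed for illustration, not a documented API; see the Computers page for the real session methods. Coordinates are passed through unchanged, as noted above.

```python
def execute(computer, action):
    """Dispatch one computer_call action onto a session object.

    Returns False for terminal actions (terminate/answer/done) so the
    caller can break out of its loop, True otherwise. The `computer`
    method names are assumptions for illustration.
    """
    t = action.type
    if t == "click":
        computer.click(action.x, action.y, button=getattr(action, "button", "left"))
    elif t == "double_click":
        computer.double_click(action.x, action.y)
    elif t == "triple_click":
        computer.triple_click(action.x, action.y)
    elif t == "right_click":
        computer.right_click(action.x, action.y)
    elif t == "type":
        computer.type(action.text)
    elif t in ("key", "keypress"):
        computer.press(action.keys)
    elif t == "key_down":
        computer.key_down(action.keys)
    elif t == "key_up":
        computer.key_up(action.keys)
    elif t == "scroll":
        computer.scroll(action.x, action.y, dy=action.scroll_y)
    elif t == "hscroll":
        computer.scroll(action.x, action.y, dx=action.scroll_x)
    elif t == "navigate":
        computer.navigate(action.url)
    elif t == "drag":
        computer.drag(action.x, action.y, action.end_x, action.end_y)
    elif t == "wait":
        computer.wait()
    elif t in ("terminate", "answer", "done"):
        return False  # terminal: stop the loop and read status/result/text
    else:
        raise ValueError(f"unhandled action type: {t}")
    return True
```

Returning a boolean keeps loop-control logic out of the dispatcher: the caller decides what to do when a terminal action arrives.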
## Multi-turn chaining

Use `previous_response_id` to chain conversations without resending the full history:
```python
# First turn — string input is fine for text-only
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Navigate to example.com",
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "browser"}],
)

# Execute the action, take a screenshot, then continue
followup = client.responses.create(
    model="tzafon.northstar-cua-fast",
    previous_response_id=response.id,
    input=[
        {
            "type": "computer_call_output",
            "call_id": response.output[0].call_id,
            "output": {"type": "input_image", "image_url": screenshot_url},
        },
    ],
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "browser"}],
)
```

```typescript
// First turn — string input is fine for text-only
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Navigate to example.com",
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "browser" }],
});

// Execute the action, take a screenshot, then continue
const followup = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  previous_response_id: response.id!,
  input: [
    {
      type: "computer_call_output",
      call_id: response.output![0].call_id!,
      output: { type: "input_image", image_url: screenshotUrl },
    },
  ],
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "browser" }],
});
```

## Manage responses
```python
# Retrieve a response
response = client.responses.retrieve("resp_abc123")

# Cancel an in-progress response
client.responses.cancel("resp_abc123")

# Delete a response
client.responses.delete("resp_abc123")
```

```typescript
const response = await client.responses.retrieve("resp_abc123");
await client.responses.cancel("resp_abc123");
await client.responses.delete("resp_abc123");
```

## Models
| Model | Best for |
|---|---|
| `tzafon.northstar-cua-fast` | Computer-use tasks (optimized for CUA) |
| `tzafon.sm-1` | General text tasks |
## See also

- CUA protocol guide — full implementation of a CUA loop
- Agent Tasks — fully managed alternative where the agent runs autonomously
- Computers — session lifecycle and all available actions
- How Lightcone works — how the three API layers relate