---
title: Responses API | Lightcone
description: OpenAI-compatible computer-use agent interface for building your own agent loops.
---

The Responses API is an OpenAI-compatible interface for building **computer-use agent (CUA)** loops. A CUA loop is a pattern where a vision model looks at a screenshot of a computer screen, decides the next action (click, type, scroll), and you execute it — repeating until the task is complete. Unlike [Agent Tasks](/guides/agent-tasks/index.md), where the agent runs autonomously, the Responses API gives you control at every step.

## How it works

1. **Send input** — provide a text instruction and optionally a screenshot
2. **Get output** — the model returns either a text `message` or a `computer_call` action
3. **Execute the action** — perform the click, type, or scroll on a computer session
4. **Feed back the result** — send a screenshot of the new state as `computer_call_output`
5. **Repeat** until the model returns a `message` (it’s done) or you decide to stop

## Input format

The `input` field accepts either a **string** or an **array of message objects**:

- **String** — simplest form, for text-only instructions
- **Array** — when you need to include images (screenshots) or structured multi-turn input

Both formats work identically for text-only requests. Use the array format when you need to attach a screenshot with `input_image`.
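The five steps above can be sketched as a single loop. This is a minimal sketch, not a complete implementation: `take_screenshot` and `execute_action` stand in for whatever session helpers you use to capture the screen and perform actions, and error handling is omitted.

```python
def run_cua_loop(client, task, take_screenshot, execute_action, max_steps=30):
    """Drive a CUA loop: ask the model for an action, execute it,
    and feed back a screenshot until the model replies with a message."""
    tools = [{"type": "computer_use", "environment": "browser"}]
    # Step 1: send the initial instruction.
    response = client.responses.create(
        model="tzafon.northstar-cua-fast", input=task, tools=tools
    )
    for _ in range(max_steps):
        # Step 2: look for a computer_call in the output.
        call = next((i for i in response.output if i.type == "computer_call"), None)
        if call is None:
            # No action requested: the model answered with a message, so stop.
            return response
        # Step 3: perform the click, type, or scroll.
        execute_action(call.action)
        # Steps 4-5: feed back a screenshot of the new state and repeat.
        response = client.responses.create(
            model="tzafon.northstar-cua-fast",
            previous_response_id=response.id,
            input=[{
                "type": "computer_call_output",
                "call_id": call.call_id,
                "output": {"type": "input_image", "image_url": take_screenshot()},
            }],
            tools=tools,
        )
    return response
```

The `max_steps` cap implements the "or you decide to stop" half of step 5, so a confused model cannot loop forever.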
## Create a response

The simplest form — pass a string as `input`:

```python
from tzafon import Lightcone

client = Lightcone()

response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Go to wikipedia.org and search for 'Alan Turing'",
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "browser",
        },
    ],
)
```

```typescript
import Lightcone from "@tzafon/lightcone";

const client = new Lightcone();

const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Go to wikipedia.org and search for 'Alan Turing'",
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "browser",
    },
  ],
});
```

When you need to include a screenshot (common in CUA loops), use the array format:

```python
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Click the search button"},
                {"type": "input_image", "image_url": screenshot_url},
            ],
        },
    ],
    tools=[
        {
            "type": "computer_use",
            "display_width": 1280,
            "display_height": 720,
            "environment": "browser",
        },
    ],
)
```

```typescript
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "Click the search button" },
        { type: "input_image", image_url: screenshotUrl },
      ],
    },
  ],
  tools: [
    {
      type: "computer_use",
      display_width: 1280,
      display_height: 720,
      environment: "browser",
    },
  ],
});
```

`display_width` and `display_height` are optional. If omitted and you include a screenshot in the input, the server infers the viewport dimensions from the image. If no image is provided, it defaults to 1024×768.

## Process the output

The response `output` is an array of items.
Each item has a `type`:

| Type            | Meaning                                                              |
| --------------- | -------------------------------------------------------------------- |
| `computer_call` | The model wants to perform an action (click, type, scroll, and more) |
| `message`       | The model is responding with text (task may be done)                 |
| `reasoning`     | Internal reasoning (if available)                                    |

When you get a `computer_call`, the `action` field tells you what to do:

```python
for item in response.output:
    if item.type == "computer_call":
        action = item.action
        print(f"Action: {action.type}")  # e.g., "click", "type", "navigate"
        print(f"Coordinates: ({action.x}, {action.y})")
        print(f"Text: {action.text}")
    elif item.type == "message":
        for block in item.content:
            print(block.text)
```

```typescript
for (const item of response.output ?? []) {
  if (item.type === "computer_call") {
    console.log(`Action: ${item.action?.type}`);
    console.log(`Coordinates: (${item.action?.x}, ${item.action?.y})`);
    console.log(`Text: ${item.action?.text}`);
  } else if (item.type === "message") {
    for (const block of item.content ?? []) {
      console.log(block.text);
    }
  }
}
```

## Action types

The model returns actions with coordinates already scaled to match your `display_width` and `display_height`. You can pass `action.x` and `action.y` directly to `computer.click()` or other session methods without any conversion.
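In practice that means executing a `computer_call` is a straight dispatch on `action.type`, with the coordinates passed through unchanged. A minimal dispatcher covering a few action types might look like this; `computer.click()` is named on this page, while the other method names are assumptions modeled on it — substitute whatever your session object exposes.

```python
def execute_action(computer, action):
    """Dispatch a computer_call action to session methods.

    Coordinates arrive pre-scaled to the declared display size,
    so they pass straight through without conversion.
    """
    if action.type == "click":
        computer.click(action.x, action.y)
    elif action.type == "type":
        computer.type(action.text)
    elif action.type == "scroll":
        computer.scroll(action.x, action.y, action.scroll_y)
    elif action.type == "navigate":
        computer.navigate(action.url)
    else:
        raise ValueError(f"unhandled action type: {action.type}")
```

Raising on unhandled types is a deliberate choice for a sketch like this: it surfaces any action your executor does not yet cover instead of silently dropping it.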
The model can request these actions:

| Action                | Fields                     | Description                                             |
| --------------------- | -------------------------- | ------------------------------------------------------- |
| `click`               | `x`, `y`, `button`         | Click at coordinates                                    |
| `double_click`        | `x`, `y`                   | Double-click                                            |
| `triple_click`        | `x`, `y`                   | Triple-click (select a line)                            |
| `right_click`         | `x`, `y`                   | Right-click                                             |
| `type`                | `text`                     | Type text                                               |
| `key` / `keypress`    | `keys`                     | Press key combination                                   |
| `key_down` / `key_up` | `keys`                     | Hold / release key                                      |
| `scroll`              | `x`, `y`, `scroll_y`       | Scroll vertically                                       |
| `hscroll`             | `x`, `y`, `scroll_x`       | Scroll horizontally                                     |
| `navigate`            | `url`                      | Go to a URL (browser only)                              |
| `drag`                | `x`, `y`, `end_x`, `end_y` | Drag between two points                                 |
| `wait`                | —                          | Wait for the page to settle                             |
| `terminate`           | `status`, `result`         | Task is complete (`status`: `"success"` or `"failure"`) |
| `answer`              | `result`                   | Answer a question with findings                         |
| `done`                | `text`                     | Task is complete (alias)                                |

## Multi-turn chaining

Use `previous_response_id` to chain conversations without resending the full history:

```python
# First turn — string input is fine for text-only
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input="Navigate to example.com",
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "browser"}],
)

# Execute the action, take a screenshot, then continue
followup = client.responses.create(
    model="tzafon.northstar-cua-fast",
    previous_response_id=response.id,
    input=[
        {
            "type": "computer_call_output",
            "call_id": response.output[0].call_id,
            "output": {"type": "input_image", "image_url": screenshot_url},
        },
    ],
    tools=[{"type": "computer_use", "display_width": 1280, "display_height": 720, "environment": "browser"}],
)
```

```typescript
// First turn — string input is fine for text-only
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: "Navigate to example.com",
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "browser" }],
});

// Execute the action, take a screenshot, then continue
const followup = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  previous_response_id: response.id!,
  input: [
    {
      type: "computer_call_output",
      call_id: response.output![0].call_id!,
      output: { type: "input_image", image_url: screenshotUrl },
    },
  ],
  tools: [{ type: "computer_use", display_width: 1280, display_height: 720, environment: "browser" }],
});
```

## Manage responses

```python
# Retrieve a response
response = client.responses.retrieve("resp_abc123")

# Cancel an in-progress response
client.responses.cancel("resp_abc123")

# Delete a response
client.responses.delete("resp_abc123")
```

```typescript
const response = await client.responses.retrieve("resp_abc123");
await client.responses.cancel("resp_abc123");
await client.responses.delete("resp_abc123");
```

## Models

| Model                       | Best for                               |
| --------------------------- | -------------------------------------- |
| `tzafon.northstar-cua-fast` | Computer-use tasks (optimized for CUA) |
| `tzafon.sm-1`               | General text tasks                     |

For a fully managed agent experience where you don’t need to control the loop, use [Agent Tasks](/guides/agent-tasks/index.md) instead. The Responses API is for when you need full control over every step.

## See also

- [**CUA protocol guide**](/guides/cua-protocol/index.md) — full implementation of a CUA loop
- [**Agent Tasks**](/guides/agent-tasks/index.md) — fully managed alternative where the agent runs autonomously
- [**Computers**](/guides/computers/index.md) — session lifecycle and all available actions
- [**How Lightcone works**](/guides/how-lightcone-works/index.md) — how the three API layers relate