
Coordinates

How Northstar's coordinate system works and how to convert coordinates to pixel positions.

Northstar sees a screenshot and decides where to act. Internally, it always thinks in a fixed 0–999 grid — regardless of the actual screen resolution. Coordinates come back to your code in this same 0–999 space; you convert them to pixels before clicking.

Every coordinate Northstar outputs is a point on this grid, overlaid on the screen:

[Figure: Northstar’s view. A 0–999 grid overlaid on the screen, with corners (0, 0), (999, 0), (0, 999), (999, 999) and center (500, 500).]

The model says “click at (500, 500)” meaning “click in the center” — whether the actual screen is 500px wide or 3840px wide.

Northstar always returns coordinates in a 0–999 space. Your code converts them to pixel coordinates before clicking:

pixel_x = int(model_x / 1000 * display_width)
pixel_y = int(model_y / 1000 * display_height)

The model outputs (500, 500). Here’s where that maps to at three different viewport sizes:

500 × 500 viewport: (250, 250)
1280 × 720 viewport: (640, 360)
3840 × 2160 viewport: (1920, 1080)

Same model output, same logical position (center of screen), different pixel values. Apply the formula above, then pass the pixel coordinates to computer.click().
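As a quick check of the numbers above, here is a minimal sketch that applies the same formula to all three viewport sizes. The helper name `denormalize` and the `VIEWPORTS` list are chosen for illustration only and are not part of any Northstar SDK.

```python
def denormalize(model_x: int, model_y: int, width: int, height: int) -> tuple[int, int]:
    """Convert 0-999 model coordinates to pixels using the formula above."""
    return int(model_x / 1000 * width), int(model_y / 1000 * height)


# Viewport sizes from the comparison above (illustrative list).
VIEWPORTS = [(500, 500), (1280, 720), (3840, 2160)]

for width, height in VIEWPORTS:
    # The model output stays (500, 500); only the display size changes.
    print(f"{width}x{height}: {denormalize(500, 500, width, height)}")
```

Because the formula divides by 1000 and the largest model value is 999, the result should always stay strictly below the display width and height.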

Wherever an API hands Northstar's coordinates back to you, they are in the 0–999 model space, and you denormalize them in your code. The table below shows what each API returns:

| API | What you receive | What to do |
| --- | --- | --- |
| Responses API (with computer_use tool) | Raw 0–999 model coordinates | Denormalize to pixels before calling click() |
| Tasks | Handled internally | You don’t interact with coordinates |
| Chat Completions | Raw 0–999 model output | Denormalize to pixels before calling click() |
| Computers API (direct control) | N/A (you provide the coordinates) | Pass pixel coordinates directly |

When you get a computer_call, denormalize the coordinates before clicking:

DISPLAY_WIDTH = 1280
DISPLAY_HEIGHT = 720

response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Click the search button"},
            {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
        ],
    }],
    tools=[{
        "type": "computer_use",
        "display_width": DISPLAY_WIDTH,
        "display_height": DISPLAY_HEIGHT,
        "environment": "desktop",
    }],
)

for item in response.output:
    if item.type == "computer_call":
        # Denormalize from 0–999 model space to pixel coordinates
        x = int(item.action.x / 1000 * DISPLAY_WIDTH)
        y = int(item.action.y / 1000 * DISPLAY_HEIGHT)
        computer.click(x, y)

const DISPLAY_WIDTH = 1280;
const DISPLAY_HEIGHT = 720;

const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: [{
    role: "user",
    content: [
      { type: "input_text", text: "Click the search button" },
      { type: "input_image", image_url: screenshotUrl, detail: "auto" },
    ],
  }],
  tools: [{
    type: "computer_use",
    display_width: DISPLAY_WIDTH,
    display_height: DISPLAY_HEIGHT,
    environment: "desktop",
  }],
});

for (const item of response.output ?? []) {
  if (item.type === "computer_call") {
    // Denormalize from 0–999 model space to pixel coordinates
    const x = Math.floor(item.action!.x! / 1000 * DISPLAY_WIDTH);
    const y = Math.floor(item.action!.y! / 1000 * DISPLAY_HEIGHT);
    await client.computers.click(id, { x, y });
  }
}
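The TypeScript version above uses non-null assertions on `action.x` and `action.y`, which suggests that some actions may not carry coordinates. The sketch below shows one hedged way to guard against that in Python. The helper `pixel_target`, the stand-in objects, and the action type names `"click"` and `"type"` are assumptions made for illustration, not documented Northstar values.

```python
from types import SimpleNamespace

DISPLAY_WIDTH = 1280
DISPLAY_HEIGHT = 720


def pixel_target(action, width, height):
    """Return (x, y) in pixels, or None when the action has no coordinates."""
    x = getattr(action, "x", None)
    y = getattr(action, "y", None)
    if x is None or y is None:
        return None
    return int(x / 1000 * width), int(y / 1000 * height)


# Stand-in objects for illustration; real actions come from response.output.
click_action = SimpleNamespace(type="click", x=500, y=500)
type_action = SimpleNamespace(type="type", text="hello")  # hypothetical action without x/y

print(pixel_target(click_action, DISPLAY_WIDTH, DISPLAY_HEIGHT))
print(pixel_target(type_action, DISPLAY_WIDTH, DISPLAY_HEIGHT))
```

Returning `None` lets the calling loop skip the click instead of failing on a missing attribute.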

The Chat Completions API does not scale coordinates — you receive raw model output and handle the conversion yourself. You need two things: a system prompt template and a scaling function.

Add this to your system prompt, replacing the viewport dimensions with your own:

You are controlling a computer through screenshots and actions.
Screen information:
- Viewport size: {viewport_width}x{viewport_height} pixels.
- Coordinates range from (0,0) at the top-left to (999,999) at the bottom-right.
- All coordinate values must be integers between 0 and 999 inclusive.
When clicking or interacting with elements:
- Look at the screenshot to find the element's position.
- Return coordinates in the 0-999 range. Your code will convert them to pixel coordinates.
- Click elements in their CENTER, not on edges.

Convert the model’s 0–999 coordinates back to pixel coordinates:

def scale_coordinates(
    model_x: int, model_y: int,
    viewport_width: int, viewport_height: int,
) -> tuple[int, int]:
    """Convert model coordinates (0-999) to actual pixel coordinates."""
    x = int(model_x / 1000 * viewport_width)
    y = int(model_y / 1000 * viewport_height)
    return x, y

# Example: model says click (500, 500) on a 1280x720 viewport
x, y = scale_coordinates(500, 500, 1280, 720)  # → (640, 360)

function scaleCoordinates(
  modelX: number, modelY: number,
  viewportWidth: number, viewportHeight: number,
): [number, number] {
  /** Convert model coordinates (0-999) to actual pixel coordinates. */
  const x = Math.floor(modelX / 1000 * viewportWidth);
  const y = Math.floor(modelY / 1000 * viewportHeight);
  return [x, y];
}

// Example: model says click (500, 500) on a 1280x720 viewport
const [x, y] = scaleCoordinates(500, 500, 1280, 720); // → [640, 360]
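Since the Chat Completions API returns raw model output, nothing guarantees that every value lands inside 0–999. A defensive variant of the scaling function might clamp first. This is a sketch under that assumption; `scale_coordinates_clamped` is a hypothetical name, and clamping is only one possible policy.

```python
def scale_coordinates_clamped(
    model_x: int, model_y: int,
    viewport_width: int, viewport_height: int,
) -> tuple[int, int]:
    """Clamp model coordinates into 0-999, then convert them to pixels."""
    clamped_x = min(max(int(model_x), 0), 999)
    clamped_y = min(max(int(model_y), 0), 999)
    x = int(clamped_x / 1000 * viewport_width)
    y = int(clamped_y / 1000 * viewport_height)
    return x, y


# In-range input behaves like the plain scaling formula.
print(scale_coordinates_clamped(500, 500, 1280, 720))
# Out-of-range input is pulled back to the nearest edge of the grid.
print(scale_coordinates_clamped(1200, -5, 1280, 720))
```

Clamping keeps clicks on screen but can hide a bad model response, so raising an error instead may be the better choice while debugging.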

Putting it together with the OpenAI SDK:

from openai import OpenAI
import json

client = OpenAI(
    base_url="https://api.tzafon.ai/v1",
    api_key="sk_your_api_key_here",
)

VIEWPORT_WIDTH = 1280
VIEWPORT_HEIGHT = 720

SYSTEM_PROMPT = f"""\
You are controlling a computer through screenshots and actions.
Screen information:
- Viewport size: {VIEWPORT_WIDTH}x{VIEWPORT_HEIGHT} pixels.
- Coordinates range from (0,0) at the top-left to (999,999) at the bottom-right.
- All coordinate values must be integers between 0 and 999 inclusive.
When clicking or interacting with elements:
- Look at the screenshot to find the element's position.
- Return coordinates in the 0-999 range. Your code will convert them to pixel coordinates.
- Click elements in their CENTER, not on edges."""

result = client.chat.completions.create(
    model="tzafon.northstar-cua-fast",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Click the search button"},
                {"type": "image_url", "image_url": {"url": screenshot_url}},
            ],
        },
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "click",
                "description": "Click at screen coordinates (0-999 range).",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "x": {"type": "integer", "description": "X position (0=left edge, 999=right edge)"},
                        "y": {"type": "integer", "description": "Y position (0=top edge, 999=bottom edge)"},
                    },
                    "required": ["x", "y"],
                },
            },
        },
    ],
)

for choice in result.choices:
    for tool_call in choice.message.tool_calls or []:
        args = json.loads(tool_call.function.arguments)
        pixel_x, pixel_y = scale_coordinates(
            args["x"], args["y"], VIEWPORT_WIDTH, VIEWPORT_HEIGHT,
        )
        print(f"Click at pixel ({pixel_x}, {pixel_y})")

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.tzafon.ai/v1",
  apiKey: "sk_your_api_key_here",
});

const VIEWPORT_WIDTH = 1280;
const VIEWPORT_HEIGHT = 720;

const SYSTEM_PROMPT = `You are controlling a computer through screenshots and actions.
Screen information:
- Viewport size: ${VIEWPORT_WIDTH}x${VIEWPORT_HEIGHT} pixels.
- Coordinates range from (0,0) at the top-left to (999,999) at the bottom-right.
- All coordinate values must be integers between 0 and 999 inclusive.
When clicking or interacting with elements:
- Look at the screenshot to find the element's position.
- Return coordinates in the 0-999 range. Your code will convert them to pixel coordinates.
- Click elements in their CENTER, not on edges.`;

const result = await client.chat.completions.create({
  model: "tzafon.northstar-cua-fast",
  messages: [
    { role: "system", content: SYSTEM_PROMPT },
    {
      role: "user",
      content: [
        { type: "text", text: "Click the search button" },
        { type: "image_url", image_url: { url: screenshotUrl } },
      ],
    },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "click",
        description: "Click at screen coordinates (0-999 range).",
        parameters: {
          type: "object",
          properties: {
            x: { type: "integer", description: "X position (0=left edge, 999=right edge)" },
            y: { type: "integer", description: "Y position (0=top edge, 999=bottom edge)" },
          },
          required: ["x", "y"],
        },
      },
    },
  ],
});

for (const choice of result.choices) {
  for (const toolCall of choice.message.tool_calls ?? []) {
    const args = JSON.parse(toolCall.function.arguments);
    const [pixelX, pixelY] = scaleCoordinates(
      args.x, args.y, VIEWPORT_WIDTH, VIEWPORT_HEIGHT,
    );
    console.log(`Click at pixel (${pixelX}, ${pixelY})`);
  }
}
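Tool-call arguments arrive as a JSON string, so it can be worth validating them before scaling. The following is a minimal sketch, assuming the `click` tool schema shown above; `parse_click_arguments` is a hypothetical helper, not part of the OpenAI SDK or Northstar.

```python
import json


def parse_click_arguments(arguments: str) -> tuple[int, int]:
    """Parse a click tool call's JSON arguments and check x and y are integers in 0-999."""
    args = json.loads(arguments)
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    coords = []
    for key in ("x", "y"):
        value = args.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{key!r} must be an integer, got {value!r}")
        if not 0 <= value <= 999:
            raise ValueError(f"{key!r}={value} is outside the 0-999 range")
        coords.append(value)
    return coords[0], coords[1]


model_x, model_y = parse_click_arguments('{"x": 379, "y": 43}')
print(model_x, model_y)
```

The validated pair can then be passed to the scaling function; a `ValueError` signals that the model's response should be retried or logged rather than clicked.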

Here’s coordinate denormalization in action. We ask Northstar to click the search bar on a Wikipedia page. The model responds with (379, 43) in the 0–999 grid, which denormalizes to pixel (485, 30) on a 1280×720 viewport:

Annotated screenshot showing a red crosshair on the Wikipedia search bar, with the label model=(379,43) pixel=(485,30)

After clicking at the scaled pixel coordinates, the search bar is focused:

Screenshot after clicking — the Wikipedia search bar is now active
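The arithmetic behind this example can be reproduced in a few lines. The sketch below shows the intermediate values and makes visible that `int()` truncates instead of rounding, which is why the y coordinate is 30 and not 31. The variable names are illustrative.

```python
model_x, model_y = 379, 43
width, height = 1280, 720

raw_x = model_x / 1000 * width   # about 485.12
raw_y = model_y / 1000 * height  # about 30.96

# int() drops the fractional part, matching the documented result (485, 30).
print(int(raw_x), int(raw_y))

# round() would land one pixel lower on the y axis; shown only for comparison.
print(round(raw_x), round(raw_y))
```

A one-pixel difference rarely matters for a click target, but it helps to know which behavior your code uses when comparing logged coordinates against annotated screenshots.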

Full source code for this example
coordinate_example.py
import io
import urllib.request

from PIL import Image, ImageDraw
from tzafon import Lightcone

DISPLAY_WIDTH = 1280
DISPLAY_HEIGHT = 720

client = Lightcone()


def denormalize(model_x, model_y, width, height):
    """Convert 0–999 model coordinates to pixel coordinates."""
    return int(model_x / 1000 * width), int(model_y / 1000 * height)


def annotate_screenshot(image_bytes, pixel_x, pixel_y, model_x, model_y, output_path):
    """Draw a red crosshair + label on the screenshot to visualize the click."""
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    draw = ImageDraw.Draw(img)
    r = 15
    draw.ellipse((pixel_x - r, pixel_y - r, pixel_x + r, pixel_y + r), outline="red", width=3)
    draw.line((pixel_x - r, pixel_y, pixel_x + r, pixel_y), fill="red", width=2)
    draw.line((pixel_x, pixel_y - r, pixel_x, pixel_y + r), fill="red", width=2)
    label = f"model=({model_x},{model_y}) pixel=({pixel_x},{pixel_y})"
    draw.text((pixel_x + r + 5, pixel_y - 10), label, fill="red")
    img.save(output_path)


with client.computer.create(kind="browser") as computer:
    computer.navigate("https://en.wikipedia.org/wiki/Ada_Lovelace")
    computer.wait(3)

    result = computer.screenshot()
    screenshot_url = computer.get_screenshot_url(result)
    with urllib.request.urlopen(screenshot_url) as resp:
        screenshot_bytes = resp.read()

    # Ask Northstar where to click — coordinates come back in 0–999 space.
    response = client.responses.create(
        model="tzafon.northstar-cua-fast",
        tools=[{
            "type": "computer_use",
            "display_width": DISPLAY_WIDTH,
            "display_height": DISPLAY_HEIGHT,
            "environment": "browser",
        }],
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Click the search bar."},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }],
    )

    computer_call = next(
        (item for item in response.output if item.type == "computer_call"), None
    )
    # Stop early if the model did not return an action to perform.
    if computer_call is None:
        raise RuntimeError("No computer_call in the response")
    model_x, model_y = computer_call.action.x, computer_call.action.y

    # Denormalize to pixel coordinates before clicking.
    pixel_x, pixel_y = denormalize(model_x, model_y, DISPLAY_WIDTH, DISPLAY_HEIGHT)
    print(f"Model: ({model_x}, {model_y}) → Pixel: ({pixel_x}, {pixel_y})")

    annotate_screenshot(screenshot_bytes, pixel_x, pixel_y, model_x, model_y, "click_debug.png")
    computer.click(pixel_x, pixel_y)
    computer.wait(2)

Run it yourself: coordinate_scaling.py

When using the Computers API directly (no model), coordinates are raw pixel positions — you decide where to click:

# You provide pixel coordinates directly
computer.click(640, 360)  # center of a 1280x720 screen
computer.scroll(0, 300, 640, 400)  # scroll down at position

await client.computers.click(id, { x: 640, y: 360 });
await client.computers.scroll(id, { dx: 0, dy: 300, x: 640, y: 400 });

There’s no scaling involved — what you send is what happens.