---
title: Coordinates | Lightcone
description: How Northstar's coordinate system works and how the API scales coordinates to your viewport.
---

Northstar sees a screenshot and decides where to act. Internally, it always thinks in a **fixed 0–999 grid** — regardless of the actual screen resolution. Depending on which API you use, the API can scale these grid coordinates to pixel coordinates for you.

## The 0–999 grid

Every coordinate Northstar outputs is a point on this grid overlaid on the screen. The model says *"click at (500, 500)"* to mean "click in the center" — whether the actual screen is 500px wide or 3840px wide.

## How scaling works

When you use the [Responses API](/guides/responses-api/index.md) or [Tasks](/guides/tasks/index.md) with the `computer_use` tool, the API converts Northstar's 0–999 coordinates to **pixel coordinates** matching your viewport:

```
pixel_x = model_x × (display_width − 1) / 999
pixel_y = model_y × (display_height − 1) / 999
```

The model always outputs the same value for the same logical position — **(500, 500)** is the center of the screen — but the resulting pixel coordinates differ with viewport size. The API does the math — you just pass `action.x` and `action.y` directly to `computer.click()`.

## Which API scales coordinates?
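Whichever API does the work, the mapping itself is just the formula above. Here is a minimal sketch in Python (the `scale` helper and the three viewport sizes are arbitrary, for illustration only):

```python
def scale(model_x: int, model_y: int, width: int, height: int) -> tuple[int, int]:
    """Apply the formula above: 0-999 grid -> pixel, floored to an integer."""
    return (
        int(model_x * (width - 1) / 999),
        int(model_y * (height - 1) / 999),
    )

# The same model output, (500, 500), at three example viewport sizes:
# 1280x720 -> (640, 359), 1920x1080 -> (960, 540), 3840x2160 -> (1921, 1080)
for w, h in [(1280, 720), (1920, 1080), (3840, 2160)]:
    print(f"{w}x{h}: {scale(500, 500, w, h)}")
```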
| API | Coordinate scaling | What you receive |
| --- | --- | --- |
| **Responses API** (with `computer_use` tool) | Scaled to `display_width` × `display_height` | Pixel coordinates — pass directly to `click()` |
| **Tasks** | Handled automatically | You don't interact with coordinates |
| **Chat Completions** | No scaling | Raw 0–999 model output |
| **Computers API** (direct control) | N/A — you provide the coordinates | Whatever you send |

### Responses API

When you include a `computer_use` tool, the API scales coordinates before returning them:

```python
response = client.responses.create(
    model="tzafon.northstar-cua-fast",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Click the search button"},
            {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
        ],
    }],
    tools=[{
        "type": "computer_use",
        "display_width": 1280,   # ← tells the API your viewport size
        "display_height": 720,
        "environment": "desktop",
    }],
)

for item in response.output:
    if item.type == "computer_call":
        # These are pixel coordinates, ready to use
        computer.click(item.action.x, item.action.y)
```

```typescript
const response = await client.responses.create({
  model: "tzafon.northstar-cua-fast",
  input: [{
    role: "user",
    content: [
      { type: "input_text", text: "Click the search button" },
      { type: "input_image", image_url: screenshotUrl, detail: "auto" },
    ],
  }],
  tools: [{
    type: "computer_use",
    display_width: 1280,  // ← tells the API your viewport size
    display_height: 720,
    environment: "desktop",
  }],
});

for (const item of response.output ?? []) {
  if (item.type === "computer_call") {
    // These are pixel coordinates, ready to use
    await client.computers.click(id, { x: item.action!.x!, y: item.action!.y! });
  }
}
```

`display_width` and `display_height` are optional. If you include a screenshot in the input, the API infers the viewport from the image dimensions.
If neither is provided, it defaults to 1024×768.

### Chat Completions

The Chat Completions API does not scale coordinates — you receive raw model output and handle the conversion yourself. You need two things: a system prompt template and a scaling function.

#### System prompt template

Add this to your system prompt, replacing the viewport dimensions with your own:

```
You are controlling a computer through screenshots and actions.

Screen information:
- Viewport size: {viewport_width}x{viewport_height} pixels.
- Coordinates range from (0,0) at the top-left to (999,999) at the bottom-right.
- All coordinate values must be integers between 0 and 999 inclusive.

When clicking or interacting with elements:
- Look at the screenshot to find the element's position.
- Return coordinates in the 0-999 range. They will be automatically scaled to the actual viewport.
- Click elements in their CENTER, not on edges.
```

#### Scaling code

Convert the model's 0–999 coordinates back to pixel coordinates:

```python
def scale_coordinates(
    model_x: int,
    model_y: int,
    viewport_width: int,
    viewport_height: int,
) -> tuple[int, int]:
    """Convert model coordinates (0-999) to actual pixel coordinates."""
    x = int(model_x * (viewport_width - 1) / 999)
    y = int(model_y * (viewport_height - 1) / 999)
    return x, y

# Example: model says click (500, 500) on a 1280x720 viewport
x, y = scale_coordinates(500, 500, 1280, 720)  # → (640, 359)
```

```typescript
/** Convert model coordinates (0-999) to actual pixel coordinates. */
function scaleCoordinates(
  modelX: number,
  modelY: number,
  viewportWidth: number,
  viewportHeight: number,
): [number, number] {
  const x = Math.floor(modelX * (viewportWidth - 1) / 999);
  const y = Math.floor(modelY * (viewportHeight - 1) / 999);
  return [x, y];
}

// Example: model says click (500, 500) on a 1280x720 viewport
const [x, y] = scaleCoordinates(500, 500, 1280, 720); // → [640, 359]
```

#### Full example

Putting it together with the OpenAI SDK:

```python
from openai import OpenAI
import json

client = OpenAI(
    base_url="https://api.tzafon.ai/v1",
    api_key="sk_your_api_key_here",
)

VIEWPORT_WIDTH = 1280
VIEWPORT_HEIGHT = 720

SYSTEM_PROMPT = f"""\
You are controlling a computer through screenshots and actions.

Screen information:
- Viewport size: {VIEWPORT_WIDTH}x{VIEWPORT_HEIGHT} pixels.
- Coordinates range from (0,0) at the top-left to (999,999) at the bottom-right.
- All coordinate values must be integers between 0 and 999 inclusive.

When clicking or interacting with elements:
- Look at the screenshot to find the element's position.
- Return coordinates in the 0-999 range. They will be automatically scaled to the actual viewport.
- Click elements in their CENTER, not on edges."""

result = client.chat.completions.create(
    model="tzafon.northstar-cua-fast",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Click the search button"},
                {"type": "image_url", "image_url": {"url": screenshot_url}},
            ],
        },
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "click",
                "description": "Click at screen coordinates (0-999 range).",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "x": {"type": "integer", "description": "X position (0=left edge, 999=right edge)"},
                        "y": {"type": "integer", "description": "Y position (0=top edge, 999=bottom edge)"},
                    },
                    "required": ["x", "y"],
                },
            },
        },
    ],
)

for choice in result.choices:
    for tool_call in choice.message.tool_calls or []:
        args = json.loads(tool_call.function.arguments)
        pixel_x, pixel_y = scale_coordinates(
            args["x"], args["y"], VIEWPORT_WIDTH, VIEWPORT_HEIGHT,
        )
        print(f"Click at pixel ({pixel_x}, {pixel_y})")
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.tzafon.ai/v1",
  apiKey: "sk_your_api_key_here",
});

const VIEWPORT_WIDTH = 1280;
const VIEWPORT_HEIGHT = 720;

const SYSTEM_PROMPT = `You are controlling a computer through screenshots and actions.

Screen information:
- Viewport size: ${VIEWPORT_WIDTH}x${VIEWPORT_HEIGHT} pixels.
- Coordinates range from (0,0) at the top-left to (999,999) at the bottom-right.
- All coordinate values must be integers between 0 and 999 inclusive.

When clicking or interacting with elements:
- Look at the screenshot to find the element's position.
- Return coordinates in the 0-999 range. They will be automatically scaled to the actual viewport.
- Click elements in their CENTER, not on edges.`;

const result = await client.chat.completions.create({
  model: "tzafon.northstar-cua-fast",
  messages: [
    { role: "system", content: SYSTEM_PROMPT },
    {
      role: "user",
      content: [
        { type: "text", text: "Click the search button" },
        { type: "image_url", image_url: { url: screenshotUrl } },
      ],
    },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "click",
        description: "Click at screen coordinates (0-999 range).",
        parameters: {
          type: "object",
          properties: {
            x: { type: "integer", description: "X position (0=left edge, 999=right edge)" },
            y: { type: "integer", description: "Y position (0=top edge, 999=bottom edge)" },
          },
          required: ["x", "y"],
        },
      },
    },
  ],
});

for (const choice of result.choices) {
  for (const toolCall of choice.message.tool_calls ?? []) {
    const args = JSON.parse(toolCall.function.arguments);
    const [pixelX, pixelY] = scaleCoordinates(
      args.x, args.y, VIEWPORT_WIDTH, VIEWPORT_HEIGHT,
    );
    console.log(`Click at pixel (${pixelX}, ${pixelY})`);
  }
}
```

Northstar naturally outputs in a 0–999 grid. Defining custom coordinate ranges (like percentages 0–100) in your tool schema may be ignored — work with the 0–999 grid and scale on your side.

If you're building a computer-use loop from scratch, the [Responses API](/guides/responses-api/index.md) with the `computer_use` tool handles all of this for you — coordinate scaling, viewport management, and action formatting. Chat Completions gives you more flexibility but requires manual setup.

## Worked example

Here's coordinate scaling in action. We ask Northstar to click the search bar on a Wikipedia page.
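In this run, the model returns grid coordinates (379, 43), and the arithmetic is easy to reproduce with the same floor-based scaling used throughout this page (the function is repeated here so the snippet stands alone):

```python
def scale_coordinates(model_x, model_y, viewport_width, viewport_height):
    """Convert model coordinates (0-999) to pixel coordinates (floor scaling)."""
    x = int(model_x * (viewport_width - 1) / 999)
    y = int(model_y * (viewport_height - 1) / 999)
    return x, y

# Model output from this example, on the 1280x720 viewport:
print(scale_coordinates(379, 43, 1280, 720))  # (485, 30)
```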
The model responds with `(379, 43)` in the 0–999 grid, which scales to pixel `(485, 30)` on a 1280×720 viewport:

![Annotated screenshot showing a red crosshair on the Wikipedia search bar, with the label model=(379,43) pixel=(485,30)](/_astro/coordinates-annotated.LhA3_GiF_Z1FX4Ne.webp)

After clicking at the scaled pixel coordinates, the search bar is focused:

![Screenshot after clicking — the Wikipedia search bar is now active](/_astro/coordinates-after.D8ju1JiZ_LnboR.webp)

Full source code for this example (`coordinate_example.py`):

```python
from tzafon import Lightcone
from PIL import Image, ImageDraw
import json
import io

VIEWPORT_WIDTH = 1280
VIEWPORT_HEIGHT = 720

SYSTEM_PROMPT = f"""\
You are controlling a computer via screenshots and actions.

Screen information:
- Viewport size: {VIEWPORT_WIDTH}x{VIEWPORT_HEIGHT} pixels.
- Coordinates range from (0,0) at the top-left to (999,999) at the bottom-right.
- All coordinate values must be integers between 0 and 999 inclusive.

When you want to click on something:
- Look at the screenshot to find the element.
- Return the coordinates in the 0-999 range. They will be scaled to actual pixels.
- Click elements in their CENTER, not on edges.

Respond with JSON: {{"action": "click", "coordinate": [x, y], "reason": "..."}}"""

def scale_coordinates(model_x, model_y, viewport_width, viewport_height):
    """Convert model coordinates (0-999) to actual pixel coordinates."""
    x = int(model_x * (viewport_width - 1) / 999)
    y = int(model_y * (viewport_height - 1) / 999)
    return x, y

def annotate_screenshot(image_bytes, pixel_x, pixel_y, model_x, model_y, output_path):
    """Draw a red crosshair + label on the screenshot to visualize the click."""
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    draw = ImageDraw.Draw(img)
    r = 15
    draw.ellipse((pixel_x - r, pixel_y - r, pixel_x + r, pixel_y + r), outline="red", width=3)
    draw.line((pixel_x - r, pixel_y, pixel_x + r, pixel_y), fill="red", width=2)
    draw.line((pixel_x, pixel_y - r, pixel_x, pixel_y + r), fill="red", width=2)
    label = f"model=({model_x},{model_y}) pixel=({pixel_x},{pixel_y})"
    draw.text((pixel_x + r + 5, pixel_y - 10), label, fill="red")
    img.save(output_path)

client = Lightcone()

with client.computer.create(kind="browser") as computer:
    computer.navigate("https://en.wikipedia.org/wiki/Ada_Lovelace")
    computer.wait(3)

    # Take a screenshot
    result = computer.screenshot()
    screenshot_url = computer.get_screenshot_url(result)

    # Download for local annotation
    import urllib.request
    with urllib.request.urlopen(screenshot_url) as resp:
        screenshot_bytes = resp.read()

    # Ask the model where to click
    response = client.responses.create(
        model="tzafon.northstar-cua-fast",
        instructions=SYSTEM_PROMPT,
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Click the search bar."},
                {"type": "input_image", "image_url": screenshot_url, "detail": "auto"},
            ],
        }],
    )

    # Parse the model's response
    for item in response.output:
        if item.type == "message":
            reply = json.loads(item.content[0].text)
            break

    model_x, model_y = reply["coordinate"]
    pixel_x, pixel_y = scale_coordinates(model_x, model_y, VIEWPORT_WIDTH, VIEWPORT_HEIGHT)
    print(f"Model: ({model_x}, {model_y}) → Pixel: ({pixel_x}, {pixel_y})")

    # Annotate the screenshot with a red crosshair
    annotate_screenshot(screenshot_bytes, pixel_x, pixel_y, model_x, model_y, "click_debug.png")

    # Click and capture the result
    computer.click(pixel_x, pixel_y)
    computer.wait(2)
```

Run it yourself: [`coordinate_scaling.py`](https://github.com/tzafon/lightcone/blob/main/examples/coordinate_scaling.py)

## Direct computer control

When using the [Computers API](/guides/computers/index.md) directly (no model), coordinates are **raw pixel positions** — you decide where to click:

```python
# You provide pixel coordinates directly
computer.click(640, 360)           # center of a 1280x720 screen
computer.scroll(0, 300, 640, 400)  # scroll down at position
```

```typescript
await client.computers.click(id, { x: 640, y: 360 });
await client.computers.scroll(id, { dx: 0, dy: 300, x: 640, y: 400 });
```

There's no scaling involved — what you send is what happens.

## See also

- [**Responses API**](/guides/responses-api/index.md) — build a computer-use loop with automatic coordinate scaling
- [**Computer-use loop**](/guides/cua-protocol/index.md) — full implementation guide with action dispatch
- [**Computers**](/guides/computers/index.md) — direct control with pixel coordinates
- [**Chat Completions**](/guides/chat-completions/index.md) — text generation (no coordinate scaling)