When you select or drop a video file, a Video Keyframe Modal opens near-fullscreen (95% of the viewport). This modal lets you scrub through the video and capture specific frames to send to the agent as image attachments — no external tools or ffmpeg required.
When the modal opens, the video file is fetched as a single download and held in browser memory. This means:
The top of the modal shows a video player that fills the available vertical space. The header toolbar shows the video filename, which takes up all available width before being truncated with an ellipsis.
There are several ways to control video playback:
When you toggle play or pause, a circular icon (a play triangle or pause bars) briefly appears in the centre of the video preview, then scales up and fades out. This gives clear visual feedback that the state has changed, similar to YouTube and other video players.
Below the video preview, a thumbnail filmstrip replaces a traditional slider. When the video loads, small JPEG thumbnails are extracted at regular intervals (roughly one every 2 seconds, between 10 and 60 thumbnails depending on video length) and displayed as a continuous horizontal strip. This gives you a visual overview of the entire video timeline at a glance.
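The sampling rule above can be sketched as follows. This is a minimal illustration, not the modal's actual code. The function names are hypothetical, and both the rounding (`Math.round`) and the choice to sample the middle of each slot are assumptions.

```typescript
// Roughly one thumbnail every 2 seconds, clamped to 10..60 thumbnails.
// The constants mirror the prose; the rounding choice is an assumption.
const SECONDS_PER_THUMB = 2;
const MIN_THUMBS = 10;
const MAX_THUMBS = 60;

function thumbnailCount(durationSec: number): number {
  const raw = Math.round(durationSec / SECONDS_PER_THUMB);
  return Math.min(MAX_THUMBS, Math.max(MIN_THUMBS, raw));
}

// Timestamp (seconds) to sample for each slot. Taking the midpoint of each
// equal-width slot avoids seeking exactly to 0 or to the very end of the video.
function thumbnailTimes(durationSec: number): number[] {
  const n = thumbnailCount(durationSec);
  const slot = durationSec / n;
  return Array.from({ length: n }, (_, i) => (i + 0.5) * slot);
}
```

With this rule, an 8-second clip still gets 10 thumbnails, a one-minute clip gets 30, and anything over two minutes is capped at 60.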
While the filmstrip is being generated, a loading overlay covers the video preview showing "Generating timeline..." with a progress bar that fills as each thumbnail is extracted. This prevents the video from visually jumping through seek positions during extraction. Thumbnails are generated progressively — each slot starts as a shimmering placeholder and fills in as its thumbnail is ready.
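One way to model the progressive filmstrip is a fixed array of slots, each either pending (drawn as a shimmering placeholder) or ready. The type and function names below are hypothetical. In this sketch, the progress bar reflects the fraction of slots that are ready.

```typescript
// Each slot is either a placeholder or a finished thumbnail URL.
type ThumbSlot =
  | { status: "pending" }
  | { status: "ready"; url: string };

function createSlots(count: number): ThumbSlot[] {
  return Array.from({ length: count }, (): ThumbSlot => ({ status: "pending" }));
}

// Returns a new array with slot `index` filled in; other slots are unchanged.
function fillSlot(slots: ThumbSlot[], index: number, url: string): ThumbSlot[] {
  return slots.map((s, i): ThumbSlot => (i === index ? { status: "ready", url } : s));
}

// Fraction (0..1) used to drive the "Generating timeline..." progress bar.
function progress(slots: ThumbSlot[]): number {
  if (slots.length === 0) return 1;
  const done = slots.filter((s) => s.status === "ready").length;
  return done / slots.length;
}
```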
requestAnimationFrame for smooth motion)
The scrubber preserves your playback state:
This means you can quickly reposition the playhead during playback without interrupting the flow, or precisely position the playhead while paused for accurate keyframe capture.
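Because the filmstrip replaces a slider spanning the whole timeline, a pointer position on it maps linearly to a video time. The sketch below assumes the strip covers the full duration with no padding. `timeFromPointer` is a hypothetical name.

```typescript
// Maps a pointer's horizontal offset within the filmstrip to a video time.
// Offsets outside the strip are clamped, so dragging past either end pins
// the playhead to the start or the end of the video.
// In a pointermove handler (throttled with requestAnimationFrame), offsetX
// would typically be event.clientX - strip.getBoundingClientRect().left.
function timeFromPointer(offsetX: number, stripWidth: number, durationSec: number): number {
  if (stripWidth <= 0 || durationSec <= 0) return 0;
  const fraction = Math.min(1, Math.max(0, offsetX / stripWidth));
  return fraction * durationSec;
}
```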
Below the filmstrip:
M:SS.TTT format.
Each added keyframe appears in an ordered list below the controls, sorted by timestamp. The list resizes to fit its contents: it is compact when empty, grows as keyframes are added, and never exceeds one third of the modal's height. Beyond that limit, the list scrolls.
| Element | Description |
|---|---|
| Radio button | Select this keyframe for annotation editing (crop/shapes). Click again to deselect. The selected row is highlighted with an accent border. |
| Thumbnail | A small preview of the captured frame. Click to seek the video to that timestamp. |
| Timestamp | The exact time in M:SS.TTT format |
| Annotation badges | Shows "crop" and/or "1 shape" / "N shapes" when the keyframe has annotations applied. |
| Caption input | A text field where you can describe what the frame shows. Captions are included in the prompt text sent to the agent. Press Enter to finish editing. Focusing a caption automatically selects its keyframe for annotation. |
| Remove button | Removes the keyframe from the list |
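The M:SS.TTT timestamps and the timestamp-sorted list can be sketched with two small helpers. The helper names and the `KeyframeEntry` shape are hypothetical. Minutes are left unpadded, so 83.4 seconds renders as `1:23.400`, which matches the format used elsewhere on this page.

```typescript
// Formats seconds as M:SS.TTT: unpadded minutes, two-digit seconds,
// three-digit milliseconds (rounded to the nearest millisecond).
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const minutes = Math.floor(totalMs / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${minutes}:${String(secs).padStart(2, "0")}.${String(ms).padStart(3, "0")}`;
}

interface KeyframeEntry {
  time: number; // seconds
  caption: string;
}

// Adds a keyframe and keeps the list ordered by timestamp.
function addKeyframe(list: KeyframeEntry[], entry: KeyframeEntry): KeyframeEntry[] {
  return [...list, entry].sort((a, b) => a.time - b.time);
}
```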
When a keyframe is selected via its radio button, the video pauses and seeks to that keyframe's timestamp, and an annotation toolbar appears on the right side of the transport controls. This lets you crop the frame and draw shapes to highlight specific areas before extraction.
The toolbar contains four tool buttons and, when a shape tool is active, a row of colour swatches:
| Tool | Shortcut | Description |
|---|---|---|
| Select | V | Click on shapes to select them, drag to reposition. |
| Crop | C | Draw a crop marquee to define the region of the frame to extract. |
| Rectangle | R | Draw a rectangle outline on the frame. |
| Circle | O | Draw a circle/ellipse outline on the frame. |
The colour swatches (8 preset colours) appear only when the Rectangle or Circle tool is active. Select a colour before drawing a shape.
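The Crop, Rectangle, and Circle tools each mark a region of the frame. This sketch makes two assumptions: each region is drawn by dragging, and coordinates are in video pixels. It turns a drag in any direction into a rectangle clamped to the frame, and treats the circle/ellipse as the shape inscribed in that rectangle. The `Rect` and `Ellipse` types and the function names are illustrative only.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }
interface Ellipse { cx: number; cy: number; rx: number; ry: number; }

// Normalizes a drag from (x0, y0) to (x1, y1) into a rectangle with
// non-negative size, clamped to a frameW x frameH frame.
function rectFromDrag(x0: number, y0: number, x1: number, y1: number,
                      frameW: number, frameH: number): Rect {
  const clampX = (v: number) => Math.min(frameW, Math.max(0, v));
  const clampY = (v: number) => Math.min(frameH, Math.max(0, v));
  const left = clampX(Math.min(x0, x1));
  const right = clampX(Math.max(x0, x1));
  const top = clampY(Math.min(y0, y1));
  const bottom = clampY(Math.max(y0, y1));
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// An ellipse outline can reuse the dragged rectangle as its bounding box.
function ellipseFromRect(r: Rect): Ellipse {
  return { cx: r.x + r.width / 2, cy: r.y + r.height / 2, rx: r.width / 2, ry: r.height / 2 };
}
```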
With the Crop tool active:
With the Rectangle or Circle tool active:
In Select mode:
Each keyframe stores its own crop and shapes independently. Annotation badges on the keyframe row show whether a crop or shapes have been applied.
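A per-keyframe data model along these lines keeps crops and shapes independent and makes the badges easy to derive. The interface and field names are hypothetical. Only the badge wording ("crop", "1 shape", "N shapes") comes from this page.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Shapes drawn with the Rectangle or Circle tool, each with its own colour.
type Shape =
  | { kind: "rect"; rect: Rect; color: string }
  | { kind: "ellipse"; rect: Rect; color: string };

// Each keyframe owns an optional crop and its own list of shapes.
interface AnnotatedKeyframe {
  time: number; // seconds
  caption: string;
  crop?: Rect;
  shapes: Shape[];
}

// Badge labels shown on the keyframe row: "crop" and/or "1 shape" / "N shapes".
function annotationBadges(kf: AnnotatedKeyframe): string[] {
  const badges: string[] = [];
  if (kf.crop) badges.push("crop");
  const n = kf.shapes.length;
  if (n > 0) badges.push(n === 1 ? "1 shape" : `${n} shapes`);
  return badges;
}
```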
When you're satisfied with your keyframe selections, click Extract Frames (N) to process them:
`<canvas>` API
For example, three captioned keyframes produce prompt text like this:
Video keyframes from "demo.mp4":
1. The login screen with the error banner visible
2. After clicking submit — the spinner appears
3. The success confirmation page
Captions left empty default to the timestamp (e.g. [1:23.400]). The caption text gives the agent context about what each frame shows, so it can reason about the visual content in sequence.
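Assembling the prompt text shown above is a simple mapping over the keyframes. This sketch reproduces the example's layout and the timestamp fallback for empty captions. `buildPrompt` and `CaptionedFrame` are hypothetical names. The sketch also makes two assumptions: whitespace-only captions count as empty, and the frames are already sorted by time.

```typescript
interface CaptionedFrame {
  time: number; // seconds
  caption: string;
}

// M:SS.TTT, e.g. 83.4 -> "1:23.400".
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const minutes = Math.floor(totalMs / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${minutes}:${String(secs).padStart(2, "0")}.${String(ms).padStart(3, "0")}`;
}

// Header line naming the file, then one numbered line per keyframe.
// Empty captions fall back to the bracketed timestamp.
function buildPrompt(fileName: string, frames: CaptionedFrame[]): string {
  const lines = frames.map((frame, i) => {
    const text = frame.caption.trim() || `[${formatTimestamp(frame.time)}]`;
    return `${i + 1}. ${text}`;
  });
  return [`Video keyframes from "${fileName}":`, ...lines].join("\n");
}
```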
During extraction, the submit button shows a spinner and the modal cannot be dismissed. Frames are extracted sequentially (one at a time) to limit memory usage. Once extraction completes, the modal closes automatically.
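Each frame is rendered through the `<canvas>` API. If a crop is stored in video pixels (an assumption), the region to copy can be computed as a whole-pixel source rectangle for the nine-argument form of `CanvasRenderingContext2D.drawImage`. The helper below is hypothetical. The commented browser calls show where it would fit when frames are processed one at a time.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Source rectangle for drawImage(image, sx, sy, sw, sh, dx, dy, dw, dh).
// Without a crop the whole frame is used; a crop is rounded to whole
// pixels and clamped so it always covers at least 1x1 pixel of the frame.
function sourceRect(videoW: number, videoH: number, crop?: Rect): Rect {
  if (!crop) return { x: 0, y: 0, width: videoW, height: videoH };
  const x = Math.min(videoW - 1, Math.max(0, Math.round(crop.x)));
  const y = Math.min(videoH - 1, Math.max(0, Math.round(crop.y)));
  const width = Math.max(1, Math.min(videoW - x, Math.round(crop.width)));
  const height = Math.max(1, Math.min(videoH - y, Math.round(crop.height)));
  return { x, y, width, height };
}

// Browser usage (needs a DOM), awaiting each frame before seeking to the next:
//   const r = sourceRect(video.videoWidth, video.videoHeight, keyframe.crop);
//   canvas.width = r.width;
//   canvas.height = r.height;
//   ctx.drawImage(video, r.x, r.y, r.width, r.height, 0, 0, r.width, r.height);
//   // ...then encode the canvas (for example with canvas.toBlob).
```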
If the video was uploaded through the browser (drag-and-drop in a non-Electron environment), the uploaded file is automatically deleted from the session's uploads directory when you either confirm or cancel the keyframe editor. Only browser-uploaded files are cleaned up. Videos opened from their original filesystem path (via the file picker or Electron drag-and-drop) are never touched. The server enforces this by rejecting deletion requests for any path outside the session's uploads directory.
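A server-side guard of this kind can be sketched with Node's `path` module. It resolves the requested path, then checks that the result lies strictly inside the session's uploads directory. `isInsideUploads` is a hypothetical name, and the sketch does not resolve symlinks. A stricter check might also compare `fs.realpath` results.

```typescript
import * as path from "path";

// True only if `target` resolves to something strictly inside `uploadsDir`.
// Rejects the directory itself, ".." escapes, other drives (Windows), and
// sibling directories that merely share a prefix (e.g. "uploads-old").
function isInsideUploads(uploadsDir: string, target: string): boolean {
  const root = path.resolve(uploadsDir);
  const resolved = path.resolve(root, target);
  const rel = path.relative(root, resolved);
  if (rel === "" || path.isAbsolute(rel)) return false;
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}
```

Comparing via `path.relative` rather than a plain string prefix check is what rejects look-alike siblings such as `uploads-old`.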
All keyboard shortcuts work regardless of which element has focus, except when typing in a caption input field.
| Key | Action |
|---|---|
| Space | Toggle play/pause |
| Enter | Add a keyframe at the current position |
| Arrow Left | Step back by one frame (1/30th of a second) |
| Shift+Arrow Left | Step back by 1 second |
| Arrow Right | Step forward by one frame (1/30th of a second) |
| Shift+Arrow Right | Step forward by 1 second |
| Enter (in caption) | Finish editing caption |
| Escape | Close the modal |
| V | Switch to Select tool (when a keyframe is selected) |
| C | Switch to Crop tool (when a keyframe is selected) |
| R | Switch to Rectangle tool (when a keyframe is selected) |
| O | Switch to Circle tool (when a keyframe is selected) |
| Backspace / Delete | Remove selected shape, or remove crop (when in crop mode) |
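The shortcut table maps naturally onto a pure function. It takes a `KeyboardEvent.key` value plus a little context and returns an action. This sketch is hypothetical: the action names are invented, and frame steps assume a fixed 1/30 s as in the table. It also assumes that only Enter is handled while a caption is focused.

```typescript
// One frame at an assumed fixed 30 fps, as in the table above.
const FRAME_STEP = 1 / 30;

type Tool = "select" | "crop" | "rect" | "circle";
type Action =
  | { type: "togglePlay" }
  | { type: "addKeyframe" }
  | { type: "step"; seconds: number }
  | { type: "close" }
  | { type: "setTool"; tool: Tool }
  | { type: "deleteSelection" } // selected shape, or the crop in crop mode
  | { type: "finishCaption" }
  | { type: "none" };

interface KeyContext {
  inCaption: boolean;        // focus is in a caption input
  keyframeSelected: boolean; // annotation tools are available
}

const TOOL_KEYS = new Map<string, Tool>([["v", "select"], ["c", "crop"], ["r", "rect"], ["o", "circle"]]);

function actionForKey(key: string, shift: boolean, ctx: KeyContext): Action {
  // Shortcuts are suspended while typing a caption; Enter finishes editing.
  if (ctx.inCaption) return key === "Enter" ? { type: "finishCaption" } : { type: "none" };
  switch (key) {
    case " ": return { type: "togglePlay" };
    case "Enter": return { type: "addKeyframe" };
    case "ArrowLeft": return { type: "step", seconds: shift ? -1 : -FRAME_STEP };
    case "ArrowRight": return { type: "step", seconds: shift ? 1 : FRAME_STEP };
    case "Escape": return { type: "close" };
    case "Backspace":
    case "Delete":
      return ctx.keyframeSelected ? { type: "deleteSelection" } : { type: "none" };
  }
  // Tool keys (V, C, R, O) only apply when a keyframe is selected.
  const tool = TOOL_KEYS.get(key.toLowerCase());
  if (tool !== undefined && ctx.keyframeSelected) return { type: "setTool", tool };
  return { type: "none" };
}
```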