The default approach to session recording is to capture a screenshot every few seconds. The result is a directory full of near-identical frames (the user is reading the page, the user is reading the page, the user is still reading the page), while the events that actually mattered fall between two snapshots, and the later snapshot arrives too late to show anything but the consequence of the action.
Periodic capture also misses the answer to the most important question for any audit or investigation: what did the user do? A snapshot taken 1.5 seconds after a click does not show what they clicked. A snapshot taken 3 seconds after a form submission does not show what they typed. The audit log has events without context; the screenshots have context without events. Connecting them is manual work that scales badly with session count.
ZeroLeak takes a different approach. Screenshots fire on the events themselves, not on a timer. Navigations capture two screenshots — the page before clicking, and the page after the new one loads — so the cause-and-effect is preserved as a pair. The mouse position is drawn into every capture, so you can see what was clicked. Keystrokes are buffered into readable words. Video is recorded continuously underneath so the gaps between captured events can still be replayed if needed.
Three recording subsystems run in parallel for every session: event-driven screenshots at consequential moments, continuous FFmpeg video at a configured frame rate, and a structured event log with word-buffered keystrokes and full clipboard activity. Each can be enabled or disabled per protected service; by default all three run together so reconstruction is always possible.
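As a rough illustration of per-service toggles, the sketch below models the three subsystems as switches that all default to on. The class, field names, and the `legacy-console` service are hypothetical; ZeroLeak's actual configuration format is not described here.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordingConfig:
    """Hypothetical per-service recording switches (all names are illustrative)."""
    screenshots: bool = True  # event-driven captures
    video: bool = True        # continuous FFmpeg video
    event_log: bool = True    # structured log with keystrokes and clipboard
    video_fps: int = 10

    def enabled_streams(self) -> list[str]:
        names = ("screenshots", "video", "event_log")
        return [n for n in names if getattr(self, n)]


# Per-service overrides; any service not listed falls back to the default.
SERVICE_CONFIGS = {"legacy-console": RecordingConfig(video=False)}


def config_for(service: str) -> RecordingConfig:
    return SERVICE_CONFIGS.get(service, RecordingConfig())
```

With this shape, an unlisted service gets all three streams, and an override only needs to name the stream it turns off.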
Captures are triggered by the user's actual actions — clicks, page navigations, key submissions, form submissions, clipboard operations. There is no time-based polling that fills the disk with idle frames. Every screenshot captures a consequential moment; the directory is short and informative, not long and noisy.
Every captured screenshot has the mouse position marked with a visible indicator — a red dot at the exact coordinates of the click or hover. You see what the user clicked, not just the page that resulted from the click. Single-frame reconstruction of intent, not a multi-frame puzzle.
When the user navigates, two screenshots are captured — one of the page before the click, and one of the destination page after the new content has fully loaded. The cause and effect are preserved as a pair. A reviewer sees "user clicked link X, the next page loaded was Y" as two adjacent frames.
Individual key events are accumulated into a buffer that flushes on space, enter, tab, or a short pause. The result reads like the text the user typed, not a key-by-key event dump. Backspaces are preserved as [BS] markers so corrections are visible. Repeated key events from a held-down key are filtered out. Clipboard operations are recorded separately with the actual content.
Each capture mechanism is independently configurable per protected service and runs without performance impact on the user session. The three streams (screenshots, video, event log) are timestamp-aligned so a reviewer can move between them seamlessly.
Screenshots fire on user clicks, navigations (with a before-and-after pair), form submissions, key submissions (Enter), clipboard operations (copy, cut, paste), manually triggered captures from the operator console, and several other consequential event types. The exact event list is configurable per protected service.
On navigation, the destination screenshot is captured only after the new page has stopped loading — waiting for network idle plus a short additional render delay. The captured screenshot shows the page as the user would actually see it, not a half-loaded intermediate state.
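One way to picture the "network idle plus render delay" rule is a small gate that tracks in-flight requests. This is a sketch under assumptions: the class name, the 0.5 s idle window, and the 0.3 s render delay are illustrative values, not ZeroLeak's documented thresholds.

```python
from typing import Optional


class NetworkIdleGate:
    """Decides when the destination screenshot may be taken.

    Assumed rule: no in-flight requests for `idle_s` seconds, then a further
    `render_delay_s` so the page can finish painting.
    """

    def __init__(self, idle_s: float = 0.5, render_delay_s: float = 0.3):
        self.idle_s = idle_s
        self.render_delay_s = render_delay_s
        self.in_flight = 0
        self.idle_since: Optional[float] = None

    def request_started(self, now: float) -> None:
        self.in_flight += 1
        self.idle_since = None  # any new request resets the idle window

    def request_finished(self, now: float) -> None:
        self.in_flight = max(0, self.in_flight - 1)
        if self.in_flight == 0:
            self.idle_since = now

    def ready(self, now: float) -> bool:
        if self.idle_since is None:
            return False
        return now - self.idle_since >= self.idle_s + self.render_delay_s
```

Passing the clock in as `now` keeps the gate deterministic and easy to test; a real capture loop would feed it timestamps from the browser's network events.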
Underneath the event-driven screenshots, the session is recorded continuously as a video using FFmpeg's x11grab. Frame rate is configurable (default 10 fps for compact files; higher rates available for high-detail capture). Video is segmented for safe streaming and replay; segments are timestamped to align with the screenshot and event log streams.
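For orientation, a segmented x11grab recording can be launched with an FFmpeg command along these lines. Only x11grab and the 10 fps default come from the description above; the display name, resolution, codec, segment length, and output pattern are assumptions chosen for the example.

```python
def build_capture_command(
    display=":0.0",            # assumed X display
    size="1280x720",           # assumed capture resolution
    fps=10,                    # default frame rate mentioned above
    segment_seconds=60,        # assumed segment length
    out_pattern="session_%Y%m%d_%H%M%S.mp4",  # assumed naming scheme
):
    """Build an FFmpeg argv list for segmented screen capture via x11grab."""
    return [
        "ffmpeg",
        "-f", "x11grab",             # X11 screen-grab input device
        "-framerate", str(fps),
        "-video_size", size,
        "-i", display,
        "-c:v", "libx264",
        "-f", "segment",             # split output into fixed-length files
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",    # each segment starts at t=0
        "-strftime", "1",            # expand the date pattern in the filename
        out_pattern,
    ]
```

The list can be passed to `subprocess.Popen`. Putting the wall-clock start time in each segment's filename is one simple way to line segments up with the screenshot and event log streams.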
Key events accumulate into a buffer that flushes on word boundaries (space, enter, tab) or after a short idle pause. The flushed string reads like prose — "hello world [BS][BS][BS][BS][BS]hi world" — preserving the user's intent and corrections without the noise of every individual keydown event.
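The buffering rule can be sketched in a few lines. This is a minimal illustration, not ZeroLeak's implementation: the key names, the 1.5 s idle threshold, and the `is_repeat` flag are assumptions.

```python
class KeystrokeBuffer:
    """Accumulates key events and flushes them as readable chunks."""

    FLUSH_KEYS = {"space", "enter", "tab"}  # word boundaries

    def __init__(self, idle_flush_s=1.5):   # assumed idle threshold
        self.idle_flush_s = idle_flush_s
        self.pending = []
        self.last_time = None
        self.flushed = []

    def key(self, name, now, is_repeat=False):
        if is_repeat:
            return  # auto-repeat from a held-down key is dropped
        idle = self.last_time is not None and now - self.last_time >= self.idle_flush_s
        if self.pending and idle:
            self.flush()  # a pause ends the current chunk
        self.last_time = now
        if name == "backspace":
            self.pending.append("[BS]")  # keep corrections visible
        elif name in self.FLUSH_KEYS:
            self.flush()
        else:
            self.pending.append(name)

    def flush(self):
        if self.pending:
            self.flushed.append("".join(self.pending))
            self.pending = []
```

Typing `hello`, space, `world`, five backspaces, `hi`, enter yields two chunks: `hello` and `world[BS][BS][BS][BS][BS]hi`.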
Copy, cut, and paste operations are recorded separately from keystrokes, including the actual clipboard content involved. A reviewer can see exactly what was copied and what was pasted, not just that a clipboard event occurred.
Screenshots are numbered sequentially (0001, 0002, ...) and stored with the timestamp and event metadata. The operator console lists them with the triggering event so a reviewer can jump directly to the moment of interest — for example, all clipboard-triggered captures in a session, or the navigation pair around a specific URL.
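A reviewer-side index over such captures might look like the sketch below. The field names and the `.png` extension are assumptions; the source only states that screenshots are numbered sequentially and stored with a timestamp and event metadata.

```python
from dataclasses import dataclass


@dataclass
class Screenshot:
    seq: int
    timestamp: float
    trigger: str  # e.g. "click", "navigation", "clipboard"
    url: str

    @property
    def filename(self):
        return f"{self.seq:04d}.png"  # 0001.png, 0002.png, ...


class ScreenshotIndex:
    def __init__(self):
        self.items = []

    def add(self, timestamp, trigger, url):
        shot = Screenshot(len(self.items) + 1, timestamp, trigger, url)
        self.items.append(shot)
        return shot

    def by_trigger(self, trigger):
        """Return all captures fired by one event type, in capture order."""
        return [s for s in self.items if s.trigger == trigger]
```

Filtering by trigger is what lets a reviewer pull, say, every clipboard-triggered capture without scanning the whole directory.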
Beyond screenshots and video, a structured event log captures the user's interaction history with full context. Each event has a type, a timestamp, an associated screenshot reference (when applicable), and the relevant payload. The log is what makes the recording searchable and analytically useful, not just visually browsable.
Each click event records the x/y coordinates and the DOM element under the cursor (tag, class, ID, text content when available). A reviewer can search for clicks on a specific button or link across the session log, not just scrub through screenshots looking for one.
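To make the search idea concrete, the sketch below assumes the event log is stored as JSON Lines with a `type`, `ts`, `screenshot`, and `payload` per event. That storage format and the field names are assumptions for illustration, not a documented ZeroLeak schema.

```python
import json


def make_click_event(ts, x, y, element, screenshot=None):
    """Build a click event; `element` holds tag, id, class, text when known."""
    return {
        "type": "click",
        "ts": ts,
        "screenshot": screenshot,
        "payload": {"x": x, "y": y, "element": element},
    }


def find_clicks(jsonl_lines, **element_filters):
    """Return click events whose DOM element matches every given attribute."""
    matches = []
    for line in jsonl_lines:
        event = json.loads(line)
        if event.get("type") != "click":
            continue
        element = event["payload"]["element"]
        if all(element.get(k) == v for k, v in element_filters.items()):
            matches.append(event)
    return matches
```

A call such as `find_clicks(lines, tag="button", id="export")` returns the matching events together with their screenshot references, so the reviewer can jump straight to the relevant frames.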
Scroll position changes are recorded with throttling to avoid log spam. The log keeps enough resolution to reconstruct where the user was looking on long pages without producing thousands of redundant scroll events per minute.
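A simple throttle of this kind logs a scroll event only when enough time has passed and the position has moved far enough. The 250 ms interval and 50 px delta below are illustrative assumptions, not ZeroLeak's actual thresholds.

```python
class ScrollThrottle:
    """Suppresses scroll events that are too frequent or too small to matter."""

    def __init__(self, min_interval_s=0.25, min_delta_px=50):
        self.min_interval_s = min_interval_s
        self.min_delta_px = min_delta_px
        self.last_time = None
        self.last_y = None

    def should_log(self, y, now):
        first = self.last_time is None
        if first or (
            now - self.last_time >= self.min_interval_s
            and abs(y - self.last_y) >= self.min_delta_px
        ):
            # Remember the last *logged* position, not the last observed one.
            self.last_time, self.last_y = now, y
            return True
        return False
```

Comparing against the last logged position means a slow, steady scroll still produces an entry once it has moved far enough from the previous one.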
Every navigation — full page loads, single-page-app pushState changes, programmatic location changes — is logged with the source URL, destination URL, trigger (link click, manual, programmatic), and the time it took to complete. SPA navigations that traditional logging misses are captured by ZeroLeak's URL polling layer.
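The idea behind catching SPA navigations by polling can be sketched as follows: compare the currently observed URL with the last known one and emit a navigation event on change. How ZeroLeak's polling layer actually works is not specified here; the class, field names, and default trigger value are assumptions, and completion time is left out for brevity.

```python
class UrlPoller:
    """Emits a navigation event whenever the observed URL changes."""

    def __init__(self, initial_url):
        self.current = initial_url
        self.events = []

    def poll(self, observed_url, now, trigger="programmatic"):
        if observed_url == self.current:
            return None  # nothing changed since the last poll
        event = {
            "type": "navigation",
            "ts": now,
            "from": self.current,
            "to": observed_url,
            "trigger": trigger,
        }
        self.current = observed_url
        self.events.append(event)
        return event
```

Because the check looks only at the URL, it fires for pushState changes that never trigger a full page load.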
When the user submits a form, the event log captures the form's action URL, method, and field names. The actual field values are captured through the keystroke log (so the typing history is preserved) rather than duplicated in the form event.
The session tracks user activity at the input layer. Transitions between active and idle states are recorded with timestamps so reviewers can see periods of attention vs. inactivity — useful for compliance review and time-based audit questions.
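Active/idle tracking can be approximated with a timer over input events. In this sketch the 60-second idle threshold and all names are assumptions; the idle transition is timestamped at the moment the threshold was crossed rather than when it was noticed.

```python
class ActivityTracker:
    """Records (timestamp, state) transitions between 'active' and 'idle'."""

    def __init__(self, idle_after_s=60.0, start=0.0):
        self.idle_after_s = idle_after_s
        self.state = "active"
        self.last_input = start
        self.transitions = [(start, "active")]

    def tick(self, now):
        """Call periodically; flips to idle once the threshold has passed."""
        if self.state == "active" and now - self.last_input >= self.idle_after_s:
            self.state = "idle"
            self.transitions.append((self.last_input + self.idle_after_s, "idle"))

    def on_input(self, now):
        self.tick(now)  # catch an idle period that ended with this input
        if self.state == "idle":
            self.state = "active"
            self.transitions.append((now, "active"))
        self.last_input = now
```

The resulting transition list is enough to answer questions such as how long a record stayed on screen with no user input.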
On session end, summary metadata is recorded: total duration, idle-vs-active breakdown, total events of each type, screenshots produced, video file references, and the termination reason (timeout, manual end, error). If a coordinator webhook is configured, this summary fires to it as well.
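A summary of this kind can be derived from the event list and the active/idle transitions. The sketch below covers a subset of the fields listed above, under assumed names and shapes; video file references and the webhook call are omitted.

```python
from collections import Counter


def summarize_session(start, end, transitions, events, reason):
    """Summarize a session.

    transitions: time-ordered list of (timestamp, state), state in {"active", "idle"}.
    events: list of dicts with a "type" key and an optional "screenshot" reference.
    """
    totals = {"active": 0.0, "idle": 0.0}
    # Pair each transition with the next one (or the session end) to get durations.
    following = transitions[1:] + [(end, None)]
    for (t0, state), (t1, _) in zip(transitions, following):
        totals[state] += t1 - t0
    return {
        "duration_s": end - start,
        "active_s": totals["active"],
        "idle_s": totals["idle"],
        "event_counts": dict(Counter(e["type"] for e in events)),
        "screenshots": sum(1 for e in events if e.get("screenshot")),
        "termination_reason": reason,  # e.g. "timeout", "manual", "error"
    }
```

The returned dictionary is plain data, so it could be serialized to JSON and posted to a coordinator webhook where one is configured.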
Regulators ask what specific users did during specific sessions: HIPAA review of patient-record access, financial trading floor compliance, government data-handling audits. Event-driven capture produces a short, informative trail per session that reviewers can navigate quickly.
When a leak or policy violation is suspected, the investigator needs to know exactly what the user did — not just when. Sequential screenshots with mouse markers, word-buffered keystrokes and full clipboard content make the session replayable in detail without watching hours of video.
External users are sometimes granted view-only access to your environment. Full per-session recording, with the visible watermark identifying the user, gives downstream accountability for everything they saw and every action they took during their access window.
After an unexpected event in a SCADA or operational console, the recording shows exactly what the operator was seeing and which controls they clicked in the moments before. The before-and-after pair on navigation makes diagnosing cascading errors much faster.
We will run a session, click some links, fill some forms, copy some content, and show you the resulting screenshot directory, keystroke log and video segments — and how a reviewer reconstructs the session from them.