Capability

Transparent to the human eye, a nightmare for AI screen readers

Letters in the rendered page are silently swapped with visually-similar siblings from the same alphabet — a turned into e, b turned into p, m turned into w. The area around the user's cursor reveals the originals, so a person reading the page experiences no change. An OCR engine or AI vision model fed a screenshot of that same page reads completely different words.

Pixel-level distortion is being eroded as a defence: modern AI vision models like GPT-4V, Claude Vision and Gemini are getting steadily better at reading text through noise, blur and adversarial patterns. The deeper attack surface is the text itself. ZeroLeak silently substitutes a portion of the letters in the rendered page with visually-similar but semantically different siblings from the same Latin alphabet. Where the user's cursor is, a reveal area flips the substituted text back to original — so the human reads where they look, and the brain's natural pattern recognition fills in the rest. When the same page is captured to a screenshot and fed to an AI model, the model has no reveal area; it reads the substituted characters as the actual content, and what it returns is a different document.

Same-script

Latin letters swapped with Latin neighbours — nothing for OCR normalisers to undo

Cursor-driven

A reveal area around the cursor shows the originals — the user reads where they look

DOM-layer

Runs at the character data, complementary to pixel-level distortion

Pixel-level defences are eroding; AI vision models read through them

Earlier defences against AI screen reading operated at the pixel layer: random noise, frequency-domain perturbation, sub-pixel jitter, micro blur, chromatic shifts. These are still effective against classical OCR engines, and they remain part of the layered defence. But modern Vision-Language Models (GPT-4V, Claude Vision, Gemini and their successors) combine an image encoder with a language model, and increasingly reconstruct words from partial visual evidence. The pixels can be noisy and the model still recovers the underlying text.

The next attack surface lives one layer up: the DOM. Inside the headless browser that renders the protected page, we own the actual character data being painted to the screen. We can decide that the letter at position 47 of the third paragraph is no longer the original; it is a visually-similar but different character from the same alphabet. The OCR or VLM looking at a screenshot of that page sees the substituted character and reports it as the truth. The model doesn't know it was substituted; it doesn't have anything to compare against.

The hard part is doing this without breaking reading for the human user. The reveal mechanism — a small area around the cursor that flips substituted text back to original — is the answer. People read by glancing: the eye fixates on one phrase, the brain's pattern-recognition fills in the rest from peripheral vision plus context. The cursor reveal aligns with where the user is actually looking; the rest of the page can stay substituted because the user isn't reading it character-by-character anyway. An AI model looking at the same screenshot has no such cursor, no such reveal — it reads everything as the substituted text.

Substitute the letters, reveal where the user is looking

A script is injected into every protected page by ZeroLeak's headless browser. At boot, the script walks the DOM, picks visually-similar substitutes for a portion of the letters from the same Latin alphabet, and writes them into the page. The cursor's position drives a reveal zone — wherever the user is looking, originals show through. Everything outside the reveal stays substituted.

Letters silently swapped with visually-similar siblings from the same alphabet

A character at one position in the page becomes another character that looks visually similar but is a different letter: a turned into e, b turned into p, m turned into w, n turned into u. The swap is between letters that share a visual family in the Latin alphabet, not between Unicode lookalikes. The distinction matters because OCR and AI vision pipelines normalise Unicode homoglyphs (Cyrillic а becomes Latin a) back to canonical Latin. Same-script swaps leave nothing for the normaliser to undo, so the model reads the substituted character as the actual letter.

A reveal area around the cursor shows the originals

Wherever the user moves the cursor, an area around it (circle by default, configurable as a horizontal band) flips the substituted characters back to their originals. The user reads where they look; the brain's natural pattern recognition handles the rest from peripheral vision. Outside the reveal, the page stays substituted — which is what an AI model looking at the screenshot sees.

Static cipher — each character gets one substitution that doesn't change

When the page loads, each substituted letter gets one specific replacement that stays stable for the lifetime of the page. There is no flicker, no temporal rotation, no animation in the user's vision — the cipher sits silently behind the cursor reveal. An earlier design rotated the cipher every few frames; user fatigue testing showed it caused measurable reading discomfort, so the final design stays static.

Runs at the DOM layer, complementary to pixel-level defences

Text cipher operates inside the headless browser's DOM — directly on the character data, not on the rendered pixels. Pixel-layer defences against AI vision models keep working underneath; text cipher adds an orthogonal attack surface that pixel-recovery techniques cannot help with. An attacker who breaks the pixel layer still has to read the substituted text correctly. An attacker who somehow recovered the original characters still has to defeat the pixel layer.

What the cipher actually does

Each behaviour below is part of the production design after the empirical-fatigue revision; the live implementation matches this exactly. Configuration is per protected service via the operator console.

Same-script Latin substitution, hand-curated table

Substitutions stay inside the Latin alphabet. The substitution table is curated by visual family: round bowls (a, e, o, c), mirrored bowls (b, p, d, q), narrow verticals (i, l, j, 1), arches (m, n, u, h, w), descenders (g, y, j, q). Each letter in the table has 2 to 4 visual neighbours; substitution picks one of them. Substitutes of matching width are favoured so that a swap does not trigger layout reflow.
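The family lookup described above can be sketched as follows. This is a minimal illustration, not the production table: it uses only three of the families named in the text, the names `VISUAL_FAMILIES`, `neighboursOf` and `pickSubstitute` are hypothetical, and width-matching is not modelled.

```typescript
// Illustrative families taken from the text; the real table is larger.
const VISUAL_FAMILIES: string[][] = [
  ["a", "e", "o", "c"],      // round bowls
  ["b", "p", "d", "q"],      // mirrored bowls
  ["m", "n", "u", "h", "w"], // arches
];

// Neighbours of a letter: every other member of each family it belongs to.
function neighboursOf(letter: string): string[] {
  const result: string[] = [];
  for (const family of VISUAL_FAMILIES) {
    if (family.indexOf(letter) === -1) continue;
    for (const member of family) {
      if (member !== letter && result.indexOf(member) === -1) {
        result.push(member);
      }
    }
  }
  return result;
}

// Pick one neighbour from a caller-supplied index (assumption: the caller
// derives `choice` once per character so the result stays stable).
// Letters outside every family are returned unchanged.
function pickSubstitute(letter: string, choice: number): string {
  const options = neighboursOf(letter);
  if (options.length === 0) return letter;
  const picked = options[Math.abs(Math.floor(choice)) % options.length];
  return picked !== undefined ? picked : letter;
}
```

With these three families, every listed letter has three or four neighbours, which sits inside the 2 to 4 range the text gives for the full table.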

Reveal zone — circle around cursor or horizontal band

The default shape is a circle of configurable radius (200 px default). An alternative band shape covers a horizontal strip at the cursor's height — useful for reading long-line content where eye movement is mostly horizontal. The shape is shared with other ZeroLeak cursor-based effects so operators configure once.
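A hit test for the two shapes might look like the sketch below. The type and function names are hypothetical, and the band's `halfHeightPx` parameter is an assumption; only the circle shape and its 200 px default radius come from the text.

```typescript
type RevealShape =
  | { kind: "circle"; radiusPx: number }
  | { kind: "band"; halfHeightPx: number };

interface ScreenPoint {
  x: number;
  y: number;
}

// Default from the text: a circle with a 200 px radius.
const DEFAULT_REVEAL_SHAPE: RevealShape = { kind: "circle", radiusPx: 200 };

// True when a glyph at `glyph` falls inside the reveal zone centred on `cursor`.
function isRevealed(
  shape: RevealShape,
  cursor: ScreenPoint,
  glyph: ScreenPoint
): boolean {
  if (shape.kind === "circle") {
    const dx = glyph.x - cursor.x;
    const dy = glyph.y - cursor.y;
    // Compare squared distances to avoid a square root per glyph.
    return dx * dx + dy * dy <= shape.radiusPx * shape.radiusPx;
  }
  // Band: horizontal position is ignored, only vertical distance counts.
  return Math.abs(glyph.y - cursor.y) <= shape.halfHeightPx;
}
```

Because the band ignores horizontal distance, a whole line of text is revealed at once, which matches the long-line reading case described above.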

Static cipher persisted in a per-node map

When the script first sees a text node, it stores the original value in a per-node map and writes the substituted value into the DOM. Subsequent updates use the same substitution — there is no per-frame rotation. The user perceives a still page; the cipher is invisible behind the cursor reveal.
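The per-node bookkeeping can be sketched like this. It is an illustration under assumptions: `TextLike` stands in for a DOM text node (anything with a mutable `data` string), `StaticCipherStore` is a hypothetical name, and the substitution function is supplied by the caller. The sketch does not handle the protected application changing a node's text after first sight.

```typescript
// Minimal stand-in for a DOM Text node.
interface TextLike {
  data: string;
}

class StaticCipherStore<N extends TextLike> {
  private originals = new Map<N, string>();
  private ciphered = new Map<N, string>();
  private substitute: (original: string) => string;

  constructor(substitute: (original: string) => string) {
    this.substitute = substitute;
  }

  // First sight: remember the original and compute the cipher text once.
  // Later calls reuse the stored cipher text, so nothing rotates.
  apply(node: N): void {
    if (!this.originals.has(node)) {
      this.originals.set(node, node.data);
      this.ciphered.set(node, this.substitute(node.data));
    }
    const stored = this.ciphered.get(node);
    if (stored !== undefined) node.data = stored;
  }

  // Restore the original text, for example while the node is in the reveal zone.
  reveal(node: N): void {
    const original = this.originals.get(node);
    if (original !== undefined) node.data = original;
  }
}
```

In a long-lived page a `WeakMap` would probably be the better choice, so that entries for removed nodes can be garbage-collected.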

Injected via headless-browser script hook

The substitution script is injected by the ZeroLeak engine into every navigated document inside the headless Chromium. The protected web application is not modified; the cipher runs as a page-side helper between the rendered document and the user's viewing layer. No coordination with the protected application's code is required.

User-typed form input is excluded

Text the user types into inputs, textareas, or contentEditable regions is excluded from the substitution. The protected application sees clean input as the user wrote it. Search boxes, message composition, form submission — all unaffected.
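One way to express the exclusion rule is a predicate that walks up from the element holding a text node. This is a sketch under assumptions: `ElementLike` is a structural stand-in for the few element fields needed (in a browser the parent link would be `parentElement`), and `isUserInput` is a hypothetical name.

```typescript
// Structural stand-in for a DOM element; only the fields the check reads.
interface ElementLike {
  tagName: string;
  isContentEditable?: boolean;
  parent?: ElementLike;
}

const EXCLUDED_TAGS: string[] = ["INPUT", "TEXTAREA"];

// True when the element, or any ancestor, is a user-input surface
// that the cipher should leave untouched.
function isUserInput(start: ElementLike | undefined): boolean {
  let el: ElementLike | undefined = start;
  while (el !== undefined) {
    if (EXCLUDED_TAGS.indexOf(el.tagName.toUpperCase()) !== -1) return true;
    if (el.isContentEditable === true) return true;
    el = el.parent;
  }
  return false;
}
```

Checking ancestors matters for contentEditable regions, where the typed text usually sits in nested elements rather than directly under the editable root.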

Only visible text in the viewport is touched

An IntersectionObserver tracks which text-bearing elements are actually visible (the API observes elements rather than individual text nodes). Off-screen text is not substituted, since the user cannot see it anyway. When the user scrolls a hidden section into view, substitution is applied just in time. This keeps the running cost proportional to what is on screen, not to total page size.
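The just-in-time behaviour can be sketched without a browser by modelling only the two fields of an observer entry that matter here. `VisibilityEntry` mirrors the `target` and `isIntersecting` fields of the browser's `IntersectionObserverEntry`; `JustInTimeCipher` and its callback are hypothetical names for illustration.

```typescript
// Same field names as IntersectionObserverEntry, reduced to what is used here.
interface VisibilityEntry<T> {
  target: T;
  isIntersecting: boolean;
}

class JustInTimeCipher<T> {
  private alreadyCiphered = new Set<T>();
  private cipherElement: (target: T) => void;

  constructor(cipherElement: (target: T) => void) {
    this.cipherElement = cipherElement;
  }

  // Cipher each target the first time it is reported visible; skip the rest.
  handleEntries(entries: VisibilityEntry<T>[]): void {
    for (const entry of entries) {
      if (!entry.isIntersecting) continue;
      if (this.alreadyCiphered.has(entry.target)) continue;
      this.cipherElement(entry.target);
      this.alreadyCiphered.add(entry.target);
    }
  }
}
```

In a browser, `handleEntries` could serve as the callback passed to `new IntersectionObserver(...)`, with each text-bearing element registered through `observer.observe(element)`.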

The substitution table and what stays untouched

Same-script visual substitution is the heart of the technique. The families below are a sample of the actual production mapping; the full table covers all lowercase and uppercase letters plus selected digits.

Round bowls — a, e, o, c

These four characters all share the round-bowl shape; substituting one for another preserves the silhouette at reading distance. A word like 'data' might become 'doto' in the substituted form: a human glancing at it through the cursor reveal reads 'data' instantly, while an OCR engine or AI model reading the substituted form returns 'doto'.

Mirrored bowls — b, p, d, q

These four are visual mirrors of each other; substituting one for another preserves the vertical-stem + bowl pattern. The word 'database' might become 'patabose' in cipher form — visually close enough that the brain's pattern recognition recovers the original, semantically unrelated enough that an AI model reading it returns the wrong word.

Arches — m, n, u, h, w

These five share the arch / inverted-arch / repeated-arch pattern; substitution within this family preserves the overall rhythm of the text. 'human' might become 'wuwon' — readable at a glance under the cursor, unrecognisable to an AI reading the substituted text.
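The three worked examples above can be checked mechanically: every changed position should swap a letter for another member of the same family. The sketch below does that check; the function names are hypothetical and only the three families sampled above are included.

```typescript
// The three sampled families, one string per family.
const EXAMPLE_FAMILIES: string[] = ["aeoc", "bpdq", "mnuhw"];

function inSameFamily(x: string, y: string): boolean {
  for (const family of EXAMPLE_FAMILIES) {
    if (family.indexOf(x) !== -1 && family.indexOf(y) !== -1) return true;
  }
  return false;
}

// Returns the positions that changed, or null when the lengths differ
// or any change crosses a family boundary.
function swappedPositions(original: string, ciphered: string): number[] | null {
  if (original.length !== ciphered.length) return null;
  const changed: number[] = [];
  for (let i = 0; i < original.length; i++) {
    const before = original.charAt(i);
    const after = ciphered.charAt(i);
    if (before === after) continue;
    if (!inSameFamily(before, after)) return null;
    changed.push(i);
  }
  return changed;
}

// swappedPositions("data", "doto")         -> [1, 3]    (a to o, twice)
// swappedPositions("database", "patabose") -> [0, 5]    (d to p, a to o)
// swappedPositions("human", "wuwon")       -> [0, 2, 3] (h to w, m to w, a to o)
```

Only some positions change in each word, which is consistent with the cipher substituting a portion of the letters rather than all of them.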

Why not Unicode homoglyphs (Cyrillic, Greek)

Earlier proposals used Unicode confusables (Cyrillic а for Latin a, Greek ο for Latin o). These were rejected because OCR pipelines and AI vision models normalise these back to canonical Latin — Tesseract with a Russian language pack on Latin-Cyrillic mixed text returns clean Latin output, because the language-model second pass projects Cyrillic homoglyphs onto their Latin equivalents. Same-script substitution leaves nothing to normalise away.
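The difference can be illustrated with a toy normaliser. This is a sketch of the idea only, not a model of how any particular OCR engine or vision model works: `CONFUSABLE_TO_LATIN` is a two-entry hypothetical map, and `foldConfusables` is a hypothetical name.

```typescript
// Tiny illustrative map: Cyrillic small a (U+0430) and Greek small omicron (U+03BF).
const CONFUSABLE_TO_LATIN: { [ch: string]: string } = {
  "\u0430": "a",
  "\u03BF": "o",
};

// Replace each known confusable with its Latin counterpart; leave the rest alone.
function foldConfusables(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const mapped = CONFUSABLE_TO_LATIN[ch];
    out += mapped !== undefined ? mapped : ch;
  }
  return out;
}

// Homoglyph attack: folding recovers the original word.
//   foldConfusables("d\u0430t\u0430") -> "data"
// Same-script cipher: every character is already canonical Latin,
// so folding changes nothing.
//   foldConfusables("doto") -> "doto"
```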

What stays untouched — canvas, SVG text, form input

Text rendered inside HTML5 canvas or SVG is not part of the DOM text-node set; the cipher does not touch it. Likewise, text the user types into inputs and textareas stays clean. These coverage gaps are deliberate: canvas and SVG content are handled by the pixel-layer defences in parallel, and form input must stay clean for the protected application to function.

Where text cipher closes the path

AI vision models capturing the screen

Anyone with a personal AI assistant on their phone or alongside their workstation can paste a screenshot into GPT-4V or Claude Vision and ask it to summarise. With text cipher in place, the AI summary is built on the substituted text: it returns plausible-looking output that is, on inspection, different from what was on the original screen.

Financial statements and deal-room documents

Documents read on screen but not meant to be exfiltrated through an AI tool. A screenshot taken and fed into an AI for analysis returns garbled numbers and altered names — the AI confidently reports content that does not match the actual document.

Patient records analysed by AI assistants

Medical staff with view-only access to patient records cannot meaningfully use an external AI to summarise or query the data — the AI sees substituted text. Clinical insight stays inside the protected environment; the AI ingestion path returns a different document.

Government and intelligence consoles

Classified content viewed by analysts. Any AI tool consulted from outside the protected environment reads substituted text, not the original classified material. The disclosure boundary holds at the AI-ingestion path the same way it holds at the screenshot path.

Common questions

Does the substitution affect how easily a human can read the page?

Reading happens at the cursor. The reveal area shows originals where the user is looking; the brain's natural pattern recognition handles the rest from peripheral vision, where readability is lower anyway. Empirical reading-fatigue testing led to the current static (non-rotating) cipher: the earlier rotating design caused measurable discomfort, and the static version does not.

What about touch devices that do not have a cursor?

Touch-only access is uncommon in the typical ZeroLeak deployment (enterprise operators, analysts, contractors on desktop workstations). In touch contexts the reveal mechanism falls back to a tap-and-hold model in which the touched area is the reveal zone. For workflows that are predominantly touch and reading-heavy, text cipher may be disabled per protected service in favour of the pixel-layer defences alone.

Why not use Cyrillic or Greek lookalikes?

OCR engines and AI vision models normalise Unicode homoglyphs back to canonical Latin during their language-model stage. A Cyrillic 'а' inserted into Latin text gets folded back to a regular Latin 'a' by the time the output is produced — the substitution leaves no trace. Same-script Latin substitution (a turned into e) has nothing to normalise away; the substituted character is the canonical letter the model reads.

What does the cipher NOT cover?

Text rendered inside HTML5 canvas or SVG is not part of the DOM text-node set; the cipher does not touch it. Pixel-layer defences cover those surfaces. Text inside form inputs and textareas that the user typed is also excluded so the protected application receives clean input. Images of text (photographs, screenshots embedded in the page) are also outside the DOM-text scope.

How does this combine with anti-OCR pixel defences?

The two are complementary, not overlapping. Pixel-layer defences (anti-OCR protection) disturb how the rendered image is read by character recognition. Text cipher works one layer up — even if an AI vision model bypasses every pixel-level defence and reads the page cleanly, it reads the substituted text. To defeat both, an attacker would need to break the pixel layer to see clean characters AND recover the original underlying letters from the substituted ones — two orthogonal problems.

What is the performance impact on the user session?

The substitution script runs inside the headless browser's renderer, only on text nodes that are actually visible in the viewport. Cursor-driven reveal updates are throttled to the browser's animation frame and use a movement threshold, so small mouse movements do not trigger redraws. On commodity hardware the cipher adds a small fraction of a millisecond per frame; user-visible interaction stays smooth.
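The throttling described above can be sketched as a small state machine: mouse moves only record the latest position, and a per-frame tick decides whether a redraw is worth doing. The class name `RevealThrottle` and the threshold value used in the example are assumptions; in a browser the tick would typically be driven by `requestAnimationFrame`.

```typescript
interface CursorPos {
  x: number;
  y: number;
}

class RevealThrottle {
  private lastDrawn: CursorPos | null = null;
  private pending: CursorPos | null = null;
  private thresholdPx: number;

  constructor(thresholdPx: number) {
    this.thresholdPx = thresholdPx;
  }

  // Called on every mouse move: cheap, just remembers the newest position.
  onMove(pos: CursorPos): void {
    this.pending = pos;
  }

  // Called once per animation frame. Returns the position to redraw the
  // reveal zone at, or null when nothing moved far enough since the last draw.
  onFrame(): CursorPos | null {
    const next = this.pending;
    this.pending = null;
    if (next === null) return null;
    if (this.lastDrawn !== null) {
      const dx = next.x - this.lastDrawn.x;
      const dy = next.y - this.lastDrawn.y;
      const limit = this.thresholdPx * this.thresholdPx;
      if (dx * dx + dy * dy < limit) return null;
    }
    this.lastDrawn = next;
    return next;
  }
}
```

Distance is measured from the last drawn position rather than the last seen one, so a slow drift of many tiny movements still triggers a redraw once it adds up to the threshold.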

See the cipher in a live demo

We will load a page, move the cursor over the text so you see normal reading, take a screenshot, and feed the screenshot to Tesseract, GPT-4V and Claude Vision — and show you the very different text each one returns.