Earlier defences against AI screen reading operated at the pixel layer: random noise, frequency-domain perturbation, sub-pixel jitter, micro blur, chromatic shifts. These are still effective against classical OCR engines, and they remain part of the layered defence. But modern Vision-Language Models (GPT-4V, Claude Vision, Gemini and their successors) pair patch-based image encoders with language models that increasingly reconstruct words from partial visual evidence. The pixels can be noisy and the model still recovers the underlying text.
The next attack surface lives one layer up: the DOM. Inside the headless browser that renders the protected page, we own the actual character data being painted to the screen. We can decide that the letter at position 47 of the third paragraph is no longer the original; it is a visually-similar but different character from the same alphabet. The OCR or VLM looking at a screenshot of that page sees the substituted character and reports it as the truth. The model doesn't know it was substituted; it doesn't have anything to compare against.
The hard part is doing this without breaking reading for the human user. The reveal mechanism — a small area around the cursor that flips substituted text back to original — is the answer. People read by glancing: the eye fixates on one phrase, the brain's pattern-recognition fills in the rest from peripheral vision plus context. The cursor reveal aligns with where the user is actually looking; the rest of the page can stay substituted because the user isn't reading it character-by-character anyway. An AI model looking at the same screenshot has no such cursor, no such reveal — it reads everything as the substituted text.
A script is injected into every protected page by ZeroLeak's headless browser. At boot, the script walks the DOM, replaces a portion of the letters with visually similar substitutes drawn from the same Latin alphabet, and writes the result into the page. The cursor's position drives a reveal zone: wherever the user is looking, originals show through. Everything outside the reveal stays substituted.
A character at one position in the page becomes another character that looks visually similar but is a different letter — a turned into e, b turned into p, m turned into w, n turned into u. The swap is between letters that share a visual family in the Latin alphabet, not between Unicode lookalikes. Why this distinction matters: OCR and AI vision pipelines normalise Unicode homoglyphs (Cyrillic a becomes Latin a) back to canonical Latin. Same-script swaps leave nothing for the normaliser to undo — the model reads the substituted character as the actual letter.
Wherever the user moves the cursor, an area around it (circle by default, configurable as a horizontal band) flips the substituted characters back to their originals. The user reads where they look; the brain's natural pattern recognition handles the rest from peripheral vision. Outside the reveal, the page stays substituted — which is what an AI model looking at the screenshot sees.
When the page loads, each substituted letter gets one specific replacement that stays stable for the lifetime of the page. There is no flicker, no temporal rotation, no animation in the user's vision — the cipher sits silently behind the cursor reveal. An earlier design rotated the cipher every few frames; user fatigue testing showed it caused measurable reading discomfort, so the final design stays static.
Text cipher operates inside the headless browser's DOM, directly on the character data rather than on the rendered pixels. Pixel-layer defences against AI vision models keep working underneath; text cipher adds an orthogonal defence layer that pixel-recovery techniques cannot help with. An attacker who breaks the pixel layer recovers only the substituted text and still has to undo the substitution. An attacker who has a way to undo the substitution still has to get clean characters through the pixel layer first.
Each behaviour below is part of the production design after the empirical-fatigue revision; the live implementation matches this exactly. Configuration is per protected service via the operator console.
Substitutions stay inside the Latin alphabet. The substitution table is curated by visual family: round bowls (a, e, o, c), mirrored bowls (b, p, d, q), narrow verticals (i, l, j, 1), arches (m, n, u, h, w), descenders (g, y, j, q). Each letter in the table has 2-4 visual neighbours; substitution picks one of them. Width-matched substitutes are favoured so that a swap does not trigger layout reflow.
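As a rough illustration of how a family-based table could be derived and queried, the sketch below builds a neighbour list from the families named above and prefers width-matched substitutes. The function names, the width classes and the cap of four neighbours are assumptions made for this example; the curated production table is not reproduced here.

```typescript
// Families quoted from the text above; the derived table is an illustrative
// assumption, not the curated production mapping.
const VISUAL_FAMILIES: string[][] = [
  ["a", "e", "o", "c"],      // round bowls
  ["b", "p", "d", "q"],      // mirrored bowls
  ["i", "l", "j", "1"],      // narrow verticals
  ["m", "n", "u", "h", "w"], // arches
  ["g", "y", "j", "q"],      // descenders
];

// Map each letter to the other members of its families, capped at
// maxNeighbours to mirror the "2-4 visual neighbours" rule.
function buildNeighbourTable(families: string[][], maxNeighbours: number): Record<string, string[]> {
  const table: Record<string, string[]> = {};
  families.forEach((family) => {
    family.forEach((letter) => {
      const list = table[letter] || (table[letter] = []);
      family.forEach((other) => {
        if (other !== letter && list.indexOf(other) === -1 && list.length < maxNeighbours) {
          list.push(other);
        }
      });
    });
  });
  return table;
}

// Hypothetical width classes; real glyph widths depend on the font in use.
function widthClass(ch: string): number {
  if ("ilj1".indexOf(ch) !== -1) return 0; // narrow
  if ("mw".indexOf(ch) !== -1) return 2;   // wide
  return 1;                                // regular
}

// Pick a neighbour, preferring same-width candidates; `rand` returns [0, 1).
function pickSubstitute(letter: string, table: Record<string, string[]>, rand: () => number): string {
  const all = table[letter];
  if (!all || all.length === 0) return letter; // not in the table: leave untouched
  const sameWidth = all.filter((c) => widthClass(c) === widthClass(letter));
  const pool = sameWidth.length > 0 ? sameWidth : all;
  const choice = pool[Math.floor(rand() * pool.length)];
  return choice === undefined ? letter : choice;
}
```

With `Math.random` passed as `rand`, `pickSubstitute("m", table, Math.random)` would return `w` under these assumed width classes, since `w` is the only arch neighbour of `m` in the same width class.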
The default shape is a circle of configurable radius (200 px default). An alternative band shape covers a horizontal strip at the cursor's height — useful for reading long-line content where eye movement is mostly horizontal. The shape is shared with other ZeroLeak cursor-based effects so operators configure once.
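The two reveal shapes reduce to a simple hit test per glyph position. The sketch below is one way to express it; the type names and the `halfHeightPx` parameter for the band are assumptions for illustration, and only the 200 px circle default comes from the text.

```typescript
// Reveal shapes: a circle around the cursor, or a horizontal band at its height.
type RevealShape =
  | { kind: "circle"; radiusPx: number }
  | { kind: "band"; halfHeightPx: number }; // band height is a hypothetical parameter

interface ScreenPoint {
  x: number;
  y: number;
}

// Default quoted in the text: a circle with a 200 px radius.
const DEFAULT_REVEAL: RevealShape = { kind: "circle", radiusPx: 200 };

// True when the glyph position falls inside the reveal zone around the cursor.
function isRevealed(glyph: ScreenPoint, cursor: ScreenPoint, shape: RevealShape): boolean {
  if (shape.kind === "circle") {
    const dx = glyph.x - cursor.x;
    const dy = glyph.y - cursor.y;
    // Compare squared distances to avoid a square root per glyph.
    return dx * dx + dy * dy <= shape.radiusPx * shape.radiusPx;
  }
  // Band: only the vertical distance matters, so the whole line is revealed.
  return Math.abs(glyph.y - cursor.y) <= shape.halfHeightPx;
}
```

The band variant ignores the horizontal coordinate entirely, which is what makes it suit long-line reading.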
When the script first sees a text node, it stores the original value in a per-node map and writes the substituted value into the DOM. Subsequent updates use the same substitution — there is no per-frame rotation. The user perceives a still page; the cipher is invisible behind the cursor reveal.
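The store-once, reuse-forever behaviour can be sketched as a small record keeper. This is a minimal illustration under stated assumptions: `TextNodeLike`, `StaticCipher` and its methods are hypothetical names, and a plain array stands in for whatever per-node map the real script uses.

```typescript
// Minimal shape of a text node; a browser Text node also exposes `data`.
interface TextNodeLike {
  data: string;
}

interface CipherRecord {
  node: TextNodeLike;
  original: string;
  substituted: string;
}

class StaticCipher {
  private records: CipherRecord[] = [];
  private substitute: (text: string) => string;

  constructor(substitute: (text: string) => string) {
    this.substitute = substitute;
  }

  private find(node: TextNodeLike): CipherRecord | undefined {
    for (let i = 0; i < this.records.length; i++) {
      const rec = this.records[i];
      if (rec !== undefined && rec.node === node) return rec;
    }
    return undefined;
  }

  // First sighting: remember the original and compute the substitution once.
  // Later calls reuse the stored value, so nothing rotates between frames.
  cipher(node: TextNodeLike): void {
    let rec = this.find(node);
    if (rec === undefined) {
      rec = { node: node, original: node.data, substituted: this.substitute(node.data) };
      this.records.push(rec);
    }
    node.data = rec.substituted;
  }

  // Inside the reveal zone: put the stored original back.
  reveal(node: TextNodeLike): void {
    const rec = this.find(node);
    if (rec !== undefined) node.data = rec.original;
  }
}
```

Because the substitution function runs only on first sighting, moving the cursor away and back always restores the same substituted string, which is what keeps the page visually still.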
The substitution script is injected by the ZeroLeak engine into every navigated document inside the headless Chromium. The protected web application is not modified; the cipher runs as a page-side helper between the rendered document and the user's viewing layer. No coordination with the protected application's code is required.
Text the user types into inputs, textareas, or contentEditable regions is excluded from the substitution. The protected application sees clean input as the user wrote it. Search boxes, message composition, form submission — all unaffected.
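One way to express the exclusion rule is an ancestor walk that stops substitution for any text living under an input, a textarea or a contentEditable region. The sketch below assumes a minimal `ElementLike` shape (chosen so that a browser element should satisfy it) and hypothetical function names.

```typescript
// Minimal element shape for the check; hypothetical, but it uses the same
// property names a browser element exposes.
interface ElementLike {
  tagName: string;
  isContentEditable?: boolean;
  parentElement: ElementLike | null;
}

// True when the element, or any ancestor, is a place where the user types.
function isUserEditable(el: ElementLike | null): boolean {
  for (let cur = el; cur !== null; cur = cur.parentElement) {
    const tag = cur.tagName.toUpperCase();
    if (tag === "INPUT" || tag === "TEXTAREA") return true;
    if (cur.isContentEditable === true) return true;
  }
  return false;
}

// The cipher only touches text whose parent element is not user-editable.
function shouldCipher(parentOfTextNode: ElementLike | null): boolean {
  return !isUserEditable(parentOfTextNode);
}
```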
An IntersectionObserver tracks which text-bearing elements are actually visible. Off-screen text is not substituted (the user cannot see it anyway). When the user scrolls a hidden section into view, substitution is applied just in time. This keeps the running cost proportional to what is on screen, not to total page size.
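The just-in-time behaviour can be sketched as a visibility callback that applies the cipher the first time a target becomes visible. The IntersectionObserver API observes elements rather than text nodes, so this sketch assumes the element containing each text node is what gets observed. `VisibilityEntry` and `makeVisibilityHandler` are hypothetical names; the entry shape mirrors the `target` and `isIntersecting` fields of a real observer entry.

```typescript
// The two fields of an observer entry that this sketch relies on.
interface VisibilityEntry<T> {
  target: T;
  isIntersecting: boolean;
}

// Returns a callback that ciphers each target once, on first visibility.
function makeVisibilityHandler<T>(
  applyCipher: (target: T) => void
): (entries: VisibilityEntry<T>[]) => void {
  const alreadyCiphered: T[] = [];
  return (entries) => {
    entries.forEach((entry) => {
      // Off-screen targets are skipped, so cost tracks what is on screen.
      if (entry.isIntersecting && alreadyCiphered.indexOf(entry.target) === -1) {
        alreadyCiphered.push(entry.target);
        applyCipher(entry.target);
      }
    });
  };
}

// In a browser, the handler would be wired up roughly as:
//   const observer = new IntersectionObserver(makeVisibilityHandler(cipherElement));
//   observer.observe(someElement);
// where cipherElement is whatever routine substitutes the element's text.
```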
Same-script visual substitution is the heart of the technique. The table below is a sample of the actual production mapping; the full table covers the lowercase and uppercase alphabets and selected digits.
The round bowls a, e, o and c all share the closed-bowl shape; substituting one for another preserves the silhouette at reading distance. A word like 'data' might become 'doto' in the substituted form: a human glancing at it through the cursor reveal reads 'data' instantly, while an OCR or AI model reading the substituted form returns 'doto'.
The mirrored bowls b, p, d and q are visual mirrors of each other; substituting one for another preserves the vertical-stem + bowl pattern. The word 'database' might become 'patabose' in cipher form (d swapped for p, plus one a swapped for its round-bowl neighbour o). That is visually close enough for the brain's pattern recognition to recover the original, and semantically unrelated enough that an AI model reading it returns the wrong word.
The arches m, n, u, h and w share the arch / inverted-arch / repeated-arch pattern; substitution within this family preserves the overall rhythm of the text. 'human' might become 'wuwon' (h and m swapped for w, plus a swapped for its round-bowl neighbour o): readable at a glance under the cursor, unrecognisable to an AI reading the substituted text.
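The three word examples can be checked mechanically: every changed position should swap a letter for a member of one of its own visual families. The sketch below does that with the families named in this document; the function and type names are assumptions for illustration.

```typescript
// Families as listed in this document, one string per family.
const LETTER_FAMILIES: string[] = ["aeoc", "bpdq", "ilj1", "mnuhw", "gyjq"];

interface LetterSwap {
  index: number;
  from: string;
  to: string;
}

// True when both letters appear together in at least one family.
function sameFamily(x: string, y: string): boolean {
  return LETTER_FAMILIES.some((fam) => fam.indexOf(x) !== -1 && fam.indexOf(y) !== -1);
}

// List every position where the ciphered word differs from the original.
function listSwaps(original: string, ciphered: string): LetterSwap[] {
  const swaps: LetterSwap[] = [];
  for (let i = 0; i < original.length; i++) {
    const from = original.charAt(i);
    const to = ciphered.charAt(i);
    if (from !== to) swaps.push({ index: i, from: from, to: to });
  }
  return swaps;
}

// A valid same-script cipher keeps the length and swaps only within families.
function isValidCipher(original: string, ciphered: string): boolean {
  if (original.length !== ciphered.length) return false;
  return listSwaps(original, ciphered).every((s) => sameFamily(s.from, s.to));
}
```

For instance, `listSwaps("human", "wuwon")` reports three swaps (h to w, m to w, a to o), each of which stays inside a single family.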
Earlier proposals used Unicode confusables (Cyrillic а for Latin a, Greek ο for Latin o). These were rejected because OCR pipelines and AI vision models normalise these back to canonical Latin — Tesseract with a Russian language pack on Latin-Cyrillic mixed text returns clean Latin output, because the language-model second pass projects Cyrillic homoglyphs onto their Latin equivalents. Same-script substitution leaves nothing to normalise away.
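The difference can be shown with a toy confusables fold. The sketch below is a stand-in for the kind of normalisation pass described above, not the behaviour of any specific OCR engine; the two-entry table and the function name are assumptions for illustration.

```typescript
// Hypothetical two-entry fold table covering the examples in the text;
// a real pipeline would use a far larger confusables list.
const CONFUSABLE_FOLD: Record<string, string> = {
  "\u0430": "a", // CYRILLIC SMALL LETTER A -> Latin a
  "\u03BF": "o", // GREEK SMALL LETTER OMICRON -> Latin o
};

// Replace each known cross-script lookalike with its Latin counterpart.
function foldConfusables(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const folded: string | undefined = CONFUSABLE_FOLD[ch];
    out += folded === undefined ? ch : folded;
  }
  return out;
}
```

Folding `"d\u0430t\u0430"` (with Cyrillic а) gives back plain `data`, so the homoglyph defence is undone. Folding the same-script cipher `doto` leaves it unchanged, because every character is already canonical Latin and there is nothing for the fold to map.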
Text drawn inside an HTML5 canvas or an SVG graphic falls outside the set of text nodes the cipher walks, so the cipher does not touch it. Likewise, text the user types into inputs and textareas stays clean. These coverage gaps are deliberate: canvas and SVG content are handled by the pixel-layer defences in parallel, and form input must stay clean for the protected application to function.
Anyone with a personal AI assistant on their phone or alongside their workstation can paste a screenshot into GPT-4V or Claude Vision and ask it to summarise. With text cipher in place, the AI summary is built on the substituted text: it returns plausible-looking output that is, on inspection, different from what was on the original screen.
Documents read on screen but not meant to be exfiltrated through an AI tool. A screenshot taken and fed into an AI for analysis returns garbled numbers and altered names — the AI confidently reports content that does not match the actual document.
Medical staff with view-only access to patient records cannot meaningfully use an external AI to summarise or query the data — the AI sees substituted text. Clinical insight stays inside the protected environment; the AI ingestion path returns a different document.
Classified content viewed by analysts. Any AI tool consulted from outside the protected environment reads substituted text, not the original classified material. The disclosure boundary holds at the AI-ingestion path the same way it holds at the screenshot path.
We will load a page, move the cursor over the text so you see normal reading, take a screenshot, and feed the screenshot to Tesseract, GPT-4V and Claude Vision — and show you the very different text each one returns.