Every well-run organisation already controls the easy data paths. Copy/paste is monitored, file downloads are logged, emails are scanned, printing is restricted. These controls handle the bulk of accidental exposure.
The one path they don't handle is the screenshot. A user with legitimate view-only access takes a picture of the screen — with the phone in their hand, with the operating system's screenshot key, with a screen-recording tool on a separate device. The image leaves your environment as pixels. Pixels are not a copy-paste event, not a file download, not an email — none of the existing controls see them as a data exit.
What a user can do with that picture has changed dramatically in the last few years. Classical OCR engines (Tesseract, AWS Textract and similar) extract text from a rendered image at production accuracy. Modern AI vision models — GPT-4V, Claude Vision, Gemini — go further: they read the picture as a whole, infer context, reconstruct tables and forms, sometimes recovering more from a screenshot than a trained human would.
The defences commonly proposed for this — visible watermarks, copy-disabled DOM, screen-recording detection, MDM clipboard control — all rest on assumptions that no longer hold. A visible watermark does not stop OCR from reading the text around it. A copy-disabled DOM does not stop a phone camera. Screen-recording detection does not see a second device.
The only surface that genuinely closes this path is the pixel layer itself — you modify what the user's screen actually shows so the image, once extracted, no longer contains recoverable text.
ZeroLeak runs the protected web application inside a server-side browser and applies pixel-level modifications to the rendered frames before they reach the user. The user's browser receives a pixel stream — not HTML, not DOM, not JavaScript — so the only way to extract content is to screenshot what is on the screen. That image, by design, is unreadable to OCR and AI vision models.
The protected web application runs on ZeroLeak's servers, inside a headless browser. The user's browser never receives the page's HTML, JavaScript or DOM — only a pixel stream of the rendered page, like watching a video. Clicks and keyboard input flow back to the server-side browser, and the resulting render flows back to the user.
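The round trip described here can be pictured with a small sketch. Everything in it is hypothetical: `InputEvent`, `Frame` and `ServerSideSession` are illustrative names rather than ZeroLeak's API, and the "browser" is a stand-in that only counts events. What it shows is the shape of the exchange: input goes up, and only pixels come back.

```python
from dataclasses import dataclass, field


@dataclass
class InputEvent:
    kind: str                      # e.g. "click" or "key" (hypothetical)
    payload: dict = field(default_factory=dict)


@dataclass
class Frame:
    width: int
    height: int
    pixels: bytes                  # raw 8-bit grayscale, row-major


class ServerSideSession:
    """Hypothetical stand-in for a headless browser session on the server."""

    def __init__(self, width: int, height: int):
        self.width = width
        self.height = height
        self._events = []          # a real session would drive a browser here

    def handle(self, event: InputEvent) -> Frame:
        # Input flows up to the server-side browser ...
        self._events.append(event)
        # ... and only a rendered frame flows back down.
        return self._render()

    def _render(self) -> Frame:
        # Placeholder render: a flat shade that changes with each event.
        shade = (len(self._events) * 40) % 256
        return Frame(self.width, self.height,
                     bytes([shade]) * (self.width * self.height))
```

The `Frame` type carries no markup, script or DOM fields, which is the property the paragraph above relies on.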
Every page rendered on the server passes through eight independent pixel modifications before it leaves for the user. Noise too faint for the human eye to detect, thin lines placed at character boundaries, colour-channel separation and related techniques are applied together. Each one targets a different stage of the OCR pipeline; layered, whatever gets past one technique is caught by the next.
Models like GPT-4V, Claude Vision and Gemini look at an image holistically: not letter by letter, but reading layout, headings and tables at once. Letter-level pixel changes alone are not enough against them. ZeroLeak divides the rendered image into small tiles, pixelates each tile with an independent pattern, and rotates those patterns faster than the human eye can detect. The human visual system fuses the rotation into a stable, readable image; an AI vision model cannot extract stable text or layout from any single frame.
Anti-OCR runs alongside ZeroLeak's other display-layer protections inside the same configuration. A page can carry a hidden trace ID embedded into the pixels that survives cropping and re-scaling (forensic watermark), and can substitute DOM characters with visually equivalent ones so copy-paste returns nonsense (text cipher). Each layer is enabled or disabled independently per protected service.
Each technique below is applied to the rendered page on the server before the pixel stream reaches the user's browser. Each one targets a different weakness in how OCR engines and AI vision models work. Layered together, the surface an attacker has to defeat in the same image is much larger than any single technique alone.
Every pixel receives a very small, randomised change in intensity. A human reading the page experiences it as faintly textured at most — nothing that affects readability. The clean, consistent pixel boundaries that OCR engines depend on for character recognition, however, are no longer there.
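As a minimal sketch of the idea, assuming an 8-bit grayscale frame stored as a list of rows, per-pixel noise can be written as below. The function name, the `amplitude` default and the uniform distribution are assumptions for illustration; the source does not say how the change is actually drawn.

```python
import random


def add_intensity_noise(image, amplitude=2, seed=None):
    """Return a copy of `image` with each pixel nudged by at most `amplitude`.

    `image` is a list of rows of 8-bit grayscale values (0-255).
    The uniform distribution and the default amplitude are illustrative.
    """
    rng = random.Random(seed)
    return [
        # Clamp so the result stays a valid 8-bit intensity.
        [min(255, max(0, px + rng.randint(-amplitude, amplitude))) for px in row]
        for row in image
    ]
```

Keeping `amplitude` small relative to the 0-255 range is what keeps the change below the level a reader would notice; the same bound is what a test can assert on.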
Classical and modern OCR engines find characters by detecting edges and stroke widths in a specific spatial frequency band. A targeted perturbation is applied in that band. The overall image structure stays visually clean for a human reader; the OCR's edge detector cannot find reliable character boundaries.
Text rendering is varied slightly across the colour channels so the same character is rendered differently in red, green and blue. The human eye fuses the channels into one clear character. Most OCR pipelines first reduce the image to grayscale, and in doing so lose the cross-channel stroke information they would need.
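The effect of grayscale conversion can be illustrated with a deliberately exaggerated sketch. It assumes a binary glyph mask and shifts each channel by a whole pixel, which a real system would presumably replace with something far subtler; `split_channels` and its `offsets` are hypothetical. The grayscale step uses the standard Rec. 601 luma weights.

```python
def split_channels(mask, offsets=(-1, 0, 1)):
    """Render a binary glyph mask (1 = ink) as RGB on white,
    drawing the ink at a different x offset in each channel."""
    height, width = len(mask), len(mask[0])
    image = [[[255, 255, 255] for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for channel, dx in enumerate(offsets):
                    tx = x + dx
                    if 0 <= tx < width:       # ink shifted off the edge is dropped
                        image[y][tx][channel] = 0
    return image


def to_grayscale(image):
    """Collapse RGB to one channel using Rec. 601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in image]


stroke = [[0, 0, 1, 0, 0]]                    # a one-pixel-wide vertical stroke
gray = to_grayscale(split_channels(stroke))   # [[255, 179, 105, 226, 255]]
```

In this toy case a stroke that would be pure black in a plain render becomes three mid-grey pixels after conversion, so a pipeline working on the grayscale image sees a wider, lower-contrast edge than the one encoded across the colour channels.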
Knowing where one character ends and the next begins is a critical step for OCR. Thin lines, looking like faint background texture to a human, are placed exactly at these character-segmentation boundaries. The human reader skips them as background; OCR misreads two letters as one or one as two, segmenting the text incorrectly.
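One way to picture this, assuming the server has a binary ink mask for a line of text, is the classic column-projection approach: columns with no ink between two glyphs are the gaps a segmenter looks for, and a faint line is drawn there. Both function names and the `delta` value are illustrative assumptions, not the product's method.

```python
def find_gap_columns(mask):
    """Return the centre column of each empty gap between inked columns.

    `mask` is a list of rows; a truthy cell means ink.
    Leading and trailing margins are not treated as gaps.
    """
    width = len(mask[0])
    has_ink = [any(row[x] for row in mask) for x in range(width)]
    gaps, start, seen_ink = [], None, False
    for x, ink in enumerate(has_ink):
        if ink:
            if start is not None and seen_ink:
                gaps.append((start + x - 1) // 2)   # middle of the empty run
            start, seen_ink = None, True
        elif start is None:
            start = x                               # an empty run begins here
    return gaps


def draw_faint_lines(image, columns, delta=6):
    """Darken the given columns of a grayscale image by `delta` levels."""
    out = [row[:] for row in image]
    for row in out:
        for x in columns:
            row[x] = max(0, row[x] - delta)
    return out
```

A segmenter that relies on clean empty columns between glyphs no longer finds one, while a six-level change on a white background stays close to invisible.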
Every character is displaced by a sub-pixel amount the human eye cannot perceive. Reading is unaffected. OCR engines that rely on consistent baselines and character positioning for recognition lose accuracy.
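A sub-pixel shift can be sketched with linear interpolation: each output pixel is a blend of itself and its left neighbour, weighted by the fractional offset. The sketch below assumes grayscale rows and glyph column spans supplied by the caller; `jitter_glyphs`, `max_shift` and the horizontal-only shift are assumptions made to keep the example short.

```python
import random


def shift_row(row, dx):
    """Shift a row right by a fractional `dx` in [0, 1) via linear interpolation."""
    out = []
    for x in range(len(row)):
        left = row[x - 1] if x > 0 else row[0]    # clamp at the left edge
        out.append(round((1 - dx) * row[x] + dx * left))
    return out


def jitter_glyphs(image, spans, max_shift=0.3, seed=None):
    """Give each glyph (a half-open column span [x0, x1)) its own small shift."""
    rng = random.Random(seed)
    out = [row[:] for row in image]
    for x0, x1 in spans:
        dx = rng.uniform(0, max_shift)            # one offset per glyph
        for row in out:
            row[x0:x1] = shift_row(row[x0:x1], dx)
    return out


shifted = shift_row([0, 0, 100, 0, 0], 0.25)      # [0, 0, 75, 25, 0]
```

Because every glyph draws its own offset, neighbouring characters no longer sit on a perfectly regular grid, which is the regularity the paragraph above says recognition depends on.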
Inside each character's own area, a bounded amount of pixel permutation is applied. At reading distance the character looks the same to the eye. The per-pixel statistics OCR uses to classify whether a glyph is 'A' or 'R' are degraded.
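A simple way to keep a permutation bounded is to shuffle pixels only within small disjoint blocks, so no pixel moves further than the block size. The sketch below makes that assumption; the function name, the `(x0, y0, x1, y1)` box convention and the 2-pixel default are illustrative, and the source does not specify how its bound is enforced.

```python
import random


def permute_within_blocks(image, box, block=2, seed=None):
    """Shuffle pixels inside `box`, but only within disjoint block x block cells.

    `box` is (x0, y0, x1, y1), half-open. Pixels outside the box are untouched,
    and no pixel moves by more than block - 1 in either direction.
    """
    x0, y0, x1, y1 = box
    rng = random.Random(seed)
    out = [row[:] for row in image]
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            cells = [(y, x)
                     for y in range(by, min(by + block, y1))
                     for x in range(bx, min(bx + block, x1))]
            values = [image[y][x] for y, x in cells]
            rng.shuffle(values)                   # permute within this cell only
            for (y, x), value in zip(cells, values):
                out[y][x] = value
    return out
```

Each block keeps exactly the same set of pixel values, so average brightness at reading distance is unchanged while the fine arrangement inside the glyph is not.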
Engineered patterns nearly invisible to the human eye are placed in the non-text regions of the page. They interfere with the stage where OCR first locates text regions — the contrast between text and background drops in the OCR's perception, so it cannot reliably find where the text begins.
A representative OCR or AI vision model is used as the target, and pixel modifications are computed against that model's gradients. The result tends to transfer to other models in the same family. Layered with the seven techniques above, the combined effect is greater than any single technique alone.
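The source does not name the method used here. As a hedged illustration of what a gradient-based modification looks like, the sketch below applies one fast-gradient-sign step against a linear surrogate model, where the gradient of the score with respect to each pixel is simply that pixel's weight. The surrogate, the function names and `epsilon` are all assumptions; a real target would be a full recognition network.

```python
def sign(value):
    return (value > 0) - (value < 0)


def score(pixels, weights, bias=0.0):
    """Linear surrogate: a higher score means 'more confident this is text'."""
    return sum(w * x for w, x in zip(weights, pixels)) + bias


def gradient_sign_step(pixels, weights, epsilon):
    """Move each pixel by at most `epsilon` in the direction that lowers the score.

    For a linear surrogate, d(score)/d(pixel_i) = weights[i], so stepping
    against sign(weights[i]) is the steepest bounded change per pixel.
    Pixels are floats in [0, 1] and are clamped to stay in range.
    """
    return [min(1.0, max(0.0, x - epsilon * sign(w)))
            for x, w in zip(pixels, weights)]
```

The per-pixel bound `epsilon` plays the same role as the amplitude limits in the other techniques: it caps how far any pixel can move, while the gradient decides the direction in which that small budget is spent.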
AI vision models (GPT-4V, Claude Vision, Gemini and similar) read an image differently from OCR. Instead of letter by letter, they read the page as a whole and reconstruct layout, tables and forms. The eight pixel techniques above stop classical OCR reliably, but they don't fully neutralise these holistic readers. Tile rotation is the layer built for that audience.
Every rendered image is divided into an invisible grid (typically 3×3). Each tile is pixelated with its own pattern and its own pattern phase. No two tiles share structure.
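A minimal sketch of the grid and per-tile pixelation, assuming a grayscale frame as a list of rows. Pixelation here means replacing each small block with its mean, and "phase" is modelled as the offset of the block grid inside the tile. The block sizes, the random draw and all names are illustrative; note that an independent random draw per tile does not by itself guarantee that no two tiles end up with the same parameters.

```python
import random


def tile_boxes(width, height, cols=3, rows=3):
    """Split a width x height frame into rows x cols half-open boxes."""
    return [(c * width // cols, r * height // rows,
             (c + 1) * width // cols, (r + 1) * height // rows)
            for r in range(rows) for c in range(cols)]


def pixelate_tile(image, box, block, phase=(0, 0)):
    """Pixelate `box` in place: each block x block cell becomes its mean."""
    x0, y0, x1, y1 = box
    px, py = phase[0] % block, phase[1] % block
    for by in range(y0 - py, y1, block):
        for bx in range(x0 - px, x1, block):
            cells = [(y, x)
                     for y in range(max(by, y0), min(by + block, y1))
                     for x in range(max(bx, x0), min(bx + block, x1))]
            if not cells:
                continue
            mean = round(sum(image[y][x] for y, x in cells) / len(cells))
            for y, x in cells:
                image[y][x] = mean


def pixelate_frame(image, cols=3, rows=3, seed=None):
    """Return a copy of the frame with every tile pixelated independently."""
    rng = random.Random(seed)
    out = [row[:] for row in image]
    for box in tile_boxes(len(image[0]), len(image), cols, rows):
        block = rng.choice((2, 3, 4))                      # illustrative sizes
        phase = (rng.randrange(block), rng.randrange(block))
        pixelate_tile(out, box, block, phase)
    return out
```

Integer division in `tile_boxes` keeps the tiles gap-free and non-overlapping even when the frame size is not a multiple of the grid.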
The pixelation pattern rotates between tiles faster than the human visual system can follow, so the eye fuses successive frames into a stable, readable image. The user reads the page normally. An AI vision model looking at a single frame, or a short sequence, cannot find a stable text or layout structure.
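One simple schedule that matches this description, offered only as an assumption, is a round-robin: on every frame each tile takes the pattern its neighbour in index order had on the previous frame. `rotation_schedule` is a hypothetical helper; the source does not describe the actual ordering or rate.

```python
def rotation_schedule(num_tiles, num_patterns, frame_index):
    """Return the pattern index assigned to each tile on a given frame.

    With at least as many patterns as tiles, no two tiles share a pattern
    within a frame, and every tile cycles through all patterns over time.
    """
    if num_patterns < num_tiles:
        raise ValueError("need at least one pattern per tile")
    return [(tile + frame_index) % num_patterns for tile in range(num_tiles)]


frame0 = rotation_schedule(9, 9, 0)   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
frame1 = rotation_schedule(9, 9, 1)   # [1, 2, 3, 4, 5, 6, 7, 8, 0]
```

Any single captured frame therefore shows each tile under just one of the patterns, while a viewer watching the stream sees all of them in quick succession.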
Tiles do not share patterns or phases between them. An AI model that breaks one tile's pixelation cannot apply that knowledge to its neighbours. Solving the whole image means solving every tile separately — work that scales linearly with tile count.
Tile rotation is not a standalone defence — it sits on top of the eight-technique pixel layer. Even an AI model that partly defeats the tile layer still faces an eight-technique layer underneath targeting character-level recognition. The attacker has to defeat two different classes of defence in the same image.
The grid density, rotation rate, pattern intensity and grid geometry are configurable per protected web service. Highly sensitive content (legal files, financial statements) can be tightened; routine content can run at baseline settings.
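As an illustration of what a per-service profile might look like, the sketch below models the four settings named above as a small validated config object. The class, field names and every value other than the 3×3 default grid are hypothetical placeholders, not ZeroLeak's configuration schema or recommended numbers.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TileRotationConfig:
    grid_rows: int = 3                   # the typical 3x3 grid
    grid_cols: int = 3
    rotation_hz: float = 30.0            # placeholder value, not a real default
    pattern_intensity: float = 0.5       # placeholder scale: 0.0 (off) to 1.0
    grid_geometry: str = "rectangular"   # placeholder name

    def __post_init__(self):
        if self.grid_rows < 1 or self.grid_cols < 1:
            raise ValueError("grid must have at least one row and one column")
        if self.rotation_hz <= 0:
            raise ValueError("rotation_hz must be positive")
        if not 0.0 <= self.pattern_intensity <= 1.0:
            raise ValueError("pattern_intensity must be between 0 and 1")


BASELINE = TileRotationConfig()
# A tightened profile for highly sensitive content (values are illustrative).
SENSITIVE = replace(BASELINE, grid_rows=5, grid_cols=5, pattern_intensity=0.8)
```

Deriving the stricter profile from the baseline with `replace` keeps the two in step: only the settings that differ are stated.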
Visible watermarks can be cropped out or filtered, and a copy-disabled DOM leaves the rendered page untouched; in both cases a clean image remains underneath. Tile rotation modifies the actual pixels of the content itself; no clean image is hiding underneath.
Clinical staff need to read patient data on screen as part of their work, but the data must not leave the institution. With ZeroLeak the data is readable inside the session, and any screenshot taken yields no recoverable information. This aligns with HIPAA's minimum-necessary standard for view-only roles.
Content that has to be read but should not end up on someone's phone: financial statements, contract drafts, due-diligence files. Anti-OCR and tile rotation together keep documents readable on screen while making any screenshot useless.
Classified content in front of analysts who need to see it but must not extract it. Pixel-level modification closes the screenshot path at the same boundary as the access policy itself.
External users granted temporary view-only access into a customer dashboard, audit interface or research console. The user reads the content; any screenshot they take leaves with no usable text.
Researchers need to read trial data, patient records and lab results. The disclosure boundary often forbids extraction. Anti-OCR converts that boundary from a policy into a technical control.
Insider risk programmes can no longer assume that an extracted screenshot is harmless. With AI vision models in everyone's pocket, any visible content on screen is a potential exfiltration vector. Anti-OCR and tile rotation bring the risk profile back to what it used to be in pre-AI, humans-only environments.
See ZeroLeak's anti-OCR pixel layer and tile rotation in a live demo. We'll run the same page through Tesseract, AWS Textract, GPT-4V and Claude Vision — and show what comes out the other side.