Capability

Sensitive content stays readable on screen — and unreadable in a screenshot

When someone with view-only access takes a picture of the screen, the image leaves your environment as pixels — and today's AI can extract text from pixels in seconds. ZeroLeak modifies those pixels so the person sitting at the screen reads the content easily, but the same image fed to an OCR engine or AI vision model produces nonsense.

Most organisations already control the obvious data-exit paths — copy/paste, file downloads, email forwarding. The one path that stays open is the screenshot. A user takes a picture of the screen with the phone in their hand, with the operating system's screenshot key, or with a second device — and none of the traditional DLP controls see it. Extracting information from that picture used to require human effort. Today, models like GPT-4V and Claude Vision look at a screenshot whole and reconstruct tables, contracts and forms in seconds. ZeroLeak closes this path at the pixel itself.

8 layers
Pixel-level techniques applied to every page rendered on the server
AI vision models
Tile rotation is the layer built specifically for them
Server-rendered
The user's browser receives pixels, not HTML, JavaScript, or DOM

The path traditional DLP cannot close

Every well-run organisation already controls the easy data paths. Copy/paste is monitored, file downloads are logged, emails are scanned, printing is restricted. These controls handle the bulk of accidental exposure.

The one path they don't handle is the screenshot. A user with legitimate view-only access takes a picture of the screen — with the phone in their hand, with the operating system's screenshot key, with a screen-recording tool on a separate device. The image leaves your environment as pixels. Pixels are not a copy-paste event, not a file download, not an email — none of the existing controls see them as a data exit.

What a user can do with that picture has changed dramatically in the last few years. Classical OCR engines (Tesseract, AWS Textract and similar) extract text from a rendered image accurately enough for production use. Modern AI vision models such as GPT-4V, Claude Vision and Gemini go further: they read the picture as a whole, infer context, and reconstruct tables and forms, sometimes recovering more from a screenshot than a trained human would.

The defences commonly proposed for this — visible watermarks, copy-disabled DOM, screen-recording detection, MDM clipboard control — all rest on assumptions that no longer hold. A visible watermark does not stop OCR from reading the text around it. A copy-disabled DOM does not stop a phone camera. Screen-recording detection does not see a second device.

The only surface that genuinely closes this path is the pixel layer itself — you modify what the user's screen actually shows so the image, once extracted, no longer contains recoverable text.

How ZeroLeak closes the screenshot path

ZeroLeak runs the protected web application inside a server-side browser and applies pixel-level modifications to the rendered frames before they reach the user. The user's browser receives a pixel stream — not HTML, not DOM, not JavaScript — so the only way to extract content is to screenshot what is on the screen. That image, by design, is unreadable to OCR and AI vision models.

The application runs inside ZeroLeak, not in the user's browser

The protected web application runs on ZeroLeak's servers, inside a headless browser. The user's browser never receives the page's HTML, JavaScript or DOM — only a pixel stream of the rendered page, like watching a video. Clicks and keyboard input flow back to the server-side browser, and the resulting render flows back to the user.
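The round trip described above can be pictured with a small sketch. Everything here is hypothetical and only illustrates the shape of the loop: `RemoteSession`, `InputEvent` and the toy renderer are invented names, not ZeroLeak's API. An input event goes in, the server-side state changes, and only pixel bytes come back.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    kind: str      # e.g. "click" or "key"
    payload: str   # e.g. "10,20" or "a"


@dataclass
class RemoteSession:
    """Toy stand-in for a server-side browser session (hypothetical)."""
    typed: str = ""
    width: int = 8
    height: int = 2

    def apply(self, event: InputEvent) -> None:
        # Input runs against server-side state; nothing executes client-side.
        if event.kind == "key":
            self.typed += event.payload

    def render_frame(self) -> bytes:
        # A real renderer would rasterise the page. Here each typed
        # character simply darkens one pixel of a grayscale frame.
        pixels = [255] * (self.width * self.height)
        for i, _ in enumerate(self.typed[: len(pixels)]):
            pixels[i] = 0
        return bytes(pixels)


def handle(session: RemoteSession, event: InputEvent) -> bytes:
    """One round trip: input event in, pixel frame out."""
    session.apply(event)
    return session.render_frame()
```

The point of the design is visible in the return type: the client only ever sees `bytes` of pixel data, never the characters that produced them.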

Eight kinds of pixel-level changes confuse OCR engines

Every page rendered on the server passes through eight independent pixel modifications before it leaves for the user. Tiny noise the human eye cannot detect, thin lines placed at character boundaries, colour-channel separation and related techniques are applied together. Each one targets a different stage of how OCR engines work, so when they are layered, an engine that survives one technique still has to get past the others.

A separate technique built specifically to defeat today's AI vision models

Models like GPT-4V, Claude Vision and Gemini look at an image holistically: not letter by letter, but reading layout, headings and tables at once. Letter-level pixel changes alone are not enough against them. ZeroLeak divides the rendered image into tiles, pixelates each tile with an independent pattern, and rotates those patterns faster than the human eye can detect. The human visual system fuses the rotation into a stable, readable image; an AI vision model cannot extract stable text or layout from any single frame.

Works with the forensic watermark and copy-paste protection as one configuration

Anti-OCR runs alongside ZeroLeak's other display-layer protections inside the same configuration. A page can carry a hidden trace ID embedded into the pixels that survives cropping and re-scaling (forensic watermark), and can substitute DOM characters with visually-equivalent ones so copy-paste returns nonsense (text cipher). Each layer is enabled or disabled independently per protected service.
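As an illustration of the text cipher idea only, the sketch below swaps a few Latin letters for Cyrillic look-alikes. The mapping and the function name `cipher_text` are assumptions made for this example; the source does not say which substitutions ZeroLeak actually uses.

```python
# Latin -> Cyrillic look-alikes (an illustrative subset chosen for this sketch).
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def cipher_text(text: str) -> str:
    """Replace each mapped character with its look-alike; keep the rest."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


original = "open a pace"
ciphered = cipher_text(original)
```

On screen `ciphered` looks like `original`, but a copy of it no longer matches a search for the word "open", because most of its characters are different code points.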

The eight pixel-level techniques applied to every page

Each technique below is applied to the rendered page on the server before the pixel stream reaches the user's browser. Each one targets a different weakness in how OCR engines and AI vision models work. Layered together, the surface an attacker has to defeat in the same image is much larger than any single technique alone.

Tiny random pixel noise the human eye cannot detect

Every pixel receives a very small, randomised change in intensity. A human reading the page experiences it as faintly textured at most — nothing that affects readability. The clean, consistent pixel boundaries that OCR engines depend on for character recognition, however, are no longer there.
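A minimal sketch of this kind of noise, assuming a grayscale image stored as rows of 0 to 255 integers. The function name, the default amplitude and the seeding are illustrative choices, not values taken from ZeroLeak.

```python
import random


def add_noise(pixels, amplitude=2, seed=None):
    """Return a copy of a grayscale image (rows of 0-255 ints) with each
    pixel shifted by a random integer in [-amplitude, +amplitude]."""
    rng = random.Random(seed)
    return [
        # Clamp so the result is still a valid 8-bit intensity.
        [min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in row]
        for row in pixels
    ]
```

Keeping `amplitude` at a few intensity levels is what keeps the change near-invisible; a real deployment would have to tune it against display and viewing conditions.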

Perturbation in the frequency band where OCR looks for character strokes

Classical and modern OCR engines find characters by detecting edges and stroke widths in a specific spatial frequency band. A targeted perturbation is applied in that band. The overall image structure stays visually clean for a human reader; the OCR's edge detector cannot find reliable character boundaries.
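One simple way to picture a band-targeted perturbation is to add a low-amplitude sinusoid at a single spatial frequency and check that only that frequency gains energy. This is a one-dimensional sketch under that assumption; the names and parameters are hypothetical, and the source does not describe the actual band or method.

```python
import cmath
import math


def add_band_perturbation(row, cycles, amplitude=2.0):
    """Add a sinusoid with `cycles` periods across the row.
    Values are floats and are not clamped in this sketch."""
    n = len(row)
    return [v + amplitude * math.sin(2 * math.pi * cycles * i / n)
            for i, v in enumerate(row)]


def band_energy(row, cycles):
    """Magnitude of the DFT coefficient at `cycles` periods per row."""
    n = len(row)
    coeff = sum(v * cmath.exp(-2j * math.pi * cycles * i / n)
                for i, v in enumerate(row))
    return abs(coeff)
```

For a flat 16-pixel row, adding 4 cycles at amplitude 2 raises `band_energy(row, 4)` from roughly zero to roughly 16, while other bands stay near zero.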

The same letter looks different across red, green and blue channels

Text rendering is varied slightly across the colour channels so the same character is drawn differently in red, green and blue. The human eye fuses the channels into one clear character. Most OCR pipelines first reduce the image to grayscale, and in doing so they blend the channels and lose the cross-channel stroke information they needed.
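The effect of grayscale reduction can be shown with a toy example. Here each channel of a single row is shifted by a whole pixel, which is a deliberate exaggeration for readability; the function names and offsets are assumptions for this sketch.

```python
def shift_row(row, offset, fill=255):
    """Shift a row right by `offset` pixels (left if negative), padding with `fill`."""
    n = len(row)
    return [row[i - offset] if 0 <= i - offset < n else fill for i in range(n)]


def split_channels(gray_row, offsets=(-1, 0, 1)):
    """Render one grayscale row as RGB with a different shift per channel."""
    r, g, b = (shift_row(gray_row, o) for o in offsets)
    return list(zip(r, g, b))


def to_gray(rgb_row):
    """Rec. 601 luma, a common grayscale reduction."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_row]
```

A crisp two-level stroke such as `[255, 255, 0, 0, 255, 255]` comes back from `to_gray(split_channels(...))` as a ramp of several intermediate grays, so the sharp edge a recogniser would look for is no longer present.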

Thin line overlay placed at character boundaries

Knowing where one character ends and the next begins is a critical step for OCR. Thin lines, looking like faint background texture to a human, are placed exactly at these character-segmentation boundaries. The human reader skips them as background; OCR misreads two letters as one or one as two, segmenting the text incorrectly.
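To see why boundary lines hurt segmentation, consider a naive projection-profile segmenter that splits text wherever a column contains no ink. The sketch below is an assumption-laden toy: the segmenter, the fixed threshold and `line_value` are all invented for illustration, and a real overlay would need to be far subtler than this one.

```python
def column_ink(image, threshold=128):
    """Count 'ink' pixels (darker than threshold) in each column."""
    return [sum(1 for row in image if row[x] < threshold)
            for x in range(len(image[0]))]


def segment(image, threshold=128):
    """Naive projection-profile segmentation: runs of columns containing ink."""
    runs, start = [], None
    for x, ink in enumerate(column_ink(image, threshold) + [0]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            runs.append((start, x - 1))
            start = None
    return runs


def overlay_gap_lines(image, line_value=120, threshold=128):
    """Draw a vertical line through every ink-free column that sits between
    inked columns. `line_value` is chosen only so this toy thresholding
    counts it as ink."""
    ink = column_ink(image, threshold)
    out = [row[:] for row in image]
    for x in range(1, len(ink) - 1):
        if ink[x] == 0 and any(ink[:x]) and any(ink[x + 1:]):
            for row in out:
                row[x] = line_value
    return out
```

With two glyphs separated by one blank column, `segment` finds two runs before the overlay and a single merged run after it.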

Each character shifted by an amount smaller than a pixel

Every character is displaced by a sub-pixel amount the human eye cannot perceive. Reading is unaffected. OCR engines that rely on consistent baselines and character positioning for recognition lose accuracy.
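A sub-pixel shift can be approximated with linear interpolation between neighbouring pixels. This one-dimensional sketch shifts a whole row by one fraction `dx`; applying a different `dx` to each glyph is left out, and the function name is hypothetical.

```python
def subpixel_shift(row, dx):
    """Shift a row of grayscale values right by a fraction 0 <= dx < 1
    using linear interpolation; the left edge value is repeated."""
    if not 0.0 <= dx < 1.0:
        raise ValueError("dx must be in [0, 1)")
    out = []
    for i, value in enumerate(row):
        left = row[i - 1] if i > 0 else row[0]
        # Blend each pixel with its left neighbour in proportion to dx.
        out.append((1.0 - dx) * value + dx * left)
    return out
```

With `dx = 0.25`, a hard edge from 255 to 0 picks up an intermediate value of 63.75 at the transition, which moves the apparent stroke position by a quarter of a pixel.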

Bounded pixel rearrangement inside each character

Inside each character's own area, a bounded amount of pixel permutation is applied. At reading distance the character looks the same to the eye. The per-pixel statistics OCR uses to classify whether a glyph is 'A' or 'R' are degraded.
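"Bounded" can be made concrete by shuffling pixels only inside small fixed-size blocks, so no pixel travels further than the block allows. The sketch works on a flat list of pixels for brevity; a real glyph is two-dimensional, and the block size and function name are assumptions.

```python
import random


def permute_within_blocks(pixels, block=2, seed=None):
    """Shuffle a flat list of pixels inside consecutive blocks of `block`
    pixels, so no pixel moves more than block - 1 positions."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(pixels), block):
        chunk = pixels[start:start + block]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out
```

The set of pixel values is unchanged, which is why the glyph keeps its overall look, while their exact positions no longer match what a classifier was trained on.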

Low-intensity interference patterns placed in the background

Engineered patterns nearly invisible to the human eye are placed in the non-text regions of the page. They interfere with the stage where OCR first locates text regions: the contrast the engine measures between text and background drops, so it cannot reliably find where the text begins.
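A sketch of a background-only pattern, assuming a text mask is available from the renderer. The diagonal sine pattern, its amplitude and period, and the function name are illustrative assumptions rather than ZeroLeak's actual patterns.

```python
import math


def add_background_pattern(image, text_mask, amplitude=3, period=4):
    """Add a low-amplitude diagonal sine pattern to non-text pixels only.
    `text_mask[y][x]` is True where a pixel belongs to text."""
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            if text_mask[y][x]:
                new_row.append(value)  # leave text pixels untouched
            else:
                delta = round(amplitude * math.sin(2 * math.pi * (x + y) / period))
                new_row.append(min(255, max(0, value + delta)))
        out.append(new_row)
    return out
```

Because text pixels pass through unchanged, legibility of the glyphs themselves is not affected by this step; only the surrounding background is textured.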

Perturbation computed against an OCR or AI model's own internal structure

A representative OCR or AI vision model is used as the target, and pixel modifications are computed against that model's internal gradients. The resulting perturbation is designed to carry over to other models in the same family. Layered with the seven techniques above, the combined effect is greater than any single technique alone.
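The general idea of a gradient-based perturbation can be shown with the well-known fast gradient sign method (FGSM) on a toy linear classifier. The source does not say which method ZeroLeak uses, so treat this purely as an illustration; the model, weights and function names are made up.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def score(weights, bias, pixels):
    """Toy linear 'is this glyph X?' score; positive means yes."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def gradient_step(weights, pixels, epsilon=1.0):
    """For a linear model the gradient of the score with respect to the
    pixels is just `weights`, so stepping against its sign lowers the score
    by epsilon * sum(|w|) while changing each pixel by at most epsilon.
    Pixel values are not clamped in this sketch."""
    return [p - epsilon * sign(w) for w, p in zip(weights, pixels)]
```

With weights `[0.5, -0.25, 0.0, 1.0]`, bias `-1.0` and pixels `[2.0, 1.0, 3.0, 0.5]`, a single step of size 1 moves the score from 0.25 to -1.5, flipping the toy classifier's decision.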

A separate layer for modern AI vision models — rotating tile pixelation

AI vision models (GPT-4V, Claude Vision, Gemini and similar) read an image differently from OCR. Instead of letter by letter, they read the page as a whole and reconstruct layout, tables and forms. The eight pixel techniques above stop classical OCR reliably, but they don't fully neutralise these holistic readers. Tile rotation is the layer built for that audience.

01

The image is divided into small tiles, each pixelated with its own pattern

Every rendered image is divided into an invisible grid (typically 3×3). Each tile is pixelated with its own pattern and its own pattern phase. No two tiles share structure.
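The grid and the per-tile pixelation can be sketched as follows. Block-mean pixelation and an integer `phase` offset are assumptions made for this example (one plausible reading of "pattern" and "pattern phase"), and the function names are hypothetical.

```python
def tile_bounds(width, height, cols=3, rows=3):
    """Split a width x height image into a cols x rows grid of
    (x0, y0, x1, y1) boxes (x1/y1 exclusive) covering every pixel once."""
    xs = [width * c // cols for c in range(cols + 1)]
    ys = [height * r // rows for r in range(rows + 1)]
    return [(xs[c], ys[r], xs[c + 1], ys[r + 1])
            for r in range(rows) for c in range(cols)]


def pixelate_tile(image, box, block, phase):
    """Replace each block x block cell inside `box` with its mean value.
    `phase` (0 <= phase < block) offsets the cell grid, so two tiles with
    the same block size still average different groups of pixels."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]
    for cy in range(y0 - phase, y1, block):
        for cx in range(x0 - phase, x1, block):
            cells = [(x, y)
                     for y in range(max(cy, y0), min(cy + block, y1))
                     for x in range(max(cx, x0), min(cx + block, x1))]
            if cells:
                mean = round(sum(image[y][x] for x, y in cells) / len(cells))
                for x, y in cells:
                    out[y][x] = mean
    return out
```

Calling `pixelate_tile` once per box from `tile_bounds`, each with its own `block` and `phase`, gives every tile a different structure while leaving pixels outside the box untouched.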

02

Patterns rotate faster than the human eye can perceive

The pixelation pattern rotates between tiles faster than the human visual system can pick up, above its flicker-fusion threshold, so the eye fuses the frames into a stable, readable image. The user reads the page normally. An AI vision model looking at a single frame, or a short sequence, cannot find a stable text or layout structure.
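One simple way to rotate patterns between tiles is a round-robin schedule keyed on the frame index. This is only an assumed scheduling scheme for illustration; the source does not describe how ZeroLeak orders its patterns, and the function names are invented.

```python
def pattern_for(tile_index, frame_index, n_patterns):
    """Which pattern a tile shows on a given frame. Patterns advance by
    one slot per frame, offset by the tile's index."""
    return (tile_index + frame_index) % n_patterns


def frame_assignment(frame_index, n_tiles=9, n_patterns=9):
    """Pattern index for every tile of the grid on one frame."""
    return [pattern_for(t, frame_index, n_patterns) for t in range(n_tiles)]
```

With nine tiles and nine patterns, every frame gives each tile a different pattern, and over nine consecutive frames each tile cycles through all of them, so no single captured frame shows a tile in a repeatable state.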

03

Each tile is independent — solving one does not help with the next

Tiles do not share patterns or phases between them. An AI model that breaks one tile's pixelation cannot apply that knowledge to its neighbours. Solving the whole image means solving every tile separately — work that scales linearly with tile count.

04

Layered on top of the eight pixel techniques

Tile rotation is not a standalone defence — it sits on top of the eight-technique pixel layer. Even an AI model that partly defeats the tile layer still faces an eight-technique layer underneath targeting character-level recognition. The attacker has to defeat two different classes of defence in the same image.

05

Tile count, rotation rate and intensity are configured per protected service

The grid density, rotation rate, pattern intensity and grid geometry are configurable per protected web service. Highly sensitive content (legal files, financial statements) can be tightened; routine content can run at baseline settings.
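Per-service settings of this kind could be modelled as a small validated configuration object. The field names, default values, value ranges and service names below are all hypothetical; only the 3×3 default grid comes from the text above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TileRotationConfig:
    grid_cols: int = 3          # grid density (the text cites 3x3 as typical)
    grid_rows: int = 3
    rotation_hz: float = 60.0   # assumed value, not taken from the source
    intensity: float = 0.5      # 0.0 (off) to 1.0 (strongest); assumed scale

    def __post_init__(self):
        if self.grid_cols < 1 or self.grid_rows < 1:
            raise ValueError("grid must have at least one tile")
        if not 0.0 <= self.intensity <= 1.0:
            raise ValueError("intensity must be in [0, 1]")
        if self.rotation_hz <= 0:
            raise ValueError("rotation_hz must be positive")


BASELINE = TileRotationConfig()

# Hypothetical per-service overrides: tighten sensitive content only.
SERVICES = {
    "intranet-wiki": BASELINE,
    "deal-room": replace(BASELINE, grid_cols=6, grid_rows=6, intensity=0.9),
}
```

Using `replace` on a frozen baseline keeps routine services on shared defaults while making each tightened setting explicit and validated.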

06

Works where visible watermarks and DOM-based controls fail

Visible watermarks and copy-disabled DOM elements can be cropped out or filtered — there is still a clean image underneath. Tile rotation modifies the actual pixels of the content itself; no clean image is hiding underneath.

Where this matters

Patient and clinical data displays for medical staff

Clinical staff need to read patient data on screen as part of their work, but the data must not leave the institution. With ZeroLeak the data is readable inside the session, and any screenshot taken yields no recoverable information. This supports HIPAA's minimum necessary standard for view-only roles.

Financial statements, contracts and deal-room documents

Content that has to be read but should not end up on someone's phone: financial statements, contract drafts, due-diligence files. Anti-OCR and tile rotation together keep documents readable on screen while making any screenshot useless.

Operator consoles in government and intelligence work

Classified content in front of analysts who need to see it but must not extract it. Pixel-level modification closes the screenshot path at the same boundary as the access policy itself.

Contractor and third-party access into your environment

External users granted temporary view-only access into a customer dashboard, audit interface or research console. The user reads the content; any screenshot they take leaves with no usable text.

Research and clinical-trial data rooms

Researchers need to read trial data, patient records and lab results. The disclosure boundary often forbids extraction. Anti-OCR converts that boundary from a policy into a technical control.

Insider risk programmes in the AI era

Insider risk programmes can no longer assume that an extracted screenshot is harmless. With AI vision models in everyone's pocket, any visible content on screen is a potential exfiltration vector. Anti-OCR and tile rotation bring the risk profile back to what it used to be in pre-AI, humans-only environments.

Common questions

Do the anti-OCR techniques affect how easily a human can read the screen?
No. Every technique is calibrated to stay below the threshold the human visual system can detect. The random noise is small enough not to register, the sub-pixel shifts are below human displacement perception, and the tile rotation rate sits above the visual system's flicker-fusion threshold. The user reads the page normally; OCR engines and AI vision models see a degraded input.
Does this work against GPT-4V, Claude Vision and Gemini Pro Vision?
The eight pixel techniques reliably stop classical OCR engines (Tesseract, AWS Textract, Google Cloud Vision, Azure Computer Vision). AI vision models are more resistant to character-only techniques because they read holistically — which is why tile rotation exists as a separate layer. Tile rotation breaks the layout reconstruction these models rely on; with the eight-technique layer underneath, even a model that partly defeats one layer still has to defeat the other.
Is the page still interactive — clicks, scrolls, forms?
Yes. ZeroLeak streams pixels to the browser but accepts input events back. Mouse clicks, keyboard input, scroll and form submissions flow back to the server-side headless browser, where they run against the protected application. The user experience is that of a regular browser tab; what changes is where the rendering and DOM live.
Could a determined attacker train an OCR model specifically against ZeroLeak's perturbations?
An attacker would need to collect a large dataset pairing perturbed and original images. The perturbation parameters rotate over time, the character-boundary line overlay is proprietary to TR7 and is not something public OCR pipelines are built to handle, and tile rotation's per-tile independence pushes the training set size far above what the pixel layer alone would require. In practice, building such a counter-model needs sustained access to the protected service and a budget comparable to the defender's, which is a non-trivial barrier.
What about text the user types into form fields — is that protected too?
Yes. The form fields render on the server-side browser; what the user sees is the pixelated rendering of the field, with the same anti-OCR techniques applied. A screenshot of a half-typed form does not yield recoverable text.
How does this relate to forensic watermarking and text cipher?
Anti-OCR makes a screenshot unreadable. Forensic watermarking makes a screenshot attributable — even after cropping and rescaling, a trace ID embedded into the pixels lets the operator identify which session produced the leak. Text cipher targets the copy-paste path: the DOM text characters are substituted with visually-equivalent ones, so the rendered text reads correctly to the eye but copies as nonsense. Each layer is independent; most deployments combine all three.
What is the performance impact?
The pixel techniques and tile rotation run inside the server-side rendering pipeline. Latency cost is small — a few milliseconds per frame on commodity hardware — and is parallelisable across rendering instances. Capacity scales with rendering infrastructure, not directly with user count, because the work happens where the rendering already happens.
Does the user need a special browser or plug-in?
No. The user opens any standard browser and connects to the protected URL. The pixel stream is delivered using standard web technologies; no native client, extension or proprietary protocol is needed on the user side.

Close the screenshot extraction path

See ZeroLeak's anti-OCR pixel layer and tile rotation in a live demo. We'll run the same page through Tesseract, AWS Textract, GPT-4V and Claude Vision — and show what comes out the other side.