
Common PDF Redaction Mistakes (And How to Avoid Them)

2026-05-17 9 min read

Why PDF Redaction Fails More Often Than People Expect

Redacting a PDF sounds straightforward: cover the sensitive text, save the file, done. In practice, government agencies, law firms, and corporations have repeatedly learned the hard way that it is anything but simple. In January 2019, a redacted court filing submitted by Paul Manafort's defense lawyers in a U.S. federal case turned out to contain blacked-out passages whose underlying text was trivially recoverable by copying and pasting into a text editor. A similar incident in 2021 exposed names of confidential informants in a federal case. These were not amateur mistakes; they were made by people who thought they had followed the right steps.

The core problem is that a PDF is not a photograph. It is a structured document made up of several kinds of data: text in content streams, image objects, metadata, annotations, form fields, and embedded fonts. When most people 'redact' a PDF, they add a black rectangle on top of the text. That rectangle is a cosmetic overlay. The original text data still exists underneath it in the file's content stream, fully readable by anyone who removes the overlay or copies the raw text. True redaction means permanently destroying the underlying data, not hiding it.

This article covers the most common mistakes people make when redacting PDFs, explains the technical reasons those mistakes are dangerous, and gives you concrete steps to avoid them. Some of these fixes require dedicated redaction software; others are simple workflow changes anyone can make.

Mistake #1: Using Drawing Tools or Black Boxes Instead of Real Redaction

This is the most widespread mistake, and the one that causes the most serious data leaks. When you open a PDF in Adobe Acrobat, Preview on macOS, or most browser-based PDF editors and draw a filled black rectangle over sensitive text, you are adding an annotation or drawing object on top of the page. The text underneath remains intact in the page's content stream.

To verify this yourself: open any PDF with a black box drawn over text, press Ctrl+A (Cmd+A on macOS) to select all, then Ctrl+C (Cmd+C) to copy, and paste into Notepad or TextEdit. In many cases the hidden text appears in plain sight. Alternatively, run the file through pdftotext (a free command-line utility shipped with Poppler and Xpdf), which dumps the text in the content streams with no regard for any visual overlays.

The correct approach in Adobe Acrobat Pro is to use the dedicated Redact tool, found under Tools > Redact > Mark for Redaction. After marking all content, you must click 'Apply Redactions'. This is the step that actually removes the marked content from the file. Skipping 'Apply' and just saving the file leaves the marks as annotations, not permanent deletions. Acrobat will also prompt you to sanitize the document, which strips metadata and other hidden information; accept that prompt.

If you are using a tool that only lets you draw shapes, you do not have a redaction tool; you have a drawing tool. Stop there and find proper software before sharing the file.
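To make the overlay problem concrete, here is a minimal Python sketch. It assumes a hand-written, uncompressed content stream (the sample text and coordinates are invented for illustration); real PDFs usually compress their streams and use more text operators than the single `Tj` handled here, so treat this as a demonstration of the idea rather than a text extractor.

```python
import re

# A simplified, uncompressed page content stream (illustrative only).
# The text is drawn first, then a black rectangle is painted over it.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET
0 0 0 rg
70 695 160 16 re f
"""

# Matches a literal string operand followed by the Tj (show text) operator.
# Real streams also use TJ arrays, hex strings, escapes, and custom encodings.
SHOW_TEXT = re.compile(rb"\(([^()]*)\)\s*Tj")


def extract_shown_text(stream: bytes) -> list[str]:
    """Return every literal string passed to Tj, ignoring all drawing operators."""
    return [m.decode("latin-1") for m in SHOW_TEXT.findall(stream)]


print(extract_shown_text(CONTENT_STREAM))
```

The rectangle (`re f`) changes what is painted on the page, but the string passed to `Tj` is still in the stream, which is why copy-paste and text extraction tools can still see it.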

Mistake #2: Ignoring Metadata, XMP Data, and Document Properties

Even when the visible text is correctly redacted, the PDF file itself carries a substantial amount of hidden information in its metadata. This can include the document title, the author's name, the software used to create it, revision history, creation and modification timestamps, and sometimes the name of the original file. In legal and investigative contexts, this metadata can be just as sensitive as the content you blacked out.

PDF metadata lives in two places: the Document Information Dictionary (accessible in Acrobat under File > Properties > Description) and the XMP metadata packet embedded in the file. Both need to be cleared. Acrobat Pro's Sanitize Document function (Tools > Redact > Sanitize Document) handles both in one step and also removes embedded scripts, hidden layers, and form field data.

A concrete example: a law firm redacts a client's Social Security number from a PDF but forgets to strip metadata. The document's Title property still shows the original filename, 'Johnson_SSN_Verification_2025.pdf', which reveals the client's name and the nature of the document to anyone who checks File > Properties in any PDF reader.

For users working with tools outside of Acrobat, the open-source tool ExifTool can clear PDF metadata from the command line: exiftool -all= yourfile.pdf. Two caveats apply. First, ExifTool writes its changes to a PDF as an incremental update, so the old metadata is still physically present in the file and the edit is reversible until the file is rewritten, for example with qpdf --linearize in.pdf out.pdf. Second, by default ExifTool keeps a backup named yourfile.pdf_original next to the edited file; do not ship it by accident. None of this redacts content; it only addresses the metadata.

CocoConvert's PDF conversion tools will strip many common metadata fields during conversion, but to be clear, this is a side effect of the conversion process, not a purpose-built redaction or sanitization feature. Do not rely on file conversion alone as a redaction strategy.
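As a rough self-check, the following Python sketch scans a file's raw bytes for common metadata markers. It is a heuristic under stated assumptions: it only sees uncompressed data, a key such as `/Title` can also appear in bookmarks, and a linearized PDF can legitimately contain more than one `%%EOF`. The function name and the sample bytes (including the author 'J. Doe') are hypothetical.

```python
# Keys commonly found in the Document Information Dictionary.
INFO_KEYS = [b"/Title", b"/Author", b"/Subject", b"/Keywords",
             b"/Creator", b"/Producer", b"/CreationDate", b"/ModDate"]


def scan_pdf_bytes(data: bytes) -> dict:
    """Heuristically report metadata markers visible in the raw file bytes."""
    return {
        "info_keys": [k.decode() for k in INFO_KEYS if k in data],
        "has_xmp_packet": b"<?xpacket begin" in data,
        # More than one %%EOF can indicate incremental updates, meaning
        # older revisions of objects may still be present in the file.
        "eof_markers": data.count(b"%%EOF"),
    }


# Hypothetical file: the Info dictionary was "cleared" by appending an update,
# but the earlier revision is still sitting in the bytes.
SAMPLE = (b"%PDF-1.4\n1 0 obj\n<< /Title (Johnson_SSN_Verification_2025.pdf) "
          b"/Author (J. Doe) >>\nendobj\n%%EOF\n"
          b"1 0 obj\n<< >>\nendobj\n%%EOF\n")

print(scan_pdf_bytes(SAMPLE))
```

On a real file you would pass `open(path, "rb").read()`. A clean result here is not proof of a clean file, since metadata can sit inside compressed object streams, so treat any hit as a reason to sanitize again rather than treating silence as a pass.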

Mistake #3: Redacting Scanned PDFs Without Checking the Text Layer

Many PDFs are scanned documents that have been processed with optical character recognition (OCR). When a scanner or OCR tool like Adobe Acrobat, ABBYY FineReader, or Google Drive's scan feature processes a page, it creates two layers: the visible scanned image and an invisible text layer aligned with it. This text layer is what makes scanned documents searchable and copy-pasteable.

If you redact only the image layer, for example by editing the JPEG or PNG that the scan is based on, the text layer may still contain the sensitive words in full. Someone searching the PDF for a Social Security number or a name could find it even though the image appears blacked out.

The safest approach for scanned PDFs is to reduce the document to pure images so that the text layer is removed entirely, burn the redactions into those images, and then re-run OCR on the redacted result if searchability is needed. OCR can only recognize what is still visible, so the new text layer cannot contain the blacked-out words. In Acrobat Pro, one way to rasterize is to print to a PDF printer with the 'Print As Image' option enabled in the advanced print settings. A plain print-to-PDF does not reliably discard text, and flattening transparency or annotations is not the same as removing a text layer, so do not assume either has done the job.

Alternatively, use redaction software that explicitly handles OCR text layers. Nuance Power PDF and Kofax Redact both detect and remove text layer content during the redaction process. Always confirm by running pdftotext on the output file and checking that redacted terms do not appear in the result. A 30-second check can prevent a serious disclosure.
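The pdftotext check can be scripted. The sketch below is one possible shape, assuming pdftotext is installed and on the PATH; the function names are illustrative. Matching ignores case, spacing, and punctuation so that '123 45 6789' is caught when you search for '123-45-6789', and it only handles ASCII letters and digits.

```python
import re
import subprocess


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return its output; '-' sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def _squash(s: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", s.lower())


def find_leaks(text: str, terms: list[str]) -> list[str]:
    """Return the terms that still occur in text, ignoring case, spacing, punctuation."""
    haystack = _squash(text)
    return [t for t in terms if _squash(t) and _squash(t) in haystack]
```

Typical use would be `find_leaks(extract_text("redacted.pdf"), ["Jane Doe", "123-45-6789"])`, where an empty list is the result you want. The loose matching will occasionally flag a harmless substring; for a pre-release check, a false alarm is cheaper than a miss.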

Mistake #4: Partial Redaction — Leaving Enough Context to Re-Identify

Redacting specific data points while leaving surrounding context intact can make redaction useless in practice. This is sometimes called the 'mosaic effect' in privacy law: individually innocuous pieces of information combine to identify a person or reveal sensitive details.

A real pattern seen in court documents: a filing redacts a witness's name but leaves their job title, employer, city, and the date they gave testimony. In a small organization or specialized field, those four data points narrow the identity down to one or two people. Similarly, redacting a bank account number but leaving the bank name, branch location, and account holder's state is often insufficient.

Before finalizing any redaction, read through the document as if you were an adversary with no prior knowledge and ask what you could infer from what remains. This is especially important in medical records, where diagnosis codes, treatment dates, and physician specialties can re-identify a patient even without their name.

For structured data like tables, be aware that redacting one cell does not hide the row and column headers that give it meaning. If a table lists 'Employee ID | Salary | Performance Rating' and you redact only the salary column, you have still confirmed that a specific employee ID has a performance rating of 'Below Expectations.' Consider whether the entire row or the entire table needs to be removed.

There is no software solution for this. It requires human judgment and a second reviewer who was not involved in the original redaction.
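The narrowing effect is easy to see with a toy example. The Python sketch below uses an invented four-person population (all names, employers, and cities are hypothetical) and counts how many people remain consistent with the attributes a redacted document still reveals. It illustrates the reasoning only; it is not a tool that can judge a real document.

```python
# Hypothetical population an adversary might already know about.
POPULATION = [
    {"name": "A", "title": "Compliance Analyst", "employer": "Northwind Bank", "city": "Tulsa"},
    {"name": "B", "title": "Compliance Analyst", "employer": "Northwind Bank", "city": "Dallas"},
    {"name": "C", "title": "Teller", "employer": "Northwind Bank", "city": "Tulsa"},
    {"name": "D", "title": "Compliance Analyst", "employer": "Contoso Credit", "city": "Tulsa"},
]


def matching_candidates(population: list[dict], visible: dict) -> list[dict]:
    """Return the people consistent with every attribute left unredacted."""
    return [p for p in population
            if all(p.get(key) == value for key, value in visible.items())]


# Each extra unredacted attribute shrinks the candidate set.
visible = {}
for key, value in [("title", "Compliance Analyst"),
                   ("employer", "Northwind Bank"),
                   ("city", "Tulsa")]:
    visible[key] = value
    print(sorted(visible), "->", len(matching_candidates(POPULATION, visible)), "candidate(s)")
```

With all three attributes visible, only one candidate is left even though the name itself was never shown.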

Mistake #5: Redacting PDFs That Were Converted From Word or Excel Without Checking the Source File

When a document originates in Microsoft Word, Excel, or PowerPoint and is exported to PDF, the PDF can inherit more than just the visible content. Depending on the export settings, Track Changes revisions, comments, hidden rows and columns in Excel, speaker notes in PowerPoint, and document properties can survive the export process and end up in the resulting PDF in ways that are not immediately visible on screen.

A particularly dangerous scenario: a lawyer drafts a settlement agreement in Word with Track Changes enabled, showing earlier proposed figures that were negotiated down. The document is exported to PDF and the final numbers are redacted. But the PDF may still contain the revision markup showing the original higher figures, depending on how the export was handled.

The correct workflow is to clean the source file before creating the PDF. In Word, go to Review > Accept > Accept All Changes and Stop Tracking, then use the Document Inspector (File > Info > Check for Issues > Inspect Document) to remove comments, revisions, hidden text, and personal information. Only after the source file is clean should you export to PDF and apply redactions.

CocoConvert can convert Word and Excel files to PDF, and during that conversion some revision data is not carried over, but this is not guaranteed for all document types or all versions of Office formatting. We do not position our conversion service as a redaction or sanitization tool. If your document contains Track Changes or hidden content, clean it at the source first.

Building a Reliable Redaction Workflow

Redaction done correctly is a process, not a single action. The following workflow applies whether you are redacting one page or five hundred.

First, work on a copy. Never redact your only copy of a document. Keep the unredacted original in a secure, access-controlled location and work exclusively on the copy.

Second, use purpose-built redaction software. Adobe Acrobat Pro (approximately $20/month), Foxit PDF Editor Pro, or the free Sejda Desktop application all have genuine redaction functions that remove content from the data layer. For high-volume or legally sensitive work, dedicated tools like Relativity Redact or OpenText Axcelerate are worth the investment.

Third, apply redactions and sanitize in sequence. In Acrobat: mark content with the Redact tool, click Apply Redactions, then immediately run Sanitize Document. Do not skip the sanitize step.

Fourth, verify the output. Open the redacted PDF in a different application than the one you used to create it; Preview on Mac or a web browser's built-in PDF viewer works well. Try to select and copy text in the redacted areas. Run pdftotext on the command line and grep for any sensitive terms. Check File > Properties for residual metadata.

Fifth, have a second person review. Fresh eyes catch what familiarity hides. This is especially important when redacting large documents where pattern fatigue sets in.

File conversion services like CocoConvert are useful at the beginning of this workflow, converting source formats to PDF before redaction begins, and potentially at the end, if you need to deliver the final document in a specific format. But the redaction steps themselves require dedicated tools and careful human oversight that no conversion service can substitute for.
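The 'work on a copy' step can be enforced rather than remembered. The Python sketch below, with illustrative function names and a naming scheme chosen only for this example, creates a working copy and records a SHA-256 hash of the original so you can later confirm the original was never modified.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_working_copy(original: Path, workdir: Path) -> tuple[Path, str]:
    """Copy the original into workdir and return (copy path, hash of original)."""
    workdir.mkdir(parents=True, exist_ok=True)
    copy = workdir / f"{original.stem}_working{original.suffix}"
    shutil.copy2(original, copy)
    return copy, sha256_of(original)


def original_untouched(original: Path, recorded_hash: str) -> bool:
    """True if the original still matches the hash recorded before redaction."""
    return sha256_of(original) == recorded_hash
```

Call `make_working_copy` before opening anything in a redaction tool, do all edits on the returned copy, and run `original_untouched` before archiving. Reading the whole file into memory is fine for typical documents; very large files would call for chunked hashing.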
