
What Is File Metadata? (And Why You Should Strip It Before Sharing)

2026-05-17 9 min read

What File Metadata Actually Is

Every file you create carries two layers of information: the content you can see, and a hidden layer of descriptive data that most software writes automatically. That hidden layer is metadata: structured information about the file itself rather than the file's subject matter. The word comes from the Greek prefix 'meta', meaning 'about', so metadata is, literally, data about data.

A JPEG photo contains pixel data you can view, but it also contains EXIF metadata: the camera model, lens focal length, shutter speed, ISO setting, and, critically, the GPS coordinates where the shot was taken, often accurate to within a few meters. A Word document stores your name, your company name (pulled from your Office license), the total editing time in minutes, and, if Track Changes was on, a revision history that can include deleted text.

Metadata formats vary by file type. Images typically use the EXIF (Exchangeable Image File Format) and IPTC (International Press Telecommunications Council) standards. PDFs use XMP (Extensible Metadata Platform) alongside their own internal document properties. Office files like DOCX and XLSX are ZIP containers: core properties such as author and last editor live in docProps/core.xml, while application properties such as company name and total editing time live in docProps/app.xml. Audio files use ID3 tags. Video files use a mix of container-level metadata (MOV, MP4) and codec-level data.

None of this is written maliciously. Software engineers build metadata in because it serves real purposes: photo management apps use EXIF data to sort pictures by date and location; document management systems use author fields to track ownership; audio players use ID3 tags to display album art. The problem arises when files travel outside the context they were created in.
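To make the two layers concrete, here is a minimal Python sketch (standard library only) that walks the segment headers at the start of a JPEG and labels the ones carrying EXIF, XMP, or IPTC data. It assumes a well-formed JPEG and stops at the start-of-scan (SOS) marker, where the compressed pixel data begins; `list_jpeg_segments` and `MARKER_NAMES` are hypothetical names for illustration, not part of any library.

```python
import struct

# Names for a few common JPEG markers; APPn markers (0xE0-0xEF) are named in code below.
MARKER_NAMES = {0xC0: "SOF0", 0xC2: "SOF2", 0xC4: "DHT", 0xDB: "DQT", 0xDD: "DRI", 0xFE: "COM"}


def list_jpeg_segments(data: bytes) -> list[tuple[str, int]]:
    """Return (label, payload_length) for each segment up to the start of the pixel data."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG (missing SOI marker)")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed pixel data follows, so stop walking headers
            segments.append(("SOS (pixel data)", len(data) - pos))
            break
        # The big-endian length counts its own two bytes but not the two marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        payload = data[pos + 4 : pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            label = f"APP{marker - 0xE0}"
        else:
            label = MARKER_NAMES.get(marker, f"0x{marker:02X}")
        # Recognize the common metadata payloads by their leading signatures.
        if payload.startswith(b"Exif\x00\x00"):
            label += " EXIF"
        elif payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            label += " XMP"
        elif payload.startswith(b"Photoshop 3.0\x00"):
            label += " IPTC/Photoshop"
        segments.append((label, len(payload)))
        pos += 2 + length
    return segments
```

On a typical phone photo you would expect an APP1 segment labeled EXIF (where the GPS tags live) ahead of the SOS marker: the pixels and the metadata are stored side by side in separate blocks.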

The Specific Data Fields That Can Expose You

Not all metadata is equally sensitive. Knowing that a file was saved at 96 DPI is harmless. But several common metadata fields carry real privacy and security implications.

GPS coordinates in photos are the most dramatic example. When you take a photo on an iPhone with Location Services enabled for the Camera app, iOS writes your latitude and longitude into the EXIF GPS tags. A photo posted publicly with those tags intact lets anyone with a free tool like ExifTool or Jeffrey's Exif Viewer pinpoint where you live, work, or regularly visit. In 2012, while John McAfee was on the run from authorities in Belize, Vice published a photo of him with its GPS metadata intact, revealing that he was in Guatemala.

Author and organization fields in Office documents are pulled from your software license or system settings. If you draft a contract in Microsoft Word, the file's docProps/core.xml will contain your full name under the <dc:creator> and <cp:lastModifiedBy> tags, and docProps/app.xml often records your company name under <Company>. Send that document to a counterparty in negotiations and they know exactly who wrote the first draft.

Revision history and tracked changes in Word documents can expose deleted text, comments, and the names of every person who edited the file. Law firms have accidentally sent opposing counsel documents with tracked changes revealing their internal strategy.

For PDFs, the XMP block and document properties often record the software used to create the file (which can reveal your operating system version and patch level, information useful to an attacker), the original author, and sometimes the source file path, which might look like: C:\Users\sarah.johnson\Documents\ClientProposals\AcmeCorp_draft3.pdf. That path alone confirms an employee's name and internal folder structure.

Finally, thumbnail previews embedded in some RAW image formats and older Office files can contain a rendered preview of the document at an earlier stage, meaning deleted content might still be visible in the thumbnail even after you removed it from the main file.
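As a minimal sketch of how little it takes to read these fields, the Python function below opens an Office file as the ZIP archive it is and pulls the author, last-editor, company, and editing-time fields from docProps/core.xml and docProps/app.xml. It uses only the standard library and assumes the standard OOXML namespaces; the function name `office_identity_fields` and the keys of the returned dictionary are hypothetical choices.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml and docProps/app.xml in OOXML files.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def office_identity_fields(source) -> dict[str, str]:
    """Return identifying properties from a .docx/.xlsx/.pptx given a path or file object."""
    found = {}
    with zipfile.ZipFile(source) as z:
        names = set(z.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(z.read("docProps/core.xml"))
            for key, tag in (("creator", "dc:creator"), ("lastModifiedBy", "cp:lastModifiedBy")):
                el = core.find(tag, NS)
                if el is not None and el.text:
                    found[key] = el.text
        if "docProps/app.xml" in names:
            app = ET.fromstring(z.read("docProps/app.xml"))
            # TotalTime is recorded in minutes of editing.
            for key, tag in (("company", "ep:Company"), ("totalEditMinutes", "ep:TotalTime")):
                el = app.find(tag, NS)
                if el is not None and el.text:
                    found[key] = el.text
    return found
```

Calling it on a hypothetical `contract.docx` returns only the fields the file actually contains, so an empty dictionary is a quick sign that these particular properties have been cleared.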

Who Actually Reads File Metadata (And How)

You might assume that reading metadata requires technical skill. It doesn't. Free, widely available tools make it trivial. ExifTool, written by Phil Harvey, runs on Windows, macOS, and Linux and can read metadata from over 100 file formats. The command 'exiftool filename.jpg' prints the tags it can read to the terminal in seconds. There are also GUI wrappers for non-technical users. Browser-based tools like Jimpl.com or MetaPicz let anyone upload a photo and see its EXIF data without installing anything. For Office documents, you can simply rename a .docx file to .zip, open it, and navigate to docProps/core.xml (the document properties) or word/document.xml (the body text, including any tracked changes) to read the raw XML. No special software required, just a text editor.

Journalists routinely check metadata on documents received from sources or leaked to them. Lawyers and e-discovery professionals use metadata as evidence in litigation; courts have accepted EXIF timestamps as evidence of when a photo was actually taken, overriding claims made by the person who submitted it. Corporate intelligence analysts examine metadata on files obtained through OSINT (open-source intelligence) to map organizational structures and software environments.

Law enforcement uses metadata extensively. In 2005, the BTK serial killer was caught partly because a floppy disk he sent to a Wichita TV station, which passed it to police, contained a deleted Word document whose metadata pointed to 'Christ Lutheran Church' and a user named 'Dennis': Dennis Rader, the killer.

The point is not to be alarmist. Most people sharing a recipe photo or a project proposal face no serious threat. But the risk scales with the sensitivity of the content and the audience receiving it. A freelancer sending a portfolio PDF to a client they've never met is in a different position than someone sharing a personal photo on a public forum.
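The rename-to-ZIP check is also easy to script. This minimal Python sketch, assuming a .docx saved with Track Changes still active, reads word/document.xml and collects text that was deleted in the editor but is still stored inside `w:del` elements as `w:delText`, together with the recorded author. `tracked_deletions` is a hypothetical helper name, and it does not look at comments, which Word stores separately in word/comments.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tracked_deletions(source) -> list[tuple[str, str]]:
    """Return (author, deleted_text) for each tracked deletion still stored in a .docx."""
    with zipfile.ZipFile(source) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    results = []
    for deletion in root.iter(f"{W}del"):
        author = deletion.get(f"{W}author", "unknown")
        # Deleted runs keep their text in w:delText instead of w:t.
        text = "".join(t.text or "" for t in deletion.iter(f"{W}delText"))
        if text:  # skip deletion marks that carry no text, such as deleted paragraph marks
            results.append((author, text))
    return results
```

Anything this returns is text the recipient can read without ever opening Word, which is why accepting or rejecting all changes before sending matters.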

How to Strip Metadata Before You Share

There are several approaches, each with trade-offs in convenience and thoroughness.

**For images on Windows:** Right-click the file, select Properties, go to the Details tab, and click 'Remove Properties and Personal Information' at the bottom. You can choose to create a sanitized copy or remove properties from the original. This handles most EXIF fields but does not always remove all XMP data.

**For images on macOS:** Preview does not strip metadata thoroughly. The reliable option is ImageOptim (free, open-source), which removes EXIF, IPTC, and XMP data while also compressing the file. Alternatively, export from Photos.app with File > Export and leave the 'Location Information' option unchecked, so GPS data is not written to the exported copy.

**For Word and Excel files:** Go to File > Info > Check for Issues > Inspect Document. The Document Inspector will scan for and offer to remove comments, revisions, personal information, hidden data, and embedded content. Run it before every external send. Note that removing revision history is irreversible, so keep your own copy with history intact if you need it.

**For PDFs:** Adobe Acrobat Pro (paid) has a Redact > Sanitize Document function that strips metadata, hidden layers, embedded content, and scripts. It's the most thorough option for PDFs. If you don't have Acrobat, printing to a new PDF (for example, with the Save as PDF option in the macOS print dialog) removes most of the original metadata, though not always all of it, and the new file records its own producer string, which can include your OS version.

**Using CocoConvert:** When you convert a file through CocoConvert, say a DOCX to PDF or a JPEG to PNG, the conversion process itself discards most format-specific metadata because we're rebuilding the file in a new container. EXIF GPS data, Word author fields, and revision history generally do not survive a format conversion intact.
This is a practical side effect of conversion, not a dedicated privacy tool, and we want to be clear about that distinction: CocoConvert is not a metadata sanitizer, and you should not rely on conversion alone as your only privacy measure for highly sensitive documents. For those, use the dedicated tools described above first, then convert if needed.
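To show what 'stripping' means at the byte level, here is a minimal Python sketch that copies a JPEG while dropping the APP1 (EXIF and XMP), APP13 (Photoshop/IPTC), and COM (comment) segments, without re-encoding the image. It is an illustration, not a replacement for the tools above: it assumes every metadata segment sits before the first start-of-scan marker (the usual layout), and `strip_jpeg_metadata` and `DROP_MARKERS` are hypothetical names.

```python
import struct

# Segments to drop: APP1 (EXIF/XMP), APP13 (Photoshop/IPTC), COM (free-text comments).
# Everything else, including APP0 (JFIF) and APP2 (ICC color profile), is kept.
DROP_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with metadata segments removed and pixel data untouched."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: copy the rest (scan header, compressed pixels, EOI) verbatim.
            out += data[pos:]
            return bytes(out)
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        segment = data[pos : pos + 2 + length]  # marker + length field + payload
        if marker not in DROP_MARKERS:
            out += segment
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

One consequence worth knowing: because the whole EXIF block goes, so does the Orientation tag, and a photo that relied on it may display rotated afterwards. ExifTool, for example, can remove metadata while copying selected tags such as Orientation back into the file.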

What Conversion Does (and Doesn't) Remove

Because CocoConvert is a file conversion service, it's worth being specific about what happens to metadata during a typical conversion job.

When you convert a JPEG to PNG, the output file is a new PNG encoded from the pixel data. PNG has its own metadata chunks (tEXt, iTXt, zTXt), but they are not automatically populated from the source EXIF. In practice, GPS coordinates, camera model, and lens data do not appear in the converted PNG. The same applies to JPEG-to-WebP conversions.

When you convert a DOCX to PDF, the resulting PDF is generated from the rendered content of the document. The author field in the PDF's XMP metadata will typically reflect the conversion software, not the original Word document author. Tracked changes and revision history are not included because the PDF represents a final rendered state.

However, there are important caveats. If your source file contains embedded images with their own EXIF data, and the output format supports embedded images (as PDFs do), those embedded images may retain their original metadata. A PDF created from a Word document that contained inserted photos might still carry GPS data from those photos inside the PDF container. Also, some metadata is semantic rather than format-specific. If your document's text contains your address, phone number, or other identifying information, no conversion process will remove that: it's content, not metadata.

For audio files, converting MP3 to AAC will not carry ID3 tags from the source unless the conversion tool explicitly copies them. CocoConvert does not copy ID3 tags by default.

The honest summary: conversion through CocoConvert reduces metadata exposure significantly for most common use cases, but it is not a substitute for explicit metadata removal when the stakes are high.
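One way to check the JPEG-to-PNG claim on a converted file is to list its chunks. This minimal Python sketch (standard library only) walks a PNG's chunk sequence, verifies each chunk's CRC, and returns any ancillary chunks that can carry descriptive data: tEXt, iTXt, and zTXt for text, eXIf for an embedded EXIF block, and tIME for a last-modification timestamp. `png_metadata_chunks` and `METADATA_CHUNKS` are hypothetical names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunk types that can carry descriptive metadata.
METADATA_CHUNKS = {b"tEXt", b"iTXt", b"zTXt", b"eXIf", b"tIME"}


def png_metadata_chunks(data: bytes) -> list[tuple[str, bytes]]:
    """Return (chunk_type, raw_payload) for every metadata-bearing chunk in a PNG."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    found = []
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos : pos + 8])
        payload = data[pos + 8 : pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length : pos + 12 + length])
        if zlib.crc32(ctype + payload) != crc:  # CRC covers type and payload, not length
            raise ValueError(f"CRC mismatch in {ctype!r} chunk")
        if ctype in METADATA_CHUNKS:
            found.append((ctype.decode("ascii"), payload))
        if ctype == b"IEND":
            break
        pos += 12 + length
    return found
```

An empty list means none of those chunks are present in the file. It says nothing about what is visible in the image itself, which, as noted above, is content rather than metadata.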

Metadata in Professional and Legal Contexts

If you work in law, finance, healthcare, or any regulated industry, metadata is not just a privacy concern; it can be a compliance issue.

Under HIPAA, protected health information (PHI) includes metadata that could identify a patient. A medical image with GPS coordinates pointing to a specific clinic, combined with a patient name in the EXIF artist field, constitutes PHI even if the image content itself is anonymized. The HHS Office for Civil Rights has issued guidance specifically noting that metadata must be considered when de-identifying records.

In legal proceedings, metadata is discoverable. Federal Rule of Civil Procedure 34 in the US explicitly covers electronically stored information (ESI), and courts have consistently held that metadata is part of that ESI. If you're involved in litigation and you alter or strip metadata from documents after a litigation hold is in place, that can constitute spoliation of evidence, a serious legal problem.

For journalists and their sources, metadata hygiene is a physical safety issue. The Freedom of the Press Foundation maintains SecureDrop, an anonymous submission system run by The New York Times, The Washington Post, The Guardian, and many other outlets. SecureDrop protects the submission channel, but it does not automatically clean the files themselves, which is why its guidance tells sources and journalists to remove metadata from documents. If you are a source submitting sensitive documents, assume that any file you send carries identifying metadata unless you have explicitly removed it.

In corporate M&A due diligence, metadata on documents exchanged in data rooms can reveal negotiating positions, internal valuations, and the identities of advisors, information that sophisticated counterparties will look for. Major law firms now include metadata review as a standard step in document preparation for transactions.

For most people, the professional stakes are lower. But the principle is the same: know what your files contain before they leave your control.

A Practical Checklist Before You Share Any File

Rather than memorizing every format-specific rule, a short checklist covers the majority of situations.

**1. Identify the file type and its metadata risks.** Photos carry GPS and camera data. Office documents carry author, revision, and comment data. PDFs carry creation software, author, and sometimes source paths. Audio files carry ID3 tags. Video files carry GPS, device model, and creation timestamps.

**2. Assess your audience.** Sending a photo to a close friend is different from posting it publicly or sending it to someone you're in a dispute with. Scale your effort to the actual risk.

**3. Use the right tool for the job.** For images, use Windows' built-in property remover or ImageOptim on Mac. For Office files, run the Document Inspector. For PDFs, use Acrobat's Sanitize function or re-print to PDF. For bulk jobs or format changes, CocoConvert's conversion process will incidentally remove most format-specific metadata as a byproduct.

**4. Verify the output.** After stripping or converting, check the result. On Windows, right-click > Properties > Details. On Mac, open the file in Preview, choose Tools > Show Inspector, and look at the EXIF tab. For a complete dump, use ExifTool from the command line: 'exiftool -a -u -g1 filename.jpg' (-a keeps duplicate tags, -u includes unknown tags, and -g1 groups the output by where each tag is stored). Don't assume the strip worked; confirm it.

**5. Remember that content is not metadata.** If your document text, image content, or audio recording contains sensitive information, no metadata tool addresses that. Review the actual content independently.

**6. For high-stakes situations, use dedicated tools.** MAT2 (Metadata Anonymisation Toolkit 2) is an open-source tool used by security professionals that handles dozens of file formats and is more thorough than most consumer options. It's available on Linux and via the Tails operating system, which is designed for high-risk use cases.

Metadata is not inherently dangerous. It's a feature that became a liability when files started traveling far beyond the environments they were created in.
Understanding what your files carry, and taking thirty seconds to remove it before sharing, is a small habit with a disproportionate effect on your privacy.

What Is File Metadata? (And Why You Should Strip It Before Sharing) | CocoConvert Blog