
What Is File Metadata? (And Why You Should Strip It Before Sharing)

2026-05-17 9 min read

What File Metadata Actually Is

Every file you create has two parts: the content you see, and a hidden layer of data that most software adds automatically. That hidden layer is metadata: structured information about the file rather than what's in it. The prefix comes from the Greek 'meta', meaning 'about', so metadata is literally data about data.

A JPEG photo isn't just pixels. It also contains EXIF metadata recording the camera model, lens, shutter speed, ISO, and, most importantly, the exact GPS coordinates where you took it (if location tagging was on). A Word document quietly stores your name, your company, how many minutes you spent editing, and, if Track Changes was on, a record of deleted text.

The format depends on the file type. Images often use the EXIF (Exchangeable Image File Format) and IPTC (International Press Telecommunications Council) standards. PDFs have their own document properties and can also embed XMP (Extensible Metadata Platform). Office files like DOCX and XLSX are actually ZIP archives that keep their metadata in XML files, chiefly docProps/core.xml (author and dates) and docProps/app.xml (company and total editing time). MP3 files use ID3 tags to hold album art and track info. Video files use a mix of container-level (MOV, MP4) and codec-level data.

None of this is a conspiracy. Software engineers add metadata for good reasons: photo apps use it to sort your pictures by date and location, and your music player needs ID3 tags to show album art. The trouble starts when these files leave the nest and travel far beyond their original context.
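To make "hidden layer" concrete: in a JPEG, EXIF, XMP, and IPTC data sit in their own marker segments ahead of the compressed pixels, so a reader can find them without decoding the image at all. The sketch below is a simplified scanner, assuming a well-formed baseline JPEG. `list_metadata_segments` is a hypothetical helper written for this article, not part of any library, and real tools like ExifTool handle far more edge cases.

```python
import struct

# Payload prefixes that identify what an APP1 segment holds
EXIF_PREFIX = b"Exif\x00\x00"
XMP_PREFIX = b"http://ns.adobe.com/xap/1.0/\x00"


def list_metadata_segments(data: bytes) -> list[str]:
    """Label the metadata segments that appear before a JPEG's pixel data."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG (missing SOI marker)")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, so stop
            break
        # Big-endian segment length; it counts its own two bytes but not the marker
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(EXIF_PREFIX):
            found.append("EXIF (APP1)")
        elif marker == 0xE1 and payload.startswith(XMP_PREFIX):
            found.append("XMP (APP1)")
        elif marker == 0xED:
            found.append("IPTC/Photoshop (APP13)")
        elif marker == 0xFE:
            found.append("Comment (COM)")
        pos += 2 + length  # skip marker (2 bytes) plus the segment body
    return found
```

Call it on the bytes of a photo, for example `list_metadata_segments(open("photo.jpg", "rb").read())`. An empty list means none of these segment types were found before the image data.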

The Specific Data Fields That Can Expose You

Let's be clear: not all metadata is a problem. Knowing a file was saved at 96 DPI is useless trivia. But some common metadata fields have serious privacy and security consequences.

GPS coordinates in photos are the most famous example. Take a photo on your iPhone with Location Services enabled for the Camera, and iOS embeds your precise latitude and longitude in the file's EXIF tags. Post that photo online, and anyone with a free tool like ExifTool or Jeffrey's Exif Viewer can work out where you live, work, or hang out. This isn't hypothetical. In 2012, Vice published a photo of John McAfee, then on the run, with the GPS coordinates still embedded in the file, revealing that he was in Guatemala.

Author and organization fields in Office files are typically filled in from your Office user settings and installation details. If you draft a contract, the file's internal XML will likely list your full name and company. Send that to the other side in a negotiation, and they know who wrote the first draft and when. Revision history and tracked changes are another landmine, potentially exposing deleted text, private comments, and the names of every editor. Law firms have sent opposing counsel documents with their strategy accidentally revealed in the tracked changes.

For PDFs, the metadata (the document information dictionary and the XMP block) can name the software and version used to create the file, which hints at what an attacker might target, along with the author and sometimes the original file name or path, like `C:\Users\sarah.johnson\Documents\ClientProposals\AcmeCorp_draft3.pdf`. That path alone gives away an employee's name and your internal folder structure.

And don't forget embedded thumbnail previews. In some RAW images and older Office files, these can show a snapshot of the document from an earlier stage, meaning content you thought you deleted might still be visible.
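Why are GPS tags so easy to exploit? EXIF stores each coordinate as three rational numbers (degrees, minutes, seconds) plus a hemisphere reference such as N or W, and turning that into the decimal form any map accepts is simple arithmetic. A minimal sketch, assuming your metadata reader hands back each value as a (numerator, denominator) pair; `dms_to_decimal` is a hypothetical name, and the coordinates below are made-up example values.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style GPS rationals to signed decimal degrees.

    dms: three (numerator, denominator) pairs for degrees, minutes, seconds.
    ref: the matching GPSLatitudeRef/GPSLongitudeRef value ("N", "S", "E", "W").
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)


# Example values in the shape a metadata reader might return
lat = dms_to_decimal(((40, 1), (44, 1), (5436, 100)), "N")
lon = dms_to_decimal(((73, 1), (59, 1), (850, 100)), "W")
print(f"{lat:.6f}, {lon:.6f}")  # 40.748433, -73.985694
```

The exact fractions avoid floating-point drift until the final conversion, though for a map pin either approach is precise enough.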

Who Actually Reads File Metadata (And How)

You might think reading metadata requires some kind of hacker skill. It doesn't. Free, common tools make it ridiculously easy. ExifTool by Phil Harvey is the gold standard: it runs on every major OS and reads metadata from over a hundred file formats. Type 'exiftool filename.jpg' in a terminal, and you'll see everything. If you prefer a GUI, there are graphical front ends and browser-based tools like Jimpl.com or MetaPicz that let you upload a photo and see its data instantly.

For Office documents, it's even simpler. You don't need special software at all. Rename a .docx file to .zip, open the archive, and open docProps/core.xml in a plain text editor. The raw data is right there.

So who is actually looking? More people than you'd think. Journalists check the documents they receive from sources. Lawyers use metadata as evidence; EXIF timestamps can help establish when a photo was really taken, which can undercut a witness's testimony. Corporate spies use it to map out a competitor's organization. Law enforcement relies on it heavily. The BTK serial killer was identified in 2005 partly because a floppy disk he sent to police contained a deleted Word document whose metadata pointed to 'Christ Lutheran Church' and a user named 'Dennis': the killer, Dennis Rader.

This isn't meant to be alarmist. Most people sharing a recipe aren't in danger. But the risk grows with the sensitivity of the content. A freelancer sending a portfolio to a new client has a different risk profile than someone sharing a family photo in a private chat.
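The rename-to-.zip trick translates directly into code: Python's standard `zipfile` and `xml.etree.ElementTree` modules can open a .docx and pull the author fields out of docProps/core.xml without Word installed. This is a sketch; `read_core_properties` is a hypothetical name, and it reads only four common fields. Company name and total editing time live separately in docProps/app.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in docProps/core.xml (OPC core properties plus Dublin Core)
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly label -> namespaced tag inside core.xml
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(path_or_file):
    """Return author and date fields from an Office file's docProps/core.xml."""
    # .docx, .xlsx, and .pptx files are ordinary ZIP archives
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)  # direct child of <cp:coreProperties>
        if element is not None and element.text:
            result[label] = element.text
    return result
```

For example, `read_core_properties("proposal.docx")` might return something like `{"creator": "...", "last_modified_by": "..."}`; missing fields are simply left out.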

How to Strip Metadata Before You Share

Let's get practical. Here's how to strip metadata from your files, platform by platform. It's easier than you think.

**For images on Windows:** Right-click the file, go to Properties, then the Details tab. At the bottom, click 'Remove Properties and Personal Information'. This lets you create a clean copy and handles most EXIF data, though it can miss XMP tags.

**For images on macOS:** Don't rely on the built-in Preview app; it's notoriously unreliable at this. The best choice is ImageOptim, a free, open-source tool that strips EXIF, IPTC, and XMP data while also compressing the file. Alternatively, you can export from the Photos app (File > Export), but first make sure the 'Location Information' option is unchecked in the export dialog.

**For Word and Excel files:** Before you send anything externally, get in the habit of going to File > Info > Check for Issues > Inspect Document. The Document Inspector finds and offers to remove comments, revisions, author information, and other hidden data. This step is essential. Be aware that removing revision history can't be undone, so save a master copy for yourself if you need it.

**For PDFs:** Anyone who has fought with a stubborn PDF knows they have a life of their own. For metadata, the most robust option is Redact > Sanitize Document in the paid Adobe Acrobat Pro. If you don't have Acrobat, a decent workaround is to 'print' the file to a new PDF; macOS's built-in Save as PDF option strips most, but not always all, metadata.

**Using CocoConvert:** When you convert a file with CocoConvert, such as turning a DOCX into a PDF or a JPEG into a PNG, the process naturally sheds most of the original metadata. We build a brand-new file, so things like EXIF GPS data and Word author fields don't get carried over. Think of it as a beneficial side effect, not a dedicated security feature. For truly sensitive files, use a dedicated sanitizer first.
We're a conversion tool, not a forensics tool, and we want to be honest about that.
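For a JPEG, "stripping metadata" can be as simple as leaving segments out. The sketch below is a hypothetical, lossless approach: it copies every marker segment except APP1 (EXIF and XMP), APP13 (IPTC), and COM (comments), then copies the compressed image data untouched. It is not how Windows, ImageOptim, or CocoConvert are implemented, and it ignores other places metadata can hide, such as vendor-specific APP segments.

```python
import struct

# Markers this sketch drops: APP1 (EXIF/XMP), APP13 (IPTC), COM (comments)
DROP = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with metadata segments removed; pixels untouched."""
    if data[:2] != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: everything from here on is image data
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker not in DROP:
            # Keep JFIF header, ICC profile, quantization/Huffman tables, etc.
            out += data[pos:end]
        pos = end
    out += data[pos:]  # SOS header and compressed scan data, copied verbatim
    return bytes(out)
```

Because the pixels are never re-encoded, there's no quality loss. One side effect worth knowing: dropping the EXIF block also drops the Orientation tag, so a phone photo may display rotated afterward.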

What Conversion Does (and Doesn't) Remove

Since CocoConvert is a file converter, let's be specific about what happens to metadata during a job.

When you convert a JPEG to a PNG, we create a new PNG file from the source pixels. PNG has its own places for metadata (the tEXt, iTXt, and zTXt chunks), but we don't copy the original EXIF data into them. In practice, the GPS coordinates, camera model, and lens info from your JPEG don't appear in the final PNG. The same is true for JPEG-to-WebP conversions.

When you convert a DOCX to a PDF, we generate the PDF from the final, rendered look of the document. The author field in the new PDF will usually list the conversion software, not the original author from Word. Tracked changes and revision history are flattened away, because a PDF represents only that one final state.

But there are a few traps. The big one is embedded files. If your source Word document contains an inserted photo with its own EXIF data, that photo might keep its metadata when it's embedded in the final PDF, so the PDF could still contain GPS data from that one picture. Also, this should be obvious, but we'll say it anyway: conversion doesn't remove sensitive information from the content of your file. If your address is typed in the document, it will still be there. That's content, not metadata.

For audio files, converting an MP3 to AAC with CocoConvert does not copy the ID3 tags by default.

The bottom line: conversion with CocoConvert is a good first line of defense that significantly reduces metadata exposure for everyday use. Just don't mistake it for a dedicated, high-security sanitization tool.
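If you want to check a converted PNG yourself, you only need to walk its chunk list: after an 8-byte signature, every chunk is a 4-byte length, a 4-byte type, the data, and a 4-byte CRC. The sketch below flags the text chunks named above, plus `eXIf` (the PNG extension chunk for EXIF data) and `tIME` (a last-modified timestamp). It is an illustrative helper, not CocoConvert code, and `png_metadata_chunks` is a hypothetical name.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types that can carry metadata in a PNG file
METADATA_CHUNKS = {b"tEXt", b"iTXt", b"zTXt", b"eXIf", b"tIME"}


def png_metadata_chunks(data: bytes) -> list[str]:
    """List the metadata-bearing chunk types found in a PNG, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Big-endian 4-byte data length, then the 4-character chunk type
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        if chunk_type in METADATA_CHUNKS:
            found.append(chunk_type.decode("ascii"))
        if chunk_type == b"IEND":  # last chunk in a valid PNG
            break
        pos += 12 + length  # length field + type + data + CRC
    return found
```

An empty list is a good sign, but it only covers these PNG chunks; running ExifTool on the result is still the more complete check.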

Metadata in Professional and Legal Contexts

If you work in law, finance, healthcare, or another regulated industry, metadata isn't just a privacy thought experiment; it's a compliance minefield.

Under HIPAA, for instance, metadata can be part of protected health information (PHI). A medical image might carry GPS data pointing to a clinic and a metadata field containing the patient's name. That combination is PHI, even if the image itself looks anonymized. Because HIPAA's de-identification standard applies to identifiers wherever they appear in a record, metadata has to be reviewed along with the visible content.

In legal proceedings, metadata is discoverable. In the US, Federal Rule of Civil Procedure 34 covers electronically stored information (ESI), and courts have repeatedly held that metadata can be part of that ESI. If you're under a duty to preserve documents for litigation and you strip their metadata, that can amount to spoliation of evidence, a serious mistake that can bring sanctions or cost you the case.

For journalists and their sources, this isn't theoretical; it's a matter of physical safety. Systems like SecureDrop, maintained by the Freedom of the Press Foundation and used by outlets including The New York Times and The Guardian, protect how a file is delivered, but the file itself can still carry identifying metadata, so sources are advised to scrub files before submitting them. If you are a source, assume every file you send is tagged with your identity unless you have personally scrubbed it.

In corporate mergers and acquisitions, metadata in a data room can reveal negotiating strategies, private valuations, and advisor identities. Smart counterparties look for this information, and many law firms treat metadata review as a standard step in any transaction.

For most of us, the professional stakes are lower. But the principle is the same: know what your files are saying about you before they leave your control.

A Practical Checklist Before You Share Any File

You don't need to memorize every obscure rule. For most situations, this checklist is all you need before you hit 'send' or 'upload'.

**1. Identify the file type and its metadata risks.** Just remember the big ones. Photos can carry GPS data. Office documents can carry author names and revision history. PDFs can carry author data and original file paths. Audio files carry ID3 tags. Video files can carry GPS, device model, and creation timestamps.

**2. Assess your audience.** Who is this for? Sending a family photo to your mom is low-risk. Posting a photo to a public forum or sending a proposal to a new client is higher risk. Match your effort to the actual threat.

**3. Use the right tool for the job.** For images, use the built-in property remover on Windows or ImageOptim on a Mac. For Office files, run the Document Inspector. For PDFs, use Acrobat's Sanitize function or re-print to PDF. For bulk jobs or format changes, CocoConvert's conversion process incidentally removes most format-specific metadata as a byproduct.

**4. Verify the output.** After stripping or converting, check the result. On Windows, right-click > Properties > Details. On a Mac, open the file in Preview and go to Tools > Show Inspector, then look for the EXIF tab. For a complete dump, use ExifTool from the command line: 'exiftool -a -G1 filename.jpg' lists every tag, including duplicates, along with the group each one came from. Don't assume the strip worked; confirm it.

**5. Remember that content is not metadata.** This is critical. No tool will remove your Social Security number if you've typed it into the document. That's a content problem, and you need to review the visible parts of your file separately.

**6. For high-stakes situations, use dedicated tools.** MAT2 (Metadata Anonymisation Toolkit 2) is an open-source tool that handles dozens of file formats and is more thorough than most consumer options. It's available on Linux and is included in the Tails operating system, which is designed for high-risk use cases.

Metadata isn't evil.
It's a useful feature that became a liability when our files started traveling the globe at the click of a button. Understanding what your files carry—and taking 30 seconds to clean them before sharing—is a small habit that dramatically improves your privacy.