What Is EPUB? The Open eBook Standard
The Basics: What EPUB Actually Is
EPUB stands for Electronic Publication, but in practice it is a universal standard for digital books. It's an open format, managed by the World Wide Web Consortium (W3C) since it took over from the IDPF in 2017. At its heart, an .epub file is just a ZIP archive. Inside, you'll find the building blocks of a modern webpage: HTML for the text, CSS for styling, images, and some XML files to orchestrate how it all comes together in your e-reader.
The key difference between EPUB and something like a PDF is its reflowable layout. This is its superpower. The text automatically wraps and resizes to fit any screen, whether it's a tiny 6-inch Kobo, a spacious iPad, or a huge desktop monitor. As a reader, you have control. You can change the font size, typeface, spacing, and even the background color, and the book just adapts. Because it stores text as actual text, not as static images of words like a scanned PDF, a 400-page novel can be a tiny 500 KB file.
The format has evolved over time. EPUB 2, from 2007, laid the groundwork. Then came EPUB 3, first finalized in 2011 and updated as recently as version 3.3 in 2023. This modern version brought in a ton of web technologies: HTML5, CSS3, JavaScript for interactivity, MathML for complex equations, and even embedded audio and video. It also introduced powerful accessibility features like ARIA landmarks. While most modern devices and apps handle EPUB 3 just fine, you'll still find some older e-readers that ignore the newer features and render the book as simple EPUB 2-style text.
Let's get one thing straight: EPUB is not the same as MOBI or AZW3. Those are Amazon's proprietary Kindle formats. Buy a book from Amazon, and you get a file locked to their ecosystem. Buy from Kobo, Google Play Books, Apple Books, or nearly any independent bookstore, and you're getting an EPUB.
Inside an EPUB File: The Structure Explained
Here's a neat trick: take any .epub file, change its extension to .zip, and extract it. What you'll find inside is a perfectly organized folder structure. Right at the top, you'll always see a file named `mimetype`. This tiny file contains just one line, `application/epub+zip`, and it has to be the very first thing in the archive, uncompressed. This lets software instantly recognize the file as an EPUB without having to dig through it.
Next, look inside the `META-INF` folder. You'll find a `container.xml` file. Its only job is to point to the main package document, which is usually called `content.opf` or `package.opf`. This OPF file is the book's central nervous system. It's a master list of every content file, it defines the chapter reading order, and it holds all the crucial metadata: title, author, language, ISBN, publication date, and publisher.
The book's actual content, the text and images, lives in a folder usually called `OEBPS` or `Content`. This is where you'll find the individual XHTML files for each chapter, the CSS files that control the book's look and feel, and a directory for images. You'll also spot a `toc.ncx` file (for older EPUB 2 readers) and a `nav.xhtml` file (for modern EPUB 3). These two files power the table of contents you use to jump between chapters on your e-reader.
So why does this structure matter? Because if an EPUB is broken, you can often fix it yourself. Anyone who has ever been defeated by a glitchy file knows the frustration. With EPUB, you can pop the hood. Just open the archive, find the bad XHTML file, fix the code in a text editor, and then re-zip it all (remembering to put `mimetype` in first, uncompressed!) before renaming it back to .epub. There's a real satisfaction in that. You can even use free tools like the W3C's EPUBCheck to pinpoint the exact file and line number causing the problem.
For developers, this open structure is also what makes EPUB so flexible. Want to add a custom font? Just drop a `.woff2` file into the archive and call it from your CSS using a standard `@font-face` rule.
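The re-zip rule and the `container.xml` lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not a full EPUB builder: the helper names `write_epub` and `find_package_path` are made up for this example, and the `<package/>` placeholder is not a valid OPF, so the result shows only the container layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub(target, files):
    """Zip `files` (archive path -> text) into `target` using the EPUB layout.

    `target` can be a file path or a writable binary file object.
    """
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype entry goes in first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # Everything else can be compressed as usual.
        for name, text in files.items():
            archive.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)


def find_package_path(source):
    """Return the path of the package document that container.xml points to."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", {"c": CONTAINER_NS})
    return rootfile.get("full-path")


# Build a skeleton archive in memory and inspect it.
buffer = io.BytesIO()
write_epub(buffer, {
    "META-INF/container.xml": CONTAINER_XML,
    "OEBPS/content.opf": "<package/>",  # placeholder, not a real OPF
})
with zipfile.ZipFile(buffer) as archive:
    first_entry = archive.infolist()[0]

print(first_entry.filename, first_entry.compress_type == zipfile.ZIP_STORED)
print(find_package_path(buffer))
```

When repairing a real book, the same ordering applies: write `mimetype` first and stored, then add the edited files, then run the result through EPUBCheck.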
EPUB vs. PDF: Choosing the Right Format
The EPUB vs. PDF debate is a classic, but it's based on a false premise. They aren't really competitors; they're tools designed for completely different jobs. One isn't 'better' than the other. They just excel in different contexts.
PDF is all about preserving a fixed visual layout. Think of academic papers with two columns, glossy magazine spreads, or government forms that need to be filled out. These *must* be PDFs. The page dimensions are locked, fonts are embedded, and the document you see on screen is exactly what will come out of the printer. That static predictability is PDF's entire reason for being.
EPUB, on the other hand, prioritizes readability on any screen. Novels, long-form articles, and manuals you need to read on your phone are perfect for EPUB. Its reflowable text means a reader can crank the font up to 24pt for better visibility, and the words simply rearrange themselves to fit. Try that with a PDF, and you're stuck in a nightmare of pinching, zooming, and horizontal scrolling that makes reading all but impossible.
Sometimes, the platform makes the choice for you. Apple Books on iOS and macOS is built for EPUB; while it can *open* a PDF, you lose all the best reading features like font controls, night mode, and cross-device syncing. Amazon's Kindle ecosystem is the opposite. Kindle devices don't open EPUB files natively. You either have to convert your EPUBs to AZW3 or use the Send to Kindle service, which does the conversion on Amazon's servers.
When it comes to accessibility, a well-made EPUB 3 is unbeatable. Screen readers can use the book's semantic HTML structure to navigate by chapter, heading, or landmark. While a 'tagged PDF' can theoretically do this, in the real world, that tagging is often broken or missing entirely. The EPUB Accessibility 1.1 spec gives publishers a clear standard to aim for.
The one caveat is fixed-layout EPUB. While the format exists, reader support is a minefield. My advice? If you absolutely need pixel-perfect layout, stick with PDF and make it as accessible as you can. Don't try to force EPUB into a role it wasn't built for.
DRM, Distribution, and What 'Open' Actually Means
When we say EPUB is an 'open standard,' it means the blueprint is free for anyone to use. The specification is public, it costs nothing to implement, and no single company owns it. This is why a healthy ecosystem of EPUB apps has flourished. You have a huge range of choices, from power-user tools like Calibre and Thorium Reader to the built-in apps from Apple, Google, and Kobo, plus niche options like Foliate for Linux.
But 'open format' does not mean 'DRM-free.' This is a crucial distinction. Publishers and retailers frequently wrap their EPUB files in a layer of Digital Rights Management before selling them. The most common system is Adobe's ADEPT DRM, which you'll find on ebooks borrowed from public libraries via OverDrive or Libby. Kobo and Apple have their own proprietary DRM as well. The resulting file is still an EPUB under the hood, but it's a locked one that can only be opened on authorized devices with authorized apps.
For file conversion, DRM is a brick wall. CocoConvert can easily convert unprotected EPUBs to and from PDF, DOCX, HTML, and other formats. But it cannot, and will not, touch a DRM-protected file. Trying to strip DRM to enable conversion is illegal under laws like the DMCA in the U.S. and the EU Copyright Directive. If you own a book with DRM and want to read it on a different device, your only legal options are to see if the store offers a DRM-free download or to simply use the retailer's designated app.
The good news is that DRM-free EPUBs are more common than you might think. Major publishers like Tor Books and O'Reilly have built their reputations on selling DRM-free files. Most open-access academic publishers do, too. You can also find them on stores like Smashwords and Humble Bundle, or by buying directly from an author's website. These are the files you truly own: you can back them up, convert them, and read them in any app you choose, forever.
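Before attempting a conversion, it can help to check whether a file looks protected. The sketch below is only a heuristic, and it assumes the DRM scheme leaves one of the reserved files `META-INF/rights.xml` or `META-INF/encryption.xml` in the archive. That assumption does not hold for every scheme, and `encryption.xml` can also appear in unprotected books that merely obfuscate embedded fonts. The function name `drm_hints` is hypothetical.

```python
import io
import zipfile

# Reserved container file names that DRM schemes commonly use.
# Heuristic only: encryption.xml may also describe plain font obfuscation.
DRM_HINT_FILES = ("META-INF/rights.xml", "META-INF/encryption.xml")


def drm_hints(source):
    """Return the DRM-related file names present in the EPUB at `source`."""
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
    return [name for name in DRM_HINT_FILES if name in names]


# Demo with two tiny in-memory archives standing in for real books.
plain = io.BytesIO()
with zipfile.ZipFile(plain, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")

flagged = io.BytesIO()
with zipfile.ZipFile(flagged, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/rights.xml", "<rights/>")  # stand-in content

print(drm_hints(plain))
print(drm_hints(flagged))
```

An empty list does not prove a file is DRM-free, and a non-empty one only signals that the book deserves a closer look before you try to convert it.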
Creating and Editing EPUB Files
Creating an EPUB from scratch can be as simple or as complex as you want it to be, depending on your tools. For those comfortable with basic HTML, the free and open-source editor Sigil is the classic starting point; it has a visual interface and a built-in validator to catch errors. Self-publishing authors on macOS often swear by Vellum, a paid app that produces beautifully formatted books from templates, though it comes with a hefty $199.99 price tag. And many writers already use Scrivener, which can compile a manuscript directly to EPUB 3 right from its `File > Compile` menu.
Developers and technical writers have their own set of powerful tools. Sphinx, the engine behind much of Python's documentation, can generate an EPUB 3 file just as easily as it generates HTML or a PDF. Then there's Pandoc, the Swiss Army knife of document conversion. It can create an EPUB from almost anything (Markdown, DOCX, LaTeX) with a simple command-line instruction like `pandoc input.docx -o output.epub --epub-cover-image=cover.jpg`.
Editing an existing EPUB is where things get interesting. If you've got a file with wonky formatting or chapters in the wrong order, you can use Sigil to pop the hood. Its Book Browser shows you the entire file structure, letting you dive into the specific XHTML or CSS file to fix the problem directly. Calibre also has a powerful ebook editor that offers similar features.
For just tweaking metadata, though, nothing beats Calibre's main interface. Fixing an author's name, adding a series tag, or correcting a publication year is a quick right-click away. It can even fetch correct metadata automatically using an ISBN, which is a massive time-saver.
Be warned, however: if you're trying to edit a fixed-layout EPUB, like a children's picture book or a complex magazine layout, you're in for a challenge. These files often use intricate CSS and JavaScript that can't be untangled with a simple visual editor. You'll need a deep understanding of the EPUB spec and web development to make changes without breaking everything.
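If you convert manuscripts regularly, the Pandoc command quoted above is easy to wrap in a script. The following is a small sketch that assumes `pandoc` is installed and on your PATH. The function names are illustrative, and only the `-o` and `--epub-cover-image` options from the command above are used.

```python
import shutil
import subprocess


def build_pandoc_command(source, output, cover_image=None):
    """Return the argument list for a basic Pandoc-to-EPUB conversion."""
    command = ["pandoc", source, "-o", output]
    if cover_image:
        command.append(f"--epub-cover-image={cover_image}")
    return command


def convert_to_epub(source, output, cover_image=None):
    """Run Pandoc, failing early if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    # check=True raises CalledProcessError if Pandoc exits with an error.
    subprocess.run(build_pandoc_command(source, output, cover_image), check=True)


# Show the command that would be run, without running it.
print(build_pandoc_command("input.docx", "output.epub", "cover.jpg"))
```

Keeping the command builder separate from the runner makes it easy to log or test the exact arguments before any file is touched.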
Converting EPUB Files: What Works and What Doesn't
Converting EPUB files is a common task, but the quality of the result depends entirely on what you're converting from and to. It's not a one-size-fits-all process.
Converting from EPUB to PDF is usually a safe bet, especially for text-focused books. A tool like CocoConvert will render the EPUB's content into a clean, paginated PDF that's perfect for printing or archiving novels and reports. The process hits a snag with more complex files. Fancy CSS layouts, non-embedded fonts, and any JavaScript-based interactivity from an EPUB 3 file will be lost in translation to a static PDF. The layout might even break, requiring you to clean it up manually.
Turning an EPUB into a DOCX file is the best way to get the text into Microsoft Word for editing. The conversion will preserve the essential structure (headings, paragraphs, bold and italics, basic images), but that's about it. Don't expect fancy CSS, drop caps, or custom layouts to survive the trip. The best way to think about the resulting DOCX file is as a clean, editable draft, not a finished, formatted document.
Going from PDF to EPUB is by far the hardest conversion, a true 'your mileage may vary' situation. If the PDF was exported from a text-based source like Word, a converter like CocoConvert can often extract the text cleanly and structure it into a usable EPUB. But if you have a scanned PDF, which is just a collection of images of pages, you're in for a much rougher ride. This requires Optical Character Recognition (OCR) to turn those images back into text, a process that is never perfect. CocoConvert's OCR is good, but its accuracy hinges on the scan quality. Even with a crisp, 300 DPI scan, 98% character accuracy means roughly one wrong character in every fifty. At around 1,800 characters per page, that adds up to thousands of errors in a 300-page book that you'll have to find and fix.
Finally, converting HTML to EPUB is usually straightforward, with one big caveat: garbage in, garbage out. If your source is clean, semantic HTML, like a well-structured web article, it will map beautifully into EPUB chapters. If you feed the converter a mess of tangled HTML with inline styles and layouts built from tables, you're going to get a messy, tangled EPUB on the other end.
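To get a feel for what an OCR accuracy figure means in practice, the arithmetic can be run for your own book. The sketch below assumes about 1,800 characters per page, which is a rough guess for a typical novel rather than a measured value, and the function name is illustrative.

```python
def expected_ocr_errors(pages, chars_per_page, accuracy):
    """Estimate how many characters OCR gets wrong at a given accuracy."""
    total_chars = pages * chars_per_page
    return round(total_chars * (1.0 - accuracy))


# A 300-page book at an assumed 1,800 characters per page.
print(expected_ocr_errors(300, 1800, 0.98))   # 10800 wrong characters
print(expected_ocr_errors(300, 1800, 0.999))  # 540 wrong characters
print(expected_ocr_errors(1, 1800, 0.98))     # 36 wrong characters on one page
```

Even at 99.9% accuracy a full-length book still needs a proofreading pass, which is why scan quality matters so much.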
EPUB Accessibility and the Current State of the Standard
Accessibility is where EPUB 3 truly shines, and it's arguably the format's single most important feature. By building on web standards, it supports semantic HTML5 elements (`nav`, `aside`, etc.), ARIA roles for assistive tech, proper alt text for images, and metadata that defines a logical reading order. This ensures that a screen reader navigates the book as the author intended, not just following the visual layout on the page.
This isn't just a loose collection of best practices. The official EPUB Accessibility 1.1 specification (a W3C Recommendation as of May 2023) lays out the concrete requirements. An accessible EPUB must have a complete table of contents, a logical reading order, alt text, and proper color contrast. Conforming publishers can even embed metadata in the file to certify that they meet a specific standard, like WCAG 2.1 AA.
In the real world, however, the quality of EPUB accessibility is all over the map. Major academic and trade publishers have gotten much better, thanks to legal and regulatory pressure from things like the Marrakesh Treaty and the European Accessibility Act (which took full effect in June 2025). But a huge number of books, especially from smaller presses and self-published authors, are still released with glaring accessibility holes: missing alt text, no declared reading order, and incomplete navigation. The spec is only as good as its implementation.
For readers who need these features, the choice of app matters. On desktop, the free Thorium Reader is the gold standard for accessibility, with excellent support for text-to-speech, sentence highlighting, and navigation by ARIA landmarks. On mobile, Apple Books on iOS does a very good job of honoring accessible EPUB features when used with the VoiceOver screen reader.
The work isn't done. The W3C's EPUB working group is still actively developing the standard. Right now, they're focused on improving support for audiobooks, providing clearer guidance on using scripts, and tackling the thorny problem of fixed-layout accessibility. That last one is a particularly tough nut to crack, and the spec doesn't have a perfect solution yet.
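Missing alt text is one of the easier accessibility holes to check for yourself. The sketch below scans a single XHTML chapter for `img` elements that have no `alt` attribute at all. It assumes the chapter is well-formed XHTML without named HTML entities such as `&nbsp;`, which the standard library parser does not resolve, and the function name is illustrative. An empty `alt=""` is treated as acceptable, since that is how decorative images are marked.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def images_missing_alt(xhtml_text):
    """Return the src of every <img> that has no alt attribute."""
    root = ET.fromstring(xhtml_text)
    return [
        img.get("src", "")
        for img in root.iter(f"{{{XHTML_NS}}}img")
        if img.get("alt") is None
    ]


# A made-up chapter with one described, one decorative, and one unlabeled image.
SAMPLE_CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<img src="images/map.png" alt="Map of the island"/>
<img src="images/divider.png" alt=""/>
<img src="images/chart.png"/>
</body>
</html>"""

print(images_missing_alt(SAMPLE_CHAPTER))  # ['images/chart.png']
```

A check like this only finds absent attributes. Whether the alt text that is present actually describes the image still takes a human reader, and a full audit is better left to dedicated accessibility checkers.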