What Is EPUB? The Open eBook Standard

2026-05-17 9 min read

The Basics: What EPUB Actually Is

EPUB, short for Electronic Publication, is an open standard file format for digital books and publications. It has been maintained by the World Wide Web Consortium (W3C) since the International Digital Publishing Forum (IDPF) merged into the W3C in 2017. The format uses the .epub file extension and is, at its core, a ZIP archive containing HTML files, CSS stylesheets, images, and a handful of XML metadata files that tell reading software how to assemble everything.

What makes EPUB distinct from, say, a PDF is its reflowable layout. The text wraps and resizes to fit whatever screen it lands on: a 6-inch Kobo, a 10-inch iPad, or a desktop app running on a 27-inch monitor. Readers can change the font size, typeface, line spacing, and background color without breaking the document. A 400-page novel in EPUB format might weigh only 500 KB, because it stores text as actual text rather than as rendered images of text the way a scanned PDF does.

EPUB grew out of the earlier Open eBook Publication Structure (OEBPS) and has gone through two major versions since. EPUB 2, released in 2007, established the foundation. EPUB 3, finalized in 2011 and significantly updated through EPUB 3.3 in 2023, added support for HTML5, CSS3, JavaScript interactivity, MathML for equations, audio and video embedding, and robust accessibility features including ARIA landmarks. Most modern e-readers and reading apps support EPUB 3, though some older devices only handle EPUB 2 features reliably.

One thing worth clarifying early: EPUB is not the same as MOBI or AZW3, which are Amazon's proprietary formats. If you buy a book from Amazon, you get a Kindle-specific file. If you buy from Kobo, Google Play Books, Apple Books, or most independent stores, you almost certainly get an EPUB.
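Because an EPUB is an ordinary ZIP archive, Python's standard zipfile module is enough to see what one contains. The sketch below buckets archive members by file extension; the function name and the extension-to-category table are illustrative assumptions, and a real reading system would rely on the media types declared in the package manifest instead.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Illustrative extension-to-category table (an assumption, not from the spec).
CATEGORIES = {
    ".xhtml": "content", ".html": "content", ".htm": "content",
    ".css": "stylesheet",
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".gif": "image", ".svg": "image",
    ".opf": "xml metadata", ".ncx": "xml metadata", ".xml": "xml metadata",
}


def summarize_epub(source) -> Counter:
    """Count an EPUB's archive members (path or binary file object) by rough category."""
    counts = Counter()
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name.endswith("/"):  # skip explicit directory entries
                continue
            if name == "mimetype":
                counts["mimetype"] += 1
                continue
            suffix = PurePosixPath(name).suffix.lower()
            counts[CATEGORIES.get(suffix, "other")] += 1
    return counts
```

Calling summarize_epub("book.epub"), where book.epub is a placeholder path, returns a Counter you can print or inspect; anything the table doesn't recognize, such as a .woff2 font, lands under "other".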

Inside an EPUB File: The Structure Explained

Rename any .epub file to .zip, extract it, and you'll find a predictable directory structure. At the root there is always a file called mimetype. It contains exactly the string application/epub+zip (with no trailing newline) and must be the first file in the archive, stored uncompressed, so software can identify the format without reading the whole thing.

Inside the META-INF folder sits container.xml, which points to the package document (usually named content.opf or package.opf). That package document is the backbone of the whole operation: its manifest lists every resource in the book, its spine declares the reading order, and its metadata section stores Dublin Core fields such as title, author, language, identifier (often an ISBN), publication date, and publisher.

The actual content lives in a folder that's commonly named OEBPS or Content. Inside you'll find individual XHTML files for each chapter, a CSS file or two for styling, and an images directory. You'll also typically find a nav.xhtml file (the EPUB 3 navigation document) and/or a toc.ncx file (required in EPUB 2, and kept in many EPUB 3 files for backward compatibility). Reading systems use these to generate the table of contents readers see when they tap the TOC button on their device.

This structure matters for a practical reason: if an EPUB is corrupted or misbehaving, you can often fix it manually. Open the archive, locate the malformed XHTML file, correct the markup in a text editor, repackage the ZIP (making sure mimetype goes in first, uncompressed), and rename it back to .epub. EPUBCheck, a free Java-based validator maintained under the W3C, can identify exactly which file and line number contains an error before you even attempt a fix.

For publishers and developers, understanding this structure also explains why EPUB is so extensible. Adding a custom font takes three steps: drop a .woff2 file into the archive, declare it in the package document's manifest, and reference it in CSS with a standard @font-face declaration.
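The structure described above can be walked programmatically. The following is a minimal sketch using only Python's standard library: the hypothetical inspect_epub checks the mimetype rule, follows META-INF/container.xml to the package document, and returns the title plus the spine's reading order, while the equally hypothetical repack zips an extracted folder back up with mimetype written first and uncompressed. It assumes a single rootfile and manifest hrefs that need no URL-decoding, and it is not a substitute for running EPUBCheck.

```python
import os
import zipfile
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def inspect_epub(source):
    """Return (title, reading_order) for an EPUB path or binary file object."""
    with zipfile.ZipFile(source) as zf:
        # The mimetype entry must come first and be stored without compression.
        first = zf.infolist()[0]
        if first.filename != "mimetype" or first.compress_type != zipfile.ZIP_STORED:
            raise ValueError("mimetype must be the first entry, stored uncompressed")
        if zf.read("mimetype") != b"application/epub+zip":
            raise ValueError("unexpected mimetype contents")

        # container.xml points at the package document (.opf).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    title = package.findtext("opf:metadata/dc:title", default="", namespaces=NS)
    # Manifest hrefs are relative to the package document's folder.
    base = PurePosixPath(opf_path).parent
    hrefs = {item.get("id"): item.get("href")
             for item in package.findall("opf:manifest/opf:item", NS)}
    # The spine lists manifest ids in reading order.
    order = [str(base / hrefs[ref.get("idref")])
             for ref in package.findall("opf:spine/opf:itemref", NS)]
    return title, order


def repack(src_dir, out_path):
    """Zip an extracted EPUB folder back up, mimetype first and uncompressed."""
    with zipfile.ZipFile(out_path, "w") as zf:
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname != "mimetype":  # already written above
                    zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

For example, after fixing a chapter in an extracted folder such as book_src/ (a placeholder name), repack("book_src", "fixed.epub") followed by inspect_epub("fixed.epub") confirms the archive still opens and the chapter order is intact. Write the output file outside the source folder so it isn't swept into its own archive.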

EPUB vs. PDF: Choosing the Right Format

The EPUB-versus-PDF question comes up constantly, and the honest answer is that they solve different problems. Neither format is universally better.

PDF preserves exact visual layout. A two-column academic paper, a magazine spread with precise image placement, a tax form with fillable fields: these belong in PDF. The page dimensions are fixed, fonts are embedded or outlined, and what you see on screen is exactly what prints. That predictability is the whole point.

EPUB prioritizes readability across variable screen sizes. A novel, a how-to guide, a corporate report meant to be read on a phone: these benefit from reflowable text. A reader using a large-print setting on an e-reader can bump the font to 24pt and the text reflows cleanly. The same document in PDF would require horizontal scrolling or zooming that makes reading genuinely uncomfortable.

There are cases where the choice is made for you. Apple Books on iOS and macOS reads EPUB natively and does not support PDF with the same feature set: no font customization, no night mode syncing, no reading position sync across devices. Amazon Kindle devices, on the other hand, have never read EPUB natively. You need to convert to AZW3, send a PDF, or use Amazon's Send to Kindle service, which accepts EPUB and converts the file server-side.

For accessibility, EPUB 3 has a clear advantage when implemented correctly. Screen readers like NVDA or VoiceOver can navigate by heading, chapter, or landmark using semantic HTML structure. A tagged PDF can achieve similar results, but the tagging is often incomplete in practice. The EPUB Accessibility 1.1 specification (also a W3C standard) provides a formal checklist publishers can follow and certify against.

One limitation worth noting: fixed-layout EPUB exists as a format option, but support is inconsistent across reading systems. If you need pixel-perfect layout with EPUB, test extensively on your target devices before distributing.
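Whether a given file is fixed-layout is declared in its package document: EPUB 3 uses the rendition:layout property, where pre-paginated means fixed layout and reflowable is the default. A small check like the hypothetical is_fixed_layout below can flag such files before you commit to testing them across devices. It only reads the book-wide setting; per-spine-item overrides and older vendor-specific fixed-layout mechanisms are deliberately not handled in this sketch.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"


def is_fixed_layout(source) -> bool:
    """True if the package document sets rendition:layout to pre-paginated."""
    with zipfile.ZipFile(source) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        package = ET.fromstring(zf.read(rootfile.get("full-path")))
    # Only the book-wide <meta property="rendition:layout"> is checked here;
    # per-item overrides in the spine are ignored in this sketch.
    for meta in package.iter(f"{{{OPF_NS}}}meta"):
        if meta.get("property") == "rendition:layout":
            return (meta.text or "").strip() == "pre-paginated"
    return False  # no declaration means the EPUB 3 default, reflowable
```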

DRM, Distribution, and What 'Open' Actually Means

EPUB being an open standard means the specification is publicly available, free to implement, and not controlled by a single company. Any developer can write an EPUB reader or creator without paying licensing fees. This is why there are dozens of capable EPUB reading applications: Calibre, Thorium Reader, Apple Books, Google Play Books, Kobo's app, Adobe Digital Editions, Foliate on Linux, and many others.

However, 'open format' does not mean 'DRM-free.' Publishers and retailers regularly apply Digital Rights Management to EPUB files before distribution. Adobe's ADEPT DRM is the most common scheme in the library and independent retailer ecosystem; it's what you encounter when you borrow an ebook from a public library through OverDrive or Libby. Kobo and Apple use their own DRM systems. A DRM-protected EPUB is still technically an EPUB, but it can only be opened in authorized applications on authorized devices.

From a conversion standpoint, DRM is a hard wall. CocoConvert can convert unprotected EPUB files to and from PDF, DOCX, HTML, and other formats. It cannot process DRM-protected files, and removing DRM to enable conversion would typically violate the anti-circumvention provisions of laws like the U.S. Digital Millennium Copyright Act and the EU Copyright Directive. If you have a DRM-protected EPUB you legitimately own and want to read on a different device, the correct path is to check whether the retailer offers a DRM-free version, or to use the retailer's own app ecosystem.

DRM-free EPUBs are more common than many readers realize. Publishers like Tor Books, O'Reilly (for technical titles), and most academic open-access publishers distribute without DRM. Stores like Smashwords, Humble Bundle, and many author direct-sales sites sell DRM-free by default. These files can be freely converted, backed up, and read in any application.
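Before attempting a conversion, it can help to detect protection up front rather than failing midway. The heuristic below is a sketch, not a feature of CocoConvert or any other converter: the hypothetical looks_drm_protected reads META-INF/encryption.xml and reports protection only when something other than a font obfuscation algorithm is listed, because DRM-free books also use encryption.xml to obfuscate embedded fonts. It additionally treats a META-INF/rights.xml file, commonly present in Adobe ADEPT-protected files, as a sign of DRM. It inspects metadata only and decrypts nothing.

```python
import zipfile
import xml.etree.ElementTree as ET

XMLENC_NS = "http://www.w3.org/2001/04/xmlenc#"
# Font obfuscation algorithms: these scramble embedded fonts, not the text.
FONT_OBFUSCATION = {
    "http://www.idpf.org/2008/embedding",  # IDPF/W3C font obfuscation
    "http://ns.adobe.com/pdf/enc#RC",       # older Adobe font obfuscation
}


def looks_drm_protected(source) -> bool:
    """Heuristic DRM check; reads metadata only and decrypts nothing."""
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        if "META-INF/rights.xml" in names:  # commonly shipped with Adobe ADEPT
            return True
        if "META-INF/encryption.xml" not in names:
            return False
        root = ET.fromstring(zf.read("META-INF/encryption.xml"))
    # Any encryption algorithm other than font obfuscation suggests DRM.
    return any(
        method.get("Algorithm") not in FONT_OBFUSCATION
        for method in root.iter(f"{{{XMLENC_NS}}}EncryptionMethod")
    )
```

A converter could call a check like this first and return a clear "this file is DRM-protected" message instead of a generic conversion failure.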

Creating and Editing EPUB Files

If you need to create an EPUB from scratch, the tools range from beginner-friendly to developer-grade. Sigil is a free, open-source EPUB editor with a visual interface and built-in validation tools (EPUBCheck itself is available as a Sigil plugin); it's a reasonable starting point for anyone who knows basic HTML. Vellum is a macOS-only paid application favored by self-publishing authors for its polished output and template system, though it costs $199.99 for unlimited ebook exports. Scrivener can compile manuscripts directly to EPUB 3 from its File > Compile window. For technical documentation, Sphinx (the Python documentation generator) can output EPUB 3 alongside HTML and PDF. Pandoc, the universal document converter, handles EPUB creation from Markdown, DOCX, LaTeX, and several other formats via the command line; a typical invocation looks like pandoc input.docx -o output.epub --epub-cover-image=cover.jpg.

Editing an existing EPUB is where things get more nuanced. If you receive an EPUB with formatting errors, missing metadata, or incorrect chapter order, Sigil lets you open the file, navigate to the specific XHTML or CSS file causing the problem, edit it directly, and save. The built-in Book Browser panel shows the full file structure. Calibre's ebook editor (separate from its conversion engine) offers similar functionality with a cleaner interface for metadata editing.

For metadata specifically (fixing an author name, adding a series tag, correcting the publication year), Calibre's metadata editor, opened via Edit metadata in the right-click menu, is usually the quickest route. It can also download metadata automatically, for example by looking up an ISBN against online sources such as Amazon and Google Books.

One realistic limitation: if you receive a fixed-layout EPUB built for a children's picture book or a heavily designed magazine, editing it requires understanding both the EPUB spec and the specific CSS and JavaScript used to create the layout. It's not a task for a simple GUI editor.
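To see how little a valid EPUB 3 strictly needs, here is a hypothetical build_epub function that writes the minimum: the mimetype file (first and uncompressed), container.xml, a package document with an identifier, title, language, and dcterms:modified timestamp, a navigation document, and one chapter. It's a sketch for understanding the format, not a replacement for Sigil or Pandoc; it assumes the chapter body you pass in is already well-formed XHTML, and its output should be checked with EPUBCheck before you rely on it.

```python
import uuid
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def build_epub(out_path, title, chapter_body, lang="en"):
    """Write a one-chapter EPUB 3; chapter_body must be well-formed XHTML."""
    t = escape(title)  # escape &, <, > for use inside XML text
    book_id = f"urn:uuid:{uuid.uuid4()}"
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{book_id}</dc:identifier>
    <dc:title>{t}</dc:title>
    <dc:language>{lang}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""

    def page(head_title, body):
        # Shared XHTML wrapper; the epub namespace is needed for epub:type.
        return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="{lang}" lang="{lang}">
<head><title>{head_title}</title></head>
<body>{body}</body>
</html>"""

    nav = page(t, '<nav epub:type="toc"><ol><li><a href="ch1.xhtml">Chapter 1</a></li></ol></nav>')
    chapter = page(t, chapter_body)

    with zipfile.ZipFile(out_path, "w") as zf:
        # mimetype first and uncompressed; everything else may be deflated.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, text in [("META-INF/container.xml", CONTAINER_XML),
                           ("OEBPS/content.opf", opf),
                           ("OEBPS/nav.xhtml", nav),
                           ("OEBPS/ch1.xhtml", chapter)]:
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
```

Usage: build_epub("hello.epub", "Hello, EPUB", "<h1>Chapter 1</h1><p>It works.</p>") produces a file you can open in most reading apps or feed to EPUBCheck. Real tools add a cover, stylesheets, multiple chapters, and richer metadata on top of this skeleton.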

Converting EPUB Files: What Works and What Doesn't

EPUB conversion is one of the most requested file operations, and the results vary significantly depending on the source and target format.

EPUB to PDF is generally reliable for text-heavy content. CocoConvert handles this conversion by rendering the EPUB's HTML content with consistent typography and generating paginated PDF output. The result works well for novels, reports, and documentation. It gets complicated with complex CSS layouts, custom fonts that aren't embedded, or EPUB 3 content that uses JavaScript for interactivity. That interactivity simply doesn't translate to a static PDF, and the layout may shift in ways that require manual cleanup.

EPUB to DOCX is useful when you want to edit the content in Microsoft Word. The conversion preserves headings, paragraphs, bold and italic formatting, and basic image placement. It does not preserve advanced CSS styling, drop caps, custom fonts, or sidebar layouts. Treat the output as an editable draft, not a finished document.

PDF to EPUB is the hardest direction. A PDF created from a Word document (with actual text data) converts reasonably well: CocoConvert can extract the text and structure it into EPUB chapters. A scanned PDF, or a PDF created by flattening a design file, contains no text data at all, only images of text. Converting that to EPUB requires OCR (optical character recognition), which introduces errors and requires proofreading. CocoConvert applies OCR to scanned PDFs during conversion, but the accuracy depends on scan quality, font clarity, and language. For a cleanly scanned English-language document at 300 DPI or higher, expect 95–98% character accuracy, which still leaves thousands of character errors in a 300-page book.

HTML to EPUB works well when the HTML is clean and semantic. If you're converting a web article or a documentation site, the structure maps naturally to EPUB chapters. Messy HTML with inline styles and table-based layouts produces messy EPUB output.
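The OCR accuracy figures above are easier to judge as absolute numbers. The calculation below assumes about 2,000 characters per page, an illustrative figure rather than a measured one, and treats accuracy as a per-character rate, so the expected error count is simply total characters times (1 minus accuracy).

```python
def expected_ocr_errors(pages: int, chars_per_page: int, accuracy: float) -> int:
    """Expected misrecognized characters, treating accuracy as a per-character rate."""
    total_chars = pages * chars_per_page
    return round(total_chars * (1 - accuracy))


# 2,000 characters per page is an assumption; real pages vary widely.
for acc in (0.95, 0.98):
    print(f"{acc:.0%} accuracy: ~{expected_ocr_errors(300, 2000, acc):,} errors")
# prints:
# 95% accuracy: ~30,000 errors
# 98% accuracy: ~12,000 errors
```

Even at the optimistic end, that's roughly 40 errors per page to find and fix, which is why OCR output should be treated as a draft.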

EPUB Accessibility and the Current State of the Standard

Accessibility is arguably EPUB 3's most important advancement over its predecessor and over PDF as a format. The specification supports semantic HTML5 elements (nav, aside, figure, section), ARIA roles and landmarks, alt text for images, extended descriptions for complex visuals, and reading order metadata that ensures screen readers follow the intended sequence rather than the visual layout.

The EPUB Accessibility 1.1 specification, which became a W3C Recommendation in May 2023, formalizes what an accessible EPUB must include: a complete table of contents, logical reading order, text alternatives for non-text content, and sufficient color contrast. Publishers who claim conformance declare it in the package document's metadata with the dcterms:conformsTo property, stating which WCAG level they meet (typically WCAG 2.1 AA).

In practice, accessibility quality varies enormously. Academic publishers and large trade publishers have improved significantly, driven partly by the Marrakesh Treaty (which requires signatory countries to provide copyright exceptions so that accessible-format copies can be made and shared across borders for print-disabled readers) and by the European Accessibility Act, which came into full effect in June 2025 and applies to ebooks sold in EU markets. Smaller publishers and self-published authors often produce EPUBs with missing alt text, no logical reading order declaration, and navigation documents that don't cover all content.

For readers who rely on assistive technology, Thorium Reader (available for Windows, macOS, and Linux) is widely regarded as having some of the most complete EPUB accessibility support, including TTS (text-to-speech) with sentence highlighting, DAISY support, and proper ARIA landmark navigation. Apple Books on iOS also handles accessible EPUBs well when VoiceOver is active.

The W3C's EPUB working group continues active development. The current focus areas include better support for audiobooks (a separate but related EPUB format), improved scripting guidance, and clarifications around fixed-layout accessibility, a genuinely hard problem that the spec hasn't fully solved yet.
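Conformance claims live in the package document as ordinary meta elements: EPUB Accessibility 1.1 uses dcterms:conformsTo (for example, "EPUB Accessibility 1.1 - WCAG 2.1 Level AA"), alongside schema.org properties such as schema:accessMode and schema:accessibilitySummary. The hypothetical accessibility_metadata function below collects those declarations so you can audit what a file claims. It reports metadata only; it cannot tell you whether the content actually meets the claim, which still takes a checker such as EPUBCheck plus manual review.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"


def accessibility_metadata(source) -> dict:
    """Collect accessibility-related <meta> declarations from the package document."""
    with zipfile.ZipFile(source) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        package = ET.fromstring(zf.read(rootfile.get("full-path")))
    found = {}
    for meta in package.iter(f"{{{OPF_NS}}}meta"):
        prop = meta.get("property", "")
        # dcterms:conformsTo carries the conformance claim; schema:access*
        # and a11y:* properties describe features and certification details.
        if prop == "dcterms:conformsTo" or prop.startswith(("schema:access", "a11y:")):
            found.setdefault(prop, []).append((meta.text or "").strip())
    return found
```

An empty result is itself informative: a file with no dcterms:conformsTo entry makes no formal accessibility claim at all.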

← Browse all articles