
Excel Shows Garbled Characters in CSV? UTF-8 BOM Fix

2026-05-17 8 min read

Why Your CSV Looks Fine Everywhere Except Excel

You export a CSV from your database, your CRM, or a web app. You open it in a text editor and everything looks perfect: accented characters, Japanese kanji, euro signs, all rendering correctly. Then you double-click it in Excel and suddenly you're staring at strings like 'Ã©' instead of 'é', 'Â¥' instead of '¥', or a column of question marks where names used to be. Nothing changed in the file itself.

The problem is Excel. Microsoft Excel, on Windows in particular, does not assume UTF-8 encoding when you open a CSV by double-clicking it. Instead, it falls back to your system's legacy code page, which on most Western Windows installations is Windows-1252 (also called CP1252); on Japanese Windows systems it is Shift-JIS. When a UTF-8 file is interpreted as Windows-1252, every multi-byte character is misread as several separate characters, producing the distinctive garbled output known as mojibake.

This is not a new bug. It has been present in Excel 2010, 2013, 2016, and 2019, and it persists in Microsoft 365 as of 2025 when you simply double-click a .csv file. Microsoft did add UTF-8 detection improvements in some Microsoft 365 builds, but the behavior is inconsistent and depends on your locale settings, your Office version, and sometimes seemingly nothing at all.

The fix is a UTF-8 BOM (Byte Order Mark): a three-byte sequence (0xEF, 0xBB, 0xBF) prepended to the file that signals to Excel, 'this file is UTF-8, read it as such.' Excel respects this signal reliably, even in older versions. The rest of this article explains exactly how to add it, when it causes its own problems, and how CocoConvert can add it for you during conversion.
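
To see the mechanism in isolation, the short Python sketch below encodes text as UTF-8 and then decodes the same bytes with Python's 'cp1252' codec. That is roughly what happens when Excel on a Western-locale Windows machine opens a BOM-less UTF-8 CSV; it illustrates the failure mode and is not a model of Excel's internals. The function name misread_as_cp1252 is ours. The last lines confirm that the UTF-8 BOM is simply the character U+FEFF encoded as EF BB BF.

```python
import codecs

def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252.

    This approximates a program assuming the Western Windows code page for
    a file that is really UTF-8. Bytes cp1252 leaves undefined become U+FFFD.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")

for original in ["é", "¥", "€"]:
    print(original, "->", misread_as_cp1252(original))
# é -> Ã©
# ¥ -> Â¥
# € -> â‚¬

# The UTF-8 BOM is just the character U+FEFF encoded as UTF-8.
bom = "\ufeff".encode("utf-8")
print(bom.hex(" "))            # ef bb bf
print(bom == codecs.BOM_UTF8)  # True
```

Each two-byte UTF-8 character turns into two Windows-1252 characters, and the three-byte euro sign turns into three, which is why mojibake strings are longer than the originals.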

What the BOM Actually Is (and What It Isn't)

The Byte Order Mark originates in Unicode's design for UTF-16 and UTF-32, where byte order (big-endian vs. little-endian) genuinely matters and the BOM tells the reader which order to expect. For UTF-8, byte order is irrelevant: UTF-8 is defined as a sequence of single bytes, so the encoded bytes are the same regardless of processor architecture. The UTF-8 BOM (U+FEFF encoded as the three bytes EF BB BF) is therefore technically unnecessary from a pure encoding standpoint.

Unnecessary or not, it became the de facto handshake that tells Excel to stop guessing. When Excel sees EF BB BF at the start of a file, it switches to UTF-8 mode. Without those three bytes, Excel uses its locale default and you get mojibake.

Here's the trade-off you need to know: the BOM that fixes Excel can break other tools. Python's built-in open() does not strip it unless you pass encoding='utf-8-sig'; with encoding='utf-8' the BOM shows up as an invisible '\ufeff' character at the start of your first string, and with a Windows-1252 default encoding it appears as 'ï»¿'. MySQL's LOAD DATA INFILE treats the BOM as data, so it ends up glued to the first value of the first line, which is usually your header row. Unix command-line tools such as grep, awk, and cut see the BOM as ordinary bytes, so a pattern anchored to the start of the first line (grep '^id', for example) silently fails to match. PostgreSQL's COPY command doesn't strip the BOM either; unless that first line is skipped as a header, the BOM becomes part of the first value, corrupting it or causing an import error.

So the BOM is not a universal solution; it's a targeted fix for Excel compatibility. If your CSV pipeline ends at Excel or a Windows-based tool that respects the BOM, add it. If your CSV feeds into a database import, a Python script, or a Unix pipeline, you may want UTF-8 without BOM and use Excel's Text Import Wizard to open the file correctly instead. We'll cover both paths.
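
The Python side of this trade-off is easy to reproduce. The sketch below writes a tiny BOM-prefixed CSV to a temporary file, then reads its header row twice: once with encoding='utf-8' and once with encoding='utf-8-sig'. The column names and the helper read_header are invented for the example.

```python
import csv
import os
import tempfile

# A tiny CSV whose first three bytes are the UTF-8 BOM.
data = "\ufeffid,name\n1,Zoë\n".encode("utf-8")

with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

def read_header(path: str, encoding: str) -> list[str]:
    """Return the first row of a CSV file decoded with the given encoding."""
    with open(path, newline="", encoding=encoding) as f:
        return next(csv.reader(f))

plain = read_header(path, "utf-8")    # BOM survives as U+FEFF
sig = read_header(path, "utf-8-sig")  # BOM is stripped on read

print(plain)             # ['\ufeffid', 'name']
print(sig)               # ['id', 'name']
print(plain[0] == "id")  # False: lookups by column name would fail

os.remove(path)
```

Because the 'utf-8-sig' decoder only removes a BOM when one is present and otherwise behaves like plain UTF-8, it is a safe default for reading CSVs whose origin you don't control.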

Three Ways to Add a UTF-8 BOM Manually

If you already have a garbled CSV and need to fix it without a conversion service, here are three reliable methods.

**Method 1: Notepad++ (Windows)** Open your CSV in Notepad++ and go to the Encoding menu. If the file is currently shown as 'UTF-8' (without BOM), choose 'Convert to UTF-8-BOM', then save the file. That's it. The file now starts with the three-byte prefix, and Excel will read it correctly the next time it's opened. Note that Notepad++ lists 'UTF-8' and 'UTF-8-BOM' as separate entries, so make sure you pick the BOM variant.

**Method 2: Python one-liner** If you're comfortable with a terminal, this command copies a UTF-8 file to a new file with a BOM:

```
python3 -c "import codecs,pathlib; d=pathlib.Path('input.csv').read_bytes(); pathlib.Path('output.csv').write_bytes(d if d.startswith(codecs.BOM_UTF8) else codecs.BOM_UTF8+d)"
```

This reads the file as raw bytes, prepends the three BOM bytes (skipping that step if they're already there, so you never end up with a doubled BOM), and writes the result. It only adds the marker and doesn't re-encode anything, so the input must already be UTF-8. It works on any operating system with Python 3 installed and needs nothing beyond the standard library; on Windows the command may be 'py' or 'python' rather than 'python3'.

**Method 3: Excel's own Text Import Wizard** Rather than modifying the file, you can tell Excel to read it correctly. In Excel, go to Data → Get External Data → From Text (Excel 2016 and earlier) or Data → Get & Transform Data → From Text/CSV (Excel 2019 and Microsoft 365). In the import wizard, set File Origin to '65001: Unicode (UTF-8)'. This bypasses the double-click default entirely. The downside: the setting doesn't persist with the file, so anyone who opens it by double-clicking will see garbled text again.

None of these methods is particularly elegant for a recurring workflow, which is where automated conversion with BOM output becomes useful.
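
If you need Method 2 for more than a one-off, a slightly more defensive version helps. The sketch below uses a function name of our own, add_utf8_bom, not anything from a library. It refuses to add a BOM to a file that isn't valid UTF-8 (a BOM on a Windows-1252 file would just make Excel misread it differently), and it is idempotent, so running it twice never produces a doubled BOM.

```python
import codecs
from pathlib import Path

def add_utf8_bom(src: Path, dst: Path) -> bool:
    """Write src to dst with a UTF-8 BOM at the start.

    Returns True if a BOM was added, False if src already began with one
    (in which case dst receives an unchanged copy). Raises
    UnicodeDecodeError, without writing dst, if src is not valid UTF-8.
    src and dst may be the same path, because src is read fully first.
    """
    data = src.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        dst.write_bytes(data)
        return False
    data.decode("utf-8")  # validation only; raises on non-UTF-8 input
    dst.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

For a batch, you could loop over Path("exports").glob("*.csv") and call add_utf8_bom(p, p) to convert each file in place ("exports" being a placeholder folder name). Files that raise UnicodeDecodeError need a real re-encoding step first.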

How CocoConvert Handles UTF-8 BOM During File Conversion

When you convert a file to CSV using CocoConvert, whether from Excel, JSON, XML, TSV, or a database export format, you'll find a UTF-8 BOM toggle in the output settings panel. It's off by default because, as explained above, the BOM causes problems in non-Excel environments. Turning it on adds the EF BB BF prefix to every CSV file in your download.

For Excel-bound workflows, the process is straightforward: upload your source file, select CSV as the output format, enable 'Add UTF-8 BOM for Excel compatibility', and download. The resulting file opens correctly in Excel on double-click with no manual steps required. For batch conversions (say, 50 product export files from a Shopify store), the BOM setting applies to every file in the batch, so you don't have to process them individually.

A few honest caveats about what CocoConvert does and doesn't do here.

First, CocoConvert cannot fix encoding problems that originate in your source file. Every Windows-1252 character has a Unicode equivalent, so a clean Windows-1252 file converts to UTF-8 without loss; the trouble comes from files that mix encodings or contain bytes their declared encoding doesn't define (rare, but possible with certain legacy systems). In those cases the conversion attempts a best-effort transliteration, but some characters may be lost or substituted. You'll see a warning in the conversion report when this happens.

Second, CocoConvert does not automatically detect whether your file needs a BOM. It doesn't analyze your downstream toolchain, so that decision is yours to make. The tool gives you the option; you have to know when to use it.

Third, if you're converting from a format that already carries encoding metadata (an Excel .xlsx file, for instance, stores its own encoding information internally), CocoConvert reads that metadata and uses it correctly. In that case the BOM option is purely about the output file's Excel compatibility, not about recovering lost characters from the source.
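
If you generate CSVs in your own code rather than through a converter, Python's 'utf-8-sig' codec is the closest equivalent to an "add BOM" toggle: when writing, it emits EF BB BF once at the start of the output. The sketch below turns a list of dicts (think of rows parsed from a JSON export) into CSV bytes with or without the BOM. It illustrates the technique only; it is not how CocoConvert works internally, and the helper name and product rows are invented.

```python
import csv
import io

def rows_to_csv_bytes(rows: list[dict], add_bom: bool) -> bytes:
    """Serialize dict rows (at least one) to CSV bytes, optionally with a BOM."""
    encoding = "utf-8-sig" if add_bom else "utf-8"
    buffer = io.BytesIO()
    # TextIOWrapper does the encoding; newline="" lets csv control line endings.
    text = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    writer = csv.DictWriter(text, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    # detach() flushes and releases the BytesIO so it isn't closed with the wrapper.
    text.detach()
    return buffer.getvalue()

products = [
    {"sku": "A-100", "title": "Crème brûlée torch", "price": "€24"},
    {"sku": "B-200", "title": "Matcha 抹茶 tin", "price": "¥1500"},
]

for_excel = rows_to_csv_bytes(products, add_bom=True)
for_database = rows_to_csv_bytes(products, add_bom=False)
print(for_excel[:3])     # b'\xef\xbb\xbf'
print(for_database[:3])  # b'sku'
```

Apart from the three leading bytes, both outputs are identical, which is why it's cheap to produce one variant for Excel users and another for databases from the same data.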

The Excel Text Import Wizard: When to Use It Instead

There are situations where adding a BOM to your CSV is the wrong answer and using Excel's import wizard is the right one. The clearest case: you're receiving CSV files from an external system you don't control, and that system produces UTF-8 without a BOM. You can't modify the source files, and you don't want to run them through a conversion step just to add three bytes.

In Excel 2016 and earlier, go to Data → From Text. In the Text Import Wizard that appears, Step 1 has a 'File origin' dropdown. Change it from your system default (often 'Windows (ANSI)' or a specific code page number) to '65001: Unicode (UTF-8)' and complete the wizard normally. Your data will import correctly.

In Microsoft 365 and Excel 2019, the newer Power Query-based importer (Data → Get Data → From File → From Text/CSV) usually auto-detects UTF-8 correctly, but not always. If it doesn't, look for a 'File Origin' or 'Encoding' dropdown in the preview dialog and set it to UTF-8 manually.

The limitation of this approach is the one already mentioned: it doesn't stick. The file itself is unchanged, so anyone else who opens it by double-clicking will see garbled text. If you're the only person who ever opens the file, the wizard is fine. If you're distributing the file to colleagues or clients, you need a BOM in the file itself.

The wizard is also preferable when the same CSV will be processed by scripts or databases, since adding a BOM could break those other consumers. Use the wizard for your own Excel viewing and leave the file clean for everything else.
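
The flip side of "leave the file clean" is making sure files headed for scripts or databases really are BOM-free. A minimal sketch, assuming the file is UTF-8 and small enough to read into memory; strip_utf8_bom is a hypothetical helper name, not a library function.

```python
import codecs
from pathlib import Path

def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file at path, in place.

    Returns True if a BOM was removed, False if the file had none.
    Only the first three bytes are inspected; the rest is left untouched.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Running it on a file that has no BOM is a no-op, so it can sit safely at the front of an import script. For very large exports you would stream the file instead of reading it whole.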

Character Encoding Problems Beyond the BOM

The BOM fixes the Excel/UTF-8 mismatch, but it doesn't solve every character encoding problem you'll encounter with CSV files. Here are the other common cases and what actually helps.

**Windows-1252 source files**: Many older systems, particularly legacy ERP software and early e-commerce platforms, export CSV in Windows-1252. This encoding covers Western European characters (é, ü, ñ, £, €) but cannot represent anything outside that range, such as Cyrillic, Greek, or Japanese text. If you need to consolidate data from a Windows-1252 source with UTF-8 data, you need a proper re-encoding step, not just a BOM; a BOM on a Windows-1252 file only makes Excel misread it in a different way. CocoConvert handles this when you specify the source encoding in the upload settings. If you don't specify it, CocoConvert will attempt to detect it automatically. Based on our internal testing, that works about 94% of the time, but it fails on files that happen to be valid in more than one encoding.

**Delimiter confusion masquerading as encoding problems**: Sometimes what looks like garbled text is actually a delimiter issue. A CSV exported with semicolons as delimiters, opened in a locale where Excel expects commas, will show all columns merged into one with the semicolons visible in the data. This looks wrong but isn't an encoding problem at all. The fix is to specify the correct delimiter in the import wizard, not to change the encoding.

**'Smart quotes' and special dashes**: When data passes through Microsoft Word or Outlook before reaching a CSV, you often get curly quotes (U+2018, U+2019, U+201C, U+201D) and em dashes (U+2014) instead of straight ASCII equivalents. These are valid UTF-8 characters and display correctly in UTF-8-aware applications, but they cause matching failures in databases and code that expect ASCII punctuation. CocoConvert has an optional 'normalize smart quotes' setting in CSV output that converts them to their ASCII equivalents. It's a lossy operation (you're changing the data), so it's opt-in only.

**NULL bytes in data**: Some database exports include NULL bytes (0x00) in fields, which break many CSV parsers. If your file contains NULL bytes, no amount of BOM or encoding configuration will fix the parser errors. The NULLs need to be stripped or replaced before the file is useful.
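
The non-BOM problems above can be handled in code as well. The sketch below is a naive stand-in, not CocoConvert's detector: it decodes with an explicit source encoding when you supply one, otherwise tries strict UTF-8 and falls back to Windows-1252 (exactly the kind of guess that goes wrong on ambiguous files). It also strips NUL characters, optionally maps smart punctuation to ASCII, and uses the standard library's csv.Sniffer to guess the delimiter. The function names and sample data are hypothetical.

```python
import csv
from typing import Optional

# Opt-in, lossy mapping of typographic punctuation to ASCII equivalents.
SMART_PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2014": "-",                 # em dash
})

def decode_csv_bytes(raw: bytes, source_encoding: Optional[str] = None) -> str:
    """Decode CSV bytes, preferring an explicitly supplied source encoding.

    Without one, try strict UTF-8 (dropping a BOM if present) and fall back
    to cp1252. Bytes that cp1252 leaves undefined become U+FFFD. Input that
    happens to be valid UTF-8 is always treated as UTF-8, which can be the
    wrong call for files that are valid in more than one encoding.
    """
    if source_encoding:
        return raw.decode(source_encoding)
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")

def clean_csv_text(text: str, normalize_quotes: bool = False) -> str:
    """Drop NUL characters and, if asked, replace smart punctuation."""
    text = text.replace("\x00", "")
    if normalize_quotes:
        text = text.translate(SMART_PUNCTUATION)
    return text

def guess_delimiter(text: str) -> str:
    """Guess the field delimiter from the start of the text with csv.Sniffer."""
    return csv.Sniffer().sniff(text[:4096], delimiters=",;\t").delimiter

# Example: a semicolon-delimited Windows-1252 export (invented data).
raw = "name;city\r\nJos\u00e9;S\u00e3o Paulo\r\n".encode("cp1252")
text = clean_csv_text(decode_csv_bytes(raw))
print(guess_delimiter(text))  # ;
print(text.splitlines()[1])   # José;São Paulo
```

Passing source_encoding explicitly skips the guesswork entirely, which mirrors the advice to specify the source encoding rather than rely on detection.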

A Practical Checklist Before You Convert or Open a CSV

After working through encoding issues across thousands of file conversions, we've found that the following checklist catches most CSV character problems before they reach anyone's screen.

**Before exporting from your source system:** Check whether your export tool has an encoding option. Most modern SaaS platforms (Salesforce, HubSpot, Shopify) offer UTF-8 as an export option; use it. If the tool only offers 'default' or 'system encoding', check the output in a text editor that shows the encoding (Notepad++, VS Code, or BBEdit on Mac) before distributing the file.

**Before opening in Excel:** Ask whether the file has a BOM. In VS Code, the encoding is shown in the bottom status bar ('UTF-8' vs. 'UTF-8 with BOM'). In Notepad++, check the Encoding menu. If it says UTF-8 without BOM and you need to open it in Excel, either add the BOM or use the import wizard. Don't just double-click and assume.

**Before running a CSV through a script or database import:** Check for a BOM if the file came from Excel or a Windows tool. Python: use encoding='utf-8-sig' to handle the BOM transparently. MySQL: strip the BOM before import; declaring CHARACTER SET utf8mb4 in LOAD DATA makes MySQL decode the file as UTF-8, but it doesn't remove the BOM (IGNORE 1 LINES skips it along with the header line, if you have one). PostgreSQL: strip the BOM, because the COPY command does not handle it gracefully.

**When using CocoConvert:** Enable UTF-8 BOM only if the output is going directly to Excel users. Leave it off if the output feeds a database, API, or script. If your source file has encoding issues, specify the source encoding explicitly rather than relying on auto-detection; it takes 10 extra seconds and prevents a bad conversion.

The BOM is a small thing, literally three bytes, but it sits at the intersection of two different assumptions about how text files should work, and it causes a disproportionate amount of frustration. Knowing when to add it, when to avoid it, and how to handle the cases where it doesn't apply is most of what you need to keep CSV data clean across tools.
