platform-pain-points

Excel Shows Garbled Characters in CSV? UTF-8 BOM Fix

2026-05-17 8 min read

Why Your CSV Looks Fine Everywhere Except Excel

You export a CSV from your database or CRM. You pop it open in a text editor, and it's perfect. Accented characters, Japanese kanji, euro signs—all present and correct. Then you double-click to open it in Excel, and chaos. You're staring at garbled strings like 'Ã©' instead of 'é', or '¥' instead of '¥', or a whole column of question marks. The file itself hasn't changed. The problem is Excel. When you open a CSV by double-clicking, Microsoft Excel—especially on Windows—doesn't assume it's UTF-8. It falls back to your system's old-school legacy code page. For most people in the West, that's Windows-1252 (also CP1252). For users in Japan, it's Shift-JIS. When a UTF-8 file gets forced through a Windows-1252 interpretation, every character that uses more than one byte gets mangled, producing the gibberish known as mojibake. This isn't some new bug. It's a long-standing frustration that has plagued Excel 2010, 2013, 2016, 2019, and still pops up in Microsoft 365 as of 2025. If you just double-click a plain UTF-8 CSV, you're rolling the dice. While Microsoft has added some better UTF-8 detection in recent M365 builds, the behavior is wildly inconsistent, depending on your locale, Office version, and sometimes, it seems, the phase of the moon. The reliable fix is a UTF-8 BOM—a Byte Order Mark. It's a special, invisible three-byte sequence (0xEF, 0xBB, 0xBF) at the very beginning of the file that acts as a signal to Excel, saying 'Hey! This file is UTF-8, so read it that way.' Excel respects this signal, even in older versions. The rest of this article explains how to add it, when *not* to add it, and how CocoConvert can handle it for you.

What the BOM Actually Is (and What It Isn't)

The Byte Order Mark originally came from the world of UTF-16 and UTF-32, where byte order (big-endian vs. little-endian) is a real concern. The BOM tells a program which order the bytes are in. But for UTF-8, byte order is a non-issue; it's always the same. So, from a purely technical standpoint, the UTF-8 BOM (the character U+FEFF encoded as three bytes: EF BB BF) is completely unnecessary. It's unnecessary, but it became the secret handshake that gets Excel to behave. When Excel sees those three bytes at the start of a file, it immediately switches into UTF-8 mode. Without them, it defaults to its regional settings, and you get that familiar mojibake. Here's the catch: the BOM that fixes Excel can break a lot of other software. This is the part that trips up so many automated data pipelines. Python's standard `open()` function, if you forget to specify `encoding='utf-8-sig'`, will read the BOM as part of your first data field. MySQL's `LOAD DATA INFILE` statement will think the BOM is part of the first column's name, corrupting your header. Many classic Linux command-line tools like `grep`, `awk`, and `wc` just don't handle BOM-prefixed files well. PostgreSQL's `COPY` command is even stricter and will fail right on the first column header. My rule of thumb is simple: only add a BOM if you know the file's final destination is a user double-clicking it in Excel. If your CSV is headed for a database import, a Python script, or a Unix pipeline, you want clean UTF-8 *without* a BOM. You can still open it correctly in Excel, you just have to use the Text Import Wizard, which we'll cover.

Three Ways to Add a UTF-8 BOM Manually

If you're stuck with a garbled CSV and need to fix it right now, you don't need a fancy service. Here are three reliable ways to add the BOM yourself. **Using Notepad++ on Windows:** This is often the quickest fix. Open your CSV in Notepad++. Go to the `Encoding` menu. You'll likely see it's already set to 'UTF-8'. That's the problem—it's UTF-8 *without* the BOM. Click the option to 'Encode in UTF-8 BOM' and then save the file. Done. The file now has the magic three-byte prefix and Excel will open it correctly. **With a Python one-liner:** If you're comfortable in a terminal, this single command is a powerful way to convert any UTF-8 file to UTF-8 with a BOM. It works on any OS with Python 3. ``` python3 -c "open('output.csv','wb').write(b'\xef\xbb\xbf'+open('input.csv','rb').read())" ``` This command reads your `input.csv` as raw bytes, sticks the three BOM bytes onto the front, and writes everything to `output.csv`. No extra libraries needed. **With Excel's own Text Import Wizard:** Instead of changing the file, you can just tell Excel how to read it properly. Go to `Data → Get & Transform Data → From Text/CSV` (in modern Excel) or `Data → Get External Data → From Text` (in older versions). The key step is finding the 'File Origin' setting in the import dialog and changing it to `65001: Unicode (UTF-8)`. This forces Excel to use the right encoding. The downside is a big one: this fix is temporary and only applies to your import session. The next person to double-click the file will see the same garbled mess. None of these manual methods are great for a repeatable process. That's where automating the conversion, with the BOM as an option, really starts to make sense.

How CocoConvert Handles UTF-8 BOM During File Conversion

When you use CocoConvert to change a file into a CSV—whether it's coming from Excel, JSON, XML, or something else—we give you direct control over this. In the output settings, you'll find a 'Add UTF-8 BOM for Excel compatibility' toggle. We leave it off by default, because as we've seen, the BOM can cause as many problems as it solves in non-Excel environments. But if you need it, just flip the switch. For any workflow that ends with someone in accounting opening a file, the process is simple. Upload your source file, pick CSV for the output, enable the BOM toggle, and download. The resulting CSV will open perfectly in Excel with a simple double-click, no manual import wizard needed. This setting also applies to batch conversions, so if you have 50 product export files from a Shopify store, you can process them all at once and have them all Excel-ready. It's important to be clear about what our tool does and doesn't do. CocoConvert can't magically fix encoding problems that were baked into your source file. If a legacy system gives you a CSV already broken by a bad Windows-1252 export, we'll try our best to transliterate it, but some data might be lost. You'll get a warning if that happens. We also don't guess whether you need a BOM; that's your call, based on where the file is going. The tool provides the option, but you have to know your own workflow. Finally, if you're converting a format that already knows its encoding, like an XLSX file, we read that information correctly. The BOM toggle in that case is purely about making the *output* CSV compatible with Excel, not about fixing the source.

The Excel Text Import Wizard: When to Use It Instead

Sometimes, adding a BOM to your CSV is the wrong move, and Excel's own import wizard is the right one. The most common scenario is when you're getting CSVs from an outside system you don't control. If that system generates clean UTF-8 files *without* a BOM, you shouldn't have to run them all through a separate tool just to add three bytes. In Excel 2016 and older versions, navigate to `Data → From Text`. When the Text Import Wizard launches, the first step has a 'File origin' dropdown. You need to change this from the default (usually 'Windows (ANSI)') to `65001: Unicode (UTF-8)`. After that, complete the wizard as usual, and your data will show up correctly. In Microsoft 365 and Excel 2019, the path is `Data → Get Data → From File → From Text/CSV`. This newer Power Query importer is better at auto-detecting UTF-8, but it's not perfect. If the preview looks wrong, find the 'File Origin' or 'Encoding' dropdown in the dialog and manually set it to UTF-8. The major limitation, as we've mentioned, is that this fix doesn't stick. The file itself remains unchanged. If you email it to a colleague, they'll double-click it and see the same garbled text. The wizard is a great tool if you're the only one touching the file. If you're distributing it, you really need to embed the BOM in the file itself. The wizard is also the correct choice when your CSV needs to be clean for other processes, like a database import, but you just need a quick look in Excel.

Character Encoding Problems Beyond the BOM

Fixing the UTF-8 BOM issue solves the most common Excel character problem, but it's far from the only encoding headache you'll encounter with CSVs. Here are a few other culprits to watch out for. **Windows-1252 source files**: Many older systems, especially legacy ERPs and first-generation e-commerce platforms, still export data in Windows-1252. This encoding handles Western European characters like é, ü, and ñ just fine, but it completely falls apart for any languages outside that set. If you're trying to merge this data with a UTF-8 source, you need a true re-encoding step, not just a BOM. CocoConvert can handle this if you specify the source encoding, or it will try to auto-detect it—which our tests show works about 94% of the time. The failures happen with files that are technically valid in multiple encodings at once. **Delimiter confusion**: Anyone who has spent an hour debugging an "encoding" problem only to discover it was a semicolon instead of a comma knows this pain. If a CSV uses semicolons as delimiters but your Excel locale expects commas, all the data will get crammed into the first column. It looks like a garbled mess, but it's not an encoding issue. The fix is to use the import wizard and specify the correct delimiter. **Excel's 'smart quotes' and special dashes**: When data has passed through Microsoft Word or Outlook, it often picks up curly "smart" quotes and long em dashes. These are valid UTF-8 characters and look fine in most modern apps, but they will break database queries and scripts that expect simple ASCII punctuation. CocoConvert offers an optional 'normalize smart quotes' feature for CSV output that replaces them with their plain ASCII versions. It's a destructive change to your data, so we make it opt-in. **NULL bytes in data**: Some database exports can embed NULL bytes (0x00) into text fields. These are an absolute showstopper for nearly every CSV parser on the planet. No amount of encoding wizardry will fix a file with NULL bytes; they must be stripped out or replaced before the file can be used.

A Practical Checklist Before You Convert or Open a CSV

After wrestling with encoding issues across thousands of file conversions, we've found this checklist helps catch the vast majority of CSV character problems before they start. **Before you export from a source system:** Look for an encoding option. Modern platforms like Salesforce, HubSpot, and Shopify all let you choose UTF-8 for exports. Use it. If the only option is 'default' or 'system encoding', be suspicious. Pop the output file into a text editor like VS Code or Notepad++ that shows the encoding before you send it to anyone. **Before you open a CSV in Excel:** Ask yourself: does this file have a BOM? In VS Code, the encoding is right there in the status bar. In Notepad++, check the Encoding menu. If it says 'UTF-8' and you need to use Excel, your choices are to add a BOM yourself or use the import wizard. Never just double-click and hope for the best. **Before you feed a CSV to a script or database:** Be on the lookout for a BOM, especially if the file came from a Windows user. In Python, using `encoding='utf-8-sig'` is the cleanest way to handle it automatically. For MySQL, you'll need to strip the BOM before import or use a `LOAD DATA` statement that specifies `CHARACTER SET utf8mb4`. For PostgreSQL, just strip it; the `COPY` command is not forgiving. When using CocoConvert, remember the rule: enable the UTF-8 BOM toggle only if you know the file is going directly to an Excel user who will double-click it. For any other destination—a database, an API, a script—leave it off. If you suspect your source file has issues, take the extra ten seconds to specify its encoding explicitly. It's much faster than fixing a bad conversion. The BOM is a tiny thing—just three bytes. But it sits right at the fault line between different assumptions about how text files should work, causing a disproportionate amount of frustration. Knowing when to use it, when to avoid it, and how to work around it is the key to keeping your CSV data flowing cleanly between tools.

← Browse all articles