Skip to content
Back to Blog
how-to-convert

How to Convert XLSX to CSV (Avoid Encoding Headaches)

2026-05-17 8 min read

Why XLSX to CSV Conversions Go Wrong

They seem so similar. XLSX and CSV are both tables of data, right? But underneath, they are completely different animals. An XLSX file is a complex package—a zip archive of XML documents holding multiple sheets, rich cell formatting, formulas, and even images. A CSV is just text. Rows end with a newline, columns are split by a character like a comma. When you flatten that rich XLSX structure into plain text, you're inviting trouble. The biggest headaches come from encoding mismatches. If your spreadsheet contains any special characters—accented letters like é or ñ, currency symbols like €, or non-Latin text—the output CSV absolutely must use the correct character encoding, which is almost always UTF-8. Many tools default to older encodings like Windows-1252, which will mangle those characters into question marks. A customer name like 'José García' becomes 'Jos? Garc?a', and your CRM import falls over. Delimiter conflicts are another classic problem. If your CSV uses commas to separate columns, what happens to a cell that contains a comma, like an address '14 Oak Street, Suite 200'? A well-behaved CSV writer wraps that value in double quotes. But if the data itself contains double quotes, they must be escaped by doubling them up ('He said "hello"' becomes '"He said ""hello"""'). Knowing where these landmines are buried before you start converting will save you hours of painful debugging.

The Encoding Decision: UTF-8, UTF-8 BOM, or Something Else

Before you convert anything, make one crucial decision: what encoding does your destination system need? Get this right, and you'll sidestep the vast majority of conversion problems. For nearly every modern system, UTF-8 is the answer. Full stop. Databases, Python scripts, web APIs—they all expect and handle UTF-8 perfectly. If you’re piping the CSV into pandas with `pd.read_csv()`, UTF-8 is the default and it just works. The same goes for PostgreSQL's `COPY` command or MySQL's `LOAD DATA INFILE` when you specify `CHARACTER SET utf8mb4`. Then there's UTF-8 with BOM (Byte Order Mark). This variant adds three invisible bytes (0xEF, 0xBB, 0xBF) to the start of the file. Why does it exist? To help Microsoft Excel. On Windows, Excel often assumes a CSV is encoded with Windows-1252 unless it sees this BOM, garbling any special characters. So, here's the simple rule: if a human will open the CSV directly in Excel, use UTF-8 BOM. If a script or program will read it, use plain UTF-8, because that BOM can choke many parsers. Older encodings like Windows-1252 (or ANSI) and ISO-8859-1 (Latin-1) are relics. They only support Western European characters and will corrupt any data from other languages. You might run into them with legacy systems, but you should never choose them for a new workflow. CocoConvert's [XLSX to CSV converter](/convert/xlsx-to-csv) lets you pick your encoding right up front, a critical feature that many free tools inexplicably omit. Just pick UTF-8 for systems and UTF-8 BOM for Excel.

Step-by-Step: Converting XLSX to CSV with CocoConvert

The process is simple, but the devil is in the details. A few settings here will make or break your final output. 1. Head to [CocoConvert's XLSX to CSV page](/convert/xlsx-to-csv) and upload your file. You can process files up to 50 MB without an account, and larger files just require a free registration. If your XLSX is a true monster—some enterprise exports can be hundreds of megabytes—you'll need to use a desktop tool like LibreOffice Calc or a script, which we cover later. 2. Select the sheet to export. An XLSX file can hold many sheets, but a CSV can only represent one. CocoConvert lists all the sheets from your file in a dropdown. If you need to export all sheets, you have to convert them one at a time. 3. Choose your delimiter. The comma is standard, but it's a terrible choice if your data is full of them (think free-text descriptions or addresses). In Europe, the semicolon is a common and often safer alternative. Tab-separated (TSV) is another great option if your text contains both commas and semicolons. 4. Set the encoding. You already know the drill: UTF-8 for programmatic use, UTF-8 BOM if the file is destined for someone's Excel. 5. Decide how to handle formulas. CocoConvert exports the calculated *value* of a cell, not the formula itself. A cell with `=SUM(A1:A10)` that shows `4500` will become `4500` in the CSV. This is exactly what you want 99% of the time. If you actually need the formula text, you shouldn't be converting to CSV in the first place. 6. Download and verify. **Do not open the CSV in Excel to check it.** Trust me on this. Open it in a real text editor first, like Notepad++ on Windows or BBEdit on Mac. These tools show you the raw text and let you confirm the encoding is correct. Find a row with special characters and make sure they look right before you even think about importing it into your target system.

Doing It in Excel and LibreOffice (When You Need Local Control)

Sometimes you have to convert locally. Maybe the file contains sensitive data that can't be uploaded, or it's a 200 MB behemoth that chokes online tools. Both Excel and LibreOffice can get the job done, but each has its own quirks. In Microsoft Excel (Office 365 / Excel 2019+): Open your XLSX, navigate to File → Save As, and in the 'Save as type' dropdown, pick 'CSV UTF-8 (Comma delimited) (*.csv)'. This option, which Microsoft *finally* added in Excel 2016, correctly saves with UTF-8 BOM encoding. Be careful not to choose the plain 'CSV (Comma delimited)' option; it saves using the ancient Windows-1252 encoding, a trap that has ensnared countless developers. Excel will warn you that only the active sheet is saved. Click OK and repeat for any other sheets you need. In LibreOffice Calc (a fantastic, free, cross-platform tool): Open the file, go to File → Save a Copy... (using 'Save As' would change your current file). Select 'Text CSV (.csv)' as the format and confirm you want to Keep Current Format. This brings up a powerful dialog with explicit options. You can set the Field Delimiter, the Text Delimiter (leave it as double quote), and most importantly, the Character Set. Choose 'Unicode (UTF-8)'. LibreOffice correctly generates a plain UTF-8 file without a BOM, making it ideal for programmatic use. Here's the limitation of both tools: neither gives you control over how nulls, empty cells, or booleans are represented. If your downstream process needs empty strings for nulls instead of just nothing, or expects `1`/`0` instead of `TRUE`/`FALSE`, you're out of luck. You'll have to script a cleanup pass on the CSV later or use a more powerful tool from the start.

Scripted Conversion with Python (For Repeatable Workflows)

When you have to convert the same kind of XLSX file over and over—like a weekly report from the accounting system—manual clicks are a recipe for error. A simple Python script is faster, more reliable, and runs in seconds. First, you'll need the `openpyxl` and `pandas` libraries. Install them with `pip install openpyxl pandas`. Then, the code is remarkably clean: ```python import pandas as pd df = pd.read_excel('report.xlsx', sheet_name='Sales Data', engine='openpyxl') df.to_csv('report.csv', index=False, encoding='utf-8') ``` The `sheet_name` parameter can be the sheet's name as a string or its zero-based index number. Don't forget `index=False`; it prevents pandas from writing its own row numbers into the first column, which will definitely confuse most import tools. For the `encoding` parameter, `'utf-8'` creates a standard, BOM-less UTF-8 file. If you know the output is going directly to an Excel user, use `'utf-8-sig'` to include the BOM. Pandas is smart about dates, too, parsing them into `datetime64` objects. It writes them in the ISO 8601 format (`YYYY-MM-DD HH:MM:SS`), which most databases love. If your target system is picky and needs something like `MM/DD/YYYY`, just add the `date_format='%m/%d/%Y'` argument to your `to_csv()` call. For huge files (50,000+ rows), `openpyxl` loads the entire file into memory, so `pd.read_excel()` can be slow. Writing can also be a bottleneck. If you're running into memory limits, you can write the CSV in chunks. And if you're dealing with ancient `.xls` files, the `xlrd` engine is faster, but for modern `.xlsx` files, stick with `openpyxl`. If you have multiple sheets to export, `pd.read_excel('report.xlsx', sheet_name=None)` is a lifesaver. It returns a dictionary where keys are sheet names and values are the DataFrames. You can then loop through this dictionary and save each sheet to its own CSV file—the perfect way to automate a full workbook export.

Validating Your CSV Before It Hits Your Database

A CSV can look perfect in a preview and still shatter your database import. Anyone who has watched an import job fail silently after 20 minutes knows the pain. A few quick validation steps can catch these gremlins early. First, check the row count. If your XLSX had 12,847 data rows and a header, your CSV should have exactly 12,848 lines. On Linux or macOS, `wc -l report.csv` gives you the answer instantly. If the numbers don't match, a cell probably contained a newline character—a common issue in 'notes' fields where users just hit Enter. A proper CSV writer should quote this correctly, but it's a frequent failure point. Next, spot-check your special characters. Find a row in the original XLSX with an accent, a currency symbol, or non-Latin text. Now open the CSV in a text editor that shows the file's encoding (VS Code and Notepad++ both do). Confirm the encoding is what you chose and that the characters rendered correctly. Test the delimiter. If you used a comma, search the raw text for commas that aren't inside quotes. A quick `grep` command on Linux (`grep -P '(?<!\" ),[^\"\\n]*(?<!\" ),' report.csv`) can help find rows where quoting might have been missed, though you may need to tweak the regex for your specific data. Finally, validate a few column data types. If a column is supposed to be all integers, check it. In Python, you can load the CSV and check `pd.read_csv('report.csv')['order_id'].dtype`. It should say `int64`. If it says `object`, you have a problem—a stray text character, a `#N/A` error you missed, or a currency symbol that snuck into a numeric cell. Taking five minutes for these checks will save you hours of debugging a failed data load.

Common Edge Cases and Honest Limitations

Even with the right tools, some XLSX features just don't map cleanly to the simple world of CSV. Here are a few common traps. Merged cells: Excel's merged cells are a visual trick. When you convert to CSV, only the value from the top-left cell of the merged area survives; the other cells become empty. Let's be clear: there is no clean, automated fix for this. You must unmerge the cells in Excel *before* converting (Select All, Format → Cells → Alignment, uncheck Merge cells) and manually fill in the now-empty cells. Multiple header rows: It's common to see reports with two or three header rows—a main category, a subcategory, and then the actual column names. CSV parsers are built for exactly one header row. This is another manual cleanup job; you'll have to simplify the headers in the XLSX or delete the extra rows from the CSV before it's useful. Formula errors: If a cell in Excel shows `#REF!`, `#DIV/0!`, or `#VALUE!`, that exact text is what gets written to your CSV. Those strings will poison any numeric column and cause type errors during import. Clean them up in Excel first, either by fixing the source data or wrapping formulas in an `IFERROR()` function. Very large files: CocoConvert is generous, handling files up to 50 MB for free and 200 MB with an account. For anything larger, an online tool isn't practical. Switch to the Python/pandas method or use LibreOffice's command-line tool (`libreoffice --headless --convert-to csv filename.xlsx`). Password-protected XLSX files: No online tool, including CocoConvert, can open a password-protected file. You have to remove the password yourself in Excel (File → Info → Protect Workbook → Encrypt with Password), save the file, and then upload it. Images and charts: These are silently discarded. They simply don't exist in the CSV format. If your sheet was mostly a chart with a small data table, the chart will vanish into the ether, leaving only the text. It's not an error, just a fundamental limitation of CSV.

Ready to convert?

Try it now — fast, secure, and private.

Convert Now →