Skip to content
Back to Blog
how-to-convert

How to Convert XLSX to CSV (Avoid Encoding Headaches)

2026-05-17 8 min read

Why XLSX to CSV Conversions Go Wrong

XLSX and CSV are both tabular formats, but they store data in fundamentally different ways. An XLSX file is a zipped collection of XML documents that can hold multiple sheets, cell formatting, formulas, images, and named ranges. A CSV file is plain text — rows separated by newlines, columns separated by a delimiter (usually a comma). The moment you collapse that rich structure into plain text, things can break. The most common failure points are encoding mismatches, delimiter conflicts, and multi-sheet data loss. Encoding is the biggest culprit. If your spreadsheet contains accented characters (é, ü, ñ), currency symbols (€, £, ¥), or any non-Latin script, the output CSV needs to use the right character encoding — typically UTF-8. If the tool defaults to Windows-1252 or Latin-1 instead, those characters get mangled or replaced with question marks. A customer name like 'José García' becomes 'Jos? Garc?a', and suddenly your CRM import fails. Delimiter conflicts are subtler. A CSV that uses commas as delimiters will break any cell that contains a comma — say, an address field like '14 Oak Street, Suite 200'. Proper CSV writers wrap such values in double quotes, but not every tool does this consistently. And if your data contains double quotes itself, those need to be escaped by doubling them ('He said "hello"' becomes '"He said ""hello"""'). Understanding these failure modes before you convert saves a lot of debugging time downstream.

The Encoding Decision: UTF-8, UTF-8 BOM, or Something Else

Before you convert a single file, decide which encoding your destination system expects. This single decision prevents the majority of post-conversion problems. UTF-8 is the right choice for almost every modern system — databases, Python scripts, web applications, and APIs all handle it cleanly. If you're piping the CSV into pandas with pd.read_csv(), UTF-8 is the default and it just works. Same with PostgreSQL's COPY command and MySQL's LOAD DATA INFILE when you specify CHARACTER SET utf8mb4. UTF-8 with BOM (Byte Order Mark) is a variant that prepends three invisible bytes (0xEF, 0xBB, 0xBF) to the file. Microsoft Excel on Windows reads UTF-8 BOM files correctly and displays accented characters without issue. Without the BOM, Excel often misinterprets a UTF-8 CSV as Windows-1252, garbling anything outside basic ASCII. So: if humans will open the CSV in Excel, use UTF-8 BOM. If software will read it programmatically, use plain UTF-8 — many parsers choke on the BOM. Windows-1252 (sometimes called ANSI) only covers Western European characters. If your data includes Arabic, Chinese, Japanese, Korean, or Eastern European characters like Polish ł or Czech ř, Windows-1252 will corrupt them. Avoid it for any new workflow. ISO-8859-1 (Latin-1) is similar to Windows-1252 and has the same limitations. You might encounter it when working with legacy European data exports. CocoConvert's [XLSX to CSV converter](/convert/xlsx-to-csv) lets you select the output encoding before converting, which is the feature most free tools skip entirely. Pick UTF-8 for programmatic use, UTF-8 BOM if the file is going straight into Excel.

Step-by-Step: Converting XLSX to CSV with CocoConvert

The process is straightforward, but a few settings deserve attention. 1. Go to [CocoConvert's XLSX to CSV page](/convert/xlsx-to-csv) and upload your file. Files up to 50 MB are supported without an account; larger files require a free registration. If your XLSX is bigger than that — some enterprise exports run into hundreds of megabytes — you'll need a desktop tool like LibreOffice Calc or a scripted solution (covered in the next section). 2. Select which sheet to export. XLSX files can contain dozens of sheets; CSV can only represent one at a time. CocoConvert shows a dropdown of all sheet names detected in your file. If you need all sheets as separate CSVs, you'll need to run the conversion once per sheet or use a batch approach. 3. Choose your delimiter. Comma is standard, but if your data contains lots of commas (addresses, descriptions, free-text fields), semicolon is a safer choice for European locales where it's already the convention. Tab-delimited (TSV) is useful when the data contains both commas and semicolons. 4. Set the encoding. As discussed above, UTF-8 or UTF-8 BOM depending on your use case. 5. Decide how to handle formulas. By default, CocoConvert exports the calculated value, not the formula string. A cell containing =SUM(A1:A10) that evaluates to 4500 will appear as 4500 in the CSV. This is almost always what you want. If you need the formula text preserved, that's not a CSV use case — keep it in XLSX. 6. Download and verify. Open the CSV in a text editor (not Excel) first. Notepad++ on Windows or BBEdit on Mac will show you the raw bytes and let you confirm the encoding. Look at a row with special characters and confirm they rendered correctly before importing the file into your target system.

Doing It in Excel and LibreOffice (When You Need Local Control)

Sometimes you need to convert locally — maybe the file contains sensitive data you can't upload to a web service, or it's 200 MB and no online tool will accept it. Both Excel and LibreOffice handle this, with some caveats. In Microsoft Excel (Office 365 / Excel 2019 and later): Open the XLSX file, go to File → Save As, choose a location, and in the 'Save as type' dropdown select 'CSV UTF-8 (Comma delimited) (*.csv)'. This option, finally added in Excel 2016, saves with UTF-8 BOM encoding. Older versions only offered 'CSV (Comma delimited)' which saves as Windows-1252 — a trap that catches a lot of people. Click Save, confirm the 'only the active sheet will be saved' warning, and you're done. Repeat for each sheet you need. In LibreOffice Calc (free, cross-platform): Open the file, go to File → Save a Copy (not Save As, which changes your working file), select 'Text CSV (.csv)' as the format, and click Keep Current Format. A dialog appears with explicit options: Field Delimiter (comma, semicolon, tab, or custom), Text Delimiter (double quote by default — leave it), and Character Set. Select 'Unicode (UTF-8)' from the Character Set dropdown. LibreOffice does not add a BOM by default, which is usually correct for programmatic use. One honest limitation worth noting: neither Excel nor LibreOffice gives you fine-grained control over how NULL values, empty cells, or boolean TRUE/FALSE values are represented in the output. If your pipeline expects empty strings for nulls rather than nothing, or '1'/'0' instead of 'TRUE'/'FALSE', you'll need to clean the CSV afterward or use a scripted approach.

Scripted Conversion with Python (For Repeatable Workflows)

If you're converting XLSX files regularly — say, a weekly data export from an accounting system that feeds a database — a Python script is more reliable than a manual process and runs in seconds. The openpyxl and pandas libraries handle this cleanly. Install them with pip install openpyxl pandas, then: import pandas as pd df = pd.read_excel('report.xlsx', sheet_name='Sales Data', engine='openpyxl') df.to_csv('report.csv', index=False, encoding='utf-8') The sheet_name parameter accepts the sheet's string name or a zero-based integer index. Set index=False to suppress the row numbers pandas adds by default — they'll confuse most import tools. For the encoding parameter: 'utf-8' gives plain UTF-8; 'utf-8-sig' gives UTF-8 with BOM. Use 'utf-8-sig' if the output is going to Excel users. If your XLSX has date columns, pandas will parse them as datetime64 objects. The default CSV output format is 'YYYY-MM-DD HH:MM:SS', which is ISO 8601 and fine for databases. If your target system expects 'MM/DD/YYYY', add date_format='%m/%d/%Y' to the to_csv() call. For large files (50,000+ rows), consider chunking the read: use pd.read_excel() once (openpyxl loads the whole file anyway) but write in chunks if memory is a constraint. Alternatively, xlrd is faster for older .xls files, though it doesn't support .xlsx — always use openpyxl for .xlsx. This approach also lets you handle multi-sheet exports programmatically: xl = pd.read_excel('report.xlsx', sheet_name=None) returns a dictionary of DataFrames keyed by sheet name, which you can then iterate over and write to separate CSV files.

Validating Your CSV Before It Hits Your Database

A CSV that looks fine in Excel can still cause silent failures when imported into a database or data pipeline. A few validation steps catch problems early. Check the row count first. If your XLSX had 12,847 data rows plus a header, the CSV should have 12,848 lines. On Linux/Mac, wc -l report.csv gives you the line count in seconds. A mismatch usually means a cell contained a literal newline character — common in 'notes' or 'comments' fields where users pressed Enter inside a cell. Proper CSV quoting should handle this, but not every tool does it correctly. If you see fewer lines than expected, open the CSV in a hex editor or run a quick Python check: count rows with csv.reader and compare. Spot-check special characters. Find a row in the original XLSX that contains accented characters, currency symbols, or non-Latin text. Open the CSV in a text editor with encoding display (Notepad++ shows it in the status bar; VS Code shows it bottom-right). Confirm the encoding matches what you selected, and visually verify those characters look correct. Test the delimiter. If you used comma as delimiter, search the CSV for any unquoted commas inside data fields. A quick grep -P '(?<!"),[^"\n]*(?<!"),' report.csv (on Linux) can surface rows where quoting may have been missed, though the regex needs tuning for your specific data. Validate data types in a few columns. If a column should be all integers, run a quick check: in Python, pd.read_csv('report.csv')['order_id'].dtype should return int64. If it returns object, something in that column isn't a clean integer — a stray header row, a '#N/A' error that wasn't cleaned up in the XLSX, or a currency symbol that got included in a numeric cell. These checks take five minutes and prevent hours of debugging after a failed database import.

Common Edge Cases and Honest Limitations

A few scenarios come up often enough to address directly. Merged cells: XLSX files frequently use merged cells for visual formatting — a header spanning four columns, for instance. When converted to CSV, only the top-left cell of a merged range retains its value; the rest become empty. There's no clean automated fix for this. You need to unmerge cells in the XLSX first (select all, Format → Cells → Merge & Center → Unmerge) and fill in the values before converting. Multiple header rows: Some reports have two or three rows of headers — a category row, a subcategory row, and then column names. CSV parsers expect exactly one header row. You'll need to manually clean this up before the file is useful in a database or data tool. Formula errors: Cells showing #REF!, #DIV/0!, or #VALUE! in Excel will export as those literal strings in the CSV. If your pipeline isn't expecting them, they'll cause type errors. Clean them in Excel first using Find & Replace or an IFERROR() wrapper. Very large files: CocoConvert handles files up to 50 MB without an account and up to 200 MB with one. Beyond that, the Python/pandas approach or LibreOffice command-line conversion (libreoffice --headless --convert-to csv filename.xlsx) is more practical. Password-protected XLSX files: CocoConvert cannot convert password-protected files — neither can any other online tool without the password. Remove the password in Excel first via File → Info → Protect Workbook → Encrypt with Password, clear the field, and save. Images and charts embedded in sheets: These simply don't transfer to CSV. If the sheet is mostly a chart with a small data table, make sure the data table is in its own sheet or range before converting. The chart itself will be silently dropped with no error message — which can be confusing if you're not expecting it.

Ready to convert?

Try it now — fast, secure, and private.

Convert Now →
How to Convert XLSX to CSV (Avoid Encoding Headaches) | CocoConvert Blog