> You took on exactly the responsibilities the article author said you'd have to take on.
Yep, and having been there, I’m reporting: it’s not nearly as bad as this post makes it sound.
I’ve written web apps, too, and (for example) CSS is 100 times worse. Nobody is suggesting every website should use only browser default styles, though.
> I think the better question is, how do you handle malformed CSV?
What does “malformed” mean? The good/bad thing about CSV is that virtually every text file is valid! The only malformation I can think of is an open quote with no matching close quote (so the entire rest of the document is one value). My implementation is streaming, so there’s no great way to flag it: any future data could have the other quote!
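To make the open-quote failure concrete, here is a small sketch using Python's `csv` module (purely for illustration; the commenter's own implementation is a streaming parser, not this). In its default non-strict mode the parser swallows everything after the unmatched quote into a single field:

```python
import csv
import io

# A quoted field that is never closed: a typical RFC 4180-style parser
# treats the entire rest of the input as one value.
data = 'a,b,c\n1,"unclosed,2\n3,4,5\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows)
# The second "row" absorbs what was meant to be a third row.
```

Note there is no error: as the comment says, the data is still "valid" CSV, just not what the author intended.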
We show the data as it’s parsed, so it should be obvious to the user what is going on, and where.
There are areas of computing that are too complex. I’m usually the first to complain about such things. I really don’t think CSV parsing is one of them.
It's even worse than the author is suggesting. For most people, "RFC4180" is meaningless; all that matters is what Excel does. And that means you need to handle a bunch of cases if you are reading AND if you are writing files. A few cases not discussed in the blog post:
- if your file starts with \x49\x44 ("ID"), Excel will interpret the file as their symbolic link .SLK format. So if you're writing files, the ID should be wrapped in double quotes even if it isn't necessary according to RFC4180
- Excel will proactively try to "evaluate" fields that start with \x3d ("="). You can see this in action with the sample file
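A defensive writer has to work around both quirks. The sketch below is an illustration of the idea, not production code: `excel_safe` is a hypothetical helper name, and the apostrophe prefix is one common (if imperfect) defence against formula evaluation:

```python
import csv
import io

def excel_safe(value: str) -> str:
    # Hypothetical helper: a leading "=" (and often "+", "-", "@") can be
    # evaluated as a formula by spreadsheet software, so prefix an
    # apostrophe -- a common, if imperfect, defence.
    if value.startswith(('=', '+', '-', '@')):
        return "'" + value
    return value

buf = io.StringIO()
# Quote "ID" unconditionally so Excel does not sniff the file as SYLK.
# csv.writer with its default QUOTE_MINIMAL would leave it bare, so the
# header is written by hand here (crude, purely for illustration).
buf.write('"ID",name\n')
writer = csv.writer(buf)
writer.writerow(['1', excel_safe('=2+5')])
out = buf.getvalue()
print(out)
```

Neither transformation is required by RFC4180; both exist only because of how Excel behaves.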
CSV parsing / writing certainly isn't going to be a value driver for most companies (if you're supporting user imports, you really care about XLSX/XLSB/XLS files and Google Sheets import), but it's not a trivial problem.
> What does “malformed” mean? The good/bad thing about CSV is that virtually every text file is valid! The only malformation I can think of is an open quote with no matching close quote (so the entire rest of the document is one value). My implementation is streaming, so there’s no great way to flag it: any future data could have the other quote!
Lest you think this is made up, I ran across this when someone cut and pasted Excel data into a text field.
I also have seen batch processing of user files break hard when a quote issue like this caused a hand-rolled CSV parser to conclude that half the file was a single very long field.
I think you have one of the best use cases for rolling your own parser. Your tool's purpose is to read and parse arbitrary data (and then transform and display it). But, I think you're misinterpreting how much work the post is saying CSV will take to implement:
> Easy right? You can write the code yourself in just a few lines.
My take-away from the post is: if you are parsing arbitrary CSV files, you need to make parsing configurable because there's no one, true CSV format. If you are writing CSV files, you may need to escape your fields in a weird, outdated manner.
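The "no one, true CSV format" point is easy to demonstrate. Using Python's `csv` module as a stand-in for any configurable parser, the same bytes parse completely differently depending on the delimiter setting:

```python
import csv
import io

# Semicolon-delimited input, as some locales and tools emit.
raw = 'a;b;c\n1;"x;y";3\n'

# Parsed with the wrong configuration, the delimiter is just data;
# parsed with the right one, both splitting and quoting work.
wrong = list(csv.reader(io.StringIO(raw)))                 # assumes comma
right = list(csv.reader(io.StringIO(raw), delimiter=';'))
print(wrong[1])
print(right[1])
```

Since nothing in the file itself says which configuration is correct, the choice has to come from the user or from heuristics.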
P.S. By "malformed", I meant whether the 2D matrix of byte arrays is read exactly as intended. It could be caused by an open quote, but it could be incorrect escaping or inconsistent delimiters. Since there's no inline schema saying which CSV parsing configuration is being used, you must ask the user to configure the CSV parser and validate the output.
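One cheap way to validate the output in that sense is to check that the parsed 2D matrix is rectangular. This is a sketch of the idea, not anyone's actual code; `ragged_rows` is a hypothetical helper:

```python
import csv
import io

def ragged_rows(rows):
    # Post-parse sanity check: every row should be as wide as the header.
    # Ragged rows usually mean the parser configuration (delimiter,
    # quoting) didn't match the file.
    width = len(rows[0])
    return [i for i, row in enumerate(rows) if len(row) != width]

good = list(csv.reader(io.StringIO('a,b\n1,2\n')))
bad = list(csv.reader(io.StringIO('a,b\n1;2\n')))  # wrong delimiter in data
print(ragged_rows(good))
print(ragged_rows(bad))
```

It can't prove the data was read "exactly as intended", but it catches the most common misconfigurations before they propagate downstream.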