Wei Guang's Blog

Handling the BOM (<U+FEFF>) in CSV

Recently, one of my colleagues ran into an issue while parsing a CSV file into a JSON response: there was an invisible character at the beginning of the first key in every object.

To replicate the issue, let's assume we have two CSV files with the same content: demo1.csv and demo2.csv.
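If you want to reproduce this yourself, here is a small Node.js script that creates both files. The content (a name/age table with two rows) is made up for illustration. The only thing that matters is that demo2.csv starts with a U+FEFF character while demo1.csv does not.

```javascript
const fs = require('fs');

// Made-up sample data; any CSV with a header row will do.
const content = 'name,age\nAlice,30\nBob,25\n';

// demo1.csv: UTF-8 without a BOM.
fs.writeFileSync('./demo1.csv', content, 'utf8');

// demo2.csv: identical text, but prefixed with U+FEFF.
// Encoded as UTF-8, that character becomes the three bytes EF BB BF.
fs.writeFileSync('./demo2.csv', '\uFEFF' + content, 'utf8');

console.log(fs.statSync('./demo1.csv').size); // 25
console.log(fs.statSync('./demo2.csv').size); // 28
```

Those three extra bytes at the start of demo2.csv are the entire difference between the two files.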

We also have a basic Express setup for reading and parsing CSV files. In this example, I'm using csv-parser to parse the CSV into JSON:

const express = require('express');
const csvParser = require('csv-parser');
const fs = require('fs');

const app = express();
const port = 3000;

app.get('/csv1', (req, res) => {
  const results = [];
  fs.createReadStream('./demo1.csv')
    .pipe(csvParser())
    .on('data', (data) => results.push(data))
    .on('end', () => {
      res.send(results);
    });
});

app.get('/csv2', (req, res) => {
  const results = [];
  fs.createReadStream('./demo2.csv')
    .pipe(csvParser())
    .on('data', (data) => results.push(data))
    .on('end', () => {
      res.send(results);
    });
});

app.listen(port, () => {
  console.log('Running...');
});

When you request http://localhost:3000/csv1, everything works fine. However, if you visit http://localhost:3000/csv2, you will notice that the first key in every object has a dot in front of it.

Let's copy the response into an editor that displays invisible characters to see what's actually there.

Now we can see the issue: a <U+FEFF> character appears before the first key in every object. So why do the responses of the two routes differ? What's wrong with demo2.csv?
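This is more than a cosmetic glitch. Here is a minimal sketch using a hand-written row object, assumed to be shaped like what the /csv2 route returns (the BOM glued to the first header name). Looking up the column by its visible name silently fails:

```javascript
// Hypothetical row, shaped like a record parsed from demo2.csv.
const row = { '\uFEFFname': 'Alice', age: '30' };

console.log(row.name);                    // undefined: the key is not "name"
console.log(row['\uFEFFname']);           // Alice
console.log(Object.keys(row)[0].length);  // 5, although it looks like "name"

// JSON.stringify does not escape U+FEFF, so it reaches the client as-is.
console.log(JSON.stringify(row).charCodeAt(2) === 0xfeff); // true
```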

Troubleshooting

Upon further investigation, we discovered that the encodings of demo1.csv and demo2.csv are different: demo1.csv is UTF-8 without a BOM, while demo2.csv is UTF-8 with a BOM.

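The invisible character <U+FEFF> is known as the Byte Order Mark (BOM). In UTF-16 and UTF-32 files it indicates the byte order. UTF-8 has no byte order, so there it only serves as an optional signature saying "this file is UTF-8", stored as the three bytes EF BB BF. csv-parser passes it through untouched, so it ends up as part of the first header name.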
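Because the UTF-8 BOM is always the same three bytes, checking whether a file has one is just a matter of comparing its first bytes. A minimal sketch (hasUtf8Bom is a hypothetical helper name, not part of any library):

```javascript
// U+FEFF encoded as UTF-8 is always these three bytes.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Returns true if the buffer starts with a UTF-8 BOM.
function hasUtf8Bom(buf) {
  return buf.length >= UTF8_BOM.length && buf.subarray(0, 3).equals(UTF8_BOM);
}

console.log(Buffer.from('\uFEFF', 'utf8'));        // <Buffer ef bb bf>
console.log(hasUtf8Bom(Buffer.from('\uFEFFa,b'))); // true
console.log(hasUtf8Bom(Buffer.from('a,b')));       // false
```

To check a file on disk, pass it the result of fs.readFileSync('./demo2.csv') (no encoding argument, so you get the raw bytes).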

Solution

Now that we know this, we can easily handle the BOM by replacing it with an empty string.
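For plain strings, that boils down to a one-line replace anchored at the start of the text. A minimal sketch (stripBom is a hypothetical helper name); note the escape is \uFEFF, exactly four hex digits:

```javascript
// Remove a single leading U+FEFF; everything else is left untouched.
function stripBom(text) {
  return text.replace(/^\uFEFF/, '');
}

console.log(stripBom('\uFEFFname') === 'name'); // true
console.log(stripBom('name') === 'name');       // true (no BOM, unchanged)
```

This is handy if you read a whole file with fs.readFileSync(path, 'utf8'), which keeps the BOM in the returned string.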

For the example above, a possible workaround is as follows:

// ...

app.get('/csv2', (req, res) => {
  const results = [];
  fs.createReadStream('./demo2.csv')
    .pipe(
      csvParser({
        mapHeaders: ({ header }) => header.replace(/^\uFEFF/, ''), // <- Add this: strip a leading BOM from each header
      })
    )
    .on('data', (data) => results.push(data))
    .on('end', () => {
      res.send(results);
    });
});

Alternatively, you can strip the BOM before the data ever reaches csv-parser by using an external library such as strip-bom-stream:

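const fs = require('fs');
const csv = require('csv-parser');
// Note: newer major versions of strip-bom-stream may be ESM-only; if require() fails, use import instead.
const stripBom = require('strip-bom-stream');

const results = [];
fs.createReadStream('data.csv')
  .pipe(stripBom()) // drop the leading BOM bytes before csv-parser sees them
  .pipe(csv())
  .on('data', (data) => results.push(data))
  .on('end', () => console.log(results));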
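If you'd rather avoid the extra dependency, the same idea fits in a small Transform stream from Node's built-in stream module. This is a minimal sketch of my own, not strip-bom-stream's actual implementation: it holds back the first few bytes (a BOM can be split across chunks), drops them if they are EF BB BF, and passes everything else through unchanged. createStripBomStream and collect are hypothetical helper names.

```javascript
const { Transform, Readable } = require('stream');

const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Drop a leading UTF-8 BOM; pass all other bytes through untouched.
function createStripBomStream() {
  let head = Buffer.alloc(0); // bytes held back until we can decide
  let decided = false;

  return new Transform({
    transform(chunk, _encoding, callback) {
      if (decided) return callback(null, chunk);
      head = Buffer.concat([head, chunk]);
      // The BOM may arrive split across chunks, so wait for 3 bytes.
      if (head.length < UTF8_BOM.length) return callback();
      decided = true;
      const out = head.subarray(0, 3).equals(UTF8_BOM) ? head.subarray(3) : head;
      head = null;
      callback(null, out);
    },
    flush(callback) {
      // Input shorter than 3 bytes cannot be a full BOM: emit it as-is.
      if (!decided && head.length > 0) this.push(head);
      callback();
    },
  });
}

// Read a stream to the end and decode it as UTF-8.
async function collect(readable) {
  const parts = [];
  for await (const chunk of readable) parts.push(chunk);
  return Buffer.concat(parts).toString('utf8');
}

// A BOM split across two chunks, followed by CSV text.
const source = Readable.from([Buffer.from([0xef, 0xbb]), Buffer.from([0xbf]), Buffer.from('name,age\n')]);
collect(source.pipe(createStripBomStream())).then((text) => console.log(JSON.stringify(text))); // "name,age\n"
```

In the Express route, it would slot in the same place as strip-bom-stream: fs.createReadStream('./demo2.csv').pipe(createStripBomStream()).pipe(csvParser()).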

Good night.