For today's post, I'm going to tackle the common task of processing CSV files. My fiancee recently spent a lot of time writing one-off CSV parsers for her work, where she had to read in a file, make a few conditional changes to some values, and write the modifications back out. The canonical way of processing a CSV file without any higher-level assistance goes something like this:
- Read a line of the input file.
- Split it.
- Change one value in the line.
- Change another value in the line, then another, until complete.
- Write the line back out.
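The steps above can be sketched in plain Java. The column positions and the transformation rules here are hypothetical, just to show the shape of row-oriented code:

```java
// Row-oriented processing: each line is split, edited field by field,
// and rejoined. File I/O is elided; processLine is the per-row step.
public class RowOriented {
    // Hypothetical rules: uppercase the state (field 2), pad the zip (field 3).
    static String processLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        fields[2] = fields[2].toUpperCase();
        fields[3] = fields[3] + "-0000";
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        System.out.println(processLine("Ada,Lovelace,ca,95610"));
        // Ada,Lovelace,CA,95610-0000
    }
}
```

Note how the business logic for every column is interleaved inside a single per-line loop, which is exactly what becomes hard to follow as the rules multiply.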
But after watching my fiancee's specific issues, what I really wanted to be able to do was process columns of values instead of rows. Something like this:
- Read the entire input file to an in-memory data structure.
- Change all of the values in one column.
- Change the values in another column, etc. until complete.
- Write the file back out.
(Granted, this requires reading the entire file into memory, but for many applications, this is not an issue.)
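A minimal sketch of the column-oriented idea, assuming the file has already been loaded into a list of header-keyed maps (the class and method names here are illustrative, not the API described later):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Column-oriented processing: the whole file lives in memory as a list of
// header-keyed rows, and each business rule touches one column at a time.
public class ColumnOriented {
    final List<Map<String, String>> rows = new ArrayList<>();

    // Apply one transformation to every value in one column.
    void updateColumn(String column, UnaryOperator<String> change) {
        for (Map<String, String> row : rows) {
            row.put(column, change.apply(row.get(column)));
        }
    }

    public static void main(String[] args) {
        ColumnOriented table = new ColumnOriented();
        table.rows.add(new HashMap<>(Map.of("name", "Ada", "zip", "95610")));
        table.rows.add(new HashMap<>(Map.of("name", "Grace", "zip", "10001")));
        table.updateColumn("zip", zip -> zip + "-0000"); // one rule, one column
        System.out.println(table.rows.get(0).get("zip"));
        // 95610-0000
    }
}
```

Each rule now reads as a single statement about a single column, instead of being buried in a per-line loop.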
My intuition was that column-oriented processing would make the business logic clearer and less error-prone in the code. To further this goal, I wanted to be able to do this with Java 8 syntax, closures and the like.
My proof-of-concept code is available here. It uses Apache Commons CSV to read the input into a data structure, the "CSVMaster", which holds a list of rows whose values are keyed by the column headers for easy access. Many CSV frameworks support that much. What's slightly new is that the row list's iterator and stream are exposed directly, so you can write code like this:
// Change each zip value to a default zip+4.
master.forEach(row -> row.set("zip", row.get("zip") + "-0000"));
// Get all of the rows from a specific zip code.
specialRows = master.stream().filter(row -> row.get("zip").startsWith("95610")).collect(Collectors.toList());
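To make the pattern concrete without pulling in the actual project, here is a self-contained sketch of the same idea (the class name and the map-backed rows are my stand-ins, not the real CSVMaster internals):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Sketch of the wrapper idea: rows are header-keyed maps, and the wrapper
// simply exposes forEach and stream over the row list.
public class CsvTable {
    private final List<Map<String, String>> rows;

    CsvTable(List<Map<String, String>> rows) {
        this.rows = rows;
    }

    void forEach(Consumer<Map<String, String>> action) {
        rows.forEach(action);
    }

    Stream<Map<String, String>> stream() {
        return rows.stream();
    }

    public static void main(String[] args) {
        CsvTable master = new CsvTable(new ArrayList<>(List.of(
                new HashMap<>(Map.of("zip", "95610")),
                new HashMap<>(Map.of("zip", "10001")))));
        // Change each zip value to a default zip+4.
        master.forEach(row -> row.put("zip", row.get("zip") + "-0000"));
        // Count the rows from a specific zip code.
        long matches = master.stream()
                .filter(row -> row.get("zip").startsWith("95610"))
                .count();
        System.out.println(matches);
        // 1
    }
}
```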
If this turns out to be useful and usable, I will continue to enhance and extend it. Thoughts and suggestions are welcome.