CSV Merge

About

This tool was designed to merge data files that have only one or two columns but millions of lines. It uses Ruby's CSV parsing, so it is compatible with most CSV intricacies.

How it Merges

The algorithm used to merge files is quite naive, but it works quickly so long as the first input file fits in RAM.

  1. Load file 1 into RAM, indexed by key
  2. For every entry in file 2, merge the line and output to the output file

This means the output will only be as long as the second input file, so entries in the first file with no counterpart in the second are never written out. It also means the tool will use about as much RAM as the size of the first file. Both of these suited my original purpose just fine, but are not necessarily desirable. It also merges on a single key only, and doesn't support compound keys.
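The two steps above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual source: the method name merge_csv is hypothetical, and it assumes that both files have a header row, that the key field is given as a column name, that an entry in file 2 with no match in file 1 is still written (with the file 1 columns left blank), and that file 2 wins when both files share a non-key column name.

```ruby
require "csv"

# Hypothetical helper, not the tool's real code: merges two CSV files on a
# single key column using the load-then-stream approach described above.
def merge_csv(base_path, augment_path, output_path, key)
  # Step 1: load file 1 into RAM, indexed by key.
  base = {}
  base_headers = nil
  CSV.foreach(base_path, headers: true) do |row|
    base_headers ||= row.headers
    base[row[key]] = row.to_h
  end

  # Step 2: stream file 2, merging each entry with its match from file 1.
  out_headers = nil
  CSV.open(output_path, "w") do |out|
    CSV.foreach(augment_path, headers: true) do |row|
      if out_headers.nil?
        # Write the header line once: file 1 columns, then any new file 2 columns.
        out_headers = (base_headers || []) | row.headers
        out << out_headers
      end
      # Assumption: an unmatched key merges against an empty record.
      merged = (base[row[key]] || {}).merge(row.to_h)
      out << out_headers.map { |header| merged[header] }
    end
  end
end
```

Only file 1 is held in memory here; file 2 and the output are processed one line at a time, which is why the output is exactly as long as file 2. Calling merge_csv("sample_orig.csv", "sample_addit.csv", "sample_output.csv", "key") would mirror the four arguments the script takes.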

Download

Download the tool and get merging!

Instructions

The script takes four arguments:

  1. Input file 1 ("base file")
  2. Input file 2 ("augmentation file")
  3. Output file
  4. Key Field

For example:

  ./merge_csv.rb sample_orig.csv sample_addit.csv sample_output.csv key