csvkit, jq, command line

Add csvkit to your Terminal toolbox

Jamie Wen
2 min readJul 29, 2020

Enjoy a lean way to deal with csv files…

Do you enjoy open csv files in Excel on Mac? I am not. I’ve never enjoyed open a giant csv file in Excel on Mac. It is just a nightmare. As a developer, for most of the cases, I just do very basic browsing, filtering, sorting, etc, all I need is a handy tool to help me get the job done.

Photo by Mika Baumeister on Unsplash

Give it a quick go

  1. Download the data

Go to https://lmip.gov.au/ and download the latest employment data.

2. Convert to csv, rename column names you want to

in2csv --sheet "SA4 based regions" ER\ -\ 2019\ Employment\ Projections\ by\ Industry.xlsx | tee region.csv

3. Check the csv file is correct

$> csvclean region.csv
No errors.
# errors can be found in _err.csv file, if there is any
1,"Expected 5 columns, found 6 columns",...

4. Check columns and its type

$> csvstat region.csv --type
1. ER Name: Text
2. State/Territory: Text
3. Industry: Text
4. ('000): Number
5. (%): Number

5. Have a quick look

$> csvlook region.csv | less -S
| ER Name | State/Territory | Industry
| Capital Region | NSW/ACT | Mining
| Capital Region | NSW/ACT | Manufacturing
...

6. Have a quick stat

$> csvstat  region.csv
1. "ER Name"
Type of data: Text
Contains null values: True (excluded from calculations)
Unique values: 46
Longest value: 37 characters
Most common values: Capital Region (20x)
Central West (20x)
Far West Orana (20x)
Hunter (20x)
Illawarra South Coast (20x)

...other columns...

Wanna more insights

Get certain columns and convert to json

$> csvcut region.csv -c 4 | csvjson -i 2 | head
[
{
"('000)": 0.066585541
},
{
"('000)": 0.152634913
},
{
"('000)": 0.12452649
},

Search by regions and compare the difference

$> csvgrep -c 1 -m Ballarat  region.csv | csvstat -c 4,5 --median
4. ('000): 0.118
5. (%): 4.09

$> csvgrep -c 1 -m "Inner Metropolitan Melbourne" region.csv | csvstat -c 4,5 --median
4. ('000): 2.594
5. (%): 6.928

The most common usage for a Developer

  • in2csv data.xlsx > data.csv quickly convert an excel to a csv without open the file in Excel
  • csvclean data.csv to check if there is an error in the csv file
  • csvformat -U 1 data.csv double quote every cell
  • csvlook data.csv | head quick look a csv in a pretty way
  • csvcut -n data.csv show all column names
  • csvstat data.csv the quickest way to get stats of all columns
  • jq -c ”[.]” a.json convert json array to json newline
  • in2csv a.geojson -f geojson convert geojson to csv
  • in2csv a.ndjson -f ndjson convert json newline to csv

--

--