csvkit, jq, command line
Enjoy a lean way to deal with csv files…
Do you enjoy open csv files in Excel on Mac? I am not. I’ve never enjoyed open a giant csv file in Excel on Mac. It is just a nightmare. As a developer, for most of the cases, I just do very basic browsing, filtering, sorting, etc, all I need is a handy tool to help me get the job done.
Give it a quick go
- Download the data
Go to https://lmip.gov.au/ and download the latest employment data.
2. Convert to csv, rename column names you want to
in2csv --sheet "SA4 based regions" ER\ -\ 2019\ Employment\ Projections\ by\ Industry.xlsx | tee region.csv
3. Check the csv file is correct
$> csvclean region.csv
No errors.# errors can be found in _err.csv file, if there is any
1,"Expected 5 columns, found 6 columns",...
4. Check columns and its type
$> csvstat region.csv --type
1. ER Name: Text
2. State/Territory: Text
3. Industry: Text
4. ('000): Number
5. (%): Number
5. Have a quick look
$> csvlook region.csv | less -S
| ER Name | State/Territory | Industry
| Capital Region | NSW/ACT | Mining
| Capital Region | NSW/ACT | Manufacturing
...
6. Have a quick stat
$> csvstat region.csv
1. "ER Name"Type of data: Text
Contains null values: True (excluded from calculations)
Unique values: 46
Longest value: 37 characters
Most common values: Capital Region (20x)
Central West (20x)
Far West Orana (20x)
Hunter (20x)
Illawarra South Coast (20x)
...other columns...
Wanna more insights
Get certain columns and convert to json
$> csvcut region.csv -c 4 | csvjson -i 2 | head
[
{
"('000)": 0.066585541
},
{
"('000)": 0.152634913
},
{
"('000)": 0.12452649
},
Search by regions and compare the difference
$> csvgrep -c 1 -m Ballarat region.csv | csvstat -c 4,5 --median
4. ('000): 0.118
5. (%): 4.09
$> csvgrep -c 1 -m "Inner Metropolitan Melbourne" region.csv | csvstat -c 4,5 --median
4. ('000): 2.594
5. (%): 6.928
The most common usage for a Developer
in2csv data.xlsx > data.csv
quickly convert an excel to a csv without open the file in Excelcsvclean data.csv
to check if there is an error in the csv filecsvformat -U 1 data.csv
double quote every cellcsvlook data.csv | head
quick look a csv in a pretty waycsvcut -n data.csv
show all column namescsvstat data.csv
the quickest way to get stats of all columnsjq -c ”[.]” a.json
convert json array to json newlinein2csv a.geojson -f geojson
convert geojson to csvin2csv a.ndjson -f ndjson
convert json newline to csv