Saturday, April 6, 2013

Linux scripts: we use grep, sed, awk, sort, uniq, pipe

Input: a tab-delimited file whose 3rd column consists of comma-separated integers.
Goal, using a bash script:
pick all records matching pattern=PATTERN
count all occurrences of each integer in those records
sort in descending order by number of occurrences

Highlights:
>tr -s ',' '\n' ==or== sed 's/,/\n/g'
>sort -r -n -b ----- reverse, numeric, ignore leading blanks
>uniq -c ------- counts adjacent duplicate lines, so the input must be pre-sorted
> | -------- pipes the output of one process into another (interim files are not necessary; everything can be piped through)
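As a quick illustration of how these pieces fit together, here is a minimal sketch run on made-up inline data. The function name count_ints and the sample values are only for illustration; they are not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: reads comma-separated integers on stdin and
# prints "count value" lines, most frequent value first.
count_ints() {
  tr -s ',' '\n' |   # one integer per line
    sort -n |        # group equal values together for uniq
    uniq -c |        # prefix each distinct value with its count
    sort -r -n -b    # order by count, descending
}

# Made-up sample: 3 occurs three times, 1 twice, 2 once.
printf '3,1,3\n2,3,1\n' | count_ints
```

The first sort is what makes uniq -c work, since uniq only compares neighbouring lines.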

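#!/bin/bash
# grep has no file argument, so the tab-delimited input is read from stdin.
# sed (GNU syntax) replaces PATTERN, any tabs around it, and the comma that follows with one tab.
grep PATTERN | sed 's/\t*PATTERN\t*,/\t/g' > matching_records.csv
# Extract the column that now holds the comma-separated integers.
awk -F '\t' '{print $2}' matching_records.csv > csv_column.csv
# One integer per line, count each value, then sort by count, descending.
tr -s ',' '\n' < csv_column.csv | sort -n | uniq -c | sort -r -n -b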
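
Since the interim files are not strictly needed, the same steps could be written as one pipeline. This is only a sketch under assumptions: the function name count_matching and its argument order are made up, and it takes the integers straight from the 3rd column (as in the description above) instead of rewriting the line with sed first.

```shell
#!/bin/bash
# Hypothetical single-pipeline variant, no interim files.
# Assumptions: $1 is the pattern, $2 is the tab-delimited input file,
# and the comma-separated integers sit in the 3rd column.
count_matching() {
  local pattern=$1 file=$2
  grep -- "$pattern" "$file" |   # keep only matching records
    awk -F '\t' '{print $3}' |   # extract the integer list
    tr -s ',' '\n' |             # one integer per line
    sort -n | uniq -c |          # count each distinct integer
    sort -r -n -b                # most frequent first
}
```

It would be called as count_matching PATTERN input_file, where input_file stands for whatever the real file is named.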
