Input: a tab-delimited file whose 3rd column consists of comma-separated integers.
Goal: using a bash script,
pick all records matching pattern=PATTERN
count all occurrences of each integer in the matching records
sort descending by number of occurrences
Highlights:
>tr -s ',' '\n' ==or== sed 's/,/\n/g'
>sort -r -n -b ----- reverse, numeric, ignore leading blanks
>uniq -c ------- counts adjacent duplicate lines, so the input must be pre-sorted
> | -------- pipes output from one process into another (the interim files are not necessary; everything could be piped through)
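#!/bin/bash
# grep has no file operand, so the input is read from stdin (e.g. ./script.sh < input.tsv)
grep PATTERN | sed 's/\t*PATTERN\t*,/\t/g' > matching_records.csv
# Print field 2 of the rewritten records (\t in the sed expression needs GNU sed)
awk -F '\t' '{print $2}' matching_records.csv > csv_column.csv
# One integer per line, count each value, then sort by count, descending
tr -s ',' '\n' < csv_column.csv | sort -n | uniq -c | sort -r -n -b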
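As noted in the highlights, the interim files are optional. A minimal sketch of a fully piped variant follows, under stated assumptions: the integers sit in the 3rd tab-separated column, the pattern and file name are passed as arguments, and `count_ints` is a hypothetical helper name. It uses `cut -f3` instead of the sed/awk pair, so it is a variation on the script above, not a drop-in copy of it.

```shell
#!/bin/bash
# Sketch only: count_ints is a hypothetical helper, not part of the original script.
# Assumes the comma-separated integers live in the 3rd tab-separated column.
count_ints() {
    local pattern=$1 file=$2
    grep -- "$pattern" "$file" |   # keep only the matching records
        cut -f3 |                  # 3rd column (cut splits on tabs by default)
        tr -s ',' '\n' |           # one integer per line
        sort -n |                  # uniq needs sorted input
        uniq -c |                  # prefix each integer with its count
        sort -r -n -b              # highest count first
}
```

Usage would look like `count_ints PATTERN input.tsv`; each output line is a count followed by the integer.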