Scripting

grep vs sed vs awk: what a proficient/advanced Linux shell user should know

How many times have we used grep to narrow our searches on a Linux file system? It is a fair question, since almost everyone (meaning the average Linux user) knows grep and its basic feature set. To recap: g/re/p stands for globally search a regular expression and print, a name that is a manifesto, I would say.

The Linux ecosystem has two other very useful and powerful tools for pattern searching: sed, which stands for stream editor, and awk, which is instead named after its creators, Aho, Weinberger and Kernighan.

Given the three, what are the main differences? What is the best use for each of them? Straight to the point: these questions are answered below.

  • grep. A fast and powerful pattern-search tool that is easily combined with other filters to refine results and customize the display, even though its main aim is simply to find matches. Its main use is narrowing down results by keeping only the lines that match the given pattern.
  • sed. A fast stream editor, able to search for a pattern and apply the given transformations and/or commands; still easy to combine into sophisticated filters, but serving a different aim: modifying the text in the stream. Its main use is editing a stream in memory according to the given pattern.
  • awk. A loosely typed programming language for stream processing, whose basic unit is the string (an array of characters) that can be i. matched, ii. substituted and iii. manipulated; most of the time there is no real need to combine awk with other filters, since its reporting capabilities are very powerful (the printf built-in function formats output text much as in C). Its main use is performing fine-grained (variables can be defined and updated incrementally) and programmatic (flow-control statements) manipulations of the input stream.

According to the definitions above, the three tools serve different purposes, can still be used in combination, and all work by matching patterns; yet the difference between sed and awk may still look blurry, so let's clarify with examples.
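
For instance, the three chain together naturally in a single pipeline. The sketch below is only an illustration of the idea, using the ‘phones’ file introduced later in this post: grep narrows the stream to one area code, sed strips the punctuation, and awk counts what survives.

 # grep filters, sed edits, awk computes and reports
 grep '^(555)' phones | sed 's/[()-]//g' | awk '{ count++ } END { print count, "numbers in area code 555" }' 

With the ‘phones’ data shown later, this prints: 3 numbers in area code 555.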

grep

Input Data

total 68
-rw-rw-r--. 1 pmaresca pmaresca 49 Mar 21 20:34 blanks
-rw-rw-r--. 1 pmaresca pmaresca 36257 Mar 22 20:05 commands
-rw-rw-r--. 1 pmaresca pmaresca 79 Mar 20 23:18 json
-rw-rw-r--. 1 pmaresca pmaresca 37 Mar 21 20:44 keyvalue
-rw-rw-r--. 1 pmaresca pmaresca 873 Mar 21 22:51 menu_json
-rw-rw-r--. 1 pmaresca pmaresca 85 Mar 22 18:41 phones
-rw-rw-r--. 1 pmaresca pmaresca 16 Mar 21 19:01 sum
-rw-rw-r--. 1 pmaresca pmaresca 67 Mar 22 18:31 telephones
-rw-rw-r--. 1 pmaresca pmaresca 199 Mar 22 14:21 test

Processing – take the ‘ls -l’ output and grep for the pattern ‘b.+s’

 ls -l | grep -E 'b.+s' 

Output Data

-rw-rw-r--. 1 pmaresca pmaresca 49 Mar 21 20:34 blanks
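
A few variations on the same search are worth keeping at hand; the flags below are standard grep options, applied here to the same ‘ls -l’ output:

 ls -l | grep -E -i 'B.+S'   # case-insensitive matching
 ls -l | grep -E -v 'b.+s'   # inverted match: keep the lines that do NOT match
 ls -l | grep -E -c 'b.+s'   # print only the count of matching lines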

sed

Input Data – ‘phones’

(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217

Processing – take some US phone numbers as input and split each of them into i. Area, ii. Second and iii. Third parts

 sed -e 's/\(^.*)\)\(.*-\)\(.*$\)/Area: \1 Second: \2 Third: \3/g' phones 

Output Data

Area: (555) Second: 555- Third: 1212
Area: (555) Second: 555- Third: 1213
Area: (555) Second: 555- Third: 1214
Area: (666) Second: 555- Third: 1215
Area: (666) Second: 555- Third: 1216
Area: (777) Second: 555- Third: 1217
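
As a side note, the same split can be written with extended regular expressions (the -E option, supported by both GNU and BSD sed), which avoids escaping the capture groups:

 # same output as above, with unescaped groups; only the literal ')' needs escaping
 sed -E 's/(^.*\))(.*-)(.*$)/Area: \1 Second: \2 Third: \3/' phones 

When the goal is editing a file in place rather than a stream, sed also offers the -i option.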

awk

Input Data – ‘menu_json’

{"menu": {
   "header": "SVG Viewer",
   "items": [
     {"id": "Open"},
     {"id": "OpenNew", "label": "Open New"},
     null,
     {"id": "ZoomIn", "label": "Zoom In"},
     {"id": "ZoomOut", "label": "Zoom Out"},
     {"id": "OriginalView", "label": "Original View"},
     null,
     {"id": "Quality"}, 
     {"id": "Pause"},
     {"id": "Mute"},
     null,
     {"id": "Find", "label": "Find..."},
     {"id": "FindAgain", "label": "Find Again"},
     {"id": "Copy"},
     {"id": "CopyAgain", "label": "Copy Again"},
     {"id": "CopySVG", "label": "Copy SVG"},
     {"id": "ViewSVG", "label": "View SVG"},
     {"id": "ViewSource", "label": "View Source"},
     {"id": "SaveAs", "label": "Save As"},
     null,
     {"id": "Help"},
     {"id": "About", "label": "About Adobe CVG Viewer..."}
  ]
}}

Processing – take the menu data as input, extract the IDs (the first value of each entry), and build a set of shell exports

 awk 'BEGIN { sum = 0 }; \
/id/ { sum += 1; gsub(/[\",}]/, ""); sub(/{id:/, "export VAR_"sum"="); \
printf("%s %s%s%s%s\n", $1, $2, "\"", $3, "\"") }; \
END { print "Total", sum }' menu_json 

Output Data

export VAR_1="Open"
export VAR_2="OpenNew"
export VAR_3="ZoomIn"
export VAR_4="ZoomOut"
export VAR_5="OriginalView"
export VAR_6="Quality"
export VAR_7="Pause"
export VAR_8="Mute"
export VAR_9="Find"
export VAR_10="FindAgain"
export VAR_11="Copy"
export VAR_12="CopyAgain"
export VAR_13="CopySVG"
export VAR_14="ViewSVG"
export VAR_15="ViewSource"
export VAR_16="SaveAs"
export VAR_17="Help"
export VAR_18="About"
Total 18
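
As a usage sketch, the generated statements can be loaded into the current shell: the grep below keeps only the export lines, dropping the trailing ‘Total’ count, and ‘menu_exports.sh’ is just an illustrative file name.

 # re-run the awk program above, keep only the export lines, then source them
 awk 'BEGIN { sum = 0 }; \
/id/ { sum += 1; gsub(/[\",}]/, ""); sub(/{id:/, "export VAR_"sum"="); \
printf("%s %s%s%s%s\n", $1, $2, "\"", $3, "\"") }; \
END { print "Total", sum }' menu_json | grep '^export' > menu_exports.sh
 . ./menu_exports.sh
 echo "$VAR_1"    # prints: Open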


To conclude this short post: as the examples above show, awk's capabilities shine; it is a programming language with an awkward syntax that allows advanced in-memory manipulation and powerful reporting. As seen, sed can modify text too, but it cannot operate programmatically like awk, and this is precisely what makes it a powerful stream editor: it should be used as such.
