CS 497C – Introduction to UNIX Lecture 31: - Filters Using Regular Expressions – grep and sed

Chin-Chih Chang (chang@cs.twsu.edu)

Substitution
• sed's strongest feature is substitution, achieved with its s (substitute) command.
• It has the following format:
  [address]s/expression1/string2/flag
• This is how you replace every | with a colon (the g flag makes the substitution global on each line):
  $ sed 's/|/:/g' emp.lst | head -2
• To check whether the substitution was performed, compare the output with the original file using cmp. cmp -l lists each differing byte, so wc -l counts the changes:
  $ sed 's/|/:/g' emp.lst | cmp -l - emp.lst | wc -l
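A minimal sketch of both commands, assuming a POSIX shell. The two records are made up in the emp.lst style, since the lecture's own emp.lst is not reproduced here, and the temporary file comes from mktemp:

```shell
#!/bin/sh
# Hypothetical stand-in for emp.lst: six fields separated by |.
emp=$(mktemp)
printf '%s\n' '2233|a.k. shukla|g.m.|sales|12/12/52|6000' \
              '9876|jai sharma|director|production|12/03/50|7000' > "$emp"

# The g flag replaces every | on each line, not just the first.
sed 's/|/:/g' "$emp" | head -2

# cmp -l prints one line per differing byte, so wc -l counts the changes.
# tr removes the leading blanks some wc implementations print.
changed=$(sed 's/|/:/g' "$emp" | cmp -l - "$emp" | wc -l | tr -d ' ')
printf 'bytes changed: %s\n' "$changed"   # 5 per record, so 10

rm -f "$emp"
```

Dropping the g flag would change only the first | on each line, and the count would fall to 2.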

Substitution
• You can perform multiple substitutions with one invocation of sed: press [Enter] at the end of each instruction and close the quote after the last one. The shell shows its > continuation prompt on each new line:
  $ sed 's/<I>/<EM>/g
  > s/<B>/<STRONG>/g' form.html
• You can remove the spaces that precede each | as shown below. Here ^ is used as the delimiter instead of /:
  $ sed 's^ *|^|^g' emp.lst | head -2
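A small sketch of both commands, assuming a POSIX shell. The input lines are hypothetical stand-ins for form.html and emp.lst, fed through printf instead of real files:

```shell
#!/bin/sh
# Two instructions in one quoted sed script, separated by a newline.
# Typed interactively, the shell would show > before the second line.
html=$(printf '%s\n' '<I>Note:</I> <B>read</B> this' | sed 's/<I>/<EM>/g
s/<B>/<STRONG>/g')
printf '%s\n' "$html"     # <EM>Note:</I> <STRONG>read</B> this

# ^ serves as the delimiter here; " *|" is zero or more spaces
# followed by |, and each such run is replaced by a bare |.
packed=$(printf '%s\n' '2233 |a.k. shukla   |g.m.' | sed 's^ *|^|^g')
printf '%s\n' "$packed"   # 2233|a.k. shukla|g.m.
```

The closing tags </I> and </B> are left alone: the two instructions only match the opening tags.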

Substitution
  $ sed '/director/s/director/member/' emp.lst
  $ sed '/director/s//member/' emp.lst
• These two commands are equivalent, which shows that sed "remembers" the pattern it scanned for in the address and stores it in // (two forward slashes).
• The //, an empty (or null) regular expression, is interpreted to mean that the search pattern and the pattern to be substituted are the same: sed reuses the last regular expression it applied. This is called the remembered pattern.
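A quick way to confirm the two forms behave the same, sketched for a POSIX shell with two hypothetical emp.lst-style records (only the first contains "director"):

```shell
#!/bin/sh
# Hypothetical records standing in for emp.lst.
data='9876|jai sharma|director|production
2365|barun sengupta|g.m.|personnel'

# Address and substitution both spell out the pattern.
full=$(printf '%s\n' "$data" | sed '/director/s/director/member/')
# Empty regex in s//: sed reuses the last regex it applied, /director/.
short=$(printf '%s\n' "$data" | sed '/director/s//member/')

printf '%s\n' "$short"
[ "$full" = "$short" ] && echo "identical output"
```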

Substitution
• When the pattern in the source string also occurs in the replacement string, you can use the special character & to represent it:
  $ sed 's/director/executive director/' emp.lst
  $ sed 's/director/executive &/' emp.lst
• These two commands are equivalent. The &, known as the repeated pattern, expands to the entire text matched by the source pattern. To insert a literal &, escape it as \&.
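A sketch comparing the two commands on one hypothetical record, plus a second case (an illustration, not from the lecture) where the regex is not a fixed string, which shows that & holds whatever text was actually matched:

```shell
#!/bin/sh
# Hypothetical emp.lst-style record.
line='9876|jai sharma|director|production'

# Replacement spelled out in full...
full=$(printf '%s\n' "$line" | sed 's/director/executive director/')
# ...and with & standing for the matched text.
amp=$(printf '%s\n' "$line" | sed 's/director/executive &/')
printf '%s\n' "$amp"      # 9876|jai sharma|executive director|production

# & is most useful when the regex is not literal: here it wraps
# the leading run of digits, whatever they happen to be.
tagged=$(printf '%s\n' "$line" | sed 's/[0-9][0-9]*/(&)/')
printf '%s\n' "$tagged"   # (9876)|jai sharma|director|production
```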

Regular Expressions
• The interval regular expression (IRE) uses an escaped pair of curly braces, \{ and \}, with a single number or a pair of numbers between them.
• We can use this sequence to display files that have write permission set for the group:
  $ ls -l | grep "^.\{5\}w"
• The regular expression ^.\{5\}w matches any five characters (.\{5\}) at the beginning (^) of the line, followed by a w. In an ls -l listing, the sixth character is the group write bit.
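The sketch below runs the same grep over two made-up ls -l lines, so the result does not depend on the files in the current directory (the listing text is hypothetical):

```shell
#!/bin/sh
# Hypothetical ls -l output; a real run would pipe ls -l itself.
listing='-rw-rw-r-- 1 user staff 120 Jan  1 10:00 shared.txt
-rw-r--r-- 1 user staff  80 Jan  1 10:00 private.txt'

# Character 6 is the group write bit: skip exactly five characters,
# then require a w.
groupw=$(printf '%s\n' "$listing" | grep '^.\{5\}w')
printf '%s\n' "$groupw"   # only the shared.txt line
```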

Regular Expressions
• The \{5\} signifies that the previous character (.) has to occur five times. The . (dot) matches any single character.
• The IRE has three forms, where ch is a single character or metacharacter:
  • ch\{m\} – ch occurs exactly m times.
  • ch\{m,n\} – ch occurs between m and n times.
  • ch\{m,\} – ch occurs at least m times.
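A minimal sketch of the three forms on a hypothetical word list. The lines are anchored with ^ and $ because an unanchored a\{2\} would also match "aaa" and "aaaa", which both contain "aa":

```shell
#!/bin/sh
# Hypothetical input: runs of one to four a's.
words='a
aa
aaa
aaaa'

exact=$(printf '%s\n' "$words" | grep '^a\{2\}$' | paste -s -d ' ' -)
between=$(printf '%s\n' "$words" | grep '^a\{2,3\}$' | paste -s -d ' ' -)
atleast=$(printf '%s\n' "$words" | grep '^a\{3,\}$' | paste -s -d ' ' -)

# paste -s joins the matching lines with spaces for compact output.
printf 'exactly 2: %s\n2 to 3: %s\n3 or more: %s\n' \
       "$exact" "$between" "$atleast"
```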

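Regular Expressions
• We can display the listing for those files that have the write bit set for either group or others (the w then falls in position 6 or 9):
  $ ls -l | grep "^.\{5,8\}w"
• To locate the people born in 1945 in the sample database, where the two-digit year of birth begins at column 50, use sed as follows (-n suppresses sed's default output, so only the lines selected by p are printed):
  $ sed -n '/^.\{49\}45/p' emp.lst
• The tagged regular expression (TRE) uses \( and \) to enclose a pattern.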
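A sketch of both commands on hypothetical data. The permission strings are made up, and the fixed-width records are built with printf so that exactly 49 characters precede the year, which is an assumption about the emp.lst layout inferred from the \{49\} in the command:

```shell
#!/bin/sh
# Hypothetical ls -l permission strings followed by a file name.
listing='-rw-rw-r-- shared.txt
-rw-r--rw- public.txt
-rw-r--r-- private.txt'
# w in position 6 (group) or 9 (others): five to eight characters precede it.
writable=$(printf '%s\n' "$listing" | grep '^.\{5,8\}w' |
           cut -d' ' -f2 | paste -s -d ' ' -)
printf 'group or others writable: %s\n' "$writable"

# Hypothetical fixed-width records: 4+1+17+1+9+1+9+1+6 = 49 characters
# come before the two-digit birth year.
records=$(printf '%-4s|%-17s|%-9s|%-9s|%s|%s\n' \
  1006 'chanchal singhvi' director sales 03/09/45 6700 \
  2233 'a.k. shukla' g.m. sales 12/12/52 6000)
born45=$(printf '%s\n' "$records" | sed -n '/^.\{49\}45/p')
printf '%s\n' "$born45"   # only the 1006 record
```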

Regular Expressions
• Suppose you want to replace the words John Wayne with Wayne, John. The sed substitution instruction will then look like this:
  $ echo "John Wayne" | sed 's/\(John\) \(Wayne\)/\2, \1/'
• \1 and \2 refer to the first and second tagged patterns. Because the TRE remembers a grouped pattern, you can also look for repeated words (such as "the the") like this:
  $ grep "\([a-z][a-z][a-z]*\) *\1" note
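A sketch of both TREs, assuming a POSIX shell; the two input lines are made up in place of the lecture's note file:

```shell
#!/bin/sh
# \(John\) is tagged as \1 and \(Wayne\) as \2; the replacement swaps them.
swapped=$(echo 'John Wayne' | sed 's/\(John\) \(Wayne\)/\2, \1/')
printf '%s\n' "$swapped"   # Wayne, John

# Hypothetical text standing in for the note file.
text='see the the end
no repeats here'
# Two or more lowercase letters, optional spaces, then the same letters again.
repeats=$(printf '%s\n' "$text" | grep '\([a-z][a-z][a-z]*\) *\1')
printf '%s\n' "$repeats"   # see the the end
```

Because " *" also allows zero spaces, the pattern matches a letter group repeated inside a single word too: "banana" matches, since "an" is immediately followed by "an".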

Regular Expressions
• These are pattern-matching options used by grep, sed, and perl (Page 441):
• abc : match the character string "abc".
• * : match zero or more occurrences of the previous character.
• . : match any character except newline.
• .* : match nothing or any number of characters.
• a? : match zero or one instance of "a" (extended regular expression: use grep -E or perl, not plain grep or sed).
• a* : match zero or more repetitions of "a".

Regular Expressions
• [abcde] : match any one character within the brackets.
• [a-b] : match any character within the range a to b.
• [^abcde] : match any character except those within the brackets.
• [^a-b] : match any character except those in the range a to b.
• ^ : match the beginning of a line, e.g., /^#/.
• ^$ : match lines containing nothing (empty lines).

Regular Expressions
• $ : match the end of a line, e.g., /money.$/.
• a\{2\} : match exactly two repetitions of "a".
• a\{4,\} : match four or more repetitions of "a".
• a\{2,4\} : match between two and four repetitions of "a".
• \(exp\) : tag expression exp for later reference with \1, \2, etc.
• a|b : match a or b (extended regular expression: use grep -E or perl, not plain grep or sed).
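A sketch exercising a few entries from the table above on a hypothetical word list. It assumes a POSIX grep, where ? and | are operators only in extended mode (grep -E) and are ordinary characters in a basic regular expression:

```shell
#!/bin/sh
# Hypothetical word list; the fourth line is deliberately empty.
lines='color
colour
colr

gray
grey'

# ^$ selects empty lines; -c counts the matching lines.
empty=$(printf '%s\n' "$lines" | grep -c '^$')

# ? and | are extended-regex operators, so these use grep -E.
opt_u=$(printf '%s\n' "$lines" | grep -E '^colou?r$' | paste -s -d ' ' -)
either=$(printf '%s\n' "$lines" | grep -E '^gr(a|e)y$' | paste -s -d ' ' -)

# In a basic regex ? is literal, so nothing matches. grep exits with
# status 1 when nothing matches; || true keeps the printed count of 0.
bre=$(printf '%s\n' "$lines" | grep -c '^colou?r$' || true)

printf 'empty lines: %s\n' "$empty"      # 1
printf 'colou?r: %s\n' "$opt_u"          # color colour
printf 'gr(a|e)y: %s\n' "$either"        # gray grey
printf 'basic colou?r: %s\n' "$bre"      # 0
```

The bracket expression gr[ae]y gives the same result as gr(a|e)y and works in basic regular expressions as well.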
