
PowerPoint Slideshow about 'grep (Global REgular Expression Print)' - nida

grep (Global REgular Expression Print)

  • Operation

    • Search a group of files

    • Find all lines that contain a particular regular expression pattern

    • Write the matching lines to standard output (redirect them to save the result in a file)

    • grep returns to the prompt with no extra output when it is done

  • Syntax: grep [-cilLnrsvwx] pattern [list of files]

  • Examples

    • Find information about the user harley

      >grep harley /etc/passwd

    • Find all lines containing the string xxx in the files of the current directory

      >grep xxx *

grep Flags

  • -c Count the matching lines instead of printing them

  • -i Ignore case when searching for matches

  • -l List only the names of files that contain a match

  • -L List only the names of files that contain no match

  • -n Write the line number in front of each matching line

  • -r Search directories recursively

  • -s Suppress error messages about nonexistent or unreadable files

  • -v Select the lines that do not match the pattern

  • -w Match only complete words

  • -x Match only lines that match the pattern in their entirety
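
A minimal sketch of several of these flags in use. The files notes.txt and todo.txt and their contents are hypothetical, created in a temporary directory only for illustration.

```shell
# Hypothetical sample files, created only for this illustration.
dir=$(mktemp -d)
printf 'Apple pie\napple tart\nbanana split\n' > "$dir/notes.txt"
printf 'buy bananas\ncall home\n' > "$dir/todo.txt"

grep -c  apple  "$dir/notes.txt"                   # count lines containing "apple" (case-sensitive)
grep -ci apple  "$dir/notes.txt"                   # same count, ignoring case
grep -n  banana "$dir/notes.txt"                   # prefix each matching line with its line number
grep -l  apple  "$dir/notes.txt" "$dir/todo.txt"   # names of files with a match
grep -L  apple  "$dir/notes.txt" "$dir/todo.txt"   # names of files without a match
grep -w  banana "$dir/todo.txt" || echo "no whole-word match"   # "bananas" is not the word "banana"
grep -x  'call home' "$dir/todo.txt"               # the whole line must match
```

Because grep exits with a non-zero status when nothing matches, the -w line falls through to the echo.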

Regular Expressions

Note: Many UNIX programs use these (vi, sed, more, grep, awk)

  • Industry standard way to specify patterns

    • In Java: string.matches("pattern");

    • In Java: string.replaceAll("pattern", replacement);

  • Meta-characters and operators

    ^ beginning of line, $ end of a line

    * match 0 or more of the previous group

    + match 1 or more of the previous group

    ? match 0 or one of the previous group

    {n} match exactly n of the previous group

    {m,n} match m to n of the previous group

    {n,} match n or more of the previous group

    | match either the group before or the group after

    . match any character except for new line

    \ literally interpret the following meta-character or operator

Regular Expression Examples

Note: To use ( ) { } or + as operators in grep, either use the -E (extended) switch or precede each one with \
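
A short sketch of the note above, assuming a grep that supports -E. The sample lines are made up for illustration.

```shell
# Made-up sample input for illustration.
sample=$(printf 'color\ncolour\ncolouur\nab\nabab\n2024-01-15\n')

printf '%s\n' "$sample" | grep -E '^colou?r$'                      # ? as an operator: color, colour
printf '%s\n' "$sample" | grep -E '^(ab)+$'                        # one or more "ab": ab, abab
printf '%s\n' "$sample" | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'   # {n} repetition: 2024-01-15
printf '%s\n' "$sample" | grep    '^\(ab\)\{2\}$'                  # same operators escaped, no -E: abab
printf '%s\n' "$sample" | grep    'u+' || echo "no match: an unescaped + is literal without -E"
```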

More grep Examples

Contents of a file called homework

Math: problems 12-10 to 12-33, due Monday

BasketWeaving: make a 6-inch basket, DONE

Psychology: essay on Animal Existentialism, due end of term

Surfing:catch at least 10

grep commands

>grep -v DONE homework    displays all but line 2

>grep -c DONE homework    displays 1

>grep -wi ".*a.*" homework    displays all lines

>grep -w "m.*e" homework    displays line 2

>grep -i "d.*e" homework    displays lines 1, 2 and 3

>grep '\(Ma\|DO\).*' homework    displays lines 1 and 2

Note: the last example escapes the parentheses and the vertical bar
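
The homework examples can be reproduced with the sketch below, which recreates the file in a temporary directory. The -n flag is added here, beyond what the slide shows, so that the matching line numbers are visible.

```shell
# Recreate the homework file from the slide in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/homework" <<'EOF'
Math: problems 12-10 to 12-33, due Monday
BasketWeaving: make a 6-inch basket, DONE
Psychology: essay on Animal Existentialism, due end of term
Surfing:catch at least 10
EOF

grep -v DONE "$dir/homework"             # every line except line 2
grep -c DONE "$dir/homework"             # 1
grep -n -w 'm.*e' "$dir/homework"        # only line 2 ("make" is a complete word)
grep -n -i 'd.*e' "$dir/homework"        # lines 1, 2 and 3
grep -n '\(Ma\|DO\).*' "$dir/homework"   # lines 1 and 2 (\| in a basic regex is a GNU extension)
```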

Sorting Data

  • Background

    • Each line in a file is a record

    • Each line is a series of fields separated by spaces and/or tabs

  • Commands

    >sort fileName    sorts fileName on the 1st field of each line

    >sort +5 fileName    sorts on the 6th field of each line

    >sort -n +4 fileName    sorts on the 5th field numerically

    >sort -t ':' +3r +2 fileName    sorts descending on the 4th field, and then ascending on the 3rd, with ':' as the delimiter

    >sort -t ':' fileName    sorts using ':' as the separator character

    >sort -u +1 fileName    sorts on the 2nd field and removes duplicates (output must be unique)

    >sort -k 3,4    in a pipe, sorts by the key running from field 3 through field 4

    >sort +4n +7    sorts numerically by the 5th field and alphabetically by the 8th

    Note: +N counts fields from zero and is obsolete; current versions of sort expect -k, which counts from one (+5 becomes -k 6)
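
A sketch of the same kinds of sorts written with -k keys, assuming a POSIX sort. The file staff and its records (name:department:year:salary) are hypothetical.

```shell
# Hypothetical records: name:department:year:salary
dir=$(mktemp -d)
cat > "$dir/staff" <<'EOF'
kim:sales:2019:52000
joe:sales:2021:9000
ann:ops:2019:61000
bob:ops:2020:61000
EOF

sort "$dir/staff"                          # whole-line sort, so the 1st field decides first
sort -t ':' -k 4n "$dir/staff"             # numeric on the 4th field
sort -t ':' -k 4,4nr -k 3,3 "$dir/staff"   # 4th field descending, ties broken by the 3rd ascending
sort -t ':' -u -k 2,2 "$dir/staff"         # one line per distinct 2nd field
```

Writing a key as -k 4,4 limits it to that single field; a bare -k 4 runs from field 4 to the end of the line.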

SED (Stream Editor)

  • SED is a filter

    • Input from stdin or a file

    • Output to stdout or a file

    • Modifies the input to produce the output

    • Non-interactive

  • Processing

    • Read from an input stream

    • Perform line oriented commands

    • Write to an output stream

  • Syntax: >sed [-i] command | [-e command] … [file]

Search and Replace

Note: This syntax works in vi, more, awk

  • Search, change and redirect to newFile: >sed 's/cat/dog/g' file > newFile

  • Search, change, and edit the file in place: >sed -i 's/cat/dog/g' file

  • Specific range of lines: >sed '5,10s/cat/dog/g' file

  • Apply the search only to lines containing OK: >sed '/OK/s/cat/dog/g' names

  • Apply the search only to lines containing 2 consecutive digits: >sed '/[0-9]\{2\}/s/cat/dog/g' names

  • Delete range of lines: >sed '5,10d' file

Note: single quotes suppress the shell's interpretation of special characters

Note: You must escape the characters +, { and } for them to act as operators
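
A minimal sketch of these substitutions on a small input. The file names and its contents are hypothetical. The -i switch is left out because its argument handling differs between GNU and BSD sed; the redirect form works with both.

```shell
# Hypothetical input file, created only for illustration.
dir=$(mktemp -d)
cat > "$dir/names" <<'EOF'
cat 7 OK
cat 42
bobcat and cat
EOF

sed 's/cat/dog/' "$dir/names"                     # first occurrence on each line only
sed 's/cat/dog/g' "$dir/names"                    # every occurrence on each line
sed '2,3s/cat/dog/g' "$dir/names"                 # only lines 2 through 3
sed '/OK/s/cat/dog/g' "$dir/names"                # only lines containing OK
sed '/[0-9]\{2\}/s/cat/dog/g' "$dir/names"        # only lines with two consecutive digits
sed '2,3d' "$dir/names"                           # delete lines 2 through 3
sed 's/cat/dog/g' "$dir/names" > "$dir/newFile"   # keep the original, write the result elsewhere
```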

Complex Commands

sed -i \
  -e 's/mon/Monday/g' \
  -e 's/tue/Tuesday/g' \
  -e 's/wed/Wednesday/g' \
  -e 's/thu/Thursday/g' \
  -e 's/fri/Friday/g' \
  -e 's/sat/Saturday/g' \
  -e 's/sun/Sunday/g' \
  file

The backslash is a continuation character

The -e specifies another command (expression)

The trailing g means change every occurrence on each line, not just the first


  • AWK (Aho, Weinberger, Kernighan)

  • Special purpose programming language

    • Interpreted

    • Useful for UNIX Scripts

  • Purposes

    • Filter text files based on supplied patterns

    • Produce reports

    • Callable from "vi"

    • Create simple databases

    • Simple mathematical operations

    • Creating scripts

  • Not good for large complicated tasks

  • Other interpreted languages: perl, php

General Syntax

>awk '<awk program>'

BEGIN {<initialization>}

<pattern> {<action>}

<pattern> {<action>}

<pattern> {<action>}

END {<final actions>}

The single quotes cause the shell to ignore special characters

The various clauses are optional

Much of the syntax for <action> clauses is C and Java compatible

The patterns utilize regular expressions

AWK General Operation

  • Each file consists of a series of records

  • Each record is a series of fields

  • Defaults

    • Record separator: new line character

    • Field separator: white space characters

  • Flow of Operation

    • Read the input file line by line

    • If a pattern matches the line, run its action

    • Otherwise skip the line

Some Simple AWK Examples

  • Print fields of records in a file: >awk '{print $5, $6, $7, $8}' fileName

  • Print lines with a search string: >awk '/gold/ {print}' fileName

  • Print the number of records: >awk 'END {print NR, "records"}' fileName

  • Print records using a condition: >awk '{if ($3 < 1980) print $3}' fileName

  • Using variables: >awk '/gold/ {sum += $2} END {print "value = " sum}' fileName
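
A sketch of these one-liners run against a small coin list. The file coins and its layout (metal, weight, year, country) are assumptions made for illustration; the slides do not show the data file.

```shell
# Hypothetical coin list: metal, weight, year, country
dir=$(mktemp -d)
cat > "$dir/coins" <<'EOF'
gold 1 1986 USA
gold 0.5 1979 Canada
silver 10 1981 USA
gold 0.25 1975 Mexico
EOF

awk '{print $1, $4}' "$dir/coins"                # selected fields of every record
awk '/gold/ {print}' "$dir/coins"                # records containing "gold"
awk 'END {print NR, "records"}' "$dir/coins"     # 4 records
awk '{if ($3 < 1980) print $3}' "$dir/coins"     # 1979 and 1975
awk '/gold/ {sum += $2} END {print "value = " sum}' "$dir/coins"   # value = 1.75
```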

Longer Program in a File

# awk program summarizing a coin collection

BEGIN { num_gold = 0; wt_gold = 0 }

# count the gold records and total their weight (field 2)
/gold/ { num_gold++; wt_gold += $2 }

END {
    val_gold = 485 * wt_gold
    printf("\n Gold Pieces: %2d", num_gold)
    printf("\n Gold Weight: %5.2f", wt_gold)
    printf("\n Gold Value: %7.2f\n", val_gold)
}

To Execute an AWK program:

>awk -f <programFile> <dataFile>

Invoking AWK

>awk [-F<ch>] [<program>] [-f <programFile>] [<vars>] [- | <dataFile>]

  • <ch> is a field separator (default: space, tab)

  • <program> an AWK program

  • <programFile> a file containing an AWK program

  • <vars> a series of variables to initialize: >awk -f program f1=file2 f2=file1 > output

  • - means accept AWK input from STDIN

  • <dataFile> a file containing data to process

Note: AWK is often invoked repeatedly in shell scripts
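
A sketch of the invocation forms above: a program file run with -f, a variable assignment on the command line, and -F with input read from STDIN. The file names, the sample records and the variable name price are hypothetical.

```shell
dir=$(mktemp -d)
printf 'gold 1 1986\nsilver 10 1981\ngold 0.5 1979\n' > "$dir/coins"

# A program kept in its own file, to be run with -f.
# price is not set in the program; it is supplied on the command line.
cat > "$dir/summary.awk" <<'EOF'
BEGIN  { num_gold = 0; wt_gold = 0 }
/gold/ { num_gold++; wt_gold += $2 }
END    { printf("Gold Pieces: %d, Gold Weight: %.2f, Gold Value: %.2f\n",
                num_gold, wt_gold, price * wt_gold) }
EOF

# The assignment price=485 takes effect before the data file that follows it is read.
awk -f "$dir/summary.awk" price=485 "$dir/coins"

# -F sets the field separator; "-" tells awk to read its data from STDIN.
printf 'harley:x:502\n' | awk -F: '{print $1}' -
```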

Search Patterns

  • An exact string: /The/

  • A string starting a line: /^The/

  • A string ending a line: /The$/

  • A string ignoring case of first letter: /[Tt]he/

  • Decimal: /[0-9]*\.[0-9]*/

  • Alphanumeric: /[a-zA-Z0-9]*/

  • Choice between two strings: /(da|De).*/

  • Numeric: /[+-]?[0-9]+/

  • Any Boolean expression: $4>90 or $4>$5

Note: Some utilities require \(, \) and \| in place of (, ) and | for these to act as regular expression operators
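
A brief sketch of some of these patterns used as awk selectors, where a pattern with no action prints the matching record. The sample lines are made up for illustration.

```shell
# Made-up sample input for illustration.
sample=$(printf 'The end\nat the\nsee The\n-42 items\nprice 3.50\n')

printf '%s\n' "$sample" | awk '/^The/'             # The end
printf '%s\n' "$sample" | awk '/The$/'             # see The
printf '%s\n' "$sample" | awk '/[Tt]he/'           # The end, at the, see The
printf '%s\n' "$sample" | awk '/[+-]?[0-9]+/'      # -42 items, price 3.50
printf '%s\n' "$sample" | awk '/[0-9]+\.[0-9]+/'   # price 3.50
printf '%s\n' "$sample" | awk '$1 == "price" && $2 > 3'   # a Boolean expression as the pattern
```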

Built-in Variables

  • NR: Number of records read so far (the total, once in END)

  • NF: Number of fields in the current record

  • FILENAME: The current input file

  • FS: Field separator character

  • RS: Record separator character

  • OFS: Output field separator character

  • ORS: Output record separator character

  • OFMT: The output format print uses for numbers (default "%.6g")
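
A small sketch of these variables at work. The file users and its colon-separated records are hypothetical.

```shell
dir=$(mktemp -d)
printf 'root:x:0\nharley:x:502\n' > "$dir/users"

# FILENAME, NR and NF for every record, with ":" as the input separator.
awk 'BEGIN {FS = ":"; OFS = " | "} {print FILENAME, NR, NF, $1, $3}' "$dir/users"

# Assigning to a field rebuilds the record using OFS.
awk 'BEGIN {FS = ":"; OFS = "-"} {$1 = $1; print}' "$dir/users"

# ORS replaces the newline that print normally appends.
awk 'BEGIN {ORS = ";"} {print $0} END {printf "\n"}' "$dir/users"

# OFMT controls how print formats non-integer numbers.
awk 'BEGIN {OFMT = "%.2f"; print 3.14159}'
```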

Arrays and Control Structures

  • Indexed and associative arrays

    • By index: months[3] = "March";

    • Associative: debts["Kim"] = 1000;

    • Note: arrays filled by split() index from one, not zero

  • Counter controlled: for (i=1; i<100; i++) data[i] = i;

  • Iterator: for (i in myArray) print i, myArray[i];

  • Pre test: i=0; while (i<20) { data[i] = i; i++ }

  • Condition: if (i==1) print debts["Kim"]; else print debts["Joe"];

  • Conditional expression: print (i==1 ? debts["Kim"] : debts["Joe"]);

  • Unconditional control statements

    • break: jump out of a loop

    • continue: next iteration

    • next: get next line of input

    • exit: exit the AWK program
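
A compact sketch that exercises the array and control constructs above in one BEGIN block. The values are illustrative only, and the output is captured in a shell variable named result purely for convenience.

```shell
result=$(awk 'BEGIN {
    debts["Kim"] = 1000; debts["Joe"] = 250       # associative array
    n = split("Jan Feb Mar", months, " ")          # split() fills months[1]..months[n]

    for (i = 1; i <= n; i++) print i, months[i]    # counter-controlled loop
    total = 0
    for (name in debts) total += debts[name]       # iteration order is unspecified
    print "total", total

    i = 0
    while (i < 5) { if (i == 3) break; i++ }       # break leaves the loop early
    print "stopped at", i
    print (i == 1 ? debts["Kim"] : debts["Joe"])   # conditional expression
}')
printf '%s\n' "$result"
```

Summing inside the for-in loop, rather than printing each element, keeps the output independent of the order in which awk visits the keys.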

Built-in Functions

  • Square root: print sqrt(3.6)

  • Integer portion: print int(3.2)

  • Substring: print substr("abcde", 3,2);

  • Split: n = split("a;b;c;d;e", letters, ";");

  • Position: print index("gorbachev", "bach");

    • Note: if the substring doesn't exist, 0 is returned

    • Note: strings index from one, not zero
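
A sketch of these functions with their results shown as comments. sqrt(16) is used in place of the slide's value to keep the output short, and the output is captured in a shell variable named result for convenience.

```shell
result=$(awk 'BEGIN {
    print sqrt(16)                          # 4
    print int(3.2)                          # 3
    print substr("abcde", 3, 2)             # cd
    n = split("a;b;c;d;e", letters, ";")    # returns the number of pieces
    print n, letters[1], letters[n]         # 5 a e
    print index("gorbachev", "bach")        # 4
    print index("gorbachev", "xyz")         # 0 when the substring is absent
}')
printf '%s\n' "$result"
```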

printf

  • printf(<template>, <arguments>);

    • printf applies the template to the arguments

    • Formats are specified in the template

      • %d for integer output

      • %o for octal

      • %x for hexadecimal

      • %s for string

      • %e for exponential format

      • %f for floating point format

    • Greater control

      • %5.2f means 5 characters wide, with two digits after the decimal point

      • %-8.4s means left justify, 8 wide, print at most 4 characters

      • %08d means 8 wide, padded with leading zeroes
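
A sketch of these formats, with square brackets added around each field so the padding is visible. The values are arbitrary, and the output is captured in a shell variable named result for convenience.

```shell
result=$(awk 'BEGIN {
    printf("[%d] [%o] [%x]\n", 255, 8, 255)     # [255] [10] [ff]
    printf("[%5.2f]\n", 3.14159)                # [ 3.14]
    printf("[%-8.4s]\n", "abcdefgh")            # [abcd    ]
    printf("[%08d]\n", 42)                      # [00000042]
    printf("[%s] [%e]\n", "text", 12345.678)    # [text] [1.234568e+04]
}')
printf '%s\n' "$result"
```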

Escape Characters

  • New line: \n

  • Carriage return: \r

  • Backspace: \b

  • Horizontal tab: \t

  • Form feed: \f

  • A quote: \"

  • A backslash: \\

AWK Redirection and Pipes

  • Create a file with the first field: >awk '{print $1 >> "file"}' fileName

  • Pipe output to a utility that translates from lower to upper case: >ls -l | awk '{print $8}' | tr '[a-z]' '[A-Z]'

  • Sort the grades file and print the first field: >sort +4n grades | awk '{print $1}'

  • List .txt files < 2000 bytes, print sorted descending: >ls -l | grep '\.txt$' | awk '$5 < 2000 {print $9, $5}' | sort -nr +1
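
A sketch of redirection inside awk and of awk in a pipeline, assuming a POSIX awk and sort. The grades file here is a hypothetical two-column stand-in (name, score), and the awk variable out is introduced only for this example.

```shell
dir=$(mktemp -d)
printf 'bush 91\nkim 78\nlee 85\n' > "$dir/grades"

# Redirect inside awk: the target is an awk string expression (here the variable out).
awk -v out="$dir/names" '{print $1 > out}' "$dir/grades"

# Pipe awk output on to other utilities.
awk '{print $1}' "$dir/grades" | tr 'a-z' 'A-Z'     # names in upper case
sort -k 2,2nr "$dir/grades" | awk '{print $1}'      # names, highest score first
```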

More Examples

  • Print Bush's grades: >awk '/Bush/ {print $3, $4}' grades

  • Print first name, last name, and quiz 3 grade for everyone who got more than a 90 on quiz 1 and 2

    >awk '{if ($4>90 && $5>90) print $3, $2, $6}' grades

    >awk '$4>90 && $5>90 {print $3, $2, $6}' grades

  • Print username for user with userid 502

    >awk -F: '{if ($3==502) print $1}' /etc/passwd

    >awk -F: '$3==502 {print $1}' /etc/passwd
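
A sketch of these examples on sample data. The slides do not show the grades file, so its layout (id, last name, first name, quiz 1, quiz 2, quiz 3) is an assumption inferred from the field numbers used, and a small passwd-style file stands in for /etc/passwd.

```shell
dir=$(mktemp -d)
# Assumed layout: id, last name, first name, quiz 1, quiz 2, quiz 3
cat > "$dir/grades" <<'EOF'
1 Bush George 95 92 88
2 Kim Lee 91 85 99
3 Park Ann 97 94 90
EOF
# Hypothetical passwd-style records: name:password:uid:gid
printf 'root:x:0:0\nharley:x:502:100\n' > "$dir/passwd"

awk '/Bush/ {print $3, $4}' "$dir/grades"                   # George 95
awk '$4 > 90 && $5 > 90 {print $3, $2, $6}' "$dir/grades"   # George Bush 88, Ann Park 90
awk -F: '$3 == 502 {print $1}' "$dir/passwd"                # harley
```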