awk


  • awk is a file-processing programming language.

  • Makes it easy to perform text manipulation tasks.

  • Is used in

    • Generating reports

    • Matching patterns

    • Validating data

    • Filtering data for transmission

  • An awk program is a sequence of statements of the form

    • pattern {action}

    • awk scans the input lines, in order, one at a time.

    • Each line is tested against the pattern; if it matches, the corresponding action is performed.

    • Every statement of the awk program is applied to each line of input.



awk programming model

[Figure: BEGIN, executed once before any input is read -> main control loop, executed for each line of input -> END, executed once all input is read]

  • An awk program consists of a main input loop. You don't write the loop yourself; awk supplies it, and the main body of your program runs inside it.

  • The main routine reads one line of input from a file and makes it available for processing. The main loop executes as many times as there are lines in the input.

  • Preprocessing before the main loop and postprocessing after the loop are done with BEGIN and END.

  • The routine is applied to each input line, one line at a time.
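The model described above can be tried out with a small sketch. The three input lines and the variable name count are made up for illustration; the awk program is wrapped in a shell function only so that it can be pasted and run as-is.

```shell
# Hypothetical demo: the input lines and the variable name count are made up.
model_demo() {
    printf 'alpha\nbeta\ngamma\n' | awk '
        BEGIN { print "start" }                 # once, before any input is read
              { count++; print NR ": " $0 }     # once for each input line
        END   { print "lines read: " count }    # once, after all input is read
    '
}
model_demo
```

The BEGIN line is printed first, then one numbered line per input record, and the END line last.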

Two ways to present the program to awk:

  • Make the program the first argument on the command line, if the program is short.

    awk 'program' [filename ...]

    % awk '/Smith/ {print}' people

    % awk '/Smith/ {print}' -

    A filename of - stands for standard input.

  • Put the program in a separate file and tell awk to run that program file on the input files.

    awk -f awkprog file1 file2

Keywords and some important functions

break, close, continue, exit, exp, for, getline, if, in, index, int, length

log, next, print, printf, split, sprintf, sqrt, substr, while

Operators

Assignment, compound assignment, arithmetic, relational, logical and regular expression matching operators.


Some regular expression metacharacters

\ - escapes any metacharacter that follows, including itself.

^ - anchors the regular expression that follows to the beginning of the string.

$ - anchors the regular expression that precedes it to the end of the string.

. (dot) - matches any single character, including newline.

[...] - matches any one of the characters enclosed between the brackets.

[^...] - a circumflex as the first character inside [] reverses the match to all characters except those listed in the [].

r1|r2 - alternation: matches either of the regular expressions r1 and r2.

r* - matches any number (including zero) of occurrences of the regular expression that precedes it.

r+ - matches one or more occurrences of the regular expression that precedes it.

r? - matches 0 or 1 occurrences of the regular expression that precedes it.

() - groups regular expressions.

r{n,m} - matches between n and m occurrences of the regular expression that precedes it. grep and sed write this as \{n,m\}, which is the notation used in the examples here. May not be available in very old versions of awk.

Writing Regular Expressions

  • Writing regular expressions involves three steps:

    • Specification: Knowing what you want to match.

    • Coding: Writing an expression to describe what you want to match

    • Testing: Testing the pattern to see what it matches.

    • Testing your regular expression may result in:

      • Hits: Lines you wanted to match, and did.

      • Misses: Lines you did not want to match, and did not.

      • Omissions: Lines you wanted to match but did not.

      • False alarms: Lines you matched but did not want to match.

    • Eliminate false alarms by narrowing the expression, and capture omissions by widening it.

Some Examples

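What do they match?

  • [a-zA-Z?+!] - any one letter, or one of the characters ?, + or !

  • [a-zA-Z][?+!] - a letter followed by one of ?, + or !

  • [-+*/] - any one of the arithmetic operators -, +, * or /

  • AB\{2,4\}C - an A, then two to four B's, then a C: ABBC, ABBBC or ABBBBC (in awk, write AB{2,4}C)

  • Compan(y|ies) - Company or Companies

  • [0-9][0-9]*\.\{2,\}[a-z][a-z]* - one or more digits, then two or more literal dots, then one or more lowercase letters, for example 12..ab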

Multiline Records

  • FS - the field separator. The default value is a single space, which makes awk split fields on any run of spaces and tabs. FS can be set to a single character. When more than one character is given, it is interpreted as a regular expression.

  • RS - the record separator. The default value is a newline; it can be changed.

  • Example:

    BEGIN { RS = ""; FS = "\n" }   # Record separator is a blank line
    {
        print "Name", $1
        print "Zip", $NF
    }

    Input file:

    John Smith
    235 Alameda
    Santa Clara
    95053

    Output:

    Name John Smith
    Zip 95053


% cat prog1.awk

# test for integer, string or a blank line.
/[0-9]+/    { print $0 ": An integer" }
/[A-Za-z]+/ { print $0 ": A String" }
/^$/        { print "A Blank line" }
# + metacharacter - one or more

% cat testfile

1234
This is a test
789 Hello
(blank line)
(blank line)

% awk -f prog1.awk testfile

1234: An integer
This is a test: A String
789 Hello: An integer
789 Hello: A String
A Blank line
A Blank line

Because the patterns are not anchored, a line that contains both digits and letters matches both of the first two rules.



% cat prog2.awk

BEGIN { FS = "," }
# Comma is the field separator
{
    print $1
    print $2
    print $3
}

% cat prog3.awk

BEGIN { FS = "," }
/CA/      { print $1 "," $3 }   # matches CA anywhere in the line
$3 ~ /CA/ { print $1 "," $3 }   # matches CA in the third field only

% cat testfile2

John Smith, Santa Clara, CA
Mary Jones, Red Bank, NJ
Susan Wang, Denver, CO

% awk -f prog2.awk testfile2

What is the output?

More than one character can be specified as a field separator; it is then interpreted as a regular expression.

FS = "\t+"

How many fields are in the following line?

FS = "[':,\t]"
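As a small illustration of a regular expression used as the field separator, the sketch below splits one made-up line on colons or commas and another on runs of tabs. The input lines and the function name fs_demo are assumptions chosen for the demo, not taken from the slides.

```shell
# Hypothetical demo: both input lines are made up.
fs_demo() {
    # a colon or a comma separates fields
    echo 'a:b,c:d' | awk 'BEGIN { FS = "[:,]" } { print NF, $3 }'
    # a run of one or more tabs counts as a single separator
    printf 'x\t\ty\n' | awk 'BEGIN { FS = "\t+" } { print NF, $2 }'
}
fs_demo
```

The first line has four fields and the second has two, even though two tabs sit between x and y.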



$ cat prog4.awk

BEGIN { printf("Scores\n") }
{ print $0; total = total + $2 }
# NR - number of input records that have been read
END { print "Average score is", total / NR }

$ cat scores

Smith 80
Jones 97
Chan 95
King 78

$ awk -f prog4.awk scores

Scores
Smith 80
Jones 97
Chan 95
King 78
Average score is 87.5


Passing Parameters into awk script

  • Parameters can be passed from the command line into an awk script. One or more variables are set on the command line and can then be accessed from the awk script.

  • Parameters passed in this way are not available in BEGIN. Each assignment takes effect when awk reaches it in the argument list, just before it opens the input file that follows.

  • Example - param.awk

    BEGIN { print "Passing Parameters" }
    {
        print "arg1 = ", arg1
        print "arg2 = ", arg2
    }

    From the command line, invoke

    awk -f param.awk arg1=100 arg2=200 datafile

    A shell script's command line arguments can be passed in as follows. Assume that the following line is in a shell script:

    awk -f param.awk arg1="$1" arg2="$2" datafile

    $1 and $2 are the positional parameters given as arguments on the command line when the shell script is invoked with the arguments 100 200.
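The timing of command-line assignments can be checked with a short sketch. It also uses the standard -v option, which the slides do not mention: an assignment made with -v happens before BEGIN runs. The variable names early and late and the input line are made up for the demo.

```shell
# Hypothetical demo: the variable names early and late are made up.
param_demo() {
    printf 'one line of input\n' | awk -v early=1 '
        BEGIN { print "BEGIN: early=" early ", late=" late }   # late is not set yet
              { print "main: early=" early ", late=" late }    # both are set here
    ' late=2 -
}
param_demo
```

In BEGIN only early has a value; by the time the first record is processed, late has been assigned as well.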

Patterns using regular expressions

# print lines ending with ia
awk '/ia$/ {print}' countries

# print countries ending with ia
awk '$1 ~ /ia$/ {print $1}' countries

# select lines where the third field matches Asia or begins with North or South
$3 ~ /Asia|^North|^South/ {print}

# Pattern ranges
/Russia/,/Brazil/ {print}

# Replace USA by United States
/USA/ {$1 = "United States"; print}

% cat countries

Australia 3000 Australia
USA 3615 North America
Argentina 1072 South America
India 1270 Asia
Russia 8650 Asia
China 3692 Asia
Brazil 3286 South America

Associative Arrays

  • Arrays in awk are associative arrays, where the index can be a number or a string.

  • The order in which a for (item in array) loop retrieves the items is unspecified.

    % cat prog6.awk

    { x[$1] = $2 }
    END {
        for (item in x)
            print item, x[item]
    }

    % awk -f prog6.awk scores

    Jones 89
    Smith 65
    Chen 100
    King 120
    Lowel 200
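A common use of string indexes is counting how often each value occurs. The following is a minimal sketch with made-up input words; the array name seen is an assumption, and the output is piped through sort only because the order of a for-in loop is not predictable.

```shell
# Hypothetical demo: the input words and the array name seen are made up.
count_demo() {
    printf 'apple\npear\napple\nfig\napple\npear\n' | awk '
        { seen[$1]++ }                  # the element is created on first use
        END {
            for (word in seen)          # iteration order is unspecified
                print word, seen[word]
        }
    ' | sort                            # sort only to make the order predictable
}
count_demo
```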

Example: Computing Grades

% cat prog7.awk

BEGIN { OFS = "\t" }
{
    # main loop applied to all input lines
    total = 0
    for (i = 2; i <= NF; ++i)
        total += $i
    average = total / (NF - 1)
    # store each student average
    stAvg[NR] = average
    avgByName[$1] = average
    # determine the letter grade
    if (average >= 90) grade = "A"
    else if (average >= 80) grade = "B"
    else if (average >= 70) grade = "C"
    else grade = "F"
    # store a count of the letter grades
    ++classGrade[grade]
    print $1, average, grade
}
END {
    # class statistics
    # calculate class average
    for (x = 1; x <= NR; x++)
        classTotal += stAvg[x]
    classAve = classTotal / NR
    print "Class Average = " classAve
    # determine how many above or below average
    # print number of students per letter grade
    print "Enter name "
    getline name < "-"
    print name ": " avgByName[name]
    for (letterGrade in classGrade)
        print letterGrade ":" classGrade[letterGrade] | "sort"
}

% cat grades

Smith 90 80 50
Jones 20 0 70
Wang 67 90 80
Wolf 70 100 90
Pratt 90 88 92

% awk -f prog7.awk grades

Smith 73.3333 C
Jones 30 F
Wang 79 C
Wolf 86.6667 B
Pratt 90 A
Class Average = 71.8
Enter name
Smith
Smith: 73.3333
A:1
B:1
C:2
F:1





Multidimensional arrays

# awk offers a syntax for subscripts that simulates a reference to multidimensional arrays
{
    for (i = 1; i <= NF; ++i)
        table[NR, i] = $i
}
END {
    for (k = 1; k <= NR; ++k) {
        for (i = 1; i <= 4; ++i) {
            total += table[k, i]
            printf("%d ", table[k, i])
        }
    }
    print "Total = " total
}
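The same two-subscript notation can be used to compute one total per row. This is a sketch under assumed input (two made-up rows of three numbers); the names table_demo, cols and sum are illustrative only.

```shell
# Hypothetical demo: the two input rows are made up.
table_demo() {
    printf '1 2 3\n4 5 6\n' | awk '
        {
            for (i = 1; i <= NF; ++i)
                table[NR, i] = $i       # stored under the single string key NR SUBSEP i
            cols = NF
        }
        END {
            for (k = 1; k <= NR; ++k) {
                sum = 0
                for (i = 1; i <= cols; ++i)
                    sum += table[k, i]
                print "row " k ": " sum
            }
        }
    '
}
table_demo
```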


next and getline

  • next causes the next input line to be read.

  • The next statement passes control back to the top of the script.

    % cat prog9.awk

    NF == 2 { next }   # skips to the next record and starts the program from the beginning
    /USA/ { $4 = "United States Of America"; print $0 }
    { print NR }

    % cat countries

    Japan Asia
    2: UK Europe
    3: Brazil S.America
    Egypt Africa
    5: USA N.America
    Canada N.America

    % awk -f prog9.awk countries

    2
    3
    5: USA N.America United States Of America
    5


Using getline

# Using the getline function to read the next line of input
/^\/+/ {
    getline
    print $1
}

# Get input from the user (standard input)
BEGIN {
    printf "Enter your name: "
    getline name < "-"
    print name
}

/Smith/ {
    print $1
}

# Reading from a pipe using getline
BEGIN {
    while (("who" | getline) > 0)
        terminal[$1] = $2
    close("who")
    for (item in terminal)
        print item, terminal[item]
}

Example - A word lookup

# reads a file with acronyms and their expansions,
# handles user queries
awk '
BEGIN {
    FS = "\t"; OFS = "\t"
    printf("Enter a word for lookup: ")
}

# Load the file named acronyms
FILENAME == "acronyms" {
    wordList[$1] = $2
    next
}

# scan for command to exit program
$0 ~ /^(quit|[qQ]|[Xx]|exit)$/ { exit }

# process any non-empty line
$0 != "" {
    if ($0 in wordList) { print wordList[$0] }
    else print $0 " not found"
}

# Prompt user to enter another word
{
    printf("Enter another word or q|Q to quit: ")
}' acronyms -

split()

  • split() is a built-in function that can parse any string into elements of an array.

  • Syntax:

  • number_of_elements = split(string, array, separator). If no separator is specified, FS is used as the field separator.

    {
        n = split($0, days)
        for (j = 1; j <= n; ++j)
            print days[j]
    }
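When a third argument is given, split() uses it instead of FS. A minimal sketch, assuming a made-up date string as input and the array name part:

```shell
# Hypothetical demo: the date string and the array name part are made up.
split_demo() {
    echo '2024-03-15' | awk '{
        n = split($0, part, "-")     # the third argument overrides FS for this call
        print n
        for (j = 1; j <= n; ++j)
            print j, part[j]
    }'
}
split_demo
```

The elements are numbered from 1, and split() returns how many there are.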



  • The next statement forces awk to immediately stop processing the current record and go on to the next record. The rest of the current rule's action is not executed either.

  • If you think of the main body of an awk program as a loop, the next statement is analogous to a continue statement: it skips to the end of the body of this implicit loop, and executes the increment (which reads another record).

  • Note: the getline function causes awk to read the next record immediately, but it does not alter the flow of control in any way. So the rest of the current action executes with a new input record.

  • For example, if your awk program works only on records with four fields, and you don't want it to fail when given bad input, you might put a rule near the beginning of the program that tests NF != 4 and executes next.
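One possible form of such a guard rule is sketched here. The input lines and the action in the second rule are made up for illustration; only the NF != 4 test and the use of next come from the description above.

```shell
# Hypothetical demo: the input lines and the second rule are made up.
next_demo() {
    printf 'a b c d\nbad line\ne f g h\n' | awk '
        NF != 4 { next }                 # skip records that do not have four fields
                { print $1 "-" $4 }      # only reached for four-field records
    '
}
next_demo
```

The two-field line never reaches the second rule, so nothing is printed for it.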


FILENAME == "names.txt" {
    count += 1
    next
}

{ print $0 }

END {
    print count
}

# Counts the lines in the file "names.txt"; lines from any other input file are printed.




  • getline is used to read the next line of input from the current input file, from a specified file, or from a pipe.

  • The getline command can be used without arguments to read input from the current input file.

  • It reads the next input record and splits it up into fields. This is useful if you've finished processing the current record, but you want to continue processing from the next record.

  • Note: the new value of $0 is used in testing the patterns of any subsequent rules. The original value of $0 that triggered the rule which executed getline is lost.


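/^[0-9]+/ { print "Line number ", NR, ":", "starts with a number" }
/^\/\*/   { getline }
{ print NR ":" $0 }

Input file:

This is a cat
1234 a cat
A test
/* A comment line */
990 is the score

Output:

1:This is a cat
Line number  2 : starts with a number
2:1234 a cat
3:A test
5:990 is the score

The comment line is replaced by line 5 inside the second rule, so only the third rule sees "990 is the score"; the first rule had already been tested and is not applied to it.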


  • Using getline to read a line into a variable

  • You can use `getline variable' to read the next record from awk's input into the variable variable. No other processing is done.

  • For example, suppose the next line is a comment, or a special string, and you want to read it without triggering any rules. This form of getline allows you to read that line and store it in a variable so that the main read-a-line-and-check-each-rule loop of awk never sees it.

  • The getline command used in this way sets only the variable itself, NR and FNR.

  • The record is not split into fields, so the values of the fields (including $0) and the value of NF do not change.
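The effect on $0 and NR can be observed with a short sketch. The three input lines and the variable name saved are made up for the demo.

```shell
# Hypothetical demo: the input lines and the variable name saved are made up.
getline_var_demo() {
    printf 'first\nsecond\nthird\n' | awk '
        NR == 1 {
            getline saved                    # reads the line "second" into saved
            print "$0=" $0, "saved=" saved, "NR=" NR
        }
        NR > 2 { print "rule saw: " $0 }     # the line "second" never reaches the rules
    '
}
getline_var_demo
```

$0 still holds the first line after the call, NR has advanced to 2, and the main loop continues with the third line.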

  • Using `getline < file' to read the next record from the file file.

  • Here file is a string-valued expression that specifies the file name. `< file' is called a redirection since it directs input to come from a different place.

  • For example, the following program reads its input record from the file `input.dat' when it encounters a first field with a value equal to 10 in the current input file.

  • awk '{ if ($1 == 10) { getline < "input.dat"; print } else print }'

  • Since the main input stream is not used, the values of NR and FNR are not changed. But the record read is split into fields in the normal manner, so the values of $0 and the other fields are changed. So is the value of NF.

  • Using getline to read the output of a command from a pipe:

  • You can pipe the output of a command into getline, using `command | getline'. In this case, the string command is run as a shell command and its output is piped into awk to be used as input. This form of getline reads one record at a time from the pipe.

  • For example, the following program copies its input to its output, except for lines that begin with `@execute', which are replaced by the output produced by running the rest of the line as a shell command:

    awk '{
        if ($1 == "@execute") {
            tmp = substr($0, 10)
            while ((tmp | getline) > 0)
                print
            close(tmp)
        } else
            print
    }' input

    The close function is called to ensure that if two identical `@execute' lines appear in the input, the command is run for each one.

close()

  • close() allows you to close open files and pipes.

    • There may be a limit on the number of files and pipes that can be open at the same time.

    • Closing a pipe allows you to run the same command twice.

    • Example: close("who")
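The point about running the same command twice can be seen in a small sketch. The command echo hello and the variable names first, again and second are made up for the demo.

```shell
# Hypothetical demo: the command and the variable names are made up.
close_demo() {
    awk 'BEGIN {
        cmd = "echo hello"
        cmd | getline first
        cmd | getline again      # same pipe, already at end of output: again stays empty
        close(cmd)
        cmd | getline second     # the pipe was closed, so the command runs a second time
        close(cmd)
        print "[" first "] [" again "] [" second "]"
    }'
}
close_demo
```

Only the read that follows close() gets fresh output from the command.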

What is the output for the input file given below?

@execute who


  • Using getline to read the output of a command from a pipe into a variable:

  • When you use `command | getline var', the output of the command command is sent through a pipe to getline and into the variable var.

  • Example:

  • awk 'BEGIN { "date" | getline current_time; close("date"); print "Report printed on " current_time }'

  • In this version of getline, none of the built-in variables are changed, and the record is not split into fields.

Using system()

  • The system() function executes a command supplied as an expression.

  • The output generated by the command is not available within the program for processing.

  • system() returns the exit status of the command that was executed.

    #!/bin/awk -f
    BEGIN {
        status = system("mkdir temp")
        if (status != 0)
            print "command failed"
    }
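The returned status can be tested without touching the file system by using the standard shell commands true and false. This is only a sketch; the variable names ok and bad are made up.

```shell
# Hypothetical demo: the variable names ok and bad are made up.
system_demo() {
    awk 'BEGIN {
        ok  = system("true")      # true always exits with status 0
        bad = system("false")     # false always exits with a non-zero status
        print (ok == 0 ? "true succeeded" : "true failed")
        print (bad != 0 ? "false failed" : "false succeeded")
    }'
}
system_demo
```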


User-defined functions

  • A function definition can appear anywhere that a pattern-action rule can.

  • Inputs to the function are passed as a list of parameters.

    # inserts a string, insertStr, after position in aString
    function insertString(aString, position, insertStr) {
        before = substr(aString, 1, position)
        after = substr(aString, position + 1)
        return before insertStr after
    }

    { print insertString($1, 5, "BBBB") }
    # No spaces are allowed between the function name and the left parenthesis when the function is called.

  • All the variables in the parameter list are local to the function.

  • All other variables used in the body of the function are global.

  • Therefore any temporary variables are declared by adding them to the end of the parameter list.

  • Example:

    function insertString(aString, position, insertStr, after) {
        before = substr(aString, 1, position)
        after = substr(aString, position + 1)
        return before insertStr after
    }

    { print insertString($1, 5, "BBBB") }
    { print aString }
    { print "before: " before }
    { print "after: " after }

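% cat testFile

This is a test

% awk -f fun2.awk testFile

before: Hello

before: This

before: XYZ12

Only the before: lines of the output are shown. before is a global variable, so its value is visible outside the function; aString and after are local, so they are empty in the main rules.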
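The difference between a local and a global variable can be reproduced with a smaller sketch. The function shout and the variable names tmp and leaked are made up for the demo; toupper() is a standard awk function.

```shell
# Hypothetical demo: the function shout and the variable names are made up.
scope_demo() {
    echo 'HelloWorld' | awk '
        # tmp is listed after the real parameter, so it is local to the function
        function shout(s,    tmp) {
            tmp = toupper(s)
            leaked = "set inside shout"     # not in the parameter list: global
            return tmp "!"
        }
        {
            print shout($1)
            print "tmp=[" tmp "]"
            print "leaked=[" leaked "]"
        }
    '
}
scope_demo
```

Outside the function tmp is empty, while leaked keeps the value assigned inside it.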


Functions

  • Arrays are passed by reference

    #!/bin/awk -f
    function moveSmallest(LIST, SIZE,    temp, small, smallIndex) {
        small = LIST[1]
        smallIndex = 1
        for (i = 2; i <= SIZE; ++i) {
            if (LIST[i] < small) {
                small = LIST[i]
                smallIndex = i
            }
        }
        LIST[smallIndex] = LIST[1]
        LIST[1] = small
    }

    BEGIN {
        array[1] = 12
        array[2] = 0
        array[3] = -1
        array[4] = 100
        moveSmallest(array, 4)
        for (i = 1; i <= 4; ++i) {
            print array[i]
        }
    }



Some built-in functions

Arithmetic functions

cos, exp, int, log, sin, sqrt, atan2, rand, srand

Some useful string functions

index, length, split, sub, substr, tolower, toupper

gsub(regExp, replaceWithString, inString) - globally substitutes replaceWithString for every match of regExp in inString, and returns the number of substitutions made.

match(string, regExp) - returns the position where regExp is first found in string, or 0 if no occurrences are found.
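A short sketch of gsub() and match() on a made-up sentence follows. The variable names n and pos are illustrative; RSTART and RLENGTH are the standard variables that match() sets.

```shell
# Hypothetical demo: the input sentence and the variable names are made up.
string_demo() {
    echo 'the cat sat on the mat' | awk '{
        n = gsub(/at/, "og")            # no third argument, so $0 is modified
        print n, $0
        pos = match($0, /s[a-z]+/)      # also sets RSTART and RLENGTH
        print pos, RSTART, RLENGTH
    }'
}
string_demo
```

gsub() reports three substitutions, and match() finds the word starting with s at position 9 with length 3.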


Passing parameters into a script

  • Input is passed into an awk script by setting variables on the command line.

  • Example:

    • awk -f awkprog x=1 y=2 inputfile

    • The variables x and y can be accessed in the main loop (not in the BEGIN section).

    • The system variables ARGC and ARGV can be used to access the command line arguments.

      BEGIN { print "BEGIN: " n }
      NR == 1 {
          print ARGC; print n
          for (i = 0; i < ARGC; ++i) {
              print ARGV[i]
          }
      }

      % awk -f param.awk n=20 testfile







An array of environment variables

#!/bin/awk -f
BEGIN {
    for (env in ENVIRON) {
        print env "=" ENVIRON[env]
    }
    print "Logname = ", ENVIRON["LOGNAME"]
}
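A single environment variable can be looked up by name in the same way. In this sketch the variable DEMO_GREETING and its value are made up and are set only for the one awk invocation.

```shell
# Hypothetical demo: the variable DEMO_GREETING and its value are made up.
environ_demo() {
    DEMO_GREETING='hi there' awk 'BEGIN { print ENVIRON["DEMO_GREETING"] }'
}
environ_demo
```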