The awk utility

CS465 - Unix

The awk Utility



Background

  • awk was developed by

    • Aho, Weinberger, and Kernighan (of K & R)

    • Was further extended at Bell Labs

  • Handles simple data-reformatting jobs easily with just a few lines of code.

  • Versions

    • awk - original version

    • nawk - new awk - improved awk

    • gawk - gnu awk - improved nawk



How awk works

  • awk commands include patterns and actions

    • Scans the input line by line, searching for lines that match a certain pattern (or regular expression)

    • Performs a selected action on the matching lines

  • awk can be used:

    • at the command line for simple operations

    • in programs or scripts for larger applications



Running awk

  • From the Command Line:

    $ awk '/pattern/{action}' file

  • OR From an awk script file:

    $ cat awkscript

    # This is a comment

    /pattern/ {action}

    $ awk -f awkscript file


awk's Format using Input from a File

    $ awk '/pattern/' filename

  • awk will act like grep

    $ awk '{action}' filename

  • awk will apply the action to every line in the file

    $ awk '/pattern/ {action}' filename

  • awk will apply the action to every line in the file that matches the pattern
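The three invocation forms above can be compared side by side. This is a minimal sketch: the sample data and the shell variable names (data, grep_like, every_line, matching) are made up for illustration and do not come from the original slides.

```shell
# Hypothetical sample data, created in a temporary file.
data=$(mktemp)
printf '%s\n' 'apple red' 'banana yellow' 'cherry red' > "$data"

# Pattern only: matching lines are printed unchanged, much like grep.
grep_like=$(awk '/red/' "$data")

# Action only: the action runs on every line.
every_line=$(awk '{print $1}' "$data")

# Pattern and action: the action runs only on matching lines.
matching=$(awk '/red/ {print $1}' "$data")

printf '%s\n' "$grep_like" '---' "$every_line" '---' "$matching"
rm -f "$data"
```

Quoting the awk program in single quotes keeps the shell from interpreting $1 or any special characters in the pattern.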



Records and Fields

  • awk divides the input into records and fields

    • Each line is a record (by default)

  • Each record is split into fields, delimited by a special character (whitespace by default)

    • Can change delimiter with -F or FS

            field-1  field-2  field-3

               |        |        |

               v        v        v

record 1 -> George   Jones    Admin

record 2 -> Anthony  Smith    Accounting



awk field variables

  • awk creates variables $1, $2, $3… that correspond to the resulting fields (just like a shell script).

    • $1 is the first field, $2 is the second…

    • $0 is a special field which is the entire line

    • NF is always set to the number of fields in the current line (no dollar sign to access)
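A short sketch of the field variables, using one record from the Records and Fields slide as input. The shell variable names are illustrative only. The last command relies on the fact that $NF (a dollar sign applied to NF) refers to the last field, since NF holds the field count.

```shell
line='George Jones Admin'

first=$(printf '%s\n' "$line" | awk '{print $1}')   # first field
whole=$(printf '%s\n' "$line" | awk '{print $0}')   # the entire line
count=$(printf '%s\n' "$line" | awk '{print NF}')   # number of fields
last=$(printf '%s\n' "$line" | awk '{print $NF}')   # field number NF, i.e. the last one

printf '%s\n' "$first" "$whole" "$count" "$last"
```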


Example #1

$ cat students

Bill White 7777771 1980/01/01 Science

Jill Blue 1111117 1978/03/20 Arts

Ben Teal 7171717 1985/02/26 CompSci

Sue Beige 1717171 1963/09/12 Science

$

$ awk '/Science/{print $1, $2}' students

Bill White

Sue Beige

$

  • Commas indicate that we want the output fields to be delimited by spaces (otherwise they are concatenated):

    $ awk '/Science/{print $1 $2}' students

    BillWhite

    SueBeige



Example #2

$ cat phonelist

Joe Smith 774-0888

Mary Jones 772-2345

Hank Knight 494-8888

$

$ awk '{print "Name: ", $1, $2, \

" Telephone:", $3}' phonelist

Name: Joe Smith Telephone: 774-0888

Name: Mary Jones Telephone: 772-2345

Name: Hank Knight Telephone: 494-8888

$

  • No pattern given, so matches ALL lines

  • Text strings to print are placed in double quotes



Example #3

Given a username, display the person’s real name:

$ grep small /etc/passwd

small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh

$

$ awk -F: '/small000/{print $5}' /etc/passwd

Faculty - Pam Smallwood

$



awk using Input from Commands

  • You can run awk in a pipeline, using input from another command:

    $ command | awk '/pattern/ {action}'

    • Takes the output from the command and pipes it into awk which will then perform the action on all lines that match the pattern



Piped awk Input Example

$ w

1:04pm up 25 day(s), 5:37, 6 users, load average: 0.00, 0.00, 0.01

User tty login@ idle JCPU PCPU what

pugli766 pts/8 Tue10pm 3days -ksh

lin318 pts/17 10:58am 1:45 vi choosesort

small000 pts/18 12:43pm w

mcdev712 pts/10 11:52am 14 1 vi adddata

gibbo201 pts/12 12:15pm 18 -ksh

nelso828 pts/16 7:17pm 17:43 -ksh

$

$ w | awk '/ksh/{print $1}'

pugli766

gibbo201

nelso828

$


Relational Operators

  • awk can use relational operators ( <, >, <=, >=, ==, != ) to compare a field to a value, and ! to negate a pattern

    • If the outcome of the comparison is true then the action is performed

  • Examples:

    • To print every record in the log.txt file in which the second field is larger than 10

      • $ awk '$2 > 10' log.txt

    • To print every record in the log.txt file which does NOT contain ‘Win32’

      • $ awk '!/Win32/' log.txt
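The two commands above can be tried against a small stand-in for log.txt. The file contents below are invented for this sketch, as are the shell variable names. The third command combines both tests with &&, which awk also accepts in a pattern.

```shell
# Hypothetical stand-in for log.txt: method, a numeric second field, platform.
log=$(mktemp)
printf '%s\n' 'GET 5 Win32' 'POST 12 Linux' 'GET 40 Win32' > "$log"

# Second field larger than 10 (no action given, so the whole record is printed).
big=$(awk '$2 > 10' "$log")

# Records that do NOT contain Win32.
not_win=$(awk '!/Win32/' "$log")

# Both conditions together, printing only the first field.
both=$(awk '$2 > 10 && !/Win32/ {print $1}' "$log")

printf '%s\n' "$big" '---' "$not_win" '---' "$both"
rm -f "$log"
```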



Relational Operator Example

$ who

pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)

lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)

small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)

mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)

gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)

nelso828 pts/16 Jun 5 19:17 (65.100.138.177)

$

$ who | awk '$4 < 6 {print $1, $3, $4, $5}'

pugli766 Jun 3 22:24

nelso828 Jun 5 19:17

$



Piping awk output

$ who

pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)

lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)

small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)

mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)

gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)

nelso828 pts/16 Jun 5 19:17 (65.100.138.177)

$

$ who | awk '$4 == 6 {print $1}' | sort

gibbo201

lin318

mcdev712

small000

$


awk Programming

  • An awk program is built as a list of rules

    • Each rule pairs a pattern with an action

    • The rules are applied in order to each line (record)

  • Example:

    /pattern1/ { action1 }

    /pattern2/ { action2 }

    /pattern3/ { action3 }


awk - pattern matching

  • Before processing, lines can be matched with a pattern.

    /pattern/ { action }    execute if line matches pattern

    The pattern is a regular expression.

  • Examples:

    /^$/ { print "This line is blank" }

    /num/ { print "Line includes num" }

    /[0-9]+$/ { print "Integer at end:", $0 }

    /[A-Za-z]+/ { print "String:", $0 }

    /^[A-Z]/ { print "Starts w/uppercase letter" }
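A few of these patterns can be checked against a small input. This is only a sketch: the four input lines are invented, and LC_ALL=C is set as a precaution because bracket ranges such as [A-Z] can behave differently in other locales on some awk versions.

```shell
# Hypothetical input: a capitalized line, an empty line, and two lowercase lines.
input=$(mktemp)
printf '%s\n' 'Total 42' '' 'no digits here' 'room 7' > "$input"

# Count blank lines; n+0 forces a numeric 0 if no line matched.
blank=$(LC_ALL=C awk '/^$/ {n++} END {print n+0}' "$input")

# Lines that end in one or more digits.
ends_num=$(LC_ALL=C awk '/[0-9]+$/ {print $0}' "$input")

# Lines that start with an uppercase letter.
upper=$(LC_ALL=C awk '/^[A-Z]/ {print $0}' "$input")

printf '%s\n' "$blank" '---' "$ends_num" '---' "$upper"
rm -f "$input"
```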


awk program from a file

  • The awk commands (program) can be placed into a file

  • The -f (lowercase f) indicates that the commands come from a file whose name follows the -f

    $ awk -f awkfile datafile

    The contents of the file called awkfile will be used as the commands for awk


Example 1

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$ cat awkprog

/5?5/ {print $1, $2}

/3*4/ {print $5}

$

$ awk -f awkprog students

Arts

Bill Teal

Sue Beige

$

NOTE: All patterns are applied to each line before moving to the next line


Example 2

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$ cat awkprog

/Science/ {print "Science stu:", $1, $2}

/CompSci/ {print "Computing stu:", $1, $2}

$

$ awk -f awkprog students

Science stu: Bill White

Computing stu: Bill Teal

Science stu: Sue Beige

$



More about Patterns

  • Patterns can be:

    • Empty: will match everything

    • Regular expressions:

      /reg-expression/

    • Boolean Expressions:

      $2=="foo" && $7=="bar"

    • Ranges:

      /jones/,/smith/


Example - Boolean Expressions

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$ cat awkprog

$3 <= 444444 {print "Not counted"}

$3 > 444444 {print $2 ",", $1}

$

$ awk -f awkprog students

Not counted

Not counted

Teal, Bill

Beige, Sue

$



Example - Ranges

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$

$ awk '/333333/,/555555/' students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

$


More Built-In awk Variables

  • Two types: Informative and Configuration

  • Informative:

    NR = Current Record Number (starts at 1)

      • Counts ALL records, not just those that match

    NF = Number of Fields in the Current Record

    FILENAME = Current Input Data File

      • Undefined in the BEGIN block



Example using NF

$ cat names

Pam Sue Laurie

Bob Joe Bill Dave

Joan Jill

$

$ awk '{print NF}' names

3

4

2

0

$

NOTE: The final 0 implies that names ends with an empty line, which has no fields


Example using a boolean, NF, and NR

$ cat names

Pam Sue Laurie

Bob Joe Bill Dave

Joan Jill

$

$ awk 'NF > 2 {print NR ":", NF, "fields"}' names

1: 3 fields

2: 4 fields

$


Built-in awk functions

log(expr)        natural logarithm

index(s1,s2)     position of string s2 in string s1

length(s)        string length

substr(s,m,n)    n-char substring of s starting at m

tolower(s)       converts string to lowercase

printf()         print formatted - like C printf
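The functions listed above can be exercised on a single input line. This is a small sketch with an arbitrary sample string; the shell variable name result is illustrative only.

```shell
result=$(printf '%s\n' 'Hello World' | awk '{
    print length($0)            # number of characters in the line
    print index($0, "World")    # position where "World" starts (counting from 1)
    print substr($0, 1, 5)      # 5 characters starting at position 1
    print tolower($0)           # lowercase copy of the line
    printf("%.3f\n", log(10))   # natural logarithm, shown with 3 decimals
}')

printf '%s\n' "$result"
```

Positions in index and substr count from 1, not 0, and index returns 0 when the second string does not occur in the first.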





print & printf

  • Use print in an awk statement to output specific field(s)

  • printf is more versatile

    • works like printf in the C language

    • May contain a format specifier and a modifier



Format Specification

  • A format specification consists of a percent symbol, a modifier, width and precision values, and a conversion character

  • To display the third field as a floating point number with two decimal places:

    awk '{printf("%.2f\n", $3)}' file

  • You can include additional text in the printf statement

    '{printf ("3rd value: %.2f\n", $3)}'


Specifiers, Width, Precision, & Modifiers

Type Specifiers:

%c    Single character

%d    Integer (decimal)

%f    Floating point

%s    String

Between the % and the specifier you can place the width and precision

%6.2f means a floating point number in a field of width 6 in which there are two decimal places

Modifiers control details of appearance:

-    minus sign is the left justification modifier (the default is right justification)

+    plus sign forces the appearance of a sign (+,-) for numeric output

0    zero pads a right justified number with zeros
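To make the effect of width, precision, and modifiers visible, the sketch below wraps each formatted value in square brackets. The input line is invented for illustration, and the shell variable name out is arbitrary.

```shell
out=$(printf '%s\n' 'widget 3.14159 7' | awk '{
    printf("[%6.2f]\n", $2)   # width 6, two decimal places, right justified
    printf("[%-8s]\n", $1)    # width 8, left justified by the - modifier
    printf("[%+d]\n", $3)     # + forces a sign on the number
    printf("[%05d]\n", $3)    # width 5, padded with zeros
}')

printf '%s\n' "$out"
```

Unlike print, printf does not add a newline on its own, so each format string ends with \n.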



awk Variables

  • Variables

    • No need for declaration

      • Implicitly set to 0 AND the Empty String

    • Variable type is a combination of a floating-point and string

    • Variable is converted as needed, based on its use

      title = "Number of students"

      no = 100

      weight = 13.4
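The points about implicit initialization and conversion can be seen in a BEGIN-only program, which needs no input file. This is a minimal sketch; unset_var is a hypothetical name for a variable that is never assigned, and no and weight reuse the names from the slide.

```shell
out=$(awk 'BEGIN {
    print "[" unset_var "]"    # used as a string: behaves as the empty string
    print unset_var + 5        # used as a number: behaves as 0
    no = "100"
    print no + 1               # the string "100" is converted to a number
    weight = 13.4
    print "Weight: " weight    # the number is converted to a string
}')

printf '%s\n' "$out"
```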




awk program execution

BEGIN { …. }             Executes only once before reading input data

{ …. }                   Executes for each input line

specification { ….. }    Executes for each input line that matches the specified /pattern/ or Boolean expression

END { ….. }              Executes at the end, after all lines have been processed


Example #1: Count # lines in file

$ cat awkprog

BEGIN {total = 0}

{total = total + 1}

END {print total " lines"}

$ cat testfile

Hello There

Goodbye!

$

$ awk -f awkprog testfile

2 lines

$

- Set total to 0 before processing any lines

- For every row in the file, execute {total = total + 1}

- Print total after all lines processed.


Ex #2: Count lines containing a pattern

An action such as { totalMa++ } executes only for input lines that contain its pattern.

$ cat Simpsons

Marge   34

Homer   32

Lisa    10

Bart    11

Maggie  01

$ cat countthem

BEGIN {totalMa = 0; totalar = 0}

/Ma/ { totalMa++ }

/ar/ { totalar++ }

END { print totalMa " Ma's"

print totalar " ar's"}

$

$ awk -f countthem Simpsons

2 Ma's

2 ar's

$



Example #3: Add line numbers

$ cat numawk

BEGIN { print "Line numbers by awk" }

{ print NR ":", $0 }

END { print "Done processing " FILENAME }

$ cat testfile

Hello There

Goodbye!

$

$ awk -f numawk testfile

Line numbers by awk

1: Hello There

2: Goodbye!

Done processing testfile

$


More Built-In awk Variables

  • Two types: Informative and Configuration

  • Configuration

    FS = Input field separator

    OFS = Output field separator

    (default for both is space " "; with the default FS, any run of spaces or tabs separates fields)

    RS = Input record separator

    ORS = Output record separator

    (default for both is newline "\n")
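The output-side variables are easiest to see when they are set to something other than the defaults. In this sketch the separator strings " | " and ";" are arbitrary choices, the comma-delimited input is made up, and the shell variable names are illustrative.

```shell
# FS controls how input is split; OFS is placed wherever print has a comma.
with_ofs=$(printf '%s\n' 'Bill Jones,3333,M' 'Pam Smith,5555,F' |
    awk 'BEGIN { FS = ","; OFS = " | " } { print $1, $2 }')

# ORS replaces the newline that print normally writes after each record.
with_ors=$(printf '%s\n' 'a' 'b' |
    awk 'BEGIN { ORS = ";" } { print $1 }')

printf '%s\n' "$with_ofs" '---' "$with_ors"
```

Setting the separators in a BEGIN block ensures they are in effect before the first record is read.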


Example #1: Reverse 2 columns

$ cat switch

BEGIN{FS="\t"}

{print $2 "\t" $1}

$ awk -f switch Simpsons

34      Marge

32      Homer

10      Lisa

11      Bart

01      Maggie

$

NOTE: Columns separated by tabs

  • Alternatively you could do the following:

    • $ awk -F'\t' '{print $2 "\t" $1}' Simpsons



Example #2: Sum a column

$ cat awksum2

BEGIN { FS="\t"

sum = 0 }

{sum = sum + $2}

END { print "Done"

print "Total sum is " sum }

$

$ awk -f awksum2 Simpsons

Done

Total sum is 88

$



Example #3: Comma delimited file

$ cat names

Bill Jones,3333,M

Pam Smith,5555,F

Sue Smith,4444,F

$

$ awk -F, '{print $2}' names

3333

5555

4444

$



Longer awk program

$ cat awkprog

BEGIN { print "Processing..." }

# print number of fields in first line

NR == 1 { print $0, NF, "fields"}

/^Unix/ { print "Line starts with Unix: ", $0 }

/Unix$/ { print "Line ends with Unix: " $0 }

# finishing it up

END {print NR " lines checked"}

$



awk program execution

$ cat datfile

First Line

Unix is great!

What else is better?

This is Unix

Yes it is Unix

Goodbye!

$

$ awk -f awkprog datfile

Processing...

First Line 2 fields

Line starts with Unix: Unix is great!

Line ends with Unix: This is Unix

Line ends with Unix: Yes it is Unix

6 lines checked

$


awk programming language syntax

if ( found == true )      # if (expr)

    print "Found";        #   {action1}

else                      # else

    print "Not found";    #   {action2}

while ( i <= 100)         # while (cond)

{ i = i + 1;              # { actions...

  print i }               # }


awk programming language syntax

do                        # do

{ i = i + 1;              # { actions ...

  print i }               # }

while ( i < 100);         # while (cond);

for (i=1; i < 10; i++ )   # for (set; test; incr)

{                         # {

  sqr = i * i;            #   actions

  print i " squared is " sqr }   # }
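The four control structures can be run together in a BEGIN block, which needs no input file. This is a sketch with small loop bounds chosen to keep the output short. Note that awk has no built-in true or false constants, so the flag here is simply set to 1; the variable names are illustrative.

```shell
out=$(awk 'BEGIN {
    found = 1                       # any nonzero number counts as true
    if (found)
        print "Found"
    else
        print "Not found"

    i = 0
    while (i < 3) {                 # test first, then run the body
        i = i + 1
        print "while", i
    }

    do {                            # run the body first, then test
        i = i + 1
        print "do", i
    } while (i < 5)

    for (j = 1; j <= 3; j++)        # set; test; increment
        print j " squared is " j * j
}')

printf '%s\n' "$out"
```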


awk – longer example

  • Write an awk program that prints out content of a directory in the following format:

BYTES FILE

24576 copyfile

736 copyfile.c

740 copyfile.c~

24576 dirlist

989 dirlist.c

977 dirlist.c%

24576 envadv

185 envadv.c

<dir> tmp

740 x.c

Total: 73684 bytes in

9 regular files



awk example - code

$ cat awkprog

BEGIN {print " BYTES \t FILE";

sum=0; filenum=0

}

# test for lines starting with -

/^-/ { sum += $5

++filenum

printf ("%10d \t%s\n", $5, $9) }

# test for directories - line starts with d

/^d/ { print " <dir> \t", $9 }

# conclusion

END { print "\n Total: " sum" bytes in"

print " " filenum " regular files"

}

$



awk example - output

$ ls -l

total 84

drwx------ 2 small000 faculty 512 Jun 2 13:44 sub2

-rwx------ 1 small000 faculty 224 Jun 3 10:35 sumnums

-rw------- 1 small000 faculty 2 Jun 3 21:08 tab

-rw------- 1 small000 faculty 187 Jun 8 11:15 tbook

$

$ ls -l | awk -f awkprog

 BYTES           FILE

 <dir>           sub2

       224       sumnums

         2       tab

       187       tbook

 Total: 413 bytes in

 3 regular files

$



awk Handout

  • Review awk examples on handout

