The awk utility



CS465 - Unix

The awk Utility


Background

  • awk was developed by

    • Aho, Weinberger, and Kernighan (the "K" of K & R)

    • Was further extended at Bell Labs

  • Handles simple data-reformatting jobs easily with just a few lines of code.

  • Versions

    • awk - original version

    • nawk - new awk - improved awk

    • gawk - gnu awk - improved nawk


How awk works

  • awk commands include patterns and actions

    • Scans the input line by line, searching for lines that match a certain pattern (or regular expression)

    • Performs a selected action on the matching lines

  • awk can be used:

    • at the command line for simple operations

    • in programs or scripts for larger applications


Running awk

  • From the Command Line:

    $ awk '/pattern/{action}' file

  • OR From an awk script file:

    $ cat awkscript

    # This is a comment

    /pattern/ {action}

    $ awk -f awkscript file


awk’s Format using Input from a File

    $ awk '/pattern/' filename

  • awk will act like grep

    $ awk '{action}' filename

  • awk will apply the action to every line in the file

    $ awk '/pattern/ {action}' filename

  • awk will apply the action to every line in the file that matches the pattern
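
The three forms above can be tried side by side. This is a minimal sketch: the three-line sample data and the shell variable names are made up for illustration, and the input is piped in rather than read from a file.

```shell
# Hypothetical sample data (not from the slides)
data='apple 3
banana 12
cherry 7'

# Pattern only: print each matching line, like grep
grep_like=$(printf '%s\n' "$data" | awk '/an/')

# Action only: run the action on every line
every_line=$(printf '%s\n' "$data" | awk '{print $1}')

# Pattern and action: run the action only on matching lines
matched=$(printf '%s\n' "$data" | awk '/rr/ {print $2}')

printf '%s\n' "$grep_like" "$every_line" "$matched"
```

Only the banana line contains "an", every line contributes its first field, and only the cherry line contains "rr".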




Records and Fields

  • awk divides the input into records and fields

    • Each line is a record (by default)

  • Each record is split into fields, delimited by a special character (whitespace by default)

    • Can change the delimiter with -F or FS

            field-1  field-2  field-3
               |        |        |
               v        v        v
record 1 -> George   Jones    Admin
record 2 -> Anthony  Smith    Accounting


awk field variables

  • awk creates variables $1, $2, $3… that correspond to the resulting fields (much like the positional parameters in a shell script).

    • $1 is the first field, $2 is the second…

    • $0 is a special field which is the entire line

    • NF is always set to the number of fields in the current line (no dollar sign to access)
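
As a small sketch of the field variables, the record below (taken from the Records and Fields slide) is split on whitespace. The use of $NF for the last field is an addition not shown on the slides: because NF holds the field count, $NF refers to the field with that number.

```shell
line='George Jones Admin'

first=$(printf '%s\n' "$line" | awk '{print $1}')   # first field
whole=$(printf '%s\n' "$line" | awk '{print $0}')   # entire line
count=$(printf '%s\n' "$line" | awk '{print NF}')   # number of fields
last=$(printf '%s\n' "$line" | awk '{print $NF}')   # last field (field number NF)

printf '%s\n' "$first" "$whole" "$count" "$last"
```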


Example #1

$ cat students

Bill White 7777771 1980/01/01 Science

Jill Blue 1111117 1978/03/20 Arts

Ben Teal 7171717 1985/02/26 CompSci

Sue Beige 1717171 1963/09/12 Science

$

$ awk '/Science/{print $1, $2}' students

Bill White

Sue Beige

$

  • The commas tell awk to separate the output items with a space, the default output field separator (without them the items are concatenated):

    $ awk '/Science/{print $1 $2}' students

    BillWhite

    SueBeige


Example #2

$ cat phonelist

Joe Smith 774-0888

Mary Jones 772-2345

Hank Knight 494-8888

$

$ awk '{print "Name: ", $1, $2, \

" Telephone:", $3}' phonelist

Name: Joe Smith Telephone: 774-0888

Name: Mary Jones Telephone: 772-2345

Name: Hank Knight Telephone: 494-8888

$

  • No pattern given, so matches ALL lines

  • Text strings to print are placed in double quotes


Example #3

Given a username, display the person’s real name:

$ grep small /etc/passwd

small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh

$

$ awk -F: '/small000/{print $5}' /etc/passwd

Faculty - Pam Smallwood

$
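
Note that /small000/ matches the text anywhere on the line, not only in the username field. A possible refinement, sketched here on inline sample data rather than the real /etc/passwd, is to compare the first field exactly. The second record (jones001) is hypothetical and exists only to show the difference.

```shell
# First record is from the slide; the second is a made-up record whose
# home directory happens to contain "small000"
passwd_sample='small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh
jones001:x:1165:102:Student - Al Jones:/export/home/small000/guest:/bin/ksh'

# Regular expression: matches "small000" anywhere in the record
loose=$(printf '%s\n' "$passwd_sample" | awk -F: '/small000/ {print $5}')

# Field comparison: matches only when the username field is exactly small000
exact=$(printf '%s\n' "$passwd_sample" | awk -F: '$1 == "small000" {print $5}')

printf 'loose:\n%s\nexact:\n%s\n' "$loose" "$exact"
```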


awk using Input from Commands

  • You can run awk in a pipeline, using input from another command:

    $ command | awk '/pattern/ {action}'

    • Takes the output from the command and pipes it into awk which will then perform the action on all lines that match the pattern


Piped awk Input Example

$ w

1:04pm up 25 day(s), 5:37, 6 users, load average: 0.00, 0.00, 0.01

User tty login@ idle JCPU PCPU what

pugli766 pts/8 Tue10pm 3days -ksh

lin318 pts/17 10:58am 1:45 vi choosesort

small000 pts/18 12:43pm w

mcdev712 pts/10 11:52am 14 1 vi adddata

gibbo201 pts/12 12:15pm 18 -ksh

nelso828 pts/16 7:17pm 17:43 -ksh

$

$ w | awk '/ksh/{print $1}'

pugli766

gibbo201

nelso828

$


Relational Operators

  • awk can use relational operators ( <, >, <=, >=, ==, != ) to compare a field to a value, and ! to negate a condition or pattern

    • If the outcome of the comparison is true then the action is performed (with no action given, the record is printed)

  • Examples:

    • To print every record in the log.txt file in which the second field is larger than 10

      $ awk '$2 > 10' log.txt

    • To print every record in the log.txt file which does NOT contain 'Win32'

      $ awk '!/Win32/' log.txt
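
The two examples can be tried without a log.txt file. In this sketch the four-line log contents are invented for illustration, and the last command combines both tests with &&.

```shell
# Hypothetical stand-in for the contents of log.txt
log='alpha 5 Linux
beta 25 Win32
gamma 10 Mac
delta 40 Linux'

# Second field larger than 10
big=$(printf '%s\n' "$log" | awk '$2 > 10')

# Records that do NOT contain Win32
not_win=$(printf '%s\n' "$log" | awk '!/Win32/')

# Both conditions at once
combined=$(printf '%s\n' "$log" | awk '$2 > 10 && !/Win32/')

printf '%s\n---\n%s\n---\n%s\n' "$big" "$not_win" "$combined"
```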


Relational Operator Example

$ who

pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)

lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)

small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)

mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)

gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)

nelso828 pts/16 Jun 5 19:17 (65.100.138.177)

$

$ who | awk '$4 < 6 {print $1, $3, $4, $5}'

pugli766 Jun 3 22:24

nelso828 Jun 5 19:17

$


Piping awk output

$ who

pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)

lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)

small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)

mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)

gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)

nelso828 pts/16 Jun 5 19:17 (65.100.138.177)

$

$ who | awk '$4 == 6 {print $1}' | sort

gibbo201

lin318

mcdev712

small000

$


awk Programming

  • awk programming is done by building a list

    • The list is a list of rules

    • Each rule is applied sequentially to each line (record)

  • Example:

    /pattern1/ { action1 }

    /pattern2/ { action2 }

    /pattern3/ { action3 }
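
A short sketch of a rule list in action, using invented input lines: every rule is tested against each record in order, so a record that matches two patterns triggers both actions, and a record that matches none produces no output.

```shell
# Hypothetical input lines
data='Unix is fun
awk runs on Unix
plain line'

result=$(printf '%s\n' "$data" | awk '
/Unix/ { print "has Unix:", $0 }
/^awk/ { print "starts with awk:", $0 }
')

printf '%s\n' "$result"
```

The second input line appears twice in the output, once per matching rule.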


awk - pattern matching

  • Before processing, lines can be matched with a pattern.

    /pattern/ { action }     execute action if line matches pattern

    The pattern is a regular expression.

  • Examples:

    /^$/ { print "This line is blank" }

    /num/ { print "Line includes num" }

    /[0-9]+$/ { print "Integer at end:", $0 }

    /[A-Za-z]+/ { print "String:", $0 }

    /^[A-Z]/{ print "Starts w/uppercase letter" }


awk program from a file

  • The awk commands (program) can be placed into a file

  • The -f (lowercase f) indicates that the commands come from a file whose name follows the -f

    $ awk -f awkfile datafile

    The contents of the file called awkfile will be used as the commands for awk


Example 1

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$ cat awkprog

/5?5/ {print $1, $2}

/3*4/ {print $5}

$

$ awk -f awkprog students

Arts

Bill Teal

Sue Beige

$

**NOTE: All patterns applied to each line before moving to next line


Example 2

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$ cat awkprog

/Science/ {print "Science stu:", $1, $2}

/CompSci/ {print "Computing stu:", $1, $2}

$

$ awk -f awkprog students

Science stu: Bill White

Computing stu: Bill Teal

Science stu: Sue Beige

$


More about Patterns

  • Patterns can be:

    • Empty: will match everything

    • Regular expressions:

      /reg-expression/

    • Boolean Expressions:

      $2=="foo" && $7=="bar"

    • Ranges:

      /jones/,/smith/
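
The Boolean and range forms can be sketched on the students data used in these slides (with the ID and date as separate fields). The specific conditions below are illustrative choices, not taken from the slides.

```shell
students='Bill White 333333 1980/01/01 Science
Jill Blue 333444 1978/03/20 Arts
Bill Teal 555555 1985/02/26 CompSci
Sue Beige 555777 1963/09/12 Science'

# Boolean AND: both conditions must hold
both=$(printf '%s\n' "$students" | awk '$3 > 400000 && $5 == "Science" {print $1, $2}')

# Boolean OR: either condition may hold
either=$(printf '%s\n' "$students" | awk '$5 == "Arts" || $5 == "CompSci" {print $1, $2}')

# Range: from the first line matching /Jill/ through the next line matching /Teal/
in_range=$(printf '%s\n' "$students" | awk '/Jill/,/Teal/ {print $2}')

printf '%s\n---\n%s\n---\n%s\n' "$both" "$either" "$in_range"
```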


Example - Boolean Expressions

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$ cat awkprog

$3 <= 444444 {print "Not counted"}

$3 > 444444 {print $2 ",", $1}

$

$ awk -f awkprog students

Not counted

Not counted

Teal, Bill

Beige, Sue

$


Example - Ranges

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$

$ awk '/333333/,/555555/' students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

$


More Built-In awk Variables

  • Two types: Informative and Configuration

  • Informative:

    NR = Current Record Number (starts at 1)

    • Counts ALL records, not just those that match

    NF = Number of Fields in the Current Record

    FILENAME = Current Input Data File

    • Undefined in the BEGIN block


Example using NF

$ cat names

Pam Sue Laurie

Bob Joe Bill Dave

Joan Jill

$

$ awk '{print NF}' names

3

4

2

0

$

NOTE: The final 0 implies that names ends with an empty line, which contains no fields


Example using a Boolean, NF, and NR

$ cat names

Pam Sue Laurie

Bob Joe Bill Dave

Joan Jill

$

$ awk 'NF > 2 {print NR ":", NF, "fields"}' names

1: 3 fields

2: 4 fields

$


Built-in awk functions

log(expr)        natural logarithm

index(s1,s2)     position of string s2 in string s1

length(s)        string length

substr(s,m,n)    n-char substring of s starting at m

tolower(s)       converts string to lowercase

printf()         print formatted - like C printf
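
A quick sketch of the functions listed above, run from a BEGIN block so that no input file is needed. The string "Hello World" is an arbitrary example value.

```shell
result=$(awk 'BEGIN {
    s = "Hello World"
    print length(s)          # number of characters in s
    print index(s, "World")  # 1-based position of "World" within s
    print substr(s, 7, 5)    # 5 characters starting at position 7
    print tolower(s)         # lowercase copy of s
    print log(1)             # natural logarithm of 1
}')

printf '%s\n' "$result"
```

Positions in index and substr are counted from 1, not 0.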




print & printf

  • Use print in an awk statement to output specific field(s)

  • printf is more versatile

    • works like printf in the C language

    • May contain a format specifier and a modifier


Format Specification

  • A format specification consists of a percent symbol, a modifier, width and precision values, and a conversion character

  • To display the third field as a floating point number with two decimal places:

    awk '{printf("%.2f\n", $3)}' file

  • You can include additional text in the printf statement

    '{printf ("3rd value: %.2f\n", $3)}'


Specifiers, Width, Precision, & Modifiers

Type Specifiers:

%c    Single character

%d    Integer (decimal)

%f    Floating point

%s    String

Between the % and the specifier you can place the width and precision

%6.2f means a floating point number in a field of width 6 in which there are two decimal places

Modifiers control details of appearance:

-    minus sign is the left justification modifier (the default is right justification)

+    plus sign forces the appearance of a sign (+,-) for numeric output

0    zero pads a right justified number with zeros
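
The specifiers and modifiers can be seen together in a short sketch. The square brackets are added only to make the padding visible, and the values are arbitrary.

```shell
result=$(awk 'BEGIN {
    printf("[%-8s]\n", "awk")     # left justified in a field of width 8
    printf("[%8s]\n", "awk")      # right justified (the default)
    printf("[%6.2f]\n", 3.14159)  # width 6, two decimal places
    printf("[%05d]\n", 42)        # zero padded to width 5
    printf("[%+d]\n", 7)          # sign always shown
    printf("[%c]\n", "x")         # single character
}')

printf '%s\n' "$result"
```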


awk Variables

  • Variables

    • No need for declaration

      • Implicitly set to 0 AND the Empty String

    • Variable type is a combination of a floating-point and string

    • Variable is converted as needed, based on its use

      title = "Number of students"

      no = 100

      weight = 13.4
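
The conversion rules can be sketched as follows. The variable u is deliberately never assigned, and n holds a string that looks like a number; both names are illustrative.

```shell
result=$(awk 'BEGIN {
    print "[" u "]"     # unset variable used as a string: empty
    print u + 0         # unset variable used as a number: 0
    n = "3"
    print n + 4         # string converted to a number for arithmetic
    print n 4           # number converted to a string for concatenation
    weight = 13.4
    print weight * 2    # floating-point arithmetic
}')

printf '%s\n' "$result"
```

Whether a value is treated as a number or a string depends on the operator applied to it: + forces numeric use, while placing two values side by side concatenates them as strings.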




awk program execution

BEGIN { …. }              Executes only once, before reading input data

{ …. }                    Executes for each input line

specification { …. }      Executes for each input line that matches the specified /pattern/ or Boolean expression

END { …. }                Executes once at the end, after all lines have been processed


Example #1: Count # lines in file

$ cat awkprog

BEGIN {total = 0}

{total = total + 1}

END {print total " lines"}

$ cat testfile

Hello There

Goodbye!

$

- Set total to 0 before processing any lines

- For every row in the file, execute {total = total + 1}

- Print total after all lines processed.

$ awk -f awkprog testfile

2 lines

$


Ex #2: Count lines containing a pattern

/pattern/ {totalpattern++} only executes when the current input line contains pattern.

$ cat Simpsons

Marge	34

Homer	32

Lisa	10

Bart	11

Maggie	01

$ cat countthem

BEGIN {totalMa = 0; totalar = 0}

/Ma/ { totalMa++ }

/ar/ { totalar++ }

END { print totalMa " Ma's"

print totalar " ar's"}

$

$ awk -f countthem Simpsons

2 Ma's

2 ar's

$


Example #3: Add line numbers

$ cat numawk

BEGIN { print "Line numbers by awk" }

{ print NR ":", $0 }

END { print "Done processing " FILENAME }

$ cat testfile

Hello There

Goodbye!

$

$ awk -f numawk testfile

Line numbers by awk

1: Hello There

2: Goodbye!

Done processing testfile

$


More Built-In awk Variables

  • Two types: Informative and Configuration

  • Configuration

    FS = Input field separator

    OFS = Output field separator

    (default for both is space " ")

    RS = Input record separator

    ORS = Output record separator

    (default for both is newline "\n")
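
A brief sketch of the four configuration variables, set in a BEGIN block so they take effect before the first record is read. The passwd-style record is the one from Example #3 (truncated to five fields); the other inputs are made up.

```shell
# FS and OFS: split on colons, join printed fields with " | "
fields=$(printf '%s\n' 'small000:x:1164:102:Faculty - Pam Smallwood' |
    awk 'BEGIN { FS = ":"; OFS = " | " } { print $1, $5 }')

# RS: treat each semicolon-terminated chunk as a record
records=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

# ORS: end each printed record with a semicolon instead of a newline
joined=$(printf 'one\ntwo\n' | awk 'BEGIN { ORS = ";" } { print }')

printf '%s\n%s\n%s\n' "$fields" "$records" "$joined"
```

OFS is inserted only where print items are separated by commas, which is why print $1, $5 picks it up.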


Example #1: Reverse 2 columns

$ cat switch

BEGIN{FS="\t"}

{print $2 "\t" $1}

$ awk -f switch Simpsons

34	Marge

32	Homer

10	Lisa

11	Bart

01	Maggie

$

NOTE: Columns separated by tabs

  • Alternatively you could do the following (quote the \t so that the shell does not strip the backslash):

    $ awk -F'\t' '{print $2 "\t" $1}' Simpsons


Example #2: Sum a column

$ cat awksum2

BEGIN { FS="\t"

sum = 0 }

{sum = sum + $2}

END { print "Done"

print "Total sum is " sum }

$

$ awk -f awksum2 Simpsons

Done

Total sum is 88

$
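
The same accumulate-then-report structure extends naturally to an average. This sketch is an extension, not part of the slides: it feeds the tab-separated Simpsons data inline, counts records in a variable n (an illustrative name), and guards against dividing by zero on empty input.

```shell
avg=$(printf 'Marge\t34\nHomer\t32\nLisa\t10\nBart\t11\nMaggie\t01\n' |
    awk -F'\t' '
    { sum += $2; n++ }                           # accumulate on every record
    END { if (n > 0) printf("%.1f\n", sum / n) } # report once at the end
')

printf 'Average: %s\n' "$avg"
```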


Example #3: Comma delimited file

$ cat names

Bill Jones,3333,M

Pam Smith,5555,F

Sue Smith,4444,F

$

$ awk -F, '{print $2}' names

3333

5555

4444

$


Longer awk program

$ cat awkprog

BEGIN { print "Processing..." }

# print number of fields in first line

NR == 1 { print $0, NF, "fields"}

/^Unix/ { print "Line starts with Unix: ", $0 }

/Unix$/ { print "Line ends with Unix: " $0 }

# finishing it up

END {print NR " lines checked"}

$


awk program execution

$ cat datfile

First Line

Unix is great!

What else is better?

This is Unix

Yes it is Unix

Goodbye!

$

$ awk -f awkprog datfile

Processing...

First Line 2 fields

Line starts with Unix: Unix is great!

Line ends with Unix: This is Unix

Line ends with Unix: Yes it is Unix

6 lines checked

$


awk programming language syntax

if ( found == 1 )           # if (expr)
    print "Found";          #     {action1}
else                        # else
    print "Not found";      #     {action2}

while ( i <= 100 )          # while (cond)
{ i = i + 1;                # { actions...
  print i }                 # }


awk programming language syntax

do                          # do
{ i = i + 1;                # { actions ...
  print i }                 # }
while ( i < 100 );          # while (cond);

for (i=1; i < 10; i++ )     # for (set; test; incr)
{                           # {
  sqr = i * i;              #   actions
  print i " squared is " sqr }   # }
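
These statements are most useful inside an action. As a sketch on invented numeric input, the program below uses a for loop to add up all fields of each record and an if/else to flag sums above 10 (an arbitrary threshold).

```shell
# Hypothetical input: a varying number of integers per line
data='1 2 3
10 20
5'

result=$(printf '%s\n' "$data" | awk '{
    sum = 0
    for (i = 1; i <= NF; i++)   # visit every field of the record
        sum += $i
    if (sum > 10)
        print NR ": " sum " (large)"
    else
        print NR ": " sum
}')

printf '%s\n' "$result"
```

Resetting sum at the start of the action matters, because awk variables keep their values from one record to the next.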


awk – longer example

  • Write an awk program that prints out content of a directory in the following format:

     BYTES    FILE
     24576    copyfile
       736    copyfile.c
       740    copyfile.c~
     24576    dirlist
       989    dirlist.c
       977    dirlist.c%
     24576    envadv
       185    envadv.c
     <dir>    tmp
       740    x.c

 Total: 73684 bytes in
 9 regular files


awk example - code

$ cat awkprog

BEGIN {print " BYTES \t FILE";

sum=0; filenum=0

}

# test for lines starting with -

/^-/ { sum += $5

++filenum

printf ("%10d \t%s\n", $5, $9) }

# test for directories - line starts with d

/^d/ { print " <dir> \t", $9 }

# conclusion

END { print "\n Total: " sum" bytes in"

print " " filenum " regular files"

}

$


awk example - output

$ ls -l

total 84

drwx------ 2 small000 faculty 512 Jun 2 13:44 sub2

-rwx------ 1 small000 faculty 224 Jun 3 10:35 sumnums

-rw------- 1 small000 faculty 2 Jun 3 21:08 tab

-rw------- 1 small000 faculty 187 Jun 8 11:15 tbook

$

$ ls -l | awk -f awkprog

 BYTES 	 FILE

 <dir> 	 sub2

       224 	sumnums

         2 	tab

       187 	tbook

 Total: 413 bytes in

 3 regular files

$


awk Handout

  • Review awk examples on handout

