The awk utility

CS465 - Unix

The awk Utility


Background

  • awk was developed by

    • Aho, Weinberger, and Kernighan (of K & R)

    • Was further extended at Bell Labs

  • Handles simple data-reformatting jobs easily with just a few lines of code.

  • Versions

    • awk - original version

    • nawk - new awk - improved awk

    • gawk - gnu awk - improved nawk


How awk works

  • awk commands include patterns and actions

    • Scans the input line by line, searching for lines that match a certain pattern (or regular expression)

    • Performs a selected action on the matching lines

  • awk can be used:

    • at the command line for simple operations

    • in programs or scripts for larger applications


Running awk

  • From the Command Line:

    $ awk '/pattern/{action}' file

  • OR From an awk script file:

    $ cat awkscript

    # This is a comment

    /pattern/ {action}

    $ awk -f awkscript file


awk’s Format using Input from a File

    $ awk '/pattern/' filename

  • awk will act like grep

    $ awk '{action}' filename

  • awk will apply the action to every line in the file

    $ awk '/pattern/ {action}' filename

  • awk will apply the action to every line in the file that matches the pattern
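The three forms above can be tried side by side. This is a minimal sketch, assuming a small throwaway data file; the file contents and the shell variable names (data, grep_like, every_line, matching) are made up for illustration.

```shell
# Build a small sample file (hypothetical data, for illustration only).
data=$(mktemp)
printf 'Bill White Science\nJill Blue Arts\nSue Beige Science\n' > "$data"

# Pattern only: print every matching line, like grep.
grep_like=$(awk '/Science/' "$data")

# Action only: run the action on every line.
every_line=$(awk '{print $1}' "$data")

# Pattern and action: run the action only on matching lines.
matching=$(awk '/Science/ {print $1}' "$data")

printf '%s\n---\n%s\n---\n%s\n' "$grep_like" "$every_line" "$matching"
rm -f "$data"
```

When the action is omitted, awk falls back to printing the whole matching line, which is why the first form behaves like grep.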





Records and Fields

  • awk divides the input into records and fields

    • Each line is a record (by default)

  • Each record is split into fields, delimited by a special character (whitespace by default)

    • Can change delimiter with -F or FS

            field-1   field-2   field-3
               |         |         |
               v         v         v
record 1 -> George    Jones     Admin
record 2 -> Anthony   Smith     Accounting


awk field variables

  • awk creates variables $1, $2, $3… that correspond to the resulting fields (just like a shell script).

    • $1 is the first field, $2 is the second…

    • $0 is a special field which is the entire line

    • NF is always set to the number of fields in the current line (no dollar sign to access)
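A quick way to see the field variables is to feed awk one record and print each of them. This sketch assumes a made-up input line; it also shows $NF, which is not on the slide but follows from it: NF is the field count, so $NF is the value of the last field.

```shell
# One hypothetical input record, fed to awk on standard input.
line='George Jones Admin'

# $0 is the whole record, $1 the first field, NF the field count.
whole=$(printf '%s\n' "$line" | awk '{print $0}')
first=$(printf '%s\n' "$line" | awk '{print $1}')
count=$(printf '%s\n' "$line" | awk '{print NF}')

# $NF (with the dollar sign) is the value of the last field.
last=$(printf '%s\n' "$line" | awk '{print $NF}')

echo "whole=$whole first=$first count=$count last=$last"
```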


Example #1

$ cat students

Bill White 7777771 1980/01/01 Science

Jill Blue 1111117 1978/03/20 Arts

Ben Teal 7171717 1985/02/26 CompSci

Sue Beige 1717171 1963/09/12 Science

$

$ awk '/Science/{print $1, $2}' students

Bill White

Sue Beige

$

  • The comma tells print to separate the fields with a space in the output (otherwise they are concatenated):

    $ awk '/Science/{print $1 $2}' students

    BillWhite

    SueBeige


Example #2

$ cat phonelist

Joe Smith 774-0888

Mary Jones 772-2345

Hank Knight 494-8888

$

$ awk '{print "Name:", $1, $2, \

"Telephone:", $3}' phonelist

Name: Joe Smith Telephone: 774-0888

Name: Mary Jones Telephone: 772-2345

Name: Hank Knight Telephone: 494-8888

$

  • No pattern given, so matches ALL lines

  • Text strings to print are placed in double quotes


Example #3

Given a username, display the person’s real name:

$ grep small /etc/passwd

small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh

$

$ awk -F: '/small000/{print $5}' /etc/passwd

Faculty - Pam Smallwood

$


awk using Input from Commands

  • You can run awk in a pipeline, using input from another command:

    $ command | awk '/pattern/ {action}'

    • Takes the output from the command and pipes it into awk which will then perform the action on all lines that match the pattern


Piped awk Input Example

$ w

1:04pm up 25 day(s), 5:37, 6 users, load average: 0.00, 0.00, 0.01

User tty login@ idle JCPU PCPU what

pugli766 pts/8 Tue10pm 3days -ksh

lin318 pts/17 10:58am 1:45 vi choosesort

small000 pts/18 12:43pm w

mcdev712 pts/10 11:52am 14 1 vi adddata

gibbo201 pts/12 12:15pm 18 -ksh

nelso828 pts/16 7:17pm 17:43 -ksh

$

$ w | awk '/ksh/{print $1}'

pugli766

gibbo201

nelso828

$


Relational Operators

  • awk can use relational operators ( <, >, <=, >=, ==, != ) to compare a field to a value, and ! to negate a pattern or condition

    • If the outcome of the comparison is true then the action is performed

  • Examples:

    • To print every record in the log.txt file in which the second field is larger than 10

      • $ awk '$2 > 10' log.txt

    • To print every record in the log.txt file which does NOT contain ‘Win32’

      • $ awk '!/Win32/' log.txt
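The slide does not show what log.txt contains, so the sketch below invents three records (name, count, platform) purely to make the two commands runnable. The third command, which compares a field to a quoted string, is an extra illustration.

```shell
# Hypothetical log contents: name, count, platform.
log=$(mktemp)
printf 'alpha 5 Win32\nbeta 12 Linux\ngamma 40 Win32\n' > "$log"

# Records whose second field is greater than 10 (default action: print).
big=$(awk '$2 > 10' "$log")

# Records that do NOT contain Win32.
not_win=$(awk '!/Win32/' "$log")

# Comparing a field to a string uses == with a quoted value.
linux=$(awk '$3 == "Linux" {print $1}' "$log")

printf '%s\n---\n%s\n---\n%s\n' "$big" "$not_win" "$linux"
rm -f "$log"
```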


Relational Operator Example

$ who

pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)

lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)

small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)

mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)

gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)

nelso828 pts/16 Jun 5 19:17 (65.100.138.177)

$

$ who | awk '$4 < 6 {print $1, $3, $4, $5}'

pugli766 Jun 3 22:24

nelso828 Jun 5 19:17

$


Piping awk output

$ who

pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)

lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)

small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)

mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)

gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)

nelso828 pts/16 Jun 5 19:17 (65.100.138.177)

$

$ who | awk '$4 == 6 {print $1}' | sort

gibbo201

lin318

mcdev712

small000

$


awk Programming

  • awk programming is done by building a list

    • The list is a list of rules

    • Each rule is applied sequentially to each line (record)

  • Example:

    /pattern1/ { action1 }

    /pattern2/ { action2 }

    /pattern3/ { action3 }


awk - pattern matching

  • Before processing, lines can be matched with a pattern.

    /pattern/ { action } execute if line matches pattern

    The pattern is a regular expression.

  • Examples:

    /^$/ { print "This line is blank" }

    /num/ { print "Line includes num" }

    /[0-9]+$/ { print "Integer at end:", $0 }

    /[A-Za-z]+/ { print "String:", $0 }

    /^[A-Z]/ { print "Starts w/uppercase letter" }
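To see a few of these patterns fire, the sketch below pipes three made-up lines (a blank line, a line ending in digits, and a capitalised line) through a small rule list. LC_ALL=C is set as an assumption to keep [A-Z] limited to ASCII uppercase letters regardless of the local environment.

```shell
# Sample input: a blank line, a line ending in digits, a capitalised line.
out=$(printf '\nroom 42\nHello there\n' | LC_ALL=C awk '
/^$/      { print "blank" }
/[0-9]+$/ { print "digits at end: " $0 }
/^[A-Z]/  { print "uppercase start: " $0 }
')
echo "$out"
```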


awk program from a file

  • The awk commands (program) can be placed into a file

  • The -f (lowercase f) option indicates that the commands come from a file whose name follows the -f

    $ awk -f awkfile datafile

    The contents of the file called awkfile will be used as the commands for awk


Example 1

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$ cat awkprog

/5?5/ {print $1, $2}

/3*4/ {print $5}

$

$ awk -f awkprog students

Arts

Bill Teal

Sue Beige

$

NOTE: All patterns are applied to each line before awk moves on to the next line


Example 2

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$ cat awkprog

/Science/ {print "Science stu:", $1, $2}

/CompSci/ {print "Computing stu:", $1, $2}

$

$ awk -f awkprog students

Science stu: Bill White

Computing stu: Bill Teal

Science stu: Sue Beige

$


More about Patterns

  • Patterns can be:

    • Empty: will match everything

    • Regular expressions:

      /reg-expression/

    • Boolean Expressions:

      $2=="foo" && $7=="bar"

    • Ranges:

      /jones/,/smith/
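A short sketch of the Boolean and range forms, using four invented records rather than any file from the slides. A range starts at the first line matching the first pattern and runs through the next line matching the second.

```shell
# Hypothetical data: id, word, number.
data=$(mktemp)
printf 'a foo 1\nb bar 2\nc foo 3\nd baz 4\n' > "$data"

# Boolean expression: both conditions must hold for the action to run.
both=$(awk '$2 == "foo" && $3 > 1 {print $1}' "$data")

# Range: from the line matching /bar/ through the line matching /baz/.
range=$(awk '/bar/,/baz/ {print $1}' "$data")

printf '%s\n---\n%s\n' "$both" "$range"
rm -f "$data"
```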


Example - Boolean Expressions

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$ cat awkprog

$3 <= 444444 {print "Not counted"}

$3 > 444444 {print $2 ",", $1}

$

$ awk -f awkprog students

Not counted

Not counted

Teal, Bill

Beige, Sue

$


Example - Ranges

$ cat students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

Sue Beige 555777 1963/09/12 Science

$

$ awk '/333333/,/555555/' students

Bill White 333333 1980/01/01 Science

Jill Blue 333444 1978/03/20 Arts

Bill Teal 555555 1985/02/26 CompSci

$


More Built-In awk Variables

  • Two types: Informative and Configuration

  • Informative:

    • NR = Current Record Number (starts at 1)

      • Counts ALL records, not just those that match

    • NF = Number of Fields in the Current Record

    • FILENAME = Current Input Data File

      • Undefined in the BEGIN block


Example using NF

$ cat names

Pam Sue Laurie

Bob Joe Bill Dave

Joan Jill

$

$ awk '{print NF}' names

3

4

2

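0

$

NOTE: The final 0 is printed for an empty line at the end of names, which is not visible in the listing above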


Example using a Boolean, NF, and NR

$ cat names

Pam Sue Laurie

Bob Joe Bill Dave

Joan Jill

$

$ awk 'NF > 2 {print NR ":", NF, "fields"}' names

1: 3 fields

2: 4 fields

$


Built-in awk functions

    log(expr)       natural logarithm

    index(s1,s2)    position of string s2 in string s1

    length(s)       string length

    substr(s,m,n)   n-char substring of s starting at m

    tolower(s)      converts string to lowercase

    printf()        print formatted - like C printf
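The functions listed above can be exercised from a BEGIN block, which needs no input file. The string "Hello World" is an arbitrary example value.

```shell
out=$(awk 'BEGIN {
    s = "Hello World"
    print length(s)           # number of characters in s
    print index(s, "World")   # 1-based position of "World" in s
    print substr(s, 1, 5)     # 5 characters starting at position 1
    print tolower(s)          # lowercase copy of s
    printf("%.3f\n", log(1))  # natural logarithm; log(1) is 0
}')
echo "$out"
```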



print & printf

  • Use print in an awk statement to output specific field(s)

  • printf is more versatile

    • works like printf in the C language

    • May contain a format specifier and a modifier


Format Specification

  • A format specification consists of a percent symbol, a modifier, width and precision values, and a conversion character

  • To display the third field as a floating point number with two decimal places:

    awk '{printf("%.2f\n", $3)}' file

  • You can include additional text in the printf statement

    '{printf ("3rd value: %.2f\n", $3)}'


Specifiers, Width, Precision, & Modifiers

Type Specifiers:

%c Single character

%d Integer (decimal)

%f Floating point

%s String

Between the % and the specifier you can place the width and precision

%6.2f means a floating point number in a field of width 6 in which there are two decimal places

Modifiers control details of appearance:

- minus sign is the left justification modifier (the default is right justification)

+ plus sign forces the appearance of a sign (+,-) for numeric output

0 zero pads a right justified number with zeros
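Each specifier and modifier can be checked with a one-line printf in a BEGIN block. The values are arbitrary, and the square brackets are only there to make the padding visible.

```shell
out=$(awk 'BEGIN {
    printf("[%6.2f]\n", 3.14159)  # width 6, two decimal places
    printf("[%-6s]\n", "ab")      # left justified in a width of 6
    printf("[%+d]\n", 7)          # sign is always shown
    printf("[%05d]\n", 42)        # padded with leading zeros to width 5
    printf("[%c]\n", "x")         # single character
}')
echo "$out"
```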


awk Variables

  • Variables

    • No need for declaration

      • Implicitly set to 0 AND the Empty String

    • Variable type is a combination of a floating-point and string

    • Variable is converted as needed, based on its use

      title = "Number of students"

      no = 100

      weight = 13.4
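A small sketch of the implicit initialisation and conversion rules described above. The variable names count and name are made up for illustration; no and weight follow the slide.

```shell
out=$(awk 'BEGIN {
    print count + 0      # unset variable used as a number: 0
    print "[" name "]"   # unset variable used as a string: empty
    no = "100"           # a string that looks like a number
    print no + 1         # converted to a number for arithmetic
    weight = 13.4
    print weight " kg"   # converted to a string for concatenation
}')
echo "$out"
```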



awk program execution

BEGIN { …. }             Executes only once, before reading input data

{ …. }                   Executes for each input line

specification { ….. }    Executes for each input line that matches the specified /pattern/ or Boolean expression

END { ….. }              Executes at the end, after all lines have been processed
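The order of execution can be observed by giving each kind of rule its own print statement. The two input lines here are invented; the rule guarded by /two/ stands in for the "specification" case.

```shell
out=$(printf 'one\ntwo\n' | awk '
BEGIN { print "start" }                  # runs once, before any input
      { print "line " NR ": " $0 }       # runs for every input line
/two/ { print "matched two" }            # runs only for matching lines
END   { print "end after " NR " lines" } # runs once, after all input
')
echo "$out"
```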


Example #1: Count # lines in file

$ cat awkprog

BEGIN {total = 0}

{total = total + 1}

END {print total " lines"}

$ cat testfile

Hello There

Goodbye!

$

$ awk -f awkprog testfile

2 lines

$

- Set total to 0 before processing any lines

- For every row in the file, execute {total = total + 1}

- Print total after all lines processed.


Ex #2: Count lines containing a pattern

{totalpattern++} executes only for lines of the input file that contain the pattern.

$ cat Simpsons

Marge 34

Homer 32

Lisa 10

Bart 11

Maggie 01

$ cat countthem

BEGIN {totalMa = 0; totalar = 0}

/Ma/ { totalMa++ }

/ar/ { totalar++ }

END { print totalMa " Ma's"

print totalar " ar's"}

$

$ awk -f countthem Simpsons

2 Ma's

2 ar's

$


Example #3: Add line numbers

$ cat numawk

BEGIN { print "Line numbers by awk" }

{ print NR ":", $0 }

END { print "Done processing " FILENAME }

$ cat testfile

Hello There

Goodbye!

$

$ awk -f numawk testfile

Line numbers by awk

1: Hello There

2: Goodbye!

Done processing testfile

$


More Built-In awk Variables

  • Two types: Informative and Configuration

  • Configuration

    FS = Input field separator

    OFS = Output field separator

    (default for both is a single space " "; as the FS value it means fields are split on any run of spaces or tabs)

    RS = Input record separator

    ORS = Output record separator

    (default for both is newline "\n")
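The four configuration variables can be seen working together in a short sketch. The colon- and comma-separated inputs are made up for illustration.

```shell
# FS splits the input on ":"; OFS goes between comma-separated print
# arguments; ORS replaces the newline that normally ends each print.
swapped=$(printf 'a:b\nc:d\n' |
    awk 'BEGIN { FS = ":"; OFS = "-"; ORS = ";" } { print $2, $1 }')

# RS changes what counts as a record: here each comma ends one.
records=$(printf 'x,y,z' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')

printf '%s\n---\n%s\n' "$swapped" "$records"
```

OFS only appears where print arguments are separated by commas; fields joined by plain concatenation are not affected by it.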


Example #1: Reverse 2 columns

$ cat switch

BEGIN {FS="\t"}

{print $2 "\t" $1}

$ awk -f switch Simpsons

34 Marge

32 Homer

10 Lisa

11 Bart

01 Maggie

$

NOTE: Columns separated by tabs

  • Alternatively you could do the following (quote the \t so the shell does not strip the backslash):

    $ awk -F'\t' '{print $2 "\t" $1}' Simpsons


Example #2: Sum a column

$ cat awksum2

BEGIN { FS="\t"

sum = 0 }

{sum = sum + $2}

END { print "Done"

print "Total sum is " sum }

$

$ awk -f awksum2 Simpsons

Done

Total sum is 88

$


Example #3: Comma delimited file

$ cat names

Bill Jones,3333,M

Pam Smith,5555,F

Sue Smith,4444,F

$

$ awk -F, '{print $2}' names

3333

5555

4444

$


Longer awk program

$ cat awkprog

BEGIN { print "Processing..." }

# print number of fields in first line

NR == 1 { print $0, NF, "fields"}

/^Unix/ { print "Line starts with Unix: ", $0 }

/Unix$/ { print "Line ends with Unix: " $0 }

# finishing it up

END {print NR " lines checked"}

$


awk program execution

$ cat datfile

First Line

Unix is great!

What else is better?

This is Unix

Yes it is Unix

Goodbye!

$

$ awk -f awkprog datfile

Processing...

First Line 2 fields

Line starts with Unix: Unix is great!

Line ends with Unix: This is Unix

Line ends with Unix: Yes it is Unix

6 lines checked

$


awk programming language syntax

if ( found == 1 )           # if (expr)
    print "Found";          #     {action1}
else                        # else
    print "Not found";      #     {action2}

while ( i <= 100 ) {        # while (cond) {
    i = i + 1;              #     actions...
    print i
}                           # }


awk programming language syntax

do {                            # do {
    i = i + 1;                  #     actions ...
    print i
} while ( i < 100 );            # } while (cond);

for ( i = 1; i < 10; i++ )      # for (set; test; incr)
{                               # {
    sqr = i * i;                #     actions
    print i " squared is " sqr
}                               # }
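All four control structures can be run together inside a BEGIN block. This sketch uses smaller loop bounds than the slides so the output stays short, and it sets found to 1 explicitly, since awk has no built-in true constant.

```shell
out=$(awk 'BEGIN {
    found = 1
    if (found == 1)
        print "Found"
    else
        print "Not found"

    i = 0
    while (i < 3) {          # test first, then run the body
        i = i + 1
        print "while", i
    }

    do {                     # run the body first, then test
        i = i + 1
        print "do", i
    } while (i < 5)

    for (i = 1; i < 4; i++) {
        sqr = i * i
        print i " squared is " sqr
    }
}')
echo "$out"
```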


awk – longer example

  • Write an awk program that prints out content of a directory in the following format:

BYTES FILE

24576 copyfile

736 copyfile.c

740 copyfile.c~

24576 dirlist

989 dirlist.c

977 dirlist.c%

24576 envadv

185 envadv.c

<dir> tmp

740 x.c

Total: 78095 bytes in

9 regular files


awk example - code

$ cat awkprog

BEGIN {print " BYTES \t FILE";

sum=0; filenum=0

}

# test for lines starting with -

/^-/ { sum += $5

++filenum

printf ("%10d \t%s\n", $5, $9) }

# test for directories - line starts with d

/^d/ { print " <dir> \t", $9 }

# conclusion

END { print "\n Total: " sum" bytes in"

print " " filenum " regular files"

}

$


awk example - output

$ ls -l

total 84

drwx------ 2 small000 faculty 512 Jun 2 13:44 sub2

-rwx------ 1 small000 faculty 224 Jun 3 10:35 sumnums

-rw------- 1 small000 faculty 2 Jun 3 21:08 tab

-rw------- 1 small000 faculty 187 Jun 8 11:15 tbook

$

$ ls -l | awk -f awkprog

BYTES FILE

<dir> sub2

224 sumnums

2 tab

187 tbook

Total: 413 bytes in

3 regular files

$


awk Handout

  • Review awk examples on handout

