Time to talk about your class projects!
Shell Scripting Awk (lecture 2)
Basic structure of AWK use
The essential organization of an AWK program follows the form:
    pattern { action }
The pattern specifies when the action is performed.
Like most UNIX utilities, AWK is line oriented. That is, the pattern specifies a test that is performed with each line read as input. If the condition is true, then the action is taken. The default pattern is something that matches every line. This is the blank or null pattern.
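A minimal sketch of the pattern { action } form, assuming a small made-up input (the three sample lines and the variable names are illustrative, not from the lecture). The first program gives only a pattern, so awk falls back to its default action of printing the line; the second gives only an action, so the null pattern applies and every line is processed.

```shell
# Hypothetical sample input, one record per line.
input='apple 3
banana 12
cherry 7'

# Pattern only: print lines whose second field exceeds 5 (default action is print).
big=$(printf '%s\n' "$input" | awk '$2 > 5')

# Action only: the null pattern matches every line, so every first field is printed.
names=$(printf '%s\n' "$input" | awk '{ print $1 }')

printf '%s\n' "$big"
printf '%s\n' "$names"
```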
Program syntax
• BEGIN { } : the begin block contains all modifications to built-in variables and anything you want done before awk starts reading input
• { } : list of procedures carried out on all lines
• END { } : the end block contains all final calculations or printed summaries
As you might expect, the two words BEGIN and END specify actions to be taken before any lines are read, and after the last line is read. The AWK program:
    BEGIN { print "START" }
    { print }
    END { print "STOP" }
adds one line before and one line after the input file.
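The START/STOP program can be tried without creating a file. In this sketch two made-up lines are piped in to stand for the input file, and the result is held in a shell variable named framed (a name chosen here for illustration).

```shell
# Two hypothetical input lines stand in for the input file.
framed=$(printf 'line one\nline two\n' |
    awk 'BEGIN { print "START" } { print } END { print "STOP" }')

printf '%s\n' "$framed"
```

The BEGIN block runs before the first line is read and the END block after the last, so the two input lines come out between START and STOP.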
Example:
    #!/usr/bin/nawk -f
    BEGIN {
        FS=":"    #the -F of the command line becomes FS in a script
    }
    { print $1 }
    END { print "Finished working on this file" }

    % chmod 755 example.awk
    % ./example.awk /etc/passwd | tail
    noaccess
    nobody4
    Finished working on this file
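To see that -F on the command line and FS in a BEGIN block do the same job, the sketch below runs both forms on a single passwd-style record. The record is invented for illustration (it is not a real account), and plain awk is assumed in place of nawk.

```shell
# One made-up passwd-style record (not a real account).
record='jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/sh'

# Field separator given with the -F command line option.
with_flag=$(printf '%s\n' "$record" | awk -F: '{ print $1 }')

# The same separator set through FS inside the program.
with_fs=$(printf '%s\n' "$record" | awk 'BEGIN { FS=":" } { print $1 }')

printf '%s %s\n' "$with_flag" "$with_fs"
```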
Input file:
    Jimmy the Weasel
    100 Pleasant Drive
    San Francisco, CA 12345

    Big Tony
    200 Incognito Ave.
    Suburbia, WA 67890

    Cousin Vinnie
    Vinnie's Auto Shop
    300 City Alley
    Sosueme, OR 76543
Awk script:
    #!/usr/bin/awk -f
    BEGIN {
        FS="\n"
        RS=""
        ORS=""
    }
    {
        x=1
        while ( x<NF ) {
            print $x "\t"
            x++
        }
        print $NF "\n"
    }
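The effect of this script is easiest to see by running it. The sketch below feeds it the first two address blocks (the third is left out only to keep the shell quoting simple) and assumes the blocks are separated by blank lines, which is what RS="" expects. Each multi-line record comes out as one tab-separated line.

```shell
# Two of the address records; a blank line separates them.
addresses='Jimmy the Weasel
100 Pleasant Drive
San Francisco, CA 12345

Big Tony
200 Incognito Ave.
Suburbia, WA 67890'

# RS="" treats each blank-line-separated block as one record,
# FS="\n" makes each line of the block one field,
# ORS="" stops print from adding its own newline.
flat=$(printf '%s\n' "$addresses" | awk '
BEGIN { FS = "\n"; RS = ""; ORS = "" }
{
    x = 1
    while (x < NF) {
        print $x "\t"
        x++
    }
    print $NF "\n"
}')

printf '%s\n' "$flat"
```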
Looping Constructs
• awk loop syntax is very similar to C and perl
• while: continues to loop as long as the condition is true
    while ( x==y ) {
        commands
    }
do/while
• do the following set of commands, while the condition is true
    do {
        commands
    } while ( x==y )
• The difference between while and do/while is when the condition is tested. It is tested prior to running the commands for a while loop, but tested after the set of commands is run once in a do/while loop
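The difference in when the condition is tested shows up when the condition is false from the start. In this sketch (the starting value 5 and the messages are arbitrary choices) the while body never runs, while the do/while body runs exactly once.

```shell
result=$(awk 'BEGIN {
    x = 5
    # while tests first: the condition is already false, so the body never runs.
    while (x < 5) { print "while body ran"; x++ }
    # do/while tests last: the body runs once before the condition is checked.
    do { print "do body ran"; x++ } while (x < 5)
    print "x is", x
}')

printf '%s\n' "$result"
```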
for loops
• one of the most common loop structures is the for loop, which steps a counter through a range, typically to visit each element of an array or each field of a line
    for ( x=1; x<=NF; x++ ) {    #in awk, field and array numbering starts at 1
        commands
    }
* if you take anything away from this lecture, memorize the above for loop syntax
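As an illustration of the for loop above, this sketch adds up every field on a line. The input line "3 4 5" and the variable names are made up for the example.

```shell
total=$(printf '3 4 5\n' | awk '{
    sum = 0
    for (x = 1; x <= NF; x++) {   # fields are numbered from 1 to NF
        sum += $x
    }
    print NF " fields, sum " sum
}')

printf '%s\n' "$total"
```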
break and continue
• break: breaks out of a loop
• continue: skips the rest of the loop body and starts the next iteration
    x=1
    while (1) {
        if ( x == 4 ) {
            x++
            continue
        }
        print "iteration",x
        if ( x > 20 ) {
            break
        }
        x++
    }
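Running the loop from this slide inside a BEGIN block (so that no input file is needed) shows both statements at work: the line for 4 is skipped by continue, and break ends the loop once 21 has been printed.

```shell
iterations=$(awk 'BEGIN {
    x = 1
    while (1) {
        if (x == 4) { x++; continue }   # skip printing for x == 4
        print "iteration", x
        if (x > 20) { break }           # leave the loop after printing 21
        x++
    }
}')

printf '%s\n' "$iterations"
```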
if/else/else if
• if statements work much like they did in bash, but the syntax is a bit different (no then or fi)
    if ( conditional1 ) {
        commands
    } else if ( conditional2 ) {    #optional
        commands
    } else {    #optional
        commands
    }
• you can have an if statement without an else if or else, but you can’t have an else if or else without an if
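A short sketch of the if / else if / else chain, assuming three made-up numbers as input and arbitrary size thresholds of 10 and 100:

```shell
labels=$(printf '5\n50\n500\n' | awk '{
    if ($1 < 10) {
        print $1, "small"
    } else if ($1 < 100) {
        print $1, "medium"
    } else {
        print $1, "large"
    }
}')

printf '%s\n' "$labels"
```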
Arrays
• array indices conventionally start at 1 in awk (in most computer programming languages, except fortran and matlab, arrays start at 0)
• mis-indexing arrays is one of the most common bugs in any code
• arrays are commonly indexed by numbers, but in awk, they can be indexed by strings
to explicitly set an array element, use brackets to specify which index of the array you are setting
    myarray[1]="jim"    #note, strings appear in quotes
    myarray[2]=456
or
    myarray["name"]="jim"    #index strings appear in quotes too
to reference an array element, use brackets to specify what index you want
    for ( x in myarray ) {
        print myarray[x]
    }
#x is set to each index in turn by the in operator, but the order in which the indices are visited is not defined
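The sketch below fills myarray with the three elements from the previous slide and walks it with for ( x in myarray ). Because the visiting order is not defined, the output is piped through sort here so that the result is repeatable; the sort step is an addition for this example only.

```shell
values=$(awk 'BEGIN {
    myarray[1] = "jim"
    myarray[2] = 456
    myarray["name"] = "jim"
    for (x in myarray) {
        print x "=" myarray[x]
    }
}' | LC_ALL=C sort)

printf '%s\n' "$values"
```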
• to delete an array element, use the delete command
    delete myarray[1]
• to test if an element exists, use an if statement with the in operator
    if ( 1 in myarray ) {
        print "It's there"
    } else {
        print "It's missing"
    }
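A small sketch of the membership test before and after a delete. The messages and the counter n are illustrative. Testing with in does not create the element, so the array is still empty at the end.

```shell
checks=$(awk 'BEGIN {
    myarray[1] = "jim"
    if (1 in myarray) { print "1 is there" } else { print "1 is missing" }
    delete myarray[1]
    if (1 in myarray) { print "1 is there" } else { print "1 is missing" }
    # Count what is left; the membership tests above added nothing.
    n = 0
    for (k in myarray) n++
    print "elements left:", n
}')

printf '%s\n' "$checks"
```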
• you can also set arrays using the split command
    split("string",destinationarray,separator)
• split returns the number of elements it created
    numelements=split("Jan,Feb,Mar,Apr,May",mymonths,",")
so that numelements=5 and mymonths[1]="Jan"
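The split call from this slide can be checked directly. The sketch prints the return value and then uses it as the upper bound of a for loop, which visits the elements in index order 1 through numelements.

```shell
months_out=$(awk 'BEGIN {
    numelements = split("Jan,Feb,Mar,Apr,May", mymonths, ",")
    print numelements
    for (i = 1; i <= numelements; i++) print i, mymonths[i]
}')

printf '%s\n' "$months_out"
```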
Formatted output
• printf : the formatted print function uses the standard C syntax
    %s specifies strings
    %d specifies integers
    %f specifies floating point values
    printf("%s %s version %d\n", "Hello", "world", 2)
    Hello world version 2
• you can control how many spaces are reserved for the formatted print (%) by adding numbers
    %10s - reserves 10 characters for the string
    %5d - reserves 5 spaces for the integer
    %10.2f - reserves 10 spaces for the float and prints only to the 100ths place
          9.05
• the default format is right justified. To make formatted text left justified, add a - after the %
    %-10.2f becomes
    9.05
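Padding is hard to see on a slide, so this sketch wraps each format in square brackets to make the reserved width visible. The brackets and the sample values "awk" and 42 are additions for illustration.

```shell
formatted=$(awk 'BEGIN {
    printf("[%10s]\n", "awk")      # string, right justified in 10 columns
    printf("[%5d]\n", 42)          # integer, right justified in 5 columns
    printf("[%10.2f]\n", 9.05)     # float, 10 columns, 2 decimal places
    printf("[%-10.2f]\n", 9.05)    # same, left justified
}')

printf '%s\n' "$formatted"
```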
sprintf sends formatted print to a string variable rather than to stdout
    n=sprintf ("%d plus %d is %d", a, b, a+b);
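In this sketch the sprintf call above is run with a=2 and b=3 (values picked for the example). Nothing is printed by sprintf itself; the string only appears once n is passed to print, and it can be handled like any other string in the meantime.

```shell
message=$(awk 'BEGIN {
    a = 2; b = 3
    n = sprintf("%d plus %d is %d", a, b, a + b)
    print n
    print length(n)
}')

printf '%s\n' "$message"
```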
Sub-strings
• substr : allows you to cut specific characters from strings.
• similar functionality is also available in C and perl
    substr(string,startcharacter,numberofcharacters)
    oldstring="How are you?"
    newstr=substr(oldstring,9,3)
What is newstr in this example?
Other string functions
• length : returns the number of characters in a string
    length(oldstring) returns 12
• index : returns the start position of one string within another
    index(oldstring,"you") returns 9
• tolower/toupper : converts string to all lower or to all upper case
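The string functions covered so far can be checked together on the same oldstring value. This is a sketch run from a BEGIN block so that no input is needed.

```shell
strings=$(awk 'BEGIN {
    oldstring = "How are you?"
    print length(oldstring)          # number of characters
    print index(oldstring, "you")    # start position of "you"
    print substr(oldstring, 9, 3)    # 3 characters starting at position 9
    print toupper(oldstring)
    print tolower(oldstring)
}')

printf '%s\n' "$strings"
```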
subroutines (aka functions)
• Format -- "function", then the name, and then the parameters separated by commas, inside parentheses.
• "{ }" code block contains the code that you'd like this function to execute.
    function monthdigit(mymonth) {
        return (index(months,mymonth)+3)/4
    }
nawk provides a "return" statement that allows the function to return a value.
    function monthdigit(mymonth) {
        return (index(months,mymonth)+3)/4
    }
This function converts a month name in a 3-letter string format into its numeric equivalent. For example, this:
    print monthdigit("Mar")
....will print this:
    3
What does this do?
    index(months,mymonth)
The built-in string function index returns the starting position of the occurrence of a substring (the second parameter) in another string (the first parameter), or it will return 0 if the substring isn't found.
    months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
            00000000011111111112222222222333333333344444444
            12345678901234567890123456789012345678901234567
    print index(months,"Aug")
    29
To get the number associated with the month (based on the string with the 12 months) add 3 to the index (29+3=32) and divide by 4 (32/4=8, Aug is the 8th month). The string months was designed so the calculation gives the month number.
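Putting the function and the months string together, the sketch below checks the calculation for a few month names. It assumes months is set in a BEGIN block before monthdigit is called, as in the example script later in the lecture.

```shell
digits=$(awk '
function monthdigit(mymonth) {
    # index gives 1, 5, 9, ... for Jan, Feb, Mar, ...; (index + 3) / 4 gives 1, 2, 3, ...
    return (index(months, mymonth) + 3) / 4
}
BEGIN {
    months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
    print monthdigit("Jan"), monthdigit("Mar"), monthdigit("Aug"), monthdigit("Dec")
}')

printf '%s\n' "$digits"
```

A name that does not occur in months makes index return 0, so monthdigit returns 3/4 rather than a whole number; a script that needs to reject bad input could test for that.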
Matching Regular Expressions
• match : searches for a regular expression and sets the built-in variables RSTART to the start character and RLENGTH to the matched string length
• match also returns the start character (0 if there is no match)
    start=match(oldstring,/you/)    #note, regexp format
    print start, RSTART, RLENGTH
    9 9 3
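A sketch of match on the same oldstring, followed by a second call with a pattern that does not occur (the pattern /xyz/ and the variable miss are made up for the example) to show what the variables hold after a failed search.

```shell
matched=$(awk 'BEGIN {
    oldstring = "How are you?"
    start = match(oldstring, /you/)
    print start, RSTART, RLENGTH
    # When nothing matches, match returns 0, RSTART is 0 and RLENGTH is -1.
    miss = match(oldstring, /xyz/)
    print miss, RSTART, RLENGTH
}')

printf '%s\n' "$matched"
```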
String substitution
• sub and gsub : serve as single search and replace or global search and replace functions that work with regular expressions
    sub(regexp,replacestring,oldstring)

    oldstring="How are you doing today?"
    sub(/o/,"O",oldstring)    #this changes the given string
    print oldstring
    HOw are you doing today?

    oldstring="How are you doing today?"
    gsub(/o/,"O",oldstring)
    print oldstring
    HOw are yOu dOing tOday?
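The two calls can be compared side by side. In this sketch oldstring is reset before the gsub call, and the return value of gsub (the number of replacements made) is stored in a variable named count, a name chosen for this example.

```shell
replaced=$(awk 'BEGIN {
    oldstring = "How are you doing today?"
    sub(/o/, "O", oldstring)             # replaces only the first match
    print oldstring
    oldstring = "How are you doing today?"
    count = gsub(/o/, "O", oldstring)    # replaces every match
    print oldstring
    print count
}')

printf '%s\n' "$replaced"
```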
Example Script
Input file:
    23 Aug 2000	food	-	-	Y	Jimmy's Buffet	30.25
    23 Aug 2000	-	inco	-	Y	Boss Man	2001.00
Note, there are tabs between the fields, which you can’t really see with this screen copy
#!/usr/bin/awk -f
BEGIN {
    #set global variables and built-in variables
    FS="\t+"
    months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
}

function monthdigit(mymonth) {
    #set subroutines (aka functions)
    return (index(months,mymonth)+3)/4
}

function doincome(mybalance) {
    mybalance[curmonth,$3] += amount
    mybalance[0,$3] += amount
}

function doexpense(mybalance) {
    mybalance[curmonth,$2] -= amount
    mybalance[0,$2] -= amount
}

function dotransfer(mybalance) {
    mybalance[0,$2] -= amount
    mybalance[curmonth,$2] -= amount
    mybalance[0,$3] += amount
    mybalance[curmonth,$3] += amount
}

#main program
{
    curmonth=monthdigit(substr($1,4,3))
    amount=$7

    #record all the categories encountered
    if ( $2 != "-" )
        globcat[$2]="yes"
    if ( $3 != "-" )
        globcat[$3]="yes"

    #tally up the transaction properly
    if ( $2 == "-" ) {
        if ( $3 == "-" ) {
            print "Error: inc and exp fields are both blank!"
            exit 1
        } else {
            #this is income
            doincome(balance)
            if ( $5 == "Y" )
                doincome(balance2)
        }
    } else if ( $3 == "-" ) {
        #this is an expense
        doexpense(balance)
        if ( $5 == "Y" )
            doexpense(balance2)
    } else {
        #this is a transfer
        dotransfer(balance)
        if ( $5 == "Y" )
            dotransfer(balance2)
    }
}
#end of main program

END {
    bal=0
    bal2=0
    for (x in globcat) {
        bal=bal+balance[0,x]
        bal2=bal2+balance2[0,x]
    }
    printf("Your available funds: %10.2f\n", bal)
    printf("Your account balance: %10.2f\n", bal2)
}
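Input file:
    23 Aug 2000	food	-	-	Y	Jimmy's Buffet	30.25
    23 Aug 2000	-	inco	-	Y	Boss Man	2001.00
Output to the screen:
    Your available funds:    1174.22
    Your account balance:    2399.33
Note, these totals presumably come from a longer input file than the two records shown; on their own, the two records would give 1970.75 (2001.00 - 30.25) for both lines.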