
CIS 218 – Advanced UNIX


alice

Presentation Transcript


  1. CIS 218 – Advanced UNIX (g)awk

  2. Overview • awk is a programming language • awk's pattern syntax is based on the regular expressions used by grep and sed, and it handles both numbers and text • awk provides field-level addressability, and substring functions reach within a field (word) • awk reads its input record by record (line by line) and splits each record into fields

  3. awk command syntax • There are two ways to execute an awk program/script: • awk [-F field-separator] 'program' target-file • awk [-F field-separator] -f program.file target-file • From our discussion of sed, and Refrigerator Rule No. 5, I would hope you are firmly committed to the second form!
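
As a rough sketch of the two invocation forms on this slide, the same one-line program is run inline and then from a program file. The sample data, the temporary file names, and the choice of ":" as the separator are assumptions made only for illustration.

```shell
# Hypothetical sample data: colon-separated login name and shell.
data=$(mktemp)
printf 'root:/bin/sh\nguest:/bin/bash\n' > "$data"

# Form 1: the program is given inline, inside single quotes.
inline=$(awk -F: '{ print $1 }' "$data")

# Form 2: the same program is stored in a file and loaded with -f.
prog=$(mktemp)
printf '{ print $1 }\n' > "$prog"
fromfile=$(awk -F: -f "$prog" "$data")

echo "$inline"
echo "$fromfile"
rm -f "$data" "$prog"
```

Both forms should print the same two names; the second form keeps the program in a file that can be documented and reused.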

  4. awk Variables • There are a number of awk variables that are very useful • FS (The input field separator, defaults to white space) • OFS (Output field separator, defaults to a single space, can be critical) • NR (Number of records read so far, a sequential counter) • NF (Number of fields in the current record) • FILENAME (Name of the current target file)

  5. awk Variables (cont.) • $0 (The entire line as read from the target file) • $n (Where n is the nth field in the record. This is how we get field level addressability in awk) • nawk, gawk, etc. give us more variables; the two most significant are: • ARGC (the count of the command line arguments) • ARGV (an array of the command line arguments)
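
A small sketch of several of these variables at work, assuming two lines of made-up input. OFS is set to ":" only so that its effect is visible in the output.

```shell
# NR = record number, NF = number of fields, $1 = first field, $NF = last field.
# The commas in print are replaced by OFS in the output.
out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = ":" } { print NR, NF, $1, $NF }')
echo "$out"
```

For this input the first line has three fields and the second has two, so $NF refers to a different field on each line.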

  6. Parts of a program • All programs are composed of one or more of the following three constructs: • sequence (a series of instructions, one following the next, executed sequentially) • selection (the ability of the code to decide which instructions to execute, conditional execution) • iteration (adding looping so that selected code will be repeated over and over)

  7. awk Program Format • Awk programs are composed of pattern {action} pairs (actions must be enclosed in French braces {} ) • a pattern without a corresponding action takes the default action, print $0 • an action without a corresponding pattern is applied to every line • each input line is submitted to every pattern/action pair

  8. awk Program Format (cont.) • Placement of the open French brace is critical • pattern { action 1; action 2 } with the open brace on the same line as the pattern: both actions are executed for lines matching the pattern • pattern on one line and { action 1; action 2 } starting on the next line: lines matching the pattern are printed, and both actions are performed on every line!
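
The effect of pattern-only rules, action-only rules, and brace placement can be sketched as follows. The fruit data and the "big:" label are invented for the example.

```shell
input=$(printf 'apple 3\nbanana 7\n')

# Pattern without an action: matching lines are printed (default action).
pattern_only=$(printf '%s\n' "$input" | awk '$2 > 5')

# Action without a pattern: the action runs on every line.
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

# Open brace on the same line as the pattern: action runs only on matches.
same_line=$(printf '%s\n' "$input" | awk '$2 > 5 { print "big:", $1 }')

# Open brace on the next line: the pattern becomes a rule of its own
# (print matches) and the action becomes a second rule that runs on every line.
next_line=$(printf '%s\n' "$input" | awk '$2 > 5
{ print "big:", $1 }')

echo "$next_line"
```

The last form is almost never what was intended, which is why the brace placement matters.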

  9. Patterns • In an awk program, the pattern is the selection tool that decides what actions are applied to which lines. • Patterns can be: • relational expressions • regular expressions • magic patterns

  10. Relational Expression patterns

  11. Regular Expression patterns • Must be enclosed in slashes /RE/ • Anchors apply to the entire line if they are used as the only pattern • Remember, you can use regular expressions in relational patterns with ~ and !~ to apply them to fields • Both true regular expressions and fixed patterns can be used as REs in awk
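
A brief sketch of the pattern kinds mentioned so far: a regular expression applied to the whole line, a relational expression on a field, and a regular expression applied to one field with !~. The account names and numbers are made up.

```shell
input=$(printf 'root 0\nguest 1001\nrob 1002\n')

# Regular expression pattern: ^ anchors to the start of the whole line.
re=$(printf '%s\n' "$input" | awk '/^ro/')

# Relational expression pattern on the second field.
rel=$(printf '%s\n' "$input" | awk '$2 >= 1000 { print $1 }')

# Regular expression applied to a single field with !~ (does not match).
field=$(printf '%s\n' "$input" | awk '$1 !~ /^ro/ { print $1 }')

echo "$re"; echo "$rel"; echo "$field"
```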

  12. Pre/Post Processing • There are two magic patterns in awk: • BEGIN {the associated action is performed before the target file is opened} • END {the associated action is performed after the target file is successfully closed} • Both are coded in UPPER CASE
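
One way to picture BEGIN and END is a running total: set up before the input, accumulate on each line, and report afterwards. The numbers and the labels in the output are assumptions for the sketch.

```shell
out=$(printf '3\n4\n5\n' | awk '
BEGIN { print "start" }                        # runs before any input is read
      { total += $1 }                          # runs for every input line
END   { print "total", total, "lines", NR }    # runs after the last line
')
echo "$out"
```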

  13. # comments • Like most scripting languages # indicates a comment • awk scripts should be well documented • Comments should explain what you are doing and why.

  14. print • The print command is the simple output tool for awk; basically an “echo” • You can direct print to send its data to a file with the > operator • Generally print is used for simple output or debugging output
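
A minimal sketch of print with the > operator. The output file name is passed in with -v, which is not covered on the slide, and the temporary file is a hypothetical target used only here.

```shell
tmp=$(mktemp)

# Swap the two fields and send the result to the file named in "out".
printf 'a 1\nb 2\n' | awk -v out="$tmp" '{ print $2, $1 > out }'

saved=$(cat "$tmp")
echo "$saved"
rm -f "$tmp"
```

Within one awk run, > truncates the file the first time it is opened and later prints append to it, so both lines end up in the file.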

  15. printf • Similar in concept to the C language function. The format of a printf command is: printf("formatting string", variables) • Each formatting specifier in the string corresponds, one for one and in order, to a variable in the list. • Each formatting character is prefixed by %

  16. printf (cont.) • The formatting specifiers contain the following characters: • - indicates that the data should be left justified • n indicates the minimum width of the field • .n indicates the maximum width of the field for a string (for a floating-point number it sets the number of digits after the decimal point) • "%-5s" indicates a string field, left justified, with a minimum width of 5 characters

  17. printf formatting characters

  18. printf spacing characters • There are two characters available to change the spacing of your text: • \n inserts a newline character. You must use this if you want your output to occur on successive lines. • \t inserts a tab character
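
The width, precision, and justification rules from the printf slides can be tried out with a sketch like this one. The sample words and the "|" column marker are invented so the padding is easy to see.

```shell
out=$(printf 'pear 2.5\nfig 10\n' | awk '
BEGIN { printf("%.3s\n", "banana") }          # .3 keeps at most 3 characters
      { printf("%-5s|%6.2f\n", $1, $2) }      # left-justified name, then a number
')
echo "$out"
```

Without the trailing \n in each format string, all of the output would run together on one line.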

  19. getline • getline reads a line of input; here it is used to read from the keyboard • It can also capture the results of a command, but this form is seldom used • Read from the keyboard using getline variable < "/dev/tty" • If you don’t supply a variable, awk will store the line in $0 and overwrite the current record, so in most cases you want to use a variable.
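
Because the keyboard form needs a terminal, this sketch shows it only as a comment and runs the less common command form instead. The echo command and the variable names are stand-ins chosen for the example.

```shell
out=$(awk 'BEGIN {
    # Keyboard form from the slide (needs a terminal, so it is not run here):
    #   getline answer < "/dev/tty"

    # Command form: read the first line of output from a command into a variable.
    "echo hello" | getline line
    close("echo hello")
    print "got: " line
}')
echo "$out"
```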

  20. rand() srand() • The rand() function generates pseudo-random numbers from 0 up to, but not including, 1. • Given the same seed, it will always generate the same series of numbers. • srand() is used to supply a new seed to rand(). • If you don’t supply srand() a value, it uses the current time as the seed.
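
A short sketch of the seeding behaviour: the same seed gives the same first number, and scaling rand() gives a whole number in a chosen range. The seed 42 and the die-roll scaling are arbitrary choices for illustration.

```shell
out=$(awk 'BEGIN {
    srand(42); a = rand()
    srand(42); b = rand()                     # same seed, so the same value again
    same    = (a == b)          ? "same"     : "different"
    inrange = (a >= 0 && a < 1) ? "in-range" : "out-of-range"
    print same, inrange
    print int(rand() * 6) + 1                 # a whole number from 1 to 6
}')
echo "$out"
```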

  21. system() • The system() function allows you to execute system commands within an awk script. • You must enclose the system command in quotation marks. • You cannot capture the output from the system() function within the script but you can capture the return code.
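
As a sketch of capturing the return code, the standard true and false commands are used as stand-ins for a real system command.

```shell
out=$(awk 'BEGIN {
    ok  = system("true")      # the command succeeds: return code 0
    bad = system("false")     # the command fails: non-zero return code
    print ok, (bad != 0)
}')
echo "$out"
```

Any output the command itself writes goes straight to the terminal or pipe; only the return code comes back into the script.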

  22. length() • The length([argument]) function returns the length of the argument in characters. • If you give length() a number, it will return the length of the number written as a string; for a whole number that is the number of digits. • If you don’t give length() an argument, it will use $0 by default.

  23. index() • The index(string,target) function returns the position of the first occurrence of the target within the string, counting from 1, or 0 if the target does not occur. • The index() function is often used to set the boundary for the substr() function.

  24. substr() • The substr(string,start[,length]) function will return the part of the string beginning at position start and continuing for length characters. • If you don’t give it a length, it will return everything from start to the end of the string.

  25. split() • You will use split(string, array[, separator]) to divide a string into parts using separator to parse them, storing the resultant parts in the array. • If you don’t code a separator, the function will use the field separator to parse the string.
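
The four string functions from the last few slides are often used together, roughly as in this sketch. The file name being taken apart is a made-up value.

```shell
out=$(awk 'BEGIN {
    s    = "report-2024.txt"
    dot  = index(s, ".")               # position of the first "."
    base = substr(s, 1, dot - 1)       # everything before the dot
    ext  = substr(s, dot + 1)          # no length given: runs to the end
    n    = split(base, parts, "-")     # n is the number of parts stored
    print length(s), dot, base, ext, n, parts[1], parts[2], length(12345)
}')
echo "$out"
```

Here index() supplies the boundary that both substr() calls rely on, as the index() slide suggests.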

  26. if • Besides using patterns, if gives us another way to perform selection • The format of an if statement is if (condition) {verb(s)} [else { verb(s)}] • If you have more than one verb, they must be enclosed in French braces.

  27. if conditions

  28. if • A sample if
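
The original sample did not survive in this transcript, so the following is only an illustrative if/else with an assumed pass mark of 60 and invented scores.

```shell
out=$(printf '45\n82\n' | awk '{
    if ($1 >= 60) {
        print $1, "pass"
    } else {
        print $1, "fail"
    }
}')
echo "$out"
```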

  29. exit • No further input is read and the input file is closed • Control is transferred to the action associated with the END magic pattern if there is one • Generally used as a bailout in case of catastrophic errors
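
A sketch of exit used as a bailout: the script stops at the first non-numeric line, and the END action still runs. The input values and messages are assumptions for the example.

```shell
out=$(printf '1\n2\nbad\n4\n' | awk '
$1 !~ /^[0-9]+$/ { print "bad input at line " NR; exit }   # stop reading input
                 { sum += $1 }
END              { print "sum before exit:", sum }          # still executed
')
echo "$out"
```

The fourth line is never read, so it does not contribute to the sum.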

  30. for loop • This is a counted loop • executes until the counter reaches the target value • Increment (count up) or decrement (count down) • also works with the elements of an array • multiple verbs must be enclosed in { }

  31. for loop example
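
The slide's own example is not present in this transcript; as a stand-in, this sketch shows both forms of the for loop, the counted form over the fields of a line and the array form over the elements of an array. The input numbers are invented.

```shell
out=$(printf '10 20 30\n' | awk '{
    for (i = 1; i <= NF; i++) {       # counted loop, counting up
        sum += $i
    }
    n = split($0, parts, " ")
    for (k in parts) {                # array form: k takes each index in turn
        seen++
    }
    print sum, seen
}')
echo "$out"
```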

  32. while loop • The while loop is an example of conditional execution • The loop cycles as long as the condition specified is true • A while loop checks its condition before every pass, so the body may not execute at all • multiple verbs must be enclosed in { }

  33. while loop example
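
The slide's own example is not present in this transcript; as a stand-in, this sketch counts how many times 100 can be halved before reaching 1. The starting value is arbitrary.

```shell
out=$(awk 'BEGIN {
    n = 100
    steps = 0
    while (n > 1) {           # condition is tested before each pass
        n = int(n / 2)
        steps++
    }
    print steps
}')
echo "$out"
```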

  34. do/while • Even though it has a while in it, this is an example of until logic: the condition is tested after the body, so the body always executes at least once. • Until logic is shunned by conscientious coders. • ’nuff said
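
The difference from a plain while loop can be seen in a small sketch where the condition is false from the start. The counters w and d are hypothetical names that record how often each body ran.

```shell
out=$(awk 'BEGIN {
    i = 10
    while (i < 5) { w++; i++ }        # condition false at the start: body never runs
    i = 10
    do { d++; i++ } while (i < 5)     # body runs once before the condition is tested
    print w + 0, d
}')
echo "$out"
```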

  35. break • Used to exit from a loop • Control is passed to the line following the end of the loop • Causes an exit from the loop but NOT the awk script. If you want to bail out of the whole script, use the exit command.

  36. break example
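
The slide's own example is not present in this transcript; as a stand-in, this sketch prints fields until it meets a marker word. The word STOP and the sample line are invented.

```shell
out=$(printf 'a b STOP c\n' | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i == "STOP") break       # leave the loop, not the script
        print $i
    }
    print "after loop"                # control resumes here
}')
echo "$out"
```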

  37. continue • Causes awk to skip the rest of the body of the loop for the current value • In a for loop the counter is incremented, and the next cycle of the loop is started • In a while loop, the next iteration of the loop starts

  38. continue example
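
The slide's own example is not present in this transcript; as a stand-in, this sketch adds up only the even fields, using continue to skip the odd ones. The numbers are made up.

```shell
out=$(printf '1 2 3 4 5 6\n' | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i % 2 == 1) continue     # odd value: skip the rest of the body
        sum += $i
    }
    print sum
}')
echo "$out"
```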

  39. next • Causes the script to start over at the first pattern/action pair • takes the next record (line) from standard input or the target file • Like exit, this command affects the whole script

  40. next example
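
The slide's own example is not present in this transcript; as a stand-in, this sketch uses next to skip comment lines and blank lines so that later rules never see them. The input lines are invented.

```shell
out=$(printf '# comment\nx 1\n\ny 2\n' | awk '
/^#/ || NF == 0 { next }              # skip this line and start over with the next one
                { print NR ": " $1 }  # reached only by the remaining lines
')
echo "$out"
```

NR keeps counting the skipped lines, which is why the output shows line numbers 2 and 4.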
