
Shell Scripting


nita




Presentation Transcript


  1. Shell Scripting Awk (part1)

  2. Awk Programming Language • a standard Unix language geared toward text processing and creating formatted reports • very valuable to seismologists because it does floating-point math, unlike integer-only bash, and is designed to work with columnar data • syntax similar to C and bash • one of the most useful Unix tools at your command
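A quick way to see the integer-versus-floating-point difference, assuming a POSIX shell and an awk on the PATH (plain `awk` is used here; substitute `nawk` on systems where that is the one you want):

```shell
# Shell arithmetic expansion is integer only: 7 / 2 truncates to 3
echo $((7 / 2))

# awk performs the same division in floating point: prints 3.5
awk 'BEGIN { print 7 / 2 }'
```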

  3. treats text files as records (lines) made up of fields (columns) • performs floating-point and integer arithmetic and string operations • supports loops and conditionals • lets you define your own functions (subroutines) • can execute Unix commands within a script and process the results
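A small sketch touching several of these features at once. The function name `half` is hypothetical; `getline` and `close` are standard awk built-ins used here to run a Unix command and read its output back into the script:

```shell
echo "10 20 30" | awk '
function half(x) { return x / 2 }        # user-defined function
{
    for (i = 1; i <= NF; i++)            # loop over every field in the record
        printf "%s%s", half($i), (i < NF ? " " : "\n")
    "echo done" | getline msg            # run a Unix command, read one line
    close("echo done")
    print msg
}'
```

This should print `5 10 15` followed by `done`.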

  4. versions • awk: the original awk • nawk: new awk, dating to 1987 • gawk: GNU awk, which has more powerful string functionality • the CERI Unix system has all three. You want to use nawk. I suggest adding this line to your .cshrc file: alias awk 'nawk' • in OS X, awk is already nawk, so no changes are necessary

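  5. Command line functionality • you can call awk from the command line in two ways: • awk [options] '{ commands }' variables infile(s) • awk -f scriptfile variables infile(s) • or you can create an executable awk script (the #! line needs -f so that nawk reads the rest of the file as its program, and quoting 'EOF' keeps the shell from expanding $1 and similar inside the here-document):
     % cat << 'EOF' > test.awk
     #!/usr/bin/nawk -f
     some set of commands
     EOF
     % chmod 755 test.awk
     % ./test.awk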
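A minimal sketch of the first two forms, assuming plain `awk` on the PATH. It writes a throwaway `test.awk` in the current directory; the executable form is left out because the interpreter path on the `#!` line differs from system to system.

```shell
# Write a one-line awk program to a file; quoting 'EOF' keeps $1 literal
cat << 'EOF' > test.awk
{ print "first field:", $1 }
EOF

# Run the program from the file with -f
echo "1918 9 22" | awk -f test.awk

# The same program given directly on the command line
echo "1918 9 22" | awk '{ print "first field:", $1 }'
```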

  6. How it treats text • awk commands are applied to every record (line) of a file • awk separates the data in each line into fields • essentially, each field becomes a member of an array, so the first field is $1, the second field $2, and so on • $0 refers to the entire record

  7. Field Separator • the default field separator is one or more blanks (spaces or tabs); leading blanks are ignored
     $1  $2    $3  $4  $5  $6  $7     $8      $9      $10   $11
     1   1918  9   22  9   54  49.29  -1.698  98.298  15.0  ehb
   Notice that the fields may be integers, floating-point numbers (with a decimal point), or strings. Nawk is generally smart enough to figure out how to use them.

  8. Field Separator • the field separator may be changed by resetting the FS built-in variable; the -F option does this from the command line • look at the passwd file:
     % head -n1 /etc/passwd
     root:x:0:1:Super-User:/:/sbin/sh
   The separator is ":", so reset it (first line of output shown):
     % awk -F":" '{ print $1, $3 }' /etc/passwd
     root 0
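To show that -F and FS are the same setting, here is a sketch that feeds the sample passwd line through awk both ways. The second form assigns FS in a BEGIN block, which runs once before any input is read:

```shell
# One line in the /etc/passwd format shown above
line='root:x:0:1:Super-User:/:/sbin/sh'

# -F sets FS from the command line
echo "$line" | awk -F':' '{ print $1, $3 }'

# Equivalent: assign FS in a BEGIN block
echo "$line" | awk 'BEGIN { FS = ":" } { print $1, $3 }'
```

Both commands should print `root 0`.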

  9. print • one of the most common commands used in awk scripts is print • whitespace between print items is not copied to the output; items written next to each other are simply concatenated:
     % awk -F":" '{ print $1 $3 }' /etc/passwd
     root0
   • two solutions to this:
     % awk -F":" '{ print $1 " " $3 }' /etc/passwd
     % awk -F":" '{ print $1, $3 }' /etc/passwd
     root 0
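A sketch of the difference on the sample passwd line. The comma form inserts OFS, the output field separator (a single space by default), so changing OFS changes what appears between the items:

```shell
line='root:x:0:1:Super-User:/:/sbin/sh'
echo "$line" | awk -F':' '{ print $1 $3 }'        # concatenated: root0
echo "$line" | awk -F':' '{ print $1 " " $3 }'    # explicit space string: root 0
echo "$line" | awk -F':' '{ print $1, $3 }'       # comma inserts OFS: root 0
echo "$line" | awk -F':' 'BEGIN { OFS = "|" } { print $1, $3 }'   # root|0
```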

  10. any string or numeric text can be output explicitly by enclosing it in double quotes "". Assume a file whose first line looks like this:
     1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x
     % awk '{print "latitude:",$9,"longitude:",$10,"depth:",$11}' SUMA.loc
     latitude: -1.698 longitude: 98.298 depth: 15.0
     latitude: 9.599 longitude: 92.802 depth: 30.0
     latitude: 4.003 longitude: 94.545 depth: 20.0

  11. Unlike the shell, awk does not evaluate variables within strings. • For example, you cannot print fields 8 and 3 separated by a tab by writing: • {print "$8\t$3" } • as it would print the literal text $8, a tab, and $3. • Inside quotes, the dollar sign is not a special character; outside quotes, it refers to a field.
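A sketch with an eight-field test line. Inside the awk string `$8` stays literal, while the escape `\t` is still turned into a tab; moving the field references outside the quotes prints their values:

```shell
# $8 and $3 inside quotes are plain text; \t still becomes a tab
echo "a b c d e f g h" | awk '{ print "$8\t$3" }'     # $8<TAB>$3

# Field references outside the quotes are evaluated
echo "a b c d e f g h" | awk '{ print $8 "\t" $3 }'   # h<TAB>c
```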

  12. 1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x • you can specify a newline in two ways:
     % awk '{print "latitude:",$9; print "longitude:",$10}' SUMA.loc
     % awk '{print "latitude:",$9 "\n" "longitude:",$10}' SUMA.loc
     latitude: -1.698
     longitude: 98.298
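A sketch of both forms on the sample line, piped in with echo instead of reading SUMA.loc. In the second form "\n" is concatenated directly to "longitude:"; putting a comma after "\n" instead would insert OFS there, so the second output line would start with a space.

```shell
line='1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x'

# Two print statements; each one ends with ORS (a newline)
echo "$line" | awk '{ print "latitude:", $9; print "longitude:", $10 }'

# One print statement with an embedded "\n"
echo "$line" | awk '{ print "latitude:", $9 "\n" "longitude:", $10 }'
```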

  13. a trick • If a field is composed of both numbers and strings, you can multiply the field by 1 to remove the string. This works when the number comes first: awk converts the leading numeric part and drops the trailing characters.
     % head test.tmp
     1.5 2008/09/09 03:32:10 36.440N 89.560W 9.4
     1.8 2008/09/08 23:11:39 36.420N 89.510W 7.1
     1.7 2008/09/08 19:44:29 36.360N 89.520W 8.2
     % awk '{print $4,$4*1}' test.tmp
     36.440N 36.44
     36.420N 36.42
     36.360N 36.36
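One consequence of the trick is that the hemisphere letter is thrown away, so a west longitude such as 89.560W comes out as a positive number. A sketch that restores the sign, assuming the column layout of test.tmp above (the file is recreated here with two of its lines):

```shell
printf '%s\n' \
  '1.5 2008/09/09 03:32:10 36.440N 89.560W 9.4' \
  '1.8 2008/09/08 23:11:39 36.420N 89.510W 7.1' > test.tmp

awk '{
    lat = $4 * 1                  # numeric part only; the N or S is dropped
    lon = $5 * 1                  # numeric part only; the E or W is dropped
    if ($4 ~ /S$/) lat = -lat     # south of the equator is negative
    if ($5 ~ /W$/) lon = -lon     # west of Greenwich is negative
    print lat, lon
}' test.tmp
```

This should print `36.44 -89.56` and `36.42 -89.51`.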

  14. Selective execution • awk recognizes regular expressions and conditionals, which can be used to selectively execute awk procedures on certain records. Here /root/ is a regular expression pattern:
     % awk -F":" '/root/ { print $1, $3 }' /etc/passwd
     root 0
   or within our example script:
     #!/usr/bin/nawk -f
     /root/ { print $1 }
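A sketch using a few made-up records in passwd format (the `passwd_sample` shell function and its contents are hypothetical) so the result does not depend on your own /etc/passwd:

```shell
# Hypothetical sample records in /etc/passwd format
passwd_sample() {
    printf '%s\n' \
        'root:x:0:1:Super-User:/:/sbin/sh' \
        'daemon:x:1:1::/:' \
        'alice:x:100:10:Alice:/home/alice:/bin/csh'
}

# Only records matching /root/ run the action
passwd_sample | awk -F':' '/root/ { print $1, $3 }'    # root 0

# An anchored regular expression: records that end in csh
passwd_sample | awk -F':' '/csh$/ { print $1 }'        # alice
```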

  15. if statements are also very useful:
     % awk -F":" '{ if ($1 == "root") print $1, $3 }' /etc/passwd
     root 0
   or within our example script:
     { if ($1 == "root") { print $1 } }
   • note, this particular if syntax is a bit different from your reading, which suggested
     % awk -F":" '$1 == "root" { print $1, $3 }' /etc/passwd
   • the syntax I use is more explicit and more like C or perl, so I essentially have to remember less syntax
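A sketch confirming that the two styles select the same records, again on hypothetical sample records. The last command uses a numeric condition on a field as the pattern:

```shell
passwd_sample() {
    printf '%s\n' \
        'root:x:0:1:Super-User:/:/sbin/sh' \
        'alice:x:100:10:Alice:/home/alice:/bin/csh'
}

# Explicit if inside the action
passwd_sample | awk -F':' '{ if ($1 == "root") { print $1, $3 } }'   # root 0

# The same condition written as a pattern
passwd_sample | awk -F':' '$1 == "root" { print $1, $3 }'            # root 0

# A numeric test on field 3 as the pattern
passwd_sample | awk -F':' '$3 >= 100 { print $1 }'                   # alice
```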

  16. Floating Point Arithmetic • awk does floating-point math! • values read from input are stored as strings, but when arithmetic operators are applied, awk converts them to floating-point numbers; a string that does not begin with a number converts to 0 • the reading calls these stringy variables
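A sketch of the string-to-number conversion on three sample fields. Adding 0 forces a numeric value without changing it:

```shell
# "15.0" -> 15, "ehb" -> 0 (not numeric), "3.5e2" -> 350
echo "15.0 ehb 3.5e2" | awk '{ print $1 + 0, $2 + 0, $3 + 0 }'   # 15 0 350

# Division on a field is done in floating point
echo "15.0 ehb 3.5e2" | awk '{ print $1 / 4 }'                   # 3.75
```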

  17. Arithmetic Operators • all basic arithmetic operators are left to right associative, except ^, which is right associative • + : addition • - : subtraction • * : multiplication • / : division • % : remainder or modulus • ^ : exponent • other standard C programming operators are also available
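A sketch of the operators and of how they group:

```shell
awk 'BEGIN { print 7 % 3, 2 ^ 10 }'    # 1 1024

# ^ groups right to left: 2 ^ 3 ^ 2 is 2 ^ (3 ^ 2) = 2 ^ 9
awk 'BEGIN { print 2 ^ 3 ^ 2 }'        # 512

# - groups left to right: 10 - 4 - 3 is (10 - 4) - 3
awk 'BEGIN { print 10 - 4 - 3 }'       # 3
```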

  18. Assignment Operators • = : set variable equal to value on right • += : set variable equal to itself plus the value on right • -= : set variable equal to itself minus the value on right • *= : set variable equal to itself times the value on right • /= : set variable equal to itself divided by value on right • %= : set variable equal to the remainder of itself divided by the value on the right • ^= : set variable equal to itself raised to the power on the right
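A sketch that steps one variable through each operator, followed by a typical use of += (a running sum of sample depth values, averaged in an END block that runs after the last record):

```shell
awk 'BEGIN {
    x = 10
    x += 5;  printf "%s ", x     # 15
    x -= 3;  printf "%s ", x     # 12
    x *= 2;  printf "%s ", x     # 24
    x /= 4;  printf "%s ", x     # 6
    x %= 4;  printf "%s ", x     # 2
    x ^= 3;  print x             # 8
}'

# Running sum and mean of a column of depths
printf '%s\n' 15.0 30.0 20.0 | awk '{ sum += $1; n++ } END { print sum, sum / n }'   # 65 21.6667
```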

  19. Unary Operations • a unary expression contains one operand and one operator • ++ : increment the operand by 1 • if ++ occurs after the operand, x++, the original value of the operand is used in the expression and then the operand is incremented • if ++ occurs before, ++x, the operand is incremented first and the new value is used in the expression • (awk variables take no $; $x would mean field number x) • -- : decrement the operand by 1 • + : unary plus maintains the value of the operand, +x = x • - : unary minus negates the value of the operand, -x = -1*x • ! : logical negation returns 1 if the operand is false (0 or an empty string) and 0 if it is true
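A sketch of post- versus pre-increment, unary minus, and logical negation:

```shell
awk 'BEGIN {
    x = 5
    y = x++            # post-increment: y gets the old value 5, then x becomes 6
    print x, y         # 6 5
    z = ++x            # pre-increment: x becomes 7 first, then z gets 7
    print x, z         # 7 7
    print -x, !0, !x   # -7, then 1 for a false operand, 0 for a true one
}'
```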

  20. Relational Operators • return 1 if true and 0 if false • !!! the opposite of the bash test command, whose exit status is 0 for true • do not chain relational operators (a < b < c); POSIX awk defines them as non-associative • < : test for less than • <= : test for less than or equal to • > : test for greater than • >= : test for greater than or equal to • == : test for equal to • != : test for not equal
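A sketch contrasting awk's 1/0 results with the exit status of the shell's test command. The parentheses keep awk from reading > as output redirection inside print:

```shell
awk 'BEGIN { print (3 < 5), (3 > 5), (2 == 2.0) }'   # 1 0 1

# test reports "true" as exit status 0
[ 3 -lt 5 ]; echo $?                                  # 0
```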

  21. Boolean (Logical) Operators • Boolean operators return 1 for true and 0 for false • && : logical AND; tests that both expressions are true • left to right associative • || : logical OR; tests that one or both of the expressions are true • left to right associative • ! : logical negation; returns 1 if the expression is false and 0 if it is true

  22. Unlike bash, awk does not use different comparison syntax for strings and numbers: awk uses == for both, whereas test uses = for strings and -eq for numbers.
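A sketch of == on both kinds of value, plus one subtlety worth knowing: input fields that look like numbers are compared numerically, but two quoted string constants are compared as text, character by character:

```shell
# The same == works for a string field and a numeric field
echo "ehb 15.0" | awk '{ print ($1 == "ehb"), ($2 == 15) }'   # 1 1

# Numeric-looking fields compare as numbers: 10 > 9
echo "10 9" | awk '{ print ($1 > $2) }'                          # 1

# String constants compare as text: "10" sorts before "9"
awk 'BEGIN { print ("10" > "9") }'                               # 0
```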

  23. Comparison Operators • ~ : pattern match • !~ : pattern does not match • && : logical AND • || : logical OR • == : equals (numeric or string) • != : does not equal (numeric or string)
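A sketch of ~ and !~ applied to a single field rather than the whole record, using two made-up catalog lines whose fifth field is an agency code:

```shell
printf '%s\n' '1 1918 9 22 ehb' '2 1920 1 5 iscgem' |
awk '$5 ~ /^eh/  { print $2, "matches" }
     $5 !~ /^eh/ { print $2, "does not match" }'
```

This should print `1918 matches` and `1920 does not match`.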

  24. Built-In Variables • FS: field separator • NR: record number, another useful built-in awk variable • it holds the current record (line) number, starting from 1
     % awk -F":" '{ if (NR == 1) print $1, $3 }' /etc/passwd
     root 0
   • this is useful when headers are present in a file
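A sketch of skipping a header line, using a hypothetical file `quakes.txt`. NR keeps counting across all input files, so when several files each have a header, the standard variable FNR (record number within the current file) is the one to test:

```shell
printf '%s\n' 'lat lon depth' '-1.698 98.298 15.0' '9.599 92.802 30.0' > quakes.txt

# Skip the first line of input
awk 'NR > 1 { print NR, $3 }' quakes.txt

# Skip the first line of each file
awk 'FNR > 1 { print FILENAME, FNR, $3 }' quakes.txt quakes.txt
```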

  25. RS: record separator; specifies where the current record ends and the next begins • default is "\n", a newline • a useful option is "", which makes one or more blank lines separate records • OFS: output field separator • default is " ", a single space • ORS: output record separator • default is "\n", a newline
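A sketch of RS = "" and OFS on made-up data. With the default FS, newlines inside a blank-line-separated record also separate fields. For OFS, the assignment $1 = $1 forces awk to rebuild $0 using the new separator:

```shell
# Two records separated by a blank line, each spread over two lines
printf 'event1 -1.698\n98.298 15.0\n\nevent2 9.599\n92.802 30.0\n' |
awk 'BEGIN { RS = "" } { print NR ":", $1, NF }'
# 1: event1 4
# 2: event2 4

# Rewrite a whitespace-separated line as comma-separated
printf '%s\n' '-1.698 98.298 15.0' | awk 'BEGIN { OFS = "," } { $1 = $1; print }'
# -1.698,98.298,15.0
```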

  26. NF: number of fields in the current record • awk sets NF as soon as it reads each record, so it is available before you touch any field, and $NF is the last field • FILENAME: stores the current input filename • OFMT: output format that print uses for non-integer numbers (default "%.6g") • for example, OFMT="%.6f" prints non-integer values with six digits after the decimal point; integer values are still printed as integers
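A sketch of NF, $NF, and OFMT:

```shell
# NF is the field count; $NF is the last field
echo "1918 9 22 ehb" | awk '{ print NF, $NF }'          # 4 ehb

# OFMT affects non-integer numbers only
awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 2 }'         # 3.14 2
```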

  27. Accessing shell variables in nawk • there are 3 methods to access shell variables inside a nawk script:

  28. 1. Assign the shell variables to awk variables after the body of the script, but before you specify the input file:
     VAR1=3
     VAR2="Hi"
     awk '{print v1, v2}' v1=$VAR1 v2=$VAR2 input_file
     3 Hi
   Note that I am sneaking in the concept of awk variables here (v1, v2).

  29. There are a couple of constraints with this method • Shell variables assigned using this method are not available in the BEGIN section • If variables are assigned after a filename, they will not be available when processing that filename • awk '{print v1, v2}' v1=$VAR1 file1 v2=$VAR2 file2 • In this case, v2 is not available to awk when processing file1.
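A sketch of method 1 and its BEGIN limitation, with a hypothetical one-line `input_file`. Quoting "$VAR1" and "$VAR2" protects values that contain spaces:

```shell
VAR1=3
VAR2="Hi"
echo "one record" > input_file       # hypothetical one-line input file

awk '{ print v1, v2 }' v1="$VAR1" v2="$VAR2" input_file     # 3 Hi

# The assignment happens when awk reaches it in the argument list,
# which is after BEGIN has already run
awk 'BEGIN { print "in BEGIN:", v1 } { print "in body:", v1 }' v1="$VAR1" input_file
# in BEGIN:
# in body: 3
```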

  30. Also note: awk variables are referred to by just their name (no $ in front) awk '{print v1, v2, NF, NR}' v1=$VAR1 file1 v2=$VAR2 file2

  31. 2. Use the -v switch to assign the shell variables to awk variables. This works with nawk, but not with all flavours of awk. nawk -v v1=$VAR1 -v v2=$VAR2 '{print v1, v2}' input_file
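A sketch of -v, written with plain `awk` (POSIX awk, gawk, and nawk all accept -v). Because -v assignments take effect before BEGIN runs, the values are visible even in a program with no input at all:

```shell
VAR1=3
VAR2="Hi"
awk -v v1="$VAR1" -v v2="$VAR2" 'BEGIN { print v1, v2 }'   # 3 Hi
```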

  32. 3. Protect the shell variables from awk by enclosing them with "'" (i.e. double quote - single quote - double quote). awk '{print "'"$VAR1"'", "'"$VAR2"'"}' input_file
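A sketch of what the quoting does. The shell closes the single-quoted awk text, expands "$VAR1" inside double quotes, then resumes single quoting, so awk receives the program BEGIN { print "3", "Hi" }. Because the value becomes part of the program text, a value containing a double quote or backslash would change the program; -v avoids that.

```shell
VAR1=3
VAR2="Hi"
awk 'BEGIN { print "'"$VAR1"'", "'"$VAR2"'" }'   # 3 Hi
```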
