AWK


  1. AWK

  2. Tip of the Day • You have a file that you misplaced and want to find it quickly • You want to use the command find

  3. find command example % find ~ -name "final_exam.txt" • The command above searches for the file "final_exam.txt" in your home directory and all subdirectories under it • When the file is found, find prints its full path name

  4. Another find command example % find . -name "*.pro" -ls • The command above searches for all your IDL files (*.pro) in your current working directory and all subdirectories under it • For each file found, it prints an ls-style listing of the file

  5. Yet another find command example % find / -name "*junk*" -exec rm {} \; • The command above searches the entire directory tree for all files whose names contain the pattern junk • Each file found is deleted, so use this form with care

  6. AWK • Named for its authors: Aho, Weinberger, and Kernighan • AWK is a text processing utility that can efficiently process and extract text data with minimal programming

  7. Example #1 • I have a column of numbers (input.dat) that requires conversion, e.g., square root, Centigrade to Fahrenheit, etc. 1 2 … 100

  8. Solutions • You can write an IDL or C program to do this • You can transfer the data over to a spreadsheet • Or you can write a one-line awk program

  9. How Does AWK Work? • AWK is based on the concept of pattern matching • Think of AWK as a filter program • It looks for key “patterns” and processes the records that match them

  10. Syntax of AWK /pattern/ {action}

  11. Simplest AWK program % gawk '{print $0}' input.dat • This simply prints out (echoes) the input file

  12. To take the square root % gawk '{print sqrt($0)}' input.dat

  13. If you have two columns of data and you want to add them up % gawk '{print $1+$2}' input.dat

  14. Meaning of the fields • $0 - represents the entire input line • $1 - represents the first field • $2 - represents the second field • Etc. • NF - number of fields in the current record • NR - number of the current record
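
The field variables above can be tried on a small input. The following is a minimal sketch, not taken from the slides: the three-line sample is made up, and plain awk is used on the assumption that it behaves like gawk for these basic features.

```shell
#!/bin/sh
# Hypothetical two-column sample data (not from the slides).
sample='10 20
3 4
7 8'

# For every record, print the record number (NR), the number of
# fields (NF), and the sum of the first two fields ($1 + $2).
fields_out=$(printf '%s\n' "$sample" | awk '{ print NR, NF, $1 + $2 }')

printf '%s\n' "$fields_out"
# 1 2 30
# 2 2 7
# 3 2 15
```

Because no pattern is given, the action runs for every input line.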

  15. Metacharacters and Patterns

  16. Patterns and Regular Expressions • In UNIX, metacharacters such as *, ?, \, and many more are used to build patterns • A pattern is a concise means of specifying a template to match against • The most common use is matching filenames in a directory structure; these shell wildcard patterns are a simpler relative of the “regular expressions” used by utilities such as awk • The following examples illustrate this

  17. Metacharacter Patterns (most often used for filename matching)
      • The asterisk character * matches a pattern of zero or more characters
        Example: % ls * [will match all files in the directory, except names beginning with a dot]
        Example: % ls *dat [will match files such as “data.dat”, “dat”, “1dat”, “1.dat”]
      • The question mark character ? matches a pattern of exactly one character
        Example: % ls ? [will match files such as “0”, “1”, “a”, “z”]
        Example: % ls ?dat [will match files such as “1dat”, “bdat”]

  18. More Complex Patterns
      • The pattern construct [0-9] or [a-z] matches a single character constrained to the given range (a digit or a lowercase letter, respectively)
        Example: % ls [a-z] [will match files “a”, “b”, “c”, “z”]
        Example: % ls [0-9]dat [will match files “0dat”, “5dat”, “9dat”]
      • The pattern construct {larry,moe,curly} lists explicit entries to match (with no spaces after the commas)
        Example: % ls {larry,moe,curly}.mail [will match the files “larry.mail”, “moe.mail”, “curly.mail”]

  19. What if I want to match an actual metacharacter? • To match an actual metacharacter, we need to tell UNIX that the following character should be interpreted literally rather than as a metacharacter • This is accomplished using the “\” character • For example, suppose we have a file named “*.dat” in our directory and we want to remove it • DO NOT GIVE THE COMMAND % rm *.dat (this removes every file whose name ends in .dat) • INSTEAD % rm \*.dat
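
The wildcard and escaping rules above can be checked safely in a scratch directory. This is a minimal sketch under stated assumptions: the file names are hypothetical, it is written for a POSIX sh rather than the csh used in the slides, and the {larry,moe,curly} brace form is left out because not every sh supports it.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
glob_dir=$(mktemp -d)
cd "$glob_dir" || exit 1

# Hypothetical file names, including one literally named *.dat
touch 1dat 5dat data.dat '*.dat'

digit_dat=$(echo [0-9]dat)   # one digit followed by "dat"
one_char_dat=$(echo ?dat)    # any single character followed by "dat"

rm \*.dat                    # removes only the file literally named *.dat
remaining=$(echo *)

echo "$digit_dat"      # 1dat 5dat
echo "$one_char_dat"   # 1dat 5dat
echo "$remaining"      # 1dat 5dat data.dat

# Clean up the scratch directory.
cd / && rm -rf "$glob_dir"
```

Note that data.dat survives the rm, which is the point of escaping the asterisk.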

  20. Pattern Matching in other UNIX Utilities • Although our examples are in the context of file matching, be aware that pattern matching is prevalent in many UNIX utilities such as • vi (text editor) • sed (batch stream text editor) • awk (text processing utility) • shell programming • There will be slight differences between each utility, but the concept is the same • Proficiency with patterns can be gained with the following exercises

  21. Pattern Exercise (Set 1) • In the directory ~rvrpci/public_html/simg726/20012/patterns are several filenames • Given the following patterns, list what filenames are selected (e.g. using ls) and explain why • Each of these patterns will evoke different results and you will need to study each one to understand any subtleties

  22. Pattern Exercise (Set 2) • image.{red,grn,blu} • image.* • image* • image. • image[0-9] • image?[0-9] • image?[a-c] • image?[a-z] • image.[0-9] • image?* • image? • image.? • image.?? • image?? • image??? • image.? • image.\? • image{1,3,5,7} • image{0,1}0

  23. Review of patterns • * - matches any string of zero or more characters • ? - matches a single character • [0-9] - matches a single character that is a digit • [A-Z] - matches a single character that is an upper case letter

  24. Matching • /pattern/ - matches any record that contains the pattern • /^pattern/ - matches only when the pattern occurs at the beginning of a line • /pattern$/ - matches only when the pattern occurs at the end of a line • $1 ~ /pattern/ - matches when the first field contains the pattern • $1 !~ /pattern/ - matches when the first field does not contain the pattern
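
Each matching form listed above can be seen at work on a few records. The sketch below is illustrative only: the records are invented, and plain awk is assumed to treat these operators the same way gawk does.

```shell
#!/bin/sh
# Hypothetical records (not from the slides).
records='alpha 1
beta 2
# note 3
alphabet 4'

# /^pattern/ : lines that begin with "alpha"
starts_alpha=$(printf '%s\n' "$records" | awk '/^alpha/ { print $2 }')

# /pattern$/ : lines that end with "2"
ends_two=$(printf '%s\n' "$records" | awk '/2$/ { print $1 }')

# $1 ~ /pattern/ : first field is exactly "alpha" (anchored at both ends)
exact_alpha=$(printf '%s\n' "$records" | awk '$1 ~ /^alpha$/ { print $2 }')

# $1 !~ /pattern/ : first field does not start with "#"
no_hash=$(printf '%s\n' "$records" | awk '$1 !~ /^#/ { print $1 }')

printf '%s\n' "$starts_alpha"   # 1 and 4, one per line
printf '%s\n' "$ends_two"       # beta
printf '%s\n' "$exact_alpha"    # 1
printf '%s\n' "$no_hash"        # alpha, beta, alphabet, one per line
```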

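  25. Suppose you had headers at the top of your file that you wanted to ignore % gawk '/[0-9]/ {print $0}' input.dat • This prints only the lines that contain a digit, so it works when the header lines contain no digits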

  26. Removing comments % gawk '$0 !~ /^#/' • The above works for # at the beginning of a line % gawk '$0 !~ /^ *#/' • Better pattern • Also works when the # is preceded by leading spaces
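
The comment-stripping pattern can be extended to cover leading tabs as well as spaces. This is a sketch rather than the slide's own command: the sample lines are made up, and the bracket expression [ \t] is a substitution for the single space used on the slide.

```shell
#!/bin/sh
# Hypothetical input: comment lines with no indent, with leading
# spaces, and with a leading tab, followed by two data lines.
data_only=$(printf '# header\n  # indented with spaces\n\t# indented with a tab\n400.350 0.0509975\n410.170 0.0502359\n' |
    awk '$0 !~ /^[ \t]*#/')

printf '%s\n' "$data_only"
# 400.350 0.0509975
# 410.170 0.0502359
```

With no action given, awk prints every record for which the pattern is true.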

  27. Real Life Problem 1: ASD Spectra Conversion [Figure: water quality sampling sites over a MISI image example at 2000' AGL, 4' pixel. Legend: MISI flight area, Boston Whaler, canoe, kayak, pier/bridge, truth panels. Measurements shown: radiometer, thermistors, secchi depth, water samples, ASD]

  28. Conversion of wavelength units from nanometers to microns for a spectral file (water.ref)
      400.350 0.0509975
      410.170 0.0502359
      419.990 0.0474999
      …
      683.900 0.0215759
      693.440 0.0214323
      702.980 0.0213168

  29. Conversion AWK script % gawk '{print $1/1000.0, $2}' water.ref > water.ref.microns
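
The conversion can be tried on the first three lines of the spectral data without creating any files. This sketch is a variation, not the slide's exact command: it uses printf to fix the wavelength at five decimal places, and it assumes plain awk in place of gawk.

```shell
#!/bin/sh
# First three lines of the spectral data shown on the earlier slide.
spectrum='400.350 0.0509975
410.170 0.0502359
419.990 0.0474999'

# Divide the wavelength (field 1) by 1000 to go from nanometers to
# microns; pass the reflectance (field 2) through unchanged.
microns=$(printf '%s\n' "$spectrum" |
    awk '{ printf "%.5f %s\n", $1 / 1000.0, $2 }')

printf '%s\n' "$microns"
# 0.40035 0.0509975
# 0.41017 0.0502359
# 0.41999 0.0474999
```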

  30. What if you have multiple files Water_0001.ref Water_0002.ref … Soil_0001.ref Soil_0002.ref … Cement_1000.ref

  31. How do we repeatedly apply the AWK script? • We would use the C shell (csh) foreach statement • The form of the foreach statement is
      % foreach shell_variable (filename_pattern_or_word_list)
        unix_statements
        unix_statements
        …
        unix_statements
      end

  32. Processing only the water files
      % foreach i (Water*.ref)
      foreach? echo "Processing $i"
      foreach? gawk '{print $1/1000.0, $2}' $i > $i.microns
      foreach? end
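
For readers whose shell is sh or bash rather than csh, the same loop can be sketched with a for statement. Everything here is an assumption made for illustration: the scratch directory, the three small files (named after the pattern on the earlier slide), and the use of plain awk.

```shell
#!/bin/sh
# Build a scratch directory with hypothetical sample files.
work_dir=$(mktemp -d)
printf '400.350 0.0509975\n' > "$work_dir/Water_0001.ref"
printf '410.170 0.0502359\n' > "$work_dir/Water_0002.ref"
printf '500.000 0.1000000\n' > "$work_dir/Soil_0001.ref"

# sh equivalent of the csh foreach loop: only the Water files match.
for f in "$work_dir"/Water*.ref; do
    echo "Processing $f"
    awk '{ print $1 / 1000.0, $2 }' "$f" > "$f.microns"
done

cat "$work_dir/Water_0001.ref.microns"   # 0.40035 0.0509975
```

The Soil file is left alone because it does not match the Water*.ref pattern.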

  33. Renaming a set of files • Suppose you had a set of files Water_0001.ref.microns Water_0002.ref.microns … Water_0100.ref.microns • You want to rename them back to Water_0001.ref Water_0002.ref

  34. We need tools to extract file name components • Given the sample file Water_0001.ref.microns • Need to extract the file name extension(s): .ref.microns, .microns • Need to extract the file name base: Water_0001

  35. Shell Filename Modifiers • h - Remove a trailing pathname component, leaving only the head • r - Remove a trailing suffix of the form .xxx, leaving the basename • e - Remove all but the trailing suffix • t - Remove all leading pathname components, leaving the tail

  36. Sample output of the shell modifiers
      % set a=/usr/tmp/water_00001.ref.microns
      % echo $a
      /usr/tmp/water_00001.ref.microns
      % echo $a:h
      /usr/tmp
      % echo $a:r
      /usr/tmp/water_00001.ref
      % echo $a:e
      microns
      % echo $a:t
      water_00001.ref.microns

  37. Renaming the water files
      % foreach i (Water*.microns)
      foreach? echo "Renaming $i to $i:r"
      foreach? mv $i $i:r
      foreach? end
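
The :h, :r, :e, and :t modifiers are csh features. In sh or bash, similar results can be had with parameter expansion, sketched below. This is an equivalent technique rather than the one on the slides, and the scratch directory and file names are hypothetical.

```shell
#!/bin/sh
a=/usr/tmp/water_00001.ref.microns

path_head=${a%/*}    # like $a:h -> /usr/tmp
path_root=${a%.*}    # like $a:r -> /usr/tmp/water_00001.ref
path_ext=${a##*.}    # like $a:e -> microns
path_tail=${a##*/}   # like $a:t -> water_00001.ref.microns

echo "$path_head" "$path_root" "$path_ext" "$path_tail"

# Rename hypothetical *.microns files by dropping the last suffix.
rename_dir=$(mktemp -d)
touch "$rename_dir/Water_0001.ref.microns" "$rename_dir/Water_0002.ref.microns"

for f in "$rename_dir"/Water*.microns; do
    echo "Renaming $f to ${f%.microns}"
    mv "$f" "${f%.microns}"
done
```

Stripping the explicit .microns suffix, rather than "everything after the last dot", keeps the rename safe when a directory name in the path also contains a dot.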

  38. The foreach statement can extract elements of a shell variable
      % set a='0.0 0.1 0.2'
      % foreach i ($a)
      foreach? echo $i
      foreach? end
      0.0
      0.1
      0.2

  39. Real Life Problem 2: MODTRAN Output • How do you extract a single value out of a 40 page output?
      1 ***** MODTRAN 3.5 Version 1.1 Jan 97 *****
      0 CARD 1 *****t0 7 2 2 1 0 0 0 0 0 0 1 1 0 0.000 0.00
      0 CARD 1B *****T 8F 0 360.000
      0 CARD 2 ***** 1 1 0 0 0 0 30.00000 0.00000 0.00000 0.00000 0.31500
      0 GNDALT = 0.31500
      0 CARD 2C ***** 15 0 0AUG01
      MODEL ATMOSPHERE NO. 7 ICLD = 0
      MODEL 0 / 7 USER INPUT DATA
      0.315 9.842E+02 3.230E+01 7.545E-01 0.000E+00 0.000E+00 ABD2222222 22222
      0.554 9.581E+02 2.720E+01 6.765E-01 0.000E+00 0.000E+00 ABD2222222 2

  40. What do we want? • The “H2O” value
      Z      P        T       REL H  H2O        CLD AMT    RAIN RATE  AEROSOL
      (KM)   (MB)     (K)     (%)    (GM M-3)   (GM M-3)   (MM HR-1)  TYPE PROFILE
      0.315  984.200  305.45  2.20   7.545E-01  0.000E+00  0.000E+00  RURAL RURAL
      0.554  958.100  300.35  2.60   6.765E-01  0.000E+00  0.000E+00  RURAL
      …
      H2O         O3          CO2         CO          CH4         N2O
      ( ATM CM )
      2.2208E+02  1.3433E-01  2.6589E+02  8.2446E-02  1.1924E+00  2.2553E-01
      …
      Z      P        T       REL H  H2O        CLD AMT    RAIN RATE  AEROSOL
      (KM)   (MB)     (K)     (%)    (GM M-3)   (GM M-3)   (MM HR-1)  TYPE PROFILE
      0.315  984.200  305.45  2.20   7.545E-01  0.000E+00  0.000E+00  RURAL RURAL
      0.554  958.100  300.35  2.60   6.765E-01  0.000E+00  0.000E+00  RURAL

  41. What do we know? • We know that the value we want sits under a header line that has the column name “H2O” in its first field
      Z      P        T       REL H  H2O        CLD AMT    RAIN RATE  AEROSOL
      (KM)   (MB)     (K)     (%)    (GM M-3)   (GM M-3)   (MM HR-1)  TYPE PROFILE
      0.315  984.200  305.45  2.20   7.545E-01  0.000E+00  0.000E+00  RURAL RURAL
      0.554  958.100  300.35  2.60   6.765E-01  0.000E+00  0.000E+00  RURAL
      …
      H2O         O3          CO2         CO          CH4         N2O
      ( ATM CM )
      2.2208E+02  1.3433E-01  2.6589E+02  8.2446E-02  1.1924E+00  2.2553E-01
      …
      Z      P        T       REL H  H2O        CLD AMT    RAIN RATE  AEROSOL
      (KM)   (MB)     (K)     (%)    (GM M-3)   (GM M-3)   (MM HR-1)  TYPE PROFILE
      0.315  984.200  305.45  2.20   7.545E-01  0.000E+00  0.000E+00  RURAL RURAL
      0.554  958.100  300.35  2.60   6.765E-01  0.000E+00  0.000E+00  RURAL

  42. Using grep to help analyze the pattern
      % grep H2O output.tp6
      Z P T REL H H2O CLD AMT RAIN RATE AEROSOL
      I Z P H2O O3 CO2 CO CH4 N2O O2 NH3 NO NO2 SO2 HNO3
      1 J Z H2O O3 CO2 CO CH4 N2O O2 NH3 NO NO2 SO2
      H2O O3 CO2 CO CH4 N2O
      1 J Z H2O O3 CO2 CO CH4 N2O O2 NH3 NO NO2 SO2
      H2O O3 CO2 CO CH4 N2O

  43. Need to Identify Unique Pattern Property • There are several H2O's in the file • The desired record has H2O in its first column (field) • Need to specify a match on the first column only: $1 ~ /H2O/

  44. Need to skip to the value and extract the value • Based on the following pattern
      H2O         O3          CO2         CO          CH4         N2O
      ( ATM CM )
      2.2208E+02  1.3433E-01  2.6589E+02  8.2446E-02  1.1924E+00  2.2553E-01
      • We need to “skip” to the third line and get the first field • This can be accomplished with the getline command

  45. Putting it all together
      % gawk '$1 ~ /H2O/ { getline; getline; getline; \
        print ($1*18.015/22413.83) }' input_modtran.dat
      • The action is a unit conversion of the water vapor value: print ($1*18.015/22413.83)
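
The number of getline calls has to agree with the exact layout of the real output file. A variation that does not depend on counting lines is sketched below: it sets a flag when the H2O header is seen and converts the first numeric line that follows. This swaps getline for a flag variable (the name want is hypothetical), and the input is only the excerpt shown on the slides, not a full MODTRAN file.

```shell
#!/bin/sh
# Excerpt assembled from the slides; a real output file is far longer.
modtran_excerpt='Z P T REL H H2O CLD AMT RAIN RATE AEROSOL
0.315 984.200 305.45 2.20 7.545E-01 0.000E+00 0.000E+00 RURAL RURAL
H2O O3 CO2 CO CH4 N2O
( ATM CM )
2.2208E+02 1.3433E-01 2.6589E+02 8.2446E-02 1.1924E+00 2.2553E-01'

water_vapor=$(printf '%s\n' "$modtran_excerpt" | awk '
    # Header line whose first field is exactly H2O: start looking.
    $1 == "H2O" { want = 1; next }
    # First following line that begins with a digit holds the value.
    want && $1 ~ /^[0-9]/ { print $1 * 18.015 / 22413.83; want = 0 }
')

echo "$water_vapor"   # roughly 0.1785 for the value 2.2208E+02
```

Using $1 == "H2O" instead of $1 ~ /H2O/ also avoids matching a first field that merely contains H2O.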

  46. Can be made into a shell script (get_water_vapor.csh)
      #!/bin/csh
      gawk '$1 ~ /H2O/ { getline; getline; getline; \
        print ($1*18.015/22413.83) }' $1

  47. From within IDL IDL> spawn, 'get_water_vapor.csh input.dat', results

  48. Stripping Out Comments in IDL

  49. What is this file?
      400.350 0.0509975
      410.170 0.0502359
      419.990 0.0474999
      …
      683.900 0.0215759
      693.440 0.0214323
      702.980 0.0213168

  50. Commented File
      # Water reflectance data file
      # ASD Reflectance May 20, 1999 11:31 PM
      # Local Time
      # Wavelength [Nanometers] Reflectance
      # [unitless]
      400.350 0.0509975
      410.170 0.0502359
      419.990 0.0474999
      …
      702.980 0.0213168
