
CIS52 – File Manipulation


trula


Presentation Transcript


  1. CIS52 – File Manipulation • File Manipulation Utilities • Regular Expressions • sed, awk

  2. Overview • comm – comparison of sorted files • cut – output sections of lines in a file • find – find files that match a pattern • paste – merges records in files • pr – paginate files into pages • tr – translate or delete characters

  3. Overview • regular expressions • sed – Stream Editor (batch file editor) • awk – Aho, Weinberger, Kernighan (pattern matching)

  4. The comm before the storm • Compares 2 sorted files • Results reported in 3 columns • 1st – records found only in file 1 • 2nd – records found only in file 2 • 3rd – records that match in both files • Options -1, -2 and -3 suppress the corresponding columns

  5. comm – cont. • Either file name can be replaced by - to read standard input • Example:

       File1   File2
       aa      bb
       dd      cc
       ee      dd
       gg      ee
       hh      ff

  6. comm results

       File1   File2   Both
       aa
               bb
               cc
                       dd
                       ee
               ff
       gg
       hh

     • option -1 – bb cc dd ee ff • option -2 – aa dd ee gg hh • option -12 – dd ee
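
The File1/File2 example can be reproduced with a short shell sketch. The scratch directory and the variable names (`workdir`, `both`, `only_first`) are illustrative assumptions, not part of the slides.

```shell
# Recreate File1 and File2 from the example in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' aa dd ee gg hh > "$workdir/file1"
printf '%s\n' bb cc dd ee ff > "$workdir/file2"

# Three-column report: only in file1, only in file2, in both.
comm "$workdir/file1" "$workdir/file2"

# -12 suppresses columns 1 and 2, leaving the records common to both files.
both=$(comm -12 "$workdir/file1" "$workdir/file2")
echo "$both"

# "-" stands for standard input in place of a file name;
# -23 leaves only the records unique to the first input.
only_first=$(sort "$workdir/file1" | comm -23 - "$workdir/file2")
echo "$only_first"
```

Both inputs must already be sorted; the `sort` in the last command is there only to make that requirement visible.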

  7. cut to the chase • Allows you to extract portions of each record in a file. • Delimits data in the file into fields or columns. • Default delimiter is the tab character • Can be changed by the -d option

  8. cut cont. • cut -b list | -c list | -f list [-d char] [-s] [--output-delimiter=string] • b – bytes • c – characters (GNU cut currently treats these the same as bytes) • f – fields • d – delimiter character (used with -f) • s – display only records that contain the delimiter (used with -f)

  9. cut ! print • char – single byte used to delimit fields in a record • list – list of positions or ranges (bytes, characters or fields) to display • Entries are comma separated. • 1-7 first 7 characters in record • 1,7 first and seventh characters

  10. cut ! print again • string – characters written in place of the input delimiter between the selected fields in the output.

  11. cut - Example

        [/@linux2 uid]$ cat file1
        The quick brown fox eyed the jactitating dog
        [/@linux2 uid]$ cut -f1,3,5,8 -d' ' file1
        The brown eyed dog
        [/@linux2 uid]$ cut -f1,4-6,8 -d' ' file1
        The fox eyed the dog
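
A minimal sketch of field and character selection on the same sentence. With a space as the delimiter the sentence has eight fields, so "dog" is field 8. The variable names are illustrative assumptions.

```shell
line='The quick brown fox eyed the jactitating dog'

# Fields 1, 3, 5 and 8, using a space as the delimiter.
picked=$(printf '%s\n' "$line" | cut -d' ' -f1,3,5,8)
echo "$picked"      # The brown eyed dog

# A range inside the list: fields 1, 4 through 6, and 8.
ranged=$(printf '%s\n' "$line" | cut -d' ' -f1,4-6,8)
echo "$ranged"      # The fox eyed the dog

# Character positions 1 through 9.
first9=$(printf '%s\n' "$line" | cut -c1-9)
echo "$first9"      # The quick

# -s drops records that contain no delimiter at all.
with_delim=$(printf 'a:b\nno-delimiter\n' | cut -d: -f1 -s)
echo "$with_delim"  # a
```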

  12. find that pot of gold • find – selects all files that meet the selection criteria in the expression • If no action is specified, POSIX find prints each selected path (-print); some older versions took no action unless one was given • Sub-directories are scanned automatically • The expression can be simple or complex

  13. find me something • The criteria expression: • ANDs operands separated by a space • ORs operands separated by -o • Processes left to right sequentially

  14. find criteria continued • Actions • -print – prints the path of all files that meet the selection criteria • -exec cmd \; – executes the command that precedes the \; ({} in the command is replaced by the current path) • -ok – same as -exec but requires a y response from stdin before each command is run

  15. find criteria continued again • Evaluations • -type – specify a type of file (e.g. d for directory) • -atime ±n – accessed ±n days ago (+n more than, -n less than n days ago) • -mtime ±n – modified ±n days ago • -user uid – owner of the file • -nouser – the file's owner (uid) is not known to the system
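
A small sketch of the criteria and actions above, run against a scratch directory. The file names are made up for illustration, and `-name` and the `\( \)` grouping are standard find features that the slides do not list.

```shell
# Build a small tree to search.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/a.txt" "$workdir/sub/b.txt" "$workdir/sub/c.log"

# -type f selects regular files; sub-directories are scanned automatically.
files=$(find "$workdir" -type f -print | sort)
echo "$files"

# Operands separated by a space are ANDed; -o ORs them.
# \( \) groups the OR so that -type f applies to both names.
matches=$(find "$workdir" -type f \( -name 'a.txt' -o -name '*.log' \) -print | sort)
echo "$matches"

# -exec runs a command for each selected path; {} is replaced by the path.
dirs=$(find "$workdir" -type d -exec basename {} \; | sort)
echo "$dirs"

# -mtime -1 selects files modified less than one day ago.
recent=$(find "$workdir" -type f -mtime -1 | wc -l | tr -d ' ')
echo "$recent"
```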

  16. paste tastes good • paste [options] [filelist] – corresponding records from each file are merged into 1 record • -s – process filelist sequentially; all records of one file are merged before going to the next file • -d [delimiter list] – each character in turn delimits the file records.

  17. paste continued

        [/@linux2 uid]$ cat file1
        A
        B
        C
        [/@linux2 uid]$ cat file2
        1
        2
        3
        [/@linux2 uid]$ cat file3
        x
        y
        z

  18. paste continued

        [/@linux2 uid]$ paste file1 file2 file3
        A   1   x
        B   2   y
        C   3   z
        [/@linux2 uid]$ paste -s file1 file2 file3
        A   B   C
        1   2   3
        x   y   z
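
The three-file example can be tried with the sketch below. The scratch directory and variable names are assumptions. The last command adds a `-d` case with a two-character delimiter list.

```shell
workdir=$(mktemp -d)
printf '%s\n' A B C > "$workdir/file1"
printf '%s\n' 1 2 3 > "$workdir/file2"
printf '%s\n' x y z > "$workdir/file3"

# Default: line N of every file is joined into one tab-separated record.
parallel=$(paste "$workdir/file1" "$workdir/file2" "$workdir/file3")
echo "$parallel"

# -s: each file in turn is flattened into a single record.
serial=$(paste -s "$workdir/file1" "$workdir/file2" "$workdir/file3")
echo "$serial"

# -d: the listed characters are used in turn as delimiters.
custom=$(paste -d ':,' "$workdir/file1" "$workdir/file2" "$workdir/file3")
echo "$custom"
```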

  19. pr – public relations--NOT • pr paginates file(s) for printing • Can specify page attributes • Page length (lines per page) is changed through the -l option • For multiple files each starts a new page

  20. pr – continued • pr paginates a file for printing • Creates a header and trailer • Header text is changed through the -h option • Header and trailer are suppressed through the -t option • Can create columns of data • -nbr – number of columns (for example -3) • -Sx – x separates the columns (GNU pr takes a string after -S and a single character after -s)

  21. pr – continued • Can create numbers for each line • -nck • c – character that separates the number from the data; default is the tab character • k – number of digits (default 5)
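
A brief sketch of the pr options above. The sample file and the header text are assumptions made for illustration, and exact column spacing depends on the pr implementation and page width.

```shell
workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta > "$workdir/list"

# -t suppresses the header and trailer; -n numbers each line.
numbered=$(pr -t -n "$workdir/list")
echo "$numbered"

# -h replaces the file name in the header; -l sets the lines per page.
pr -h 'Greek letters' -l 20 "$workdir/list"

# -2 arranges the records in two columns.
pr -t -2 "$workdir/list"
```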

  22. Regular Expressions • A set of characters that defines the criteria used to identify a string within a record. • Used by vi, grep, sed, awk, and others.

  23. tr – Translate this • tr [-c] [-d] [-s] [-t] set1 [set2] – translate from set1 to set2 • c – complement of set1 • d – delete characters found in set1 • s – squeeze runs of repeated characters into one • t – truncate set1 to length of set2
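
A few one-line illustrations of these options. The input strings and variable names are arbitrary examples.

```shell
# Translate set1 to set2: lower case to upper case.
upper=$(echo 'hello world' | tr 'a-z' 'A-Z')
echo "$upper"          # HELLO WORLD

# -d deletes every character found in set1.
no_vowels=$(echo 'hello world' | tr -d 'aeiou')
echo "$no_vowels"      # hll wrld

# -s squeezes each run of a listed character down to one.
squeezed=$(echo 'aaa   bbb' | tr -s 'a ')
echo "$squeezed"       # a bbb

# -c takes the complement of set1: delete everything except
# lower-case letters and the newline.
letters_only=$(echo 'abc123' | tr -cd 'a-z\n')
echo "$letters_only"   # abc
```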

  24. Regular Expressions • Simple strings • Bound by / … / • Interpreted literally • e.g. /e D/ – matches exactly e D • Taste Dee – OK • Taste don’t – not OK

  25. Regular Expressions • The . (period) • special single substitute character • Matches any single character • e.g. /.eny/ matches Aeny Beny Ceny • [char-range] defines a character class • [^char-range] defines the not-in-character class (any character not listed)

  26. Regular Expressions • The * (asterisk) • Matches 0 or more of the preceding character. • What’s this? • /.*/ • /[a-zA-Z]*/ • /([^)]*)/

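  27. Regular Expressions • The /^ (caret, for the rabbit) character • In the beginning … anchors the match to the start of the record • The $/ (dollar, for the teacher) character • At the end … anchors the match to the end of the record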

  28. Regular Expressions • Quote the raven – backslash • \. This yields . • \\ This yields \ • \* This yields * • \[ This yields [ • \] This yields ] • \/ This yields /
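
The patterns on the regular-expression slides can be tried with grep, one of the utilities listed as using them. This is a sketch under a few assumptions: the sample lines are invented, grep takes the pattern without the surrounding / … / used by vi, sed and awk, and `-o` (print only the matched text) is widely available but not required by POSIX.

```shell
sample=$(mktemp)
printf '%s\n' 'Taste Dee' "Taste don't" 'Aeny Beny' '(note) end.' > "$sample"

# Simple string, taken literally: only "Taste Dee" contains "e D".
literal=$(grep -c 'e D' "$sample")

# . matches any single character.
dot=$(grep -c '.eny' "$sample")

# ^ anchors at the beginning of the record, $ at the end.
anchored=$(grep -c '^Taste' "$sample")

# \. is a literal period, so this selects records ending in ".".
at_end=$(grep -c '\.$' "$sample")

# ([^)]*) is "(", then zero or more characters that are not ")", then ")".
parens=$(grep -o '([^)]*)' "$sample")

echo "$literal $dot $anchored $at_end $parens"
```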

  29. sed – the old Stream EDitor • sed [-n] [-f script-file] [file-list] • Copies and edits to standard output • Edits file(s) in a non-interactive mode • Gets its instructions from a script file • -f filename – the file contains sed instructions • With no -f option the 1st command argument is used as the script • -n – suppress stdout unless an instruction (such as p) specifies output

  30. sed – the old mill stream • Record processing • 1. Read record from file list • 2. Read instruction from script (or cmd line) • 3. Apply selection criteria • 4. If selected perform instruction • 5. Repeat 2 to 4 until no more script • 6. Repeat 1 to 5 until no more file list.

  31. He sed what!!?? • Instruction format: [addr1 [,addr2]] inst [arg-list] • Address • A line number • Regular expression • Addr1 – start • Addr2 – stop

  32. Address line numbers • $ – Designates the last line of the last file • 1st address line number • Starts selecting records based on their position in the input file list, counting from 1. • 2nd address line number • Stops selecting records when the position in the input file list is greater than the line number.

  33. He sed some more • Instructions • ! – Not; negates the address selection (written after the address) • sed '/line/!p' file.list • {…} – Groups the instructions for the address selection
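
A minimal sketch of the address forms, with ! written after the address as POSIX sed expects. The five-line input and the variable names are assumptions, and `-n` is used throughout so that only the p instruction produces output.

```shell
input=$(printf '%s\n' one two three four five)

# Line-number range: print records 2 through 4.
range=$(printf '%s\n' "$input" | sed -n '2,4p')
echo "$range"

# $ addresses the last line.
last=$(printf '%s\n' "$input" | sed -n '$p')
echo "$last"

# Regular-expression address negated with !:
# print the records that do NOT contain "o".
negated=$(printf '%s\n' "$input" | sed -n '/o/!p')
echo "$negated"

# { } groups several instructions under one address:
# print the first record containing "three", then quit.
grouped=$(printf '%s\n' "$input" | sed -n '/three/{p;q;}')
echo "$grouped"
```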

  34. sed Instructions • p – Print now and continue • d – Delete and get the next record • q – Quit processing; Stop; Go Away

  35. sed Instructions • c – Change • [addr1] [,addr2] c\ yada yada yada – all selected records are replaced as a group by the change value • a – Append • [addr1] a\ … – adds the text after each selected record

  36. sed Instructions • i – Insert • [addr1] i\ … – adds the text before each selected record • n – Next • [addr1] n – writes the current record, gets the next and continues the script

  37. sed Instructions • w – Write • [addr1] [,addr2] w filename – writes the selected records to a file • r – Read • [addr1] r filename – reads records from the filename and appends them to the selected record

  38. sed Instructions • s – Substitute • [addr1] [,addr2] s/ptrn/repl/[g] [p] [w f] – for each selected record match the pattern and replace • g – Replace all non-overlapping occurrences • p – Print the record • w – write the record to the filename
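
The instructions above can be sketched on a three-line sample. The input text and variable names are invented for illustration, and the a\ form with the text on the following line is the portable POSIX spelling.

```shell
input=$(printf '%s\n' 'the dog and the cat' 'a bird' 'the end')

# s without g replaces only the first match in each record.
first_only=$(printf '%s\n' "$input" | sed 's/the/THE/')

# g replaces every non-overlapping occurrence.
all=$(printf '%s\n' "$input" | sed 's/the/THE/g')

# With -n, the p flag prints only records where a substitution happened.
changed=$(printf '%s\n' "$input" | sed -n 's/the/THE/gp')

# d deletes the selected records.
deleted=$(printf '%s\n' "$input" | sed '/bird/d')

# q quits after the addressed record has been written.
head2=$(printf '%s\n' "$input" | sed '2q')

# a\ appends the text after the selected record.
appended=$(printf '%s\n' "$input" | sed '2a\
--- appended ---')

printf '%s\n' "$all" "$appended"
```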

  39. Hawk – Squawk – awk • The programmable utility that does everything. • Named for Aho – Weinberger – Kernighan • Provides: • Conditional execution • Looping • Handles: • Numeric & string variables • Regular expressions • C-style printf facilities

  40. awk • awk [-Fc] [-f program-file | program] [file list] • F – field delimiter character • f – name of the awk program file • program – in-stream instructions given on the command line instead of a program file • file list – list of files to process

  41. awk – program lines • pattern { action } • Like sed, the pattern selects records • Record processing is the same as sed

  42. awk – pattern • Patterns follow regular expression format. • ~ – Tests for match to regular expression • !~ – Tests for NO match to regular expression • , – Establishes a pattern range; all records within the range are processed, inclusively • BEGIN – executes before the first record is processed • END – executes after the last record is processed
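
Each pattern form is shown below against a small invented table of name, age and city. The data and the variable names are assumptions.

```shell
data=$(printf '%s\n' 'alice 30 NY' 'bob 25 LA' 'carol 35 NY' 'dave 40 SF')

# A bare regular expression selects records; the default action prints them.
ny=$(printf '%s\n' "$data" | awk '/NY/')

# ~ tests one field against a regular expression.
a_or_b=$(printf '%s\n' "$data" | awk '$1 ~ /^[ab]/ { print $1 }')

# !~ selects records whose field does NOT match.
not_ny=$(printf '%s\n' "$data" | awk '$3 !~ /NY/ { print $1 }')

# pattern1,pattern2 selects an inclusive range of records.
range=$(printf '%s\n' "$data" | awk '/bob/,/carol/ { print $1 }')

# BEGIN runs before the first record, END after the last.
total=$(printf '%s\n' "$data" | awk 'BEGIN { sum = 0 } { sum += $2 } END { print sum }')

printf '%s\n' "$ny" "$a_or_b" "$not_ny" "$range" "$total"
```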

  43. awk – relational operators • < – less than • <= – less than or equal to • == – equal to • != – not equal to • >= – greater than or equal to • > – greater than

  44. awk – operators • Arithmetic • + – addition • - – subtraction • * – multiplication • / – division • Assignment • = – assigns the value to the variable on the left • += – adds the value to the variable on the left

  45. awk – boolean operators • && – and • || – or • ! – not
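
Relational, boolean and assignment operators usually appear together in a pattern and its action. A minimal sketch, using an invented name/age/city table:

```shell
data=$(printf '%s\n' 'alice 30 NY' 'bob 25 LA' 'carol 35 NY' 'dave 40 SF')

# Select records where the age is at least 30 AND the city is NY;
# += accumulates a count that END prints after the last record.
report=$(printf '%s\n' "$data" |
  awk '$2 >= 30 && $3 == "NY" { count += 1; print $1 } END { print "matched:", count }')
echo "$report"

# || is OR and ! is NOT: under 30, or not from NY.
others=$(printf '%s\n' "$data" | awk '$2 < 30 || !($3 == "NY") { print $1 }')
echo "$others"
```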

  46. awk – actions • # - Comment to the right on any line • Default action is print to stdout • Multiple actions can be taken • Use {…} to enclose multiple actions • Separate actions with ;

  47. awk – actions • print variable … • Var, Var2, Var3 • Prints variables separated by the output field separator (OFS) • Var Var2 Var3 • NO separators (the values are concatenated) • "literal value" • Prints exactly everything between the " "

  48. awk – actions • printf "cntl string", variable … • Control String • \n – new line • \t – tab • %[-][n][.d] conv char • - – left justification • n – number of characters (minimum field width) • .d – decimal positions

  49. awk – actions • %[-][n][.d] conv char • - – left justification • n – number of characters (minimum field width) • .d – decimal positions • conv char – conversion character: d – decimal, e – exponent, f – floating-point, o – octal, x – hexadecimal, s – string
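
A short sketch of print and printf. The sample record and the variable names are assumptions chosen to make the field widths easy to see.

```shell
record='widget 3 4.5'

# A comma between values inserts the output field separator (a space by default).
with_sep=$(echo "$record" | awk '{ print $1, $2 }')
echo "$with_sep"     # widget 3

# Values written side by side are concatenated with no separator.
no_sep=$(echo "$record" | awk '{ print $1 $2 }')
echo "$no_sep"       # widget3

# %-8s: left-justified string, width 8; %3d: integer, width 3;
# %6.2f: floating point, width 6 with 2 decimal positions.
formatted=$(echo "$record" | awk '{ printf "%-8s|%3d|%6.2f\n", $1, $2, $2 * $3 }')
echo "$formatted"    # widget  |  3| 13.50

# o and x convert to octal and hexadecimal.
bases=$(awk 'BEGIN { printf "%o %x\n", 8, 255 }')
echo "$bases"        # 10 ff
```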

  50. awk – variables • awk provided variables • NF – number of fields in the current record • $1…$n – each field in the current record • FS – input field separator (default: space or tab) • OFS – output field separator (default: space)
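
The built-in variables can be seen together in one command. The colon-separated sample lines are made up for illustration.

```shell
# -F: sets FS to a colon; OFS is set in BEGIN so that it applies to every record.
# The commas in print are replaced by OFS; NF is the field count of each record.
out=$(printf 'root:x:0\nuser:x:1000\n' |
  awk -F: 'BEGIN { OFS = "-" } { print $1, $3, NF }')
echo "$out"
```

Each input record has three colon-separated fields, so NF prints as 3 on both lines.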
