
CIT 500: IT Fundamentals


komala


  1. CIT 500: IT Fundamentals Text Processing

  2. Topics • Displaying files: cat, less, od, head, tail • Creating and appending • Concatenating files • Comparing files • Printing files • Sorting files • Searching files and regular expressions • Sed and awk

  3. Displaying Files • cat • less • od • head • tail

  4. Displaying files: cat
     cat [options] [file1 [file2 … ]]
       -e    Displays $ at the end of each line.
       -n    Prints line numbers before each line.
       -t    Displays tabs as ^I and formfeeds as ^L.
       -v    Displays nonprintable characters, except for tab, newline, and formfeed.
       -vet  Combines -v, -e, -t to display all nonprintable characters.
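A minimal sketch of the `-n` and `-vet` options listed above. The file name `demo.txt` and its contents are assumptions made up for this illustration, and the behaviour of `-vet` is assumed to match the common GNU and BSD implementations of cat.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# demo.txt is a hypothetical file: one line with a tab, one plain line.
printf 'name\tvalue\nplain line\n' > demo.txt

# -n prefixes every line with its line number.
cat -n demo.txt

# -vet shows the tab as ^I and marks each line end with $.
visible=$(cat -vet demo.txt)
printf '%s\n' "$visible"
```

Making tabs and line ends visible this way is a quick check for stray whitespace before a file is fed to cut, sort, or awk.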

  5. Displaying files: less
     less [file1 [file2 … ]]
       h       Displays help.
       q       Quit.
       space   Forward one page.
       return  Forward one line.
       b       Back one page.
       y       Back one line.
       :n      Next file.
       :p      Previous file.
       /       Search file.

  6. Displaying files: od
     od [options] [file1 [file2 … ]]
       -c  Also display character values.
       -x  Display numbers in hexadecimal.
     > file /kernel/genunix
     /kernel/genunix: ELF 32-bit MSB relocatable SPARC
     > od -c /kernel/genunix
     0000000 177 E L F 001 002 001 \0 \0 \0 \0 \0 \0 \0 \0
     0000020 \0 001 \0 002 \0 \0 \0 001 \0 004 246 230 \0 \0 \0
     0000040 \0 033 ^ ` \0 \0 \0 \0 \0 4 \0 \0 \0 \0 \0
     0000060 \0 017 \0 \n 235 343 277 240 310 006 004 244 020
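The kernel file in the slide only exists on Solaris, so here is a small sketch that runs `od -c` on three bytes generated on the spot. The input string is an assumption chosen for illustration; the exact column spacing of the dump differs between od implementations.

```shell
# Dump the three bytes 'H', 'i', newline.
# The left column is the byte offset in octal; the final line is the total length.
dump=$(printf 'Hi\n' | od -c)
printf '%s\n' "$dump"
```

The newline shows up as the escape `\n`, which is the same notation visible in the genunix dump above.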

  7. Displaying files: head and tail
     Display first/last 10 lines of file.
     head [-#] [file1 [file2 … ]]
       -#  Display first # lines.
     tail [-#] [file1 [file2 … ]]
       -#  Display last # lines.
       -f  If data is appended to file, continue displaying new lines as they are added.
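A short sketch of head and tail on a generated file. The name `lines.txt` is hypothetical. The example writes the line count as `-n 3`, which is the POSIX spelling of the `-#` form shown on the slide.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# lines.txt (hypothetical) holds the numbers 1 through 20, one per line.
i=1
while [ "$i" -le 20 ]; do
  echo "$i"
  i=$((i + 1))
done > lines.txt

head lines.txt                  # no count given: first 10 lines
first3=$(head -n 3 lines.txt)   # first 3 lines
last2=$(tail -n 2 lines.txt)    # last 2 lines
echo "$first3"
echo "$last2"
```

`tail -f` is left out because it keeps running until interrupted; it is typically used on a log file that another process is still writing.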

  8. File Size
     Determining File Size
       • ls -l
       • wc [options] file-list

  9. Word count: wc
     wc [options] target1 [target2, …]
       -c  Count bytes in file only.
       -l  Count lines in file only.
       -w  Count words in file only.
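A minimal sketch of the three wc options on a two-line file. The file name `note.txt` and its text are assumptions for this example. Reading the file through `<` keeps wc from printing the file name, and the arithmetic expansion strips the padding some implementations add.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# note.txt is a hypothetical two-line file.
printf 'hello world\nsecond line\n' > note.txt

lines=$(( $(wc -l < note.txt) ))   # number of newline-terminated lines
words=$(( $(wc -w < note.txt) ))   # number of whitespace-separated words
bytes=$(( $(wc -c < note.txt) ))   # number of bytes, newlines included
echo "$lines lines, $words words, $bytes bytes"
```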

  10. Creating and Appending to Files
      Creating files
        > cat >file
        Hello world
        Ctrl-d
      Appending to files
        > cat >> file
        Hello world line 2
        Ctrl-d
        > cat file
        Hello world
        Hello world line 2

  11. Concatenating Files
      > cat >file1
      This is file #1
      > cat >file2
      This is file #2
      > cat file1 file2 >joinedfile
      > cat joinedfile
      This is file #1
      This is file #2
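The slides type the text interactively and end it with Ctrl-d. A non-interactive sketch of the same three operations (create, append, concatenate) uses printf to supply the text; the file names follow the slides.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# > creates the file, or empties it first if it already exists.
printf 'Hello world\n' > file
# >> appends to the end instead of overwriting.
printf 'Hello world line 2\n' >> file
cat file

# cat with several arguments writes them one after another.
printf 'This is file #1\n' > file1
printf 'This is file #2\n' > file2
cat file1 file2 > joinedfile
cat joinedfile
```

Using `>` where `>>` was intended silently discards the old contents, which is the usual mistake to watch for here.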

  12. Comparing files: diff
      diff [options] oldfile newfile
        -b  Ignore trailing blanks and treat other strings of blanks as equivalent.
        -c  Output contextual diff format.
        -e  Output ed script for converting oldfile to newfile.
        -i  Ignore case in letter comparisons.
        -u  Output unified diff format.

  13. Comparing Files with diff diff [options][file1][file2]

  14. diff Example
      > diff Fall_Hours Spring_Hours
      1c1
      < Hours for Fall 2004
      ---
      > Hours for Spring 2005
      6a7
      > 1:00 - 2:00 p.m.
      9d9
      < 3:00 - 4:00 p.m.
      12,13d11
      < 2:00 - 3:00 p.m.
      < 4:00 - 4:30 p.m.
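A small sketch that reproduces the first change from the example and then shows the same change in unified format (`-u`). The file names `old_hours` and `new_hours` and the second line of each file are assumptions invented for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical schedules that differ only in their first line.
printf 'Hours for Fall 2004\n9:00 - 10:00 a.m.\n' > old_hours
printf 'Hours for Spring 2005\n9:00 - 10:00 a.m.\n' > new_hours

# diff exits with status 1 when the files differ, so the status is ignored here.
normal=$(diff old_hours new_hours) || true
unified=$(diff -u old_hours new_hours) || true

printf '%s\n' "$normal"
printf '%s\n' "$unified"
```

In the unified output, removed lines start with `-` and added lines start with `+`, with unchanged context lines around them. That format is the one most patch tools and code review systems expect.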

  15. Removing Repeated Lines
      uniq [options][+N][input-file][output-file]
      > cat sample
      This is a test file for the uniq command.
      It contains some repeated and some nonrepeated lines.
      Some of the repeated lines are consecutive, like this.
      Some of the repeated lines are consecutive, like this.
      Some of the repeated lines are consecutive, like this.
      And, some are not consecutive, like the following.
      Some of the repeated lines are consecutive, like this.
      The above line, therefore, will not be considered a repeated
      line by the uniq command, but this will be considered repeated!
      line by the uniq command, but this will be considered repeated!
      > uniq sample
      This is a test file for the uniq command.
      It contains some repeated and some nonrepeated lines.
      Some of the repeated lines are consecutive, like this.
      And, some are not consecutive, like the following.
      Some of the repeated lines are consecutive, like this.
      The above line, therefore, will not be considered a repeated
      line by the uniq command, but this will be considered repeated!

  16. uniq
      uniq [options] input [output file]
        -c  Precedes each output line with a count of the number of times the line occurred in the input.
        -d  Suppresses the writing of lines that are not repeated in the input.
        -u  Suppresses the writing of lines that are repeated in the input.

  17. Removing Repeated Lines
      uniq [options][+N][input-file][output-file]
      > uniq -c sample
         1 This is a test file for the uniq command.
         1 It contains some repeated and some nonrepeated lines.
         3 Some of the repeated lines are consecutive, like this.
         1 And, some are not consecutive, like the following.
         1 Some of the repeated lines are consecutive, like this.
         1 The above line, therefore, will not be considered a repeated
         2 line by the uniq command, but this will be considered repeated!
      > uniq -d sample
      Some of the repeated lines are consecutive, like this.
      line by the uniq command, but this will be considered repeated!
      > uniq -d sample out
      > cat out
      Some of the repeated lines are consecutive, like this.
      line by the uniq command, but this will be considered repeated!
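The sample above shows that uniq only collapses repeats that sit next to each other. A common way to remove every duplicate is to sort the input first, so that equal lines become adjacent. The sketch below uses a hypothetical four-line file, `fruit.txt`, to show the difference.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# fruit.txt (hypothetical): "apple" appears three times, twice in a row.
printf 'apple\napple\nbanana\napple\n' > fruit.txt

# Only the adjacent pair is collapsed; the last apple survives.
adjacent_only=$(uniq fruit.txt)

# Sorting first brings all apples together, so each line appears once.
all_unique=$(sort fruit.txt | uniq)

# With -c, each line is preceded by how often it occurred.
counts=$(sort fruit.txt | uniq -c)

printf '%s\n' "$adjacent_only"
printf '%s\n' "$all_unique"
printf '%s\n' "$counts"
```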

  18. Printing Files

  19. Printing Files Printing Files lp [options] file-list lpr [options] file-list

  20. Printing Files lpq [options]

  21. Printing Files Canceling Your Print Job cancel [options] [printer]

  22. Printing Files Canceling Your Print Job (Contd) lprm [options][jobID-list][user(s)]

  23. Sorting Ordering set of items by some criteria. Systems in which sorting is used include: • Words in a dictionary. • Names of people in a telephone directory. • Numbers.

  24. Sorting: sort
      sort [-d] [-f] [-i] [-k #] [-n] [-r] [-u] files
        -d    Sort in dictionary order (only blanks and alphanumeric characters are compared).
        -f    Ignore case of letters.
        -i    Ignore non-printable characters.
        -k #  Sort by field number #.
        -n    Sort in numerical order.
        -r    Reverse order of sort.
        -u    Do not list duplicate lines in output.

  25. sort Example
      > cat days.txt
      Sunday
      Monday
      Tuesday
      Wednesday
      Thursday
      Friday
      Saturday
      > sort days.txt
      Friday
      Monday
      Saturday
      Sunday
      Thursday
      Tuesday
      Wednesday

  26. sort Example
      > cat days.txt
      Sunday
      Monday
      Tuesday
      Wednesday
      Thursday
      Friday
      Saturday
      > sort -r days.txt
      Wednesday
      Tuesday
      Thursday
      Sunday
      Saturday
      Monday
      Friday

  27. sort Example
      > cat numbers.txt
      101
      5571
      58
      2001
      9
      > sort numbers.txt
      101
      2001
      5571
      58
      9
      > sort -n numbers.txt
      9
      58
      101
      2001
      5571
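The numbers example can be rerun as a script, and the same idea extends to sorting on a field with `-k`. In this sketch `ages.txt` and its contents are assumptions made up for illustration, and `LC_ALL=C` is set only so the character ordering does not depend on the local language settings.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '101\n5571\n58\n2001\n9\n' > numbers.txt

# Without -n the lines are compared character by character, so "58" follows "5571".
text_order=$(LC_ALL=C sort numbers.txt)
# With -n the lines are compared by numeric value.
num_order=$(LC_ALL=C sort -n numbers.txt)

# ages.txt (hypothetical): a name and a number separated by a space.
printf 'alice 30\nbob 4\ncarol 12\n' > ages.txt
# -k 2 sorts on the second field; -n makes that comparison numeric.
by_age=$(LC_ALL=C sort -n -k 2 ages.txt)

printf '%s\n' "$text_order" "$num_order" "$by_age"
```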

  28. Searching Files: grep
      grep [-c] [-i] [-l] [-n] [-v] pattern file1 [file2, ...]
      Search for pattern in the file arguments.
        -c  Print only a count of matching lines.
        -i  Ignore case of letters in files.
        -l  Print only the names of files that contain matches.
        -n  Print line numbers along with matching lines.
        -v  Print only nonmatching lines.

  29. Simple Searches
      > grep catt /usr/share/dict/words
      cattail
      ...
      wildcatting
      > grep -c catt /usr/share/dict/words
      29
      > grep -c -v catt /usr/share/dict/words
      98540
      > wc -l /usr/share/dict/words
      98569 /usr/share/dict/words
      > grep -n catt /usr/share/dict/words
      28762:cattail
      …
      97276:wildcatting
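The counts in the slide add up: 29 matching lines plus 98540 nonmatching lines give the 98569 lines reported by wc. Dictionary files differ from system to system, so this sketch checks the same relationship on a tiny hypothetical word list, `words.txt`.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# words.txt is a made-up five-word stand-in for /usr/share/dict/words.
printf '%s\n' cattail dogged wildcatting manna queueing > words.txt

matches=$(grep -c catt words.txt)        # lines containing "catt"
nonmatches=$(grep -c -v catt words.txt)  # lines that do not contain it
total=$(( $(wc -l < words.txt) ))        # all lines

# -n shows where in the file each match was found.
numbered=$(grep -n catt words.txt)

echo "$matches + $nonmatches = $total"
printf '%s\n' "$numbered"
```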

  30. Regular Expressions
        ^        Beginning of line
        $        End of line
        [a-z]    Character range (all lower case)
        [aeiou]  Character set (vowels)
        .        Any character
        *        Zero or more of previous pattern
        {n}      Repeat previous match n times
        {n,m}    Repeat previous match n to m times
        a|b      Match a or b

  31. Regular Expression Searches
      > egrep ^dogg /usr/share/dict/words
      dogged
      …
      doggy's
      > egrep dogg$ /usr/share/dict/words
      > egrep mann$ /usr/share/dict/words
      Bertelsmann
      …
      Weizmann
      > egrep ^mann /usr/share/dict/words
      manna
      …
      mannishness's

  32. Regular Expression Searches
      > egrep 'catt|dogg' /usr/share/dict/words
      boondoggle
      boondoggled
      ...
      wildcatting
      > egrep 'catt|dogg' /usr/share/dict/words | wc -l
      54
      > egrep '^(catt|dogg)' /usr/share/dict/words
      cattail
      …
      doggy's

  33. Character classes
      > egrep '[0-9]' /usr/share/dict/words
      > egrep -c '^xz' /usr/share/dict/words
      0
      > egrep -c '^[xz]' /usr/share/dict/words
      153
      > egrep -c '[xz]$' /usr/share/dict/words
      321
      > egrep -c '[aeiou][aeiou][aeiou][aeiou]' /usr/dict/words
      36
      > egrep '[aeiou][aeiou][aeiou][aeiou][aeiou]' /usr/share/dict/words
      queueing
      > egrep '[aeiou]{5}' /usr/share/dict/words
      queueing
      > egrep -c ':[0-9][0-9]:' /etc/passwd
      9
      > egrep -c ':[0-9]{2,3}:' /etc/passwd
      18
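The anchors, alternation, sets, and repetition counts from the last few slides can be tried on a small list where the results are easy to verify by eye. This sketch uses a hypothetical `words.txt` and writes `grep -E`, the POSIX spelling of egrep; patterns are quoted so the shell passes them through unchanged.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# words.txt is a made-up sample, not the system dictionary.
printf '%s\n' boondoggle cattail dogged manna queueing Weizmann wildcatting > words.txt

# ^ anchors the alternation to the start of the line.
starts=$(grep -E '^(catt|dogg)' words.txt)
# $ anchors the match to the end of the line.
ends=$(grep -E 'mann$' words.txt)
# {5} asks for five vowels in a row.
five_vowels=$(grep -E '[aeiou]{5}' words.txt)
# grep exits with status 1 when nothing matches, so the status is ignored here.
xz_count=$(grep -c -E '^[xz]' words.txt) || true

printf '%s\n' "$starts" "$ends" "$five_vowels" "$xz_count"
```

Note that `boondoggle` and `wildcatting` contain `dogg` and `catt` but are not selected by the anchored pattern, which mirrors the difference between the two searches in slide 32.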

  34. Extracting Fields: cut
      cut [-f #] [-d delim] file
      Select sections from each line of file.
        -f #      Select field #.
        -d delim  Use delim instead of tab to separate fields.
        -b #      Select specified bytes instead of fields.
        -c #      Select specified characters instead of fields.

  35. Cut Examples
      > cut -d: -f 1 /etc/passwd | head -5
      root
      daemon
      bin
      sys
      sync
      > cut -d: -f 1,3 /etc/passwd | head -5
      root:0
      daemon:1
      bin:2
      sys:3
      sync:4
      > cut -d: -f 1,3-5,7 /etc/passwd | head -5
      root:0:0:root:/bin/bash
      daemon:1:1:daemon:/bin/sh
      bin:2:2:bin:/bin/sh
      sys:3:3:sys:/bin/sh
      sync:4:65534:sync:/bin/sync

  36. Cut Examples
      > cut -c1-4 /etc/passwd | head -5
      root
      daem
      bin:
      sys:
      sync
      > cut -d: -f7 /etc/passwd | cut -c1-4 | head -5
      /bin
      /bin
      /bin
      /bin
      /bin
      > cut -d: -f7 /etc/passwd | cut -c6-20 | head -5
      bash
      sh
      sh
      sh
      sync
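The contents of /etc/passwd vary between machines, so this sketch runs the same cut commands on `users.txt`, a hypothetical three-line file with the same colon-separated layout.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# users.txt is a made-up stand-in for /etc/passwd.
cat > users.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
EOF

names=$(cut -d: -f 1 users.txt)       # field 1: login name
name_uid=$(cut -d: -f 1,3 users.txt)  # fields 1 and 3, still joined by ":"
prefix=$(cut -c1-4 users.txt)         # first four characters of each line

printf '%s\n' "$names" "$name_uid" "$prefix"
```

When several fields are selected, cut keeps the input delimiter between them in the output, which is why the second result reads `root:0` instead of `root 0`.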

  37. Searching + Extracting: awk
      awk [-F delim] '/pattern/ {action}' file
      Execute awk program on each line of file.
        -F delim  Use delim to separate fields.
      Patterns are regular expressions.
      Actions are extremely powerful, as awk is a simple programming language, but we'll just use print $#, where # is the field we want to print.

  38. Awk Examples
      > awk -F: '{print $1}' /etc/passwd|head -5
      root
      daemon
      bin
      sys
      sync
      > awk -F: '{print $1, $3}' /etc/passwd|head -5
      root 0
      daemon 1
      bin 2
      sys 3
      sync 4
      > awk -F: '/root/ {print $1, $3}' /etc/passwd
      root 0
      > awk -F: '/bin\/false/ {print $1, $3}' /etc/passwd
      dhcp 101
      syslog 102
      klog 103
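A sketch of the same awk patterns on a hypothetical `users.txt`, so the output is predictable. The last command goes one small step beyond the slides: it compares a single field with `==` instead of matching a regular expression against the whole line, which avoids accidental hits in other fields.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# users.txt is a made-up stand-in for /etc/passwd.
cat > users.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
dhcp:x:101:101::/nonexistent:/bin/false
EOF

# No pattern: the action runs on every line.
names=$(awk -F: '{print $1}' users.txt)
# Regular expression pattern: only lines containing /bin/false.
no_login=$(awk -F: '/bin\/false/ {print $1, $3}' users.txt)
# Field comparison: only lines whose first field is exactly "root".
exact=$(awk -F: '$1 == "root" {print $1, $3}' users.txt)

printf '%s\n' "$names" "$no_login" "$exact"
```

The comma in `print $1, $3` makes awk join the fields with its output separator, a space by default, unlike cut, which reuses the input delimiter.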

  39. Stream Editor: sed
      sed [-n] '/pattern/action' files
      sed [-n] '[line1,line2]s/pat1/pat2/options' files
      Filter and modify (if specified) each line of file.
        -n  Do not print lines unless action specifies printing.
      Patterns are regular expressions.
      Actions:
        p  Print matching lines.
        d  Delete matching lines.
        s  Replace pat1 with pat2.

  40. Using Sed like Grep
      > sed -n '/catt/p' /usr/share/dict/words
      cattail
      …
      wildcatting
      > sed -n '/catt/p' /usr/share/dict/words | wc -l
      29
      > sed '/catt/d' /usr/share/dict/words | wc -l
      98540
      > sed -n '/^dogg/p' /usr/share/dict/words
      dogged
      …
      doggy's
      > sed -n '/dogg$/p' /usr/share/dict/words
      > sed -n '/mann$/p' /usr/share/dict/words
      Bertelsmann
      …
      Weizmann
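The line counts above match the grep counts from slide 29, which suggests the equivalence `sed -n '/pat/p'` ≈ `grep pat` and `sed '/pat/d'` ≈ `grep -v pat`. A minimal sketch that checks this on a hypothetical `words.txt`:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# words.txt is a made-up four-word sample.
printf '%s\n' cattail dogged wildcatting manna > words.txt

# Print only matching lines: -n silences the default output, p prints matches.
sed_hits=$(sed -n '/catt/p' words.txt)
grep_hits=$(grep catt words.txt)

# Delete matching lines: everything else is printed by default.
sed_rest=$(sed '/catt/d' words.txt)
grep_rest=$(grep -v catt words.txt)

printf '%s\n' "$sed_hits" "$sed_rest"
```

Forgetting `-n` with the `p` action prints every matching line twice, once by default and once by `p`.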

  41. Sed Examples
      > cat phones.txt
      Our phone bill for last year was $859,800,513.57.
      This is our list of phone numbers:
      859-572-7568
      859-572-7721
      859-572-7568
      859-572-5468
      859-572-6930
      859-572-5334
      859-572-5320
      859-572-5659
      859-572-7568
      859-572-7739
      859-572-0000
      859-572-6544
      859-572-6346
      859-572-5330
      859-572-7551
      859-572-5571
      859-572-7786
      859-572-1453
      859-572-6025
      859-572-5333

  42. Sed Substitutions
      > sed 's/859/(513)/' phones.txt | head -5
      Our phone bill for last year was $(513),800,513.57.
      This is our list of phone numbers:
      (513)-572-7568
      (513)-572-7721
      (513)-572-7568
      > sed 's/859-/(513)-/' phones.txt | head -5
      Our phone bill for last year was $859,800,513.57.
      This is our list of phone numbers:
      (513)-572-7568
      (513)-572-7721
      (513)-572-7568
      > sed '3,99s/859/(513)/' phones.txt | head -5
      Our phone bill for last year was $859,800,513.57.
      This is our list of phone numbers:
      (513)-572-7568
      (513)-572-7721
      (513)-572-7568

  43. Sed Substitutions
      > sed 's/[0-9]*-[0-9]*-[0-9]*/Number Redacted/' phones.txt | head -5
      Our phone bill for last year was $859,800,513.57.
      This is our list of phone numbers:
      Number Redacted
      Number Redacted
      Number Redacted
      > sed 's/\([0-9]*-[0-9]*-[0-9]*\)/Phone number is \1/' phones.txt | head -5
      Our phone bill for last year was $859,800,513.57.
      This is our list of phone numbers:
      Phone number is 859-572-7568
      Phone number is 859-572-7721
      Phone number is 859-572-7568
      > sed 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/(\1) \2-\3/' phones.txt | head -5
      Our phone bill for last year was $859,800,513.57.
      This is our list of phone numbers:
      (859) 572-7568
      (859) 572-7721
      (859) 572-7568
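The substitutions from the last two slides can be rerun on a shortened copy of phones.txt. This sketch keeps the two header lines and the first two numbers from slide 41, and compares an unrestricted substitution, one limited to lines 3 through 99, and one that rearranges the number with the captured groups `\1`, `\2`, `\3`.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A four-line excerpt of the phones.txt shown in slide 41.
cat > phones.txt <<'EOF'
Our phone bill for last year was $859,800,513.57.
This is our list of phone numbers:
859-572-7568
859-572-7721
EOF

# No address: the dollar amount on line 1 is changed as well.
everywhere=$(sed 's/859/(513)/' phones.txt)
# Address 3,99: the substitution only applies from line 3 on.
from_line3=$(sed '3,99s/859/(513)/' phones.txt)
# \( \) capture the three digit groups; \1 \2 \3 put them back in a new layout.
regrouped=$(sed 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/(\1) \2-\3/' phones.txt)

printf '%s\n' "$everywhere" "$from_line3" "$regrouped"
```

Restricting by line address and making the pattern more specific (`859-` in slide 42) are two separate ways to protect the first line; which one fits depends on whether the file layout or the text itself is the more stable property.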

  44. Sed and Awk Applications
      Sed
        • Double space a file.
        • DOS to UNIX line endings.
        • Trim leading spaces.
        • Delete consecutive blank lines.
        • Remove blanks from begin/end of file.
      Awk
        • Manage small file db.
        • Generate reports.
        • Validate data.
        • Produce indexes.
        • Extract fields from UNIX command output.

  45. Sed and Awk vs. Ruby and Others
      Sed and Awk
        • Small languages
        • Cryptic syntax
        • Best for writing one liners in the shell
      Ruby, Python, Perl, etc.
        • Large languages
        • Easy syntax
        • Best for writing longer programs
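A few of the applications listed above can be written as one-liners. The slides do not give the commands, so the ones below are one possible way to do each task, not the course's own solutions; all file names and contents are made up for the sketch.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Double space a file: G appends an empty line after every input line.
printf 'a\nb\n' > two.txt
double_spaced_lines=$(( $(sed G two.txt | wc -l) ))

# Trim leading spaces (and tabs) from each line.
printf '   indented\n' > lead.txt
trimmed=$(sed 's/^[[:space:]]*//' lead.txt)

# DOS to UNIX line endings: remove the carriage return before each newline.
cr=$(printf '\r')
printf 'dos line\r\n' > dos.txt
unix_line=$(sed "s/${cr}\$//" dos.txt)

# A tiny awk report: add up the second column.
printf 'disk 10\nmem 32\n' > usage.txt
total=$(awk '{sum += $2} END {print sum}' usage.txt)

echo "$double_spaced_lines" "$trimmed" "$unix_line" "$total"
```

The carriage return is stored in a shell variable because not every sed understands `\r` inside a pattern.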

