Unix Lecture 6

Unix Lecture 6 by Hana Filip. HW6 - Part II: solutions posted on my website (see syllabus). Text processing command line utility programs: sed, wc, awk, comm, cut, ex, iconv, join, paste, sort, tr, uniq, xargs. TextPro lexicon file “core.text”.

Presentation Transcript


  1. Unix Lecture 6 Hana Filip LIN 6932

  2. HW6 - Part II: solutions posted on my website (see syllabus)

  3. Text Processing Command Line Utility Programs: sed, wc, awk, comm, cut, ex, iconv, join, paste, sort, tr, uniq, xargs

  4. TextPro Lexicon File
     Lexicon file “core.text”
     Background: TextPro
     • An information extraction system used at SRI International, Menlo Park, CA
     • Developed by Doug Appelt

  5. Copy “machen.txt” into your account:
     > cd ..
     > cd c6932aab
     > ls
     … machen.txt …
     > cp machen.txt ~c6932aad
     > cd
     > ls
     … machen.txt …

  6. Text Processing Command Line Utility Programs
     tr: translate or delete characters
     Example 1: delete (-d) all the newline characters from “machen.txt”, and redirect the output to a file named “machen-cont.txt”.
     % cat machen.txt | tr -d "\n" > machen-cont.txt
     Example 2: delete (-d) all characters from “machen.txt” except for (-c) alphabetical characters, newlines, and spaces, and redirect the output to a file named “machen-alpha.txt”.
     % cat machen.txt | tr -c -d "[:alpha:]\n " > machen-alpha.txt
     Try also the same command without the space in the set, which deletes the spaces as well:
     % cat machen.txt | tr -c -d "[:alpha:]\n" > machen-alpha.txt
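The two deletion modes can be tried on a small sample without touching “machen.txt”. This is a minimal sketch; the sample text and the variable names are made up for illustration.

```shell
# Hypothetical two-line sample standing in for machen.txt.
sample=$(mktemp)
printf 'Hello, world!\nBye.\n' > "$sample"

# Example 1: delete every newline, so the two lines run together.
joined=$(tr -d '\n' < "$sample")
echo "$joined"

# Example 2: -c complements the set, so everything EXCEPT letters,
# newlines, and spaces is deleted.
alpha=$(tr -c -d '[:alpha:]\n ' < "$sample")
echo "$alpha"

# Same as example 2 but without the space in the set: spaces go too.
nospace=$(tr -c -d '[:alpha:]\n' < "$sample")
echo "$nospace"

rm -f "$sample"
```

Reading the file with `<` instead of `cat file |` gives the same result with one process fewer.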

  7. Text Processing Command Line Utility Programs
     tr can be used to make a wordlist from a text. This can be done by replacing every space with a newline (\012 is the octal code for the newline character, so the two commands below are equivalent):
     % cat machen.txt | tr " " "\n" | less
     % cat machen.txt | tr " " "\012" | less
     We can combine the command above with the delete functionality of tr to make a wordlist without unwanted characters:
     % cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" > lex
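A minimal sketch of the wordlist pipeline on one made-up sentence (the sentence and variable names are illustrative, not taken from “machen.txt”):

```shell
# Hypothetical input sentence.
text='the cat saw the dog.'

# Step 1: turn every space into a newline (one word per line).
# Step 2: delete everything that is not a letter or a newline.
words=$(printf '%s\n' "$text" | tr ' ' '\n' | tr -c -d '[:alpha:]\n')
echo "$words"
```

Each word ends up on its own line and the final period is removed. Repeated words such as “the” are still listed twice at this stage.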

  8. Text Processing Command Line Utility Programs
     sort prints the lines of its input, or of the concatenation of all files listed in its argument list, in sorted order. (The -r flag reverses the sort order.)
     % sort -r movie_characters
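A small sketch of the default and reversed order. The three names are invented and stand in for the contents of “movie_characters”.

```shell
# Hypothetical stand-in for the movie_characters file.
names=$(mktemp)
printf 'Ripley\nAmelie\nNeo\n' > "$names"

# Default order: ascending.
ascending=$(sort "$names")
echo "$ascending"

# -r reverses the order.
descending=$(sort -r "$names")
echo "$descending"

rm -f "$names"
```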

  9. Text Processing Command Line Utility Programs
     uniq takes a text file and outputs it with adjacent identical lines collapsed to one
     • it is a kind of filter program
     • typically it is used after sort, because only adjacent duplicates are collapsed
     % cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" | sort | uniq > lex
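The reason for running sort first can be seen on a made-up list in which “the” occurs three times, but not always on adjacent lines (the list and variable names are illustrative):

```shell
# Hypothetical word list: "the" appears at lines 1, 3 and 4.
words=$(printf 'the\ncat\nthe\nthe\ndog\n')

# uniq alone only merges the adjacent pair on lines 3 and 4.
unsorted=$(printf '%s\n' "$words" | uniq)
echo "$unsorted"

# After sorting, all duplicates are adjacent and collapse to one line.
sorted=$(printf '%s\n' "$words" | sort | uniq)
echo "$sorted"
```

Without sort, the list still contains “the” twice; with sort, each word appears exactly once.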

  10. Text Processing Command Line Utility Programs
      sed = stream editor
      • a special editor for automatically modifying files
      • a find and replace program: it reads text from standard input and writes the result to standard output (normally the screen)
      • the search pattern is a regular expression (see references), essentially the same as a grep regular expression
      • often used in a program to make changes in a file

  11. Text Processing Command Line Utility Programs
      sed: simple example 1
      % sed 's/United States/USA/' < usa-gaz.text > new-usa-gaz.text
      s                        Substitute command
      /../../                  Delimiter
      United States            Regular expression pattern string
      USA                      Replacement string
      < old_file > new_file    Input and output redirection
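A minimal sketch of the substitute command on made-up input lines, rather than on “usa-gaz.text”. It also shows that, without the `g` flag, only the first match on each line is replaced.

```shell
# Hypothetical input line with a single match.
first=$(printf 'Washington, United States\n' | sed 's/United States/USA/')
echo "$first"

# Two matches on one line: only the first is replaced by default.
once=$(printf 'United States and United States\n' | sed 's/United States/USA/')
echo "$once"

# With the g flag, every match on the line is replaced.
all=$(printf 'United States and United States\n' | sed 's/United States/USA/g')
echo "$all"
```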

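  12. Text Processing Command Line Utility Programs
      sed: simple example 2, switch two words around
      % sed 's/\(United\) \(States\)/\2 \1/' < usa-gaz.text > usa-switch-gaz.text
      \(        start of a group
      \)        end of a group
      /../../   delimiter
      \1        register 1 (the text matched by the first group)
      \2        register 2 (the text matched by the second group)
      The pattern includes the space between the two words, so that it matches “United States” and produces “States United”.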
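A minimal sketch of swapping two words with groups and registers, on a made-up input line. The pattern here contains the space that separates the words, on the assumption that the input reads “United States” with a space.

```shell
# \(United\) is stored in register 1, \(States\) in register 2;
# the replacement writes them back in the opposite order.
swapped=$(printf 'United States\n' | sed 's/\(United\) \(States\)/\2 \1/')
echo "$swapped"

# A more general variant (an illustration, not from the slides):
# swap any two lowercase words separated by one space.
general=$(printf 'hello world\n' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')
echo "$general"
```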

  13. Text Processing Command Line Utility Programs
      Multiple sed commands may also be stored in a script file. The "-f" option is used on the command line to read the commands from the script:
      % sed -f sedscript.sed [file]
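A minimal sketch of a sed script file with two commands. The script contents and the input lines are invented for illustration, and a temporary file stands in for “sedscript.sed”.

```shell
# Hypothetical script file: one sed command per line.
script=$(mktemp)
cat > "$script" <<'EOF'
s/United States/USA/
s/United Kingdom/UK/
EOF

# -f reads the commands from the file; both are applied to every line.
out=$(printf 'United States\nUnited Kingdom\n' | sed -f "$script")
echo "$out"

rm -f "$script"
```

Keeping the commands in a file avoids shell quoting problems and makes longer edit sequences easier to reuse.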

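  14. Text Processing Command Line Utility Programs
      % sed 's/^/LexEntry: /g;s/$/ ; ./' lex > newlex
      ^   match the beginning of the line
      $   match the end of the line
      The first command puts “LexEntry: ” in front of every line; the second appends “ ; .” to every line. The two commands are separated by a semicolon.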
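A minimal sketch of the two anchors on a made-up two-word list, instead of the “lex” file:

```shell
# ^ matches the empty position before the first character of a line,
# $ the empty position after the last one, so both substitutions
# insert text without removing anything.
entries=$(printf 'cat\ndog\n' | sed 's/^/LexEntry: /;s/$/ ; ./')
echo "$entries"
```

The `g` flag is left out here because a line has only one beginning, so it makes no difference to the result.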

  15. Text Processing Command Line Utility Programs & shell script
      #!/usr/local/bin/tcsh
      # usage: make_lex filename1 [filename2 ...]
      # first, make sure the user typed in at least one argument
      if ( $#argv < 1 ) then
          echo "This script needs at least 1 argument."
          echo "Exiting...(annoyed)"
          # exit codes are limited to 0-255, so use 1 rather than 666
          exit 1
      endif
      # build one sorted word list without duplicates from all the files;
      # looping over the files one by one would overwrite mylex and newlex
      # each time, leaving only the words of the last file
      cat $* | tr " " "\n" | tr -c -d "[:alpha:]\n" | sort | uniq > mylex
      # wrap every word in the lexicon entry format
      sed 's/^/LexEntry: /;s/$/ ; ./' mylex > newlex
      echo "Your new lexical file is called 'newlex'."
