Unix lecture 6

Hana Filip

LIN 6932


HW6 - Part II

  • solutions posted on my website (see syllabus)


Text Processing Command Line Utility Programs

  • sed
  • wc
  • awk
  • comm
  • cut
  • ex
  • iconv
  • join
  • paste
  • sort
  • tr
  • uniq
  • xargs


TextPro Lexicon File

Lexicon file “core.text”

Background:

TextPro

  • An information extraction system used at SRI International, Menlo Park, CA

  • Developed by Doug Appelt


copy “machen.txt” into your account

> cd ..

> cd c6932aab

> ls

… machen.txt …

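> cp machen.txt ~c6932aad

(~c6932aad is the home directory of the account c6932aad; use your own account name instead, or simply ~)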

> cd

> ls

… machen.txt …


Text Processing Command Line Utility Programs

tr: translate or delete characters

Example 1: delete (-d) all the newline characters from “machen.txt”, and redirect the output to a file named “machen-cont.txt”.

% cat machen.txt | tr -d "\n" > machen-cont.txt

Example 2: delete (-d) all characters from “machen.txt” except (-c, the complement of the set) alphabetical characters, newlines, and spaces, and redirect the output to a file named “machen-alpha.txt”.

% cat machen.txt | tr -c -d "[:alpha:]\n " > machen-alpha.txt

Try also the same command without the space in the set. Now spaces are deleted as well, so the words on each line run together:

% cat machen.txt | tr -c -d "[:alpha:]\n" > machen-alpha.txt
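Since machen.txt is not reproduced here, the following minimal sketch (POSIX sh rather than tcsh) runs both tr examples on a small made-up file. The names sample.txt, sample-cont.txt and sample-alpha.txt are hypothetical, and input is redirected with < instead of piped through cat, which gives tr the same input.

```shell
#!/bin/sh
# Minimal sketch: the two tr examples on a made-up sample
# (sample.txt is a hypothetical stand-in for machen.txt).
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'Ich mache es.\nDu machst, was?\n' > sample.txt

# Example 1: delete every newline, so the whole text becomes one line
tr -d '\n' < sample.txt > sample-cont.txt

# Example 2: delete everything except letters, newlines and spaces
tr -c -d '[:alpha:]\n ' < sample.txt > sample-alpha.txt

cat sample-cont.txt
echo
cat sample-alpha.txt
```

sample-cont.txt holds the single line "Ich mache es.Du machst, was?" and sample-alpha.txt keeps two lines, "Ich mache es" and "Du machst was". Without the space in the set, as in the "Try also" command, the second line would come out as "Dumachstwas".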


Text Processing Command Line Utility Programs

tr can be used to make a wordlist from a text. This can be done by replacing every space with a newline:

% cat machen.txt | tr " " "\n" | less

% cat machen.txt | tr " " "\012" | less

(\012 is the octal code of the newline character, so the two commands are equivalent.)

We can combine the command above with the delete functionality of tr to make a wordlist without unwanted characters:

% cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" > lex
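Here is a minimal sketch of the wordlist pipeline in POSIX sh, run on a made-up lowercase sample (sample.txt is a hypothetical stand-in for machen.txt; lex is the output name used on the slide).

```shell
#!/bin/sh
# Minimal sketch: build a word list, one word per line, from a made-up sample.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'wir machen das.\nsie machen, was sie wollen!\n' > sample.txt

# 1) every space becomes a newline, so each word lands on its own line
# 2) everything that is not a letter or a newline (punctuation) is deleted
tr ' ' '\n' < sample.txt | tr -c -d '[:alpha:]\n' > lex

cat lex
```

The list still contains duplicates (machen and sie each appear twice); sort and uniq remove them. Two spaces in a row would also leave an empty line in lex.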


Text Processing Command Line Utility Programs

sort prints the lines of its input, or of the concatenation of all files listed in its argument list, in sorted order. (The -r flag reverses the sort order.)

% sort -r movie_characters
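The file movie_characters is not shown, so this minimal sketch uses a hypothetical three-line stand-in called characters to compare plain sort with sort -r. With mixed upper- and lowercase words the order can depend on the locale (LC_ALL=C gives plain byte order); the sample avoids that by starting every name with a different capital letter.

```shell
#!/bin/sh
# Minimal sketch: sort vs. sort -r on a made-up stand-in for movie_characters.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'Yoda\nAnakin\nLeia\n' > characters

sort characters > ascending      # Anakin, Leia, Yoda
sort -r characters > descending  # Yoda, Leia, Anakin
cat ascending descending
```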


Text Processing Command Line Utility Programs

uniq takes a text file and outputs it with adjacent identical lines collapsed to one

  • it is a kind of filter program

  • typically it is used after sort, since sorting brings identical lines together

    % cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" | sort | uniq > lex
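Why uniq usually follows sort can be seen on a tiny made-up word list (the file names words, unsorted-uniq and sorted-uniq are hypothetical). This minimal sketch in POSIX sh runs uniq once on the raw list and once after sort.

```shell
#!/bin/sh
# Minimal sketch: uniq only collapses *adjacent* identical lines.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'machen\nwir\nmachen\nmachen\n' > words

uniq words > unsorted-uniq        # machen, wir, machen
sort words | uniq > sorted-uniq   # machen, wir

echo "uniq alone:"
cat unsorted-uniq
echo "sort | uniq:"
cat sorted-uniq
```

Without sort, the first machen survives because it is not next to the other two.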


Text Processing Command Line Utility Programs

sed = stream editor

  • a special editor for automatically modifying text, without opening it interactively

  • a find-and-replace program: it reads text from standard input (or from the files named as arguments) and writes the result to standard output (normally the screen)

  • its search pattern is a regular expression, essentially the same as a grep regular expression (see references)

  • often used in scripts to make changes to files, by redirecting the output to a new file


Text Processing Command Line Utility Programs

sed: simple example 1

% sed 's/United States/USA/' < usa-gaz.text > new-usa-gaz.text

  s                       substitute command
  /../../                 delimiters
  United States           regular expression pattern string
  USA                     replacement string
  < old_file > new_file   read from old_file, write to new_file
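As written, the s command replaces only the first match on each line; adding the g flag after the last delimiter replaces every match. This minimal sketch in POSIX sh shows the difference on a made-up two-line file (gaz.text and the output names are hypothetical stand-ins for usa-gaz.text and new-usa-gaz.text).

```shell
#!/bin/sh
# Minimal sketch: s/// without and with the g flag.
# gaz.text is a made-up stand-in for usa-gaz.text.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'Ohio, United States\nUnited States; United States\n' > gaz.text

sed 's/United States/USA/' < gaz.text > new-gaz.text     # first match per line
sed 's/United States/USA/g' < gaz.text > new-gaz-g.text  # every match
cat new-gaz.text new-gaz-g.text
```

Only the second line differs: "USA; United States" without g, "USA; USA" with it.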


Text Processing Command Line Utility Programs

sed: simple example 2

% sed 's/\(United\) \(States\)/\2 \1/' < usa-gaz.text > usa-switch-gaz.text

switch two words around

  \(        start of a group
  \)        end of a group
  /../../   delimiters
  \1        register 1 (the text matched by the first group)
  \2        register 2 (the text matched by the second group)
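The pattern has to contain the space that separates the two words, and the replacement has to put a space back between \2 and \1. This minimal sketch in POSIX sh runs the swap on a made-up line (gaz.text, switch.text and no-match.text are hypothetical names) and, for comparison, a version without the space, which looks for "UnitedStates" and so leaves the line unchanged.

```shell
#!/bin/sh
# Minimal sketch: swapping two words with \( \) groups and \1 \2 registers.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'Capital of the United States\n' > gaz.text

# The space between the groups is part of the pattern and is
# written back between \2 and \1 in the replacement.
sed 's/\(United\) \(States\)/\2 \1/' < gaz.text > switch.text

# Without the space the pattern only matches "UnitedStates",
# so this line passes through unchanged.
sed 's/\(United\)\(States\)/\2\1/' < gaz.text > no-match.text
cat switch.text no-match.text
```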


Text Processing Command Line Utility Programs

Multiple sed commands may also be stored in a script file. The -f option tells sed to read its commands from that file:

% sed -f sedscript.sed [file]
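A minimal sketch in POSIX sh of a two-command script file, written one command per line and run with -f. The contents of sedscript.sed and the files gaz.text and out.text are made up for illustration.

```shell
#!/bin/sh
# Minimal sketch: two sed commands kept in a script file and run with -f.
set -e
dir=$(mktemp -d)
cd "$dir"

# hypothetical sedscript.sed: one command per line, applied in order
cat > sedscript.sed <<'EOF'
s/United States/USA/g
s/^/LexEntry: /
EOF

printf 'United States\nCanada\n' > gaz.text
sed -f sedscript.sed gaz.text > out.text
cat out.text
```

Both commands are applied to every line in order, giving "LexEntry: USA" and "LexEntry: Canada".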


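Text Processing Command Line Utility Programs

% sed 's/^/LexEntry: /g;s/$/ ; ./' lex > newlex

^   match the beginning of the line
$   match the end of the line
;   separates the two substitute commands

Each line "word" of lex thus becomes "LexEntry: word ; ." in newlex.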
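A minimal sketch in POSIX sh of the anchor command on a made-up two-word lex file. It drops the g flag, which changes nothing here because ^ can match only once per line; the "." in the replacement is a literal character, since the replacement is not a regular expression.

```shell
#!/bin/sh
# Minimal sketch: ^ and $ as anchors, ; separating two s commands.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'das\nmachen\n' > lex

# s/^/.../ inserts at the start of each line; s/$/.../ appends at the end
sed 's/^/LexEntry: /;s/$/ ; ./' lex > newlex
cat newlex
```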


Text Processing Command Line Utility Programs & Shell Script

#! /usr/local/bin/tcsh
#usage: make_lex filename1; make_lex filename1 filename2, …

# first, make sure the user typed in at least one argument
if ( $# < 1 ) then
    echo "This script needs at least 1 argument."
    echo "Exiting...(annoyed)"
    exit 666
endif

# collect the words of every file, one per line, in mylex
rm -f mylex
foreach name ($*)
    cat $name | tr " " "\n" | tr -c -d "[:alpha:]\n" >> mylex
end

# sort the words, drop duplicates, and turn each word into a lexical entry
sort mylex | uniq | sed 's/^/LexEntry: /g;s/$/ ; ./' > newlex
echo "Your new lexical file is called 'newlex'."
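For readers working in bash or another POSIX shell rather than tcsh, the same pipeline can be packaged as a shell function. This is a minimal sketch, not the author's script: it sends all files through one pipeline so that newlex covers every file, and it adds a grep -v '^$' step (an addition of this sketch) to drop the empty lines that double spaces or punctuation-only tokens would otherwise turn into an empty "LexEntry:  ; ." entry. The input files text1 and text2 are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: a POSIX sh version of make_lex, written as a function.
# The grep -v '^$' step is an addition of this sketch: it removes empty lines.
make_lex() {
    if [ "$#" -lt 1 ]; then
        echo "This script needs at least 1 argument." >&2
        return 1
    fi
    cat "$@" | tr ' ' '\n' | tr -c -d '[:alpha:]\n' | grep -v '^$' \
        | sort | uniq | sed 's/^/LexEntry: /;s/$/ ; ./' > newlex
    echo "Your new lexical file is called 'newlex'."
}

set -e
dir=$(mktemp -d)
cd "$dir"
printf 'wir machen das.\n' > text1
printf 'sie  machen es!\n' > text2   # note the double space
make_lex text1 text2
cat newlex
```

newlex then lists das, es, machen, sie and wir once each, every word wrapped as "LexEntry: word ; .".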
