Lecture 4

Getting data onto baboon

The ASCII character code

More Unix/Linux filters


Announcements

  • First Textools Quiz: October 6 (see review sheet)


Getting data onto a machine running Linux

  • Scan it - scan data in and run optical character recognition (OCR) on it

  • Copy it from a CD-ROM or floppy disk

  • Move it from another machine.

    • file transfer (ftp)

    • download data from the World Wide Web

    • email


TCP/IP

  • Any given machine might run Unix, DOS, Windows 95, ...

  • So how can machines with different operating systems communicate?

  • Through TCP/IP, a common set of rules (protocols)

    • Transmission Control Protocol (TCP) - manages data flow by breaking data into packets and reassembling them in order at the destination

    • Internet Protocol (IP) - addresses the packets and routes them from machine to machine


IP Addresses

  • Each machine on the Internet has an official numeric address, called an IP address

  • Some MSU IP addresses

    sapir                  130.68.160.51

    Picard.Montclair.edu   130.68.1.31

    baboon.montclair.edu   130.68.160.66

    chss.montclair.edu     130.68.1.31


Domain Names

fitzpatr@baboon.montclair.edu

emf@homer.att.com

user ID     machine   institution   domain name

fitzpatr    baboon    montclair     edu

emf         homer     att           com
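The breakdown above can be sketched with shell parameter expansion. This is only an illustration: it assumes an address with exactly three dot-separated parts after the @, as in the two examples, and the variable names are made up for the sketch.

```shell
#!/bin/sh
# Split an address of the form user@machine.institution.domain into its parts.
# Assumes exactly three dot-separated labels after the @, as on the slide.
addr="fitzpatr@baboon.montclair.edu"

user=${addr%%@*}          # text before the @
host=${addr#*@}           # text after the @
machine=${host%%.*}       # first label of the host name
rest=${host#*.}           # institution.domain
institution=${rest%%.*}   # second label
domain=${rest#*.}         # last label

printf 'user ID: %s\nmachine: %s\ninstitution: %s\ndomain name: %s\n' \
    "$user" "$machine" "$institution" "$domain"
```

Real addresses can have more or fewer labels, so this split is a teaching aid rather than a general parser.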


Telnet

  • Lets you log in to other computers from baboon

    telnet sapir

    Trying 130.68.160.51...

    Connected to sapir.

    Escape character is '^]'.

    Welcome to sapir.montclair.edu

    -- Unauthorized Access Prohibited --

    -----------------------------------------------------------------------

    *ATTENTION USERS: If you cannot login to your account, please

    contact the System Administrator at admin@sapir.montclair.edu

    -----------------------------------------------------------------------

    login:


File Transfer Protocol (ftp)

  • allows you to transfer files to and from a remote computer.

  • anonymous ftp allows you to transfer files without having an account on the remote machine.

  • Basic steps:

    ftp mrcnext.cso.uiuc.edu (one Gutenberg site)

    login: anonymous

    password: fitzpatricke@baboon.montclair.edu


File Compression

  • reduces the size of a file by finding repeated patterns and replacing each repeat with a short reference to an earlier occurrence

  • the compression tool is called gzip; gunzip restores the original file

  • check out

    man gzip

    (q to exit)
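A minimal sketch of the effect, assuming gzip and gunzip are installed. The file names are hypothetical and everything happens in a temporary directory. A file made of one repeated line should shrink a great deal, and the round trip should give back the identical file.

```shell
#!/bin/sh
# Compress a highly repetitive file and confirm the round trip is lossless.
# speech.txt and original.txt are hypothetical names used only for this sketch.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a file of 1000 identical lines: plenty of repeating patterns.
i=0
while [ "$i" -lt 1000 ]; do
    echo "Four score and seven years ago"
    i=$((i + 1))
done > speech.txt
cp speech.txt original.txt

before=$(wc -c < speech.txt)
gzip speech.txt                  # replaces speech.txt with speech.txt.gz
after=$(wc -c < speech.txt.gz)
echo "before: $before bytes, after: $after bytes"

gunzip speech.txt.gz             # restores speech.txt
cmp speech.txt original.txt && echo "round trip OK"
```

Text with little repetition compresses far less, so the ratio seen here is not typical of ordinary prose.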


The gzip Compression Algorithm


Tape Archiving (tar)

  • tar saves multiple files to a single file, preserving the file names

  • this allows a set of files to be moved from machine to machine as one entity

  • tar also allows the restoration of the multiple files on the receiving machine

  • check out

    man tar

    (q to exit)


tar options

tar cf fn.tar fn*    create a single file named fn.tar from all files beginning with fn

tar xf fn.tar        extract the contents of fn.tar, restoring the original multiple files
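The two commands can be tried end to end as follows. This is a sketch with hypothetical file names (fn1, fn2, fn3) in temporary directories. The archive is written outside the source directory as a precaution, since fn* would otherwise also match an existing fn.tar.

```shell
#!/bin/sh
# Bundle several files into one archive, then restore them elsewhere.
# fn1, fn2, fn3 are hypothetical files chosen to match the fn* pattern.
src=$(mktemp -d)
dest=$(mktemp -d)

cd "$src" || exit 1
echo "first"  > fn1
echo "second" > fn2
echo "third"  > fn3

# Write the archive outside the source directory so fn* cannot match it.
tar cf "$dest/fn.tar" fn*

cd "$dest" || exit 1
tar tf fn.tar          # list the archive's contents without extracting
tar xf fn.tar          # restore fn1, fn2, fn3 in the current directory
restored=$(cat fn1 fn2 fn3)
echo "$restored"
```

The t option used here (list contents) is a convenient check before extracting an archive received from another machine.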


The ASCII character code

  • The American Standard Code for Information Interchange

  • A standard numeric code for characters; the codes determine the default sort order used by many Unix tools

  • To see the ASCII standard order for characters, type

    man 7 ascii

    (q to exit)
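A quick way to see the codes without reading the whole table is sketched below. It relies on the printf convention that an argument beginning with a single quote is converted to the numeric code of the next character, and it sets LC_ALL=C so that sort uses plain ASCII order. The variable names are only for illustration.

```shell
#!/bin/sh
# Print the ASCII codes of a few characters, then sort them in ASCII order.
# A leading single quote in a printf argument asks for the character's code.
code_0=$(printf '%d' "'0")
code_A=$(printf '%d' "'A")
code_a=$(printf '%d' "'a")
echo "0=$code_0 A=$code_A a=$code_a"

# LC_ALL=C forces byte-value (ASCII) ordering regardless of the system locale.
sorted=$(printf '%s\n' b A 1 a B | LC_ALL=C sort | tr '\n' ' ')
echo "$sorted"
```

Digits come before upper-case letters, which come before lower-case letters, which is why "B" sorts ahead of "a" in ASCII order.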


More Linux Filters


egrep

  • historically ran significantly faster than grep, at the cost of more memory; on current GNU systems egrep is equivalent to grep -E

  • egrep has several facilities that grep does not:

    pattern   meaning                        grep   egrep

    c+        one or more occurrences of c   No     Yes

    c?        zero or one occurrence of c    No     Yes

    c1|c2     c1 or c2                       No     Yes


egrep (2)

egrep 'b+' words

abating

Abba

abbe

egrep 'b?' words    (matches every line, because zero occurrences of b also count)

Aarhus

Aaron

Ababa


egrep (3)

egrep 'd|f' words

abaft

abandon
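The three operators can be tried on a small list, sketched below. The word list is a hypothetical stand-in for the lecture's words file, and grep -E is used in place of egrep on the assumption that the two behave the same on the reader's system. The patterns are quoted so the shell does not treat ? or | specially.

```shell
#!/bin/sh
# Try the three extended operators on a small sample word list.
# The list is a hypothetical stand-in for the lecture's "words" file.
words=$(mktemp)
printf '%s\n' Aaron Abba abbe abaft abandon bazaar > "$words"

plus=$(grep -E 'bb+' "$words")     # a b followed by one or more b's
opt=$(grep -E 'ab?a' "$words")     # a, an optional b, then a
alt=$(grep -E 'd|f' "$words")      # lines containing d or f

printf 'bb+:\n%s\nab?a:\n%s\nd|f:\n%s\n' "$plus" "$opt" "$alt"
rm -f "$words"
```

Putting other characters around the operator, as in ab?a, makes the effect visible; a bare b? matches every line.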


The tr command

tr transforms characters

tr reads only from standard input, so use < to redirect a file into it

tr from_chars to_chars < fn

refs file:

Bloomfield, L. 1933. Language.

Chomsky, N. 1986. Barriers.

Jacobson, R. 1941. Child Language.

tr o x < refs

Blxxmfield, L. 1933. Language.

Chxmsky, N. 1986. Barriers.

Jacxbsxn, R. 1941. Child Language.


Common Uses of tr

  • Case conversion

    tr a-z A-Z < refs

    BLOOMFIELD, L. 1933. LANGUAGE.

    CHOMSKY, N. 1986. BARRIERS.

  • Conversion of spaces to newlines

    tr ' ' '\n' < gettysburg

    Four

    score

    and

    seven
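Both uses can be checked on a single line of text, as in this small sketch. The sample string is just the four words shown above, fed in through a pipe instead of a file.

```shell
#!/bin/sh
# Case conversion and one-word-per-line splitting with tr.
# The sample text is the opening words shown on the slide.
text="Four score and seven"

upper=$(printf '%s\n' "$text" | tr 'a-z' 'A-Z')   # lower case to upper case
words=$(printf '%s\n' "$text" | tr ' ' '\n')      # each space becomes a newline

echo "$upper"
echo "$words"
```

Because tr reads standard input, it works the same way at the end of a pipe as it does with < and a file name.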


tr options

tr -s o < refs    the squeeze option

Blomfield, L. 1933. Language.

Chomsky, N. 1986. Barriers.

Jacobson, R. 1941. Child Language.

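tr -cs 'A-Za-z0-9\n' ' ' < refs

the complement option (here combined with -s): every character that is not a letter, digit, or newline becomes a space, and runs of spaces are squeezed to one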

Bloomfield L 1933 Language

Chomsky N 1986 Barriers

Jacobson R 1941 Child Language


tr options (2)

tr -d '0-9' < refs    the delete option

Bloomfield, L. . Language.

Chomsky, N. . Barriers.

Jacobson, R. . Child Language.
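The squeeze, complement, and delete options can be compared side by side with the sketch below. It recreates the first two lines of the refs file in a temporary file. The character sets are quoted, and the complement example keeps newlines and adds -s so that each run of punctuation and spaces collapses to a single space; that combination is a choice made for this sketch.

```shell
#!/bin/sh
# Recreate part of the refs file and run the three tr options on it.
refs=$(mktemp)
cat > "$refs" <<'EOF'
Bloomfield, L. 1933. Language.
Chomsky, N. 1986. Barriers.
EOF

squeezed=$(tr -s 'o' < "$refs")                 # runs of o become a single o
letters=$(tr -cs 'A-Za-z0-9\n' ' ' < "$refs")   # other characters become single spaces
nodigits=$(tr -d '0-9' < "$refs")               # digits are removed

printf '%s\n---\n%s\n---\n%s\n' "$squeezed" "$letters" "$nodigits"
rm -f "$refs"
```

Note that the complement output ends each line with a space, because the final period is also replaced.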


The sort command

  • The sort command

    • sorts data line by line in a file

    • reads previously sorted files and merges them

    • uses ASCII order by default in the C locale (other locales may order characters differently)


sort options

sort -r     sort in reverse order

sort -n     sort by arithmetic value

sort -nr    arithmetic, reverse

sort -f     ignore case distinctions
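The difference between these options shows up clearly on a few short inputs, as sketched here. LC_ALL=C is set as an assumption of the sketch so that the default order is plain ASCII on any system; the sample values are arbitrary.

```shell
#!/bin/sh
# Compare default, numeric, reverse-numeric, and case-folded sorting.
# LC_ALL=C pins the ordering to ASCII so results do not depend on locale.
LC_ALL=C
export LC_ALL

default=$(printf '%s\n' 10 9 100 | sort     | tr '\n' ' ')   # compared as text
numeric=$(printf '%s\n' 10 9 100 | sort -n  | tr '\n' ' ')   # compared as numbers
reverse=$(printf '%s\n' 10 9 100 | sort -nr | tr '\n' ' ')   # numbers, largest first
plain=$(printf '%s\n' b A C | sort    | tr '\n' ' ')         # upper case before lower
folded=$(printf '%s\n' b A C | sort -f | tr '\n' ' ')        # case ignored

echo "default: $default"
echo "numeric: $numeric"
echo "reverse: $reverse"
echo "plain:   $plain"
echo "folded:  $folded"
```

Without -n, "100" sorts before "9" because the comparison is character by character, which is a common surprise when sorting counts.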


The uniq command

  • Operates on repeated lines

  • duplicate lines must be consecutive to be identified as duplicates (so sort often precedes uniq)

  • uniq       deletes duplicate lines, keeping one copy of each

  • uniq -d    reports only the lines that are repeated

  • uniq -u    reports only the lines that are not repeated

  • uniq -c    reports each line with the number of times it occurred
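Combined with sort, these options give a simple word-frequency count. The sketch below uses a made-up six-word sample, and sets LC_ALL=C as an assumption so the order is predictable. It also shows that uniq alone removes nothing when the duplicates are not adjacent.

```shell
#!/bin/sh
# Count word frequencies: sort first so duplicates are adjacent, then uniq.
# The six-word sample is hypothetical.
LC_ALL=C
export LC_ALL
sample=$(printf '%s\n' the cat the dog the cat)

unsorted=$(printf '%s\n' "$sample" | uniq | wc -l)         # no adjacent duplicates
distinct=$(printf '%s\n' "$sample" | sort | uniq)          # one copy of each word
dups=$(printf '%s\n' "$sample" | sort | uniq -d)           # words that repeat
singles=$(printf '%s\n' "$sample" | sort | uniq -u)        # words that occur once
counts=$(printf '%s\n' "$sample" | sort | uniq -c | sort -nr)   # most frequent first
top=$(printf '%s\n' "$counts" | head -n 1 | awk '{print $2}')

echo "$counts"
echo "most frequent: $top"
```

The final sort -nr orders the counted lines by frequency, which is the usual last step when building a frequency list from text.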

