Ling 570

Day #2


Tokenizing and evaluating tokenization

Tokenization


After coming close to a partial settlement a year ago, shareholders who filed civil suits against Ivan F. Boesky and the partnerships he once controlled again are approaching an accord, people familiar with the case said.

Meanwhile, within the next few weeks, the limited partners in Ivan F. Boesky & Co. L.P. are expected to reach a partial settlement with Drexel Burnham Lambert Inc. regarding the distribution of the $330 million in partnership assets, said one of the individuals.

One individual said the shareholders' accord was "well worked out."

There are at least 27 class-action shareholder suits that have been consolidated in federal court in New York under U.S. District Judge Milton Pollack.


Tokenize

  • After coming close to a partial settlement a year ago, shareholders who filed civil suits against Ivan F. Boesky and Co. L.P. Drexel’s plaintiffs’ …
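To see why tokens such as "F.", "Co.", "L.P." and "ago," are hard, a whitespace split can be compared with a small rule-based tokenizer. This is only a sketch: the abbreviation list, the regular expression, and the shortened test sentence (adapted from the passage above) are assumptions made for this example, not the tokenizer used in the course.

```python
import re

# Hypothetical abbreviation list, for this example only.
ABBREVIATIONS = {"Co.", "L.P.", "Inc.", "U.S."}

def naive_tokenize(text):
    """Split on whitespace only; punctuation stays glued to words."""
    return text.split()

def rule_tokenize(text):
    """Split off punctuation, but keep listed abbreviations and initials whole."""
    tokens = []
    for chunk in text.split():
        if chunk in ABBREVIATIONS or re.fullmatch(r"[A-Z]\.", chunk):
            tokens.append(chunk)
        else:
            # A word (with internal hyphens or apostrophes) or one punctuation mark.
            tokens.extend(re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", chunk))
    return tokens

sentence = "Ivan F. Boesky & Co. L.P. are expected to reach a settlement, said one."
print(naive_tokenize(sentence)[-3:])  # ['settlement,', 'said', 'one.']
print(rule_tokenize(sentence)[-5:])   # ['settlement', ',', 'said', 'one', '.']
```

A fixed abbreviation list will not scale to new text, which is one reason tokenizer output has to be evaluated against a reference.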


FSA/T Conventions


FSAs Formally

  • A Finite-State Automaton (FSA) is a 5-tuple:

    • A finite set of states Q, e.g. {q0,q1,q2,q3,q4}

    • A finite alphabet Σ, e.g. {b,a,!}

    • A start state q0 ∈ Q

    • A set of accepting states F ⊆ Q, e.g. {q4}

    • A transition function δ: Q × Σ → Q
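The 5-tuple maps directly onto a small data structure. The sketch below is one possible encoding in Python; the class name `FSA` and its field names are hypothetical. The slide gives the states, alphabet, start state and accepting state but not the transitions, so the transition table here is an assumption: it is the textbook "sheep language" baa+!, which fits that state set and alphabet.

```python
from dataclasses import dataclass

@dataclass
class FSA:
    states: frozenset     # Q
    alphabet: frozenset   # Σ
    start: str            # start state
    accepting: frozenset  # accepting states
    delta: dict           # transition function: (state, symbol) -> state

    def accepts(self, string):
        """Return True if reading the string ends in an accepting state."""
        state = self.start
        for symbol in string:
            if (state, symbol) not in self.delta:
                return False  # no transition defined, so reject
            state = self.delta[(state, symbol)]
        return state in self.accepting

# Assumed transitions (not given on the slide): the sheep language baa+!
sheep = FSA(
    states=frozenset({"q0", "q1", "q2", "q3", "q4"}),
    alphabet=frozenset({"b", "a", "!"}),
    start="q0",
    accepting=frozenset({"q4"}),
    delta={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",
        ("q3", "!"): "q4",
    },
)

print(sheep.accepts("baaa!"))  # True
print(sheep.accepts("ba!"))    # False
```

Here δ is stored as a partial function: a missing entry is treated as an immediate reject, which is equivalent to adding an explicit non-accepting sink state.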


FSA Example

  • An automaton:

  • Σ = {a,b}

  • Q = {q0,q1}; start: q0; final: {q1}

  • Regex = a*b+
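The slide's diagram is not preserved in this transcript, so the transition table below is an assumption: it is one two-state automaton over {a,b} with start q0 and final state q1 that accepts a*b+. The sketch checks it against Python's `re` module on every string up to length 4.

```python
import itertools
import re

# Assumed transitions for the a*b+ automaton (the diagram is not in the transcript).
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}
START, FINAL = "q0", {"q1"}

def accepts(string):
    """Run the automaton; a missing transition means reject."""
    state = START
    for symbol in string:
        state = DELTA.get((state, symbol))
        if state is None:
            return False
    return state in FINAL

# The automaton and the regex should agree on all strings over {a,b} up to length 4.
for n in range(5):
    for chars in itertools.product("ab", repeat=n):
        s = "".join(chars)
        assert accepts(s) == bool(re.fullmatch(r"a*b+", s))

print(accepts("aabb"), accepts("aba"))  # True False
```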



Another FSA Example

  • Another automaton:


Two Views of FSAs

  • Recognition: An FSA is a model that, given an input string, accepts the string if it is in the language, and rejects otherwise

  • Generation: An FSA m is a model that can generate all and only the strings in L(m).
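The generation view can be sketched by walking the transition table breadth-first and emitting a string whenever an accepting state is reached. The example reuses the a*b+ language from the FSA example; its transition table is an assumption, and since L(m) is infinite the enumeration is cut off at `max_len`, a parameter introduced only for this sketch.

```python
from collections import deque

# Assumed transitions for a*b+ (start q0, final q1).
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}
START, FINAL = "q0", {"q1"}

def generate(max_len):
    """Yield every accepted string of length <= max_len, shortest first."""
    queue = deque([(START, "")])
    while queue:
        state, prefix = queue.popleft()
        if state in FINAL:
            yield prefix
        if len(prefix) < max_len:
            for (src, symbol), dst in sorted(DELTA.items()):
                if src == state:
                    queue.append((dst, prefix + symbol))

print(list(generate(3)))  # ['b', 'ab', 'bb', 'aab', 'abb', 'bbb']
```

Because this automaton is deterministic, each string is produced by at most one path, so no duplicates appear in the output.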


Finite-State Transducers


FSTs, Formally


FSTs

  • Finite automaton that maps between two strings

    • Automaton with two labels per arc

      • input:output


FST Applications

  • Tokenization

    • Segmentation

  • Morphological analysis

  • Transliteration

  • Translation

  • Speech recognition

  • Spoken language understanding


Approaches to FSTs

  • FST as recognizer:

    • Takes pair of input:output strings

    • Accepts if the pair is in the language, otherwise rejects

  • FST as generator:

    • Outputs pairs of strings in the language

  • FST as translator:

    • Reads an input string and prints output string

  • FST as set relator:

    • Computes relations between sets
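The four views can be read as four functions over the same arc table. The toy transducer below is an assumption made for illustration: it maps a to x and b to y over the a*b+ language, and it has no epsilon arcs, so input and output strings always have equal length. The function names are hypothetical.

```python
# Arcs are (source, input, output, target). No epsilon arcs in this sketch.
ARCS = [
    ("q0", "a", "x", "q0"),
    ("q0", "b", "y", "q1"),
    ("q1", "b", "y", "q1"),
]
START, FINAL = "q0", {"q1"}

def recognize(inp, out):
    """Recognizer: accept the pair (inp, out) if some path spells both strings."""
    if len(inp) != len(out):
        return False
    states = {START}
    for i, o in zip(inp, out):
        states = {dst for (src, a, b, dst) in ARCS
                  if src in states and a == i and b == o}
    return bool(states & FINAL)

def generate(max_len):
    """Generator: list accepted (input, output) pairs up to max_len symbols."""
    pairs, frontier = [], [(START, "", "")]
    for _ in range(max_len + 1):
        pairs += [(i, o) for (s, i, o) in frontier if s in FINAL]
        frontier = [(dst, i + a, o + b) for (s, i, o) in frontier
                    for (src, a, b, dst) in ARCS if src == s]
    return pairs

def translate(inp):
    """Translator: return the set of output strings paired with inp."""
    paths = {(START, "")}
    for i in inp:
        paths = {(dst, o + b) for (s, o) in paths
                 for (src, a, b, dst) in ARCS if src == s and a == i}
    return {o for (s, o) in paths if s in FINAL}

def relate(inputs):
    """Set relator: the image of a whole set of input strings."""
    return set().union(*(translate(i) for i in inputs))

print(recognize("aab", "xxy"))  # True
print(generate(2))              # [('b', 'y'), ('ab', 'xy'), ('bb', 'yy')]
print(translate("abb"))         # {'xyy'}
```

`translate` returns a set because an FST defines a relation, not necessarily a function: with nondeterministic arcs one input could map to several outputs.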


FST as Translator

FR: ce bill met de le baume sur une blessure

EN: this bill puts balm on a sore wound


FST Application Examples

  • Case folding:

    • He said → he said

  • Tokenization:

    • “He ran.” → “ He ran . ”

  • POS tagging:

    • They can fish → PRO VERB NOUN
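Case folding and the simplest form of tokenization can both be written as one-state transducers, where each arc reads one symbol and writes a (possibly longer) output string. The sketch below stores the arcs as a Python dict; the names `CASE_FOLD`, `SPLIT_PUNCT` and `transduce` are hypothetical, and treating every punctuation mark as a separate token is an assumption of this example.

```python
import string

# One-state transducers written as arc tables: input symbol -> output string.
# Symbols with no entry are copied unchanged (an implicit x:x arc).
CASE_FOLD = {c: c.lower() for c in string.ascii_uppercase}
SPLIT_PUNCT = {c: f" {c} " for c in string.punctuation + "“”"}

def transduce(arcs, text):
    """Apply a one-state transducer symbol by symbol."""
    return "".join(arcs.get(symbol, symbol) for symbol in text)

print(transduce(CASE_FOLD, "He said"))              # he said
print(transduce(SPLIT_PUNCT, "“He ran.”").split())  # ['“', 'He', 'ran', '.', '”']
```

A single state cannot look at context, so this version would also split "U.S." and "shareholders'" into pieces; handling those cases needs additional states or an abbreviation lexicon.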


FST Application Examples

  • Pronunciation:

    • B AH T EH R → B AH DX EH R

  • Morphological generation:

    • Fox s → Foxes

  • Morphological analysis:

    • cats → cat s
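Each of these mappings corresponds to a context-dependent rewrite rule that an FST would encode. The functions below are plain Python stand-ins for those rules, not compiled transducers, and the rules themselves are simplified assumptions: flapping ignores stress, and the plural rule only covers e-insertion after sibilant-final stems.

```python
# ARPAbet vowel symbols, used to define the context for flapping.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")

def flap(phones):
    """Rewrite T as DX between two vowels (stress is ignored in this sketch)."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if phones[i] == "T" and phones[i - 1] in VOWELS and phones[i + 1] in VOWELS:
            out[i] = "DX"
    return out

def pluralize(stem):
    """Generation: insert e between a sibilant-final stem and the s suffix."""
    if stem.endswith(SIBILANT_ENDINGS):
        return stem + "es"
    return stem + "s"

def analyze(word):
    """Analysis: naive inverse of pluralize, returning (stem, suffix)."""
    if word.endswith("es") and word[:-2].endswith(SIBILANT_ENDINGS):
        return (word[:-2], "s")
    if word.endswith("s"):
        return (word[:-1], "s")
    return (word, "")

print(" ".join(flap("B AH T EH R".split())))  # B AH DX EH R
print(pluralize("Fox"))                       # Foxes
print(analyze("cats"))                        # ('cat', 's')
```

The analyzer overgenerates (it would also split "bus" into "bu" + "s"), which is why practical analyzers pair such rules with a lexicon of known stems.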


Stemming/WFSTs/Markov Chains

  • Next Class:

