Synonymous Paraphrasing Using WordNet and Internet

Igor A. Bolshakov & Alexander Gelbukh

Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico

{igor,[email protected]



Contents

  • Synopsis

  • Absolute and Relative Synonyms

  • Collocations

  • Evaluations of Collocations via Internet

  • Types of Synonymous Paraphrasing

  • Algorithm of Interactive Paraphrasing

  • An Experiment on Text Paraphrasing

  • Another Application: Style Evaluation

  • Yet Another Application: Linguistic Steganography



Synopsis – 1

We propose a method of synonymous paraphrasing of a text based on

  • WordNet synonymy data and

  • Internet statistics of stable word combinations (collocations).

    Given a text, we look for words or word sequences in it for which WordNet provides synonyms, and replace them with such synonyms only if the latter form valid collocations with the surrounding words according to statistics gathered from Google.
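The selection rule above can be illustrated with a minimal Python sketch. It assumes a toy synonym dictionary in place of WordNet and a toy predicate `is_collocation` in place of Google statistics; all names (`paraphrase`, `SYNONYMS`, `VALID`) are hypothetical. As a further simplification, only adjacent content words are checked, whereas real collocations are dependency pairs.

```python
from typing import Callable, Dict, List, Optional, Set


def paraphrase(
    tokens: List[str],
    synonyms: Dict[str, Set[str]],
    is_collocation: Callable[[str, str], bool],
) -> List[str]:
    """Replace a word by a synonym only if the synonym still forms valid
    collocations with its neighbours (hypothetical, simplified check)."""
    out = list(tokens)
    for i, word in enumerate(tokens):
        left: Optional[str] = out[i - 1] if i > 0 else None  # may already be replaced
        right: Optional[str] = tokens[i + 1] if i + 1 < len(tokens) else None
        # Try alternatives in alphabetical order so the result is deterministic.
        for cand in sorted(synonyms.get(word, set()) - {word}):
            pairs = [(a, b) for a, b in ((left, cand), (cand, right))
                     if a is not None and b is not None]
            if all(is_collocation(a, b) for a, b in pairs):
                out[i] = cand
                break
    return out


# Toy data standing in for WordNet synsets and Google statistics.
SYNONYMS = {
    "various": {"various", "different", "diverse"},
    "departments": {"departments", "offices", "services"},
}
VALID = {("various", "departments"), ("different", "departments"), ("diverse", "offices")}

print(paraphrase(["various", "departments"], SYNONYMS, lambda a, b: (a, b) in VALID))
# -> ['different', 'departments']
```

Here "different" is accepted because it collocates with "departments"; neither "offices" nor "services" forms a known collocation with "different", so "departments" is kept.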



Synopsis – 2

We present two important applications of local synonymous paraphrasing:

  • Style checking and correction: automatic evaluation and computer-aided improvement of writing style with regard to various aspects

  • Steganography: hiding additional information in a given text through a special selection among collocationally verified synonyms



Absolute and Relative Synonymy in general

  • Text variations that preserve the meaning of the whole text are called synonymous paraphrasings

  • Synonymous paraphrasing can be global or local

  • Local paraphrasing only replaces individual words (those that have synonyms), preserving the word order and the number of words

  • Synonyms are words or multiwords that can replace each other in some class of contexts with only an insignificant change in the meaning of the whole text

  • A synonymy dictionary consists of groups of words considered synonymous with each other

  • WordNet contains a kind of synonymy dictionary (its synsets)

  • Synonyms can be absolute or relative



Absolute and Relative Synonyms: Examples

  • Relative synonyms

    • {(to) schedule, plan, design, map out, project, lay on, scheme}

    • {rollercoaster, big dipper, Russian mountains}

  • Absolute synonyms

    • {sofa, settee}

    • {United States of America, United States, USA, US}

    • {former president, ex-president}



Synonymy Dictionary: what we need

  • A synonymy dictionary such as the one in WordNet or EuroWordNet

  • A specially compiled dictionary of absolute synonyms that contains all the above-mentioned types of English equivalents

    Our algorithms first look up the absolute synonymy subdictionary.
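A minimal Python sketch of this two-level lookup, assuming both subdictionaries are plain lists of synonym groups (filled here with the examples from the previous slide). The function name `lookup` and the returned `needs_verification` flag are hypothetical; the flag reflects the later steganography slide, where collocational verification is applied to groups containing relative synonyms.

```python
from typing import List, Set, Tuple

# Absolute synonyms: interchangeable in any context.
ABSOLUTE: List[Set[str]] = [
    {"sofa", "settee"},
    {"United States of America", "United States", "USA", "US"},
    {"former president", "ex-president"},
]
# Relative synonyms (WordNet-like synsets): context-dependent.
RELATIVE: List[Set[str]] = [
    {"schedule", "plan", "design", "map out", "project", "lay on", "scheme"},
    {"rollercoaster", "big dipper", "Russian mountains"},
]


def lookup(word: str) -> Tuple[Set[str], bool]:
    """Return (union of all groups containing word, needs_verification).

    The absolute subdictionary is consulted first; only if it has no entry
    are the relative synsets used, and then collocational verification is needed.
    """
    for groups, needs_verification in ((ABSOLUTE, False), (RELATIVE, True)):
        found: Set[str] = set().union(*(g for g in groups if word in g))
        if found:
            return found, needs_verification
    return set(), False


print(sorted(lookup("settee")[0]))  # -> ['settee', 'sofa']
print(lookup("plan")[1])  # -> True
```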



Collocations in general

  • A collocation is a syntactically connected and semantically compatible pair of content (i.e., non-functional) words

  • Syntactic connectedness is understood as in dependency grammars (I. Mel’čuk)

  • Examples of English collocations are: full-length dress, well expressed, to briefly expose, to pick up the knife, to listen to the radio, energy field, to promise to marry, to flatly reject

  • Collocation components are connected to each other either directly or through auxiliary words (as in to listen to the radio)



Collocation Databases

For English, collocation databases exist only in printed form. The best of them is:

Oxford Collocations Dictionary for Students of English. Oxford University Press, 2003

In this paper, we use the Google search engine as a collocation database.



Evaluations of Collocations via Google in general

  • Google statistics on the occurrences of words or word sequences are given as the number of web pages containing these items, regardless of how many times each page contains them

  • There are only two ways to estimate the number of occurrences of a collocation from its components:

    • in quotation marks, as an exact phrase (underestimation: occurrences in which the components are separated by other words are missed)

    • without quotation marks (overestimation: pages where both words occur without being syntactically connected are counted too)

  • A heuristic measure lying between these two estimates is therefore needed

  • A threshold is also needed to exclude marginal cases
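The slides do not give the heuristic measure itself. Purely as an illustration, the Python sketch below takes the geometric mean of the two page counts, which always lies between them, and zeroes out estimates under a minimum. Both the geometric mean and the `min_hits` value are assumptions, not the authors' formula, and the counts must be obtained separately (no search API is called here).

```python
import math


def collocation_estimate(quoted_hits: int, unquoted_hits: int, min_hits: float = 10.0) -> float:
    """Heuristic count between the quoted (under-) and unquoted (over-) estimates.

    ASSUMPTION: geometric mean as the in-between measure; `min_hits` is a
    hypothetical threshold below which the combination counts as marginal.
    """
    estimate = math.sqrt(quoted_hits * unquoted_hits)
    return estimate if estimate >= min_hits else 0.0


# Made-up page counts, for illustration only.
print(collocation_estimate(100, 10_000))  # -> 1000.0
print(collocation_estimate(1, 25))        # -> 0.0 (below the threshold)
```

A geometric rather than arithmetic mean is used in this sketch because the two counts can differ by orders of magnitude, and an arithmetic mean would be dominated by the unquoted count.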



Evaluations of Collocations via Google: statistics on synonymous collocations with “project”



Evaluations of Collocations via Google: collocations with synonyms of “departments”

  • departments: 42%

  • offices: 15%

  • services: 43%
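One way to turn such statistics into the appropriateness values a(v) used by the interactive paraphrasing algorithm is to normalise each synonym's estimated count by the group total, so that the shares sum to 1, and then drop members under a marginality threshold. This normalisation is an assumption; the minimal Python sketch below feeds it the percentages from this slide, and the threshold of 0.2 is hypothetical.

```python
from typing import Dict, Set


def appropriateness(estimates: Dict[str, float]) -> Dict[str, float]:
    """ASSUMED definition of a(v): share of v in the group's total estimate."""
    total = sum(estimates.values())
    if total == 0:
        return {v: 0.0 for v in estimates}
    return {v: n / total for v, n in estimates.items()}


def non_marginal(shares: Dict[str, float], marginality_threshold: float) -> Set[str]:
    """Keep only synonyms whose share reaches the marginality threshold."""
    return {v for v, a in shares.items() if a >= marginality_threshold}


# Percentages from the slide, used as relative weights.
shares = appropriateness({"departments": 42, "offices": 15, "services": 43})
print(sorted(non_marginal(shares, 0.2)))  # -> ['departments', 'services']
```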



Types of Synonymous Paraphrasing

  • Text compression: the shortest synonyms are chosen

  • Text canonization: the most frequently used synonyms are chosen

  • Text simplification: synonyms more intelligible to language-impaired persons are chosen (this requires special marks of colloquialism)

  • Conformistic variations: synonyms are chosen randomly, in accordance with their Internet distribution

  • Individualistic variations: synonyms that are nearly marginal within the Internet distribution are chosen



Algorithm of Interactive Paraphrasing

Ask mode {compression, canonization, simplification, conformistic, individualistic}

Ask marginality threshold  (0,1) and sensitivity threshold  (0,1)

For each content word or multiword w which is a member of a synset

Let S = union of all relevant synsets for w

For each word v in S

If its appropriateness a(v) <  then set score(v) = 0 else

If mode = compressionthen set score(v) = 1 / length (v)

If mode = canonizationthen set score(v) = a (v)

If mode = simplification then set score(v) as described in S. 5

If mode = conformistic then set score(v) = random from 0 to a(v)

If mode = individualisticthen set score(v) = 1 / a(v)

If score (w) / maxSscore (v) <  then

suggest to the user all variants v in S, score(v)  0, in the order of score(v)
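The scoring step can be sketched in Python as follows. The sketch assumes the appropriateness shares a(v) of the group are already known (for example, computed from Google statistics as on the earlier slides), measures length in characters, and leaves out the simplification mode, which needs the colloquialism marks of Sect. 5. The function names are hypothetical, and `random.Random` is seeded so that the conformistic mode is reproducible.

```python
import random
from typing import Dict, List, Optional


def score(v: str, a: float, mode: str, marginality: float, rng: random.Random) -> float:
    """Score one candidate v with appropriateness a according to the mode."""
    if a < marginality:
        return 0.0  # marginal synonyms are never suggested
    if mode == "compression":
        return 1.0 / len(v)  # shorter is better
    if mode == "canonization":
        return a  # more frequent is better
    if mode == "conformistic":
        return rng.uniform(0.0, a)  # random, biased towards frequent synonyms
    if mode == "individualistic":
        return 1.0 / a  # rarer (but not marginal) is better
    raise ValueError(f"mode not modelled in this sketch: {mode}")


def suggestions(w: str, shares: Dict[str, float], mode: str, marginality: float,
                sensitivity: float, rng: Optional[random.Random] = None) -> List[str]:
    """Return variants to suggest for w, best first, or [] if w is good enough."""
    if rng is None:
        rng = random.Random(0)
    scores = {v: score(v, a, mode, marginality, rng) for v, a in shares.items()}
    best = max(scores.values())
    if best == 0 or scores.get(w, 0.0) / best >= sensitivity:
        return []
    return sorted((v for v, s in scores.items() if s > 0), key=lambda v: (-scores[v], v))


shares = {"departments": 0.42, "offices": 0.15, "services": 0.43}
print(suggestions("departments", shares, "compression", 0.1, 0.9))
# -> ['offices', 'services', 'departments']
print(suggestions("departments", shares, "canonization", 0.1, 0.9))  # -> []
```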



An Experiment on Text Paraphrasing: The source text with possible replacements

The Georgian foreign minister (foreign office head) is scheduled (planned, designed, mapped out, projected, laid on, schemed) to meet (have a meeting, rendezvous) with the heads (chiefs, top executives) of various (different, diverse) Russian departments (offices, services) and with a deputy of Russian foreign minister (foreign office head). “Issues (problems, questions, items) concerning (pertaining, touching, regarding) the future (coming, prospective) contacts at the higher (high-rank) level will be discussed (considered, debated, parleyed, ventilated, reasoned, negotiated, talked about) in the course of the meeting (receptions, buzz sessions, interviews),” said Georgian ambassador to Russia Zurab Abashidze. The Georgian foreign minister (foreign office head) will be in (visit) Moscow on a private (privy) visit (trip), the Russian Foreign Ministry reported (communicated, informed, conveyed, announced).



An Experiment on Text Paraphrasing: The text with conformistic variations

The Georgian foreign office head is planned to have a meeting with the heads of diverse Russian offices and with a deputy of Russian foreign office head. “Questions touching the future contacts at the high-rank level will be debated in the course of the interviews,” said Georgian ambassador to Russia Zurab Abashidze. The Georgian foreign minister will visit Moscow on a private trip, the Russian Foreign Ministry informed.



Style Evaluation: for Compressibility

Set Compressibility to 0

For each content word w in the text:

    Set S = the union of all relevant synsets containing w

    Remove from S the members v below the marginality threshold

    Let v0 be the shortest word in S

    Increase Compressibility by length(w) - length(v0)
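A minimal Python sketch of this measure, assuming each content word of the text has already been mapped to its synonym group with appropriateness shares (a hypothetical input format; the shares below are those of the “departments” statistics slide, reused for illustration). Words without synonyms contribute nothing. As in the pseudocode, a contribution can become negative if w itself falls below the marginality threshold.

```python
from typing import Dict, List


def compressibility(words: List[str], groups: Dict[str, Dict[str, float]],
                    marginality: float) -> int:
    """Sum over content words of length(w) - length(shortest non-marginal synonym)."""
    total = 0
    for w in words:
        group = groups.get(w)  # union of relevant synsets containing w, with a(v)
        if not group:
            continue  # no synonyms: w cannot be shortened
        kept = [v for v, a in group.items() if a >= marginality]
        if not kept:
            continue
        v0 = min(kept, key=len)  # shortest remaining synonym
        total += len(w) - len(v0)
    return total


groups = {"departments": {"departments": 0.42, "offices": 0.15, "services": 0.43}}
print(compressibility(["various", "departments"], groups, 0.1))  # -> 4
print(compressibility(["various", "departments"], groups, 0.2))  # -> 3
```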



Linguistic Steganography: Two Inputs

  • The information I to be hidden, treated simply as a bit sequence

  • Any natural-language text with a minimum length of approximately 250 per bit of I. The text must be orthographically correct and semantically “common” (not a sequence of proper names, numbers, rhymes, etc.)



Linguistic Steganography: Algorithm

  • Search for synonyms: single words or multiwords that have their own synsets

  • Formation of synonymy groups: the union of all relevant synsets is taken

  • Collocational verification of synonyms: each member of the current group containing relative synonyms is tested, using Google statistics, as a potential collocation together with its context words; all inappropriate options are discarded

  • Enciphering:

    • the current group is truncated to 2^p members, the nearest power of 2 not exceeding its size

    • the next p-bit “syllable” s of I is taken

    • the s-th synonym replaces the source synonym

  • Reagreement
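The enciphering step can be sketched in Python as below, together with a matching extraction. Assumptions, none of them taken from the slides: every group is non-empty, already collocationally verified, and rebuilt identically by the receiver; both sides order each group alphabetically; s counts from 0; the receiver knows the message length; and reagreement is not modelled. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def truncated(group: Sequence[str]) -> List[str]:
    """Order the group (ASSUMED shared order) and cut it to 2**p members,
    the largest power of 2 not exceeding its size."""
    ordered = sorted(group)
    p = max(len(ordered).bit_length() - 1, 0)
    return ordered[: 2 ** p]


def embed(slots: Sequence[Tuple[str, Sequence[str]]], bits: str) -> List[str]:
    """slots: (source word, verified synonym group) pairs in text order."""
    out: List[str] = []
    pos = 0
    for source, group in slots:
        cut = truncated(group)
        p = len(cut).bit_length() - 1
        if p <= 0 or pos >= len(bits):
            out.append(source)  # nothing (more) can be hidden here
            continue
        syllable = bits[pos:pos + p].ljust(p, "0")  # pad the last syllable
        out.append(cut[int(syllable, 2)])  # the s-th synonym replaces the source
        pos += p
    if pos < len(bits):
        raise ValueError("text too short for the message")
    return out


def extract(slots: Sequence[Tuple[str, Sequence[str]]], n_bits: int) -> str:
    """slots: (word found in the stego text, same verified group) pairs."""
    bits = ""
    for word, group in slots:
        cut = truncated(group)
        p = len(cut).bit_length() - 1
        if p <= 0 or len(bits) >= n_bits:
            continue
        bits += format(cut.index(word), f"0{p}b")
    return bits[:n_bits]


groups = [
    ["various", "different", "diverse"],
    ["departments", "offices", "services"],
    ["discussed", "considered", "debated", "talked about"],
]
stego = embed(list(zip(["various", "departments", "discussed"], groups)), "101")
print(stego)  # -> ['diverse', 'departments', 'discussed']
print(extract(list(zip(stego, groups)), 3))  # -> 101
```

A group of n verified synonyms carries floor(log2 n) bits, so the three groups in this example can hide at most 1 + 1 + 2 = 4 bits.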



Linguistic Steganography: More detail in the paper

Bolshakov, I.A. A Method of Linguistic Steganography Based on Collocation-Proven Synonymy. In: Proceedings of the International Information Hiding Workshop IH2004, Toronto, Canada, May 2004. Lecture Notes in Computer Science, Springer, 2004 (currently available only as a preprint)



Thank you!

Igor A. Bolshakov

[email protected]

