Synonymous Paraphrasing Using WordNet and Internet

Igor A. Bolshakov & Alexander Gelbukh

Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico

{igor,gelbukh}@cic.ipn.mx

Contents
  • Synopsis
  • Absolute and Relative Synonyms
  • Collocations
  • Evaluations of Collocations via Internet
  • Types of Synonymous Paraphrasing
  • Algorithm of Interactive Paraphrasing
  • An Experiment on Text Paraphrasing
  • Another Application: Style Evaluation
  • Yet Another Application: Linguistic Steganography
Synopsis – 1

We propose a method of synonymous paraphrasing of a text based on

  • WordNet synonymy data and
  • Internet statistics of stable word combinations (collocations).

Given a text, we look for words or word sequences in it for which WordNet provides synonyms, and substitute them with such synonyms only if the latter form valid collocations with the surrounding words according to the statistics gathered from Google.

Synopsis – 2

We present two important applications of local synonymous paraphrasing:

  • Style checking and correction: automatic evaluation and computer-aided improvement of writing style with regard to various aspects
  • Steganography: hiding of additional information in the given text by special selection of collocationally verified synonyms
Absolute and Relative Synonymy in general
  • Text variations that conserve the meaning of the whole text are called synonymous paraphrasings
  • There exist global and local types of synonymous paraphrasing
  • Local paraphrasing replaces only separate words (those that have synonyms), conserving the word order and the number of words
  • Synonyms are words or multiwords that can replace each other in some class of contexts with an insignificant change in the meaning of the whole text
  • A synonymy dictionary consists of groups of words considered synonyms of each other
  • WordNet contains a kind of synonymy dictionary
  • There exist absolute and relative synonyms
Absolute and Relative Synonyms: Examples
  • Relative synonyms
    • {(to) schedule, plan, design, map out, project, lay on, scheme}
    • {rollercoaster, big dipper, Russian mountains}
  • Absolute synonyms
    • {sofa, settee}
    • {United States of America, United States, USA, US}
    • {former president, ex-president}
Synonymous Dictionary we need
  • A synonymy dictionary such as the one in WordNet or EuroWordNet
  • A specially compiled dictionary of absolute synonyms that contains all the above-mentioned types of English equivalents

Our algorithms first look up the absolute synonymy subdictionary.

Collocations in general
  • A collocation is a syntactically connected and semantically compatible pair of content (i.e. non-functional) words
  • Syntactic connectedness is understood as in dependency grammars (I. Melčuk)
  • Examples of English collocations are: full-length dress, well expressed, to briefly expose, to pick up the knife, to listen to the radio, energy field, to promise to marry, to flatly reject
  • Collocation components are connected to each other directly or through auxiliary words
Collocation Databases

For English, collocation databases exist only in printed form. The best is:

Oxford Collocations Dictionary for Students of English. Oxford University Press, 2003

In this paper we consider the Google search engine as a collocation database.

Evaluations of Collocations via Google in general
  • Google statistics on the occurrences of words or word sequences are given as the number of web pages containing these items, however many times each page contains them
  • There are only two ways to evaluate the number of occurrences of a collocation by giving its components:
    • in quotation marks (underestimation)
    • without them (overestimation)
  • It is necessary to propose a heuristic measure in between the two
  • It is also necessary to introduce a threshold to exclude marginal situations
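
The slides do not state which in-between measure is used. As a minimal sketch, assuming a geometric mean of the quoted (phrase) and unquoted page counts, the idea could look as follows. The function names and the counts in the usage note are hypothetical, and no real search API is called.

```python
import math


def collocation_estimate(quoted_hits: int, unquoted_hits: int) -> float:
    """Hypothetical in-between measure: geometric mean of the two page counts.

    quoted_hits   -- pages found with the components in quotation marks
                     (underestimates the collocation)
    unquoted_hits -- pages found without quotation marks (overestimates it)
    """
    if quoted_hits < 0 or unquoted_hits < 0:
        raise ValueError("page counts must be non-negative")
    return math.sqrt(quoted_hits * unquoted_hits)


def is_marginal(estimate: float, threshold: float) -> bool:
    """True when the estimate falls below the chosen threshold."""
    return estimate < threshold
```

With illustrative counts of 100 quoted and 10000 unquoted pages, `collocation_estimate(100, 10000)` gives 1000.0, which lies between the two raw counts as the slide requires. Any other monotone measure between the two counts would fit the same role.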
Evaluations of Collocations via Google: Statistics on synonymous collocations with project
Evaluations of Collocations via Google: Collocations with synonyms of departments
  • departments 42%
  • offices 15%
  • services 43%
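
Percentages like these can be read as each synonym's share of the total collocation evidence for the group. The sketch below assumes that reading, which the slides do not spell out. The raw counts are invented for illustration and chosen only so that they reproduce the 42/15/43 split; `shares` is a hypothetical helper name.

```python
from typing import Dict


def shares(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-synonym collocation estimates so that they sum to 1."""
    total = sum(counts.values())
    if total == 0:
        # No evidence for any member of the group.
        return {word: 0.0 for word in counts}
    return {word: count / total for word, count in counts.items()}


# Hypothetical raw estimates, not figures from the presentation.
hypothetical_counts = {"departments": 420, "offices": 150, "services": 430}
distribution = shares(hypothetical_counts)
```

A normalized value of this kind stays within the range 0 to 1, so it can be compared directly with a threshold taken from (0,1).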
Types of Synonymous Paraphrasing
  • Text compression: the shortest synonyms are taken
  • Text canonization: the most frequently used synonyms are taken
  • Text simplification: synonyms more intelligible for language-impaired persons are taken (special marks of colloquialism are needed)
  • Conformistic variations: synonyms are taken at random according to their Internet distribution
  • Individualistic variations: nearly marginal synonyms within the Internet distribution are taken
Algorithm of Interactive Paraphrasing

Ask mode {compression, canonization, simplification, conformistic, individualistic}

Ask marginality threshold  (0,1) and sensitivity threshold  (0,1)

For each content word or multiword w which is a member of a synset

Let S = union of all relevant synsets for w

For each word v in S

If its appropriateness a(v) <  then set score(v) = 0 else

If mode = compression then set score(v) = 1 / length (v)

If mode = canonization then set score(v) = a (v)

If mode = simplification then set score(v) as described in S. 5

If mode = conformistic then set score(v) = random from 0 to a(v)

If mode = individualistic then set score(v) = 1 / a(v)

If score (w) / maxSscore (v) <  then

suggest to the user all variants v in S, score(v)  0, in the order of score(v)
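
The scoring loop above can be sketched in Python for a single word w. This is an illustration under stated assumptions, not the authors' implementation: the appropriateness values a(v) are passed in as a ready-made dictionary, "in the order of score(v)" is taken to mean descending order, and the simplification mode is left out because its scoring is defined elsewhere. The names `score` and `suggest` are hypothetical.

```python
import random
from typing import Dict, List, Optional


def score(v: str, a: float, mode: str, marginality: float,
          rng: random.Random) -> float:
    """Score one candidate v with appropriateness a, following the mode rules."""
    if a <= 0 or a < marginality:
        return 0.0  # marginal candidates are excluded
    if mode == "compression":
        return 1.0 / len(v)
    if mode == "canonization":
        return a
    if mode == "conformistic":
        return rng.uniform(0.0, a)
    if mode == "individualistic":
        return 1.0 / a
    raise NotImplementedError("mode not covered by this sketch: " + mode)


def suggest(w: str, group: Dict[str, float], mode: str, marginality: float,
            sensitivity: float, rng: Optional[random.Random] = None) -> List[str]:
    """Return replacement suggestions for w, best first, or [] if w is good enough.

    group maps every member of S (including w) to its appropriateness a(v).
    """
    rng = rng or random.Random()
    scores = {v: score(v, a, mode, marginality, rng) for v, a in group.items()}
    best = max(scores.values(), default=0.0)
    if best == 0.0:
        return []  # nothing passed the marginality threshold
    if scores.get(w, 0.0) / best < sensitivity:
        ranked = sorted(scores, key=scores.get, reverse=True)
        return [v for v in ranked if scores[v] > 0.0]
    return []
```

For example, with `group = {"departments": 0.42, "offices": 0.15, "services": 0.43}`, compression mode, marginality 0.1 and sensitivity 0.9, the call `suggest("departments", group, "compression", 0.1, 0.9)` ranks the shorter "offices" and "services" ahead of "departments". In canonization mode with the same thresholds nothing is suggested, because "departments" already scores close to the maximum.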

An Experiment on Text Paraphrasing: The source text with possible replacements

The Georgian foreign minister (foreign office head) is scheduled (planned, designed, mapped out, projected, laid on, schemed) to meet (have a meeting, rendezvous) with the heads (chiefs, top executives) of various (different, diverse) Russian departments (offices, services) and with a deputy of Russian foreign minister (foreign office head). “Issues (problems, questions, items) concerning (pertaining, touching, regarding) the future (coming, prospective) contacts at the higher (high-rank) level will be discussed (considered, debated, parleyed, ventilated, reasoned, negotiated, talked about) in the course of the meeting (receptions, buzz sessions, interviews),” said Georgian ambassador to Russia Zurab Abashidze. The Georgian foreign minister (foreign office head) will be in (visit) Moscow on a private (privy) visit (trip), the Russian Foreign Ministry reported (communicated, informed, conveyed, announced).

An Experiment on Text Paraphrasing: The text with conformistic variations

The Georgian foreign office head is planned to have a meeting with the heads of diverse Russian offices and with a deputy of Russian foreign office head. “Questions touching the future contacts at the high-rank level will be debated in the course of the interviews,” said Georgian ambassador to Russia Zurab Abashidze. The Georgian foreign minister will visit Moscow on a private trip, the Russian Foreign Ministry informed.

Style Evaluation: for Compressibility

Set Compressibility to 0
For each content word w in the text
    Set S = union of all relevant synsets containing w
    Remove from S the members v below the marginality threshold
    Let v0 be the shortest word in S
    Increase Compressibility by length(w) - length(v0)
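
A rough Python rendering of this measure is given below. It assumes the synonymy groups and their appropriateness values are already available as a dictionary, and that words without an entry contribute nothing. The name `compressibility`, the argument layout, and the values for "heads" and "chiefs" in the usage note are illustrative only.

```python
from typing import Dict, Iterable


def compressibility(words: Iterable[str],
                    groups: Dict[str, Dict[str, float]],
                    marginality: float) -> int:
    """Sum, over the content words, of the characters saved by the shortest
    admissible synonym.

    groups maps a word w to its set S, given as {member: appropriateness},
    with w itself expected among the members.
    """
    total = 0
    for w in words:
        members = groups.get(w)
        if not members:
            continue  # no synonymy data for this word
        # Drop the members below the marginality threshold.
        admissible = [v for v, a in members.items() if a >= marginality]
        if not admissible:
            continue
        v0 = min(admissible, key=len)  # shortest remaining member
        total += len(w) - len(v0)
    return total
```

Taking "departments" with the shares 0.42, 0.15 and 0.43 shown earlier and a hypothetical group `{"heads": 0.6, "chiefs": 0.4}`, a marginality threshold of 0.2 excludes "offices", so the best replacement is "services" and the text ["departments", "heads"] has a compressibility of 3. Lowering the threshold to 0.1 admits "offices" and raises the value to 4.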

Linguistic Steganography: Two Inputs
  • The information I to be hidden, merely as a bit sequence
  • Any natural language text with a minimal length of approximately 250 per bit of I. The text is orthographically correct and semantically “common” (not a sequence of proper names, numbers, rhymes, etc.)
Linguistic Steganography: Algorithm

Search for synonyms: single words or multiwords that have their own synsets

Formation of synonymy groups: search for unions of all relevant synsets

Collocational verification of synonyms: each member of the current group containing relative synonyms is tested as a potential collocation together with its context words by Google statistics, discarding all inappropriate options

Enciphering:
  • The current group is cut in length to the nearest power of 2, that is, to 2^p members
  • The next p-bit syllable s of I is taken
  • The s-th synonym replaces the source synonym

Reagreement
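
The enciphering step can be illustrated with a small Python sketch. It assumes that the sender and receiver build identical, identically ordered synonymy groups, that a "p-syllable" means the next p bits of I, that the group is cut down (not padded) to 2^p members, and that a short final chunk is padded with zeros. The names `capacity`, `encode_group` and `decode_group` are hypothetical, and the search, verification and reagreement steps are not modelled.

```python
from typing import List, Tuple


def capacity(group: List[str]) -> int:
    """Largest p such that 2**p <= len(group): bits one group can carry."""
    if not group:
        raise ValueError("synonymy group must not be empty")
    return len(group).bit_length() - 1


def encode_group(group: List[str], bits: str) -> Tuple[str, str]:
    """Choose the synonym indexed by the next p bits; return (synonym, rest)."""
    p = capacity(group)
    if p == 0:
        return group[0], bits  # a one-member group carries no information
    chunk = bits[:p].ljust(p, "0")  # assumption: pad a short tail with zeros
    s = int(chunk, 2)
    return group[s], bits[p:]


def decode_group(group: List[str], word: str) -> str:
    """Recover the p bits hidden by the choice of word within group."""
    p = capacity(group)
    if p == 0:
        return ""
    s = group[: 2 ** p].index(word)
    return format(s, "0{}b".format(p))
```

For instance, a verified group of five members such as `["plan", "design", "project", "scheme", "map out"]` is cut to four, so it carries two bits: `encode_group(group, "101")` selects "project" and leaves "1" for the next group, and `decode_group(group, "project")` returns "10".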

Linguistic Steganography: More detail in the paper

Bolshakov, I.A. A Method of Linguistic Steganography Based on Collocation-Proven Synonymy. In: Proceedings of International Information Hiding Workshop IH2004, Toronto, Canada, May 2004. Lecture Notes in Computer Science, Springer, 2004 (now available only in the preprint form)
