
Pushpak Bhattacharyya CSE Dept., IIT Bombay

CS626/449: Speech, NLP and the Web / Topics in AI Programming (Lecture 9: Resnik’s measures of word similarity; coverage of Jiang and Conrath, 1997). Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Path length based similarity between house and lock. House belongs to 12 senses. Sense-1


Presentation Transcript


  1. CS626/449: Speech, NLP and the Web / Topics in AI Programming (Lecture 9: Resnik’s measures of word similarity; coverage of Jiang and Conrath, 1997) • Pushpak Bhattacharyya, CSE Dept., IIT Bombay

  2. Path length based similarity between house and lock • House belongs to 12 senses • [Figure: has-part graph for house (sense 1) connecting it to lock; nodes shown: house, study, wall, door, doorway, lock]

  3. Properties that a path length based measure should satisfy • Zero property: self-distance is 0, d(A,A) = 0 • Symmetric property: d(A,B) = d(B,A) • Positive property: d is always non-negative, d(A,B) >= 0 • Triangle inequality: d(A,C) <= d(A,B) + d(B,C)
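The four properties can be checked mechanically for a shortest-path distance. The following is a minimal sketch, assuming an undirected, connected toy graph; the edge list `EDGES` and the helper names are hypothetical and are not taken from WordNet or from the slide's figure.

```python
from collections import deque
from itertools import product

# Hypothetical undirected toy graph (illustrative edges only, not WordNet data).
EDGES = [("house", "door"), ("door", "lock"), ("house", "wall"), ("house", "study")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def path_length(adj, source, target):
    """Number of edges on a shortest path (breadth-first search); None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def satisfies_metric_properties(adj):
    """Check zero, symmetric, positive and triangle properties (graph assumed connected)."""
    nodes = list(adj)
    d = {(a, b): path_length(adj, a, b) for a, b in product(nodes, repeat=2)}
    zero = all(d[a, a] == 0 for a in nodes)
    symmetric = all(d[a, b] == d[b, a] for a, b in d)
    positive = all(v >= 0 for v in d.values())
    triangle = all(
        d[a, c] <= d[a, b] + d[b, c] for a, b, c in product(nodes, repeat=3)
    )
    return zero and symmetric and positive and triangle


ADJ = build_adjacency(EDGES)
print(path_length(ADJ, "house", "lock"))
print(satisfies_metric_properties(ADJ))
```

Shortest-path length on an undirected graph satisfies all four properties by construction, which is one reason it is a natural baseline before moving to information content based measures.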

  4. Motivating Resnik’s measure through the hypernymy (is-a) hierarchy • Sense 1 • lock -- (a fastener fitted to a door or drawer to keep it firmly closed) • => fastener, fastening, holdfast, fixing -- (restraint that attaches to something or holds something in place) • => restraint, constraint -- (a device that retards something's motion; "the car did not have proper restraints fitted") • => device -- (an instrumentality invented for a particular purpose; "the device is small enough to wear on your wrist"; "a device intended to conserve water") • => instrumentality, instrumentation -- (an artifact (or system of artifacts) that is instrumental in accomplishing some end) • => artifact, artefact -- (a man-made object taken as a whole) • => whole, unit -- (an assemblage of parts that is regarded as a single entity; "how big is that part compared to the whole?"; "the team is a unit") • => object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects") • => physical entity -- (an entity that has physical existence) • => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

  5. House: sense 1 • house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house") • => dwelling, home, domicile, abode, habitation, dwelling house -- (housing that someone is living in; "he built a modest dwelling near the pond"; "they raise money to provide homes for the homeless") • => housing, lodging, living accommodations -- (structures collectively in which people are housed) • => structure, construction -- (a thing constructed; a complex entity constructed of many parts; "the structure consisted of a series of arches"; "she wore her hair in an amazing construction of whirls and ribbons") • => artifact, artefact -- (a man-made object taken as a whole) • => whole, unit -- (an assemblage of parts that is regarded as a single entity; "how big is that part compared to the whole?"; "the team is a unit") • => object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects") • => physical entity -- (an entity that has physical existence) • => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving)) • Overlap with lock (sense 1): the two hypernym chains coincide from artifact, artefact upward

  6. House: sense 2 • Sense 2 • house -- (an official assembly having legislative powers; "a bicameral legislature has two houses") • => legislature, legislative assembly, legislative, general assembly, law-makers -- (persons who make or amend or repeal laws) • => assembly -- (a group of persons gathered together for a common purpose) • => gathering, assemblage -- (a group of persons together in one place) • => social group -- (people sharing some social relation) • => group, grouping -- (any number of entities (members) considered as a unit) • => abstraction -- (a general concept formed by extracting common features from specific examples) • => abstract entity -- (an entity that exists only abstractly) • => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

  7. House: sense 11 • Sense 11 • sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided) • => region, part -- (the extended spatial location of something; "the farming regions of France"; "religions in all parts of the world"; "regions of outer space") • => location -- (a point or extent in space) • => object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects") • => physical entity -- (an entity that has physical existence) • => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving)) • Overlap with lock (sense 1): the two hypernym chains coincide from object, physical object upward
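The overlap between hypernym chains can be made concrete by finding the lowest concept the chains share. The sketch below hard-codes the chains listed on slides 4 to 7, labelling each synset by its first lemma; the `#n` sense suffixes and the function names are illustrative choices, and no WordNet library is used.

```python
# Hypernym chains copied from the slides, most specific concept first.
# Each synset is labelled by its first lemma; "#n" marks the word sense.
LOCK_1 = ["lock#1", "fastener", "restraint", "device", "instrumentality",
          "artifact", "whole", "object", "physical entity", "entity"]
HOUSE_1 = ["house#1", "dwelling", "housing", "structure",
           "artifact", "whole", "object", "physical entity", "entity"]
HOUSE_2 = ["house#2", "legislature", "assembly", "gathering", "social group",
           "group", "abstraction", "abstract entity", "entity"]
HOUSE_11 = ["house#11", "region", "location",
            "object", "physical entity", "entity"]


def lowest_common_subsumer(chain_a, chain_b):
    """Return the most specific concept that appears in both chains, or None."""
    in_b = set(chain_b)
    for concept in chain_a:  # chain_a is ordered from specific to general
        if concept in in_b:
            return concept
    return None


def path_length_via_lcs(chain_a, chain_b):
    """Count is-a edges from each concept up to the shared subsumer."""
    lcs = lowest_common_subsumer(chain_a, chain_b)
    if lcs is None:
        return None
    return chain_a.index(lcs) + chain_b.index(lcs)


for name, chain in [("house#1", HOUSE_1), ("house#2", HOUSE_2), ("house#11", HOUSE_11)]:
    print(name, lowest_common_subsumer(LOCK_1, chain), path_length_via_lcs(LOCK_1, chain))
```

The dwelling sense of house meets lock at artifact, the zodiac sense only at object, and the legislature sense only at the root entity, which matches the intuition that the dwelling sense is the one related to lock.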

  8. Measures of Semantic Relatedness: Resnik • The Resnik measure is an information content (IC) based relatedness measure • Specific concepts have higher information content; more general concepts have lower information content • Carving fork: HIGH IC; entity: LOW IC • The idea is that two concepts are semantically related in proportion to the amount of information they share

  9. Sense marked corpora: semcor • <s snum=3> • <wf cmd=ignore pos=PRP>He</wf> • <wf cmd=done pos=VB lemma=succeed wnsn=2 lexsn=2:41:01::>succeeds</wf> • <wf cmd=done rdf=person pos=NNP lemma=person wnsn=1 lexsn=1:03:00:: pn=person>Buck_Shaw</wf> • <punc>,</punc> • <wf cmd=ignore pos=WP>who</wf> • <wf cmd=done pos=VB lemma=retire wnsn=1 lexsn=2:41:01::>retired</wf> • <wf cmd=ignore pos=IN>at</wf> • <wf cmd=ignore pos=DT>the</wf> • <wf cmd=done pos=NN lemma=end wnsn=2 lexsn=1:28:00::>end</wf> • <wf cmd=ignore pos=IN>of</wf> • <wf cmd=done pos=JJ lemma=last wnsn=1 lexsn=5:00:00:past:00>last</wf> • <wf cmd=done pos=NN lemma=season wnsn=1 lexsn=1:28:02::>season</wf> • <punc>.</punc> • </s>
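Sense counts for the probability estimates come from reading such annotations. A minimal sketch follows, assuming only the tag layout visible in the sample above; because the attribute values are unquoted, the fragment is not well-formed XML, so a regular expression is used instead of an XML parser. `SAMPLE`, `WF_TAG` and `tagged_senses` are hypothetical names.

```python
import re
from collections import Counter

# A shortened copy of the sentence shown on the slide.
SAMPLE = (
    "<wf cmd=ignore pos=PRP>He</wf>"
    "<wf cmd=done pos=VB lemma=succeed wnsn=2 lexsn=2:41:01::>succeeds</wf>"
    "<wf cmd=done rdf=person pos=NNP lemma=person wnsn=1 lexsn=1:03:00:: pn=person>Buck_Shaw</wf>"
    "<punc>,</punc>"
    "<wf cmd=ignore pos=WP>who</wf>"
    "<wf cmd=done pos=VB lemma=retire wnsn=1 lexsn=2:41:01::>retired</wf>"
)

# Captures the attribute string and the surface form of each <wf> element.
WF_TAG = re.compile(r"<wf\s+([^>]*)>([^<]*)</wf>")


def tagged_senses(text):
    """Return (lemma, pos, wnsn) for every word form that carries a sense tag."""
    result = []
    for attrs, _surface in WF_TAG.findall(text):
        fields = dict(item.split("=", 1) for item in attrs.split())
        if fields.get("cmd") == "done" and "wnsn" in fields:
            # wnsn is kept as a string; this sketch does not interpret it further.
            result.append((fields["lemma"], fields["pos"], fields["wnsn"]))
    return result


SENSES = tagged_senses(SAMPLE)
SENSE_COUNTS = Counter(SENSES)
print(SENSES)
```

Accumulating `SENSE_COUNTS` over a whole corpus would give the per-sense frequencies from which concept probabilities can be estimated.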

  10. Measures of Semantic Relatedness • Considers the position of nouns in the is-a hierarchy • Semantic relatedness (SR) is determined by the information content of the lowest common concept that subsumes both concepts • For example: Nickel and Dime are subsumed by Coin; Nickel and Credit card by Medium of Exchange • P(c) is the probability of encountering concept c • If a is-a b, then P(a) <= P(b) • Information content is calculated by the formula: IC(c) = -log P(c)
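The relation between P(c) and IC(c) can be illustrated with a small calculation. This is a sketch under stated assumptions: the hierarchy `PARENT` is a simplified version of the coin example, the counts in `COUNTS` are made up, and base-2 logarithms are chosen for readability (the slide does not fix a base).

```python
import math

# Hypothetical, simplified is-a hierarchy (child -> parent).
PARENT = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "medium of exchange",
    "credit card": "medium of exchange",
}

# Made-up corpus counts of direct mentions of each concept.
COUNTS = {"nickel": 2, "dime": 2, "coin": 4, "credit card": 8}


def cumulative_counts(parent, counts):
    """An occurrence of a concept also counts as an occurrence of every ancestor."""
    concepts = set(parent) | set(parent.values()) | set(counts)
    total = dict.fromkeys(concepts, 0)
    for concept, n in counts.items():
        node = concept
        while node is not None:
            total[node] += n
            node = parent.get(node)
    return total


def information_content(parent, counts):
    """IC(c) = -log2 P(c), with P(c) the cumulative relative frequency of c."""
    total = cumulative_counts(parent, counts)
    n = sum(counts.values())
    return {c: -math.log2(total[c] / n) for c in total if total[c] > 0}


IC = information_content(PARENT, COUNTS)
for concept in ["nickel", "coin", "medium of exchange"]:
    print(concept, IC[concept])
```

Because counts propagate upward, P never decreases along an is-a path, so IC never increases: the root of this toy hierarchy has probability 1 and carries no information.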

  11. Measures of Semantic Relatedness • Thus relatedness is given by: sim_res(c1, c2) = IC(LCS(c1, c2)), where LCS is the lowest common subsumer • Does not consider the information content of the concepts themselves, nor the path length • One problem is that many concept pairs share the same subsumer and therefore get the same score • May give high scores on the basis of inappropriate word senses, e.g. tobacco and horse • Newer methods include the Jiang-Conrath, Lin and Leacock-Chodorow measures

  12. In case of multiple senses: sim(w1, w2) = max over c1 in sen(w1), c2 in sen(w2) of sim(c1, c2), where sen(w) denotes the set of possible senses for word w.
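Word-level similarity takes the best score over all sense pairs. The following sketch assumes a toy taxonomy with made-up information content values; `PARENT`, `IC`, `SENSES` and all concept labels are hypothetical and only illustrate the maximisation.

```python
from itertools import product

# Hypothetical is-a hierarchy (child -> parent) with two senses of "nickel".
PARENT = {
    "nickel#coin": "coin",
    "dime#coin": "coin",
    "nickel#metal": "metal",
    "gold#metal": "metal",
    "coin": "entity",
    "metal": "entity",
}

# Made-up information content values (higher means more specific).
IC = {"entity": 0.0, "coin": 5.0, "metal": 4.0,
      "nickel#coin": 8.0, "dime#coin": 8.0, "nickel#metal": 8.0, "gold#metal": 8.0}

# sen(w): the set of possible senses of each word.
SENSES = {
    "nickel": ["nickel#coin", "nickel#metal"],
    "dime": ["dime#coin"],
    "gold": ["gold#metal"],
}


def ancestors(concept):
    """The concept itself followed by its subsumers, most specific first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT.get(concept)
    return chain


def sim_concepts(c1, c2):
    """Resnik similarity: IC of the lowest common subsumer of c1 and c2."""
    above_c2 = set(ancestors(c2))
    for concept in ancestors(c1):
        if concept in above_c2:
            return IC[concept]
    return 0.0


def sim_words(w1, w2):
    """Maximum concept similarity over all pairs of senses of the two words."""
    return max(sim_concepts(c1, c2) for c1, c2 in product(SENSES[w1], SENSES[w2]))


print(sim_words("nickel", "dime"))
print(sim_words("nickel", "gold"))
print(sim_words("dime", "gold"))
```

Taking the maximum is what lets an inappropriate sense pair dominate the score, which is the weakness noted for word pairs such as tobacco and horse.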

  13. Relevant formulae • classes(w) is the number of senses the word w has • words(c) is the set of words subsumed (directly or indirectly) by the class c
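One way these two quantities are commonly combined is to spread each word's corpus count evenly over its senses, so that freq(c) sums count(w) / classes(w) over the words in words(c). Whether this is exactly the formula shown on the slide is an assumption; the sketch below uses made-up counts and a hypothetical hierarchy.

```python
from collections import defaultdict

# Made-up corpus counts of untagged words.
WORD_COUNTS = {"nickel": 6, "dime": 4, "gold": 2}

# Hypothetical senses of each word; classes(w) is the length of the list.
SENSES = {
    "nickel": ["nickel#coin", "nickel#metal"],
    "dime": ["dime#coin"],
    "gold": ["gold#metal"],
}

# Hypothetical is-a hierarchy (child -> parent).
PARENT = {
    "nickel#coin": "coin",
    "dime#coin": "coin",
    "nickel#metal": "metal",
    "gold#metal": "metal",
    "coin": "entity",
    "metal": "entity",
}


def concept_frequencies(word_counts, senses, parent):
    """freq(c): sum of count(w) / classes(w), propagated to every subsumer of each sense."""
    freq = defaultdict(float)
    for word, count in word_counts.items():
        share = count / len(senses[word])  # split the count across the word's senses
        for sense in senses[word]:
            node = sense
            while node is not None:
                freq[node] += share
                node = parent.get(node)
    return dict(freq)


FREQ = concept_frequencies(WORD_COUNTS, SENSES, PARENT)
TOTAL = sum(WORD_COUNTS.values())
print(FREQ["coin"] / TOTAL, FREQ["metal"] / TOTAL, FREQ["entity"] / TOTAL)
```

Dividing by classes(w) keeps the total mass at the root equal to the corpus size, so the resulting P(c) values stay between 0 and 1 even when the corpus is not sense-tagged.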

  14. Example of Resnik similarity in action

  15. Structural characteristics of a hierarchical network • Local network density (the number of child links that span out from a parent node): for example, the plant/flora section of WordNet is very dense • Depth of a node in the hierarchy: distance shrinks as one descends the hierarchy, since differentiation is based on finer and finer details • Type of link • Strength of an edge link: corpus statistics have to play a role; theoretical soundness and computational efficiency are both needed

  16. Link Strength: Probability and IC theoretic • The strength of a child link is proportional to the conditional probability of encountering an instance of the child concept ci given an instance of its parent concept p: P(ci | p)

  17. Link strength • Intuition • Formulation • Actual formula
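Since a child concept is always contained in its parent, P(ci | p) = P(ci) / P(p), so taking the negative logarithm gives a link strength of IC(ci) - IC(p). The sketch below assumes this simplified form only: the depth, density and link-type factors from slide 15 are ignored, and the probabilities are made up.

```python
import math

# Made-up concept probabilities for a toy hierarchy: nickel -> coin <- dime.
P = {"nickel": 1 / 8, "dime": 1 / 8, "coin": 1 / 2}


def information_content(concept):
    """IC(c) = -log2 P(c); base 2 is an illustrative choice."""
    return -math.log2(P[concept])


def link_strength(child, parent):
    """LS(child, parent) = -log2 P(child | parent) = IC(child) - IC(parent)."""
    return -math.log2(P[child] / P[parent])


def distance_via_subsumer(c1, c2, subsumer):
    """Sum of link strengths on the path c1 -> subsumer -> c2.

    In this simplified setting it equals IC(c1) + IC(c2) - 2 * IC(subsumer).
    """
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(subsumer))


PATH_SUM = link_strength("nickel", "coin") + link_strength("dime", "coin")
print(link_strength("nickel", "coin"))
print(PATH_SUM, distance_via_subsumer("nickel", "dime", "coin"))
```

Unlike the Resnik score, this distance depends on the information content of the two concepts themselves, so pairs that share a subsumer no longer automatically receive the same value.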

  18. What does all this buy us?

  19. Correlations

  20. PageRank • Developed by Larry Page and Sergey Brin • A link analysis algorithm that assigns a numerical weight to each document in a hyperlinked set • Measures the relative importance of a page within the set • A link to a page is a vote of support that increases the rank of that page • The result is a probability distribution representing the likelihood that a person clicking links at random ends up on a specific page

  21. PageRank based algorithm • Assume the universe has 4 pages: A, B, C and D • The initial value of every page is 0.25 • Now suppose B, C and D link only to A • The rank of A is then given by: PR(A) = PR(B) + PR(C) + PR(D) = 0.75 • If the pages also link to other pages, each page's rank is split over its links: PR(A) = PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D) • L(B) is the number of outbound links from B
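The four-page example can be reproduced with one update step of the undamped rule, in which every page splits its current rank equally over its outbound links. This is a minimal sketch; the second link structure (`LINKS_SPLIT`) is a hypothetical illustration of pages with several outbound links.

```python
import math


def simple_pagerank_step(links, rank):
    """One undamped update: each page passes rank / L(page) to every page it links to."""
    new_rank = dict.fromkeys(rank, 0.0)
    for page, targets in links.items():
        for target in targets:
            new_rank[target] += rank[page] / len(targets)
    return new_rank


INITIAL = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

# Case from the slide: B, C and D link only to A.
LINKS_ONLY_A = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"]}

# Hypothetical case: B links to A and C, C links to A, D links to A, B and C.
LINKS_SPLIT = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}

RANK_ONLY_A = simple_pagerank_step(LINKS_ONLY_A, INITIAL)
RANK_SPLIT = simple_pagerank_step(LINKS_SPLIT, INITIAL)
print(RANK_ONLY_A["A"])
print(RANK_SPLIT["A"])
```

In the second case A receives 0.25/2 from B, 0.25/1 from C and 0.25/3 from D, so a vote from a page with many outbound links is worth less than a vote from a page with few.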

  22. PageRank based algorithm (contd.) • The rank of page U depends on the rank of each page V linking to U, divided by the number of outbound links from V • The general formula is: PR(U) = sum over V in B(U) of PR(V) / L(V) • B(U) is the set of pages that link to U • Thus the page ranks of all pages in the corpus sum to 1

  23. PageRank based algorithm (contd.) • Damping factor: an imaginary surfer will stop clicking on links after some time • d is the probability that the user will continue clicking • The damping factor is taken to be 0.85 here • The new PageRank formula incorporates d • To get the actual rank of a page, this formula has to be iterated many times • Problem of dangling links (pages with no outbound links)
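The damped iteration can be sketched as follows. Two assumptions are made that the transcript does not confirm: the normalised variant PR(U) = (1 - d)/N + d * sum of PR(V)/L(V) is used so that the ranks stay a probability distribution, and dangling pages are handled by spreading their rank uniformly over all pages, which is one common remedy among several.

```python
import math


def pagerank(links, d=0.85, iterations=50):
    """Iterate the damped PageRank update for a fixed number of rounds.

    links maps every page to the list of pages it links to; every target
    must also appear as a key. d is the damping factor.
    """
    pages = list(links)
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        # Rank held by dangling pages is redistributed evenly (an assumed policy).
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - d) / n + d * dangling / n for p in pages}
        for page in pages:
            for target in links[page]:
                new_rank[target] += d * rank[page] / len(links[page])
        rank = new_rank
    return rank


# The four-page universe from the earlier slide: B, C and D link only to A,
# which makes A a dangling page.
LINKS = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"]}
RANKS = pagerank(LINKS)
print(RANKS)
print(sum(RANKS.values()))
```

Without the dangling-page correction, the rank that flows into A would leak out of the system at every iteration and the values would no longer sum to 1.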
