
SI 760 / EECS 597 / Ling 702 Language and Information

SI 760 / EECS 597 / Ling 702 Language and Information. Handout #3. Winter 2004. Course Information. Instructor: Dragomir R. Radev (radev@umich.edu) Office: 3080, West Hall Connector Phone: (734) 615-5225 Office hours: M&F 12-1 Course page: http://www.si.umich.edu/~radev/LNI-winter2004/


Presentation Transcript


  1. SI 760 / EECS 597 / Ling 702 Language and Information • Handout #3 • Winter 2004

  2. Course Information • Instructor: Dragomir R. Radev (radev@umich.edu) • Office: 3080, West Hall Connector • Phone: (734) 615-5225 • Office hours: M&F 12-1 • Course page: http://www.si.umich.edu/~radev/LNI-winter2004/ • Class meets on Mondays, 1-4 PM in 412 WH

  3. Lexical Semantics and WordNet

  4. Meanings of words • Lexemes, lexicon, sense(s) • Examples: • Red, n: the color of blood or a ruby • Blood, n: the red liquid that circulates in the heart, arteries and veins of animals • Right, adj: located nearer the right hand esp. being on the right when facing the same direction as the observer • Do dictionaries give us definitions?

  5. Relations among words • Homonymy: • Instead, a bank can hold the investments in a custodial account in the client’s name. • But as agriculture burgeons on the east bank, the river will shrink even more. • Other examples: be/bee?, wood/would? • Homophones • Homographs • Applications: spelling correction, speech recognition, text-to-speech • Example: Un ver vert va vers un verre vert.

  6. Polysemy • They rarely serve red meat, preferring to prepare seafood, poultry, or game birds. • He served as U.S. ambassador to Norway in 1976 and 1977. • He might have served his time, come out and led an upstanding life. • Homonymy: distinct and unrelated meanings, possibly with different etymology (multiple lexemes). • Polysemy: a single lexeme with multiple related meanings. • Example: an “idea bank”

  7. Synonymy • Principle of substitutability • How big is this plane? • Would I be flying on a large or small plane? • Miss Nelson, for instance, became a kind of big sister to Mrs. Van Tassel’s son, Benjamin. • ?? Miss Nelson, for instance, became a kind of large sister to Mrs. Van Tassel’s son, Benjamin. • What is the cheapest first class fare? • ?? What is the cheapest first class cost?

  8. Semantic Networks • Used to represent relationships between words • Example: WordNet - created by George Miller’s team at Princeton (http://www.cogsci.princeton.edu/~wn) • Based on synsets (synonyms, interchangeable words) and lexical matrices

  9. Lexical matrix

  10. Synsets • Disambiguation • {board, plank} • {board, committee} • Synonyms • substitution • weak substitution • synonyms must be of the same part of speech

  11. $ ./wn board -hypen
      Synonyms/Hypernyms (Ordered by Frequency) of noun board
      9 senses of board
      Sense 1
      board
         => committee, commission
            => administrative unit
               => unit, social unit
                  => organization, organisation
                     => social group
                        => group, grouping
      Sense 2
      board
         => sheet, flat solid
            => artifact, artefact
               => object, physical object
                  => entity, something
      Sense 3
      board, plank
         => lumber, timber
            => building material
               => artifact, artefact
                  => object, physical object
                     => entity, something

  12. Sense 4
      display panel, display board, board
         => display
            => electronic device
               => device
                  => instrumentality, instrumentation
                     => artifact, artefact
                        => object, physical object
                           => entity, something
      Sense 5
      board, gameboard
         => surface
            => artifact, artefact
               => object, physical object
                  => entity, something
      Sense 6
      board, table
         => fare
            => food, nutrient
               => substance, matter
                  => object, physical object
                     => entity, something

  13. Sense 7
      control panel, instrument panel, control board, board, panel
         => electrical device
            => device
               => instrumentality, instrumentation
                  => artifact, artefact
                     => object, physical object
                        => entity, something
      Sense 8
      circuit board, circuit card, board, card
         => printed circuit
            => computer circuit
               => circuit, electrical circuit, electric circuit
                  => electrical device
                     => device
                        => instrumentality, instrumentation
                           => artifact, artefact
                              => object, physical object
                                 => entity, something
      Sense 9
      dining table, board
         => table
            => furniture, piece of furniture, article of furniture
               => furnishings
                  => instrumentality, instrumentation
                     => artifact, artefact
                        => object, physical object
                           => entity, something
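The hypernym listings above are chains that climb from a synset to a top-level concept. As a minimal sketch of how such a chain can be walked, the snippet below hard-codes two of the nine senses (1 and 3) in a small dictionary. The dictionary, the label "board (committee sense)", and the function names are illustrative assumptions; this is not the WordNet API or its data format.

```python
# Toy hypernym table copied from the `wn board -hypen` listing (senses 1 and 3 only).
# Each synset label maps to its direct hypernym; None marks the top of a chain.
# The table and the label "board (committee sense)" are illustrative, not WordNet's format.
HYPERNYM = {
    "board (committee sense)": "committee, commission",
    "committee, commission": "administrative unit",
    "administrative unit": "unit, social unit",
    "unit, social unit": "organization, organisation",
    "organization, organisation": "social group",
    "social group": "group, grouping",
    "group, grouping": None,
    "board, plank": "lumber, timber",
    "lumber, timber": "building material",
    "building material": "artifact, artefact",
    "artifact, artefact": "object, physical object",
    "object, physical object": "entity, something",
    "entity, something": None,
}


def hypernym_chain(synset):
    """Return the list of synsets from `synset` up to its top-level concept."""
    chain = [synset]
    # Follow direct-hypernym links until a synset has no recorded parent.
    while HYPERNYM.get(chain[-1]) is not None:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def depth(synset):
    """Number of hypernym links between `synset` and the top of its chain."""
    return len(hypernym_chain(synset)) - 1


if __name__ == "__main__":
    print(" => ".join(hypernym_chain("board, plank")))
    print(depth("board (committee sense)"))
```

The two senses end in different top-level concepts ("entity, something" versus "group, grouping"), which is one way the hierarchy separates the committee sense from the plank sense.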

  14. Antonymy • “x” vs. “not-x” • “rich” vs. “poor”? • {rise, ascend} vs. {fall, descend}

  15. Other relations • Meronymy: X is a meronym of Y when native speakers of English accept sentences similar to “X is a part of Y”, “X is a member of Y”. • Hyponymy: {tree} is a hyponym of {plant}. • Hierarchical structure based on hyponymy (and hypernymy).

  16. Other features of WordNet • Index of familiarity • Polysemy

  17. Familiarity and polysemy • board used as a noun is familiar (polysemy count = 9) • bird used as a noun is common (polysemy count = 5) • cat used as a noun is common (polysemy count = 7) • house used as a noun is familiar (polysemy count = 11) • information used as a noun is common (polysemy count = 5) • retrieval used as a noun is uncommon (polysemy count = 3) • serendipity used as a noun is very rare (polysemy count = 1)

  18. Compound nouns • advisory board • appeals board • backboard • backgammon board • baseboard • basketball backboard • big board • billboard • binder's board • binder board • blackboard • board game • board measure • board meeting • board member • board of appeals • board of directors • board of education • board of regents • board of trustees

  19. Overview of senses 1. board -- (a committee having supervisory powers; "the board has seven members") 2. board -- (a flat piece of material designed for a special purpose; "he nailed boards across the windows") 3. board, plank -- (a stout length of sawn timber; made in a wide variety of sizes and used for many purposes) 4. display panel, display board, board -- (a board on which information can be displayed to public view) 5. board, gameboard -- (a flat portable surface (usually rectangular) designed for board games; "he got out the board and set up the pieces") 6. board, table -- (food or meals in general; "she sets a fine table"; "room and board") 7. control panel, instrument panel, control board, board, panel -- (an insulated panel containing switches and dials and meters for controlling electrical devices; "he checked the instrument panel"; "suddenly the board lit up like a Christmas tree") 8. circuit board, circuit card, board, card -- (a printed circuit that can be inserted into expansion slots in a computer to increase the computer's capabilities) 9. dining table, board -- (a table at which meals are served; "he helped her clear the dining table"; "a feast was spread upon the board")

  20. Top-level concepts • {act, action, activity} • {animal, fauna} • {artifact} • {attribute, property} • {body, corpus} • {cognition, knowledge} • {communication} • {event, happening} • {feeling, emotion} • {food} • {group, collection} • {location, place} • {motive} • {natural object} • {natural phenomenon} • {person, human being} • {plant, flora} • {possession} • {process} • {quantity, amount} • {relation} • {shape} • {state, condition} • {substance} • {time}

  21. Text Summarization

  22. The BIG problem • Information overload: 3 Billion+ URLs catalogued by Google • Possible approaches: • information retrieval • document clustering • information extraction • visualization • question answering • text summarization

  23. MILAN, Italy, April 18. A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district. Few details of the crash were available, but news reports about it immediately set off fears that it might be a terrorist act akin to the Sept. 11 attacks in the United States. Those fears sent U.S. stocks tumbling to session lows in late morning trading. Witnesses reported hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station. Italian state television said the crash put a hole in the 25th floor of the Pirelli building. News reports said smoke poured from the opening. Police and ambulances rushed to the building in downtown Milan. No further details were immediately available.


  25. What happened? MILAN, Italy, April 18. A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district. Few details of the crash were available, but news reports about it immediately set off fears that it might be a terrorist act akin to the Sept. 11 attacks in the United States. Those fears sent U.S. stocks tumbling to session lows in late morning trading. Witnesses reported hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station. Italian state television said the crash put a hole in the 25th floor of the Pirelli building. News reports said smoke poured from the opening. Police and ambulances rushed to the building in downtown Milan. No further details were immediately available. How many victims? When, where? Says who? Was it a terrorist act? What was the target?

  26. 1. How many people were injured? 2. How many people were killed? (age, number, gender, description) 3. Was the pilot killed? 4. Where was the plane coming from? 5. Was it an accident (technical problem, illness, terrorist act)? 6. Who was the pilot? (age, number, gender, description) 7. When did the plane crash? 8. How tall is the Pirelli building? 9. Who was on the plane with the pilot? 10. Did the plane catch fire before hitting the building? 11. What was the weather like at the time of the crash? 12. When was the building built? 13. What direction was the plane flying? 14. How many people work in the building? 15. How many people were in the building at the time of the crash? 16. How many people were taken to the hospital? 17. What kind of aircraft was used?

  27. Some concepts • Abstracts: “a concise summary of the central subject matter of a document” [Paice90]. • Indicative, informative, and critical summaries • Extracts (representative paragraphs/sentences/phrases) • Still grammatical

  28. Types of summaries • Dimensions • Single-document vs. multi-document • Context • Query-specific vs. query-independent • Genres

  29. Genres • headlines • outlines • minutes • biographies • abridgments • sound bites • movie summaries • chronologies, etc. [Mani and Maybury 1999]

  30. Bush may send 500-1,000 troops to Liberia Wednesday, July 2, 2003 Posted: 7:36 PM EDT (2336 GMT) President Bush could announce later this week that he is sending 500 to 1,000 peacekeeping troops to Liberia, two senior officials told CNN. Facing mounting international pressure to have the United States lead a Liberia mission that also would include West African peacekeepers, Bush discussed such a deployment Wednesday, the officials said. U.N. Secretary-General Kofi Annan and others have talked of a U.S. deployment of 2,000 troops, but U.S. officials told CNN any deployment would be no more than half that. The officials said the timing of the announcement could be slowed by efforts to get Liberian President Charles Taylor, who faces war crimes charges by a U.N. court in neighboring Sierra Leone, to step down and leave the war-torn country. The White House official line is that Taylor should leave now and face war crimes trial later. But Bush used different language Wednesday regarding Taylor, saying simply that he should leave the country. Many analysts read the new Bush language as a sign the president was prepared to accept Taylor going into exile in a country that would not extradite him to Sierra Leone. Bush has been reluctant to commit U.S. troops to Liberia, which was founded in 1822 as a settlement for freed American slaves, and hoped West African peacekeepers would be enough, with the possible exception of Marine reinforcements at the U.S. Embassy in Monrovia. But Secretary of State Powell has been arguing in favor of a U.S. commitment, sources said -- citing recent peacekeeping commitments by France in the Ivory Coast and Great Britain in Sierra Leone. Bush leaves this weekend for his first trip to Africa, and the Liberia issue has become a test of his promise to make a commitment to promoting peace, democracy and economic development in Africa, administration officials said. One senior official said, "There will be a U.S. role, but the details are still in somewhat of a flux." Another senior official said "it is not sealed" but a force of 500 to no more than 1,000 Army troops was under serious discussion and that there were "strong indications" a final decision in favor of a deployment "will be sooner rather than later."

  31. Despite suggestions by some administration officials to the contrary, neither Defense Secretary Donald Rumsfeld nor Joint Chiefs Chairman Gen. Richard Myers has expressed reservations about involving U.S. troops in Liberia, key aides to both men told CNN. An aide to Rumsfeld said the defense secretary believes the mission would fit into the category of "lesser contingencies" the Pentagon is prepared to handle. Sources close to Myers said the general shares that view. Pentagon officials acknowledged forces are stretched thin overseas -- in Afghanistan, Iraq and the Balkans -- but said the small number of troops required for Liberia would not create problems. But other administration officials said the Pentagon is wary in part because of the humiliating memories of the last major U.S. deployment in Africa -- to Somalia -- which ended in retreat 10 years ago after 18 Americans were killed. Several senior officials said reports that Bush had already signed orders authorizing a deployment were inaccurate. But these officials said planning was intensifying, including detailed conversations with the United Nations and with West African nations that would be part of a peacekeeping mission. Pentagon sources told CNN a unit of 50 U.S. Marines known as a FAST team -- for Fleet Anti-terrorism Security Team -- was on standby in Rota, Spain, for possible deployment to reinforce security at the U.S. Embassy. Several hundred Americans remain in Liberia, where intense fighting between Taylor's government and rebel forces has continued despite a June 17 cease-fire. Nigeria had been working with Taylor on a possible deal for him to take refuge in that country. One problem, however, is that Taylor has agreed to deals before, then backed out. Officials said the United States was working closely with members of the Economic Community of West African States on diplomatic efforts, particularly Ghana and Nigeria. 
Comments Tuesday by White House press secretary Ari Fleischer that Bush was considering sending troops provoked a nearly instantaneous reaction in Monrovia, where thousands of people gathered outside the U.S. Embassy to cheer a possible American presence. "We feel America can bring peace because they are the original founders of this nation, and secondly, they are the superpower of the world," one man said.

  32. Bush may send 500-1,000 troops to Liberia President Bush could announce later this week that he is sending 500 to 1,000 peacekeeping troops to Liberia. Bush discussed such a deployment Wednesday. The White House official line is that Liberian President Taylor should leave now and face war crimes trial later. A unit of 50 U.S. Marines known as a FAST team was on standby in Rota, Spain. Several hundred Americans remain in Liberia, where intense fighting between Taylor's government and rebel forces has continued despite a June 17 cease-fire. …

  33. Bush may send 500-1,000 troops to Liberia Wednesday, July 2, 2003 Posted: 7:36 PM EDT (2336 GMT) President Bush could announce later this week that he is sending 500 to 1,000 peacekeeping troops to Liberia, two senior officials told CNN. Facing mounting international pressure to have the United States lead a Liberia mission that also would include West African peacekeepers, Bush discussed such a deployment Wednesday, the officials said. U.N. Secretary-General Kofi Annan and others have talked of a U.S. deployment of 2,000 troops, but U.S. officials told CNN any deployment would be no more than half that. The officials said the timing of the announcement could be slowed by efforts to get Liberian President Charles Taylor, who faces war crimes charges by a U.N. court in neighboring Sierra Leone, to step down and leave the war-torn country. The White House official line is that Taylor should leave now and face war crimes trial later. But Bush used different language Wednesday regarding Taylor, saying simply that he should leave the country. Many analysts read the new Bush language as a sign the president was prepared to accept Taylor going into exile in a country that would not extradite him to Sierra Leone. … Pentagon sources told CNN a unit of 50 U.S. Marines known as a FAST team -- for Fleet Anti-terrorism Security Team -- was on standby in Rota, Spain, for possible deployment to reinforce security at the U.S. Embassy. Several hundred Americans remain in Liberia, where intense fighting between Taylor's government and rebel forces has continued despite a June 17 cease-fire. …

  34. What does summarization involve? • Three stages (typically) • content identification • conceptual organization • realization

  35. Human summarization and abstracting • What professional abstractors do • Ashworth: • “To take an original article, understand it and pack it neatly into a nutshell without loss of substance or clarity presents a challenge which many have felt worth taking up for the joys of achievement alone. These are the characteristics of an art form”.

  36. Borko and Bernier 75 • The abstract and its use: • Abstracts promote current awareness • Abstracts save reading time • Abstracts facilitate selection • Abstracts facilitate literature searches • Abstracts improve indexing efficiency • Abstracts aid in the preparation of reviews

  37. Cremmins 82, 96 • American National Standard for Writing Abstracts: • State the purpose, methods, results, and conclusions presented in the original document, either in that order or with an initial emphasis on results and conclusions. • Make the abstract as informative as the nature of the document will permit, so that readers may decide, quickly and accurately, whether they need to read the entire document. • Avoid including background information or citing the work of others in the abstract, unless the study is a replication or evaluation of their work.

  38. Cremmins 82, 96 • Do not include information in the abstract that is not contained in the textual material being abstracted. • Verify that all quantitative and qualitative information used in the abstract agrees with the information contained in the full text of the document. • Use standard English and precise technical terms, and follow conventional grammar and punctuation rules. • Give expanded versions of lesser known abbreviations and acronyms, and verbalize symbols that may be unfamiliar to readers of the abstract. • Omit needless words, phrases, and sentences.

  39. Cremmins 82, 96 • Original version: There were significant positive associations between the concentrations of the substance administered and mortality in rats and mice of both sexes. There was no convincing evidence to indicate that endrin ingestion induced any of the different types of tumors which were found in the treated animals. • Edited version: Mortality in rats and mice of both sexes was dose related. No treatment-related tumors were found in any of the animals.

  40. Morris et al. 92 • Reading comprehension of summaries • 75% redundancy of English [Shannon 51] • Compare manual abstracts, Edmundson-style extracts, and full documents • Extracts containing 20% or 30% of original document are effective surrogates of original document • Performance on 20% and 30% extracts is no different than informative abstracts

  41. Luhn 58 • Very first work in automated summarization • Computes measures of significance • Words: • stemming • bag of words • [Figure: resolving power of significant words; axes: FREQUENCY vs. WORDS]

  42. Luhn 58 • Sentences: • concentration of high-score words • Cutoff values established in experiments with 100 human subjects • [Figure: a SENTENCE containing a 7-word span (ALL WORDS, numbered 1-7) with 4 SIGNIFICANT WORDS marked *] • SCORE = 4^2 / 7 ≈ 2.3
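The sentence score on this slide (4 significant words in a 7-word span) can be reproduced with a short sketch. The function below assumes the usual reading of Luhn's measure: find a cluster that starts and ends with a significant word, square the number of significant words in it, and divide by the cluster length. The `max_gap` parameter (how many non-significant words may separate two significant ones inside a cluster) and the default value 4 are assumptions not stated on the slide, as are the function and variable names.

```python
def luhn_score(tokens, significant, max_gap=4):
    """Score a sentence by its best cluster of significant words.

    A cluster starts and ends with a significant word and may contain at most
    `max_gap` consecutive non-significant words (assumed threshold).
    Cluster score = (significant words in cluster)^2 / (cluster length in words).
    """
    positions = [i for i, t in enumerate(tokens) if t.lower() in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = 0  # index into `positions` where the current cluster begins
    for k in range(1, len(positions) + 1):
        # Close the cluster at the end of the sentence or when the gap is too wide.
        if k == len(positions) or positions[k] - positions[k - 1] - 1 > max_gap:
            count = k - start
            span = positions[k - 1] - positions[start] + 1
            best = max(best, count * count / span)
            start = k
    return best


if __name__ == "__main__":
    # Hypothetical sentence: 4 significant words ("sig") inside a 7-word span.
    sentence = "w sig w sig sig w w sig w".split()
    print(round(luhn_score(sentence, {"sig"}), 1))
```

Squaring the count rewards dense clusters: four significant words in seven score higher than four spread over a longer span.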

  43. Edmundson 69 • Cue method: • stigma words (“hardly”, “impossible”) • bonus words (“significant”) • Key method: similar to Luhn • Title method: title + headings • Location method: • sentences under headings • sentences near beginning or end of document and/or paragraphs (also [Baxendale 58])

  44. Edmundson 69 • Linear combination of four features: α1·C + α2·K + α3·T + α4·L • Manually labelled training corpus • Key not important! • [Figure: bar chart on a 0-100% scale comparing RANDOM, KEY, TITLE, CUE, LOCATION, C + K + T + L, and C + T + L]
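A minimal sketch of the four-feature linear combination follows. The word lists reuse the examples from the previous slide; the way each feature is counted, the default weights (with the Key weight set to 0 to echo "Key not important"), and all names are illustrative assumptions rather than Edmundson's actual definitions or trained values.

```python
# Cue words taken from the slide examples; real cue dictionaries are much larger.
BONUS = {"significant"}
STIGMA = {"hardly", "impossible"}


def edmundson_score(sentence, index, n_sentences, title_words, key_words,
                    weights=(1.0, 0.0, 1.0, 1.0)):
    """Weighted sum a1*C + a2*K + a3*T + a4*L for one sentence.

    The default weights are placeholders, not values from the original study.
    """
    words = [w.strip('.,;:!?"').lower() for w in sentence.split()]
    cue = sum(w in BONUS for w in words) - sum(w in STIGMA for w in words)
    key = sum(w in key_words for w in words)      # high-frequency content words
    title = sum(w in title_words for w in words)  # overlap with title/headings
    # Location: reward the first and last sentence of the document (simplified).
    location = 1 if index in (0, n_sentences - 1) else 0
    a1, a2, a3, a4 = weights
    return a1 * cue + a2 * key + a3 * title + a4 * location


if __name__ == "__main__":
    s = "The results are significant for summarization."
    print(edmundson_score(s, 0, 3, {"summarization"}, {"results"}))
```

Passing `weights=(1, 1, 1, 1)` turns the Key feature back on, which makes it easy to compare the C + T + L and C + K + T + L configurations named on the slide.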

  45. Paice 90 • Survey up to 1990 • Techniques that (mostly) failed: • syntactic criteria [Earl 70] • indicator phrases (“The purpose of this article is to review…”) • Problems with extracts: • lack of balance • lack of cohesion • anaphoric reference • lexical or definite reference • rhetorical connectives

  46. Paice 90 • Lack of balance • later approaches based on text rhetorical structure • Lack of cohesion • recognition of anaphors [Liddy et al. 87] • Example: “that” is • nonanaphoric if preceded by a research-verb (e.g., “demonstrat-”), • nonanaphoric if followed by a pronoun, article, quantifier,…, • external if no later than 10th word, • else internal

  47. Brandow et al. 95 • ANES: commercial news from 41 publications • “Lead” achieves acceptability of 90% vs. 74.4% for “intelligent” summaries • 20,997 documents • words selected based on tf*idf • sentence-based features: • signature words • location • anaphora words • length of abstract
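The slide says words are selected by tf*idf but does not give the exact weighting. The sketch below uses one common variant, raw term frequency times log(N / document frequency), as an assumption; the function name, the `top_n` parameter, and the toy corpus are hypothetical and are not taken from the ANES system.

```python
import math
from collections import Counter


def tfidf_signature_words(doc_tokens, corpus, top_n=3):
    """Return the `top_n` words of `doc_tokens` with the highest tf*idf.

    `corpus` is a list of tokenized documents and is assumed to include
    the document being scored, so every word has document frequency >= 1.
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each word once per document
    tf = Counter(doc_tokens)
    scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf if df[w] > 0}
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


if __name__ == "__main__":
    corpus = [
        ["the", "plane", "crashed", "the", "plane"],
        ["the", "market", "fell"],
        ["the", "troops", "left"],
    ]
    print(tfidf_signature_words(corpus[0], corpus, top_n=2))
```

A word such as "the" that occurs in every document gets an idf of log(1) = 0, so it never becomes a signature word regardless of how often it appears.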

  48. Brandow et al. 95 • Sentences with no signature words are included if between two selected sentences • Evaluation done at 60, 150, and 250 word length • Non-task-driven evaluation: “Most summaries judged less-than-perfect would not be detectable as such to a user”
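The first rule on this slide, including a sentence that sits between two selected sentences, can be sketched as a post-processing step over sentence indices. The function name and the representation of a summary as a list of indices are assumptions for illustration; the sketch treats every unselected sentence as one without signature words and does not enforce the 60/150/250-word length limits.

```python
def fill_gaps(selected, n_sentences):
    """Add any unselected sentence whose two immediate neighbours are both selected.

    `selected` holds 0-based indices of sentences chosen by the scoring step.
    Only the originally selected sentences count as neighbours, so the rule
    does not cascade.
    """
    chosen = set(selected)
    filled = set(chosen)
    for i in range(1, n_sentences - 1):
        if i not in chosen and i - 1 in chosen and i + 1 in chosen:
            filled.add(i)
    return sorted(filled)


if __name__ == "__main__":
    # Sentence 1 is bridged because 0 and 2 are selected; 3 and 4 are not.
    print(fill_gaps([0, 2, 5], 6))
```

Bridging single-sentence gaps in this way is one response to the lack-of-cohesion problem noted for extracts, since it keeps adjacent selected sentences connected.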
