
Ontology-based Extraction of Information from the Internet

Ontology-based Extraction of Information from the Internet. Jan Korst, Philips Research. Joint work with Michael Verschoor, Nick de Jong, and Gijs Geleijnse. Overview: Context; Ontologies; Searching for enumerations / tables in web pages; Case Study: Searching for famous persons on the web.


Presentation Transcript


  1. Ontology-based Extraction of Information from the Internet. Jan Korst, Philips Research. Joint work with Michael Verschoor, Nick de Jong, and Gijs Geleijnse

  2. Overview • Context • Ontologies • Searching for enumerations / tables in web pages • Case Study: Searching for famous persons on the web • Concluding remarks

  3. Context • recommender system: electronic program guide, cultural agenda • matching and reasoning • preferences, personal history, and calendar • ontologies and metadata • recommendations for TV shows, expositions in museums, theatre shows, etc.

  4. Ontologies • An ontology is a “specification of a conceptualization” [Tom Gruber]. • In other words: a formal description of the concepts and their relationships in a certain domain. • Example: music domain • concepts: composers, songs, albums, performers, … • relationships: … • To define/specify ontologies for given knowledge domains, semantic web languages such as RDF(S) and OWL are useful.

  5. Ontologies • An ontology O is defined by a 4-tuple (C, I, P, T), where: • C is a set of classes c, e.g. composer, song, album, performer, … • I = { I(c) | c ∈ C }, with I(c) the set of instances of class c • P is a set of properties p(c, c’) for some c, c’ ∈ C, e.g. is_composer_of(composer, song), is_contained_in(song, album) • T = { T(p) | p ∈ P }, with T(p) ⊆ { (s, p, o) | s ∈ I(c), o ∈ I(c’) } for each p ∈ P the set of true statements (triples).
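
As a minimal sketch, the 4-tuple could be held in a small Python structure. The class name `Ontology`, its field names, and the example values are illustrative assumptions and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical container for the 4-tuple (C, I, P, T)."""
    classes: set = field(default_factory=set)        # C
    instances: dict = field(default_factory=dict)    # I: class -> set of instances
    properties: dict = field(default_factory=dict)   # P: property name -> (c, c')
    triples: dict = field(default_factory=dict)      # T: property name -> set of (s, p, o)

    def add_triple(self, s, p, o):
        """Store (s, p, o) only if s is in I(c) and o is in I(c') for p(c, c')."""
        c, c2 = self.properties[p]
        if s not in self.instances.get(c, set()) or o not in self.instances.get(c2, set()):
            raise ValueError(f"({s}, {p}, {o}) does not fit the classes of {p}")
        self.triples.setdefault(p, set()).add((s, p, o))


# Illustrative values only.
music = Ontology(
    classes={"composer", "song"},
    instances={"composer": {"Bach"}, "song": {"Air"}},
    properties={"is_composer_of": ("composer", "song")},
)
music.add_triple("Bach", "is_composer_of", "Air")
```

Keeping T keyed by property mirrors the definition T = { T(p) | p ∈ P }; a flat set of triples would serve equally well.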

  6. Problem statement • For a partially given ontology O’ = (C, I’, P, T’) of a given knowledge domain, with I’ ⊆ I and T’ ⊆ T, extend I’ to I’’ and T’ to T’’ to approximate I and T as well as possible. • In other words: how can we populate databases? • Research questions: - Can this be automated? - Can we do this by extracting information from the web?

  7. Quality of Approximation • For each class c, we define precision and recall as follows: • precision(c) = |I’’(c) ∩ I(c)| / |I’’(c)| • recall(c) = |I’’(c) ∩ I(c)| / |I(c)| • For each property p, precision and recall are defined likewise.
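
A minimal sketch of these two measures, assuming the standard set-based definitions: precision is the fraction of extracted instances that are correct, and recall is the fraction of true instances that were extracted. The function name and the example sets are illustrative assumptions.

```python
def precision_recall(extracted, truth):
    """Return (precision, recall) of an extracted set against the true set."""
    extracted, truth = set(extracted), set(truth)
    correct = extracted & truth
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(truth) if truth else 0.0
    return precision, recall


# Illustrative data: 3 of the 4 extracted terms are countries, 3 of 5 countries were found.
p, r = precision_recall(
    {"france", "germany", "team", "usa"},
    {"france", "germany", "usa", "italy", "spain"},
)
print(p, r)
```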

  8. Searching for enumerations on the web • basic idea: words in an enumeration tend to be of the same class. • Given a small subset of instances of a given class, we want to automatically extend this subset: more-of-the-same. • algorithm: - select web pages in which a given sequence or given subset of instances occurs, using Google. - scan these pages for enumerations in which one or more of the given instances occur. - extract other terms that are in these enumerations. • A similar approach has been applied to a corpus of documents in molecular biology [Nenadić, Spasić & Ananiadou, 2002].
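
The scanning and extraction steps can be sketched as follows. This is a rough illustration, not the authors' program: it assumes the pages have already been fetched as plain text, treats only comma-separated runs of short terms as enumerations, and uses hypothetical names (`extract_from_enumerations`, `min_items`, `max_words`).

```python
import re
from collections import Counter


def extract_from_enumerations(pages, seeds, min_items=3, max_words=2):
    """Count terms that share a comma-separated enumeration with a seed instance."""
    seeds = {s.lower() for s in seeds}
    counts = Counter()
    for text in pages:
        # Crude segmentation: an enumeration never crosses these delimiters.
        for segment in re.split(r"[.;:!?\n]+", text.lower()):
            items = [i.strip() for i in re.split(r",|\band\b", segment)]
            items = [i for i in items if i and len(i.split()) <= max_words]
            if len(items) >= min_items and seeds & set(items):
                counts.update(i for i in items if i not in seeds)
    return counts


pages = [
    "Composers: Bach, Vivaldi, Haydn, Handel.",
    "Top teams: Brazil, Italy, Spain.",  # no seed present, so ignored
]
found = extract_from_enumerations(pages, ["bach", "vivaldi", "mozart"])
print(found.most_common())
```

A heuristic this simple also picks up noise terms, which is why a separate filtering step is needed afterwards.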

  9. General structure of the algorithm: Preselection of relevant web pages → Extraction of Instances/Statements → Filter to remove false positives

  10. Examples "bach vivaldi mozart"611 --> [63] bach[154], mozart[46], vivaldi[45], haydn[17], beethoven[14], ensembles[9], handel[9], chopin[7], haendel[5], schubert[5], bizet[4], j[4], albinoni[3], brahms[3], s[3], sanz[3], tartini[3], 2[2], chaconne[2], corelligeminiani[2], gershwin[2], gluck[2], http[2], inteacutegrale[2], minor[2], paganini[2], ravel[2], strauss[2], stravinsky[2], tchaikovsky[2], teleman[2], telemann[2], albeniz[1], bellini[1], benda[1], berlioz[1], bloch[1], boccherini[1], boellman[1], boieldieu[1], bruch[1], caccini[1], caldera[1], corelli[1], diabelli[1], dowland[1], giuliani[1], grieg[1], homekcrrcom[1], jsbach[1], martin[1], milano[1], ortiz[1], pergolesi[1], prokofiev[1], purcell[1], rimskykorsakov[1], schumann[1], smetana[1], title[1], torelli[1], vieuxtemps[1]

  11. Examples (2) "france germany england italy" 246 --> [54] france[322], germany[259], brazil[257], italy[239], argentina[223], england[218], spain[215], holland[212], yugoslavia[140], croatia[133], denmark[129], norway[122], chile[91], belgium[88], nigeria[83], romania[83], mexico[66], bulgaria[59], colombia[54], scotland[34], austria[33], cameroon[30], team[25], usa[22], sth[18], states[16], morocco[13], ar[12], netherlands[12], saudi[11], africa[10], bahamas[10], paraguay[10], czech[8], jamaica[8], scandinavia[8], canada[7], japan[7], acquitane[4], australia[4], bali[4], caribbean[4], china[4], czechoslovakia[4], luxembourg[4], poland[4], us[4], flanders[2], acadeacutemiques[1], asn[1], cortona[1], europe[1], korea[1], park[1]

  12. Examples (3) poincare hilbert brouwer 1110 --> [90] brouwer[20], hilbert[20], abel[18], deligne[18], gregory[18], mandelbrot[18], taylor[18], turing[18], cavalieri[17], poisson[17], banach[16], kolmogorov[16], wiener[16], goldbach[15], grassmann[15], cohen[13], hausdorff[13], jacobi[13], kronecker[13], torricelli[13], vinogradov[13], riemann[12], dedekind[11], frege[11], artin[10], babbage[10], barrow[10], boole[10], bourgain[10], eukleidõs[10], euler[10], fraenkel[10], heaviside[10], legendre[10], möbius[10], shannon[10], tchebychev[10], borel[9], fibonacci[9], fisher[9], grothendieck[9], aryabhata[8], birkhoff[8], bolyai[8], cayley[8], church[8], descartes[8], hypatie[8], markov[8], minkowski[8], bolzano[7], cramer[7], dee[7], painlevÕ[7], cantor[6], morgan[6], puthagoras[6], gauss[5], haldane[5], hauptman[5], irons[5], lejeune[5], schwartz[5], lie[4], bayes[3], poincareacute[3], poincarÕ[3], biography[2], brahmagupta[2], carnap[2], goumldel[2], gödel[2], …

  13. Hypernym-based filtering Patterns that indicate hypernym relations are distinguished: “h such as i1, i2, …, in” and “i1, i2, …, in and other h” [Hearst, 1992]. In these patterns h is the plural of the intended class.
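
A minimal sketch of matching the two patterns with regular expressions. It assumes that instances are capitalised terms and that the text is plain English prose; the function name `hearst_instances` is hypothetical. A candidate instance would pass the filter if it appears in the returned set.

```python
import re


def hearst_instances(text, plural):
    """Collect terms i that occur as 'h such as i1, ..., in' or 'i1, ..., in and other h'."""
    h = re.escape(plural)
    item = r"[A-Z][\w-]*(?: [A-Z][\w-]*)*"  # assumption: instances are capitalised
    listing = rf"{item}(?:, {item})*(?:,? and {item})?"
    found = set()
    for m in re.finditer(rf"\b{h} such as ({listing})", text):
        found.update(re.split(r",? and |, ", m.group(1)))
    for m in re.finditer(rf"({item}(?:, {item})*),? and other {h}\b", text):
        found.update(m.group(1).split(", "))
    return found


text = ("He admired composers such as Bach, Vivaldi and Mozart. "
        "Haydn, Handel and other composers followed.")
print(sorted(hearst_instances(text, "composers")))
```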

  14. Geographic Data Extract all countries:
      Input set                 Precision  Recall
      France, China, Germany    0.89       0.99
      Georgia, Ghana, Latvia    0.84       0.99
      Kiribati, Monaco, Togo    0.79       0.99
      Find out which countries have a border in common.

  15. Case Study: Finding Famous Persons on the Web Objective: generate a long list of famous persons, by searching the web. - A famous person is a person that gets enough hits when being Googled. - We restrict ourselves to persons that have already died.

  16. Definition of number of hits Using only the last name is not specific enough. e.g. Bach, Smith Even the full name might not be specific enough. e.g. Theo van Gogh In addition, some persons score better with middle name, others without. e.g. Johann Sebastian Bach vs. Johann Bach Antonio Vivaldi vs. Antonio Lucio Vivaldi While others are best known with initials only. e.g. HG Wells, DH Lawrence

  17. Definition of number of hits We use the number of hits that are found with query: “<last name> (<year of birth> - <year of death>)” e.g. “Bach (1685 – 1750)” By not using the full name, we combine different variants. e.g. Johann Sebastian Bach and JS Bach For kings, queens, popes, etc., the Latin ordinal number is used as last name. This combines the variants in different languages. e.g. Charles V, Carlos V, Karel V

  18. Basic idea We use potential time intervals “(<year of birth> - <year of death>)” as starting point to search for persons. Issue exact queries to Google of the following form: allintitle: “(y1 – y2)” where y1 ∈ [1000..1999] and y2-y1 ∈ [20..110], and analyse the summaries Google returns. Look for the six words that precede “(y1 – y2)” and analyse these words.
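
The query space and the six-word lookup can be sketched as below. The ranges are taken from the slide; the function names and the sample summary string are illustrative assumptions, and the actual submission of queries to Google is not shown.

```python
import re


def interval_queries(first=1000, last=1999, min_age=20, max_age=110):
    """Yield one allintitle query per candidate (year of birth, year of death) pair."""
    for y1 in range(first, last + 1):
        for age in range(min_age, max_age + 1):
            yield f'allintitle: "({y1} - {y1 + age})"'


def preceding_words(summary, y1, y2, n=6):
    """Return, for each occurrence of '(y1 - y2)', the n words that precede it."""
    pattern = rf"\(\s*{y1}\s*[-–]\s*{y2}\s*\)"
    return [summary[:m.start()].split()[-n:] for m in re.finditer(pattern, summary)]


# 1000 birth years times 91 possible ages gives 91,000 queries.
print(sum(1 for _ in interval_queries()))

sample = "Learning Library: Arthur Wellesley, 1st Duke of Wellington (1769 - 1852) biography"
print(preceding_words(sample, 1769, 1852))
```

Six words is wide enough to contain a three-word name plus some leading context, which the later cleaning steps strip away.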

  19. Google batch processing To process the Google queries we use a program that allows batch processing (Nick de Jong): Program allows parallel execution of multiple queries. file with queries → GoogleQuery → file with results
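
A minimal sketch of parallel batch execution, assuming a thread pool is an acceptable stand-in for the original program (whose internals the slides do not describe). `run_batch` and `fake_search` are hypothetical names; `fake_search` only imitates a search call so that the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(queries, run_query, workers=8):
    """Apply run_query to every query in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(zip(queries, pool.map(run_query, queries)))


def fake_search(query):
    # Hypothetical stand-in for a real search request.
    return f"results for {query}"


batch = run_batch(['allintitle: "(1685 - 1750)"', 'allintitle: "(1756 - 1791)"'], fake_search)
for query, result in batch:
    print(query, "->", result)
```

In a file-based setup the queries would be read line by line from one file and the pairs written to another.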

  20. Main Problem: how to separate person names from other names. • Art Blakey vs. Art Deco • West Mae vs. West Virginia • Raul Delcroix vs. Real Decreto • HP Lovecraft vs. HP Inkjet • Koye Somefun vs. Have SomeFun • Potential approaches: • filter out non-persons by using a list of stop words. • filter out non-persons by using an exhaustive list of first names. • carry out further tests (“X was born in”). • We only used a list of 500 stop words, including: • Album, Anniversary, Archive, Articles, Biographie, Biography, Births, Boats, Burials, Catalog, Census,…
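
The stop-word approach can be sketched as follows. Only the eleven stop words quoted on the slide are included, and the function name `is_candidate_person` is a hypothetical choice.

```python
# The first entries of the stop-word list as quoted on the slide (the full list has 500).
STOP_WORDS = {
    "album", "anniversary", "archive", "articles", "biographie", "biography",
    "births", "boats", "burials", "catalog", "census",
}


def is_candidate_person(name, stop_words=STOP_WORDS):
    """Reject a word sequence as soon as one of its words is a stop word."""
    words = [w.strip(".,;:").lower() for w in name.split()]
    return not any(w in stop_words for w in words)


print(is_candidate_person("Art Blakey"))      # kept
print(is_candidate_person("Alabama Census"))  # rejected
print(is_candidate_person("Art Deco"))        # kept, unless "deco" is added to the list
```

The last call shows the limit of this filter: a non-person passes whenever none of its words happens to be listed.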

  21. Additional Problem: a single person can be presented in various ways: Vasilij Kandinskij; Wassily Kandinsky; Vasily Kandinsky; Vassily Kandinsky; Kandinsky, Wassily; Kandinsky Wassily. Johann Sebastian Bach; JS Bach; Johann Sebastian; Sebastian Bach; Bach, Johann Sebastian.

  22. Example of the word sequences that are found: [allintitle: "(1769 - 1852)" -genealogy -genealogie] 111 Rose-Philippine Duchesne ( Rose-Philippine Duchesne ( Wellesley, 1st Duke of Wellington ( Home Study Service Rose Philippine Duchesne Arthur, 1st Duke of Wellington ( The Duke of Wellington ( Wellesley, 1st Duke of Wellington ( Arthur Wellesley, Duke of Wellington. ( Wellesley, first Duke of Wellington ( People > Duke of Wellington ( > Pobl > Dug Wellington ( medal depicting Duke of Wellington ( Arthur Wellesley Wellington ( Wellesley, 1st Duke of Wellington ( John Landseer ( Wellington, Arthur Wellesley,Duke of, Learning Library: WELLINGTON, DUKE OF (

  23. Another Example: George Frederick Handel ( GEORGE F. HANDEL ( X. George Frederick Handel. ( Handel, George Frideric ( George Frederic Handel, ... George Frederic Handel ( CD:Composers - H: Handel, George Frederic (German/British Classical DVD: Handel, George Frederic (German/British, George Frederic Handel ( ... George Frideric HANDEL ( Georg Frideric Handel | from Alibris George Frideric Handel ( New Window. George Frideric Handel ( up artist Handel, George F. ( Giulio Cesare. by GF Handel ( piece by HANDEL, Georg Friedrich (

  24. 1. first reduce capitals: If a word consists of capitals only, then replace all but the first by lowercase letters. e.g. HANDEL → Handel Unless the word contains a hyphen. e.g. SAINT-SAENS → Saint-Saens Unless the word represents a Latin ordinal number. e.g. Louis XIV → Louis XIV Unless the word starts with ‘MC’. e.g. MCCULLOCH → McCulloch Unless the word is an abbreviation (initials). e.g. DE KNUTH → DE Knuth
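
These rules can be sketched as a single function. Two points are assumptions rather than statements from the slides: initials are recognised as all-capital words of at most two letters, and ordinal numbers are recognised with a Roman-numeral pattern. The name `reduce_capitals` is hypothetical.

```python
import re

# Roman numerals up to 3999; the lookahead rejects the empty string.
ROMAN = re.compile(r"^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")


def reduce_capitals(word):
    """Lower-case all but the first letter of an all-capitals word, with the listed exceptions."""
    if not word.isupper():
        return word                      # mixed case: leave untouched
    core = word.strip(".,;:()")          # ignore surrounding punctuation
    if ROMAN.match(core):
        return word                      # ordinal number, e.g. XIV
    if len(core) <= 2:
        return word                      # assumption: short words are initials, e.g. DE, GF
    if "-" in core:
        fixed = "-".join(part.capitalize() for part in core.split("-"))
    elif core.startswith("MC"):
        fixed = "Mc" + core[2:].capitalize()
    else:
        fixed = core.capitalize()
    return word.replace(core, fixed, 1)


print(" ".join(reduce_capitals(w) for w in "GEORGE F. HANDEL".split()))
```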

  25. Example: George Frederick Handel ( GEORGE F. HANDEL ( X. George Frederick Handel. ( Handel, George Frideric ( George Frederic Handel, ... George Frederic Handel ( CD:Composers - H: Handel, George Frederic (German/British Classical DVD: Handel, George Frederic (German/British, George Frederic Handel ( ... George Frideric HANDEL ( Georg Frideric Handel | from Alibris George Frideric Handel ( New Window. George Frideric Handel ( up artist Handel, George F. ( Giulio Cesare. by GF Handel ( piece by HANDEL, Georg Friedrich (

  26. Example: George Frederick Handel ( George F. Handel ( X. George Frederick Handel. ( Handel, George Frideric ( George Frederic Handel, ... George Frederic Handel ( CD:Composers - H: Handel, George Frederic (German/British, Classical Dvd: Handel, George Frederic (German/British, George Frederic Handel ( ... George Frideric Handel ( Georg Frideric Handel | from Alibris George Frideric Handel ( New Window. George Frideric Handel ( up artist Handel, George F. ( Giulio Cesare. by GF Handel ( piece by Handel, Georg Friedrich (

  27. 2. delete pre- and suffixes: Delete parts that cannot be part of the name. First delete suffix. Next, scan through the words from back to front, until e.g. a colon or point is encountered.
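
A minimal sketch of this step, under stated assumptions: the suffix is everything from the first opening parenthesis onward, and the backward scan stops at a word that ends in a colon or point (unless it is a single-letter initial such as "F.") or at one of a few separator tokens. The separator set and the function names are hypothetical.

```python
SEPARATORS = {"...", "|", ">", "-"}  # assumed tokens that cannot be part of a name


def is_initial(word):
    """True for a single capital followed by a point, e.g. 'F.'."""
    return len(word) == 2 and word[0].isupper() and word[1] == "."


def strip_affixes(text):
    """Drop the suffix after '(' and any prefix that precedes a colon, point, or separator."""
    words = text.split("(", 1)[0].split()
    if words and not is_initial(words[-1]):
        words[-1] = words[-1].rstrip(".,;:")
    kept = []
    for w in reversed(words):            # scan from back to front
        if kept and (w in SEPARATORS or (w[-1] in ":." and not is_initial(w))):
            break
        if w:
            kept.append(w)
    return " ".join(reversed(kept))


print(strip_affixes("CD:Composers - H: Handel, George Frederic (German/British,"))
print(strip_affixes("New Window. George Frideric Handel ("))
```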

  28. Example: George Frederick Handel ( George F. Handel ( X. George Frederick Handel. ( Handel, George Frideric ( George Frederic Handel, ... George Frederic Handel ( CD:Composers - H: Handel, George Frederic (German/British, Classical Dvd: Handel, George Frederic (German/British, George Frederic Handel ( ... George Frideric Handel ( Georg Frideric Handel | from Alibris George Frideric Handel ( New Window. George Frideric Handel ( up artist Handel, George F. ( Giulio Cesare. by GF Handel ( piece by Handel, Georg Friedrich (

  29. Example: George Frederick Handel George F. Handel X. George Frederick Handel Handel, George Frideric George Frederic Handel George Frederic Handel Handel, George Frederic Handel, George Frederic George Frederic Handel George Frideric Handel Georg Frideric Handel from Alibris George Frideric Handel George Frideric Handel up artist Handel, George F. by GF Handel piece by Handel, Georg Friedrich

  30. 3. correct inversions: If two words remain, where the first ends with a comma, then reverse. e.g. West, Mae → Mae West If three words remain, where the first ends with a comma, then move the first word to the end. e.g. Handel, George Frederick → George Frederick Handel If three words remain, where the second ends with a comma, then move the third word to the front. e.g. Van Gogh, Vincent → Vincent van Gogh Problem: not all inverted names contain commas.
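
The three comma rules can be sketched directly. The function name `correct_inversion` is hypothetical, and the sketch does not lower-case name particles, so it yields "Vincent Van Gogh" rather than "Vincent van Gogh".

```python
def correct_inversion(name):
    """Undo 'Last, First' style inversions for two- and three-word names."""
    words = name.split()
    if len(words) == 2 and words[0].endswith(","):
        return f"{words[1]} {words[0][:-1]}"
    if len(words) == 3 and words[0].endswith(","):
        # 'Handel, George Frederick': the first word is the last name.
        return " ".join(words[1:] + [words[0][:-1]])
    if len(words) == 3 and words[1].endswith(","):
        # 'Van Gogh, Vincent': the first two words form the last name.
        return " ".join([words[2], words[0], words[1][:-1]])
    return name


for raw in ["West, Mae", "Handel, George Frederick", "Van Gogh, Vincent", "Kandinsky Wassily"]:
    print(raw, "->", correct_inversion(raw))
```

The last input illustrates the problem noted on the slide: without a comma the inversion goes undetected.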

  31. Example: George Frederick Handel George F. Handel X. George Frederick Handel Handel, George Frideric George Frederic Handel George Frederic Handel Handel, George Frederic Handel, George Frederic George Frederic Handel George Frideric Handel Georg Frideric Handel from Alibris George Frideric Handel George Frideric Handel up artist Handel, George F. by GF Handel piece by Handel, Georg Friedrich

  32. Example: George Frederick Handel George F. Handel X. George Frederick Handel George Frideric Handel George Frederic Handel George Frederic Handel George Frederic Handel George Frederic Handel George Frederic Handel George Frideric Handel Georg Frideric Handel from Alibris George Frideric Handel George Frideric Handel up artist Handel, George F. by GF Handel piece by Handel, Georg Friedrich

  33. 4. save two- and three-word names Scan the list of strings and store those consisting of two or three words, provided that they do not contain stop words. In addition, count how often they are found.

  34. Example: Strings: George Frederick Handel; George F. Handel; X. George Frederick Handel; George Frideric Handel; George Frederic Handel; George Frederic Handel; George Frederic Handel; George Frederic Handel; George Frederic Handel; George Frideric Handel; Georg Frideric Handel; from Alibris George Frideric Handel; George Frideric Handel; up artist Handel, George F.; by GF Handel; piece by Handel, Georg Friedrich. Counts: George Frederic Handel 5; George Frideric Handel 2; George F. Handel 1; George Frederick Handel 1; Georg Frideric Handel 1; by GF Handel 1. For each lastname/years combination the form that was found most often is used.
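
Counting the stored forms and picking the most frequent one per lastname/years combination can be sketched as below. The key layout `(last name, year of birth, year of death)`, the function name, and the shortened input list are illustrative assumptions.

```python
from collections import Counter


def most_common_forms(names_by_key, stop_words=frozenset()):
    """For each key, return the most frequent two- or three-word name and its count."""
    best = {}
    for key, names in names_by_key.items():
        counts = Counter(
            n for n in names
            if 2 <= len(n.split()) <= 3
            and not any(w.lower() in stop_words for w in n.split())
        )
        if counts:
            best[key] = counts.most_common(1)[0]
    return best


# Shortened, illustrative input for one lastname/years combination.
cleaned = {
    ("Handel", 1685, 1759): [
        "George Frederick Handel", "George F. Handel", "X. George Frederick Handel",
        "George Frederic Handel", "George Frederic Handel", "George Frederic Handel",
        "George Frideric Handel", "Georg Frideric Handel",
    ],
}
print(most_common_forms(cleaned))
```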

  35. Unexpected Observations • Franz-Eugen Schlachter (1859 – 1911) has 64,500 hits, but all from the same server! It concerns an on-line bible, where each bible page is implemented as a separate web page, with Franz-Eugen Schlachter in the title. We can use the similar-pages information that Google gives to filter these out. • Koop Juliana (1948 - 1980) has 8,200 hits. “Koop Juliana” results in considerably fewer hits than “Juliana (1948 – 1980)”. That can be an indication that the first name is not correct.

  36. Number of Persons Found: 1000 – 1099: 40; 1100 – 1199: 42; 1200 – 1299: 79; 1300 – 1399: 106; 1400 – 1499: 357; 1500 – 1599: 1050; 1600 – 1699: 2258; 1700 – 1799: 7239; 1800 – 1899: 28637; 1900 – 1999: 12101; Total: 51909

  37. Top 16 born between 1500 and 1599 1 William Shakespeare (1564 - 1616) 51300 2 Rene Descartes (1596 - 1650) 33400 3 Galileo Galilei (1564 - 1642) 27300 4 Francis Bacon (1561 - 1626) 25200 5 John Dowland (1563 - 1626) 25000 6 Orlandus Lassus (1532 - 1594) 23200 7 Johannes Kepler (1571 - 1630) 22700 8 Thomas Hobbes (1588 - 1679) 15400 9 Frescobaldi Girolamo (1583 - 1643) 11900 10 Claudio Monteverdi (1567 - 1643) 11600 11 Peter Paul Rubens (1577 - 1640) 11400 12 Tycho Brahe (1546 - 1601) 11000 13 Michel de Montaigne (1533 - 1592) 10700 14 John Calvin (1509 - 1564) 9990 15 Elizabeth I (1558 - 1603) 7520 16 Andrea Palladio (1508 - 1580) 7140 17 Gibbons Orlando (1508 – 1580) 7030 18 Nicolas Poussin (1594 - 1665) 6790

  38. Top 16 born between 1600 and 1699 1 Johann Sebastian Bach (1685 - 1750) 86600 2 Antonio Vivaldi (1678 - 1741) 39700 3 Henry Purcell (1659 - 1695) 37600 4 Georg Philipp Telemann (1681 - 1767) 35700 5 Georg Friedrich Haendel (1685 - 1759) 33600 6 Voltaire (1694 - 1778) 32800 7 Isaac Newton (1642 - 1727) 31700 8 Domenico Scarlatti (1685 - 1757) 28300 9 Arcangelo Corelli (1653 - 1713) 27300 10 Francois Couperin (1668 - 1733) 27100 11 Jean-Philippe Rameau (1683 - 1764) 26700 12 Alessandro Scarlatti (1660 - 1725) 25600 13 Tomaso Albinoni (1671 - 1751) 25000 14 Jean-Baptiste Lully (1632 - 1687) 24900 15 Giuseppe Tartini (1692 - 1770) 23800 16 de la Barca (1600 - 1681) 23000 17 John Locke (1632 - 1704) 22800 18 Blaise Pascal (1623 - 1662) 22700

  39. Top 16 born between 1700 and 1799 1 Wolfgang Amadeus Mozart (1756 - 1791) 79000 2 Ludwig van Beethoven (1770 - 1827) 69400 3 Franz Schubert (1797 - 1828) 62300 4 Napoleon Bonaparte (1769 - 1821) 61500 5 Joseph Haydn (1732 - 1809) 50300 6 Johann Wolfgang Goethe (1749 - 1832) 45800 7 Immanuel Kant (1724 - 1804) 35800 8 Gioacchino Rossini (1792 - 1868) 34300 9 Benjamin Franklin (1706 - 1790) 28600 10 Washington Irving (1783 - 1859) 26900 11 Luigi Boccherini (1743 - 1805) 25100 12 Luigi Cherubini (1760 - 1842) 24100 13 William Blake (1757 - 1827) 22000 14 Arthur Schopenhauer (1788 - 1860) 21900 15 Thomas Jefferson (1743 - 1826) 20100 16 Jean-Jacques Rousseau (1712 - 1778) 19400 17 Boyce William (1711 - 1779) 17400 18 Heinrich Heine (1797 - 1856) 15900

  40. Top 16 born between 1800 and 1899 1 Charles Darwin (1809 - 1882) 73400 2 Albert Einstein (1879 - 1955) 70500 3 Johannes Brahms (1833 - 1897) 60600 4 James Joyce (1882 - 1941) 59300 5 Peter Iljitsch Tschaikowsky (1840 - 1893) 47600 6 Robert Schumann (1810 - 1856) 45300 7 Frederic Chopin (1810 - 1849) 41200 8 Giuseppe Verdi (1813 - 1901) 41100 9 Claude Debussy (1862 - 1918) 39400 10 Winston Churchill (1874 - 1965) 39300 11 Franz Liszt (1811 - 1886) 38500 12 Richard Wagner (1813 - 1883) 38300 13 Richard Strauss (1864 - 1949) 37800 14 Antonin Dvorak (1841 - 1904) 35700 15 Maurice Ravel (1875 - 1937) 35300 16 Gustav Mahler (1860 - 1911) 34300

  41. Top 16 born between 1900 and 1999
      Rank | 16 nov. 2004 | 29 nov. 2004
      1 | Ronald Reagan (1911 - 2004) 44800 | Yasser Arafat (1929 - 2004) 84200
      2 | Benjamin Britten (1913 - 1976) 31700 | Ronald Reagan (1911 - 2004) 46600
      3 | John Peel (1939 - 2004) 27400 | Benjamin Britten (1913 - 1976) 32000
      4 | Samuel Barber (1910 - 1981) 26600 | Samuel Barber (1910 - 1981) 26300
      5 | John Fitzgerald Kennedy (1917 - 1963) 24100 | John Peel (1939 - 2004) 21700
      6 | Robertson Davies (1913 - 1995) 18900 | Robertson Davies (1913 - 1995) 18800
      7 | Yasser Arafat (1929 - 2004) 16600 | John F. Kennedy (1917 - 1963) 17300
      8 | Peter Ustinov (1921 - 2004) 16500 | Peter Ustinov (1921 - 2004) 16700
      9 | Kurt Cobain (1967 - 1994) 14800 | Kurt Cobain (1967 - 1994) 14400
      10 | Salvador Dali (1904 - 1989) 14600 | Salvador Dali (1904 - 1989) 14000
      11 | Christopher Reeve (1952 - 2004) 13900 | Jon Lee (1968 - 2002) 13900
      12 | Jon Lee (1968 - 2002) 13900 | Marlon Brando (1924 - 2004) 11200
      13 | Marlon Brando (1924 - 2004) 11200 | Christopher Reeve (1952 - 2004) 10800
      14 | Van Gogh (1957 - 2004) 10900 | Jean-Paul Sartre (1905 - 1980) 9790
      15 | Albert Camus (1913 - 1960) 9730 | Chostakovitch Dimitri (1906 - 1975) 9640
      16 | Jean-Paul Sartre (1905 - 1980) 9630 | Albert Camus (1913 - 1960) 9180
      17 | Ted Hughes (1930 - 1998) 8970 | Van Gogh (1957 - 2004) 9050
      18 | Jim Morrison (1943 - 1971) 8930 | Steve Reich (1965 - 1995) 8370

  42. Top 16 born between 1000 and 1999 1 Johann Sebastian Bach (1685 - 1750) 86600 2 Wolfgang Amadeus Mozart (1756 - 1791) 79000 3 Charles Darwin (1809 - 1882) 73400 4 Albert Einstein (1879 - 1955) 70500 5 Ludwig van Beethoven (1770 - 1827) 69400 6 Franz Schubert (1797 - 1828) 62300 7 Napoleon Bonaparte (1769 - 1821) 61500 8 Johannes Brahms (1833 - 1897) 60600 9 James Joyce (1882 - 1941) 59300 10 Leonardo da Vinci (1452 - 1519) 53400 11 William Shakespeare (1564 - 1616) 51300 12 Joseph Haydn (1732 - 1809) 50300 13 Peter Iljitsch Tschaikowsky (1840 - 1893) 47600 14 Johann Wolfgang Goethe (1749 - 1832) 45800 15 Robert Schumann (1810 - 1856) 45300 16 Ronald Reagan (1911 - 2004) 44800

  43. Testing recall Herinneringen in Steen: 195 persons, recall: 0.77. 150 found: James Baldwin, Olaf Palme, Simone Signoret, Henry Moore, Carel Willink, Joan Miro, Theolonius Monk, Georges Brassens, John Lennon, Jean-Paul Sartre, Simone de Beauvoir, Mae West, Kurt Gödel, Elvis Presley, Maria Callas, Charlie Chaplin, Benjamin Britten, Paul Robeson, Mao Zedong, Agatha Christie, Lotte Lehmann, Robert Stolz, Edward Kennedy, Pablo Picasso, Pablo Casals, Maurits Cornelis Escher, Ezra Pound, Jim Morrison, Louis Armstrong, Igor Stravinsky, Jimi Hendrix, Barnett Newman, Charles de Gaule, Judy Garland, Dwight David Eisenhower, Ho Tsji Minh, Martin Luther King, Robert Kennedy, Erneste Guevara, John William Coltrane,… 45 not found: Louis Paul Boon, Adriaan Roland Holst, Stijn Streuvels, Ernest Claes, Johannes XXIII, Dag Hammarskjöld, William Christopher Handy, Lucien Guitry, Antony Fokker, Pieter Jelles Troelstra, Paul van Ostaijen, Hugo Verriest,…

  44. Testing recall Het Kunst Boek: of the first 200 (dead) persons, recall: 0.84. 167 found: Jaques-Laurent Agasse, Josef Albers, Allesandro Algardi, Washington Allston, Jacopo Amigoni, Fra Angelico, Antonello da Messina, Alexander Archipenko, Giuseppe Arcimboldo, Hendrick Avercamp, Francis Bacon, Giacomo Balla, Fra Bartolommeo, Jean-Michel Basquiat, Jacopo Bassano, Pompeo Batoni, Willi Baumeister, Frederic Bazille, Domenico Beccafumi, Max Beckmann, Gentille Bellini, Giovanni Bellini, Hans Bellmer, Gianlorenzo Bernini, Josef Beuys, Albert Bierstadt,… 33 not found: Andrea del Sarto, Sofonisba Anguissola, Jean Arp, John James Audubon, Hans Baldung, Andre Beauneveu, Bernardo Bellotto, George Bellows,…

  45. Testing recall The Science Book: of the 156 (dead) persons, recall: 0.70. 109 found: Leon Battista Alberti, Nicolas Copernicus, Andreas Vesalius, Conrad Gesner, Tycho Brahe, William Gilbert, Johannes Kepler, Galileo Galilei, John Napier, William Harvey, Blaise Pascal, Pierre de Fermat, Christiaan Huygens, James Clerk Maxwell, Robert Boyle, Nicolaus Steno, Giovanni Domenico Cassini, Isaac Newton, Edmond Halley, Carolus Linnaeus, Lazzaro Spallanzani, Johan Heinrich Lambert, Joseph Priestley, Antoine Laurent Lavoisier, William Herschel, Henry Cavendish, James Hutton, Edward Jenner, Pierre-Simon Laplace, Georges Cuvier, Thomas Robert Malthus, Alexander von Humboldt, Allesandro Volta, Thomas Young,... 47 not found: Fibonacci, Piero della Francesca, Jeremiah Horrocks, Antoni van Leeuwenhoek, Rudolph Jacob Camerarius, George Hadley, Carl Wilhelm Scheele, James Hall, Joseph von Frauenhofer, William Smith,…

  46. Testing precision Counting false positives:
      Range            Precision
      4900 – 4999      0.90
      9900 – 9999      0.88
      14900 – 14999    0.96
      19900 – 19999    0.97
      Povijest Jugoslavije (1918 - 1991); Oeuvre Poetique (1925 - 1965); Alabama Wills (1808 – 1870); Black Tennesseans (1900 - 1930); Nippon Porcelain (1891 - 1921); Personal Favorites (1977 - 1998); Wheeling Glass (1829 - 1939); Political Impact (1770 - 1814); Movie Set (1959 - 1980); Transatlantic Dialogues (1775 - 1815); Sailing Navy (1775 - 1854); Home Children (1869 - 1930); Peace Pilgrim (1908 - 1981); Briton Riviere (1840 - 1920); La Regle (1917 - 1947); Farm Tractors (1890 - 1960); Western Warfare (1775 - 1882); Le Peintre (1877 - 1968); Exakta Cameras (1933 - 1978); Offene Briefe (1945 - 1968); Portraitmatilde Muti (1862 - 1943); Nature Morte (1946 - 1993); Dessins Inconnus (1901 - 1954); Jacques Lacan-Seminaires (1952 - 1980); Legendary Parties (1922 - 1972); Memory Joggers (1940 - 1989); Klondike Ho (1897 - 1997); Events From (1907 - 1977)
      estimated precision for first 5000: 0.90

  47. Some observations • Composers dominate the top for some centuries. • Recently died persons have a relatively high score. • Person names consisting of only one word, such as the pseudonyms Voltaire, Caravaggio, and Nadar, are not yet found. • Likewise, names consisting of four or more words, such as Joost van den Vondel, are not yet found. • Also, persons that died as teenagers, such as Jeanne d’Arc and Anne Frank, are not found. • More advanced approximate pattern matching is required to better cluster the name variations of one person and potential errors in years.

  48. Concluding remarks • Enumeration search offers an interesting approach to find more-of-the-same, since it is generally applicable. • The famous-persons case study indicates that non-trivial results can already be obtained with simple techniques. • Further research: extend the case study to also include information on nationality, profession, etc. of persons. Automatically search for biographic data. • Other intended application domains: music and medical domain.

  49. Fun Section Election of ‘De Grootste Nederlander’: Vincent van Gogh

  50. Fun Section Persons that were born and died in the same years: Sir Christopher Wren (1632 – 1723) and Anthony van Leeuwenhoek (1632 – 1723); Leo Tolstoy (1828 - 1910) and Henri Dunant (1828 - 1910); Edouard Manet (1832 - 1883) and Gustave Dore (1832 - 1883); JRR Tolkien (1892 – 1973) and Pearl Buck (1892 – 1973); Miles Davis (1926 – 1991) and Klaus Kinski (1926 – 1991)
