
Can computers understand our language



    1. Can computers understand our language? Ruslan Mitkov, Research Institute of Information and Language Processing, University of Wolverhampton

    2. Would life be easier if • you could tell the computer what you wanted and it understood you (no programming skills required)? • you could dictate a letter to the computer, and it printed it and then saved it as a file? • having no time to read a 1000-page book, you could ask the computer to summarise it for you and it produced a one-page summary in a few minutes? • you could ask the computer to translate for you a text in Japanese which you did not understand?

    3. The Beginning: Machine Translation. Weaver (1947): a text in the source language can be encoded and then decoded into a target language.

    4. Limitations of the time: computers were slow and unreliable; programming languages were almost non-existent; there was no adequate theory of language.

    5. Why did Weaver fail? Weaver would have failed even if he had had the supercomputers of today. His view was too simplistic: language is not Mathematics.

    6. Understanding language involves processing at various levels: morphological (structure of words), syntactic (structure of sentences), semantic (meaning of words and sentences), discourse (topic of sentences, anaphora) and pragmatic (interpretation of utterances in different contexts).

    7. Initial research focused on syntax. Colourless green ideas furiously dream. Me speaker, you audience. Thank you for coming here today.

    8. Understanding language requires not only linguistic but also extra-linguistic knowledge

    9. Bar-Hillel’s famous example: “The box is in the pen.” “I do not believe that machines whose programs do not enable them to learn, in a sophisticated sense of the word, will ever be able consistently to produce high-quality translations.”

    10. The ALPAC Report (1966) “There is no immediate or predictable prospect of useful Machine Translation”

    11. Revival in the eighties • Japanese Fifth Generation Computer Systems project • Japanese investment in Machine Translation • In 1985 (Japan alone), 500 000 pages translated by computer!

    12. Why is language different from Mathematics? In Mathematics, relationships can be formulated as strict theorems, for example: the sum of the lengths of any two sides of a triangle is greater than the length of the third side.
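Written symbolically for a triangle with side lengths $a$, $b$ and $c$, the theorem on this slide holds for every pair of sides, without exception and regardless of context:

```latex
a + b > c, \qquad b + c > a, \qquad a + c > b
```

This is the kind of strict, context-free statement that the next slide attempts to write for language.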

    13. Can we formulate theorems for language? Each word has only one meaning. Each sentence can be interpreted in only one way. The utterances produced by humans are sincere. The size of the vocabulary in English is the linear function y = 2x of the size of the vocabulary in Chinese.

    14. LANGUAGE IS IRREGULAR AND AMBIGUOUS

    15. Ambiguity of words (lexical ambiguity): bank, file, chair

    16. Ambiguity of sentences (syntactic ambiguity): John saw the man with the telescope. John saw the man in the park with the telescope.

    17. Syntactic ambiguity (continued). Sentences that no human would deem ambiguous can cause problems for computers: She boarded the airplane with two suitcases. She boarded the airplane with two engines.

    18. Ambiguity of meaning (semantic ambiguity): The rabbit is ready for lunch. We serve only men here.

    19. Ambiguity of language use (pragmatics): You owe me twenty pounds. Fact? Request?

    20. Anaphoric ambiguity: John put the vase on the plate and broke it. The soldiers shot at the women and they fell. The soldiers shot at the women and they missed.

    21. Anaphoric ambiguity (interference of preferences and constraints). Example (Y. Wilks): Jack drank the wine on the table. It was brown and round. World War II leaflet (Britain): “If an incendiary bomb drops next to you, don’t lose your head. Put it in a bucket and cover it with sand.”

    22. Understanding language successfully is not enough: ANALYSIS, INFERENCE, GENERATION

    23. The production of language is another challenging task

    24. The production of language is another challenging task. Many lecturers and students attended today's talk. They took part in the discussions. Many lecturers and students attended today's talk and took part in the discussions.

    25. Generation as a selection process: Many lecturers and students attended today's talk, taking part in the discussions. Today's talk was attended by many lecturers and students. They took part in the discussions. Today's talk was attended by many lecturers and students who took part in the discussions.

    26. The computer programs should be able to reason...

    27. The computer programs should be able to reason... Researchers should model the computers so that they can simulate human thinking in a reasonable way and, if possible, learn from each conversation.

    28. Do we have good speech technology?

    29. The problem of resources Designing and developing a program which has a huge amount of knowledge (knowledge base), the ability to understand and produce natural language, to think and to learn, is an extremely difficult, time-consuming and labour-intensive task.

    30. The problem of resources The development of a Machine Translation Program at Kyoto University took 200 human years!

    31. Any realistic, short-term solution?

    32. Machine Translation: high-quality output translation (sublanguages, controlled languages, post-editing); low-quality output translation (casual translation)

    33. Low-quality Machine Translation: gisting (indicative translation), Web-page translation, email translation, chat-room translation

    34. Machine Translation: when can it be really successful? Restricting the genre, the grammar or the vocabulary. Restricting the role of the computer.

    35. The Sublanguage Solution. Sublanguages are used by people sharing common specialised knowledge. They have a restricted vocabulary and word order, and avoid ambiguity of meaning. Examples: the sublanguage of weather forecasts; the sublanguage of medical reports.

    36. METEO: English-to-French Machine Translation. English: METRO TORONTO. TODAY... MAINLY CLOUDY AND COLD WITH OCCASIONAL FLURRIES. BRISK WESTERLY WINDS TO 50 KM/H. HIGH NEAR MINUS 7. TONIGHT... VARIABLE CLOUDINESS. ISOLATED FLURRIES. DIMINISHING WINDS. LOW NEAR MINUS 15. FRIDAY... VARIABLE CLOUDINESS. HIGH NEAR MINUS 6. French: LE GRAND TORONTO. AUJOURD'HUI... GENERALEMENT NUAGEUX ET FROID AVEC QUELQUES AVERSES DE NEIGE. VENTS VIFS D'OUEST A 50 KM/H. MAXIMUM D'ENVIRON MOINS 7. CETTE NUIT... CIEL VARIABLE. AVERSES DE NEIGE EPARSES. AFFAIBLISSEMENT DES VENTS. MINIMUM D'ENVIRON MOINS 15. VENDREDI... CIEL VARIABLE. MAXIMUM D'ENVIRON MOINS 6.

    37. The Controlled Language Solution Controlled Language is a specially simplified version of a language which contains short, unambiguous sentences and uses restricted vocabulary.

    38. Typical writing rules in a controlled language: keep sentences short (use only simple sentences); use only one sense per word; do not use anaphors; omit redundant words.
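Some of these writing rules can be checked mechanically. The following minimal sketch flags overlong sentences and pronoun-like anaphors in a draft. The 20-word limit and the anaphor list are illustrative assumptions, not the rules of any particular controlled language. The check is deliberately crude: it flags words such as "this" even when they act as determiners. The "one sense per word" and "omit redundant words" rules would require a controlled dictionary, which this sketch does not model.

```python
import re

# Hypothetical settings for illustration; each real controlled language
# defines its own limits and approved vocabulary.
MAX_WORDS = 20
ANAPHORS = {"it", "its", "they", "them", "their", "he", "him", "his",
            "she", "her", "this", "these", "those"}


def check_sentence(sentence):
    """Return a list of controlled-language rule violations for one sentence."""
    problems = []
    words = re.findall(r"[A-Za-z']+", sentence)
    # Rule: keep sentences short.
    if len(words) > MAX_WORDS:
        problems.append(f"too long ({len(words)} words > {MAX_WORDS})")
    # Rule: do not use anaphors (surface check only; ignores each word's role).
    found = sorted({w.lower() for w in words} & ANAPHORS)
    if found:
        problems.append("anaphors used: " + ", ".join(found))
    return problems


for s in ["Remove the cover.", "Remove the cover and put it on the bench."]:
    print(s, "->", check_sentence(s) or "OK")
```

A flagged sentence would then be rewritten by a human, for example by repeating the noun ("put the cover on the bench") instead of using "it".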

    39. The Selective Solution: Machine-Aided Translation. The translator sends the simple sentences to the computer for translation and translates the more difficult, complex ones him- or herself.

    40. Increased efficiency: the Penang experiment. Books/manuals averaging about 250 pages were translated both manually by a translation bureau and with a Machine-Aided Translation program (SISKEP). Manual translation took 360 hours on average; translation with the Machine-Aided Translation program needed 200 hours on average.

    41. The Human Intervention Solution: (the human pre-edits the text); the computer translates; the human post-edits.

    42. Examples of daily use of MT (with post-editing): The EC has used SYSTRAN since 1976 (since 1990, around 260 000 pages a year). SAP uses METAL extensively for translations from German into English (technical texts). Ericsson Language Services (ELS) uses LOGOS to translate technical documentation from English into French, Spanish and German. METEO.

    43. Translation tools with a proven track record: dictionary look-up; term extraction/translation; translation memory; bilingual concordancer

    44. Memory-based Translation: mainly for professional translators. It uses a database of previously translated texts and measures how closely the current sentence matches previously translated ones. It ensures that no sentence need be translated twice, ensures consistency, and responds to the industrial need for high-quality and ‘high-speed’ translations.

    45. When to use Translation Memory: translation of repetitive texts; translation of voluminous texts. Most suitable for technical manuals.

    46. TRADOS: “A Translation Memory is a linguistic database that collects all your translations and their target language equivalents as you translate.” “A Translation Memory is a database that collects all your translations and their target language equivalents as you translate.” Match: 87% (linguistic → linguistische)
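The 87% figure on the TRADOS slide is a fuzzy-match score between a new sentence and a sentence already stored in the memory. The following minimal sketch illustrates the general idea of such a lookup. It uses Python's `difflib.SequenceMatcher`, a character-based similarity ratio, in place of TRADOS's own scoring method, so its scores will not reproduce the 87% shown. The memory contents, the German translation and the 0.7 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Toy translation memory: stored source segment -> stored translation.
# Both entries are illustrative; the German text is not taken from TRADOS.
TRANSLATION_MEMORY = {
    "A Translation Memory is a database that collects all your translations "
    "and their target language equivalents as you translate.":
        "Ein Translation Memory ist eine Datenbank, die alle Ihre Übersetzungen "
        "und deren zielsprachliche Entsprechungen beim Übersetzen sammelt.",
}


def best_match(sentence, memory, threshold=0.7):
    """Return (score, stored_source, stored_target) for the most similar
    stored segment, or None if no segment reaches the threshold."""
    best = None
    for source, target in memory.items():
        # ratio() = 2*M/T: matching characters relative to combined length.
        score = SequenceMatcher(None, sentence, source).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None


new_sentence = ("A Translation Memory is a linguistic database that collects all "
                "your translations and their target language equivalents as you translate.")
match = best_match(new_sentence, TRANSLATION_MEMORY)
if match:
    score, source, target = match
    print(f"Fuzzy match {score:.0%}: {target}")
```

The translator would then edit the retrieved translation only where the new sentence differs, here by adding the equivalent of "linguistic".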

    47. A case study (Webb 1998): the client saves 40% in cost and 70% in time; the translator/translation agency saves 69% in cost and 70% in time.

    48. Bilingual concordancer: a tool that allows the user/translator to see how a word, phrase or technical term is used throughout a source-language text and how it has been translated in the target language. [Snapshot of ParaConc]
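At its core, a bilingual concordancer can be sketched as a keyword-in-context (KWIC) search over sentence-aligned text pairs. It shows each hit with its surrounding context alongside the aligned translation. This minimal sketch assumes the text has already been aligned at sentence level; alignment itself is not shown. The sample pairs are loosely adapted from the METEO forecast above and are for illustration only.

```python
import re

# Hypothetical sentence-aligned English-French pairs (illustrative only).
ALIGNED_PAIRS = [
    ("Brisk westerly winds to 50 km/h.", "Vents vifs d'ouest à 50 km/h."),
    ("Diminishing winds tonight.", "Affaiblissement des vents cette nuit."),
    ("Mainly cloudy and cold.", "Généralement nuageux et froid."),
]


def concordance(term, pairs, width=20):
    """Yield (left context, keyword, right context, aligned target sentence)
    for every whole-word, case-insensitive occurrence of term in the source."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for source, target in pairs:
        for m in pattern.finditer(source):
            left = source[max(0, m.start() - width):m.start()]
            right = source[m.end():m.end() + width]
            yield left, m.group(0), right, target


for left, kw, right, target in concordance("winds", ALIGNED_PAIRS):
    print(f"{left:>20} [{kw}] {right:<20} | {target}")
```

Reading down the right-hand column shows the translator how the term has been rendered ("vents") across the whole text.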

    49. The Statistical, Corpus-based and "Text as strings" Solution: words and sentences are regarded as strings of symbols, without any meaning.

    50. Example (1): Lecture. A discourse with an educational purpose.

    51. Example (2): Lecture. Topic: Lecturer: Audience: Venue: Date: Time:

    52. Example (3): Lecture. "Lecture" is a sequence of 7 symbols: "L", "e", "c", "t", "u", "r", "e".

    53. Simple and sophisticated statistical and corpus-based techniques: frequency; pattern matching; fuzzy matching (neural network-based matching of strings); probabilistic theories (e.g. the Bayesian approach, on the basis of evidence)

    54. Language identification: identifying a language on the basis of the frequency and combination of words.
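This minimal sketch illustrates frequency-based language identification. It relies on tiny hand-picked lists of very common words for three languages. These lists are illustrative assumptions: practical identifiers are trained on large corpora and often use character n-grams rather than whole words. Each language scores one point for every token in the text that appears in its list, and the highest score wins. Ties, and texts containing no listed words, fall back to the first language in the table.

```python
import re
from collections import Counter

# Illustrative frequent-word profiles (assumed, not learned from data).
PROFILES = {
    "English": {"the", "and", "of", "to", "is", "in", "it", "that", "with"},
    "French": {"le", "la", "les", "et", "de", "des", "est", "un", "une", "avec"},
    "German": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit"},
}


def identify_language(text):
    """Return the language whose frequent-word list covers the most tokens."""
    tokens = Counter(re.findall(r"\w+", text.lower()))  # \w is Unicode-aware
    scores = {
        lang: sum(count for word, count in tokens.items() if word in words)
        for lang, words in PROFILES.items()
    }
    return max(scores, key=scores.get)


print(identify_language("Mainly cloudy and cold with occasional flurries."))  # English
print(identify_language("Généralement nuageux et froid avec quelques averses de neige."))  # French
```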

    55. Automatic abstracting: the production of an abstract from a longer document. Surface clues, keywords and statistical frequencies are used to assign weights to sentences; the sentences with the highest aggregate score are extracted as the most important ones.
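This sentence-weighting idea can be sketched using keyword frequency alone. Surface clues such as sentence position or cue phrases are left out. The stop-word list is a small illustrative assumption. Each sentence is scored as the sum of the document-wide frequencies of its non-stop words. The top-scoring sentences are then returned in their original order.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use larger ones).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "it",
              "for", "on", "with", "as", "by", "that", "this", "be", "can"}


def summarise(text, n_sentences=2):
    """Extract the n highest-scoring sentences, kept in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)

    def score(sentence):
        # Aggregate score: sum of the document frequencies of its keywords.
        return sum(freq[w] for w in re.findall(r"[a-z]+", sentence.lower())
                   if w not in STOP_WORDS)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


text = ("Machine translation is hard. Translation memory helps translators. "
        "The weather was nice. Translators reuse translation memory daily.")
print(summarise(text))
```

Long sentences accumulate more keywords, so real systems often normalise the score by sentence length. This sketch keeps the plain aggregate score described on the slide.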

    56. Literary texts? Statistical methods for identification of authorship
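One widely used family of statistical authorship methods compares how often candidate authors use common function words. Writers are thought to use these words fairly unconsciously and consistently. This simplified sketch builds a relative-frequency profile over a short, hypothetical function-word list. It then attributes an unknown text to the candidate whose profile has the highest cosine similarity. The author samples are invented and far too short for real use; genuine studies need much longer samples per author.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a short list of function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "upon"]


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown, candidates):
    """Return the candidate author whose sample profile is closest."""
    target = profile(unknown)
    return max(candidates, key=lambda name: cosine(target, profile(candidates[name])))


# Invented samples, for illustration only.
samples = {
    "Author A": "upon the hill and upon the road upon the river",
    "Author B": "it is in a house that it stood but it fell",
}
print(attribute("upon the bridge and upon the sea", samples))
```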

    57. What is Computational Linguistics / Natural Language Processing? The study of computer systems for understanding, producing and, in general, processing natural languages. Typical applications: Machine Translation, automatic abstracting, question answering, information extraction, textual entailment. For more details see Mitkov, R. (ed.) (2003, 2005). The Oxford Handbook of Computational Linguistics. Oxford University Press.

    58. My current topics of interest: anaphora resolution; automatic generation of multiple-choice tests; automatic identification of cognates and false friends; centering; memory-based translation; NLP applications in medicine and education

    59. Anaphora resolution: the automatic identification of references in text. Examples: “Sophia Loren says she will always be grateful to Bono. The actress revealed that the U2 singer helped her calm down when she became scared by a thunderstorm while travelling on a plane.” “If Peter Mandelson had been in Tony Blair’s shoes he would have demanded his resignation the day the Prime Minister forced him to leave the Cabinet.” For more details see Mitkov, R. 2002. Anaphora Resolution. Longman.

    60. Research Group in Computational Linguistics: 1 professor, 1 lecturer, 4 research fellows, 4 PhD students

    61. Research topics/projects: anaphora resolution, text summarisation, information extraction, question answering, term extraction, multilingual NLP, lexical acquisition, translation memory, corpus construction and annotation, NLP pre-processing tools, textual entailment, generation, Machine Translation (resources), Named Entity Recognition

    62. Externally funded projects: BIRD (ESRC-funded); CAST (AHRB-funded); automatic translation of emails (industry-funded); automatic generation of multiple-choice tests (NBME-funded); projects funded by the British Academy, the British Council and international organisations

    63. Implementations, resources and demos: The Research Group is also well known for the variety of tools, resources and demos it has developed. They are available on the group's web site and are accessible to all researchers.

    64. CONCLUSIONS: Computers find it very difficult to understand human languages. Practical, less ambitious solutions have proved more successful in the short term. Increased interest, a growing number of projects and large investments are promising. Computers are becoming more and more able to understand languages.

    65. RECOMMENDATIONS FOR TRANSLATORS: Use MT for casual translation. Use MT for gisting (indicative translation) before you send a document to professional translators. Use MT in conjunction with (pre-editing and) post-editing. Use MT in controlled languages. Use MT in sublanguages. Use TM for professional translation of repetitive and voluminous texts. Use translation tools widely in translation projects.

    66. A FINAL WORD: Translators are not an “endangered species”! Computers are not trying to replace humans; they are just trying to help. Computers do not have the creativity and imagination of humans, but they are good at routine jobs.

    67. Contact details Ruslan Mitkov’s web page: http://www.wlv.ac.uk/~le1825 Research Group’s web page: http://clg.wlv.ac.uk
