
SEL2211: Contexts Lecture 15: Digital electronic language technology


schaffer




Presentation Transcript


  1. This lecture deals with the correlation between language representation technology and cultural complexity since 1975. Why 1975? It was about then that the explosion of information technology which now pervades the world began. Representation of language by print has since 1975 been complemented and, in many areas of cultural activity, superseded by a new technology: the computer.

  2. Computational representation, storage, and manipulation of language have revolutionized all aspects of government, commerce, and science and technology. This is as fundamental an innovation as print was at the end of the Middle Ages, but its effects have been, and will continue to be, much more rapid and thoroughgoing. The full impact of computational language representation technology is still unclear because new applications continue to appear in quick succession. What is clear is that the importance of this new technology can hardly be overstated.

  3. 1. Digital electronic representation of language
     1.1 Symbols
     To understand digital electronic representation of language, it is necessary to be clear about the nature of symbols and how language can be symbolized. Three points matter:
     What a symbol is: some physical thing.
     What a symbol does: it represents something other than itself.
     The arbitrariness of a symbol relative to what it represents: nothing about the thing represented dictates the physical form of the symbol.
     Since the invention of writing, language has been symbolized using visible marks on some surface: stone, clay, papyrus, parchment, paper. But, given the nature of symbols, this is not in principle the only way to symbolize language: any physical medium will do, and that includes electricity.

  4. 1.2 The first step: the telegraph and Morse Code
     i. History
     Scientists had been working on an electrical device for communication, the telegraph, since the mid-18th century. It was an American named Samuel Morse who proposed a workable system in 1838, and with it the idea of electronic representation of language. The usefulness of this invention for fast, long-distance communication was quickly appreciated. In 1851, Western Union was founded; by 1854, there were 23,000 miles of telegraph wire in operation in the US; and in 1866, the first lasting trans-Atlantic cable link was established.

  5. ii. How Morse Code works
     In an alphabetic writing system, language is represented, or encoded, by assigning a symbol to every phoneme of the language. In the West, this has for many centuries been done using the familiar alphabet: /a/ is represented as A, /b/ is represented as B, and so on. But the shape of the symbols used to represent phonemes is entirely arbitrary, the result of a particular historical development.

  6. Morse's idea was to use a different representation. For every letter in the conventional alphabet, he proposed a corresponding symbol consisting of a sequence of dots and dashes. Using this system (in its International Morse Code form), the word CAB would look like this:
     -·-·   ·-   -···
     This recoding of phonemes looks superfluous at best (we already have a perfectly good alphabetic system) and silly at worst, but in fact it is fundamental to computational language representation technology, as we shall see.
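The recoding step can be sketched as a simple lookup table. This is a minimal illustration, assuming the standard International Morse Code letter codes, with "." standing for a dot and "-" for a dash; the names `MORSE` and `encode_morse` are hypothetical.

```python
# International Morse Code for A-Z (assumed letter table; "." = dot, "-" = dash)
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def encode_morse(word: str) -> str:
    """Replace each letter by its dot-dash code, separating letters with a space."""
    return " ".join(MORSE[letter] for letter in word.upper())


print(encode_morse("CAB"))  # -.-. .- -...
```

Nothing about the letter C requires the code `-.-.`; the table is pure convention, which is exactly the arbitrariness of symbols the lecture stresses.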

  7. iii. How a telegraph works
     When the finger press is pushed down, the electrical contacts move apart and the circuit is broken; the buzzer stops sounding. When the finger press is released, the electrical contacts come together and the buzzer sounds. By pressing and releasing alternately and for different intervals, it is possible to generate a sequence of sound patterns from the buzzer.

  8. iv. Telegraph and Morse Code combined
     The key insight in the marriage of Morse Code and telegraph stems once again from the nature of symbols, and in particular from the arbitrariness of symbols relative to what they represent. We have seen that, for each letter in the conventional alphabet, Morse proposed a symbol consisting of a sequence of dots and dashes.

  9. Now, there is no particular reason why the dots and dashes have to be marks on a piece of paper; they could equally be electrical pulses: a dot could be a short pulse, and a dash a long pulse. In other words, Morse Code can be translated from a visual code directly into an electronic code. This is the crucial step: for the first time, there was an alternative to the traditional representation of language as visible marks on some surface, and that alternative was an electronic representation. How can such an electronic representation be generated?

  10. By using a telegraph. Releasing the finger press for a short time allows the electrical contacts to come together only briefly, so the device generates a short electrical pulse; releasing it for longer generates a long one. For a short pulse the buzzer sounds briefly, and for a long one it sounds for longer.
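The translation from visual dots and dashes to electrical pulses can be sketched as a list of (current on?, duration) pairs. The timing values are an assumption borrowed from the standard International Morse convention (dot = 1 time unit, dash = 3 units, 1 unit of silence between the pulses of a letter, 3 units between letters); the lecture itself only says "short" and "long". `to_pulses` is a hypothetical helper, and it models the resulting current, not the mechanics of the key.

```python
# Assumed timing, in arbitrary time units (standard International Morse convention)
DOT, DASH, PULSE_GAP, LETTER_GAP = 1, 3, 1, 3


def to_pulses(morse: str) -> list[tuple[bool, int]]:
    """Turn dot-dash text (letters separated by spaces) into (current_on, duration) pairs."""
    pulses: list[tuple[bool, int]] = []
    for i, letter in enumerate(morse.split(" ")):
        if i > 0:
            pulses.append((False, LETTER_GAP))     # silence between letters
        for j, symbol in enumerate(letter):
            if j > 0:
                pulses.append((False, PULSE_GAP))  # silence between pulses
            pulses.append((True, DOT if symbol == "." else DASH))
    return pulses


print(to_pulses("-.."))  # the letter D
# [(True, 3), (False, 1), (True, 1), (False, 1), (True, 1)]
```

The gaps matter as much as the pulses: without a longer silence between letters, a receiver could not tell where one letter's code ends and the next begins, because Morse codes vary in length.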

  11. Thus, the telegraph version of the Morse Code for the letter D (-··) looks, or rather sounds, like this: beeeeeeeep beep beep. An operator who is familiar with Morse Code can therefore encode and send any text message as a sequence of beeeeeeeep and beep keystrokes. All one needs is a network of electrical lines along which the electronic pulses can travel.
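At the receiving end the operator reverses the process. This is a minimal sketch, assuming the International Morse letter table again: `as_beeps` renders a code the way it sounds, and `decode_morse` looks each received code up in the inverted table. Both function names are hypothetical.

```python
# International Morse Code for A-Z (assumed letter table; "." = dot, "-" = dash)
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}
# Invert the table so that a received code can be looked up to recover its letter
DECODE = {code: letter for letter, code in MORSE.items()}


def as_beeps(morse: str) -> str:
    """Render dot-dash text as sound: a long 'beeeeeeeep' per dash, a short 'beep' per dot."""
    return " / ".join(
        " ".join("beeeeeeeep" if symbol == "-" else "beep" for symbol in letter)
        for letter in morse.split(" ")
    )


def decode_morse(morse: str) -> str:
    """Turn space-separated letter codes back into ordinary letters."""
    return "".join(DECODE[code] for code in morse.split(" "))


print(as_beeps("-.."))               # beeeeeeeep beep beep
print(decode_morse("-.-. .- -..."))  # CAB
```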

  12. In fact, such a network was quickly constructed in 19th-century America, and a cable was laid across the Atlantic to allow electronic communication with Europe.

  13. v. Morse Code & ASCII
     ASCII, an acronym for American Standard Code for Information Interchange, has for decades been the standard scheme for encoding text in computers. It differs from Morse in two ways:
     1. It uses 0 and 1 instead of dots and dashes to make letter codes.
     2. The code length is constant (each character has a 7-place code, normally stored in 8 places), whereas in Morse the number of dots and dashes varies from letter to letter.
     Though different in detail, however, ASCII is no different in principle from Morse.

  14. In ASCII, the word CAB looks like this: 01000011 01000001 01000010 (C, A, B). Electronically, a dot in Morse is a short electrical pulse and a dash a long pulse. When ASCII is realized electronically, there is no short and long. Instead, 1 is 'electrical on' and 0 is 'electrical off'. And, like Morse, a sequence of electronic ASCII codes can transmit any text along electrical lines.
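The two differences from Morse (binary digits, constant code length) show up directly when the codes are computed. This sketch uses Python's built-in `ord` to get a character's code and `format(code, "08b")` to write it as 8 binary places; `to_ascii_bits` and `to_signal` are hypothetical helper names, and "on"/"off" stand in for the electrical states.

```python
def to_ascii_bits(word: str) -> list[str]:
    """Return each character's ASCII code as an 8-place binary string."""
    codes = []
    for ch in word:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        codes.append(format(code, "08b"))  # pad the 7-bit code to 8 places
    return codes


def to_signal(bits: str) -> list[str]:
    """Interpret 1 as 'electrical on' and 0 as 'electrical off'."""
    return ["on" if b == "1" else "off" for b in bits]


codes = to_ascii_bits("CAB")
print(codes)                # ['01000011', '01000001', '01000010']
print(to_signal(codes[0]))  # ['off', 'on', 'off', 'off', 'off', 'off', 'on', 'on']
```

Because every code has the same length, a receiver can split an incoming stream every 8 places without needing the extra letter gaps that Morse requires.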

  15. 2. Text storage in computers
     2.1 Computer memory
     A computer is a physical electronic device, and it stores text as physical electronic representations of ASCII codes. The details of this electronic representation require specialist expertise to understand, but it is the principle of how text is stored that is important here, and the principle is straightforward. A computer memory is essentially a numbered list of storage locations: each location has a number, its address, and a content slot that contains one piece of electronic data. The computer keeps track of the addresses of the data items it has inserted into the content locations and, when it needs these data items again, it goes to those addresses and retrieves them.
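The "numbered list" picture of memory can be modelled with an ordinary Python list, where the list index plays the role of the address. This is a toy illustration under that assumption, not a description of how memory hardware works; the `Memory` class and its methods are hypothetical.

```python
from typing import Optional


class Memory:
    """Toy model of computer memory as a numbered list of locations.

    Each address (list index) holds one data item in its content slot,
    or None if nothing has been stored there yet.
    """

    def __init__(self, size: int) -> None:
        self.content: list[Optional[int]] = [None] * size

    def store(self, address: int, data: int) -> None:
        """Insert a data item at the given address."""
        self.content[address] = data

    def retrieve(self, address: int) -> Optional[int]:
        """Go to the given address and return whatever is stored there."""
        return self.content[address]


mem = Memory(8)
mem.store(1, 0b01000011)  # the ASCII code for C, written in binary
print(mem.retrieve(1))    # 67
print(mem.retrieve(2))    # None
```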

  16. 2.2 ASCII text in computer memory
     We have seen that ASCII codes can be converted to electronic form by interpreting 1 as 'electrical on' and 0 as 'electrical off', and also that a computer memory is a sequence of storage locations, where each location contains one item of electronic data. That data can be ASCII codes. Storing text in a computer memory is therefore simply a matter of putting the relevant codes in known memory locations in the right sequence. Thus, the word CAB would look like this in memory:
     Address 1: 01000011 (C)
     Address 2: 01000001 (A)
     Address 3: 01000010 (B)

  17. To get at this word in memory, the computer only needs to know the starting location (here 1) and where the word ends; the rest of the word follows in sequence. Given a sufficiently large memory, any number of ASCII letter codes can be stored in this way, and thus text of any length (e.g., the works of Shakespeare) can be electronically represented and stored in a computer.
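Storing and retrieving a word from a known starting location can be sketched as follows. To let the computer know where the word ends, this sketch assumes one common convention, an end marker with code 0 (the approach C strings use); storing the word's length alongside it would work equally well. `store_text` and `load_text` are hypothetical helpers.

```python
END = 0  # assumed end-of-text marker (the convention used by C strings)


def store_text(memory: list[int], start: int, text: str) -> None:
    """Put each character's ASCII code in consecutive locations, then an end marker."""
    for offset, ch in enumerate(text):
        memory[start + offset] = ord(ch)
    memory[start + len(text)] = END


def load_text(memory: list[int], start: int) -> str:
    """Read codes from `start` onward until the end marker, turning them back into letters."""
    chars = []
    address = start
    while memory[address] != END:
        chars.append(chr(memory[address]))
        address += 1
    return "".join(chars)


memory = [END] * 16           # 16 locations, all initially empty
store_text(memory, 1, "CAB")  # the word starts at location 1, as in the example above
print(memory[1:5])            # [67, 65, 66, 0]
print(load_text(memory, 1))   # CAB
```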

  18. 2.3 How ASCII text gets into computer memory
     Text gets into computer memory by means of an input device. There are various such devices, but the most familiar and commonly used is the keyboard, so we look at that. As with memory itself, the operation of a computer keyboard is electronically complex but conceptually very simple: every time a letter key is pressed, the electronic ASCII code corresponding to the key is generated and sent along the wire connecting the keyboard to the computer. When it arrives at the computer, the operating system places it into memory.
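The conceptual keyboard-to-memory path can be simulated in a few lines. This follows the simplified description above; real keyboards typically send key codes that the operating system then translates into character codes. `ToyComputer` and `press_key` are hypothetical names, and the "next free location" bookkeeping is an assumed detail.

```python
class ToyComputer:
    """Toy model: the operating system writes each arriving code to the next free location."""

    def __init__(self, size: int) -> None:
        self.memory: list[int] = [0] * size
        self.next_free = 0  # address where the next incoming code will go

    def receive(self, code: int) -> None:
        # Operating system's role: place the arriving code into memory
        self.memory[self.next_free] = code
        self.next_free += 1


def press_key(computer: ToyComputer, key: str) -> None:
    # Keyboard's role: generate the code for the key and send it to the computer
    computer.receive(ord(key))


pc = ToyComputer(8)
for key in "CAB":
    press_key(pc, key)
print(pc.memory[:pc.next_free])  # [67, 65, 66]
```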

  19. 3. Historical development of computers
     i. The theory of computation was invented in the 1930s to solve the mathematical problem of computability: what sorts of problem can mathematics solve? It had nothing to do with representation of language.
     ii. One of the first physical computers was built to break enemy codes during WW2 by numerical calculation.
     iii. The earliest peacetime computers, built soon after WW2, were still used exclusively for numerical calculation.
     iv. In the mid-to-late 1950s, it was realized that computers could usefully be applied to the representation and analysis of language. These applications developed slowly, and the main use of computers remained numerical.

  20. v. Prior to 1975, computers had been huge and very expensive devices, well out of the reach of all but governments and large businesses. Thereafter, advances in electronics made possible the construction of smaller and much less expensive computers, bringing them within the financial reach of interested individuals. These were called 'microcomputers'.
     vi. As microcomputers became more popular, some enthusiasts saw their educational and commercial potential and developed (relatively) easy-to-use software; in this period, from the mid-1970s to the early 1980s, Microsoft (1975) and Apple (1976) were established.
     vii. A crucial piece of software in this period was the word processor, effectively an electronic typewriter, which revolutionized language representation by moving it from typewritten paper documents to electronic text files produced by computers.

  21. viii. Computers had from early on been connected together with wires so that electronic data could be moved between and among them. Such early networks were relatively small and slow. From the mid-1980s, as the number of computers rapidly grew, the networks became much larger and very much quicker. This formed the basis for the present-day Internet.
     ix. As the Internet developed, software that could exploit it was invented: email came first, and then Web browsers like Netscape. This network software made it possible to move electronic text quickly and cheaply from computer to computer, and it was at first used mainly by governments, the military, and academics.
     x. By the early 1990s, computers and the software that runs on them had become consumer products. The consequences are observable around us.

  22. 4. Electronic language representation and cultural complexity
     There have been three language representation technologies in human history, each superseding but not entirely supplanting the previous one:
     1. Handmade physical marks on physical media: cuneiform on clay tablets, alphabetic writing on papyrus and parchment, etc.
     2. Automated physical marks on physical media: print on paper.
     3. Electronic text.
     The revolutionary nature of electronic text for language representation cannot be overstated. Electronic text is currently supplementing printed text and, in many politically, economically, scientifically, and technologically important applications, is phasing it out. In what follows we look at the reasons for this, and then go on to consider the implications.

  23. 4.1 Why electronic text is superseding print
     Efficiency of production: Like handwriting or the manual setting of print pages, keyboard entry of electronic text remains a bottleneck in text production. More efficient input methods like scanning and voice recognition alleviate this under some circumstances.
     Efficiency of editing: Handwritten and printed text are difficult to edit, that is, to correct errors and to make additions or deletions. This presents no problem for electronic text.
     Efficiency of reproduction: Reproduction by handwriting is a laborious and time-consuming task, and print reproduction, though much quicker, still takes time. Both, moreover, require additional writing and binding materials. Electronic text, on the other hand, can be copied virtually instantaneously.

  24. Efficiency of storage: The physical space that handwritten and printed documents like books and newspapers occupy, together with the need for controlled environments to forestall deterioration, has become increasingly problematic for libraries as the number of legacy printed books and other types of document has increased. By contrast, electronic media like flash memories can store large amounts of text in tiny spaces, and memory disks are space-efficient and robust.
     Efficiency of reference: Where information is required quickly, searching electronic text with appropriate software is much more efficient than sequential or index-aided search of handwritten or printed text.
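As a small illustration of software search, the sketch below finds every occurrence of a phrase in a text using Python's built-in `str.find`, which returns the position of the next match or -1 when there is none. `find_all` is a hypothetical helper, and the case-insensitive matching is an assumed design choice.

```python
def find_all(text: str, term: str) -> list[int]:
    """Return the position of every occurrence of `term` in `text` (case-insensitive)."""
    haystack, needle = text.lower(), term.lower()
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)  # continue just past the last match
    return positions


sample = "To be, or not to be, that is the question"
print(find_all(sample, "to be"))  # [0, 14]
```

Even this naive scan examines a whole text in a fraction of a second; real search software goes further by building an index in advance, the electronic counterpart of a book's index.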

  25. Efficiency of transmission: Electronic text can be transmitted almost instantaneously between and among networked computers. Handwritten and printed documents must be physically transported to where they are required or, where this is not possible, readers must travel to where the documents are.
     Cost: Because of the foregoing efficiencies, electronic text is much cheaper in financial terms than handwritten and printed text, which makes it more accessible to more people, at least in principle.

  26. 4.2 Implications for cultural complexity
     Contemporary information technology is rapidly coming to dominate our working and social lives, and the majority of applications in that technology are based on electronic text: for example, word processing and internet-based applications like the Web, email, and the various kinds of social media. This has huge social implications; a few important ones follow.

  27. i. Dissemination of knowledge and human freedom
     It is a truism that knowledge is power, and control of access to knowledge is therefore control of access to power. A major factor in the development of human culture has been the progressive liberation of access to knowledge by an ever-greater proportion of humanity as language technology has developed. From its first invention in Mesopotamia to the end of the European Middle Ages, language technology was controlled by educated and literate priesthoods in the service of political elites; the vast majority of humanity was politically powerless.

  28. Printing increased the supply and reduced the cost of text, making it and the knowledge it contained available to a wider readership; though resisted by vested political and religious interest groups, the result has been a drastic increase in the levels of economic and political freedom, at least in the West. Electronic text has made knowledge available worldwide, particularly via internet-based applications, and this has resulted in much greater oversight of the conduct of ruling political and economic elites and, recently, in challenges to and even termination of their authority by the will of the people.

  29. ii. Flowering of science and technology
     As access to knowledge has increased with the development of language technology, humanity's understanding of nature and of its place in the natural order has grown, and technology based on this knowledge has greatly improved the quality of human lives. While language technology was controlled by literate priesthoods in the service of political elites, science was essentially religion and, in the absence of an empirically based science, technology remained rudimentary. Most people's lives were, in the words of the philosopher Thomas Hobbes (1588–1679), 'solitary, poor, nasty, brutish, and short'.

  30. The greater access to language technology afforded by the invention of printing allowed scientific ideas to be disseminated beyond the control of established elites, which led to a rapid and, in the twentieth century, explosive growth of empirically based science, together with a wide range of technologies in areas like medicine, transport, and communication that have greatly enhanced the length and quality of people's lives. Access to scientific knowledge via electronic text and the internet has accelerated the growth of science and technology in the developed world and has made it readily accessible, sometimes for the first time, in the developing world.

  31. iii. Threats
     Digital electronic language technology is not an unalloyed cultural good. Who can access digital text? Not all digital text is accessible. An increasing amount is held by special interest groups like governments, the police, and corporations, and the information it contains can have important and potentially damaging effects on people's lives. More generally, access to digital text is restricted to the technologically literate; much of the world's population is thereby excluded.

  32. How reliable is electronic text? The Web contains a huge amount of information, and it's not clear how much of it is reliable. The Web's primary advantage, freedom of access, is also its main drawback: anyone can post text on it, and there is no quality control. Are there inadvertent errors, and if so how severe is the problem? More worryingly, is there deliberate misinformation or disinformation?
