
Texts and Digital Objects


Presentation Transcript


  1. Texts and Digital Objects What seems to have changed

  2. The web as universal library • Generation I: the ASCII text • Generation II: the XML text • Generation III: the book as object

  3. The web as universal library • Generation I: the ASCII text. A web of text nodes with documents at the nodes. • Generation II: the XML text. A web where the documents retain deep structure, but the web is still the library. • Generation III: the book as object. The library will be imported to the web. Page by page. Library by library. The web is simply a way of accessing the universal library of print objects.

  4. But are we going backwards?

  5. But are we going backwards? Some of the movement looks a trifle retrograde.

  6. Generation I • The primacy of texts: "Nodes can in principle also contain non-text information such as diagrams, pictures, sound, animation etc. The term hypermedia is simply the expansion of the hypertext idea to these other media." (Tim Berners-Lee, 1989 proposal for a WWW, written at CERN) • Texts: hypertext, HTTP, and ASCII will do

  7. Generation I circa 1995 A forest of connected texts which frankly doesn’t look too great.

  8. Project Gutenberg • Texts are what matter • Accuracy matters • Page numbering doesn’t • Typography doesn’t matter either

  9. But a good deal is lost • Typography may not matter, but good web design does • Typography carries a lot of metadata • Metadata and the formal structure of the text need to be kept • Variety, flexibility, and machine-readability ... XML
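
To illustrate the point about formal structure, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`book`, `chapter`, `head`, `p`, `emph`) and the sample text are invented for the example and are not taken from any particular schema. It shows that a structured text can answer a question such as "what are the chapter headings?", while the flattened version of the same text cannot.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names are illustrative assumptions.
XML_TEXT = """<book>
  <title>An Example Book</title>
  <chapter n="1">
    <head>First Chapter</head>
    <p>Some opening prose.</p>
  </chapter>
  <chapter n="2">
    <head>Second Chapter</head>
    <p>More prose, with a <emph>stressed</emph> word.</p>
  </chapter>
</book>"""

root = ET.fromstring(XML_TEXT)

# With the structure intact, chapter headings can be asked for directly.
chapter_heads = [chapter.findtext("head") for chapter in root.iter("chapter")]

# Flattening to plain text keeps the words but discards every boundary:
# headings, paragraphs and emphasis all become one undifferentiated string.
flat_text = " ".join("".join(root.itertext()).split())

print(chapter_heads)
print(flat_text)
```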

  10. Generation II circa 2000 Books repurposed for the web look a lot better than flat ASCII. But there is a big overhead.

  11. Republished for the web • Inevitable duplication • Page numbers don’t matter • Typography can be optimised for web browsers • Structure and added value are preserved • Links and HTTP connections are fine • But this repurposing is a hassle and ultimately confusing

  12. So Google has a better idea • Words matter • Pages matter • Books matter • Libraries matter • And they should be searched in the way that all other digital objects and collections can be searched

  13. Generation III circa 2005 Put books on the web just as they are. Books, not texts, are the primary resource for a library.

  14. Keep it simple • Scan every page of every book • OCR every word and symbol • Store every word and symbol in a database • Store an image of every page in the database • Know precisely where every word is on every page
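
The steps on this slide can be sketched as a small in-memory data structure. This is an illustrative assumption about how such a store might be organised, not a description of Google's actual system: the names `WordHit` and `PageIndex`, the field layout, and the shape of the OCR output (word plus bounding box in pixels) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHit:
    """One occurrence of a word, with its position on the scanned page."""
    book_id: str
    page: int
    x: int
    y: int
    width: int
    height: int


class PageIndex:
    """Toy stand-in for the database: page images plus positioned words."""

    def __init__(self):
        self._hits = defaultdict(list)   # lowercased word -> list of WordHit
        self._images = {}                # (book_id, page) -> image path

    def add_page(self, book_id, page, image_path, ocr_words):
        # ocr_words is assumed to be an iterable of (text, x, y, width, height).
        self._images[(book_id, page)] = image_path
        for text, x, y, width, height in ocr_words:
            hit = WordHit(book_id, page, x, y, width, height)
            self._hits[text.lower()].append(hit)

    def search(self, term):
        # Returns every position at which the term occurs, across all pages.
        return list(self._hits.get(term.lower(), []))

    def image_for(self, book_id, page):
        return self._images[(book_id, page)]
```

A search then returns not just "this book matches" but the page and the exact rectangle on that page where each word sits, which is what "know precisely where every word is on every page" requires.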

  15. How the Google system works • The browser has a JPEG and some HTML around it • The web page is an image with search terms highlighted • The intelligence is in the database • Search is precise and fast • The Google database would be the universal library
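
A minimal sketch of "a JPEG and some HTML around it", assuming the server already knows the pixel rectangle of each matching word. The function name `render_page`, the inline styling, and the overlay approach are assumptions made for illustration, not the markup Google actually serves.

```python
from html import escape


def render_page(image_url, image_width, image_height, boxes):
    """Return HTML that shows a page image with highlighted rectangles.

    boxes is a list of (x, y, width, height) tuples in image pixels,
    one per search-term occurrence on the page.
    """
    parts = [
        f'<div style="position:relative;width:{image_width}px;'
        f'height:{image_height}px">',
        f'<img src="{escape(image_url, quote=True)}" '
        f'width="{image_width}" height="{image_height}" alt="scanned page">',
    ]
    for x, y, width, height in boxes:
        # Each highlight is an empty, semi-transparent element laid over the image.
        parts.append(
            f'<span style="position:absolute;left:{x}px;top:{y}px;'
            f'width:{width}px;height:{height}px;'
            f'background:rgba(255,255,0,0.4)"></span>'
        )
    parts.append("</div>")
    return "\n".join(parts)


html_page = render_page("page-12.jpg", 600, 900, [(40, 100, 80, 18)])
print(html_page)
```

The browser never receives the text of the page, only the image and the highlight rectangles, which is the sense in which the intelligence stays in the database.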

  16. Pages really matter • Every print page is a web page • A book is just a collection of web pages • The concept of a ‘union catalogue’ will now have its correlative, a ‘union library collection’ (i.e. what is a duplicate?) • There is no such thing as a Google edition • Are the Google standards of preservation good enough?

  17. Simplicity and Conservatism • Publishers should be flattered • Book designers, editors and typographers should be more than flattered • Authors are still authors • Catalogues and references work with minimal adjustment • Book warehouses become obsolete

  18. So what is lost? • Perhaps publishers and authors lose profits? • The text is lost. The text is readable and searchable... but there is no text. • A searchable text, but not an entire and complete text. A collection of pages (JPEGs). • Certainly none of the deep structure of the XML is retained • Linkages and references are absent

  19. What is gained? • Books: all texts, documents and libraries become fully searchable • Automation of reading and accessibility of rare editions • Incredibly cheap in relation to the enhanced availability • Bibliographies, catalogues and other systems of metadata are preserved

  20. There is much left to do • No fine structure in the pages • Poor navigation within the books • The commercial model has to be invented • It will not all be advertising driven

  21. Exact Editions uses a Google-style platform for magazines. The technology is similar but the sociology is different.

  22. Similar to Google Book Search • Platform for publishers of magazines • Publishers can add web functionality (links and advertisements) • PDF as input and automated production • Subscription or free access • Full web functionality (statistics and integration with web apps)

  23. Adam Hodgkin adam.hodgkin@exacteditions.com
