Computer Science 1000: Information Searching I • Redistribution of these slides without permission is strictly prohibited
World Wide Web – The Basics • our next topic examines how to find information on the web • we consider a few basic terms here (which you’re probably familiar with): • page/web page • link/hyperlink • site/web site • later in the semester, we will revisit web technologies in much more detail
World Wide Web • a system of linked documents accessed via the internet • often simply referred to as the web • sometimes used interchangeably with the internet, but this isn’t exactly correct • the internet is the global network of interconnected devices (computers, routers, etc.) that exchange data • the web refers to the documents being stored, the software that serves and retrieves them, and the protocols used for transmission
Web Page • a document stored and accessed on the web • identified by a unique URL (Uniform Resource Locator) • often referred to simply as a page • today’s web pages are very rich in content • text • images • hyperlinks • videos
Web Site • a collection of related web pages • typically belonging to a common organization or event • example • all pages served by the University of Lethbridge make up its website
Hyperlink • a part of a web page that refers to a different location • often just called a link • hyperlinks can reference: • another place on the same page • another webpage • hypertext: text containing hyperlinks
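To make the two kinds of hyperlink targets concrete, here is a minimal sketch using Python's standard `html.parser` module. It collects the `href` of every `<a>` tag and, as a simplifying assumption, treats an `href` starting with `#` as a place on the same page and everything else as another web page. The sample page and the `example.com` URL are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def classify_links(html):
    """Split links into same-page targets (#fragment) and other pages.

    Simplification: any href not starting with '#' counts as another page.
    """
    collector = LinkCollector()
    collector.feed(html)
    same_page = [h for h in collector.links if h.startswith("#")]
    other_pages = [h for h in collector.links if not h.startswith("#")]
    return same_page, other_pages


# Hypothetical page with one link of each kind
page = """
<p>Read the <a href="#summary">summary</a> below, or visit
<a href="https://www.example.com/notes.html">the notes</a>.</p>
<h2 id="summary">Summary</h2>
"""
same_page, other_pages = classify_links(page)
print(same_page)    # ['#summary']
print(other_pages)  # ['https://www.example.com/notes.html']
```

The text inside each `<a>` tag ("summary", "the notes") is the hypertext a reader clicks; the `href` value is where the link leads.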
The Age of Information • the computer, internet, and web have changed how we interact with information • information storage • far more information is available than even a generation ago, and the amount is growing rapidly • information transmission • large amounts of information are available with a single mouse click, and are transferred almost immediately
Information Age – Rapid Onset • the situation has transformed tremendously in your lifetimes • consider the global information storage capacity: • in 1986: 2.6 exabytes (< 1 CD per person) • in 1993: 15.8 exabytes • in 2000: 54.5 exabytes • in 2007: 295 exabytes (61 CDs per person) • how does one successfully navigate such a mountain of digital content? Hilbert and López. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science 332(6025), 2011
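A quick calculation with the capacity figures above shows how fast that growth is. This sketch uses only the slide's numbers and assumes, as an idealization, a steady compound growth rate between 1986 and 2007; real year-to-year growth was not uniform.

```python
# Global information storage capacity in exabytes, as quoted on the slide
capacity = {1986: 2.6, 1993: 15.8, 2000: 54.5, 2007: 295}

first, last = min(capacity), max(capacity)
growth_factor = capacity[last] / capacity[first]
years = last - first
# Assumed steady growth: the average annual rate r satisfies (1 + r) ** years == growth_factor
annual_rate = growth_factor ** (1 / years) - 1

print(f"{growth_factor:.0f}x growth over {years} years")  # 113x growth over 21 years
print(f"about {annual_rate:.0%} per year on average")      # about 25% per year on average
```

In other words, the stored total grew more than a hundredfold in about two decades, roughly a quarter more every year.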
Information Access • even in pre-internet days, there was a wealth of information • large-scale: library • medium-scale: encyclopaedia set • small-scale: newspaper • strategies developed to manage information • categories • hierarchies • indices
Classification • systematic arrangement in groups or categories according to established criteria – Merriam-Webster • in other words, the information is categorized according to relevant features • consider our course notes: • terminology (4 sets of slides) • information searching (2-3 sets of slides) • etc. ...
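In code, classifying items amounts to grouping them by the value of the chosen criterion. A minimal sketch, assuming each item comes paired with its category; the slide-set titles below are hypothetical stand-ins for the course notes.

```python
from collections import defaultdict

# Hypothetical slide-set titles paired with their topic (the classification criterion)
slide_sets = [
    ("Terminology I", "terminology"),
    ("Information Searching I", "information searching"),
    ("Terminology II", "terminology"),
    ("Information Searching II", "information searching"),
]


def classify(items):
    """Group item names by their category, preserving input order."""
    groups = defaultdict(list)
    for name, category in items:
        groups[category].append(name)
    return dict(groups)


groups = classify(slide_sets)
print(groups["terminology"])  # ['Terminology I', 'Terminology II']
```

Finding all the terminology slides now means looking in one group instead of scanning every set.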
Classification • classification is not specific to digital information • library classification: • Library of Congress Classification • Dewey Decimal Classification
Classification • classification is not specific to digital information • newspaper classification
Classification • classification level of detail leads to tradeoffs • consider a coarse level of detail • e.g. taxonomy of living organisms • classify organisms according to Domain (Archaea, Bacteria, Eukarya) • advantage: small number of groups • disadvantage: each group is massive
Classification • classification level of detail leads to tradeoffs • consider a fine level of detail • e.g. taxonomy of living organisms • classify organisms according to Genus (Canis, Felis) • advantage: each group reasonably small • disadvantage: massive number of groups • solution: hierarchy
Hierarchy • a decomposition of classifications according to detail • hierarchies contain levels • at the top (root) level, there is typically a small number of broad categories • each category is decomposed into smaller categories • a classification group is defined by its category at every level
Hierarchy • organism taxonomy hierarchy: • each Domain is categorized into Kingdoms • e.g. Domain Eukarya: Kingdoms Protista, Animalia, Fungi, Plantae
Hierarchy • organism taxonomy hierarchy: • each Kingdom classified into Phyla • each Phylum classified into Classes • and so on ... http://ag.arizona.edu/pubs/garden/mg/entomology/intro.html
Hierarchy • an object is still categorized, but by multiple levels (instead of one) http://schoolworkhelper.net/scientific-taxonomy/
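A hierarchy maps naturally onto a tree, and an object's classification is the path of categories from the root down to it. The sketch below stores a small slice of the taxonomy as nested dictionaries. The levels below Kingdom (Chordata, Mammalia, Carnivora, Canidae, Felidae) are standard taxonomy filled in for illustration; only the Domain, Kingdom, and Genus names appear on the slides. The `path_to` helper is hypothetical.

```python
# A small slice of the biological taxonomy, stored as nested dictionaries.
# Each key is a category; its value holds the subcategories one level down.
taxonomy = {
    "Eukarya": {                                    # Domain
        "Animalia": {                               # Kingdom
            "Chordata": {                           # Phylum
                "Mammalia": {                       # Class
                    "Carnivora": {                  # Order
                        "Canidae": {"Canis": {}},   # Family -> Genus
                        "Felidae": {"Felis": {}},
                    }
                }
            }
        },
        "Plantae": {},
        "Fungi": {},
        "Protista": {},
    }
}


def path_to(tree, target, path=()):
    """Return the tuple of categories leading to target, or None if absent."""
    for name, subtree in tree.items():
        current = path + (name,)
        if name == target:
            return current
        found = path_to(subtree, target, current)
        if found is not None:
            return found
    return None


print(" > ".join(path_to(taxonomy, "Felis")))
# Eukarya > Animalia > Chordata > Mammalia > Carnivora > Felidae > Felis
```

Each step of the printed path is one level of categorization, which is exactly the "multiple levels instead of one" idea above.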
Hierarchy • facilitates efficient searching through exclusion • example (text): • suppose you have a collection of a million items • these items are organized into 10 equal-sized groups • each top-level group is also organized into 10 equal subgroups • choosing a top-level group eliminates 900,000 items • choosing a subgroup within it eliminates a further 90,000 items • and so on …
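The exclusion arithmetic can be carried all the way down. This sketch assumes, as the example does, that every group splits into the same number of equal-sized subgroups (at least 2); `exclusion_steps` is a hypothetical helper name.

```python
def exclusion_steps(total_items, branching):
    """Yield how many items each category choice eliminates.

    Assumes every group splits into `branching` (>= 2) equal-sized subgroups;
    if the sizes do not divide evenly, integer division makes this approximate.
    """
    remaining = total_items
    while remaining > 1:
        group_size = remaining // branching
        yield remaining - group_size  # items outside the chosen group
        remaining = group_size


steps = list(exclusion_steps(1_000_000, 10))
print(steps)       # [900000, 90000, 9000, 900, 90, 9]
print(len(steps))  # 6
```

Only six choices are needed to narrow a million items down to one, because each level discards 90% of whatever remains.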
Hierarchy • hierarchies are very popular • consider our previous examples: • Library of Congress Classification
Hierarchy • hierarchies are very popular • consider our previous examples: • Newspaper
Index • a detailed list of words, phrases, and/or topics indicating place of occurrence • in essence, it maps keywords of interest to their location • e.g. a page number • a bottom-up approach to information organization • as opposed to the top-down structure of a hierarchy • particularly popular in printed material • books, magazines, volumes, etc
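A book index works bottom-up: start from the pages, note where each chosen keyword occurs, and record the page numbers. A minimal sketch, assuming the page texts and the keyword list below (both hypothetical) and matching whole words only.

```python
from collections import defaultdict

# Hypothetical book content: page number -> text on that page
pages = {
    1: "classification groups information by category",
    2: "a hierarchy decomposes each category into subcategories",
    3: "an index maps keywords to pages",
    4: "classification and hierarchy are top-down",
}

# Only these terms are indexed, as an editor would choose them
keywords = {"classification", "hierarchy", "index"}


def build_index(pages, keywords):
    """Map each keyword to the sorted list of pages where it occurs."""
    index = defaultdict(list)
    for page_number in sorted(pages):
        words = set(pages[page_number].split())
        for word in keywords & words:
            index[word].append(page_number)
    return dict(index)


index = build_index(pages, keywords)
print(index["classification"])  # [1, 4]
print(index["hierarchy"])       # [2, 4]
```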
Index • typically used on a small scale • books and volumes vs. libraries • made efficient through an organizational scheme • alphabetical order is very common • some overlap with hierarchies • e.g. subtopics
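Alphabetical order is what lets a reader jump near the right entry instead of reading the index from the start. A program can exploit the same property with binary search, which halves the remaining range at every step; binary search is our choice of technique here, not something the slides prescribe. The entries and page numbers are hypothetical, and the term list must already be sorted.

```python
from bisect import bisect_left

# A hypothetical back-of-book index kept in alphabetical order: (term, page numbers)
entries = [
    ("classification", [12, 47]),
    ("hierarchy", [15, 48]),
    ("hyperlink", [3]),
    ("index", [21]),
    ("partition", [33]),
]
terms = [term for term, _ in entries]  # sorted, so binary search applies


def lookup(term):
    """Binary search the alphabetical term list; return page numbers or []."""
    i = bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return entries[i][1]
    return []


print(lookup("hyperlink"))  # [3]
print(lookup("taxonomy"))   # []
```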
Finding Information – The Web • as discussed, the amount of information on the web is immense • many of the discussed techniques for information finding also apply digitally • classification/hierarchies • indexing
Classification • many commercial websites have a classification structure • navigation bars
Hierarchies • many websites, especially large ones, will also arrange their categories in hierarchical fashion
Partition • a hierarchy where every object occurs only once • organism taxonomy – every species appears only once • some hierarchies are necessarily partitions • e.g. a particular book will only occur at one point in a library classification • however, in some cases a partition is not natural • an object might fit naturally in more than one category
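Whether a hierarchy is a partition can be checked by counting how often each object appears across all of its categories. This sketch assumes a nested-dictionary hierarchy whose lowest categories hold lists of items; the library shelves and helper names are hypothetical.

```python
from collections import Counter


def leaf_items(tree):
    """Yield every item stored at the leaves of a nested-dict hierarchy,
    where a leaf category holds a list of items."""
    for value in tree.values():
        if isinstance(value, dict):
            yield from leaf_items(value)
        else:
            yield from value


def is_partition(tree):
    """True if no item is stored under more than one category."""
    counts = Counter(leaf_items(tree))
    return all(n == 1 for n in counts.values())


library = {
    "Science": {"Biology": ["Origin of Species"], "Physics": ["Principia"]},
    "Literature": {"Novels": ["Emma"]},
}
print(is_partition(library))  # True

# Filing the same book under a second category breaks the partition
library["Literature"]["Classics"] = ["Principia"]
print(is_partition(library))  # False
```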
Partitions • digital content is often stored using overlapping hierarchies (non-partition) • potentially more intuitive • with hyperlinking, it’s easy to accomplish (two links to the same page) • example (text): • Three Books for Frugal Fashionistas was stored on NPR’s website under: • Home > Arts & Life > Books > Three Books for Frugal Fashionistas • Home > Listen > Latest Program > Three Books for Frugal Fashionistas
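With overlapping hierarchies, the same page can be reached along more than one path. The sketch below encodes the NPR example as a map from each page to the pages it links down to, then lists every path from Home to the article. It assumes the links form a hierarchy with no cycles; real sites usually link back to Home, which would require tracking visited pages. `all_paths` is a hypothetical helper.

```python
# Site structure from the NPR example: each page lists the pages it links down to
site = {
    "Home": ["Arts & Life", "Listen"],
    "Arts & Life": ["Books"],
    "Books": ["Three Books for Frugal Fashionistas"],
    "Listen": ["Latest Program"],
    "Latest Program": ["Three Books for Frugal Fashionistas"],
}


def all_paths(site, start, target):
    """Return every navigation path from start down to target (assumes no cycles)."""
    if start == target:
        return [[start]]
    paths = []
    for child in site.get(start, []):
        for rest in all_paths(site, child, target):
            paths.append([start] + rest)
    return paths


for path in all_paths(site, "Home", "Three Books for Frugal Fashionistas"):
    print(" > ".join(path))
# Home > Arts & Life > Books > Three Books for Frugal Fashionistas
# Home > Listen > Latest Program > Three Books for Frugal Fashionistas
```

The page itself is stored once; the overlap comes entirely from two categories linking to it.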
Indexes for the Web • unlike hierarchies, indexes are much less common on individual websites • site maps might be considered an index of sorts • however, there are analogous technologies to indexes that pertain to the web as a whole • Search Engines!
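The core idea behind a web-wide index can be sketched as an inverted index: map each word to the set of pages containing it, then answer a multi-word query by intersecting those sets. This is only a simplified illustration under stated assumptions (hypothetical URLs and page text, whitespace tokenization, no crawling or ranking); actual search engines involve far more.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text
pages = {
    "https://example.com/a": "library classification systems",
    "https://example.com/b": "newspaper classification and sections",
    "https://example.com/c": "library opening hours",
}


def build_inverted_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs containing every word of the query (an AND search)."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    if not sets:
        return set()
    return set.intersection(*sets)


index = build_inverted_index(pages)
print(sorted(search(index, "library classification")))  # ['https://example.com/a']
```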