Uncorking the Varietals: Social Tagging, Folksonomies & Controlled Vocabularies

Margaret Maurer

Head, Catalog and Metadata

Kent State University Libraries and Media Services

In wine making - What is a Varietal?

  • A wine made from a single, named grape variety.

    • Cabernet Sauvignon wines are made from cabernet sauvignon grapes

    • Chardonnay wines are made from chardonnay grapes

In information seeking – on the Web or in the catalog

  • Access and identification systems may be controlled by librarians: controlled vocabularies

  • Access and identification systems may be dynamically generated by users: social tagging, folksonomies

  • These are different varieties of access and identification systems

This presentation

  • Controlled vocabularies

  • Social Tagging

  • Folksonomies

  • My recommendations

First we’ll talk about the cabernet sauvignons – the controlled vocabs

Purpose of a controlled vocabulary

  • To create sets of objects

  • To serve as a bridge between the searcher’s language and the author’s language

  • To provide consistency

  • To improve precision and recall
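Precision and recall can be made concrete with a little arithmetic. The sketch below uses invented record IDs and a hypothetical helper name (`precision_recall`); precision is the share of retrieved records that are relevant, and recall is the share of relevant records that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 records retrieved, 5 records truly relevant.
retrieved = ["r1", "r2", "r3", "r4"]
relevant = ["r1", "r2", "r5", "r6", "r7"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 2 of 4 retrieved are relevant; 2 of 5 relevant were found
```

A controlled vocabulary aims to raise both numbers: one authorized heading gathers records that use different words (recall) and separates records that merely share a word (precision).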

Characteristics of a controlled vocabulary

  • Features a single, authorized form of heading

  • Often features a syndetic structure of cross-references

  • Based on the belief that successful use of the catalog depends on the quality of the individual records

The authority record structure

  • Records the standardized form

  • Ensures the gathering together of records via that access point

  • Enables standardized catalog records

  • Documents decisions taken

  • Records all other heading forms and provides links from them to the standardized form
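The authority record structure can be sketched as a small data structure. This is a minimal illustration, not a real authority format: the class names (`AuthorityRecord`, `AuthorityFile`), the fields and the sample heading data are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    authorized: str                               # the single standardized form
    variants: list = field(default_factory=list)  # other heading forms ("see from")
    note: str = ""                                # documents the decision taken


class AuthorityFile:
    def __init__(self):
        self._by_form = {}

    def add(self, record):
        # Index the authorized form and every variant, so that any of them
        # leads to the same record.
        for form in [record.authorized, *record.variants]:
            self._by_form[form.casefold()] = record

    def resolve(self, heading):
        """Return the authorized form for a heading, or None if unknown."""
        record = self._by_form.get(heading.casefold())
        return record.authorized if record else None


# Illustrative data only, not copied from any real authority file.
authority = AuthorityFile()
authority.add(AuthorityRecord("Cookery", ["Cooking", "Cuisine"],
                              note="Example decision note"))
print(authority.resolve("cooking"))  # Cookery
```

Because every variant resolves to one form, all records carrying that heading gather under a single access point.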

Benefits of controlled vocabularies

  • Promotes discovery generally

  • Promotes discovery when the aboutness of something has nothing to do with words in the resource or its representation

    • Imaginative literature (Genre headings)

    • Humanities

  • Promotes pre-coordinated displays that expand access: http://cinema.library.ucla.edu

Benefits when combined with keyword searching

  • Keywords hook into strings of terms most efficiently

  • Users can be routed by pre-coordinated strings

Controlled vocabularies support faceted catalogs

  • Encore

  • Evergreen

  • Endeca

  • WorldCat Local

    All provide hyperlinks to authorized headings

Weaknesses of controlled vocabularies

  • The artificially controlled language is not necessarily natural language—Cookery anyone?

  • Subject searches are the most problematic for users

  • It may work better in theory than in practice

  • It is costly to perform necessary maintenance

  • Cost is seen to outweigh the benefits by many administrators

Library of Congress Subject Headings - LCSH

  • Has a long and well-documented history

  • Commonly used

  • Is contained in millions of bibliographic records

  • Strong institutional support from LC

More benefits of LCSH

  • The rich vocabulary covers most subjects

  • It imposes synonym and homograph control

  • There are machine assisted authority control mechanisms

  • There is pre-coordination with LCC

  • The music subject heading system is well developed

Weaknesses of LCSH

  • It is a generalist taxonomy that can’t always provide needed granularity

  • Terminology is not always current

  • It doesn’t allow for post-search coordination (it is pre-coordinated)

  • It suffers from LC Collection bias

More weaknesses of LCSH

  • Training needed

    • Requires some orientation to use effectively

    • Is not always accurately applied by catalogers

  • Maintenance

    • It is difficult to maintain when changes occur

Authority control outside the catalog

  • Data critical mass: a tipping point?

    • Homogeneity of data in terms of subject matter

    • Requirements within data community’s users for specificity

    • Size

    • Computing power

  • Wikipedia’s “disambiguation”

ZoomInfo http://www.zoominfo.com/Default.aspx

What if we did open up our authority files to the web?

  • National Library of Australia’s People Australia Project


  • Wikipedia Persondata-Tool


Is ontology overrated?

  • Physicality requires ontologies for searching, but systems with hyperlinks do not

  • Browse versus search may eliminate the need for creating lists of authorized headings

Ontological classification

  • Works well when the domain to be organized is small, has formal categories, has stable entities, is restricted and has clear edges

  • Does not work well when the domain to be organized is large, has no formal categories, is unstable, is unrestricted and has no clear edges

Ontological classification

  • Works well when the participants are expert catalogers, authoritative sources of judgement, coordinated users or expert users

  • Does not work well when the participants are uncoordinated, amateur, naïve or non-authoritative

Now we talk about the Chardonnays – social tagging and folksonomies

What are tags?

  • Keywords or terms associated with or assigned to a piece of information

  • They enable keyword-based classification and search of information

Common Web sites that use tags include

  • Del.icio.us – Social bookmarking site

  • Flickr – Image tagging

  • LibraryThing

  • Gmail - Webmail

  • YouTube

Tags, and therefore social tags and folksonomies, are

  • Dynamic categorization systems

  • Often created on-the-fly

  • Chosen as relevant to the user – not to the creator, cataloger or researcher

  • A social activity (more on this later)

  • Hopefully one small step toward a more interactive and responsive library system

Social tags are

  • Non-hierarchical

  • A way to create links between items by the creation of sets of objects

  • A means of connecting with others interested in the same things

Way baaack in 2003…

  • Del.icio.us includes identity in its social bookmarking

  • Flickr includes tags

  • Lists of tags became a tool for serendipitous discovery (folksonomies)

Why is tagging so popular?

  • It is easy and enjoyable

  • It has a low cognitive cost

  • It is quick to do

  • It provides self and social feedback immediately

People tag things

  • To find them again

  • To get exposure and traffic

  • To voice their opinions

  • Incidentally as they perform other tasks

  • To take advantage of functionality built on top of a folksonomy

  • To play a game or earn points

Putting the social in tagging

  • Tags allow for social interaction because when we navigate by tags we are directly connecting with others

  • People tag for their own benefit

Don’t confuse tags with keywords or full-text searching

  • Keywords are behind the scenes, tags are often visibly aggregated for use and browsing

  • Keywords cannot be hyper-linked

  • Keywords imply searching, tags imply linking

  • Full-text searching is passive, tagging is active

  • Tagging is more about connecting items than categorizing them

What is a Folksonomy?

  • Folksonomy refers to an “emergent, grassroots taxonomy”

    • An aggregate collection of tags

    • A categorical structure developed from the bottom up

    • An emergent thesaurus

  • A term coined by Thomas Vander Wal
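The idea of a folksonomy as an aggregate of individual tags can be sketched in a few lines. The users, items and tags below are invented for illustration; the point is only that a shared structure emerges from counting what individuals did for their own purposes.

```python
from collections import Counter, defaultdict

# Hypothetical tagging events: (user, item, tag)
events = [
    ("ann", "book-1", "wine"),
    ("bob", "book-1", "wine"),
    ("bob", "book-1", "france"),
    ("cat", "book-2", "wine"),
    ("cat", "book-2", "cooking"),
]

# How often each tag was applied, across all users (the raw material of a tag cloud).
tag_counts = Counter(tag for _, _, tag in events)

# Tags create sets of objects: every item that shares a tag is linked by it.
items_by_tag = defaultdict(set)
for _, item, tag in events:
    items_by_tag[tag].add(item)

print(tag_counts.most_common(1))     # [('wine', 3)]
print(sorted(items_by_tag["wine"]))  # ['book-1', 'book-2']
```

Nobody designed the resulting structure; it comes from the bottom up, one tag at a time.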

How do folksonomies work?

  • The searcher defines the access, but

  • The aggregation of the terms has public value

  • It’s a typically messy democratic approach

What makes folksonomies popular?

  • Their dynamic nature works well with dynamic resources

  • They’re personal

  • They lower barriers to cooperation

Tagging and the consequent folksonomies work best when

  • It’s easy to do

  • It’s not commercial in nature

  • Taggers have ownership

  • Taggers are more likely to tag their own stuff than they are your stuff

  • It has been shown to work well on the Web

The unexpected development: terminological consensus

  • Collective action yields common terms

  • Stabilization may be caused by imitation and shared knowledge

  • The wisdom of the crowd

Is your tagging influenced by my tagging?

  • Of course it is!

  • People are beginning to tag in ways that make it easier for others to find like stuff

  • Shared meaning consequently evolves for tags

  • Most used tags become most visible

Strengths of folksonomies

  • A cost-effective way to organize the Internet

  • Social benefits

  • It’s inclusive

  • For many environments, they work well

Issues with meaning

  • They do not yield the level of clarity that controlled vocabularies do

  • Term ambiguity – words with multiple meanings

  • No synonym control

Issues with specificity

  • Variable specificity for related terms

  • Broadness of terms impacts precision – terms are often imprecise

  • Mixed perspectives

Issues with structure

  • Singular and plural forms create redundant headings

  • No guidelines for the use of compound headings, punctuation, word order

  • No scope notes

  • No cross references

Issues with accuracy

  • Collective ‘wisdom’ of the tagging community

  • How does wrong information impact retrieval?

  • Conflicting cultural norms

  • Sometimes authority counts

“Spagging” and other problems

  • Opening doors to opinion tags

  • Tagging wars

  • “Spagging”: spam tagging

Tidying up the tags…?

  • Lists of tagging norms have been developed

  • Are there programmatic solutions?

  • Users know they are looking at tags

  • By tidying, do we destroy the essence of why this works?

  • Do we realistically have the resources?
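One possible programmatic tidy-up is sketched below. It is an assumption about what such a solution might look like, not a description of any existing system: the function name `normalize_tag` is hypothetical, and the plural rule is deliberately crude (it would mangle a word such as "chaos"), which itself shows how tidying can distort what taggers meant.

```python
import re
from collections import Counter


def normalize_tag(tag):
    """Fold case, punctuation, spacing and a simple English plural 's'."""
    tag = tag.casefold()
    tag = re.sub(r"[^\w\s]", "", tag)        # drop punctuation such as "!" or "-"
    tag = re.sub(r"\s+", " ", tag).strip()   # collapse and trim whitespace
    # Naive singularization: strip one trailing "s", but leave "glass" alone.
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag


# Invented raw tags showing singular/plural and punctuation variants.
raw = ["Wines", "wine", "WINE!", "sci-fi", "scifi", "glass"]
merged = Counter(normalize_tag(t) for t in raw)
print(merged)
```

Six raw tags collapse to three normalized ones here, which removes redundant headings but also discards whatever distinction a tagger intended.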


Don’t assume that one size fits all

  • Retain controlled vocabularies in the catalog

  • Explore ways to use controlled vocabularies to help organize the internet by re-purposing controlled vocabularies that already exist

  • Invite Folksonomies to the party in the catalog to gain their benefits

  • Explore ways to combine the two systems


When you invite folksonomies into the catalog, do so strategically, and carefully

  • Don’t put terms in the same index as controlled vocabularies

  • Find ways to associate terms applied across editions of works

  • Need for mediation, or at least observation

  • The crowd is not necessarily the best arbiter of specific terminology
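The first two recommendations above can be sketched together: keep user tags in their own index, and attach them at the work level so they carry across editions. Everything here is hypothetical (the edition and work identifiers, the function names, the sample heading and tag); it only illustrates the shape of the idea.

```python
from collections import defaultdict

# Hypothetical mapping of each edition to the work it belongs to.
work_of = {"ed-1999": "work-42", "ed-2005": "work-42", "ed-2001": "work-7"}

subject_index = defaultdict(set)  # controlled headings, assigned by catalogers
tag_index = defaultdict(set)      # user tags, kept in a separate index


def add_heading(edition, heading):
    subject_index[heading].add(edition)


def add_tag(edition, tag):
    # Store the tag against the work, so it applies to every edition of it.
    tag_index[tag.casefold()].add(work_of[edition])


def editions_for_tag(tag):
    works = tag_index.get(tag.casefold(), set())
    return sorted(e for e, w in work_of.items() if w in works)


add_heading("ed-1999", "Wine and wine making")  # illustrative heading
add_tag("ed-2005", "Winemaking")                # a user tags one edition only

print(editions_for_tag("winemaking"))  # ['ed-1999', 'ed-2005']
```

The tag reaches both editions of the work, while the subject index stays untouched by user-supplied terms.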


Always remember why people tag

  • People tag things because they want to find them, not because they want others to find them

  • Be aware that this will impact the quality of the terms, and their frequency


Controlled vocabularies could be better utilized than they currently are

  • Subject structures are underutilized in the ILS

  • Controlled vocabularies that exist are not being exported to the Web

  • Well-connected terms foster discovery – let’s connect them. Index those cross references where available


Margaret Maurer
