Alternatives to Metadata IMT 589 February 28, 2004
Automatic Indexing
• Rule-based systems
    • Legacy from early AI days
    • Require intensive upfront effort to build
    • Usually domain-specific
    • Don’t tend to scale well
• Bayesian
    • Rely on similar document types for good success
    • Require a training sequence
    • Problems with scaling again
IMT589- Applied and Structural Metadata
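To make the Bayesian bullet concrete, here is a minimal sketch of a naive Bayes indexer that assigns one subject label to a document. It is an illustration under stated assumptions, not a method taken from the slides: the function names (`train`, `classify`), the whitespace tokenizer, the add-one smoothing, and the four-document training set are all hypothetical. It shows why a training sequence is needed and why similar document types matter: the model only knows the words it has seen under each label.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace splitting is enough for a sketch.
    return text.lower().split()


def train(labeled_docs):
    """Count terms per label from (text, label) pairs: the training sequence."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for tok in tokenize(text):
            term_counts[label][tok] += 1
            vocab.add(tok)
    return term_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_terms = sum(term_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing so an unseen (label, word) pair is not fatal.
            score += math.log(
                (term_counts[label][tok] + 1) / (total_terms + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, invented for this example.
TRAINING = [
    ("dublin core element set for resource description", "metadata"),
    ("controlled vocabulary and thesaurus terms", "metadata"),
    ("crawler fetches pages and follows links", "search"),
    ("ranking pages by query terms and links", "search"),
]

model = train(TRAINING)
label = classify("thesaurus terms for description", model)
```

A document whose vocabulary falls outside the training set gets little or no evidence, which is one way to read the "similar document types" caveat.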
Automatic Indexing
• Natural language approaches
    • Require sophisticated processing techniques to obtain word matches
    • Computationally intensive
    • Again, problems with scaling
• Other approaches
    • Clustering algorithms: http://www.entrieva.com
    • Latent Semantic Indexing: http://lsi.research.telcordia.com/
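As a rough illustration of the clustering bullet, the sketch below groups documents by cosine similarity of their term-frequency vectors in a single pass. It is not the algorithm used by the products linked above, and it is not Latent Semantic Indexing; the function names, the `threshold` value, and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    # Term-frequency vector; whitespace tokenizing is an assumption.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Single-pass clustering: join the most similar cluster or start a new one."""
    clusters = []  # each entry: {"centroid": Counter, "members": [doc index]}
    for i, text in enumerate(docs):
        vec = vectorize(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            # Summed term counts stand in for the centroid; cosine ignores scale.
            best["centroid"].update(vec)
            best["members"].append(i)
    return [c["members"] for c in clusters]


# Hypothetical documents, invented for this example.
DOCS = [
    "metadata schema for library records",
    "library records and metadata schema design",
    "web crawler follows hyperlinks",
    "crawler follows hyperlinks across the web",
]

groups = cluster(DOCS)
```

No labels or training data are needed here, but the result depends on the threshold and on document order, and the groups come without names, so a person still has to decide what each cluster means.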
Google
• Uses inherent characteristics of HTML markup to build associations
• Relies on human linking for relevance
• Enhances with markup characteristics
• New approach, based on widespread adoption of a simple standard
• Relies on large body of self-referring content for success
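The "human linking for relevance" idea can be sketched as a simplified link-analysis score in the style of PageRank. The slide does not name an algorithm, so treat this as an assumption-laden illustration: the `link_rank` function, the damping value of 0.85, the iteration count, and the tiny link graph are all hypothetical.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages so that links from well-linked pages count more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["a"],
}

ranks = link_rank(LINKS)
```

In this graph the pages that nobody links to (`b` and `c`) end up with only the baseline score, which suggests why the approach depends on a large body of self-referring content: in a small local collection with few links there is little signal to work with.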
Semantic Web
• Ambitious undertaking to provide context for everything
• Example of automated metadata generation dependent on existing classification scheme
• High processing overhead for large quantities
• Probably not sufficient for precise access in local content sets
• Shirky’s cautions reflect the realities of the world, but it’s a noble goal
Where Does Metadata Fit?
"We tend to think that the hard problems are the big ones. So we believe that searching the Web is hard because it's so huge. But I've been thinking lately that the really hard problems are actually the ones in the middle. In the middle, many algorithms don't work that well with moderate document sets, context becomes more important, interaction is critical, and you can't get the user "in the ballpark" anymore--you have to get them right to the thing they're looking for."
Karl Fast, http://lists.ibiblio.org/mailman/private/aifia-members/2004-February/001129.html