Galaxy of News: An Approach to Visualizing and Understanding Expansive News Landscapes
Earl Rennison
In UIST '94, ACM Symposium on User Interface Software and Technology. New York: ACM Press, 1994.
Paper presentation by Mark Sharp
17:610:554 Information Visualization, Prof. Spoerri
11/11/2002
554 paper pres.

Paper Summary
• PROBLEM: Accessing and understanding news information is not well supported by the information infrastructure.
• VISION: An intelligent infrastructure that automatically builds the correlations and relationships between news articles and constructs an environment that allows readers to dynamically explore and gain understanding.

How does it work?
• Articles have features (metadata) extracted by parsing algorithms; they are then clustered by ARN (a neural network algorithm) and mapped to a 3D spatial layout.
• Nodes: keyword hierarchy / headlines / full text
• Zoom in with the left mouse button, out with the right. [direct manipulation]
• Animation (4D) helps the user understand what the system is doing. [motion: an early/pre-attentive visual cue]
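
The three steps on this slide (extract features, cluster, map to a 3D layout) can be sketched roughly as below. This is only an illustration under loud assumptions: the slide names ARN as the clustering method, and the sketch does not reproduce it. It swaps in keyword counting plus a greedy Jaccard-overlap grouping, and places clusters on a circle with articles stepping back along z. All function names, the stopword list, the threshold, and the sample headlines are hypothetical.

```python
from collections import Counter
import math
import re

# Hypothetical stopword list; the paper's parsing algorithms are not shown on the slide.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "is", "as", "after", "by"}

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stopword tokens (stand-in for feature extraction)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    # Sort by descending count, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:top_n]}

def jaccard(a, b):
    """Overlap between two keyword sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(features, threshold=0.2):
    """Greedy grouping (NOT ARN): join the first cluster whose first article is similar enough."""
    clusters = []
    for article_id, keywords in features.items():
        for members in clusters:
            if jaccard(keywords, features[members[0]]) >= threshold:
                members.append(article_id)
                break
        else:
            clusters.append([article_id])
    return clusters

def layout(clusters, radius=10.0, depth_step=1.0):
    """Put each cluster on a circle in the x-y plane; its articles recede along z."""
    positions = {}
    for i, members in enumerate(clusters):
        angle = 2 * math.pi * i / len(clusters)
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        for j, article_id in enumerate(members):
            positions[article_id] = (x, y, -depth_step * (j + 1))
    return positions

# Made-up sample articles for illustration only.
articles = {
    "a1": "Election results: election turnout rises as voters back reform",
    "a2": "Voters weigh reform after election recount",
    "a3": "Storm floods coast; storm damage grows",
}
features = {article_id: extract_keywords(text) for article_id, text in articles.items()}
clusters = cluster(features)
positions = layout(clusters)
```

Placing articles deeper along z than their cluster is one simple way to let zooming move from keyword level toward headlines, in the spirit of the node hierarchy listed above.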

Model components
• Temporal and behavior interaction: controls level of detail, user orientation cues, transition to new views.
• Spatial construction: can be 2-, 3-, or n-dimensional; uses relationships; dynamic (appropriate for news).
• Relationships: designer-specified; e.g. temporal ordering…
• News base: not raw data; objects and annotations (keywords, slugwords, location, time, subject, etc.); manually or automatically derived from raw data.
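
To make the "news base" and "relationships" components more concrete, here is a minimal sketch of how annotated objects and one designer-specified relationship (temporal ordering) might look. The class name, field names, helper functions, and sample data are all hypothetical; the slide only lists the annotation types.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class NewsObject:
    """One annotated object in the news base (not the raw article text)."""
    headline: str
    time: datetime
    location: str
    subject: str
    keywords: set = field(default_factory=set)

def temporal_order(objects):
    """Designer-specified relationship: link each object to the next one in time."""
    ordered = sorted(objects, key=lambda o: o.time)
    return list(zip(ordered, ordered[1:]))

def shares_subject(a, b):
    """A second, equally simple relationship: two objects carry the same subject."""
    return a.subject == b.subject

# Made-up sample objects for illustration only.
a = NewsObject("Recount ordered", datetime(1994, 11, 3), "Boston", "politics", {"recount"})
b = NewsObject("Polls open", datetime(1994, 11, 1), "Boston", "politics", {"election"})
c = NewsObject("Storm hits coast", datetime(1994, 11, 2), "Miami", "weather", {"storm"})
links = temporal_order([a, b, c])
```

The spatial construction layer would then consume pairs like `links` when deciding which objects to place near each other.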

Which early / pre-attentive visual processes are leveraged?
• Position
• Proximity
• Motion
• Brightness
• Size
• Color

What is working?
• Principled (algorithmic) feature extraction and clustering.
• Direct manipulation.
• True zooming (seamless exploration of categories, document labels, and full texts).
• Dynamic updating of content (new articles).

What is not working or clear?
• Clustering based on skinny metadata rather than full text vectors.
• Keywords are single words, not terms.
• “Relationships”?

What surprised you?
• Naivete about “understanding” and media studies.

Key Insights: what I learned
• Detailed look into the architecture of a true large-text-corpus info viz system with many desirable features.

What is the key contribution?
• True zooming (seamless integration of all levels) is feasible in large text corpora.

Take-away messages? What can be generalized?
• Computational feasibility forces some compromises.
• “What is not working”
• Human heuristics (“relationships”?)
BUT help is on the way (bigger iron)

3 questions for group and class discussion.
• Are volume and lack of organization really our biggest problems with modern news information?
• Would you use Galaxy of News? Why or why not?
• What other kinds of text data would you like to see this approach applied to? How might a different domain affect the specification of metadata object representations and/or “relationships”?

TileBars: Visualization of Term Distribution Information in Full Text Information Access
Marti Hearst
Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 59-66, Denver, CO, May 1995.
Paper presentation by Mark Sharp
17:610:554 Information Visualization, Prof. Spoerri
11/11/2002

Paper Summary
• PROBLEM: Traditional IR is focused on text databases consisting of titles and abstracts; its assumptions are not necessarily appropriate for full text.
• VISION: Utilize term distribution within the text as well as overall frequency to model document relevance. Replace opaque ranking with a transparent means for swift appraisal of the query-document relationship.

How does it work?
• TextTiling algorithm partitions full text into adjacent, non-overlapping, multi-paragraph segments reflecting subtopic structure based on term co-occurrence and repetition.
• Segments are scored for similarity to query terms.
• Display shows document length, term frequency, and term distribution across segments.
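
The data behind the display can be sketched as a small matrix: one row per query term set, one column per segment, each cell holding how often that term set occurs in that segment. The sketch below is an assumption-laden illustration, not Hearst's implementation: it does not implement TextTiling and instead groups a fixed number of paragraphs per segment as a placeholder, and it scores segments by raw term counts. Function names and the sample text are hypothetical.

```python
import re

def segment(paragraphs, size=2):
    """Placeholder for TextTiling: group every `size` adjacent paragraphs into one segment."""
    return [" ".join(paragraphs[i:i + size]) for i in range(0, len(paragraphs), size)]

def tile_counts(segments, term_sets):
    """Return one row per term set; each cell is the count of that set's terms in a segment."""
    rows = []
    for terms in term_sets:
        row = []
        for seg in segments:
            tokens = re.findall(r"[a-z]+", seg.lower())
            row.append(sum(1 for token in tokens if token in terms))
        rows.append(row)
    return rows

# Made-up document and query for illustration only.
paragraphs = [
    "Osteoporosis affects bone density.",
    "Bone loss rises with age.",
    "Prevention includes exercise.",
    "Research on prevention continues.",
    "Funding for research is limited.",
]
term_sets = [{"osteoporosis", "bone"}, {"prevention", "research"}]

segments = segment(paragraphs)
counts = tile_counts(segments, term_sets)
# counts == [[3, 0, 0], [0, 3, 1]]
```

Here the two term sets never peak in the same column, which is the kind of non-overlap a reader could spot at a glance in the display.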

Length of rectangle: length of document
Each gray square = 1 tile (segment)
Tile darkness: term frequency
Query term sets: tile rows
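
The legend above can be mimicked in plain text. The following sketch takes a matrix of per-segment term counts (one row per query term set) and draws one character per tile, with denser characters standing in for darker gray. The character ramp, the function names, and the sample counts are assumptions for illustration; the actual display uses gray-scale squares.

```python
# Light-to-dark character ramp standing in for gray levels (an assumption, not the paper's palette).
SHADES = " .:*#"

def shade(count, max_count):
    """Map a term count to a ramp character; any nonzero count gets a visible mark."""
    if max_count == 0:
        return SHADES[0]
    steps = len(SHADES) - 1
    # Ceiling division so small nonzero counts do not round down to blank.
    level = min(steps, (count * steps + max_count - 1) // max_count)
    return SHADES[level]

def render_tilebar(rows):
    """Return one text line per term-set row; line width reflects the number of segments."""
    max_count = max((c for row in rows for c in row), default=0)
    return ["|" + "".join(shade(c, max_count) for c in row) + "|" for row in rows]

# Hypothetical counts: 2 term sets (rows) across 3 segments (columns).
bar = render_tilebar([[3, 0, 0], [0, 3, 1]])
for line in bar:
    print(line)
```

A longer document simply yields longer lines, matching the "length of rectangle: length of document" mapping.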

Which early / pre-attentive visual processes are leveraged?
• Length
• Position
• Darkness (gray scale)

What is working?
• Elegant representation of document length.
• Adjacency of tiles between term rows => overlap.
• Gray scale leverages relative (vs. absolute) judgment.
• Meaningful labels (start of text).
• Direct click link from tiles to text segments.
• Starting TREC/TIPSTER evaluation.

What is not working or clear?
• Depends on skillful Boolean query formulation (e.g. no stopwords).
• Doesn’t appear to be scalable to large queries (>3 conjunctive terms).

What surprised you?
• “Because they do have a natural visual hierarchy, varying shades of gray show varying quantities better than color.”

Key Insights: what I learned
• Relevance ranking is not the only game in town for putting cognitive cues on multi-document retrievals.

What is the key contribution?
• Text segmentation can enhance traditional (whole-document) IR as well as “fact retrieval.”
• Novel paradigms for text retrieval can be both principled and computationally efficient.

Take-away messages? What can be generalized?
• Marti Hearst is a major player in text mining / text visualization.

3 questions for group and class discussion.
• Instead of integer term frequency, what else could be used to “color” the tiles for relevance?
• How might documents be ranked?