Marko Grobelnik Jasna Škrbec Jozef Stefan Institute

Social Context as a part of News-Archive-Explorer
Web application for exploratory browsing of news streams and archives
Marko Grobelnik, Jasna Škrbec
Jozef Stefan Institute

Introduction
• News publishers generate content archives
• The goal is to build a system that makes such archives usable through text mining and visualization
• Archive characteristics:
  • Large corpora (up to a few million articles)
  • Rich metadata (specific to each archive)
  • Different input formats (XML structures)
  • Poor search interfaces (not specialized for archives)

What do we want?
• An application to:
  • help the user search and browse through archives
  • help the user read more about topics related to the search
  • visualize how things are connected in time, place, stories, etc.
  • draw the user's attention to other related issues
  • tell more about the searched content

Architecture (diagram labels: Server side, Client side, Archive, Preprocessing, Enrycher, SQL Server)

Database model

Already done
• Import of archive XML files
  • New York Times archive (15M articles)
  • NYTimes LDC (1.7M articles)
  • Nature (300k articles)
  • Reuters (830k articles)
• Server side
  • Import into a PostgreSQL database
  • Preprocessing with Enrycher
• Client side
  • Faceted search interface (author, entity, keyword, publish date, category)
  • Display of context around the searched content/article
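The import and faceted search steps listed above can be illustrated with a minimal sketch. It is not the application's actual code: the XML element names (`archive`, `article`, `author`, `category`, `entity`), the sample records, and the helper functions are all assumptions made for illustration, and each real archive uses its own schema. Articles are held in memory here rather than in a database.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical archive excerpt; element names are assumptions for illustration.
SAMPLE_XML = """
<archive>
  <article id="1">
    <author>A. Writer</author>
    <category>Politics</category>
    <entity>Clinton</entity>
    <entity>NATO</entity>
  </article>
  <article id="2">
    <author>B. Reporter</author>
    <category>Politics</category>
    <entity>Clinton</entity>
  </article>
  <article id="3">
    <author>A. Writer</author>
    <category>Science</category>
    <entity>NATO</entity>
  </article>
</archive>
"""

FACETS = ("author", "category", "entity")


def parse_articles(xml_text):
    """Turn the XML into a list of dicts holding a list of values per facet."""
    root = ET.fromstring(xml_text.strip())
    articles = []
    for node in root.iter("article"):
        record = {"id": node.get("id")}
        for facet in FACETS:
            record[facet] = [child.text for child in node.findall(facet)]
        articles.append(record)
    return articles


def filter_articles(articles, **selected):
    """Keep only the articles that carry every selected facet value."""
    return [
        a for a in articles
        if all(value in a[facet] for facet, value in selected.items())
    ]


def facet_counts(articles, facet):
    """Count how often each value of a facet occurs in the given articles."""
    return Counter(value for a in articles for value in a[facet])


articles = parse_articles(SAMPLE_XML)
hits = filter_articles(articles, entity="Clinton")
print([a["id"] for a in hits])
print(facet_counts(hits, "author"))
```

Recomputing the facet counts on the filtered result set is what lets a faceted interface show how many articles remain under each further narrowing choice.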

Current version of the GUI

Showing relationships between entities

Plans for the future
• Improve search (with narrowing criteria, suggestions)
• Add visualizations to show content in time, space and other contexts
• Add links to similar content (stories)
• Add links to outside resources (like DBpedia) or bring these resources inside the application
• Integrate with tools developed in AILab to improve search and presentation of articles (SearchPoint, DocAtlas, …)
• Improve usability and appearance of the user interface

Topic landscape of the query “Clinton” from Reuters news 1996-1997 (figure labels: Query, Search Results, Topic Map, Selected group of news, Selected story)

Visualization of social relationships between “Clinton” and other entities (figure labels: Query, Named entities in relation)
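The slides do not say how the relationships between entities are derived. One simple possibility, shown here purely as an assumption, is to count how often other named entities appear in the same article as the query entity. The function name `related_entities` and the sample entity lists are hypothetical.

```python
from collections import Counter


def related_entities(articles, query):
    """Count entities that co-occur with `query` in the same article.

    `articles` is a list of entity lists, one list per article.
    """
    counts = Counter()
    for entities in articles:
        unique = set(entities)  # count each entity once per article
        if query in unique:
            counts.update(unique - {query})
    return counts.most_common()


# Hypothetical entity annotations, one list per article.
sample = [
    ["Clinton", "Yeltsin", "NATO"],
    ["Clinton", "Yeltsin"],
    ["Dole", "Clinton"],
    ["NATO", "Yeltsin"],
]

for entity, weight in related_entities(sample, "Clinton"):
    print(entity, weight)
```

The resulting counts could serve as edge weights between the query and each related entity in a relationship graph.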

Topic Trends: tracking of the documents including “Clinton” (figure labels: Query, Result set, Topic Trends Visualization, Topics description; topics shown: US Elections, US Budget, NATO-Russia, Mid-East conflict)
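As a rough sketch of what a topic trend view needs as input, the snippet below counts documents per topic per month. It assumes each document in the result set already carries a topic label and an ISO publication date; how the application actually assigns topics is not stated in the slides. The function name `topic_trends` and the sample dates are hypothetical, while the topic names are taken from the figure.

```python
from collections import Counter, defaultdict


def topic_trends(docs):
    """Return {topic: {"YYYY-MM": document count}} from (date, topic) pairs."""
    trends = defaultdict(Counter)
    for date, topic in docs:
        month = date[:7]  # "1996-11-05" -> "1996-11"
        trends[topic][month] += 1
    # Sort months so each topic's series is in chronological order.
    return {topic: dict(sorted(c.items())) for topic, c in trends.items()}


# Hypothetical result set: (publication date, topic label).
docs = [
    ("1996-11-05", "US Elections"),
    ("1996-11-20", "US Elections"),
    ("1997-02-01", "US Budget"),
    ("1997-05-27", "NATO-Russia"),
    ("1996-10-30", "US Elections"),
]

for topic, series in topic_trends(docs).items():
    print(topic, series)
```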

WW2 query “Pearl Harbor” into NYTimes archive Dec 7th 1941

WW2 query “Belgrade” into NYTimes archive Apr 6th 1941

WW2 query “Normandy” into NYTimes archive June 1944
