
EVENT IDENTIFICATION IN SOCIAL MEDIA

EVENT IDENTIFICATION IN SOCIAL MEDIA. Hila Becker and Luis Gravano (Columbia University), Mor Naaman (Rutgers University). Social media sites host many “event” documents. “Event” = something that occurs at a certain time in a certain place [Yang et al. ’99].

Presentation Transcript


  1. EVENT IDENTIFICATION IN SOCIAL MEDIA • Hila Becker, Luis Gravano (Columbia University) • Mor Naaman (Rutgers University)

  2. Social Media Sites Host Many “Event” Documents • “Event” = something that occurs at a certain time in a certain place [Yang et al. ’99] • Popular, widely known events: Presidential Inauguration, Thanksgiving Day Parade • Smaller events, without traditional news coverage: local food drive, street fair • … • Photo-sharing: Flickr • Video-sharing: YouTube • Social networking: Facebook • Example: social media documents for “All Points West” festival, Liberty State Park, New Jersey, 8/8/08

  3. Identifying Events and Associated Social Media Documents • Applications: event search and browsing, local search, … • General approach: group similar documents via clustering; each cluster corresponds to one event and its associated social media documents

  4. Event Identification: Challenges • Uneven data quality: missing, short, uninformative text … but revealing structured context available: tags, date/time, geo-coordinates • Scalability • Dynamic data stream of event information • Unknown number of events: the number is a required input for many clustering algorithms and is difficult to estimate

  5. Clustering Social Media Documents • Social media document representation • Social media document similarity • Social media document clustering • Clustering task: definition • Ensemble algorithm: combining multiple clustering results • Preliminary evaluation

  6. Social Media Document Representation • Features: Title, Description, Tags, Date/Time, Location, All-Text

  7. Social Media Document Similarity • Text: tf-idf weights, cosine similarity • Time: proximity in minutes • Location: geo-coordinate proximity • Feature representations shown: Title, Description, Tags, Date/Time-Keywords, Location-Keywords, Date/Time-Proximity, Location-Proximity, All-Text
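The three similarity types on slide 7 could be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides name only "tf-idf weights, cosine similarity", "proximity in minutes", and "geo-coordinate proximity", so the idf formula, the linear decay, the haversine distance, and the parameters `window_minutes` and `max_km` are all assumptions introduced here.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse tf-idf dicts (idf = log(N / df), an assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def time_similarity(minutes_a, minutes_b, window_minutes=60 * 24 * 365):
    """Linear decay with the gap in minutes; window_minutes is a hypothetical cutoff."""
    gap = abs(minutes_a - minutes_b)
    return max(0.0, 1.0 - gap / window_minutes)


def location_similarity(a, b, max_km=100.0):
    """Linear decay with haversine distance between (lat, lon) pairs; max_km is hypothetical."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    distance_km = 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))
    return max(0.0, 1.0 - distance_km / max_km)


# Toy usage with made-up tag lists for three photos.
tags = [["festival", "music"], ["festival", "parade"], ["food", "drive"]]
vectors = tfidf_vectors(tags)
print(cosine(vectors[0], vectors[1]), time_similarity(0, 90))
```

Each function returns a score in [0, 1], which matches the "numeric values in [0,1]" requirement stated later in the deck for similarity features.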

  8. Social Media Document Clustering Framework • Pipeline: social media documents → document feature representation → event clusters

  9. Clustering: Ensemble Algorithm • Individual clusterers: C_title, C_tags, C_time • Clusterer weights, learned in a training step: W_title, W_tags, W_time • Consensus function f(C, W): combine ensemble similarities • Output: ensemble clustering solution
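One way to read the consensus function f(C, W) is as the weighted average of the clusterers' binary predictions that the experimental setup slide mentions. The sketch below assumes that reading; the function name `consensus_score`, the dictionary layout, and the toy weights are hypothetical and are not taken from the slides.

```python
def consensus_score(doc_a, doc_b, assignments, weights):
    """Weighted share of clusterers that place doc_a and doc_b in the same cluster.

    assignments: {clusterer_name: {doc_id: cluster_id}}
    weights:     {clusterer_name: weight}, e.g. learned on a validation set
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    agree = sum(
        weight
        for name, weight in weights.items()
        if assignments[name][doc_a] == assignments[name][doc_b]
    )
    return agree / total


# Toy usage: three clusterers (title, tags, time) over three photos.
assignments = {
    "title": {"p1": 0, "p2": 0, "p3": 1},
    "tags": {"p1": 0, "p2": 1, "p3": 1},
    "time": {"p1": 0, "p2": 0, "p3": 0},
}
weights = {"title": 0.2, "tags": 0.5, "time": 0.3}  # made-up weights
print(consensus_score("p2", "p3", assignments, weights))
```

The resulting score is itself a pairwise similarity in [0, 1], so it can feed a final clustering pass in the same way as any single-feature similarity.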

  10. Clustering: Measuring Quality • Homogeneous clusters ✔ • Complete clusters ✔ • Metric: Normalized Mutual Information (NMI): shared information between clustering solution and “ground truth”
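NMI can be computed directly from two label lists. The slide does not say which normalization is used, so this sketch assumes the arithmetic-mean form, NMI = 2 · I(C; E) / (H(C) + H(E)); other variants divide by the square root or the maximum of the two entropies.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Mutual information between two label assignments over the same documents."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(predicted, truth):
    """NMI with arithmetic-mean normalization (an assumed choice)."""
    denom = entropy(predicted) + entropy(truth)
    if denom == 0.0:
        return 1.0  # both assignments put every document in a single cluster
    return 2.0 * mutual_information(predicted, truth) / denom


# Toy usage: predicted cluster ids versus made-up ground-truth event ids.
print(nmi([0, 0, 1, 1, 2], ["a", "a", "b", "b", "b"]))
```

The score is invariant to how clusters are named: only the grouping matters, which is why it suits comparing a clustering solution against event labels.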

  11. Experimental Setup • Data: >270K Flickr photos • Event labels from Yahoo!’s “upcoming” event database • Split into 3 parts for training/validation/testing • Clusterers: single-pass algorithm with centroid similarity • Weighting scheme: Normalized Mutual Information (NMI) scores on validation set • Consensus function: weighted average of clusterers’ binary predictions • Final prediction step: single-pass clustering algorithm
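A single-pass algorithm with centroid similarity can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: documents are sparse vectors stored as dicts, similarity is cosine, and the `threshold` value is hypothetical (the slides do not give one). The number of clusters is not fixed in advance, which fits the "unknown number of events" challenge.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass_cluster(vectors, threshold=0.5):
    """Assign each document, in arrival order, to the most similar cluster or open a new one."""
    sums = []    # per-cluster sum of member vectors
    labels = []  # cluster id for each input document
    for vec in vectors:
        best_id, best_sim = None, -1.0
        for cluster_id, total in enumerate(sums):
            # Cosine is scale-invariant, so comparing against the sum of the
            # members is equivalent to comparing against their mean (the centroid).
            sim = cosine(vec, total)
            if sim > best_sim:
                best_id, best_sim = cluster_id, sim
        if best_id is not None and best_sim >= threshold:
            for term, weight in vec.items():
                sums[best_id][term] = sums[best_id].get(term, 0.0) + weight
            labels.append(best_id)
        else:
            sums.append(dict(vec))
            labels.append(len(sums) - 1)
    return labels


# Toy usage with made-up term weights for four documents.
docs = [{"a": 1.0}, {"a": 1.0, "b": 0.1}, {"c": 1.0}, {"c": 0.9, "d": 0.1}]
print(single_pass_cluster(docs))
```

Because each document is compared only against current centroids and never revisited, the cost grows with the number of clusters rather than the number of document pairs, at the price of sensitivity to arrival order.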

  12. Preliminary Evaluation Results • Individual clusterer performance • Highest NMI: Tags, All-Text • Lowest NMI: Description, Title • Ensemble performance, compared against all individual clusterers • Highest overall performance in terms of NMI • More homogeneous clusters: each event is spread over fewer clusters • Details in paper

  13. Future Work: Alternative Choices • Document similarity metric • Ensemble approach • Weight assignment • Choice of clusterers • Train a classifier to predict document similarity • Features correspond to similarity scores • All-text, title, tags, time, location, etc. • Numeric values in [0,1] • State-of-the-art classifiers: SVM, Logistic Regression, …
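The classifier idea on slide 13 could look roughly like this: each document pair becomes a feature vector of similarity scores in [0, 1], and a model predicts whether the pair belongs to the same event. The sketch uses a hand-rolled logistic regression trained by stochastic gradient descent so that it stays self-contained; the feature order, learning rate, epoch count, and training pairs are all made up for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias with plain stochastic gradient descent on log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_same_event(x, weights, bias):
    """Estimated probability that a document pair describes the same event."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Made-up training pairs: [title_sim, tags_sim, time_sim], label 1 = same event.
X = [[0.9, 0.8, 0.95], [0.8, 0.9, 0.9], [0.1, 0.2, 0.3], [0.2, 0.1, 0.05]]
y = [1, 1, 0, 0]
weights, bias = train_logistic(X, y)
print(predict_same_event([0.9, 0.8, 0.95], weights, bias))
```

In practice a library classifier (the slide mentions SVM and logistic regression) would replace the hand-written training loop; the point here is only the shape of the input and output.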

  14. Future Work: Alternative Choices • Final clustering step • Apply graph partitioning algorithms (requires estimating the number of clusters) • Evaluation metrics: beyond NMI • Datasets: Flickr, LastFM, YouTube • Exploit social network connections

  15. Conclusions • Identified events and their corresponding social media documents • Proposed a clustering solution • Leveraged different representations of social media documents • Employed various social media similarity metrics • Developed a weighted ensemble clustering approach • Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs
