
Discovering Important Bloggers based on Analyzing Blog Threads by Nakajima et al

Discovering Important Bloggers based on Analyzing Blog Threads by Nakajima et al. Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith. Introduction. People increasingly publish their reactions to public events using a blog


Presentation Transcript


  1. Discovering Important Bloggers based on Analyzing Blog Threads by Nakajima et al. Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith

  2. Introduction • People increasingly publish their reactions to public events using a blog • A tool that enables this info to be published quickly • A journal that is available on the web • Need for effective data-mining techniques specific to blogs and similar tools (e.g. the Semantic Web) • “Our goal is to develop a method of capturing hot conversations by automating readers’ processes for characterizing and monitoring blogs.”

  3. Overview • Data-mining techniques • Creation of blog link structure • Analysing link structure • Types of important bloggers • Agitators • Summarisers • Applications, analysis and conclusions • Real-world applications and extensions • Pros and cons of the paper

  4. Data-Mining Techniques • Crawling blogs • Extracting hyperlinks • Extracting blog threads

  5. Crawling blogs • System crawls through the RSS list, registering for each entry: title, permalink, list entry date • Aggregator: gathers RSS feeds from multiple sources and organises them • OPML: file format used to share RSS feed lists • RSS: a format for distributing content on the web • [Diagram labels: OPML, Aggregators, RSS feeds, RSS list]
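
The registration step on this slide can be illustrated with a minimal sketch. It assumes plain RSS 2.0 feeds and reads "permalink" as the item's `<link>` element; the sample feed, the function name `register_entries` and the dictionary keys are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would fetch every feed on the RSS list.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example/2005/01/first</link>
      <pubDate>Mon, 03 Jan 2005 10:00:00 GMT</pubDate>
      <description>Hello</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example/2005/01/second</link>
      <pubDate>Tue, 04 Jan 2005 09:30:00 GMT</pubDate>
      <description>Reply</description>
    </item>
  </channel>
</rss>"""


def register_entries(rss_text):
    """Return one record (title, permalink, date) per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            # Assumption: the item's <link> serves as the entry's permalink.
            "permalink": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return entries


entry_list = register_entries(SAMPLE_RSS)
```

Keeping the records as plain dictionaries makes it easy to append entries from many feeds into a single entry list.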

  6. Extracting hyperlinks • Problem: different tag structures per server • [Diagram labels: RSS feed from list, Blog entries, Description, Hyperlink list]
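
One way to build a hyperlink list from an entry's description is sketched below. It assumes the description is an HTML fragment and collects every `<a href>` value; it does not address the per-server tag differences the slide mentions. The names `LinkExtractor` and `extract_hyperlinks` are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_hyperlinks(description_html):
    """Return the list of link targets found in an entry description."""
    parser = LinkExtractor()
    parser.feed(description_html)
    parser.close()
    return parser.links


sample = '<p>See <a href="http://a.example/1">this</a> and <A HREF="http://b.example/2">that</A></p>'
hyperlink_list = extract_hyperlinks(sample)
```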

  7. Extracting blog threads • [Flowchart; labels in transcript order: Hyperlink; If replyLink; If sourceLink; Check departure URL exists in thread data; Check destination URL points to entry on list; Check links exist in thread data; &&; 11; 00; Add; 10; Add dest entry to thread; 01; Create new thread; Add destination entry to entry list and add to thread; Add departure entry to thread]
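
The transcript does not preserve the flowchart's branching, so the sketch below does not reproduce it. As a simpler stand-in (an assumption, not the paper's procedure), it treats entries as nodes and hyperlinks between known entries as undirected edges, and calls each connected group of two or more entries a thread. The function name and the input shapes are hypothetical.

```python
from collections import defaultdict


def group_into_threads(entry_urls, links):
    """Group entries into threads as connected components of the link graph.

    entry_urls: list of known entry permalinks (the entry list).
    links: iterable of (departure_url, destination_url) pairs.
    """
    known = set(entry_urls)
    neighbours = defaultdict(set)
    for departure, destination in links:
        # Links that leave the entry list are ignored in this simplification.
        if departure in known and destination in known:
            neighbours[departure].add(destination)
            neighbours[destination].add(departure)

    threads = []
    seen = set()
    for url in entry_urls:
        if url in seen or url not in neighbours:
            continue  # already placed, or not linked to any other entry
        component = set()
        stack = [url]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(neighbours[current] - seen)
        threads.append(component)
    return threads


threads = group_into_threads(
    ["a", "b", "c", "d", "e", "f"],
    [("b", "a"), ("c", "b"), ("e", "d"), ("a", "external")],
)
```

Entries with no links to other known entries (here `"f"`) are left out of every thread.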

  8. Example Results

  9. Types of users • Agitators • Summarisers • Joe Bloggs

  10. Agitators • Discussion stimulator • Threads often grow after an agitator’s entry • Three discriminants for an agitator: Link (Agi1), Popularity (Agi2), Topic (Agi3) • The three discriminants can be weighted using the following formula: [formula not captured in the transcript]

  11. Link-based discriminant • $e_x$ is an agitator if $k_x > \theta_1$ • $e_x$ = a blog entry • $k_x$ = number of entries in $\mathrm{thread}_i$ with a replyLink to $e_x$
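
A minimal sketch of the link-based test follows. It assumes a thread is a list of entry identifiers and that replyLinks are stored as a set of (from_entry, to_entry) pairs; both representations, the function names and the threshold value in the example are hypothetical.

```python
def link_score(entry, thread, reply_links):
    """k_x: number of entries in the thread that have a replyLink to `entry`."""
    return sum(1 for other in thread if (other, entry) in reply_links)


def is_agitator_by_links(entry, thread, reply_links, theta1):
    """Link-based discriminant: the entry qualifies if k_x > theta1."""
    return link_score(entry, thread, reply_links) > theta1


thread = ["e1", "e2", "e3", "e4"]
# (from, to): e2, e3 and e4 each reply to e1; e4 also replies to e2.
reply_links = {("e2", "e1"), ("e3", "e1"), ("e4", "e1"), ("e4", "e2")}

k_e1 = link_score("e1", thread, reply_links)
```

With an illustrative threshold of 2, `e1` passes the test and `e2` does not.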

  12. Popularity-based discriminant • $e_x$ is an agitator if $l_x / m_x > \theta_2$ • $e_x$ = a blog entry • $l_x$ = number of entries in $\mathrm{thread}_i$ published $t$ days after $e_x$ • $m_x$ = number of entries in $\mathrm{thread}_i$ published $t$ days before $e_x$
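
The ratio on this slide can be sketched as below. Two assumptions are made that the slide does not state: "t days after" and "t days before" are read as "within t days after or before", and a thread with no earlier entries in the window is given an infinite ratio when later entries exist. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def popularity_ratio(entry_time, thread_times, t_days):
    """Return l_x / m_x for an entry published at `entry_time`.

    thread_times: publication times of the other entries in the same thread.
    """
    window = timedelta(days=t_days)
    # l_x: entries within t days after the entry (assumed reading of the slide).
    after = sum(1 for ts in thread_times if entry_time < ts <= entry_time + window)
    # m_x: entries within t days before the entry.
    before = sum(1 for ts in thread_times if entry_time - window <= ts < entry_time)
    if before == 0:
        # Assumption: no earlier activity counts as unbounded growth if anything follows.
        return float("inf") if after > 0 else 0.0
    return after / before


entry_time = datetime(2005, 1, 10)
thread_times = [
    datetime(2005, 1, 8), datetime(2005, 1, 9),    # two entries before
    datetime(2005, 1, 11), datetime(2005, 1, 11),  # four entries after
    datetime(2005, 1, 12), datetime(2005, 1, 12),
    datetime(2005, 1, 20),                         # outside the 3-day window
]
ratio = popularity_ratio(entry_time, thread_times, t_days=3)
```

The entry would then be flagged when `ratio` exceeds the chosen threshold.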

  13. Topic-based discriminant • $e_x$ is an agitator if [condition not captured in the transcript] • $e_x$ = a blog entry • $n$ = number of entries

  14. Summarisers • Publish entries that collate and compact previous posts • Provide a convenient way of digesting an entire thread • The discriminant for summarisers is link-based: $e_x$ is a summariser if $p_x > \theta_4$ • $e_x$ = a blog entry • $p_x$ = number of entries in $\mathrm{thread}_i$ that have a replyLink from $e_x$
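
The summariser test can be sketched by scanning a whole thread and counting, for each entry, how many other entries in the thread it links to. The representation of replyLinks as (from_entry, to_entry) pairs, the function name and the threshold in the example are assumptions for illustration.

```python
def find_summarisers(thread, reply_links, theta4):
    """Return the entries whose outgoing replyLink count p_x exceeds theta4."""
    summarisers = []
    for entry in thread:
        # p_x: entries in the same thread that receive a replyLink from `entry`.
        p_x = sum(
            1 for other in thread
            if other != entry and (entry, other) in reply_links
        )
        if p_x > theta4:
            summarisers.append(entry)
    return summarisers


thread = ["e1", "e2", "e3", "e4", "e5"]
# e5 links back to four earlier entries; e2 links only to e1.
reply_links = {("e5", "e1"), ("e5", "e2"), ("e5", "e3"), ("e5", "e4"), ("e2", "e1")}

summarisers = find_summarisers(thread, reply_links, theta4=3)
```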

  15. Analysis • Applications • Pros and Cons • Conclusions

  16. Applications • Supplementary info, e.g. for TV, news sites, etc. • Home and Away – who shot Josh West • Agitator • Sports, etc. – used by studios and media to highlight points of interest in a match • Summariser

  17. Analysis – Pros • Basis for future research – a brief intro to the subject. • Multiple thread analysis • Identification of areas of bloggers’ expertise • Highly effective in certain specific areas • News and reviews • Implementation of theory (feature vector)

  18. Analysis – Cons • Only 25 sites used in sample (but 1000s of blogs) • Does not take context into consideration • E.g., an agitator may be posting offensive entries • No measurement of summary success • Comments are not analysed • Inappropriate for certain areas • MySpace, Bebo, et al. (due to target audience)

  19. Conclusions • Created a data-mining framework for future research • May instigate research into further work • Nice idea and potentially useful but needs to be extended

  20. Any Questions? Thank you for your time
