
Discovering Important Bloggers based on Analyzing Blog Threads by Nakajima et al

Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith

Introduction
  • People increasingly publish their reactions to public events using a blog
    • A tool that enables this info to be published quickly
    • A journal that is available on the web
  • Need for effective data-mining techniques specific to blogs and similar tools (e.g. the Semantic Web)
  • “Our goal is to develop a method of capturing hot conversations by automating readers’ processes for characterizing and monitoring blogs.”
Overview
  • Data-mining techniques
    • Creation of blog link structure
    • Analysing link structure
  • Types of important bloggers
    • Agitators
    • Summarisers
  • Applications, analysis and conclusions
    • Real-world applications and extensions
    • Pros and cons of the paper
Data-Mining Techniques
  • Crawling blogs
  • Extracting hyperlinks
  • Extracting blog threads

Crawling blogs
  • System crawls through the RSS list, registering for each entry:
    • Title
    • Permalink
    • Entry date
  • RSS: a format for distributing content on the web
  • OPML: a file format used to share RSS feed lists
  • Aggregator: gathers RSS feeds from multiple sources and organises them

[Diagram labels: OPML, Aggregators, RSS feeds, RSS list]
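
The crawling step can be sketched in a few lines of Python. This is only an illustration, assuming RSS 2.0 feeds in which each `item` carries `title`, `link` and `pubDate` elements; the sample feed, the URLs and the function name `register_entries` are made up for the example and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for this illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example/1</link>
      <pubDate>Mon, 02 Jan 2006 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example/2</link>
      <pubDate>Tue, 03 Jan 2006 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def register_entries(rss_text):
    """Return one record (title, permalink, date) per feed item."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            # Assumption: the item's <link> is used as the permalink.
            "permalink": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return entries


entries = register_entries(SAMPLE_RSS)
for entry in entries:
    print(entry["date"], entry["permalink"], entry["title"])
```

A real crawler would fetch each feed on the RSS list over HTTP and would have to cope with other feed formats, which this sketch ignores.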

Extracting hyperlinks
  • Problem: Different tag structures per server

[Diagram: RSS feed from list → Blog entries → Description → Hyperlink list]
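
As a rough illustration of hyperlink extraction, the sketch below pulls every `href` out of an entry's description using Python's standard `html.parser`. It assumes the description is an HTML fragment; the class name `LinkExtractor`, the function name and the sample description are hypothetical. Handling the differing tag structures per server, which the slide names as the real problem, is not attempted here.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in the order encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_hyperlinks(description_html):
    """Return the hyperlink list for one entry description."""
    parser = LinkExtractor()
    parser.feed(description_html)
    parser.close()
    return parser.links


# Hypothetical entry description.
description = (
    '<p>See <a href="http://blog.example/1">this post</a> and '
    '<a href="http://news.example/story">the story</a>.</p>'
)
links = extract_hyperlinks(description)
print(links)
```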

Extracting blog threads

[Flowchart: each hyperlink is handled as a replyLink or a sourceLink. Checks: departure URL exists in thread data; destination URL points to an entry on the list; links exist in thread data. The combined results (11, 10, 01, 00) select among the actions: add; add destination entry to thread; add departure entry to thread; create new thread; add destination entry to entry list and add to thread.]
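
One plausible reading of the thread-extraction flowchart is sketched below. The mapping from the four outcomes to actions is an assumption made for this example: 00 creates a new thread, 10 adds the destination entry, 01 adds the departure entry, and 11 merges the two threads if they differ. The merge step, the set-based thread representation and the function names are also assumptions. The sketch treats every link as a link between two entries and ignores the replyLink/sourceLink distinction.

```python
def add_link(threads, departure, destination):
    """Update the list of threads (sets of entry IDs) with one link."""
    dep_thread = next((t for t in threads if departure in t), None)
    dst_thread = next((t for t in threads if destination in t), None)

    if dep_thread is None and dst_thread is None:
        # 00: neither entry is known, so start a new thread.
        threads.append({departure, destination})
    elif dst_thread is None:
        # 10: departure is known, so add the destination entry.
        dep_thread.add(destination)
    elif dep_thread is None:
        # 01: destination is known, so add the departure entry.
        dst_thread.add(departure)
    elif dep_thread is not dst_thread:
        # 11 across two threads: merge them (assumed behaviour).
        dep_thread |= dst_thread
        threads[:] = [t for t in threads if t is not dst_thread]
    return threads


def build_threads(links):
    """Build threads from (departure, destination) pairs."""
    threads = []
    for departure, destination in links:
        add_link(threads, departure, destination)
    return threads


# Hypothetical links: b replies to a, c replies to b, e replies to d.
threads = build_threads([("b", "a"), ("c", "b"), ("e", "d")])
print(threads)
```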

Types of users
  • Agitators
  • Summarisers
  • Joe Bloggs

Agitators
  • Discussion stimulator
  • Threads often grow after an agitator’s entry
  • Three discriminants for an agitator
    • Link (Agi1)
    • Popularity (Agi2)
    • Topic (Agi3)
  • The three discriminants can be weighted using the following formula:
Link-based discriminant

$e_x$ is an agitator if $k_x > \theta_1$

  • $e_x$ = a blog entry
  • $k_x$ = number of entries in $\mathrm{thread}_i$ with a replyLink to $e_x$
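
A minimal sketch of the link-based test, assuming a thread's replyLinks are available as (from entry, to entry) pairs. The entry IDs, the threshold value and the function name are illustrative only; the slides do not give a value for θ1.

```python
def link_agitators(reply_links, theta1):
    """Return entries whose count of distinct replying entries exceeds theta1."""
    repliers = {}
    for source, target in reply_links:
        # Count each replying entry once, even if it links twice.
        repliers.setdefault(target, set()).add(source)
    return {entry for entry, sources in repliers.items()
            if len(sources) > theta1}


# Hypothetical thread: b, c and d reply to a; d also replies to c.
reply_links = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "c")]
agitators = link_agitators(reply_links, theta1=2)
print(agitators)
```

Reversing each pair, so that outgoing rather than incoming replyLinks are counted, gives the corresponding count for the summariser test.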
Popularity-based discriminant

$e_x$ is an agitator if $l_x / m_x > \theta_2$

  • $e_x$ = a blog entry
  • $l_x$ = number of entries in $\mathrm{thread}_i$ published in the $t$ days after $e_x$
  • $m_x$ = number of entries in $\mathrm{thread}_i$ published in the $t$ days before $e_x$
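
The popularity test can be sketched as follows, assuming each entry in a thread has a publication date. The dates, the values of t and θ2, and the function names are made up for illustration. The slides do not say what happens when no entries precede $e_x$ (so $m_x = 0$); this sketch assumes the entry then counts as an agitator whenever at least one entry follows it.

```python
from datetime import date, timedelta


def window_counts(thread_dates, entry, t_days):
    """Return (l_x, m_x): entries within t_days after and before the entry."""
    pivot = thread_dates[entry]
    window = timedelta(days=t_days)
    after = sum(1 for e, d in thread_dates.items()
                if e != entry and pivot < d <= pivot + window)
    before = sum(1 for e, d in thread_dates.items()
                 if e != entry and pivot - window <= d < pivot)
    return after, before


def is_popularity_agitator(thread_dates, entry, t_days, theta2):
    after, before = window_counts(thread_dates, entry, t_days)
    if before == 0:
        # Assumption: the ratio is undefined, so fall back to "any growth".
        return after > 0
    return after / before > theta2


# Hypothetical thread with one publication date per entry.
thread_dates = {
    "a": date(2006, 1, 1),
    "b": date(2006, 1, 3),
    "c": date(2006, 1, 4),
    "d": date(2006, 1, 5),
    "e": date(2006, 1, 6),
}
print(window_counts(thread_dates, "c", t_days=2))
print(is_popularity_agitator(thread_dates, "c", t_days=2, theta2=1.5))
```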
Topic-based discriminant

$e_x$ is an agitator if

  • $e_x$ = a blog entry
  • $n$ = number of entries
Summarizers
  • Publish entries that collate and compact previous posts
  • Provide a convenient way of digesting an entire thread
  • The discriminant for summarizers is link-based:

$e_x$ is a summarizer if $p_x > \theta_4$

  • $e_x$ = a blog entry
  • $p_x$ = number of entries in $\mathrm{thread}_i$ that have a replyLink from $e_x$
Analysis
  • Applications
  • Pros and Cons
  • Conclusions

Applications
  • Supplementary information for, e.g., TV programmes and news sites
    • Home and Away – who shot Josh West
      • Agitator
    • Sports, etc. – used by studios and media to highlight points of interest in a match
      • Summariser
Analysis – Pros
  • Basis for future research – a brief intro to the subject.
    • Multiple thread analysis
    • Identification of areas of bloggers’ expertise
  • Highly effective in certain specific areas
    • News and reviews
  • Implementation of theory (feature vector)
Analysis – Cons
  • Only 25 sites used in sample (but 1000s of blogs)
  • Does not take context into consideration
    • E.g., an agitator may be posting offensive entries
  • No measurement of summary success
  • Comments are not analysed
  • Inappropriate for certain areas
    • MySpace, Bebo, et al. (due to target audience)
Conclusions
  • Created a data-mining framework for future research
    • May prompt further research
  • Nice idea and potentially useful but needs to be extended
Any Questions?

Thank you for your time