
Web Community Mining and Web Log Mining: Commodity Cluster-Based Execution

Presentation Transcript


  1. Web Community Mining and Web Log Mining: Commodity Cluster-Based Execution
     Romeo Zitarosa
     Mining di Dati Web (Web Data Mining)

  2. Overview
     • Introduction
     • Web Community Mining
     • Web Log Mining on MIS
     • Parallel Data Mining on PC Cluster
     • Performance Evaluation
     • Conclusion

  3. Introduction
     • Two proposed applications of web mining:
       1) Extracting web communities
       2) Understanding the behaviour of mobile Internet users (usage mining)

  4. Web Community Mining
     • Definition: a web community is a collection of web pages created by individuals or associations that share a common interest in a specific topic.

  5. Proposed Technique
     • Starts from a set of seed pages
     • Based on RPA (related page algorithm)
     • Creates a community chart

  6. Authorities and Hubs
     • Authority: a page with good content on a topic, linked to by many good hub pages.
     • Hub: a page with a list of hyperlinks to valuable pages on a topic, i.e. a page that points to good authorities.
     • Community core = authorities + hubs

  7. Web Community Mining
     • Algorithm:
       1) Choose a seed set.
       2) Apply RPA to each seed: build a web subgraph around it and extract its hubs and authorities using HITS.
       3) Investigate how seeds derive other seeds as related pages.
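
The HITS step above can be sketched in Python. This is a minimal illustration and not the authors' RPA. It assumes the web is given as a dict mapping each URL to the set of URLs it links to. It builds a simplified vicinity subgraph made of the seed, the pages linking to it, the pages it links to, and the other pages those linking pages point to. The function names `vicinity`, `hits` and `related_pages` are hypothetical.

```python
from math import sqrt


def vicinity(graph, seed):
    """Simplified subgraph around `seed` (an assumption, not the exact RPA)."""
    backlinks = {p for p, outs in graph.items() if seed in outs}
    nodes = {seed} | backlinks | graph.get(seed, set())
    # Pages linking to the seed are hub candidates; add what they point to.
    for hub in backlinks:
        nodes |= graph.get(hub, set())
    # Keep only the edges whose endpoints are both inside the subgraph.
    return {p: {q for q in graph.get(p, set()) if q in nodes} for p in nodes}


def hits(subgraph, iterations=50):
    """Plain HITS power iteration; returns (hub, authority) score dicts."""
    hub = {p: 1.0 for p in subgraph}
    auth = {p: 1.0 for p in subgraph}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages pointing to it.
        auth = {p: sum(hub[q] for q in subgraph if p in subgraph[q]) for p in subgraph}
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[q] for q in subgraph[p]) for p in subgraph}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


def related_pages(graph, seed, k=3):
    """Top-k authorities of the seed's vicinity graph, excluding the seed."""
    _, auth = hits(vicinity(graph, seed))
    ranked = sorted((p for p in auth if p != seed), key=lambda p: (-auth[p], p))
    return ranked[:k]
```

For example, take hubs `h1` → {s, t, u} and `h2` → {s, t}. Here `related_pages(web, "s", k=2)` returns `["t", "u"]`. Both hubs point to t together with s, while only one hub points to u.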

  8. Example
     1) Suppose s derives t as a related page and vice versa: s and t are pointed to by a similar set of hubs.
     2) Suppose s derives t as a related page but t does not derive s: t is pointed to by many different hubs, so t derives a different set of related pages.

  9. Observation
     In this way we define a symmetric derivation relationship (s.d.r.) to identify communities: s and t are in an s.d.r. when each derives the other, as in case 1 of the example.
     • Def. Community: a set of pages strongly connected by s.d.r.
     • Two communities are related if a member of one community derives a member of the other community.
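
The s.d.r. definition can be sketched as follows. The sketch assumes the output of step 2 is available as a dict `derives` that maps each seed to the set of pages it derives. It reads "strongly connected by s.d.r." as the connected components of the undirected graph of mutual derivations; the original method may group pages differently. `sdr_pairs`, `communities` and `related_communities` are hypothetical names.

```python
def sdr_pairs(derives):
    """Unordered pairs {s, t} where s derives t and t derives s."""
    pairs = set()
    for s, targets in derives.items():
        for t in targets:
            if s != t and s in derives.get(t, set()):
                pairs.add(frozenset((s, t)))
    return pairs


def communities(derives):
    """Connected components of the undirected s.d.r. graph."""
    adj = {}
    for pair in sdr_pairs(derives):
        a, b = tuple(pair)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        result.append(component)
    return result


def related_communities(derives, comms):
    """Index pairs (i, j): some member of comms[i] derives a member of comms[j]."""
    return {(i, j)
            for i, ci in enumerate(comms)
            for j, cj in enumerate(comms)
            if i != j and any(derives.get(p, set()) & cj for p in ci)}
```

Take `derives = {"a": {"b"}, "b": {"a", "c"}, "c": {"d"}, "d": {"c"}}` as an example. The communities are `{a, b}` and `{c, d}`. They are related (0 → 1) because b derives c.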

  10. Web Community Chart
     • Def. A graph whose nodes are communities and whose weighted edges connect related communities; the weight represents how relevant two communities are to each other.
     • We need a tool to browse communities.

  11. Web Community Chart (2)
     • Labels are assigned manually.
     • Box = list of URLs sorted by connectivity score.
     • Def. Connectivity score: the number of derivation relationships from the node to other nodes of the community.
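
The connectivity score and the chart edges can be sketched in the same style. Again the sketch assumes a `derives` dict (page → set of pages it derives) and a list of communities given as sets of URLs. Using the number of cross-community derivations as the edge weight is an assumption made here for illustration; the slides only say that the weight represents relevance. All function names are hypothetical.

```python
def connectivity_score(node, community, derives):
    """Number of derivation relationships from `node` to other community members."""
    return len((derives.get(node, set()) & community) - {node})


def community_box(community, derives):
    """URLs shown in one chart node: highest connectivity score first, ties by URL."""
    return sorted(community, key=lambda p: (-connectivity_score(p, community, derives), p))


def chart_edges(comms, derives):
    """Weighted undirected edges between communities (weight = cross derivations)."""
    edges = {}
    for i, ci in enumerate(comms):
        for j, cj in enumerate(comms):
            if i < j:
                weight = (sum(len(derives.get(p, set()) & cj) for p in ci)
                          + sum(len(derives.get(q, set()) & ci) for q in cj))
                if weight:
                    edges[(i, j)] = weight
    return edges
```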

  12. Example (figure not included in the transcript)

  13. Mobile Info Search (MIS)
     • NTT laboratories
     • Goal: provide location-aware information from the Internet by collecting, structuring, filtering and organizing it.
     • www.kokono.net

  14. kokono
     There is a database-type resource between the user and the information sources (online maps, yellow pages, etc.).

  15. MIS Functionalities
     • User location acquisition: GPS, PHS, postal code
     • Location-oriented robot-based search (kokono)
       - searches for documents close to a location
       - displays documents ordered by the distance between the location written in each document and the user's position
     • Location-oriented meta search
       - backbone database accessed by CGI programs
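
Ordering kokono results by distance can be sketched as below. The sketch assumes each document's written address has already been geocoded to latitude/longitude; how kokono extracts and geocodes addresses is not covered here. It uses the haversine great-circle distance. `rank_by_distance` and its optional `max_km` cut-off are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def rank_by_distance(docs, user_lat, user_lon, max_km=None):
    """docs: list of (url, lat, lon), where (lat, lon) is the location written in the doc.

    Returns the URLs nearest first, optionally dropping those farther than max_km.
    """
    scored = [(haversine_km(user_lat, user_lon, lat, lon), url) for url, lat, lon in docs]
    if max_km is not None:
        scored = [(d, url) for d, url in scored if d <= max_km]
    return [url for _, url in sorted(scored)]
```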

  16. Association Rule Mining
     • Support, confidence
     • Hierarchy => taxonomy
     • The hierarchy allows finding not only rules specific to a location but also rules for the wider areas that cover that location.
     • Identify access patterns of MIS users.
     • Prefetch information.
     • Reduce access time.
     • Spatial information is valuable to mobile users.
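
Here is a minimal sketch of support and confidence over a location taxonomy. It assumes the common approach of extending each log record with all ancestors of its items. With that extension, a rule about a district is also counted for the wider area that contains it. The child → parent dict, the place names and the function names are illustrative assumptions.

```python
def ancestors(item, parent):
    """Ancestors of `item`, nearest first, in a child -> parent taxonomy."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result


def extend(record, parent):
    """Add every ancestor of every item to one access-log record."""
    items = set(record)
    for item in record:
        items.update(ancestors(item, parent))
    return frozenset(items)


def support(itemset, records):
    """Fraction of (extended) records that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for r in records if wanted <= r) / len(records)


def confidence(lhs, rhs, records):
    """Confidence of lhs => rhs, i.e. support(lhs | rhs) / support(lhs)."""
    base = support(lhs, records)
    return support(set(lhs) | set(rhs), records) / base if base else 0.0
```

Consider a taxonomy that places Shibuya and Shinjuku under Tokyo. There, a rule such as {Tokyo} => {map} can reach a support threshold that the Shibuya-only rule misses.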

  17. Sequential Rule Mining
     • Sequential patterns
     • Derive how different services are used together.
     Example: planning after checking the weather:
       Submit_weather = Weather Forecast → Submit_shop = Shop Info && shop_web = townpage → Submit_kokono = KOKONOSearch → Submit_map = MAP
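
The sequential-pattern idea can be sketched by counting how many user sessions contain a pattern of service submissions in order. Other submissions may occur in between. Sessions are assumed to be lists of the submission names used on the slide. `sequence_support` and `frequent_pairs` are hypothetical helpers; a full sequential-pattern miner would grow patterns beyond length 2.

```python
def contains_sequence(session, pattern):
    """True if the events of `pattern` occur in `session` in this order (gaps allowed)."""
    events = iter(session)
    # `any` consumes the iterator up to each match, which enforces the order.
    return all(any(event == step for event in events) for step in pattern)


def sequence_support(sessions, pattern):
    """Fraction of sessions that contain `pattern` as an ordered subsequence."""
    return sum(contains_sequence(s, pattern) for s in sessions) / len(sessions)


def frequent_pairs(sessions, min_support):
    """All ordered pairs (a, b) with a != b whose support is >= min_support."""
    events = sorted({e for s in sessions for e in s})
    result = {}
    for a in events:
        for b in events:
            if a != b:
                sup = sequence_support(sessions, (a, b))
                if sup >= min_support:
                    result[(a, b)] = sup
    return result
```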

  18. Parallel DM and PC Cluster
     • Parallel Apriori
       - every node keeps all candidate itemsets
       - each node scans its part of the dataset independently
       - nodes communicate only at the end of each pass
       Problem: too much memory used!
     • Solution (partial): Hash Partitioned Apriori (HPA)
       - candidates are partitioned among the nodes using a hash function
       - each node builds only its own partition of the candidate itemsets
       - still a lot of disk I/O when support is small
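
The hash partitioning in HPA can be simulated in a single process as below. Each simulated node stores only the candidates whose hash maps to it. Every k-itemset of a transaction is routed to its owner node for counting. This sketch makes simplifying assumptions. In a real cluster the transactions are spread over the nodes' local disks and routing is done by message passing. CRC-32 is used here only to get a deterministic hash. `owner`, `hpa_count` and `partition_sizes` are hypothetical names.

```python
from itertools import combinations
from zlib import crc32


def owner(itemset, n_nodes):
    """Node that owns a candidate: deterministic hash of its sorted items."""
    return crc32(",".join(sorted(itemset)).encode()) % n_nodes


def hpa_count(transactions, candidates, k, n_nodes):
    """Simulated HPA counting pass for candidate k-itemsets."""
    # Each node builds only its own hash partition of the candidate set.
    local = [{} for _ in range(n_nodes)]
    for cand in candidates:
        local[owner(cand, n_nodes)][frozenset(cand)] = 0
    # Every k-subset of a transaction is "sent" to the node that owns it,
    # which counts it only if it is one of that node's candidates.
    for t in transactions:
        for subset in combinations(sorted(t), k):
            table = local[owner(subset, n_nodes)]
            key = frozenset(subset)
            if key in table:
                table[key] += 1
    # Gather the partial tables (a real cluster would keep them distributed).
    return {cand: n for table in local for cand, n in table.items()}


def partition_sizes(candidates, n_nodes):
    """How many candidates each node must keep in memory."""
    sizes = [0] * n_nodes
    for cand in candidates:
        sizes[owner(cand, n_nodes)] += 1
    return sizes
```

Changing `n_nodes` changes where each candidate is stored but not the final counts. This is what lets each node hold only about 1/n_nodes of the candidate set in memory.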

  19. Parallel Algorithms for Association Rule Mining
     • Non-partitioned generalized (NPGM)
     • Hash partitioned (HPGM): reduces communication
     • Hierarchical HPGM (H-HPGM): candidates whose roots are identical are allocated to the same node
     • H-HPGM with Fine-Grain Duplication (H-HPGM-FGD): uses the remaining free memory
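
The difference between HPGM and H-HPGM candidate placement can be sketched as two partitioning functions over a taxonomy given as a child → parent dict. The HPGM-style function hashes the items of a candidate. The H-HPGM-style function hashes the root ancestor of each item instead, so a candidate and its generalizations with the same roots always land on the same node. The function names, the CRC-32 hashing and the example taxonomy are assumptions for illustration. The duplication policy of H-HPGM-FGD is not modelled.

```python
from zlib import crc32


def root(item, parent):
    """Top-level ancestor of `item` (the item itself if it has no parent)."""
    while item in parent:
        item = parent[item]
    return item


def bucket(keys, n_nodes):
    """Deterministic hash of a collection of strings onto 0..n_nodes-1."""
    return crc32("|".join(sorted(keys)).encode()) % n_nodes


def hpgm_node(itemset, n_nodes):
    """HPGM-style placement: hash the candidate's own items."""
    return bucket(itemset, n_nodes)


def hhpgm_node(itemset, parent, n_nodes):
    """H-HPGM-style placement: hash the multiset of the items' roots."""
    return bucket([root(item, parent) for item in itemset], n_nodes)


def partition(candidates, parent, n_nodes):
    """Group candidates by the node chosen by hhpgm_node."""
    nodes = [[] for _ in range(n_nodes)]
    for cand in candidates:
        nodes[hhpgm_node(cand, parent, n_nodes)].append(cand)
    return nodes
```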

  20. Performance Evaluation
     (figure not included in the transcript)
     Observation: execution time increases as the minimum support becomes smaller.

  21. Conclusion
     • Real web mining applications need high-performance computing systems.
     • A PC cluster, with its scalable performance (and high costs), is a promising platform…
