
Evolving dynamic web pages using web mining

Evolving dynamic web pages using web mining. Kartik Menon Smart Engineering Systems Laboratory Engineering Management Department University of Missouri-Rolla. Overview. Goal Web Mining General Principle behind web mining Web Data Web Access Pattern Clustering


Presentation Transcript


  1. Evolving dynamic web pages using web mining Kartik Menon Smart Engineering Systems Laboratory Engineering Management Department University of Missouri-Rolla

  2. Overview • Goal • Web Mining • General Principle behind web mining • Web Data • Web Access Pattern Clustering • Evolving web pages using cluster information • Clustering Techniques • Fuzzy C means • Experimental Set-up • Results • Conclusion and Future work • Questions

  3. Goal Cluster similar web access traversal patterns, train the system to understand the needs and demands of the different users accessing the website, and use this information to evolve web pages.

  4. Web Mining • Web mining: learning about the different users accessing a web page • The needs and requirements of the user • Web access traversal patterns • Links which are more popular than others • For example, www.yahoo.com • Emails • Search engine • News • Greeting cards

  5. General Principle behind web mining • Gather web data from web server logs • Cluster web traversal patterns • Evolve web pages

  6. Web Data • What information is important for mining? • Links traversed (URLs requested) • Documents downloaded • Time spent on the web page compared to the total time spent • Web traffic • GET or POST messages
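As an illustration of pulling the fields listed above (requested URL, request method, response size) out of a web log, here is a minimal Python sketch. It assumes the server writes the Common Log Format; the presentation does not state the log format, and the sample line and the `parse_log_line` helper are hypothetical.

```python
import re
from typing import Optional

# Assumption: Common Log Format, for example
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return host, time, method, url, status and size for one log line.

    Returns None when the line does not match the assumed format.
    """
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical log line, for illustration only.
sample = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
record = parse_log_line(sample)
```

Grouping the parsed records by host (or by a session identifier, if the log carries one) would then give the per-user list of URLs that the later slides cluster.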

  7. Web Access Pattern Clustering • Find users with similar web access patterns • Grouping and separating users • Concise representation of a system's behavior • Generalize about user needs and interests

  8. Evolving Web Pages using cluster information • The cluster information can be used • To know about users • Modify the web page • Web personalization • Evolving Web pages

  9. Clustering Techniques • Neural Nets • Kohonen’s Self Organizing Maps (SOMs) • Statistical • K-Means • Fuzzy Logic • Fuzzy C Means • Fuzzy ISODATA

  10. Fuzzy C Means • A data clustering technique in which each data point belongs to a cluster to some degree, specified by a membership function • Notation: • X is a set of n data sample vectors • U is a partition of X into c parts • V is the set of cluster centers • d^2 is the squared distance in an inner-product-induced norm • u is the grade of membership of x_k in cluster i, between 0 and 1 • m is a parameter that increases or decreases the fuzziness

  11. Fuzzy C Means (contd)
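For reference, the textbook fuzzy c-means formulation can be written with the symbols defined on slide 10. This is the standard form and is assumed, not confirmed, to match the equations the presentation used; the matrix A that induces the norm is an added symbol (A = I gives the Euclidean distance).

```latex
% Objective: weighted within-cluster squared distance
J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \, d^{2}(x_k, v_i),
\qquad \sum_{i=1}^{c} u_{ik} = 1 \ \text{for every } k, \quad m > 1

% Inner-product-induced norm
d^{2}(x_k, v_i) = \lVert x_k - v_i \rVert_A^{2} = (x_k - v_i)^{\top} A \, (x_k - v_i)

% Alternating updates that decrease J_m
v_i = \frac{\sum_{k=1}^{n} (u_{ik})^{m} \, x_k}{\sum_{k=1}^{n} (u_{ik})^{m}},
\qquad
u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d(x_k, v_i)}{d(x_k, v_j)} \right)^{\frac{2}{m-1}} \right]^{-1}
```

As m approaches 1 the memberships approach a hard (0 or 1) assignment, and larger m makes the partition fuzzier.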

  12. Experimental Set-up • Target the website http://campus.umr.edu • Mine the web log files for web data • The main problem is converting the web pages accessed into numeric values • Identify all the URLs that can be reached from this web page • Number these URLs from 1 to N, where N is the number of URLs that can be accessed • Assign a fuzzy weight w(j) to each URL that can be accessed • Define a Boolean variable s(j), set to 1 if the jth URL is accessed by the user; otherwise s(j) is set to null
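The numbering and indicator steps above could be sketched in Python as follows. The URL paths and both helper functions are hypothetical, and the sketch uses 0 where the slide says "null", which is an assumption made so that s(j) can be used in arithmetic.

```python
from typing import Dict, List, Sequence


def number_urls(urls: Sequence[str]) -> Dict[str, int]:
    """Number the reachable URLs from 1 to N, in the order given."""
    return {url: j for j, url in enumerate(urls, start=1)}


def access_indicator(session: Sequence[str], url_index: Dict[str, int]) -> List[int]:
    """Build s(1..N): 1 if the j-th URL was accessed in the session, else 0.

    Assumption: 0 stands in for the "null" value mentioned on the slide.
    URLs that are not in url_index are ignored.
    """
    s = [0] * len(url_index)
    for url in session:
        j = url_index.get(url)
        if j is not None:
            s[j - 1] = 1  # list positions are 0-based, URL numbers are 1-based
    return s


# Hypothetical URLs, for illustration only.
reachable = ["/admissions", "/library", "/sports", "/email"]
url_index = number_urls(reachable)
s = access_indicator(["/library", "/email", "/library"], url_index)
```

Repeated visits to the same URL within a session leave s(j) at 1, since the slide defines s(j) as a Boolean rather than a count.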

  13. Experimental Set-up (contd.) • Define the data point x as the number corresponding to all the sites accessed by the user in that particular user session • Apply fuzzy c-means, calculating the Euclidean distance between each data sample and each cluster center as d_ij = |x_j - c_i|, where x_j is the data point and c_i is the center of cluster i
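A minimal pure-Python sketch of one-dimensional fuzzy c-means with the distance d_ij = |x_j - c_i| is given below. Several parts are assumptions rather than the author's method: the session encoding x = sum of w(j) * s(j) (the slide does not give the formula), the evenly spaced initial centers, the stopping tolerance, and the sample data.

```python
from typing import List, Sequence, Tuple


def session_value(w: Sequence[float], s: Sequence[int]) -> float:
    """Assumed encoding of a session: x = sum_j w(j) * s(j)."""
    return sum(wj * sj for wj, sj in zip(w, s))


def fuzzy_c_means_1d(
    xs: Sequence[float], c: int, m: float = 2.0,
    max_iter: int = 100, tol: float = 1e-6,
) -> Tuple[List[float], List[List[float]]]:
    """Cluster scalar data points; returns (centers, memberships).

    memberships[j][i] is the grade of membership of xs[j] in cluster i,
    computed in the last iteration. Requires m > 1.
    """
    lo, hi = min(xs), max(xs)
    # Assumption: deterministic start with centers spread evenly over the data range.
    centers = [lo + (i + 0.5) * (hi - lo) / c for i in range(c)]
    exponent = 2.0 / (m - 1.0)
    u: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update from the distances d_ij = |x_j - c_i|.
        u = []
        for x in xs:
            d = [max(abs(x - ci), 1e-12) for ci in centers]  # avoid division by zero
            u.append([
                1.0 / sum((d[i] / d[k]) ** exponent for k in range(c))
                for i in range(c)
            ])
        # Center update: mean of the data weighted by membership^m.
        new_centers = []
        for i in range(c):
            num = sum((u[j][i] ** m) * xs[j] for j in range(len(xs)))
            den = sum(u[j][i] ** m for j in range(len(xs)))
            new_centers.append(num / den)
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Hypothetical session values, for illustration only.
xs = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
centers, memberships = fuzzy_c_means_1d(xs, c=2)
```

A hard cluster label for each session, where one is needed, can be taken as the index of its largest membership.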

  14. Results: for 2 and 3 clusters

  15. Results: for 2 and 3 clusters (contd.)

  16. Web Page Evolution • Use the cluster information as input to modify the web page, so that users with similar access patterns get the same web page, which may differ from the page shown to other users • Adjust the placement of links • Remove certain links (if possible)
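One possible way to turn cluster information into link placement is sketched below in Python: for each cluster, links are ordered by how many of that cluster's sessions visited them. The function name, the URLs, and the use of hard cluster labels (for example, the cluster with the highest membership) are assumptions for illustration, not the procedure used in the presentation.

```python
from collections import Counter
from typing import Dict, List, Sequence


def order_links_by_cluster(
    sessions: Sequence[Sequence[str]],
    labels: Sequence[int],
    links: Sequence[str],
) -> Dict[int, List[str]]:
    """For each cluster label, return the links sorted by popularity.

    Popularity is the number of sessions in the cluster that visited the link.
    Ties keep the original link order because sorted() is stable.
    """
    counts: Dict[int, Counter] = {}
    for session, label in zip(sessions, labels):
        # set() counts each link at most once per session.
        counts.setdefault(label, Counter()).update(set(session))
    return {
        label: sorted(links, key=lambda link: -counter[link])
        for label, counter in counts.items()
    }


# Hypothetical links, sessions and cluster labels, for illustration only.
links = ["/admissions", "/library", "/sports", "/email"]
sessions = [["/email", "/library"], ["/email"], ["/sports"], ["/sports", "/admissions"]]
labels = [0, 0, 1, 1]
layout = order_links_by_cluster(sessions, labels, links)
```

Links that end up at the bottom of every cluster's ordering would be candidates for the removal mentioned on the slide.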

  17. Conclusions • Fuzzy c-means is an easy way of clustering similar web access patterns for different user sessions • The use of Euclidean distance was very helpful for learning more about these web access patterns • The experiment provided results and plots that were easy to interpret • We observed that fuzzy c-means provided stable results for the different data sets we took

  18. Future Work • Use other clustering algorithms and compare • Develop self-evolving web sites: sites that improve themselves by learning from user access patterns • The results obtained with the fuzzy clustering algorithms could be used to make recommendations to the webmaster of http://campus.umr.edu • Increase the popularity of the web page by tailoring it more closely to the needs of the users accessing it

  19. Questions?
