IS698 – Web Mining

Min Song, Ph.D.

Course Web Page



Course structure
  • The course has two parts:
    • Lectures - Introduction to the main topics
    • One project (done individually or in a group)
      • One research project.
  • Lecture slides will be made available on the course web page and on Moodle.
  • Class Participation: 10%
  • Assignments: 20%
  • Midterm: 25%
  • Projects: 45%
  • Knowledge/Experience of
    • Java programming
Teaching materials
  • Required Text
    • Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, by Bing Liu, Springer, ISBN 3-540-37881-2.
  • References:
    • Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, ISBN 1-55860-489-8.
    • Principles of Data Mining, by David Hand, Heikki Mannila, Padhraic Smyth, The MIT Press, ISBN 0-262-08290-X.
    • Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson/Addison Wesley, ISBN 0-321-32136-7.
    • Machine Learning, by Tom M. Mitchell, McGraw-Hill, ISBN 0-07-042807-7
  • Introduction
  • Data pre-processing
  • Association rules and sequential patterns
  • Classification (supervised learning)
  • Clustering (unsupervised learning)
  • Post-processing of data mining results
  • Question Answering
  • Full-Text mining
  • Partially (semi-) supervised learning
  • Opinion mining and summarization
  • Link analysis
Feedback and suggestions
  • Your feedback and suggestions are most welcome!
    • I need it to adapt the course to your needs.
    • Let me know if you find any errors in the textbook.
  • Share your questions and concerns with the class – very likely others may have the same.
  • No pain no gain
    • The more you put in, the more you get
    • Your grades are proportional to your efforts.
Rules and Policies
  • Statute of limitations: No grading questions or complaints, no matter how justified, will be considered more than one week after the item in question has been returned.
  • Cheating: Cheating will not be tolerated. All work you submit must be entirely your own. Any suspicious similarities between students' work will be recorded and brought to the attention of the Dean. The MINIMUM penalty for any student found cheating will be a 0 for the item in question and a final course grade lowered by one letter. The MAXIMUM penalty will be expulsion from the University.
  • Late assignments: Late assignments will not, in general, be accepted. They will never be accepted if the student has not made special arrangements with me at least one day before the assignment is due. If a late assignment is accepted, it is subject to a reduction in score as a late penalty.
Web mining: Examples
  • Link analysis
    • How does Google work?
    • How to find communities on the Web?
  • Structured data extraction
  • Web information integration
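
The "How does Google work?" question is usually answered with link analysis in the style of PageRank. The following is only a minimal sketch of that idea, not the course's own material: the four-page graph `toy_web`, the function name `pagerank`, the damping factor 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets a small base share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)

for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects links from three other pages and ends up ranked highest, while D, which nothing links to, keeps only the base share.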
Example: Web data extraction

[Figure: example web page annotated with two data regions (Data region 1, Data region 2) and individual data records]

  • Data mining related conferences
    • Data mining: KDD, ICDM, SDM, …
    • Databases: SIGMOD, VLDB, ICDE, …
    • Web: WWW, …
    • Information retrieval: SIGIR, CIKM, …
  • KDnuggets:
    • News and resources. You can sign up!
  • Our text and reference books
What is web mining?
  • The process of discovering knowledge from web page content, hyperlink structure, and usage data
  • Builds on existing data and text mining techniques, but adds many new tasks and algorithms
  • Three types, based on sources of data (often combined in practice):
    • Web structure mining
    • Web content mining
    • Web usage mining
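
To make the three data sources concrete, here is a small sketch using only the Python standard library. Everything in it is an illustrative assumption rather than course material: the HTML snippet, the `PageParser` class, and the list of requested paths standing in for a server access log.

```python
from collections import Counter
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects hyperlinks (structure) and words (content) from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Structure: record the target of every <a href="..."> element.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Content: collect lowercased, whitespace-separated tokens.
        self.words.extend(data.lower().split())


# Hypothetical page used only for this example.
html = (
    "<html><body><h1>Web mining</h1>"
    '<p>Mining the web with <a href="/links.html">link analysis</a> '
    'and <a href="/text.html">text mining</a>.</p></body></html>'
)
parser = PageParser()
parser.feed(html)
parser.close()

# Usage: a hypothetical access log reduced to the requested paths.
access_log = ["/index.html", "/links.html", "/index.html", "/text.html", "/index.html"]

term_counts = Counter(parser.words)  # input for web content mining
out_links = parser.links             # input for web structure mining
page_hits = Counter(access_log)      # input for web usage mining

print(out_links)
print(term_counts.most_common(3))
print(page_hits.most_common(1))
```

Real systems would work from crawled pages and full server logs; the point here is only that the same site yields three different kinds of input.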
Importance of web data mining
  • The web is unique!
  • Amount of information is huge and still growing, on almost any topic, and changes continuously
  • No single editorial control: significant variations in quality, much duplication, and data formats vary widely
  • Significant information is linked (within and between web sites)
  • The Web reflects a virtual society: interactions among people, organizations, and automated systems are no longer limited by geography
  • The Web presents challenges and opportunities for mining
How to make best use of data?
  • Knowledge discovered from web data can be used for competitive advantage.
  • Online retailers (e.g., …) are largely driven by data mining.
  • Web search engines are based on information retrieval (text mining), data mining, and NLP.
  • Web surfers/searchers need tools to find, recommend, organize, and extract useful information from the Web
Semester Research Project
  • Individual, or groups of two (group members will grade each other)
    • Plus formal and informal feedback from instructor
  • Should be the beginning of what could be a publishable project.
    • On some aspect of web mining
  • Topic will be given by instructor or proposed by student and approved by instructor
  • Students present
    • Ideas early in the semester for feedback
    • Completed project at the end of the semester
  • Write a scientific paper at the end.
  • Publish as a technical report, if not more (some past projects have been published at AMIS and others are under review)
Project: Biomedical Fulltext Mining
  • Input data for Web Mining (particularly web content mining) consists of document surrogates, short web pages, email messages, etc.
  • Fulltext data (books and online articles) has become publicly available.
  • Currently, fulltext mining is not well studied.
  • Study fulltext mining in the context of Biomedical research problems.
Required Software
  • Java (JDK 1.6.0 or above)
  • Tomcat 6
  • Apache Ant 1.7.1
  • Eclipse 3.4