
IS698 – Web Mining - PowerPoint PPT Presentation





Presentation Transcript

IS698 – Web Mining

Min Song, Ph.D.

Course web page: http://web.njit.edu/~song/courses/web_mining/is698_webmining_syllabus.html and Moodle


Course structure

  • The course has two parts:

    • Lectures - Introduction to the main topics

    • One project (done either individually or in a group)

      • 1 research project.

  • Lecture slides will be made available on the course web page and on Moodle.


Grading

  • Class Participation: 10%

  • Assignments: 20%

  • Midterm: 25%

  • Projects: 45%


Prerequisites

  • Knowledge/Experience of

    • Java programming


Teaching materials

  • Required Text

    • Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, by Bing Liu, Springer, ISBN 3-540-37881-2.

  • References:

    • Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, ISBN 1-55860-489-8.

    • Principles of Data Mining, by David Hand, Heikki Mannila, Padhraic Smyth, The MIT Press, ISBN 0-262-08290-X.

    • Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson/Addison Wesley, ISBN 0-321-32136-7.

    • Machine Learning, by Tom M. Mitchell, McGraw-Hill, ISBN 0-07-042807-7


Topics

  • Introduction

  • Data pre-processing

  • Association rules and sequential patterns

  • Classification (supervised learning)

  • Clustering (unsupervised learning)

  • Post-processing of data mining results

  • Question Answering

  • Full-Text mining

  • Partially (semi-) supervised learning

  • Opinion mining and summarization

  • Link analysis


Feedback and suggestions

  • Your feedback and suggestions are most welcome!

    • I need them to adapt the course to your needs.

    • Let me know if you find any errors in the textbook.

  • Share your questions and concerns with the class – very likely others may have the same.

  • No pain no gain

    • The more you put in, the more you get out.

    • Your grades are proportional to your efforts.


Rules and Policies

  • Statute of limitations: No grading questions or complaints, no matter how justified, will be considered more than one week after the item in question has been returned.

  • Cheating: Cheating will not be tolerated. All work you submit must be entirely your own. Any suspicious similarities between students' work will be recorded and brought to the attention of the Dean. The MINIMUM penalty for any student found cheating will be a 0 for the item in question and a final course grade lowered by one letter. The MAXIMUM penalty will be expulsion from the University.

  • Late assignments: Late assignments will not, in general, be accepted. They will never be accepted if the student has not made special arrangements with me at least one day before the assignment is due. If a late assignment is accepted it is subject to a reduction in score as a late penalty.


Web mining: Examples

  • Link analysis

    • How does Google work?

    • How to find communities on the Web?

  • Structured data extraction

  • Web information integration
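
As a rough illustration of the "How does Google work?" question, the sketch below runs PageRank, the link-analysis algorithm commonly associated with Google, on a tiny made-up graph. The class name `PageRankSketch`, the four-page graph, the damping factor 0.85, and the iteration count are all assumptions chosen for illustration, not material from the course.

```java
import java.util.Arrays;

public class PageRankSketch {

    /**
     * Power-iteration PageRank on a small directed graph.
     * links[i] lists the pages that page i links to (hypothetical input format).
     */
    public static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);               // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0;                // rank held by pages with no out-links
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                } else {
                    // a page splits its rank evenly among the pages it links to
                    double share = rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share;
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                // random jump + rank received via links + dangling rank spread evenly
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Made-up graph: 0 -> 1, 2;  1 -> 2;  2 -> 0;  3 -> 2
        int[][] links = { {1, 2}, {2}, {0}, {2} };
        double[] rank = pageRank(links, 0.85, 50);
        for (int i = 0; i < rank.length; i++) {
            System.out.printf("page %d: %.4f%n", i, rank[i]);
        }
    }
}
```

In this toy graph, page 2 receives links from three pages and ends up with the highest score, while page 3, which nothing links to, keeps only the random-jump share. A production ranking system combines many more signals; this shows only the core idea of scoring pages by the links they receive.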


Example: Web data extraction

[Figure: a sample web page with two data regions (Data region 1 and Data region 2); individual data records are marked within a region.]
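
To make the idea of data records concrete, here is a minimal sketch that pulls records out of an HTML fragment. It assumes, purely for illustration, that the data region is a table and that each `<tr>` row is one record; the class name `DataRecordSketch` and the sample product data are made up. Real extraction methods try to discover the repeated structure automatically rather than hard-coding it as done here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataRecordSketch {

    // Assumption: one table row is one data record, one cell is one field.
    private static final Pattern ROW = Pattern.compile("<tr>(.*?)</tr>", Pattern.DOTALL);
    private static final Pattern CELL = Pattern.compile("<td>(.*?)</td>", Pattern.DOTALL);

    /** Returns one list of field strings per record found in the HTML fragment. */
    public static List<List<String>> extractRecords(String html) {
        List<List<String>> records = new ArrayList<List<String>>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> fields = new ArrayList<String>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                fields.add(cell.group(1).trim());
            }
            if (!fields.isEmpty()) {
                records.add(fields);              // skip rows that contain no cells
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up data region with two records
        String html = "<table>"
                + "<tr><td>Camera A</td><td>$199</td></tr>"
                + "<tr><td>Camera B</td><td>$249</td></tr>"
                + "</table>";
        for (List<String> record : extractRecords(html)) {
            System.out.println(record);
        }
    }
}
```

Regular expressions are fragile on real-world HTML (attributes, nesting, malformed tags); they are used here only to keep the sketch dependency-free.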



Resources

  • ACM SIGKDD

  • Data mining related conferences

    • Data mining: KDD, ICDM, SDM, …

    • Databases: SIGMOD, VLDB, ICDE, …

    • AI: AAAI, IJCAI, ICML, ACL, …

    • Web: WWW, …

    • Information retrieval: SIGIR, CIKM, …

  • KDnuggets: http://www.kdnuggets.com/

    • News and resources. You can sign up!

  • Our text and reference books


What is web mining?

  • The process of discovering knowledge from web page content, hyperlink structure, and usage data

  • Builds on existing data and text mining techniques, but adds many new tasks and algorithms

  • Three types, based on sources of data (often combined in practice):

    • Web structure mining

    • Web content mining

    • Web usage mining


Importance of web data mining

  • The web is unique!

  • Amount of information is huge and still growing, on almost any topic, and changes continuously

  • No single editorial control: significant variations in quality, much duplication, and data formats vary widely

  • Significant information is linked (within and between web sites)

  • The Web reflects a virtual society: interactions among people, organizations, and automated systems, no longer limited by geography

  • The Web presents challenges and opportunities for mining


How to make best use of data?

  • Knowledge discovered from web data can be used for competitive advantage.

  • Online retailers (e.g., amazon.com) are largely driven by data mining.

  • Web search engines are based on information retrieval (text mining), data mining, and NLP.

  • Web surfers/searchers need tools to find, recommend, organize, and extract useful information from the Web


Semester Research Project

  • Individual, or groups of two (group members will grade each other)

    • Plus formal and informal feedback from instructor

  • Should be the beginning of what could be a publishable project.

    • On some aspect of web mining

  • Topic will be given by instructor or proposed by student and approved by instructor

  • Students present

    • Ideas early in the semester for feedback

    • Completed project at the end of the semester

  • Write a scientific paper at the end.

  • Publish as a technical report, if not more (some past projects have been published at AMIS or are under review)


Project: Biomedical Fulltext Mining

  • Input data for Web Mining (particularly web content mining) consists of document surrogates, short web pages, email messages, etc.

  • Fulltext data (books and online articles) has become publicly available.

  • Currently fulltext mining is not well studied.

  • Study fulltext mining in the context of Biomedical research problems.



Required Software

  • Java (jdk1.6.0 or above)

  • Tomcat 6

  • Apache Ant 1.7.1

  • Eclipse 3.4

  • BioFulltextMiner.zip (http://base.njit.edu/vline/BioFullTextMiner.zip)