Explore the vast world of web data management. Learn to search, understand and rank results, combine sources, and make recommendations. Dive into types of web data, challenges, and advanced topics for efficient data handling.
Web Data • The web has revolutionized our world • Data is everywhere • Offers great potential • But also poses a lot of challenges • Web data is huge, unstructured, heterogeneous, partially incorrect… • Just the ingredients of a fun topic!
Goals • Searching for relevant web pages • E.g. given keywords • Understanding the results • Ranking the results • Combining results from different sources • E.g. social networks + search history • Combining rankings • Recommendations • Movies, restaurants…
Types of Data On the Web • Text • XML • Tables • Hyperlinks • Semantic tags • …
Challenges • Scale • The web is huge… • Heterogeneous sources • Different models and analysis techniques need to be designed • Uncertainty • A lot of errors (intentional or not) in data • A lot of errors in understanding data • Probabilistic modeling will be needed
Ingredients (Unordered) • Web Data Types • Semi-structured • Structured • Unstructured • Modeling & Storage • XML, text and relational DB representation • XML Typing & querying • Text models • Search and Retrieval • Crawling • Querying • Information Retrieval and Extraction (basics)
Text Analysis • POS tagging • Ranking • HITS algorithm • Google PageRank • Rank Aggregation and Top-K algorithms • Recommendations • Collaborative Filtering • The Netflix Million-Dollar Challenge
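To make the ranking topic concrete, here is a minimal sketch of PageRank computed by power iteration. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the three-page toy graph are illustrative assumptions rather than course material; a real implementation would work on a crawled link graph and test for convergence.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical in-memory stand-in for a crawled link graph).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution over pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_graph)
```

In this toy graph, page `c` is linked from both `a` and `b`, so it ends up with a higher score than `b`, which is linked only from `a`.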
Semantic Web • Ontologies • Data Integration • Deriving semantic information • Wikipedia as an example • Web Services and Business Processes • BPEL, WSDL standards • Orchestration • Mashups • Analysis
Advanced Topics (time permitting) • Querying the deep web • Online advertisements • Models • Algorithms • Distributed Data Management • MapReduce and Pig Latin
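As a rough illustration of the MapReduce programming model, the sketch below counts words in a single process by simulating the map, shuffle, and reduce phases. All names here (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are hypothetical; a real framework would distribute these phases across many machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run the three phases in sequence over a dict of doc_id -> text."""
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


counts = word_count({"d1": "web data web", "d2": "data"})
```

The map and reduce functions keep no shared state, which is what would allow a framework to run them in parallel on separate partitions of the data.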
Resources • Website • Accessible from http://cs.tau.ac.il/~danielde • Slides, exercises, links… • Book • http://webdam.inria.fr/Jorge/index.php • Free full version available online • Papers • Links will be available when relevant
Your Duties • 70% Final Exam • 30% Exercises • Including programming tasks