Intelligent Internet Agents for Distributed Data Mining {yzhang, sowen, sprasad, raj}@cs.gsu.edu gjv@ece.gatech.edu

Presentation Transcript

  1. Intelligent Internet Agents for Distributed Data Mining • Yanqing Zhang, Scott Owen, Sushil Prasad and Raj Sunderraman ({yzhang, sowen, sprasad, raj}@cs.gsu.edu), Department of Computer Science, Georgia State University • George Vachtsevanos (gjv@ece.gatech.edu), School of Electrical and Computer Engineering, Georgia Institute of Technology

  2. Outline • Motivation • Architecture of Intelligent Internet Agents • Program Libraries of Intelligent Middleware • Smart Web Search Agents • Intelligent Soft Computing Agents • Benefits • Deliverables • Conclusion

  3. Motivation • Distributed Web KDD: Useful information and knowledge mined in distributed Web databases • QoS (Efficiency, Web Speed, User Time): Huge amounts of useless data flow on the Internet • From Data Web to Information Web: Upgrade the current data-flow-oriented Internet to a future information-flow-oriented Internet • Intelligent Web Middleware: with reusable, portable and scalable intelligent functionality • Smart E-Business: Use intelligent Web agents to do better E-Business on the Internet

  4. Architecture of Intelligent Internet Agents • Application Layer: E-Commerce, E-Education, other E-B • Intelligent Layer: Data Mining, Soft Computing, ES, etc. • Network Layer: Backbone, gigaPoPs, other hardware

  5. Program Libraries of Intelligent Middleware • Binary Association Rule Generator • Fuzzy Association Rule Generator • Neural-Net-based Data Classifier and Pattern Generator • Fuzzy c-means Program for Data Clustering • Genetic Algorithms for Data Refinement and Optimization • Granular Neural Nets for Linguistic Data Mining • XML-based Smart Web Search Sub-Programs • Connection Programs between Database and Middle Layer • Local Cache Database Manager • Local Cache Information Base Manager • Basic GUI Programs • Client-Server Creation and Communication Programs • Distributed Operation Manager • Distributed Data Mining Synchronization • Web Customer Log Miner • and so on
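
As an illustration of what a "Binary Association Rule Generator" from this list might compute, here is a minimal sketch in Python. The slides do not describe the actual library, so the function name `association_rules`, the thresholds, and the sample baskets are all assumptions; the sketch only considers rules between pairs of single items, using the standard support and confidence measures.

```python
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Hypothetical helper: a binary rule "x -> y" is kept when the pair
    {x, y} is frequent enough and y appears often enough given x.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(itemset <= t for t in sets) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < min_support:
            continue
        # Check the rule in both directions.
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((x,)))
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules


# Made-up market baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = association_rules(baskets)
for x, y, s, c in rules:
    print(f"{x} -> {y}  support={s:.2f} confidence={c:.2f}")
```

A fuzzy variant would replace the crisp "item is in the basket" test with a membership degree between 0 and 1; that is not shown here.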

  6. Smart Web Search Agents • Data Search Engines >> Information Search Agents • Traditional searching on the Web is done using one of the following three: - Directories (Yahoo, Lycos, etc.) - Search Engines (AltaVista, NorthernLight, etc.) - Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.) • All of these involve keyword searches • Drawbacks: not easily personalized; too many results (although many give relevancy factors)

  7. Smart Search Agents will provide: - more personalized searches - domain-based searches - more efficient searches

  8. Smart Search Agents will employ: - local cache databases (containing frequently asked queries and their results, possibly updated periodically, e.g. nightly) - a local cache information base (containing mined information and discovered knowledge for efficient personal use) - domain-based agents (e.g. Job Search; Sports: NBA Stats; Bibliography: Digital Libraries)
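
The local cache database idea can be sketched as follows. This is not the authors' implementation: the class name `QueryCache`, the in-memory dictionary (standing in for a real database), the query normalization, and the 24-hour expiry (standing in for the nightly update) are all assumptions made for illustration.

```python
import time


class QueryCache:
    """Hypothetical cache of frequently asked queries and their results."""

    def __init__(self, search_fn, max_age_seconds=24 * 3600, clock=time.time):
        self._search_fn = search_fn      # the slow remote search to wrap
        self._max_age = max_age_seconds  # entries older than this are refreshed
        self._clock = clock              # injectable clock, handy for testing
        self._entries = {}               # normalized query -> (timestamp, results)

    @staticmethod
    def _normalize(query):
        # Treat queries that differ only in case or spacing as the same query.
        return " ".join(query.lower().split())

    def search(self, query):
        key = self._normalize(query)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._max_age:
            return hit[1]                # fresh cached answer, no remote call
        results = self._search_fn(query)
        self._entries[key] = (now, results)
        return results


# Stand-in for a remote search engine; it records how often it is called.
calls = []


def fake_backend(query):
    calls.append(query)
    return [f"result for {query}"]


cache = QueryCache(fake_backend)
first = cache.search("NBA  Stats")
second = cache.search("nba stats")
print(first == second, len(calls))
```

The second lookup is served from the cache because both spellings normalize to the same key, so the backend is contacted only once.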

  9. Some initial results: • M. Nagarajan, Metagenie - A metasearch engine for multi-databases, M.S. thesis, GSU (July 1999). Domains: Jobs, Books • S. Ahmed, EXACT-FINDER: A cache-based meta-search engine, M.S. thesis, GSU (May 2000). Local cache database storing personalized frequently asked queries and results, updated periodically • R. Sunderraman, ReQueSS: Relational Querying of semi-structured data, ICDE 2000 (demo session), San Diego, CA, March 2000 • X. Li, Querying unified sources of Web data, M.S. thesis, GSU (July 1999). Data wrappers for Web sources (NBA stats/box scores, DBLP Bibliography database)

  10. Intelligent Tools for E-Business • Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems • Learning Algorithms, Heuristic Searching • Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery • Prediction & Time Series Analysis • Information Retrieval, Intelligent User Interface • Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems

  11. Enhancing E-Business Process Through Data Mining • Quality of discovered knowledge depends on: - Having the right data - Having appropriate data mining tools! • Traditional Data Mining Tools: - Simple query and reporting - Visualization-driven data exploration tools, OLAP - Discovery process is user-driven

  12. Intelligent Data Mining Tools • Automate the process of discovering patterns/knowledge in data • Require hypothesis, exploration • Derive business knowledge (patterns) from data • Combine business knowledge of users with results of discovery algorithms

  13. Intelligent Information Agents • The Data Mining Problem: - Clustering/Classification - Association - Sequencing • Viewed as an Optimization Problem • Tools: Genetic Algorithms

  14. Fuzzy Rule Discovery • Rule discovery: the discovery of associations between business events, e.g. which items are purchased together • To support flexible querying and intelligent searching, fuzzy queries are developed to uncover potentially valuable knowledge • A fuzzy query uses fuzzy terms such as tall, small, and near to define linguistic concepts and formulate a query • The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segmentation in the data
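
A fuzzy query of the kind described on this slide can be sketched in Python. The membership functions, their breakpoints (170 to 190 cm for "tall", 1 to 10 km for "near"), the record fields, and the use of `min` for "and" are illustrative assumptions, not taken from the slides.

```python
def ramp_up(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def tall(height_cm):
    # Assumed breakpoints: not tall at 170 cm or below, fully tall at 190 cm.
    return ramp_up(height_cm, 170, 190)


def near(distance_km):
    # Assumed breakpoints: fully near within 1 km, not near beyond 10 km.
    return 1.0 - ramp_up(distance_km, 1, 10)


def fuzzy_query(records, threshold=0.5):
    """Rank records by how well they match "tall AND near".

    "AND" is modeled with min, a common choice for fuzzy conjunction.
    """
    scored = [
        (min(tall(r["height_cm"]), near(r["distance_km"])), r["name"])
        for r in records
    ]
    return sorted(((s, n) for s, n in scored if s >= threshold), reverse=True)


# Made-up records, for illustration only.
people = [
    {"name": "Ann", "height_cm": 185, "distance_km": 2},
    {"name": "Bob", "height_cm": 175, "distance_km": 1},
    {"name": "Cy", "height_cm": 195, "distance_km": 8},
]
matches = fuzzy_query(people)
print(matches)
```

Unlike a crisp query such as `height_cm > 180`, each record receives a degree of match, so results can be ranked and the cut-off tuned through `threshold`.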

  15. Example of Service Provider’s Features: 3R (Risk-Response-Retention) Model • Fuzzy Decision Making: Match Users with Dynamic Products, Services, and Pricing • Example rule: Low Risk, High Response, High Retention -> Customer: Preferred; Pricing: according to Life-time Value; Cross-Selling: Bundle Extra Liability Insurance • Fuzzy variables, each with levels Low / Medium / High: Risk (Loss Ratio), Retention (Persistency), Response
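
The rule "Low Risk, High Response, High Retention -> Preferred customer" can be evaluated with a small fuzzy-rule sketch. The slide does not give the membership functions, so the triangular shapes below, the assumption that each score is normalized to the range 0 to 1, and the function names are all hypothetical.

```python
def low(x):
    # Assumed shape: fully "low" at 0, fading to 0 at the midpoint 0.5.
    return max(0.0, 1.0 - 2.0 * x)


def medium(x):
    # Assumed shape: peaks at 0.5, zero at both ends (unused by the rule below).
    return max(0.0, 1.0 - abs(2.0 * x - 1.0))


def high(x):
    # Assumed shape: zero up to the midpoint 0.5, fully "high" at 1.
    return max(0.0, 2.0 * x - 1.0)


def preferred_strength(risk, response, retention):
    """Degree to which the rule "preferred customer" fires.

    Inputs are scores assumed to be normalized to [0, 1]; the three
    conditions are combined with min (fuzzy "and").
    """
    return min(low(risk), high(response), high(retention))


strength = preferred_strength(risk=0.1, response=0.9, retention=0.8)
print(f"preferred-customer rule fires with strength {strength:.2f}")
```

A full decision system would hold one such rule per combination of levels and combine their outputs; only the single rule from the slide is shown.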

  16. Measuring Performance of Intelligent Agents • Accuracy : distance or variance measure of IAs’ performance from their goal, i.e. Fuzzy Entropy • Speed : latency of response • Cost : resources consumed, consequences of failures • Benefit : payoff for goals achieved

  17. Performance Assessment, Learning and Optimization [Diagram components: Goals/Objectives, Performance Evaluation Module, Learning/Adaptation]

  18. Examples • Product Information Clustering: - Use a GA as the heuristic search engine - Apply the GA selection and inversion operators - Evaluate information content - Estimate system entropy - Apply reinforcement learning strategy • Dynamic Pricing: - In addition to the above steps, explore association and sequencing relations
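
The first two steps of the clustering example (a GA as the search engine, using selection and inversion) can be sketched as below. Everything concrete here is an assumption: the chromosome encoding (one cluster label per item), the fitness (within-cluster squared error instead of the information-content and entropy measures named on the slide), tournament selection, and the sample prices. The reinforcement learning step is not modeled.

```python
import random


def within_cluster_cost(points, labels, k):
    """Sum of squared distances of 1-D points to their cluster mean."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            mean = sum(members) / len(members)
            cost += sum((p - mean) ** 2 for p in members)
    return cost


def tournament_select(population, costs, rng):
    # Selection operator: pick two chromosomes, keep the cheaper one.
    i, j = rng.sample(range(len(population)), 2)
    return population[i] if costs[i] <= costs[j] else population[j]


def invert(chromosome, rng):
    # Inversion operator: reverse a randomly chosen segment.
    i, j = sorted(rng.sample(range(len(chromosome) + 1), 2))
    return chromosome[:i] + chromosome[i:j][::-1] + chromosome[j:]


def ga_cluster(points, k=2, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)

    def cost(chromosome):
        return within_cluster_cost(points, chromosome, k)

    population = [[rng.randrange(k) for _ in points] for _ in range(pop_size)]
    best = min(population, key=cost)
    for _ in range(generations):
        costs = [cost(ch) for ch in population]
        population = [
            invert(tournament_select(population, costs, rng), rng)
            for _ in range(pop_size - 1)
        ]
        population.append(best)  # elitism: never lose the best solution so far
        best = min(population, key=cost)
    return best


# Made-up product prices forming two visible groups.
prices = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ga_cluster(prices)
print(labels, within_cluster_cost(prices, labels, 2))
```

Because inversion only reorders genes, the number of items assigned to each cluster within a chromosome never changes; a practical GA would usually add mutation or crossover to explore other cluster sizes.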

  19. The “New Technology” Paradigm [Figure: Internet-related technologies over time, passing through the stages Euphoria/Optimism, Reality, Back to Basics]

  20. INFORMATION IS SELLING NOW! Intelligent Agents will give your information product bargaining power

  21. Benefits • Better QoS: - Web users get information (not raw data) - Smart agents can make decisions for users - Smart agents can save users’ surfing time • Faster Internet: - Information flows on the Internet quickly (e.g., 1k information << 100k raw data) - Reduce data redundancy on the Internet - Reduce Web communication congestion

  22. Deliverables • Intelligent Middle Layer - Data Mining Program Libraries - Soft Computing Program Libraries (e.g., Neural Networks, Fuzzy Logic, Genetic Algorithms, Neuro-fuzzy Systems) • Application Layer - Smart Web Search Agents - Intelligent Soft Computing Agents

  23. Conclusion • To make the future Internet more intelligent and more efficient, it is necessary to design relevant "Intelligent Middleware" between network hardware and high-level Web application systems. • We will first design a basic intelligent middle layer with basic intelligent functionality, and then implement two Web application systems for distributed data mining and E-Business.