
EAS313 Content Capture Technology Suite: EAI for the Web

EAS313 Content Capture Technology Suite: EAI for the Web. Scott McReynolds, Sr Manager, scottmc@sybase.com / 925 236 4558. Prashanth Ponnachath, Software Engineer, pponnach@sybase.com / 925 236 6286. Date: 08/07/2003.


Presentation Transcript


  1. EAS313 Content Capture Technology Suite: EAI for the Web Scott McReynolds, Sr Manager, scottmc@sybase.com / 925 236 4558 Prashanth Ponnachath, Software Engineer, pponnach@sybase.com / 925 236 6286 Date 08/07/2003

  2. Session Objectives

  3. Information Management Challenges • Quantity of information within and outside of enterprises has grown exponentially • Challenge to extract relevant information from a multitude of sources • Integrating extracted content that may be in different formats (EAI issues)

  4. Information Management Challenges • Task-specific customization or personalization • Combining data from several different sources into a new data source • Data aggregation for mining and analysis • Data bottled up behind artificial network or security barriers

  5. Existing Capture Methodologies By Other Vendors Static data stored in databases • Not equivalent to storing dynamic data • Needs to be refreshed at regular intervals • Legal problems • More infrastructure investment

  6. Existing Capture Methodologies By Other Vendors Screen Scraping • Snooping the contents of the display memory of a smart terminal through its auxiliary port • Parsing the HTML with programs designed to mine out patterns of content • Ugly, ad hoc, and very likely to break on even minor changes to the format of the data being snooped

  7. Content Capture Technology Suite (CCTS) What does it do? • Set of APIs that capture dynamic content from a variety of sources into individual elements • Deploy and replay captured elements in any portal framework • Aggregate data from multiple sources into XML
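
The last point on slide 7, aggregating data from multiple sources into XML, can be illustrated with a small sketch. This is not the CCTS API, which the slides do not show. The function name `aggregate_to_xml`, the element names `aggregate`, `source`, and `element`, and the sample source names and values are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def aggregate_to_xml(sources):
    """Merge captured elements from several sources into one XML document.

    `sources` maps a source name to a dict of captured elements.
    The element and attribute names used here are hypothetical.
    """
    root = ET.Element("aggregate")
    for source_name, elements in sources.items():
        source_el = ET.SubElement(root, "source", name=source_name)
        for element_name, value in elements.items():
            child = ET.SubElement(source_el, "element", name=element_name)
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up captured values from two hypothetical sources.
captured = {
    "weather-site": {"wind": "12 kt NW", "temperature": "18 C"},
    "tide-site": {"high-tide": "1.4 m"},
}

xml_text = aggregate_to_xml(captured)
print(xml_text)
```

Each source keeps its own `source` node, so a consumer such as a portlet can still tell where an element came from after aggregation.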

  8. Technology Map

  9. Technology Driving CCTS – Feature Extraction Traditional Extraction Methodology • Outside in, based on HTML tags • Content feed breaks if page changes slightly

  10. Technology Driving CCTS – Feature Extraction CCTS Extraction Methodology • Inside out, based on features of content desired

  11. Object Identification

  12. Technology Driving CCTS – Feature Extraction Feature Extraction (FE) ensures reliability of content aggregation • Parses out information on a page and breaks it down into specific components • Fuzzy logic “digital signature” or symbolic reference rather than a static link ensures persistent extraction of desired content • Pattern recognition through “object specific” parsers enables an extensible set of aggregated objects
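
The contrast between outside-in and inside-out extraction in slides 9 to 12 can be sketched as follows. This is only an illustration of the idea, not the CCTS algorithm: the simple pattern-count score stands in for the "fuzzy logic digital signature" the slides mention, and the function names, sample pages, and feature patterns are all assumptions.

```python
from html.parser import HTMLParser
import re


class TableCollector(HTMLParser):
    """Collects the text of every top-level <table>, in document order."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one list of text fragments per top-level table
        self._depth = 0   # current nesting depth inside <table> elements

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self.tables[-1].append(data.strip())


def collect_tables(html):
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return [" ".join(parts) for parts in parser.tables]


def extract_by_position(html, index):
    """Outside in: trust the page layout and take the table at a fixed index."""
    tables = collect_tables(html)
    return tables[index] if index < len(tables) else None


def extract_by_features(html, features):
    """Inside out: take the table whose text matches the most feature patterns."""
    best, best_score = None, 0
    for text in collect_tables(html):
        score = sum(1 for pattern in features if re.search(pattern, text, re.I))
        if score > best_score:
            best, best_score = text, score
    return best


# Hypothetical page before and after a redesign that adds a table on top.
OLD_PAGE = "<table><tr><td>Wind 12 kt NW</td><td>Tide 1.4 m</td></tr></table>"
NEW_PAGE = (
    "<table><tr><td>Sponsored links</td></tr></table>"
    "<table><tr><td>Wind 15 kt W</td><td>Tide 0.9 m</td></tr></table>"
)
WEATHER_FEATURES = [r"\bwind\b", r"\btide\b", r"\d+\s*kt\b"]

print(extract_by_position(NEW_PAGE, 0))
print(extract_by_features(NEW_PAGE, WEATHER_FEATURES))
```

After the redesign, the positional rule returns the newly added table, while the feature-based rule still finds the weather table because it describes the desired content rather than its location on the page.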

  13. Technology Driving CCTS – CCL Content Collection Language (CCL) • ‘Content bundle’ of everything needed to collect and playback desired content • Designed to be programmed through a user interface instead of by hand • Simple as a URL, but as powerful as a web scripting language

  14. Technology Driving CCTS – Navigation • Tightly coupled with Content Collection Language • Written in Java • Servlet based and can be easily tied to a GUI

  15. Technology Driving CCTS – CCL (continued) • New commands are easily added; CCL is not a keyword-based language • Can reside on the client or the server • Parsing and error management are shared by all commands • Fast execution • Used to eliminate session/calls to DB

  16. CCTS Architecture GUI or API

  17. CCTS Components • Content Capture Engine: takes in user input via a navigation GUI and generates the CCL or XML • Playback Engine: translates CCL statements into content • Content Repository Interface: deploys captured content into any portal repository

  18. CCTS Components Content Capture Workbench • Eclipse-based GUI that allows users to capture and deploy content • Reference implementation of the Capture and CRI APIs • Design pattern that can be used as a reference to integrate any custom GUI with the CCTS API

  19. Suite of Powerful Content Aggregation Tools DataParts reduces the number of data tasks that require a programmer, and makes the remaining tasks easy to accomplish.

  20. Range of Solution Options

  21. EAI Tools • Grid Charts • Messaging Portlets • Integrated Scripting Environment • DataParts

  22. Grid Chart & Database Capture

  23. Messaging Portlets

  24. Integrated Scripting Environment

  25. DataParts: Overview

  26. DataParts: Find Content

  27. Find Content: Extract Article from Web Page

  28. DataParts: Content into XML Schema

  29. DataParts: HTML to XML Schema

  30. Demo: Sailing Event Web Application Scenario • You are a portal developer for a company managing sailing events • You are assigned the task of creating a portal containing the following information • Race sites • Live weather information • Wind speed for the last 12 hours as a graph • Tide information as a graph • Marine weather

  31. Demo : Sailing Event Web Application

  32. Questions
