EAS313 Content Capture Technology Suite: EAI for the Web

Scott McReynolds, Sr Manager, [email protected] / 925 236 4558

Prashanth Ponnachath, Software Engineer, [email protected] / 925 236 6286

Date: 08/07/2003

Session Objectives

Information Management Challenges

  • Quantity of information within and outside of enterprises has grown exponentially

  • Challenge to extract relevant information from a multitude of sources

  • Integrating extracted content that may be in different formats (EAI issues)

Information Management Challenges

  • Task Specific Customization or Personalization

  • Combine data from several different sources into a new data source

  • Data aggregation for mining and analysis

  • Data bottled up by artificial network or security barriers

Existing Capture Methodologies By Other Vendors

Static data stored in databases

  • Not equivalent to storing dynamic data

  • Needs to be refreshed at regular intervals

  • Legal problems

  • More infrastructure investment

Existing Capture Methodologies By Other Vendors

Screen Scraping

  • Snooping the contents of the display memory of a smart terminal through its auxiliary port

  • Parsing the HTML with programs designed to mine out patterns of content

  • Ugly and ad hoc; very likely to break on even minor changes to the format of the data being snooped
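
The brittleness described above can be illustrated with a minimal sketch. The class name, the HTML snippets, and the "wind" cell are hypothetical and stand in for any scraper that hard-codes the surrounding markup; this is not taken from any vendor's product.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tag-pattern scraper, shown only to illustrate why such scrapers break. */
final class BrittleScraper {
    // The pattern encodes the markup around the value (<td class="wind">), not the value itself.
    private static final Pattern WIND_CELL =
            Pattern.compile("<td class=\"wind\">([^<]*)</td>");

    /** Returns the trimmed text of the first matching cell, or empty if the markup differs. */
    static Optional<String> scrapeWind(String html) {
        Matcher m = WIND_CELL.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }
}
```

With the assumed page `<tr><td class="wind">12 kt</td></tr>` the scraper finds the value, but if the site switches the cell to `<span class="wind">12 kt</span>` the same call returns an empty result even though the data is unchanged.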

Content Capture Technology Suite (CCTS)

What does it do?

  • Set of APIs that capture dynamic content from a variety of sources into individual elements

  • Deploy and replay captured elements in any portal framework

  • Aggregate data from multiple sources into XML
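
As a rough illustration of aggregating captured elements into XML, the sketch below uses only the standard JAXP classes shipped with the JDK. The class name, the `aggregate` and `item` element names, and the `source` attribute are assumptions made for this example; the slides do not describe the actual CCTS output schema.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical aggregator: one <item> per captured element under a single root. */
final class XmlAggregator {
    /** Keys are source names, values are the captured text from that source. */
    static String aggregate(Map<String, String> capturedBySource) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("aggregate"); // assumed root element name
            doc.appendChild(root);
            for (Map.Entry<String, String> e : capturedBySource.entrySet()) {
                Element item = doc.createElement("item");
                item.setAttribute("source", e.getKey());
                item.setTextContent(e.getValue()); // special characters are escaped on output
                root.appendChild(item);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException ex) {
            throw new IllegalStateException("could not build aggregate XML", ex);
        }
    }
}
```

Passing a `LinkedHashMap` keeps the items in insertion order, which is convenient when the aggregate feeds a portal page with a fixed layout.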

Technology Map

Technology Driving CCTS – Feature Extraction

Traditional Extraction Methodology

  • Outside in, based on HTML tags

  • Content feed breaks if page changes slightly

Technology Driving CCTS – Feature Extraction

CCTS Extraction Methodology

  • Inside out, based on features of content desired
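
One way to picture "inside out" extraction is to ignore the tags entirely and look for features of the content itself. The sketch below is an assumption-laden example, not the CCTS algorithm: the class name, the choice of a wind reading, and the feature "a number followed by kt shortly after the word wind" are all hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical feature-based extractor: matches what the content looks like, not where it sits. */
final class FeatureExtractor {
    // Feature: the word "wind", then up to 40 non-digit characters, then a number and the unit "kt".
    private static final Pattern WIND_FEATURE =
            Pattern.compile("(?i)wind[^0-9]{0,40}(\\d+(?:\\.\\d+)?)\\s*kt");

    static Optional<Double> windSpeedKnots(String html) {
        // Strip all markup first so the match cannot depend on tag names or nesting.
        String text = html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ");
        Matcher m = WIND_FEATURE.matcher(text);
        return m.find() ? Optional.of(Double.parseDouble(m.group(1))) : Optional.empty();
    }
}
```

Because the markup is discarded before matching, the same call works whether the reading sits in a table cell, a `span`, or a `div`, which is the property the slide contrasts with tag-based extraction.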

Object Identification

Technology Driving CCTS – Feature Extraction

Feature Extraction (FE) ensures reliability of content aggregation

  • Parses out information on a page and breaks it down into specific components

  • Fuzzy logic “digital signature” or symbolic reference rather than a static link ensures persistent extraction of desired content

  • Pattern recognition through “object specific” parsers enables an extensible set of aggregated objects
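
The slides do not say how the fuzzy "digital signature" is computed. As one possible reading, the sketch below stores the tokens of a sample of the desired content and later picks the page block whose token set is most similar (Jaccard similarity), provided it clears a threshold. The class name, the similarity measure, and the threshold are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical fuzzy signature: a token set compared by Jaccard similarity. */
final class FuzzySignature {
    private final Set<String> tokens;
    private final double threshold;

    FuzzySignature(String sample, double threshold) {
        this.tokens = tokenize(sample);
        this.threshold = threshold;
    }

    /** Lower-cases the text and splits it into a set of alphanumeric tokens. */
    static Set<String> tokenize(String text) {
        Set<String> out = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        out.remove(""); // a leading separator produces one empty token
        return out;
    }

    /** Jaccard similarity: shared tokens divided by all distinct tokens. */
    double score(String block) {
        Set<String> other = tokenize(block);
        Set<String> common = new HashSet<>(tokens);
        common.retainAll(other);
        Set<String> union = new HashSet<>(tokens);
        union.addAll(other);
        return union.isEmpty() ? 0.0 : (double) common.size() / union.size();
    }

    /** Returns the highest-scoring block, or empty if none reaches the threshold. */
    Optional<String> bestMatch(List<String> blocks) {
        String best = null;
        double bestScore = -1.0;
        for (String block : blocks) {
            double s = score(block);
            if (s > bestScore) {
                best = block;
                bestScore = s;
            }
        }
        return bestScore >= threshold ? Optional.ofNullable(best) : Optional.empty();
    }
}
```

Unlike a static link or a fixed tag path, a match of this kind tolerates small edits to the block, such as an added "(updated)" note, while still rejecting unrelated blocks on the same page.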

Technology Driving CCTS – CCL

Content Collection Language (CCL)

  • ‘Content bundle’ of everything needed to collect and play back desired content

  • Designed to be programmed through a user interface instead of by hand

  • Simple as a URL, but as powerful as a web scripting language

Technology Driving CCTS – Navigation

  • Tightly coupled with Content Collection Language

  • Written in Java

  • Servlet based and can be easily tied to a GUI

Technology Driving CCTS – CCL (continued)

  • New commands are easily added; CCL is not a keyword-based language

  • Can reside on the client or the server

  • Parsing and error management are shared by all commands.

  • Fast execution.

  • Used to eliminate session/calls to DB
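
The points above (commands added without touching a keyword table, with parsing and error management shared by all commands) resemble a command-registry design. The sketch below shows that general pattern in Java; it is not CCL itself, and the class name, the `Command` interface, and the whitespace-separated statement syntax are assumptions, since the slides do not show CCL syntax.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interpreter: commands live in a registry, not in the parser's grammar. */
final class CommandInterpreter {
    /** A command receives its already-parsed arguments and returns a result string. */
    interface Command {
        String run(List<String> args);
    }

    private final Map<String, Command> commands = new HashMap<>();

    /** Adding a command is one registry entry; the parser itself does not change. */
    void register(String name, Command command) {
        commands.put(name, command);
    }

    /** Parsing and error handling are done once here, for every command. */
    String execute(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts[0].isEmpty()) {
            return "error: empty statement";
        }
        Command command = commands.get(parts[0]);
        if (command == null) {
            return "error: unknown command '" + parts[0] + "'";
        }
        try {
            return command.run(Arrays.asList(parts).subList(1, parts.length));
        } catch (RuntimeException e) {
            return "error: " + parts[0] + " failed: " + e.getMessage();
        }
    }
}
```

A new command is then a single call such as `interpreter.register("echo", args -> String.join(" ", args))`, where `echo` is a made-up command name used only for this example.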

CCTS Architecture GUI or API

CCTS Components

Content Capture Engine

  • Takes in user input via a navigation GUI and generates the CCL or XML

Playback Engine

  • Translates CCL statements into content

Content Repository Interface

  • Deploy captured content into any portal repository

CCTS Components

Content Capture Workbench

  • Eclipse-based GUI that allows users to capture and deploy content

  • Reference implementation of the Capture and CRI (Content Repository Interface) APIs

  • Design pattern that can be used as a reference to integrate any custom GUI to the CCTS API

Suite of Powerful Content Aggregation Tools

DataParts reduces the number of data tasks that require a programmer, and makes the remaining tasks easy to accomplish.

Range of Solution Options

EAI Tools

  • Grid Charts

  • Messaging Portlets

  • Integrated Scripting Environment

  • DataParts

Grid Chart & Database Capture

Messaging Portlets

Integrated Scripting Environment

DataParts: Overview

DataParts: Find Content

Find Content: Extract Article from Web Page

DataParts: Content into XML Schema

DataParts: HTML to XML Schema

Demo: Sailing Event Web Application

  • You are a portal developer for a company managing sailing events

  • You are assigned the task of creating a portal containing the following information

    • Race Sites

    • Live weather information

    • Wind speed for last 12 hours as a graph

    • Tide information as a graph

    • Marine weather

Demo: Sailing Event Web Application

  • Login