Data Integration for Big Data

Data Integration for Big Data. Pierre Skowronski, Prague, 23.04.2013. IT is struggling with the cost of Big Data. Growing data volume is quickly consuming capacity. Need to onboard, store, & process new types of data. High expense and lack of big data skills.

Presentation Transcript

Data Integration for Big Data

Pierre Skowronski

Prague, 23.04.2013

IT is struggling with the cost of Big Data
  • Growing data volume is quickly consuming capacity
  • Need to onboard, store, & process new types of data
  • High expense and lack of big data skills
Prove the Value with Big Data: Deliver Value Along the Way

Cost: Lower Big Data Project Costs

(helps self-fund big data projects)

Risk: Minimize Risk of New Technologies

(design once, deploy anywhere)

Delivery: Innovate Faster With Big Data

(onboard, discover, operationalize)

PowerCenter Big Data Edition: Lower Costs

Optimize processing with low-cost commodity hardware

Increase productivity up to 5X

Diagram labels: Traditional Grid; Transactions, OLTP, OLAP; Documents and Emails; Social Media, Web Logs; Machine Device, Scientific


5x better productivity for similar performance

In the worst case, only 20% slower than hand-coding

In most cases, equal or faster

Informatica: 1 week vs. hand-coding: 5-6 weeks



PowerCenter Big Data Edition: Minimize Risk

Quickly staff projects with trained data integration experts

Design once and deploy anywhere

Deploy On-Premise or in the Cloud

Traditional Grid

Pushdown to RDBMS or DW Appliance

Graphical Processing Logic: Test on Native, Deploy on Hadoop

Diagram labels:

  • Select incomplete partial records
  • Separate incomplete and complete partial records
  • Partial records
  • Aggregate all completed and partial-completed records
  • Sort records by calling number
  • Separate partial records from completed records
  • Completed records only
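
The steps labelled above can be approximated in plain Python to show the shape of the logic. This is only a sketch under assumptions: the slide does not define the record layout, so the `CallRecord` fields, the `is_last_segment` flag used to decide when a set of partial records is complete, and the choice to sum durations per calling number are all hypothetical. It is not how the product implements the mapping.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical record layout; the slide does not specify one.
    calling_number: str
    duration_s: int
    is_partial: bool = False       # True if the row is one segment of a call
    is_last_segment: bool = False  # only meaningful for partial rows


def process(records):
    """Return (total duration per calling number, still-incomplete partial rows)."""
    # Separate partial records from completed records.
    completed = [r for r in records if not r.is_partial]
    # Sort partial records by calling number so groupby sees each number once.
    partial = sorted((r for r in records if r.is_partial),
                     key=attrgetter("calling_number"))

    merged, incomplete = [], []
    for number, group in groupby(partial, key=attrgetter("calling_number")):
        segments = list(group)
        # Assumption: a set of partial rows is complete once its last segment arrived.
        if any(s.is_last_segment for s in segments):
            merged.append(CallRecord(number, sum(s.duration_s for s in segments)))
        else:
            incomplete.extend(segments)

    # Aggregate all completed and partial-completed records.
    totals = {}
    for r in completed + merged:
        totals[r.calling_number] = totals.get(r.calling_number, 0) + r.duration_s
    return totals, incomplete


sample = [
    CallRecord("100", 60),
    CallRecord("200", 30, is_partial=True),
    CallRecord("200", 45, is_partial=True, is_last_segment=True),
    CallRecord("300", 10, is_partial=True),
]
totals, incomplete = process(sample)
print(totals)
print(incomplete)
```

Keeping the logic in one function mirrors the "test on native" idea: the same steps can be checked on a small in-memory sample before being run against a larger data set.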


Run It Simply on Hadoop

  • Choose execution environment
  • Press Run
  • View Hive query


With Informatica

Expertise & best practices

Best practices & reusability

Minimize Risk with Informatica Partners and Certified Developer Community

Global Systems Integrators

Informatica Developers

9,000+ trained developers

  • 45,000+ developers in Informatica TechNet
  • 3x more developers than any other vendor*


* Source: U.S. resume search on, December 2008

Lower Costs of Big Data Projects: Saved $20M + $2-3M Ongoing by Archiving & Optimization

The Challenge: Data warehouse exploding with over 200 TB of data. User activity generates up to 5 million queries a day, impacting query performance.

The Solution

The Result

Business Reports

  • Saved 100 TB of space over the past 2½ years
  • Reduced rearchitecture project from 6 months to 2 weeks
  • Improved performance by 25%
  • Return on investment in less than 6 months





Diagram labels: Archived Data, Phase 2, Interaction Data

Large Global Financial Institution

Large Global Financial Institution: Lower Costs of Big Data Projects

The Challenge: Increasing demand for faster data-driven decision making and analytics as data volumes and processing loads rapidly increase

The Solution

The Result

  • Cost-effectively scale performance
  • Lower hardware costs
  • Increased agility by standardizing on one data integration platform

Diagram labels: Near Real-Time, Traditional Grid, Phase 2, Web Logs

Large Government Agency: Flexible Architecture to Support Rapidly Changing Business Needs

Traditional Grid

The Challenge: Data volumes growing 3-5 times over the next 2-3 years
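
As a rough capacity-planning aid, the growth range in the challenge statement can be converted into a compound annual factor. The sketch below assumes steady compound growth, which the slide does not state, and the function name is purely illustrative.

```python
def annual_growth_factor(total_factor: float, years: float) -> float:
    """Yearly multiplier that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years)


# Slowest case from the slide: 3x spread over 3 years.
slowest = annual_growth_factor(3, 3)
# Fastest case from the slide: 5x reached in 2 years.
fastest = annual_growth_factor(5, 2)

print(f"{slowest:.2f}x to {fastest:.2f}x per year")  # 1.44x to 2.24x per year
```

Under that assumption, the stated range corresponds to data volumes growing by roughly 44% to 124% each year.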

The Solution

The Result

  • Manage data integration and load of 10+ billion records from multiple disparate data sources
  • Flexible data integration architecture to support changing business requirements in a heterogeneous data management environment

Diagram labels: Business Reports, Data Virtualization, Phase 2, Unstructured Data

Why PowerCenter Big Data Edition
  • Repeatability
    • Predictable, repeatable deployments and methodology
  • Reuse of existing assets
    • Apply existing integration logic to load data to/from Hadoop
    • Reuse existing data quality rules to validate Hadoop data
  • Reuse of existing skills
    • Enable ETL developers to leverage the power of Hadoop
  • Governance
    • Enforce and validate data security, data quality and regulatory policies
  • Manageability
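
The idea of reusing existing data quality rules can be illustrated with a small sketch in which one rule set is defined once and applied to rows regardless of where they were read from. Everything here is hypothetical: the rule names, the row fields (`customer_id`, `amount`), and the `validate` helper are invented for illustration and do not reflect the product's rule format or API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Rule = Callable[[dict], bool]

# Hypothetical rule set, defined once and shared across data sources.
RULES: Dict[str, Rule] = {
    "customer_id_present": lambda row: bool(row.get("customer_id")),
    "amount_non_negative": lambda row: isinstance(row.get("amount"), (int, float))
    and row["amount"] >= 0,
}


def validate(rows: Iterable[dict], rules: Dict[str, Rule] = RULES) -> List[Tuple[int, List[str]]]:
    """Return (row index, names of failed rules) for every row that fails a rule."""
    failures = []
    for index, row in enumerate(rows):
        broken = [name for name, rule in rules.items() if not rule(row)]
        if broken:
            failures.append((index, broken))
    return failures


# The same rules applied to rows from two stand-in sources.
warehouse_rows = [{"customer_id": "C1", "amount": 10.0}]
hadoop_rows = [{"customer_id": "", "amount": 5}, {"customer_id": "C2", "amount": -1}]

print(validate(warehouse_rows))
print(validate(hadoop_rows))
```

Keeping the rules in one shared structure is what makes the reuse possible: adding a new source only requires producing rows in the same shape.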