
Presentation Transcript


  1. Telecommunications Event Data Analytics Accelerator Overview and Functionality: InfoSphere Streams Version 3.0. Roger Rea, IBM InfoSphere Streams Product Manager

  2. Important Disclaimer THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF: • CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR • ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE. The information on the new product is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information on the new product is for informational purposes only and may not be incorporated into any contract. The information on the new product is not a commitment, promise, or legal obligation to deliver any material, code or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion. THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

  3. Accelerator for Telco Event Data Analytics – TEDA Contents • TEDA Overview and Functionality • Installation, Configuration, Troubleshooting • Live Demo

  4. TEDA Overview • Real-time mediation and analytics on large volumes of Call Detail Records (CDR) and Event Detail Records • Solution template based on a proven application design • High processing throughput and scalability, inherent to the Streams platform • Special reliability measures are built into the application to allow recovery after fault conditions (hardware/database failures, etc.) • The application template can be adapted for customer use cases; IBM Services, system integrators, or customer developers perform that task

  5. Mediation & Revenue Assurance Performance at IDEA • Tier 1 operator in India with 98 million subscribers, operating in 22 circles • 1.01 billion CDRs in 2 hours for all circles running Telcordia IN, an average rate of 140K per second • 2 HS22 blade dual-CPU quad-core servers • 8 cores each, 2.5 GHz, 64 GB memory (16 cores total) • Avg. CPU utilization: 75% • Avg. memory utilization: ~6 GB
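As a quick sanity check on the quoted figures: 1.01 billion CDRs in 2 hours is 1.01e9 records / 7200 seconds, roughly 140,000 CDRs per second, which matches the stated average rate of 140K per second.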

  6. Accelerator for Telco Event Data Analytics – TEDA (architecture diagram) • Input: telco network elements deliver Call Detail Records (CDR) as input data files; input data formats: ASN.1, ASCII, binary • Processing pipeline (replicated across regions X, Y, Z): DirectoryScan, Parser/Decoder, Deduplication (Bloom filter), Enrichment/Lookup, Aggregation • Master script: application control, recovery control, monitoring • Lookup data: lookup tables and rules • Output: CDR output files, statistics database, CDR repository database/archiving, dashboards/visualization • Metadata: processing status, Bloom filter dumps, checkpoints, statistics

  7. Main Functional Blocks (diagram) • CDR file reading and parsing • Deduplication (Bloom filter) • Enrichments/lookups • Rules • In-memory aggregations • Repository and statistics (connected via ODBC) • Visualization • All blocks run on InfoSphere Streams

  8. CDR Parsers CDR data formats • Multiple data records per file • Formats are often proprietary (vendor specific) • Data is usually encoded in ASN.1, binary, or CSV format (a toy parsing sketch for the CSV case follows below) • A sample ASN.1 parser for an industry-standard CDR format is built in Customer-specific parsers • The customer or IBM Services can add new parsers; a format specification (e.g., ASN.1 grammar and description) is required • The customer may integrate their own parsers via a C++ or Java interface Parsers used in telco projects • IBM has built parsers for several switches so far (Ericsson, Nokia, Huawei, etc.)
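To give a rough idea of what a parser's job is, here is a toy Python sketch for the CSV case only; the field layout is invented for illustration and is not any vendor's real format or TEDA's parser interface:

def parse_csv_cdr(line):
    # Toy example: split one CSV-encoded CDR into named fields (hypothetical layout)
    fields = line.rstrip("\n").split(",")
    return {
        "record_type": fields[0],      # e.g. voice / SMS
        "calling_number": fields[1],
        "called_number": fields[2],
        "cell_id": fields[3],
        "release_cause": fields[4],
    }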

  9. Rule Representation and Transformation • Rules are transcribed to a standard format • Most inconsistencies in format and rule descriptions are resolved • Intent: CDR processing rules can be written/modified by a non-programmer • Text user interface for rule development and management • Optimization • An optimizing compiler was developed for automatic translation of CDR processing rules into Streams Processing Language (SPL) • Reuses code and processing; minimizes branches and long-running operations • Optimizes records to propagate only the fields required downstream • Optimizes calls to lookup operators to minimize memory/processing requirements

  10. De-duplication with Standard Bloom Filter (diagram: a CDR is hashed into k bit positions a1, a2, ..., ak within an M-bit filter) • The hash bits are checked against an in-memory Streams table • No false negatives (guaranteed to catch all duplicates) • There will be false positives (some non-duplicates are labeled as duplicates) • CDRs labeled as duplicates (true + false positives) are checked against the warehouse to determine whether they are truly duplicates • By varying the number of bits produced by the hash algorithm, an optimal hash size can be determined that balances the memory required for the hash table against database lookups Configuration options • Number of expected entries per day • Rate of acceptable false positives, for example 0.000001 • Memory allocation is calculated on startup, based on these parameters (see the sizing sketch below)
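To illustrate how such a startup calculation can work, here is a minimal Python sketch of the standard Bloom filter sizing formulas; this is an illustration only, not TEDA's actual code, and the function and parameter names are invented:

import math

# Standard Bloom filter sizing:
#   bits   m = -n * ln(p) / (ln 2)^2
#   hashes k = (m / n) * ln 2
def bloom_filter_size(expected_entries, false_positive_rate):
    bits = math.ceil(-expected_entries * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / expected_entries * math.log(2)))
    return bits, hashes

# Example: 100 million expected CDRs per day, acceptable false-positive rate 0.000001
bits, hashes = bloom_filter_size(100_000_000, 0.000001)
print(bits // (8 * 2**20), "MiB for the bit array,", hashes, "hash functions")  # about 342 MiB, 20 hashes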

  11. Aggregators • Statistics per cell: Dropped calls and call types (voice/SMS) per cell, based on CDR record type and release cause (a sketch of this aggregation follows below) • Dropped calls for priority customers: Number of dropped calls based on customer type and release cause • Statistics for international calls (per target country code): Dropped calls and call types (voice/SMS) per country, based on called number, CDR record type, and release cause • Statistics per provider (per target network code): Dropped calls and call types (voice/SMS) per provider, based on called number, CDR record type, and release cause • Call termination codes per cell: Number of termination codes (dropped/good calls, etc.) per cell • Call termination codes per subscriber: Number of termination codes (dropped/good calls, etc.) per subscriber type
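A minimal Python sketch of the first aggregation in this list (per-cell call types and dropped calls); the CDR field names and release-cause values are assumptions made purely for illustration, not TEDA's actual schema:

from collections import Counter

# Hypothetical release causes that count as dropped calls (illustrative values only)
DROPPED_CAUSES = {"RADIO_LINK_FAILURE", "ABNORMAL_RELEASE"}

def per_cell_statistics(cdrs):
    # Count voice/SMS records and dropped calls per cell from decoded CDR dictionaries
    stats = Counter()
    for cdr in cdrs:
        stats[(cdr["cell_id"], cdr["record_type"])] += 1    # voice/SMS counts per cell
        if cdr["release_cause"] in DROPPED_CAUSES:
            stats[(cdr["cell_id"], "dropped")] += 1          # dropped calls per cell
    return stats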

  12. Visualization • A set of sample Cognos Real Time Monitoring (RTM) dashboards is supplied to demonstrate the possibilities • The Streams Visualization feature can be used to show data (one example view/chart is documented in TEDA) • Data for visualization (counters) is provided in database tables • Dashboards and Streams charts can be adapted and added by IBM Services or system integrators to cover customer use cases

  13. Visualization (Cognos RTM Dashboards)

  14. Visualization (Streams Charts)

  15. Dashboards - Example: Terminated Calls

  16. Dashboards - Example: Voice and SMS calls

  17. Customization/Configuration Configuration • Input directory names • Bloom filter history/probabilities • Application infrastructure (regions, parallelization, hosts, etc.) • Reliability parameters (commit count, Bloom filter checkpoints) • Database configuration (names, schemata, user IDs, etc.) Customization • Parser and CDR formats • Transformation rules • Aggregations: define types, DB table layouts, etc. • Dashboards can be adapted via the Cognos RTM GUI

  18. Installation, Configuration, Troubleshooting

  19. TEDA: Product or Blueprint? • What is the product like? • It is installed by an installer • It provides its function right after installation (it is able to process files and write to the DB) • It can be configured • This out-of-the-box function is supported by IBM (e.g., standard defect and usage support); the customer brings up TEDA for the first time in its own environment • But all projects will customize the function of TEDA using their own SPL code, primitive operators, new SPL schemas, etc., so a new project-specific solution is built; this is development work that is normally done by the customer's development department or IBM project teams, and SWG services should be ordered if the customer is not able to do it on its own • IBM does not support the derivative works

  20. TEDA requirements • General preconditions for TEDA as an InfoSphere Streams application • Red Hat release as supported by InfoSphere Streams 3.0 • Environment as needed for InfoSphere Streams 3.0 • InfoSphere Streams 3.0 installed and running for the user starting TEDA • TEDA-specific requirements • unixODBC • DB2 9.7 ESE (local server or client) • Perl DBI / DBD modules

  21. TEDA environment variables (.bashrc)
# Streams env
source ~/InfoSphereStreams/bin/streamsprofile.sh
# DB2 client env
source /home/db2inst1/sqllib/db2profile
export DB2HOME=$DB2DIR
# unixODBC env
export UNIXODBC_HOME=/usr/local/unixODBC
export ODBCSYSINI=$UNIXODBC_HOME/etc
export PATH=$UNIXODBC_HOME/bin:$PATH
export LD_LIBRARY_PATH=$UNIXODBC_HOME/lib:$LD_LIBRARY_PATH

  22. Installation ./teda_linux_x86_64.bin • IBM Installer GUI • Asks you to accept the license conditions • Asks whether a response file should be created from the input of the running installation • Asks for the directory in which to install TEDA • Installation logs: <install_dir>/logs/install

  23. Configure for first run • If the customer installed the DB and host environment exactly as described in TEDA-Quickstart-Guide.pdf or the InfoCenter (every single step!), TEDA is ready to run • If the customer did not, because an own DB is already running, TEDA and the DB have to be configured first • Set up the DB (depends on the customer environment): • Client driver • Create the instance • Create the tables (with the .ddl files delivered with TEDA) • Configure TEDA: • DB names • DB user and credentials • DB schemas • Hostname

  24. Configuration if the standard environment is not used
NODES=
DBNAME=ISS
DBUSER=
DBPWD=
PROD_DBS=STAGING,COGNOS
STAGING_DBNAME=TESTDB2
STAGING_DBUSER=
STAGING_DBPWD=
STAGING_DBSCHEMA=STAGING
COGNOS_DBNAME=ISS
COGNOS_DBUSER=
COGNOS_DBPWD=
COGNOS_DBSCHEMA=COGNOS_DEMO
• No further configuration is needed for the initial run! First bring up TEDA as delivered before customizing.

  25. Building TEDA • Log in on a terminal as a user that has the needed environment • Change to <teda_install_dir>/demo/application • make all • The complete application is built • Hint: If the ROLLSET is changed in the configuration, the startup.pl script builds the whole application during application start with the current configuration settings.

  26. Run TEDA – The startup.pl script • TEDA is completely controlled by the startup.pl script • Simplest possible O&M interface for the customer • O&M staff do not need to know anything about Streams • Checks all DB connections • Creates the instance with the configured hosts • Starts the instance • Builds the application (if ROLLSET changed) • Submits the job • Checks PE health • Controls recovery in case of failures • Cancels the job • Deletes the instance • Moves processed files

  27. Run TEDA – The startup.pl script • Detailed description: TEDA-Operations.pdf or the InfoCenter • ./startup.pl --help • ./startup.pl --manual • Sample usage:
./startup.pl --rollset=1 --retry=0 --verbose=3 --log logfile.log
./startup.pl --shutdown
./startup.pl --info
./startup.pl --stop --force

  28. Troubleshooting – stop escalation • Normal: ./startup.pl --shutdown • 1st level: ./startup.pl --stop • 2nd level: ./startup.pl --stop --force • 3rd level:
./startup.pl --info
kill -9 <pid_startup.pl>
streamtool canceljob <jobid> -i <instanceid>
streamtool stopinstance -i <instanceid>
streamtool rminstance -i <instanceid>
rm ~/.streams/*.lock

  29. Documentation • <install_dir>/documents • TEDA-ASN.1-Parsers.pdf • TEDA-Configuration.pdf • TEDA-Dashboards.pdf • TEDA-Operations.pdf • TEDA-Quickstart-Guide.pdf • TEDA is part of the InfoSphere Streams 3.0 Information Center; the Rule-Compiler documentation is available only there

  30. Rules engine – Lookup example (1)
lookup(?, T, ?) {
    @return TABLE("rules.ddl", T);
    @operator "lookup";
}
...
RULE TR4 {
    VAR X = lookup([MscCallingSubsFirstCi], "CELL_SITE_ID_TO_REGION_ID_MAP", "CELL_SITE_ID")
        => [RegionId] = X.REGION_ID;
}

  31. Rules engine – Lookup example (2) • Use the DB table description (.ddl file) as input for the rule engine. The customer has this .ddl description because the lookup data normally comes from the DB (e.g., CRM, DIM tables, etc.)
--------------------------------------------------
-- create Table COGNOS_DEMO.CELL_SITE_ID_TO_REGION_ID_MAP
--------------------------------------------------
create table COGNOS_DEMO.CELL_SITE_ID_TO_REGION_ID_MAP (
    CELL_SITE_ID integer not null,
    REGION_ID integer
);
• Use a CSV file as the lookup data input source, according to the table description in the DDL file. The CSV filename is the same as the table name. It is normally provided by a DB dump from CRM, DIM tables, etc. (an invented sample appears below) • Lookup data is read only once, at startup of the application!
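For illustration, a matching lookup file (presumably named after the table, e.g. CELL_SITE_ID_TO_REGION_ID_MAP.csv) would contain one CELL_SITE_ID,REGION_ID pair per line; the rows below are invented sample values only:
1001,1
1002,1
2001,2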

  32. Live Demo

  33. THINK
