
Large scale near real-time log indexing with Apache Flume and SolrCloud
Hadoop Summit 2013
Ari Flink, Operations Architect, Cisco/WebEx
June 27th, 2013


Agenda

  • Intro

  • Problem to solve?

  • How does Flume/Solr help?

  • Syslog indexing example

  • HA, DR & scalability



Me

Ops Architect at Cisco CCATG (WebEx)

  • Ensure operational readiness for complex distributed services

  • HA, DR, monitoring, config, deployment

    Previously eBay, [email protected], IBM, VISA

  • Operations architecture, monitoring, event correlation


What’s that funny accent?


Cisco CCATG - Cloud Collaboration Application Technology Group

  • Cisco WebEx Meetings

    • Voice, video, desktop sharing

    • Meeting/Event/Support/Training Centers

    • Integration with TelePresence

  • Cisco WebEx Messenger

    • IM, presence

    • Integrate with voice, video

    • XMPP

  • Cisco WebEx Social

    • Social networking

    • Content creation

    • Integrated IM



Cisco WebEx - Leader for SaaS Web Conferencing

Participants from over 231 countries, 52% market share

9.4 million registered hosts worldwide

40.5 million meeting attendees per month

2.2 billion meeting minutes per month

4 million mobile downloads



Cisco WebEx Collaboration Cloud

Global Scale: 13 datacenters & iPoPs around the globe

Dedicated network: dual path 10G circuits between DCs

Multi-tenant: 95k sites

Real-time collaboration: voice, desktop sharing, video, chat

[Diagram: datacenters/PoPs around the globe connected by leased network links]



S#!% happens ..

People make mistakes

Hardware fails

Software fails

Even failovers sometimes fail



Recovery-Oriented Computing Philosophy

“If a problem has no solution, it may not be a problem, but a fact, not to be solved, but to be coped with over time”

— Shimon Peres (“Peres’s Law”)

People/HW/SW failures are facts, not problems

  • Operations' main goal is to maintain high service availability

  • Recovery/repair is how we cope with these facts

  • Improving recovery/repair improves availability

    • Unavailability = MTTR / MTBF (worked example below)

    • Cutting MTTR to 1/10th is just as valuable as a 10x MTBF
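To make that concrete, a small worked example (the numbers are illustrative, not from the talk):

    MTBF = 30 days = 720 h, MTTR = 1 h
    Unavailability ≈ MTTR / MTBF = 1 / 720 ≈ 0.14%   (≈ 99.86% available)
    Cut MTTR to 6 min:  0.1 / 720 ≈ 0.014%           (≈ 99.986% available)
    which is the same gain as a 10x MTBF (300 days) with the original 1 h MTTR.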


How could we make recovery & repair faster?

Good: reactive

Your search – What is the root cause of the outage? – did not match any documents.

Even better: proactive


The goal: NRT full text log search


Cisco WebEx log collection overview

[Architecture diagram: unstructured/semi-structured sources (HTTP/REST, syslog, Thrift, log4j, AMQP, Avro, file) flow into Flume; structured data (RDBMS/MySQL application state & APIs) is imported with Sqoop. Flume fans out to a Solr sink (SolrCloud index), an HDFS sink (raw data), and other sinks. Storage runs on Cisco UCS C240 M3 servers with 12 x 3TB = 36 TB per server.]

Global Flume topology

[Diagram: in each datacenter (DC 1 … DC N), syslog, log4j, and file sources feed the local Flume Collector tier. The collectors forward events to the Flume Storage tier in DC 1 and DC 2, where they land in SolrCloud and HDFS.]


Flume Collector tier

[Diagram: application agents reach a Flume Collector server through failover & load-balancing agents. On the collector, an Avro source feeds a replicating fan-out flow, so all events are replicated to both File Channel 1 and File Channel 2; a DC1 Avro sink and a DC2 Avro sink drain the channels toward the Flume Storage tier in each datacenter.]
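As a concrete illustration, a minimal Flume agent configuration for this collector layout might look like the sketch below; the agent name, port numbers, hostnames, and directory paths are assumptions for illustration, not values from the talk. The replicating channel selector is what makes every event land in both file channels.

    # Collector agent: Avro in, replicate to two file channels, Avro out to DC1/DC2
    collector.sources  = avroSrc
    collector.channels = fileCh1 fileCh2
    collector.sinks    = dc1Sink dc2Sink

    # Avro source receives events from the application-side Flume agents
    collector.sources.avroSrc.type = avro
    collector.sources.avroSrc.bind = 0.0.0.0
    collector.sources.avroSrc.port = 4545
    collector.sources.avroSrc.channels = fileCh1 fileCh2
    # replicating fan-out: every event is written to both channels
    collector.sources.avroSrc.selector.type = replicating

    # durable file channels buffer events on local disk
    collector.channels.fileCh1.type = file
    collector.channels.fileCh1.checkpointDir = /data/flume/ch1/checkpoint
    collector.channels.fileCh1.dataDirs = /data/flume/ch1/data
    collector.channels.fileCh2.type = file
    collector.channels.fileCh2.checkpointDir = /data/flume/ch2/checkpoint
    collector.channels.fileCh2.dataDirs = /data/flume/ch2/data

    # one Avro sink per destination datacenter storage tier
    collector.sinks.dc1Sink.type = avro
    collector.sinks.dc1Sink.channel = fileCh1
    collector.sinks.dc1Sink.hostname = storage01.dc1.example.com
    collector.sinks.dc1Sink.port = 4545
    collector.sinks.dc2Sink.type = avro
    collector.sinks.dc2Sink.channel = fileCh2
    collector.sinks.dc2Sink.hostname = storage01.dc2.example.com
    collector.sinks.dc2Sink.port = 4545

The failover & load-balancing shown on the sending side is normally done in Flume with sink groups (processor.type = failover or load_balance) on the upstream agents rather than on the collector itself.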


Global Flume topology

[Same global topology diagram as before, shown again as the transition from the Collector tier to the Storage tier.]


Flume Storage tier

[Diagram: the Flume Storage tier server receives events from the Flume Collectors (failover & load-balancing agents) through an Avro source. A multiplexing fan-out flow routes events to Solr by a Flume event header and sends all events to HDFS, using File Channel 1 and File Channel 2; a Solr sink feeds SolrCloud and an HDFS sink feeds HDFS.]
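A matching sketch of the storage-tier agent, again with illustrative names and paths; the routing header name and value used here are assumptions. Mapping the selected header value to both channels while defaulting to the HDFS channel is one way to realize "route to Solr by header, all events to HDFS".

    storage.sources  = avroSrc
    storage.channels = solrCh hdfsCh
    storage.sinks    = solrSink hdfsSink

    storage.sources.avroSrc.type = avro
    storage.sources.avroSrc.bind = 0.0.0.0
    storage.sources.avroSrc.port = 4545
    storage.sources.avroSrc.channels = solrCh hdfsCh
    # multiplexing fan-out: route by the value of a Flume event header
    storage.sources.avroSrc.selector.type = multiplexing
    storage.sources.avroSrc.selector.header = category
    storage.sources.avroSrc.selector.mapping.syslog = solrCh hdfsCh
    storage.sources.avroSrc.selector.default = hdfsCh

    storage.channels.solrCh.type = file
    storage.channels.solrCh.checkpointDir = /data/flume/solr/checkpoint
    storage.channels.solrCh.dataDirs = /data/flume/solr/data
    storage.channels.hdfsCh.type = file
    storage.channels.hdfsCh.checkpointDir = /data/flume/hdfs/checkpoint
    storage.channels.hdfsCh.dataDirs = /data/flume/hdfs/data

    # MorphlineSolrSink runs the morphline (next slides) and indexes into SolrCloud
    storage.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
    storage.sinks.solrSink.channel = solrCh
    storage.sinks.solrSink.morphlineFile = /etc/flume/conf/morphlines.conf
    storage.sinks.solrSink.morphlineId = syslogMorphline

    # HDFS sink keeps the raw events, partitioned by event timestamp
    storage.sinks.hdfsSink.type = hdfs
    storage.sinks.hdfsSink.channel = hdfsCh
    storage.sinks.hdfsSink.hdfs.path = hdfs://nameservice1/flume/raw/%Y/%m/%d
    storage.sinks.hdfsSink.hdfs.fileType = DataStream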


Schema or no schema…

  • Isn’t Big Data “schema on read”? Why does Solr require a schema on write?

  • Dirty little secret: there’s always a schema

    • Performance & functionality vs. flexibility

    • Optimizing operations and storage by field type is how you get sub-second response times

  • There’s always a schema – the question is whether it lives in application code or in a central location (see the sketch below)
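To make "there's always a schema" concrete, here is a hypothetical fragment of a schema.xml for the syslog fields that appear later in this deck; the field names follow the example, while the types and attributes are just one plausible choice, not necessarily what ran in production.

    <!-- schema.xml excerpt: one declared field per extracted syslog attribute -->
    <fields>
      <field name="id"             type="string"       indexed="true" stored="true" required="true"/>
      <field name="timestamp"      type="tdate"        indexed="true" stored="true"/>
      <field name="host"           type="string"       indexed="true" stored="true"/>
      <field name="severity_label" type="string"       indexed="true" stored="true"/>
      <field name="access_token"   type="string"       indexed="true" stored="true"/>
      <field name="syslog_message" type="text_general" indexed="true" stored="true"/>
      <field name="cisco_product"  type="string"       indexed="true" stored="true"/>
      <field name="cisco_level"    type="int"          indexed="true" stored="true"/>
      <field name="cisco_id"       type="string"       indexed="true" stored="true"/>
      <field name="cisco_code"     type="string"       indexed="true" stored="true"/>
    </fields>
    <uniqueKey>id</uniqueKey>

String fields give exact-match faceting and sorting, while the free-text part of the message stays in a tokenized text field for full-text search.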


Cloudera Morphlines: streaming ETL

Flume event = headers + body

  • Cloudera Morphlines

    • Framework to simplify event transformation

    • Compatible with existing grok patterns

    • Reusable across multiple index workloads: Flume & M/R

[Diagram: inside the SolrSink, a morphline pipeline of commands (readLine → grok → addValues → tryRules → loadSolr) transforms each record into a document matching schema.xml and sends it to Solr.]


Flume Morphline example

Convert syslog message..

<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234

.. into Solr schema fields

Severity=[3]

Facility=[22]

host=[colo01-wxp00-ace01b-connect.webex.com]

timestamp=[2013-06-16T04:36:49.000Z]

syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]

severity_label=[error]

access_token=[54asdf654]

id=[b2f839c3-dece-404f-a535-e0141ad549bf]

cisco_product=[ACE]

cisco_level=[3]

cisco_id=[251008]

cisco_code=[%ACE-3-251008]


Flume Morphline example

Convert syslog message

<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234

Step 1: readLine reads in the Flume event headers and body

timestamp=[1371357409000]

host=[colo01-wxp00-ace01b-connect.webex.com]

category=[545f5sfsd5sf]

Severity=[3]

Facility=[22]

message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]

Headers

Body


Flume Morphline example

Convert syslog message

<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234

Step 2: convertTimestamp converts the epoch timestamp to ISO 8601 format

timestamp=[2013-06-16T04:36:49.000Z]

host=[colo01-wxp00-ace01b-connect.webex.com]

category=[545f5sfsd5sf]

message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]

Severity=[3]

Facility=[22]


Flume Morphline example

Convert syslog message

<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234

Step 3: addValues creates the new field access_token

timestamp=[2013-06-16T04:36:49.000Z]

category=[545f5sfsd5sf]

access_token=[545f5sfsd5sf]

message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]

host=[colo01-wxp00-ace01b-connect.webex.com]

Severity=[3]

Facility=[22]


Flume Morphline example

Convert syslog message

<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234

Step 4: tryRules creates the field severity_label from the severity

timestamp=[2013-06-16T04:36:49.000Z]

severity_label=[error]

access_token=[545f5sfsd5sf]

message=[<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]

host=[colo01-wxp00-ace01b-connect.webex.com]

category=[545f5sfsd5sf]

Severity=[3]

Facility=[22]


Flume Morphline example

Convert syslog message

<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234

Step 5: tryRules creates new fields

syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]

cisco_product=[ACE]

cisco_level=[3]

cisco_id=[251008]

cisco_code=[%ACE-3-251008]


Flume Morphline example

Convert syslog message

<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234

Step 6: sanitizeUnknownSolrFields drops fields that are not in the Solr schema

timestamp=[2013-06-16T04:36:49.000Z]

syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]

severity_label=[error]

access_token=[545f5sfsd5sf]

host=[colo01-wxp00-ace01b-connect.webex.com]

cisco_product=[ACE]

cisco_level=[3]

cisco_id=[251008]

cisco_code=[%ACE-3-251008]


Flume Morphline example

Convert syslog message

<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234

Step 7: generateUUID creates a unique id for the document

timestamp=[2013-06-16T04:36:49.000Z]

syslog_message=[%ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234]

severity_label=[error]

access_token=[545f5sfsd5sf]

id=[b2f839c3-dece-404f-a535-e0141ad549bf]

host=[colo01-wxp00-ace01b-connect.webex.com]

cisco_product=[ACE]

cisco_level=[3]

cisco_id=[251008]

cisco_code=[%ACE-3-251008]


Flume Morphline example

Convert syslog message

<179>Jun 16 04:36:49 colo01-wxp00-ace01b-connect.webex.com Jun 16 2013 04:36:49 : %ACE-3-251008: Health probe failed for server 10.240.22.111 on port 1234

Step 8: loadSolr loads the record into a Solr server
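Putting the eight steps together, a morphlines.conf driving the SolrSink could look roughly like the sketch below. This is a condensed illustration, not the production file: the ZooKeeper hosts are placeholders, the access_token assignment is an inferred example, and the tryRules bodies (severity labels, Cisco grok patterns) are reduced to a single trivial rule.

    # referenced by sanitizeUnknownSolrFields and loadSolr
    SOLR_LOCATOR : {
      collection : syslog
      zkHost : "zk1:2181,zk2:2181,zk3:2181/solr"
    }

    morphlines : [
      {
        id : syslogMorphline
        importCommands : ["com.cloudera.**", "org.apache.solr.**"]
        commands : [
          # 1. read the event body as one log line
          { readLine { charset : UTF-8 } }

          # 2. epoch millis from the Flume timestamp header -> ISO 8601
          { convertTimestamp {
              field : timestamp
              inputFormats : ["unixTimeInMillis"]
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
              outputTimezone : UTC
          } }

          # 3. create access_token (copied here from the category header as an illustration)
          { addValues { access_token : "@{category}" } }

          # 4./5. the real rules derive severity_label from the numeric severity
          #       and grok the Cisco %ACE-3-251008 code; shown here as one
          #       trivial placeholder rule
          { tryRules {
              rules : [
                { commands : [ { setValues { severity_label : error } } ] }
              ]
          } }

          # 6. drop anything not declared in schema.xml
          { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }

          # 7. unique document id
          { generateUUID { field : id } }

          # 8. hand the record to SolrCloud
          { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
        ]
      }
    ]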


Morphlines recap

Flume syslog event = headers + body

[Diagram: the SolrSink runs the morphline command chain (readLine → grok → addValues → tryRules → loadSolr) on each record, producing a document matching schema.xml that is indexed into SolrCloud.]


SolrCloud

Where can I index data? Add the doc to the syslog index.

  • Collections, shards & replicas

  • Pluggable file system (local, HDFS)

  • Central config & coordination with ZK

  • Full HA, automatic fail-over

  • NRT indexing

  • Automatic routing

[Diagram: a SolrCloud cluster with a collection split into Shard1/Shard2/Shard3, each with a leader and replicas spread across nodes, coordinated by a ZooKeeper ensemble (zk1, zk2, zk3), on a pluggable filesystem (local or HDFS).]


SolrCloud

Collection “syslog” with three shards
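For reference, a collection like this is created through the SolrCloud Collections API; a sketch with an illustrative host name and replication factor:

    # create the "syslog" collection with 3 shards and 2 replicas per shard
    curl 'http://solr01.example.com:8983/solr/admin/collections?action=CREATE&name=syslog&numShards=3&replicationFactor=2'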


Solr index management

A special case of search

  • Logs are time series data: timestamp + data

  • High indexing rate, no updates

  • New data is searched more frequently than old

    Collection aliases

  • Time-partitioned collections – e.g. one collection per day

  • Reduces the workload to near-real-time data only

  • One-to-many collection mapping: queries go to a logical alias mapped to multiple same-schema collections

  • Simplifies hot-warm-cold migration of data

    Index expiration

  • Old data is aged out via collection aliases: remap the alias to only the latest collections (example below)
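A sketch of how the alias mechanics look in practice, with illustrative host, collection names, and dates: in a time-partitioned layout the per-day collections hold the data and the alias takes over the logical name, so queries keep hitting "syslog" while old collections age out of the alias and can eventually be dropped.

    # one collection per day
    curl 'http://solr01.example.com:8983/solr/admin/collections?action=CREATE&name=syslog_20130627&numShards=3&replicationFactor=2'

    # remap the "syslog" alias to the latest collections only (ages out older data)
    curl 'http://solr01.example.com:8983/solr/admin/collections?action=CREATEALIAS&name=syslog&collections=syslog_20130627,syslog_20130626,syslog_20130625'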


Solr & HDFS DR

Solr

  • No multi-datacenter cluster support

    HDFS

  • No multi-datacenter cluster support

    Options?

  • All our services must survive a DC outage

  • …so logging and indexing should too


DR option 1: Flume dual writes

[Diagram: syslog, log4j, and file sources in DC 1 … DC N feed the Flume Collector tier, which dual-writes every event to the storage tiers in both DC 1 and DC 2 (SolrCloud + HDFS in each). During a planned or unplanned DC 1 outage, the collectors' disk channels buffer the DC 1 events; once the DC 1 Hadoop cluster is back online, the aggregate data is replicated to catch it up.]


DR option 2: distcp + M/R

[Diagram: data is sent to a single DC, selected via DNS. On a DC 1 outage, a manual CNAME change points the collector tier at DC 2 while collectors buffer events in their file channels; indexes for the redirected data are created with M/R. When DC 1 is back online, its raw data is synced from DC 2 with distcp, the CNAME is changed back to DC 1, and distcp is flipped the other way.]
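The resync step is a plain distcp of the raw data between the HDFS clusters, followed by a batch index rebuild; a sketch with illustrative NameNode addresses and paths:

    # copy the raw log data that accumulated in DC2 back to DC1
    hadoop distcp hdfs://nn.dc2.example.com:8020/flume/raw hdfs://nn.dc1.example.com:8020/flume/raw

    # then rebuild the Solr indexes for that data with MapReduce
    # (Cloudera Search ships a MapReduce indexer that reuses the same
    #  morphline; its flags depend on the CDH version, so they are omitted here)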



Tiers to scale

  • Flume Collector tier

  • Flume Storage tier

  • SolrCloud


Flume Collector tier throughput

100 – 5000 servers per datacenter sending to the collectors; add collector servers as the number of agents and the data volume grow.

[Diagram: each Flume Collector server has a 100 MB/s NIC, an Avro source, a replicating fan-out flow into File Channel 1 and File Channel 2, and DC1/DC2 Avro sinks. The file channel limits throughput to about 14 MB/s per server, i.e. roughly 1.2 TB/day or 70k events/s.]
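The per-server figures above are consistent with each other; roughly:

    14 MB/s x 86,400 s/day ≈ 1.2 TB/day per collector server
    14 MB/s / 70,000 events/s ≈ 200 bytes per event on average
    14 MB/s is well below the ~100 MB/s NIC, so the file channel
    (disk), not the network, is the per-server bottleneck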


Scalability: Flume Storage tier

[Diagram: collectors in DC 1, DC 2 … DC N each run multiple Avro sinks (Avro sink 1 … N) spread across the storage-tier servers. Each storage-tier Flume server has an Avro source, a multiplexing fan-out flow into two file channels, and a Solr sink plus an HDFS sink; maximum throughput is again about 14 MB/s, 1.2 TB/day, or 70k events/s per server. The DC 1 and DC 2 storage tiers scale out by adding servers.]


SolrCloud scalability

[Diagram: search queries and new logs to index hit a SolrCloud cluster of three shards (leaders and replicas) coordinated by a ZooKeeper ensemble, on a pluggable filesystem (local or HDFS). Rough sizing: ~1000 tx/sec per core, so a 2x8-core server handles ~16k tx/sec, and 3 shards give 3 x 16k = 48k tx/sec.]


Syslog indexing recap

Central syslog servers

  • Network and OS system messages forwarded to several central syslog servers

    Forward syslog to Solr using Flume Morphline SolrSink

  • Parse messages with Morphline and grok patterns

    SolrCloud

  • Index log lines as documents into a Collection (i.e. index)

    HUE Solr search

  • Simple UI to build a customized search page layout with faceting, sorting.

  • Easy drill-down with multiple facets: severity, datacenter, hostname, etc. (example query below)
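Behind the HUE search page these are ordinary Solr queries; for example, a sketch of a faceted error search against the syslog collection (the host name is illustrative):

    # most recent errors first, faceted by host and severity
    curl 'http://solr01.example.com:8983/solr/syslog/select?q=severity_label:error&sort=timestamp+desc&rows=20&facet=true&facet.field=host&facet.field=severity_label&wt=json'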


Screenshots


[Screenshot: HUE Solr search UI with sort by a selected field, search by time, and facets on selected fields]


[Screenshot: wildcard query by field, with the query keywords highlighted]


Summary

  • Data sources: REST/JSON, log4j, syslog, Avro, Thrift

  • Parsing: Cloudera Morphlines

  • NRT Indexing: SolrCloud embedded in CDH

  • Batch indexing: MapReduce

  • Analytics: Use your favorite tool, raw detailed data stored in HDFS


Questions?

  • email: [email protected]

  • twitter: @raaka

