Orchestrating Messaging, Data Grid and Database



    1. Orchestrating Messaging, Data Grid and Database Jon Purdy Oracle Corporation

    2. Notes Companies and Products Oracle acquired Tangosol back in June Coherence is a Data Grid solution Questions are encouraged

    3. Agenda Technology Stack Overview Introduction to Data Grid technology Application State Types of State Challenges Putting it together How state is managed by application tiers How to integrate application tiers How Data Grids can “fill in the gaps”

    4. Technology Stack Overview There are many tools for building scalable, reliable systems Messaging Application Servers Data Grids Databases What types of state do these manage? When should each one be used?

    5. Technologies Messaging Integration between systems (queues) Distributing relevant data (topics) Application Servers Request processing Conversational state Data Grids Scalability and performance Conversational state and/or limited persistent state Databases Persistent state Reliable, shared conversational state (if needed)

    6. Technologies

    7. Data Grids: What are they? Special-purpose data management solution Live, transactional data at in-memory speed “First class” programmatic access Built from the ground-up for in-memory efficiency Avoids CPU overhead of disk management Usually a native “object” view of data Less flexible than a “true” database Query optimization is an “unsolvable” problem Three decades of RDBMS evolution offsets that Less focus on long-term storage large amounts of live, transactional information available at in-memory speed synchronous replication of information; data reliability, continuous availability, provides consistency; single, holistic view automatically and dynamically partitioned search, analyze and process the information in parallel

    8. Data Grids Extend the coherency protocol to client applications Take advantage of the native “object” view of data Keep important data local for efficiency OR/M can sometimes be slower than the actual query Implementations Oracle Coherence GemStone GemFire IBM ObjectGrid

    9. A Brief History

    10. Relational DBMS Relational structure allows any view of data Minimizes impact of data schema mistakes Databases for The People With 4GL tools, led to the Client-Server revolution And even power users: Microsoft Excel and Access The critical ingredient: Query Optimizer DBMS assumes responsibility for optimizing data access

    11. Relational DBMS But… Static optimization (RBO) is not 100% reliable Dynamic optimization (CBO) is not 100% reliable Mistakes magnified with scale and load Scalability and availability problems

    12. Object DBMS Brief appearance in late 80’s / early 90’s Some impressive performance feats Extremely efficient for intended access patterns Data schema coupled to business logic Difficult to evolve data schema Market segment as a whole has died A few stragglers left

    13. The best of all worlds Take the efficiency of an Object DBMS … In-memory data coupled to application access patterns Consistent access patterns at runtime … Add scale-out as a primary objective … … And leverage the RDBMS Existing storage resources and skills Loosely coupled data schemas

    14. How does it work?

    15. Partitioned Cache

    16. Partitioned Cache

    17. Partitioned Cache
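The diagrams for the partitioned cache slides did not survive in this transcript. As a rough illustration of the idea mentioned earlier (data "automatically and dynamically partitioned" with "synchronous replication"), here is a minimal sketch. It assumes a fixed partition count and one primary plus one backup copy per entry; the names `PartitionedCache` and `NUM_PARTITIONS` are hypothetical and this is not how any particular product assigns partitions.

```python
import zlib

NUM_PARTITIONS = 16  # assumed fixed partition count (hypothetical value)


class PartitionedCache:
    """Toy partitioned cache: every key maps to a partition, and every
    partition has one primary node and one backup node."""

    def __init__(self, node_names):
        if len(node_names) < 2:
            raise ValueError("need at least two nodes for primary + backup")
        self.node_names = list(node_names)
        self.nodes = {name: {} for name in self.node_names}

    def _owners(self, key):
        # Stable hash so every member computes the same partition for a key.
        partition = zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS
        n = len(self.node_names)
        primary = self.node_names[partition % n]
        backup = self.node_names[(partition + 1) % n]
        return primary, backup

    def put(self, key, value):
        primary, backup = self._owners(key)
        # Synchronous replication: both copies are written before returning.
        self.nodes[primary][key] = value
        self.nodes[backup][key] = value

    def get(self, key):
        primary, backup = self._owners(key)
        if key in self.nodes[primary]:
            return self.nodes[primary][key]
        # Primary copy is gone: fall back to the backup copy.
        return self.nodes[backup].get(key)

    def fail(self, node):
        # Simulate a node losing its in-memory data.
        self.nodes[node].clear()
```

With three nodes, each entry lives on exactly two of them, so losing any single node leaves every key readable. A real data grid would also rebalance partitions and re-create the lost backups, which this sketch omits.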

    18. Near Cache

    19. Types of State Characteristics

    20. Types of State Messages Request/Response Source: user, message queue or another application tier “Show inventory list” (display web page in browser) Just a message from one system to another Conversational State Stateful Applications Spans multiple requests (a “conversation”) “Add item to shopping cart” (update HTTP session) Internal state Persistent State Typically stored in a database “Place order” (persist order to database) Externally visible The terms messages/requests will be used interchangeably Sun Blueprints: “A business object often needs to maintain state between method invocations. This state can be either conversational or persistent. Conversational state is state maintained in an object during the conversation between a client and the application. Persistent state is state that is stored in a database or other persistent store, outliving the conversation between a client and the application.”

    21. Connecting the dots… Applications process requests, taking into account the context of those requests, to manage persistent data Therefore, effective applications must ensure that: Requests are properly processed Proper context is maintained Persisted data is correct All of this is done in a timely manner

    22. Characteristics: Messages Short-lived Interactive apps: milliseconds to a few seconds Integration: similar, unless one of the systems is down Immutable and single-writer pattern By definition, each request submitted by a single system Almost no way to corrupt state, and easy to avoid losing state “Stateless” applications are very easy to scale Simple request-response processing Requests are often retry-able (idempotent)

    23. Characteristics: Conversational State Longer-lived A few seconds to several minutes Mutable, but by a single user Not quite single-writer Simultaneous requests from a user Multiple portlets in a portal application Multiple clicks at the same time Load-balancing issues: failover/failback/rebalancing Often recoverable Worst case, by restarting the session

    24. Characteristics: Persistent State Long-lived Rarely less than a few days; often many decades Often have regulatory requirements for several years Mutable and globally shared Possible interaction and contention from all users Concurrency and data consistency are hard to combine The entire application shares one persistent state

    25. Summary: Managing State

    26. Types of State Challenges

    27. Challenges Messages Most considerations relate to interactions between systems These interactions are effectively distributed transactions It is critical to manage these “transactions” both reliably and efficiently

    28. Challenges Conversational state Most applications can tolerate modest corruption (or loss) of conversational state (or do anyway) Those that can’t assume this will generally place this state in a reliable data store, or avoid conversational state altogether While technology solutions exist, scaling stateful applications remains a challenge

    29. Challenges Persistent state As the “System of Record”, persistent state is the most valuable asset Databases are the default option for properly managing persistent state However, scaling and performance concerns often move data management out of the database, increasing the difficulty of managing it correctly

    30. Impact of lost/corrupted data Messages User gets a failed request User resubmits request (click again) Impact limited in scope (one user) and time (one request) Conversational State User’s session is corrupted or missing If detected by the system, user may need to log in again and start over If not detected, the user will usually (but not always) notice Impact limited in scope (one user) and time (one session) Persistent State Persistent State is the primary objective! For the user: Payment received but order not shipped For everyone: Inventory levels are incorrect Impact is global for all users and for all time!

    31. Critical Areas of Concern Messages Conversational State Persistent State

    32. Messaging Compare, Contrast, Integrate

    33. Messaging Topics One-to-many: subscribers sign up to topics of interest All subscribers receive messages as they occur Emphasis on fast delivery to many subscribers (performance, scalability) Queues Used primarily for communication between two systems Physical decoupling of sender and receiver Emphasis on reliable message delivery (durability) Implementations TIBCO Rendezvous, IBM MQSeries
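The difference between the two delivery models can be shown with a small in-memory sketch. The `Topic` and `Queue` classes below are hypothetical stand-ins written for illustration, not the API of any messaging product, and they ignore durability entirely.

```python
from collections import deque


class Topic:
    """One-to-many: every subscriber receives every published message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


class Queue:
    """Point-to-point: each message is consumed by exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        # The sender does not need the receiver to be running.
        self._messages.append(message)

    def receive(self):
        # Returns None when the queue is empty.
        return self._messages.popleft() if self._messages else None
```

A topic copies each message to all current subscribers, while a queue hands each message to one receiver in arrival order and holds it until someone asks for it.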

    34. Messages Requests typically flow through multiple systems Message Queue → App Server → Database; Browser → Web Server → App Server → Database Ensure that each request is processed … even if a participating service fails Failure of either client or server can result in “dropped” or “duplicated” requests Most common requirement is “once and only once” but other variants may be acceptable (“at most once”, “at least once”)

    35. Traditional Message Processing Integrating multiple systems may require distributed transactions (XA) Distributed transactions Simple to integrate: minimal effect on application architecture E.g. enlist both the database and the queue Slow (“disk forces”) Tendency to cause lock contention (two-phase locking) Not 100% reliable (“heuristic failures”) Not widely supported (lack of support, compatibility issues) Each phase (prepare/vote, commit) requires a write to disk Each phase also requires database locks

    36. Idempotency Concept If the client knows the server can handle duplicate requests … Then the client can err on the side of re-sending “in doubt” requests A partial failure results in a complete retry No need to use XA to coordinate client and server Impact May have a noticeable impact on application architecture Fast Very reliable

    37. Message Processing with XA JMS begin TX DB begin TX Read message Write to database Prepare JMS Prepare DB Commit JMS Commit DB If the prepare phase fails in either JMS or DB, the DB transaction is rolled back, and the JMS message is left in the queue If the commit phase fails, that is a heuristic failure; the state of the transaction is “unknown”
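The prepare/commit sequence on this slide can be sketched as a toy two-phase commit coordinator. `Participant` and `two_phase_commit` are hypothetical names for illustration; a real XA transaction manager also writes its own log and must deal with the heuristic failures mentioned above, which this sketch does not model.

```python
class Participant:
    """Toy resource manager, for example a JMS queue or a database."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.state = "active"

    def prepare(self):
        # Phase 1 (vote). A real resource would force a log record to disk
        # and hold its locks until the outcome is known.
        if self.fail_on_prepare:
            self.state = "aborted"
            return False
        self.state = "prepared"
        return True

    def commit(self):
        # Phase 2. Assumed to always succeed in this sketch.
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Return True if every participant committed, False if rolled back."""
    # all() stops at the first "no" vote, so later participants are not asked.
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

If either resource votes no during prepare, both are rolled back, which matches the slide: the database write is undone and the message stays in the queue.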

    38. Idempotent Message Processing with Local Transactions JMS begin TX DB begin TX Read message Write to DB (Idempotent) Commit DB Commit JMS If commit to DB fails, the entire operation is aborted; the message is still in the queue If commit to JMS fails, the JMS de-queue is rolled back (but the DB commit isn’t) The next time the message is processed, the write to the DB will occur, but the operation won’t have undesired side effects
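A minimal simulation of this flow, assuming each message carries a unique ID that the database write can be keyed on. `FakeQueue`, `FakeDatabase`, and `process_one` are hypothetical in-memory stand-ins, not JMS or JDBC calls; the `fail_jms_commit` flag simulates a failure between the database commit and the queue commit.

```python
class FakeQueue:
    """In-memory stand-in for a transacted message queue."""

    def __init__(self, messages):
        self.messages = list(messages)

    def peek(self):
        return self.messages[0] if self.messages else None

    def commit_dequeue(self):
        self.messages.pop(0)


class FakeDatabase:
    """In-memory stand-in for the database."""

    def __init__(self):
        self.orders = {}  # message_id -> payload

    def upsert_order(self, message_id, payload):
        # Idempotent write: keyed by message ID, so replaying the same
        # message leaves exactly the same row behind.
        self.orders[message_id] = payload


def process_one(queue, db, fail_jms_commit=False):
    """Process the head message. Return True only if it was dequeued."""
    message = queue.peek()
    if message is None:
        return False
    message_id, payload = message
    db.upsert_order(message_id, payload)  # write and commit to the database
    if fail_jms_commit:
        return False                      # dequeue rolled back; message stays
    queue.commit_dequeue()                # commit the queue last
    return True
```

When the queue commit fails, the message is redelivered and written a second time, but the database ends up in the same state as after a single delivery. That is what removes the need for XA between the two resources.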

    39. Data Grid and Messaging Data Grids can be used as a messaging fabric But this introduces global visibility of a new infrastructure piece Established players have more mature solutions And operations teams know these products Messaging usually used within the Data Grid Not between disparate applications One exception Data Grids can use write-behind queueing to avoid the need for a dedicated message broker Queue the messages in memory, not on disk Slight reduction in durability but reduces operating costs

    40. Application Server Compare, Contrast, Integrate

    41. Application Servers Application “containers” Provide a framework for managing requests and (usually) conversational state May manage lifecycle of application deployment packages Also service directories (JNDI / Jini lookup services) Implementations JavaEE: WebLogic, WebSphere, JBoss, Oracle AS, etc. Compute Grid: Platform Symphony, DataSynapse GridServer Jini: Blitz, GigaSpaces Spring Requests Route incoming requests (e.g. from TCP socket) to application components Conversational State JavaEE: HTTP sessions (conversation between user and web server) Jini: JavaSpaces (conversation between multiple processes)

    42. Conversational State Topologies In-memory (no replication) Fastest, most scalable option Server failure results in data loss Single-server visibility (dependent on sticky load balancer) In-memory (replication) Fast, scalable (implementations vary) Widely available, sufficient for most use cases Most implementations are not fully coherent under load or failure Database persistence Higher complexity and lower performance Achieves data consistency, commonly available Scales with database server (for better or worse) In-memory replication may only scale to the number of users for a single system if session data is replicated to each web server

    43. Conversational State Unreliable conversational state No in-memory replication (data loss) Incoherent in-memory replication (data corruption) Tools Idempotent processing Reliable data store Concept Use application and data store to verify correctness on commit Verify order placement on web page Use optimistic concurrency on database to check values Use idempotent processing to retry request chain Buyer corrects shopping cart and resumes checkout process Or for closed-loop systems, recover missing conversational state by replaying requests or re-loading from database (selectively persisted for performance)
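The "optimistic concurrency on database to check values" step can be sketched with a version number per row. `VersionedStore` and `StaleVersionError` are hypothetical names; in a relational database the same check is commonly written as an `UPDATE ... WHERE version = ?` that affects zero rows when the version has moved on.

```python
class StaleVersionError(Exception):
    """Raised when a write is based on an out-of-date read."""


class VersionedStore:
    """Toy reliable data store with a version number per key."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        # Keys that were never written report version 0 and no value.
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            # Someone else changed the row since the caller read it.
            raise StaleVersionError(
                "%s: expected v%d, found v%d"
                % (key, expected_version, current_version)
            )
        new_version = current_version + 1
        self._rows[key] = (new_version, value)
        return new_version
```

A request built on stale or corrupted conversational state fails the version check at commit time, and the application can then re-read the current values and retry, for example by asking the buyer to confirm the corrected shopping cart.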

    44. Database Compare, Contrast, Integrate

    45. Database The only real solution for persistence? Permanent “System of Record” Guaranteed data consistency Operations Perhaps the most widely deployed technology In-house operations teams already know how to use it Strongest query technology (robust cost-based optimizers) Plenty of support: 3rd party tool vendors, consultants, documentation, discussion forums, etc.

    46. Database Usually the easiest and most reliable solution for managing persistent state But supply … Absolute requirement for data consistency Consistency requirements make scaling difficult (but possible) … may not meet demand Front tiers are inexpensive and easy to scale Scaling on the front causes massive load on the back Offloading can help with managing persistent data Eventually faces diminishing returns from overhead and complexity

    47. Offloading via Caching Keep a local partial data set for faster access Beneficial for read-heavy applications Gained popularity by mitigating the EJB BMP N+1 problem Limited gains for transactions and queries Relatively transparent to application architecture Weak requirements for data consistency With optimistic concurrency, data consistency is delegated to SoR For presentation layer, dirty reads are often acceptable
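"Keep a local partial data set" can be illustrated with a small read-through cache. This is a sketch under stated assumptions: the `loader` callable stands in for a query against the System of Record, and least-recently-used eviction with a fixed `capacity` is my choice for keeping the data set partial, not something the slide specifies.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy read-through cache holding at most `capacity` entries."""

    def __init__(self, loader, capacity=2):
        self._loader = loader          # stand-in for a database lookup
        self._capacity = capacity
        self._entries = OrderedDict()  # ordered from least to most recent
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            # Hit: served locally, no load on the backing store.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Miss: load from the backing store and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value
```

Repeated reads of the same key reach the loader only once, which is where the benefit for read-heavy applications comes from. The sketch never invalidates entries, so reads can be stale, consistent with the slide's point that consistency is delegated to the System of Record.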

    48. Offloading Analytics Run queries against a copy of the System of Record System of Reference Data consistency is important Depends on usage Generally operating against a point-in-time snapshot Data resilience is a Quality of Service consideration Recoverable from the System of Record Failure will affect availability but not results

    49. Offloading Events Changes to the System of Record may need to trigger additional processing Challenges Ensuring all changes of any relevant state are handled in a timely manner Absolute data consistency required for change events and the context of those events (ordering, subscribers, etc) Hard to do all of these Absolute data consistency “Fan-out” of events from transactions Timely delivery of events Transactions and Events both require absolute accuracy. The main challenge with events is that each triggering transaction may “fan out” a larger number of events, multiplying the load, and potentially eliminating the transactional source as a manager of the event stream. Additionally, these events are often very time-sensitive, making it even harder to combine robustness and scalability.

    50. Offloading Transactions The System of Record must manage all transactions related to its “owned” data But a given piece of data may have different owners over even short periods of time Important to identify which system owns each piece of data Usually achieved by “owning” part of the permanent store Data consistency required

    51. Data Grids can help Conversational state Combine the data consistency of a database with the performance of local in-memory data Persistent state Running queries in the data grid can remove the query load on a database Committing transactions in-memory then persisting in batches can reduce the transaction load of a database Abstraction of data sources
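"Committing transactions in-memory then persisting in batches" is the write-behind pattern. The sketch below is a simplified, single-process illustration: `WriteBehindCache` is a hypothetical name, the backing `store` is a plain dict standing in for the database, and flushing is triggered synchronously by a size threshold, whereas a real data grid would typically flush asynchronously after a delay.

```python
class WriteBehindCache:
    """Toy write-behind cache that persists changes in batches."""

    def __init__(self, store, batch_size=3):
        self._store = store            # dict standing in for the database
        self._batch_size = batch_size
        self._cache = {}
        self._dirty = {}               # key -> latest value not yet persisted
        self.flushes = 0

    def put(self, key, value):
        # The write completes in memory; the database is updated later.
        self._cache[key] = value
        self._dirty[key] = value       # repeated writes to a key coalesce
        if len(self._dirty) >= self._batch_size:
            self.flush()

    def get(self, key):
        return self._cache.get(key)

    def flush(self):
        if not self._dirty:
            return
        self._store.update(self._dirty)  # one batched write to the store
        self._dirty.clear()
        self.flushes += 1
```

Two updates to the same key before a flush reach the store as a single write, which is how batching reduces transaction load. The trade-off is that unflushed entries exist only in memory, so durability depends on the grid keeping backup copies until the batch is persisted.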

    52. Data Source Integration - Read Through

    53. Data Source Integration - Write Through

    54. Data Source Integration - Write Behind

    55. Data Grid: Data Source Integration Data Integration occurs in the Data Service Integration uses the domain model The data is both live and shared Events provide bi-directional flow Applications can respond to events

    56. Summary of Data Grid Integration Points Messaging Data Grid can be used for internal application messaging Application Server Scale data availability reliably along with processing power Database Offload transactions and analytics to Data Grid for higher throughput

    57. The Spectrum

    58. Thank You!
