AWS Summit Berlin - ANT303 - zero-ETL for DynamoDB with OpenSearch service

This content was presented in May 2024 at AWS Summit Berlin.

Aman200

Presentation Transcript


  1. Welcome slide. Berlin | 15 + 16 May 2024. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.

  2. ANT303: Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service. Speakers: Muslim Abu-Taha (he/him), Senior Solutions Architect, OpenSearch, Amazon Web Services; Aman Dhingra (he/him), Senior Solutions Architect, DynamoDB, Amazon Web Services.

  3. Amazon DynamoDB: fast, fully managed and serverless key-value database. Performance at scale: consistent latency at any scale; unlimited throughput; unlimited storage. Secure and reliable: data encryption; highly available (99.999% SLA); active-active global replication; highly durable storage; continuous backups. Serverless: nothing to provision*; automated scaling with no availability impact; pay for what you use; no maintenance windows or version updates. Built-in integration with AWS services: AWS Lambda, AWS Identity and Access Management (IAM), Amazon CloudWatch, Amazon Kinesis Data Streams, Amazon S3, Amazon Cognito, AWS Backup, AWS CloudTrail, AWS Step Functions, and more. *Using on-demand capacity.

  4. Example data: Product Questions and Answers

  5. Each record contains: the Amazon Standard Identification Number (ASIN) of the product; a unique question_id; the question text; the answer(s); product details; and simulated metadata about the answerer (a rating, name, age, gender, and location). Source: https://registry.opendata.aws/amazon-pqa/

  6. DynamoDB access patterns: 1. Retrieve all questions and answers for an ASIN. 2. Add a question to an ASIN. 3. Add an answer to a question.
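The three access patterns above can be sketched as low-level DynamoDB request parameters. The table layout slide is not captured in this transcript, so the table name, the `PK`/`SK` attribute names, and the `ASIN#`/`Q#`/`A#` key prefixes below are all assumptions made for illustration, not the speakers' design.

```python
# Hypothetical single-table key design (the real layout is not in the transcript):
#   PK = "ASIN#<asin>"
#   SK = "Q#<question_id>"  or  "Q#<question_id>#A#<answer_id>"
TABLE = "product-qa"  # assumed table name


def query_all_for_asin(asin: str) -> dict:
    """Pattern 1: Query parameters returning every item stored under one ASIN."""
    return {
        "TableName": TABLE,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"ASIN#{asin}"}},
    }


def put_question(asin: str, question_id: str, text: str) -> dict:
    """Pattern 2: PutItem parameters that add a question to an ASIN."""
    return {
        "TableName": TABLE,
        "Item": {
            "PK": {"S": f"ASIN#{asin}"},
            "SK": {"S": f"Q#{question_id}"},
            "question_text": {"S": text},
        },
    }


def put_answer(asin: str, question_id: str, answer_id: str, text: str) -> dict:
    """Pattern 3: PutItem parameters that add an answer under a question."""
    return {
        "TableName": TABLE,
        "Item": {
            "PK": {"S": f"ASIN#{asin}"},
            "SK": {"S": f"Q#{question_id}#A#{answer_id}"},
            "answer_text": {"S": text},
        },
    }
```

With boto3 installed, these dictionaries could be passed as keyword arguments, for example `client.query(**query_all_for_asin("B00EXAMPLE"))`.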

  7. DynamoDB table layout

  8. Additional access patterns: 1. Find questions or answers containing certain words or phrases. 2. Sort the results using a relevance algorithm. 3. Generate a histogram chart of search results. 4. Identify the most frequent answerer for certain topics. 5. Limit results to answers from certain age ranges or genders. 6. Limit results to those from a geographic area. 7. Match using semantic meaning, not just words and phrases.

  9. Amazon OpenSearch Service: intelligent search and log analytics solution to help you get the most out of your data. Search: improve the relevancy of your search results in near real time with cutting-edge search innovations. Analytics: securely, easily, and efficiently visualize and analyze your operational data. Vector database: ingest, store, and query vectors; hybrid search and efficient filtering. Cost effective: optimize time and resources for strategic work. Deployment options: serverless simplicity or managed control.

  10. Near-real-time data replication pipeline (diagram, AWS Cloud: Application, Amazon DynamoDB, Stream, AWS Lambda, Amazon Simple Storage Service (Amazon S3), Amazon OpenSearch Service). 1. Export a snapshot to S3 with ExportTableToPointInTime. 2. Ingest the snapshot from S3 into OpenSearch Service. 3. The application continues to send updates to DynamoDB while 1 and 2 are happening. 4. Updates to DynamoDB appear on the stream. 5. Process the stream with AWS Lambda and push into OpenSearch.

  11. Near-real-time data replication pipeline (same diagram and five steps as slide 10), now annotated with the callout "Undifferentiated Work!"
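To make the "undifferentiated work" of step 5 concrete, here is a minimal sketch of the translation a hand-written Lambda function would have to perform: turning DynamoDB Streams records into an OpenSearch bulk request body. It assumes the stream is configured to include new images, handles only a few attribute types, and builds the document `_id` by joining the key values in alphabetical attribute order with `|`; those choices, and the function names, are illustrative assumptions rather than the speakers' code.

```python
import json


def unmarshal(attr: dict):
    """Convert one DynamoDB-typed attribute value into a plain Python value.

    Only S, N, BOOL, L and M are handled in this sketch.
    """
    (type_tag, value), = attr.items()
    if type_tag == "S" or type_tag == "BOOL":
        return value
    if type_tag == "N":
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "L":
        return [unmarshal(v) for v in value]
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in value.items()}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def to_bulk_body(event: dict, index: str) -> str:
    """Build a newline-delimited bulk body from a DynamoDB Streams event."""
    lines = []
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        # Assumed _id convention: key values joined with "|".
        doc_id = "|".join(str(unmarshal(keys[name])) for name in sorted(keys))
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
        else:  # INSERT or MODIFY: re-index the whole new image
            image = record["dynamodb"]["NewImage"]
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps({k: unmarshal(v) for k, v in image.items()}))
    return "".join(line + "\n" for line in lines)
```

A production version would also need retries, batching limits, error handling for partial bulk failures, and the initial snapshot load, which is the kind of plumbing the slide calls out.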

  12. New! Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service. Near real time: bootstraps from an S3 export, then replicates data with Amazon DynamoDB Streams within seconds of changes. Autoscaling: automatically scales to the demands of your application. No code required: routing, mapping, and transforms are defined via configuration. Powerful: built on Amazon OpenSearch Ingestion.

  13. OpenSearch Ingestion is powered by [logo in original slide]. Serverless: easy to administer with console or APIs; automatic application of patches and security upgrades. Secure: redact and obfuscate sensitive information; route data for compliance. Cost efficient: automatically scales to fit workload demands. Open source: part of the Apache 2.0-licensed OpenSearch project.

  14. Data Prepper has a rich set of: Processors (manipulate data values; rename keys, combine data into new entries); Routing (route different data into different indexes); Sub-pipelines (send data through multiple pipelines for tight control); Aggregate (combine, reduce, sample, and aggregate); Parsers (native understanding of CSV and JSON).

  15. Rethinking integration (diagram, AWS Cloud: Application, DynamoDB, managed pipeline on OpenSearch Ingestion, Amazon OpenSearch Service). 1. Set up a pipeline between DynamoDB and OpenSearch Service. 2. The application continues to send updates to DynamoDB, which are synced across by bootstrap and change data capture.

  16. Creating a pipeline: DynamoDB console

  17. Creating a pipeline: configure pipeline

  18. Pipeline fundamentals: source. Source: the DynamoDB table to pull from. Can be 1 or many.

  19. Pipeline fundamentals: processor. Processor: a hook to manipulate the data.

  20. Pipeline fundamentals: sink. Sink: the target OpenSearch index. Can be 1 or many.
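The source, processor, and sink structure can be sketched as a Python dictionary that mirrors the general shape of an OpenSearch Ingestion (Data Prepper) pipeline definition. The slides with the actual configuration are not captured in this transcript, so treat the block as an approximation: the pipeline name, ARNs, bucket, domain endpoint, index name, and the field removed by the processor are placeholders, and the exact option names should be checked against the current OpenSearch Ingestion documentation before use.

```python
import json

# Placeholder values throughout; none of these come from the presentation.
pipeline = {
    "version": "2",
    "dynamodb-pipeline": {
        # Source: one or more DynamoDB tables, with export (bootstrap) and stream (CDC).
        "source": {
            "dynamodb": {
                "acknowledgments": True,
                "tables": [
                    {
                        "table_arn": "arn:aws:dynamodb:eu-central-1:123456789012:table/product-qa",
                        "export": {
                            "s3_bucket": "my-export-bucket",
                            "s3_region": "eu-central-1",
                            "s3_prefix": "export/",
                        },
                        "stream": {"start_position": "LATEST"},
                    }
                ],
                "aws": {
                    "sts_role_arn": "arn:aws:iam::123456789012:role/pipeline-role",
                    "region": "eu-central-1",
                },
            }
        },
        # Processor: optional hooks that manipulate each event, here dropping one field.
        "processor": [{"delete_entries": {"with_keys": ["answerer_name"]}}],
        # Sink: one or more target OpenSearch indexes.
        "sink": [
            {
                "opensearch": {
                    "hosts": ["https://search-example.eu-central-1.es.amazonaws.com"],
                    "index": "product-qa",
                    "document_id": '${getMetadata("primary_key")}',
                    "action": '${getMetadata("opensearch_action")}',
                    "aws": {
                        "sts_role_arn": "arn:aws:iam::123456789012:role/pipeline-role",
                        "region": "eu-central-1",
                    },
                }
            }
        ],
    },
}

# Pipeline definitions are normally written as YAML; JSON output is shown here
# only to keep the sketch dependency-free.
rendered = json.dumps(pipeline, indent=2)
```

The point of the sketch is the shape: the whole replication path from slide 10 collapses into one declarative document with a source block, an optional processor list, and a sink list.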

  21. Amazon OpenSearch Service

  22. To each technology, its purpose. Amazon DynamoDB: use DynamoDB for consistent low latency, durability, and flexibility; most likely your “system of record”. Amazon OpenSearch Service: use OpenSearch Service to provide rich search capabilities and relevant results; index what you want to search, then use the primary key to retrieve the items from DynamoDB.
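The "search in OpenSearch, then fetch from DynamoDB" pattern can be sketched as a function that turns search hits into DynamoDB BatchGetItem request parameters. It assumes each document `_id` has the form `<partition key>|<sort key>` and that the table's key attributes are strings named `PK` and `SK`; both are illustrative assumptions. BatchGetItem accepts at most 100 keys per call, so the keys are chunked.

```python
def batch_get_requests(search_response: dict, table: str, chunk_size: int = 100) -> list:
    """Turn OpenSearch hits into one or more BatchGetItem parameter dicts."""
    keys = []
    for hit in search_response["hits"]["hits"]:
        # Assumed _id convention: "<partition key>|<sort key>".
        pk, sk = hit["_id"].split("|", 1)
        keys.append({"PK": {"S": pk}, "SK": {"S": sk}})
    return [
        {"RequestItems": {table: {"Keys": keys[i:i + chunk_size]}}}
        for i in range(0, len(keys), chunk_size)
    ]
```

With boto3, each returned dictionary could be passed as `client.batch_get_item(**request)`; this keeps DynamoDB as the system of record while OpenSearch only decides which items are relevant.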

  23. Deployment options. Managed domains: provide fine-grained control over instance types for better resource and cost optimization, and larger scale. Serverless: auto-scaled and simple to manage; serverless collections manage OpenSearch indices.

  24. OpenSearch Service is a distributed solution (architecture diagram). Labels: AWS Cloud region; customer domain VPC; Application Load Balancing (ALB); data nodes; UltraWarm nodes; cluster manager nodes; IAM, Amazon Cognito, SAML for Dashboards login; Amazon DynamoDB; Amazon CloudWatch; AWS CloudTrail; Amazon Kinesis Data Firehose; AWS DMS; Amazon MSK.

  25. OpenSearch Service is a distributed database. Send documents to an index. An index is composed of shards. Shards are distributed to data nodes.
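The document-to-shard-to-node relationship can be illustrated with a toy router. OpenSearch itself hashes a routing value (the document `_id` by default) with Murmur3 and uses its own shard allocator; the sketch below swaps in CRC32 and round-robin placement purely to show the idea, and the node names are made up.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a primary shard for a document (CRC32 stands in for Murmur3 here)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def node_for(shard: int, data_nodes: list) -> str:
    """Place shards on data nodes round-robin (a simplification of real allocation)."""
    return data_nodes[shard % len(data_nodes)]


nodes = ["data-node-0", "data-node-1", "data-node-2"]  # hypothetical node names
placement = {
    doc_id: node_for(shard_for(doc_id, num_shards=6), nodes)
    for doc_id in ["ASIN#B1|Q#1", "ASIN#B1|Q#2", "ASIN#B2|Q#1"]
}
```

Because the shard is derived from a hash modulo the shard count, the same document always lands on the same shard, which is also why the number of primary shards is fixed when an index is created.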

  26. Data layout: a DynamoDB table corresponds to an OpenSearch Service index.

  27. Data layout: a DynamoDB table corresponds to an OpenSearch Service index; each item in the table corresponds to a document (doc) in the index.

  28. Data layout: table to index; item to doc; the item's partition and sort keys correspond to the document _id.

  29. Data layout: table to index; item to doc; partition and sort keys to _id; each attribute corresponds to a field (typed, JSON field).

  30. OpenSearch Service data layout

  31. OpenSearch core algorithms

  32. Lucene indices provide the core search functionality (diagram). Document fields (name/value pairs, such as a full name) pass through analysis to produce terms (Term 1 to Term 7) stored in per-field indices. Each term points to a posting list of matching document IDs, for example Term 1: 1, 4, 8, 12, 30, 42, 58, 100...

  33. Query processing (diagram). The query text "Egyptian cotton queen sheet set" goes through analysis to produce the terms egyptian, cotton, queen, sheet, set. The index returns a posting list per term; the posting lists are merged into matches; the matches are scored and sorted into results.
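The posting-list flow described above can be sketched with a tiny inverted index. The analysis step is reduced to lowercasing and splitting on whitespace, and the score is simply the number of matched query terms; OpenSearch's default relevance scoring is BM25, so this is a simplified stand-in, and the sample documents are invented.

```python
from collections import defaultdict


def analyze(text: str) -> list:
    """Toy analysis: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs: dict) -> dict:
    """Map each term to a sorted posting list of document IDs."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(analyze(docs[doc_id]))):
            postings[term].append(doc_id)
    return dict(postings)


def search(index: dict, query: str) -> list:
    """Merge the posting lists of the query terms, then score and sort."""
    scores = defaultdict(int)
    for term in analyze(query):
        for doc_id in index.get(term, []):
            scores[doc_id] += 1  # one point per matched term (not BM25)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    1: "egyptian cotton queen sheet set",
    2: "cotton towel set",
    3: "wool blanket",
}
index = build_index(docs)
results = search(index, "Egyptian cotton queen sheet set")
```

Here `index["cotton"]` is the posting list `[1, 2]`, and `results` ranks document 1 ahead of document 2 because it matches more of the query terms.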

  34. Text analysis: source text compared with the output of the standard analyzer and the English analyzer.
      Name
        Source text: 1100TC Egyptian Cotton Sateen Brown & Yellow Stripe Sheet Set – Queen
        Standard analyzer: 1100tc egyptian cotton sateen brown yellow stripe sheet set queen
        English analyzer: 1100tc egyptian cotton sateen brown yellow stripe sheet set queen
      Description
        Source text: 1100TC Egyptian Sateen Sheet Set Includes: - 1 Flat Sheet: Brown, Yellow & Sky blue stripes on white on front (gray stripe pattern on back) - 1 Fitted Sheet: Gray stripe pattern - 2 Sham Covers: Gray stripe pattern (Available Size: Queen / King)
        Standard analyzer: 1100tc egyptian sateen sheet set includes 1 flat sheet brown yellow sky blue stripes on white on front gray stripe pattern on back 1 fitted sheet gray stripe pattern 2 sham covers gray stripe pattern available size queen king
        English analyzer: 1100tc egyptian sateen sheet set includ 1 flat sheet brown yellow sky blue stripe white front grai stripe pattern back 1 fit sheet grai stripe pattern 2 sham cover grai stripe pattern avail size queen king

  35. Text-text matching example. Query text: "Egyptian cotton queen sheet set"; analyzed: egyptian cotton queen sheet set. Analyzed document fields: Name: 1100tc egyptian cotton sateen brown yellow stripe sheet set queen. Description: 1100tc egyptian sateen sheet set includ 1 flat sheet brown yellow sky blue stripe white front grai stripe pattern back 1 fit sheet grai stripe pattern 2 sham cover grai stripe pattern avail size queen king
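The analysis and matching shown on the slides can be approximated in a few lines. The tokenizer below lowercases the text and keeps runs of ASCII letters and digits, which reproduces the standard-analyzer output shown for the product name; the real standard analyzer uses Unicode-aware segmentation, and the English analyzer additionally removes stop words and stems terms (which is where forms like "includ" and "grai" come from). The function names are illustrative.

```python
import re


def standard_like_tokens(text: str) -> list:
    """Approximate the standard analyzer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_terms(query: str, field_text: str) -> bool:
    """True when every analyzed query term appears among the analyzed field terms."""
    field_terms = set(standard_like_tokens(field_text))
    return all(term in field_terms for term in standard_like_tokens(query))


name = "1100TC Egyptian Cotton Sateen Brown & Yellow Stripe Sheet Set – Queen"
name_tokens = standard_like_tokens(name)
is_match = matches_all_terms("Egyptian cotton queen sheet set", name)
```

Because both the query and the field go through the same analysis, "Egyptian" in the query matches "Egyptian" in the name regardless of case or punctuation.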

  36. Searching scenarios

  37. DynamoDB table layout

  38. OpenSearch mapping defines the schema. OpenSearch has a single mapping per index. The mapping controls the storage and retrieval of the data. Unmapped fields are dynamically mapped. Caution: dynamic mapping detects strings for all. Caution: item-name collision.
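An explicit mapping for the product question-and-answer data might look like the following index-creation body. The field names are hypothetical (the actual mapping slide is not in the transcript); the field types are standard OpenSearch types. Setting `dynamic` to `strict` is one possible way to avoid the dynamic-mapping surprises the slide warns about, at the cost of rejecting documents that contain unmapped fields.

```python
# Hypothetical field names; the types are standard OpenSearch field types.
index_body = {
    "mappings": {
        "dynamic": "strict",  # reject unmapped fields instead of guessing their type
        "properties": {
            "asin": {"type": "keyword"},              # exact match and aggregations
            "question_text": {"type": "text", "analyzer": "english"},
            "answer_text": {"type": "text", "analyzer": "english"},
            "answerer_age": {"type": "integer"},      # range queries
            "answerer_gender": {"type": "keyword"},
            "answerer_location": {"type": "geo_point"},  # geographic filters
        },
    }
}


def field_type(body: dict, field: str) -> str:
    """Look up the declared type of a field in the mapping."""
    return body["mappings"]["properties"][field]["type"]
```

This body would be sent when creating the index; choosing `keyword` versus `text` per field decides whether a value is matched exactly or analyzed for full-text search.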

  39. What kinds of queries can I run? Text: full-text and single term. Numeric: exact match and range queries. Geo: geohash, geo-point, geo-polygon, and bounding box. Vector: exact and approximate nearest-neighbor search supporting vector matching and semantic search. Special: percolate, distance, span, script, sparse, neural. You can sort results by most field types.

  40. Scenario: Text Search

  41. Matching full text: “Egyptian cotton queen sheet set”

  42. Complex queries: text, range, filters
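The query screenshots from these two slides are not captured in the transcript. As a sketch of what such requests could look like, the following builds a full-text `match` query and a `bool` query that combines text matching with range and term filters. The field names (`question_text`, `answerer_age`, `answerer_gender`) are assumptions; the query clauses themselves are standard OpenSearch query DSL.

```python
def full_text_query(text: str) -> dict:
    """Match query: the text is analyzed, and results are ranked by relevance."""
    return {"query": {"match": {"question_text": text}}}


def filtered_query(text: str, min_age: int, max_age: int, gender: str) -> dict:
    """Bool query: scored text match plus non-scoring range and term filters."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"question_text": text}}],
                "filter": [
                    {"range": {"answerer_age": {"gte": min_age, "lte": max_age}}},
                    {"term": {"answerer_gender": gender}},
                ],
            }
        }
    }


simple = full_text_query("Egyptian cotton queen sheet set")
complex_query = filtered_query("Egyptian cotton queen sheet set", 30, 45, "female")
```

Clauses under `filter` narrow the result set without affecting the relevance score, which suits the "limit results to certain age ranges or genders" access pattern.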

  43. Scenario: Analytics

  44. Count questions and answers by ASIN
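Counting documents per ASIN is typically expressed as a `terms` aggregation. The sketch below shows a request body and a helper that reads the buckets from a response; it assumes a field named `asin` mapped as `keyword`, and the aggregation name `by_asin` is arbitrary.

```python
def count_by_asin_request(top_n: int = 10) -> dict:
    """Terms aggregation over an assumed keyword field named 'asin'."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {"by_asin": {"terms": {"field": "asin", "size": top_n}}},
    }


def counts_from_response(response: dict) -> dict:
    """Extract {asin: document count} from the aggregation buckets."""
    buckets = response["aggregations"]["by_asin"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The same aggregation is what a bar chart or histogram in OpenSearch Dashboards would run behind the scenes.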

  45. OpenSearch Dashboards for visualizations

  46. Scenario: LLM-Backed, Semantic Search

  47. Artificial intelligence / machine learning boom. Large language models provide good baseline NLP. ChatBots hit the public’s awareness. “Semantic” capabilities are used in search workloads. Other AI/ML techniques are used in search as well. Amazon OpenSearch Service enables working with vector embeddings.

  48. Working with embeddings. OpenSearch Service provides connectors to third-party (3P) model-hosting systems, e.g. Amazon Bedrock. Indexing: select chunks from the source document, send them to the 3P system for embeddings (diagram: source document, document chunk, OpenSearch neural plugin, Amazon Bedrock, augmented document, k-nearest-neighbor (kNN) index). Search: create an embedding for the query, then find nearest neighbors (diagram: user query, OpenSearch neural plugin, Amazon Bedrock, user query with embedding, kNN index, search results).
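The search half of this flow, "create an embedding for the query, then find nearest neighbors", can be illustrated with an exact k-nearest-neighbor search over toy vectors. In the real setup the vectors would come from an embedding model (for example one hosted on Amazon Bedrock) and the kNN index would usually use an approximate method; the two-dimensional vectors and document IDs below are made up.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec: list, doc_vecs: dict, k: int) -> list:
    """Exact kNN: rank every document by similarity and keep the top k IDs."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


# Toy embeddings standing in for model output.
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
top_two = nearest([1.0, 0.0], doc_vecs, k=2)
```

Documents whose vectors point in nearly the same direction as the query vector rank highest, which is how a query can match text that shares meaning but no exact words.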

  49. OpenSearch Ingestion config: add an entry containing relevant portions of text. Add a vector field to the mapping. OpenSearch Service’s neural plugin does the rest.
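The configuration screenshot for this slide is not in the transcript. As a rough sketch of the OpenSearch side, the dictionaries below show an ingest pipeline with a `text_embedding` processor, an index with a `knn_vector` field, and a `neural` query. The pipeline name, field names, and model ID are placeholders, and the vector dimension is an assumption that has to match whatever embedding model is actually connected.

```python
MODEL_ID = "<model-id>"          # placeholder: ID of a model registered with the neural plugin
TEXT_FIELD = "embedding_text"    # hypothetical entry holding the relevant portions of text
VECTOR_FIELD = "embedding"       # hypothetical vector field

# Ingest pipeline: the neural plugin embeds TEXT_FIELD into VECTOR_FIELD at index time.
ingest_pipeline = {
    "description": "Embed question and answer text",
    "processors": [
        {"text_embedding": {"model_id": MODEL_ID, "field_map": {TEXT_FIELD: VECTOR_FIELD}}}
    ],
}

# Index body: enable kNN, run the pipeline by default, and declare the vector field.
index_body = {
    "settings": {"index": {"knn": True, "default_pipeline": "pqa-embeddings"}},
    "mappings": {
        "properties": {
            TEXT_FIELD: {"type": "text"},
            VECTOR_FIELD: {"type": "knn_vector", "dimension": 1536},  # assumed model output size
        }
    },
}


def neural_query(question: str, k: int = 5) -> dict:
    """Neural query: the plugin embeds the question and runs a kNN search."""
    return {
        "query": {
            "neural": {VECTOR_FIELD: {"query_text": question, "model_id": MODEL_ID, "k": k}}
        }
    }


cat_query = neural_query("What sheets should I use if I have a cat?")
```

With this in place, the application keeps sending plain text and plain-language queries; embedding generation happens inside the search service on both the indexing and the query path.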

  50. What sheets should I use if I have a cat? Top 5 questions are about dogs! Top 5 answers mention fur.
