
Flint: Making Sparks (and Sharks and HDFSs too!)

Presentation Transcript


1. Flint: Making Sparks (and Sharks and HDFSs too!)
Jim Donahue | Principal Scientist, Adobe Systems Technology Lab

2. Flint: Bring BDAS to the AWS Masses @ Adobe
• How to effectively evangelize BDAS @ Adobe?
• Looking for intrepid, curious users who want to experiment
• Curiosity is always tempered by the cost of startup
• Most of the data for experimental applications is likely already in AWS

3. Flint: Design Principles
• Shared nothing
  • Get your own AWS account and go
• Simple configuration
  • Write a little JSON, run a couple of scripts
• Efficient, flexible scaling
  • As simple or complex as you want/need
• Full access to tools
  • Batch, Spark/Shark shells, Shark Server, web UIs, …
  • Access to all the Spark/Shark tuning parameters
• Very simple hardwired "spark-env.sh" (see the sketch after this slide)
  • Tuned to the Adobe environment
  • Port choices determined by our firewall
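To make the last point concrete: a hardwired spark-env.sh along these lines pins every port to a firewall-approved value, so users never have to tune anything. The variable names are standard Spark settings; the specific port numbers and memory size below are illustrative, not Adobe's actual choices.

    #!/usr/bin/env bash
    # Hardwired Spark environment: fixed, firewall-approved ports so that
    # no per-user tuning is needed. Values here are illustrative only.
    export SPARK_MASTER_PORT=7077         # master RPC port
    export SPARK_MASTER_WEBUI_PORT=8080   # master web UI
    export SPARK_WORKER_PORT=7078         # worker RPC port
    export SPARK_WORKER_WEBUI_PORT=8081   # worker web UI
    export SPARK_WORKER_MEMORY=6g         # memory available to each worker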

4. Flint: Architecture
• Local Spark/Shark and the slaves can use S3 storage for files
• Remote access runs shells on an SSH server
• Components use S3 and SimpleDB for state management
• Flint distributes shared AWS credentials among the components
• Flint manages master and SSHServer startup
• Slave elasticity is managed by the master and can leverage spot pricing
[Diagram: local Flint server plus the AWS components (Spark master, SSHServer (shells), Spark slave(s), S3, SimpleDB), annotated with the three usage modes: Cluster Setup, Local Spark/Shark, Remote Access]

5. Flint: Setup
• The Flint instance manages encrypted AWS credentials
• Create S3 buckets to hold JAR files
• Create SimpleDB tables to hold state
• Create a key pair and a security group for instances (see the sketch below)
[Diagram: local Flint server talking to S3 and SimpleDB in AWS]
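A minimal sketch of this setup step, assuming the AWS SDK for Java. The bucket, domain, key, and group names are hypothetical, and real Flint additionally encrypts the credentials it distributes.

    import com.amazonaws.auth.BasicAWSCredentials
    import com.amazonaws.services.ec2.AmazonEC2Client
    import com.amazonaws.services.ec2.model.{CreateKeyPairRequest, CreateSecurityGroupRequest}
    import com.amazonaws.services.s3.AmazonS3Client
    import com.amazonaws.services.simpledb.AmazonSimpleDBClient
    import com.amazonaws.services.simpledb.model.CreateDomainRequest

    object FlintSetup {
      def main(args: Array[String]): Unit = {
        // Shared credentials; real Flint stores these encrypted and hands
        // them to the components it launches.
        val creds = new BasicAWSCredentials(sys.env("AWS_ACCESS_KEY"), sys.env("AWS_SECRET_KEY"))
        val s3  = new AmazonS3Client(creds)
        val sdb = new AmazonSimpleDBClient(creds)
        val ec2 = new AmazonEC2Client(creds)

        s3.createBucket("flint-jars")                            // holds services JAR files
        sdb.createDomain(new CreateDomainRequest("flint-state")) // holds cluster state
        ec2.createKeyPair(new CreateKeyPairRequest("flint-key"))
        ec2.createSecurityGroup(
          new CreateSecurityGroupRequest("flint-sg", "Flint cluster instances"))
      }
    }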

6. Flint: Provisioning
• Define clusters through a JSON spec ("master instance configuration is x, slave instance configuration is y, scaling rule is …")
• Define configurations through a JSON spec ("Spark master uses AMI x, running service y, with properties a, b, …") plus a JAR file containing the services code
• A "getting started" set of clusters and configurations is provided
• An AMI is provided with all the requisite Spark / Shark / Hadoop / Kafka bits
(A hypothetical cluster spec is sketched below.)
[Diagram: local Flint server talking to S3 and SimpleDB in AWS]
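For illustration only, a cluster spec in the spirit of the quoted example might look like the JSON below. The field names are hypothetical, not Flint's actual schema; the instance types are era-appropriate EC2 sizes.

    {
      "name": "spark-experiments",
      "master": { "configuration": "spark-master", "instanceType": "m1.large" },
      "slaves": { "configuration": "spark-worker", "instanceType": "m1.xlarge" },
      "scaling": { "minSlaves": 2, "maxSlaves": 10, "useSpotPricing": true }
    }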

7. Flint: Cluster Start
• The local Flint instance launches the "master" instance (using the cluster definition in SimpleDB)
• The master reads SimpleDB and S3 for configuration and code, and installs the master services
• Starting the services launches the Spark and/or HDFS masters through the command line
• The master puts its "connect URL" in SimpleDB (see the sketch below)
[Diagram: local Flint server, Spark master, S3, SimpleDB]
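A sketch of the master-side startup, assuming the AWS SDK for Java and a stock Spark layout under /opt/spark. The domain, item, and attribute names are hypothetical.

    import java.net.InetAddress
    import java.util.Arrays
    import com.amazonaws.auth.BasicAWSCredentials
    import com.amazonaws.services.simpledb.AmazonSimpleDBClient
    import com.amazonaws.services.simpledb.model.{PutAttributesRequest, ReplaceableAttribute}
    import scala.sys.process._

    object MasterStart {
      def main(args: Array[String]): Unit = {
        // Launch the Spark master through the command line, as the slide says.
        "/opt/spark/bin/start-master.sh".!

        // Publish the connect URL so slaves and clients can find the master.
        val url = s"spark://${InetAddress.getLocalHost.getHostName}:7077"
        val sdb = new AmazonSimpleDBClient(
          new BasicAWSCredentials(sys.env("AWS_ACCESS_KEY"), sys.env("AWS_SECRET_KEY")))
        sdb.putAttributes(new PutAttributesRequest(
          "flint-state", "spark-experiments",
          Arrays.asList(new ReplaceableAttribute("connectUrl", url, true))))
      }
    }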

8. Flint: Slave(s) Start
• The master's "scaling service" launches the slave instance(s)
• Each slave reads SimpleDB and S3 for configuration and code, and installs the worker services
• The slave gets the master's "connect URL" from SimpleDB
• The slave launches Spark and/or HDFS workers through the command line (see the sketch below)
[Diagram: local Flint server, Spark master, Spark slave(s), S3, SimpleDB]
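The slave side mirrors the master start: read the connect URL back out of SimpleDB and hand it to the worker launcher. Same assumptions and hypothetical names as the previous sketch.

    import com.amazonaws.auth.BasicAWSCredentials
    import com.amazonaws.services.simpledb.AmazonSimpleDBClient
    import com.amazonaws.services.simpledb.model.GetAttributesRequest
    import scala.collection.JavaConverters._
    import scala.sys.process._

    object SlaveStart {
      def main(args: Array[String]): Unit = {
        val sdb = new AmazonSimpleDBClient(
          new BasicAWSCredentials(sys.env("AWS_ACCESS_KEY"), sys.env("AWS_SECRET_KEY")))

        // Fetch the connect URL the master published at cluster start.
        val attrs = sdb.getAttributes(
          new GetAttributesRequest("flint-state", "spark-experiments")).getAttributes.asScala
        val masterUrl = attrs.find(_.getName == "connectUrl").get.getValue

        // Launch a Spark worker against that master through the command line.
        Seq("/opt/spark/bin/spark-class",
            "org.apache.spark.deploy.worker.Worker", masterUrl).!
      }
    }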

9. Flint: Client Start
• The Flint instance launches a "client" instance (using the cluster definition in SimpleDB)
• The client reads SimpleDB and S3 for configuration and code, and installs the (SSHServer) services
• The client reads SimpleDB for authentication info and the master connect URL
• Service startup starts an SSHServer connected to the right "shell factory" (see the sketch below)
[Diagram: local Flint server, Spark master, SSHServer (shells), Spark slave(s), S3, SimpleDB]
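The slides don't name the SSH library, so the sketch below assumes Apache MINA SSHD, a common choice for embedding an SSH server on the JVM. The port, host-key file, and authentication check are all hypothetical; the point is the "shell factory", which decides what a login session gets.

    import org.apache.sshd.SshServer
    import org.apache.sshd.server.PasswordAuthenticator
    import org.apache.sshd.server.keyprovider.SimpleGeneratorHostKeyProvider
    import org.apache.sshd.server.session.ServerSession
    import org.apache.sshd.server.shell.ProcessShellFactory

    object ClientSshServer {
      def main(args: Array[String]): Unit = {
        val sshd = SshServer.setUpDefaultServer()
        sshd.setPort(2222)  // a firewall-approved port in real Flint
        sshd.setKeyPairProvider(new SimpleGeneratorHostKeyProvider("hostkey.ser"))
        sshd.setPasswordAuthenticator(new PasswordAuthenticator {
          // Hypothetical check; real Flint reads authentication info from SimpleDB.
          def authenticate(user: String, pass: String, s: ServerSession): Boolean =
            user == "flint" && pass == sys.env("FLINT_PASSWORD")
        })
        // The "shell factory": every login gets a Spark shell, already pointed
        // at the master connect URL fetched from SimpleDB at service startup.
        sshd.setShellFactory(new ProcessShellFactory(Array("/opt/spark/bin/spark-shell")))
        sshd.start()
      }
    }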

10. Flint: Client Connect (Remote Shells)
• The Flint server finds an "appropriate client"
• An SSH client is launched to connect (see the example below)
• The SSHServer connects to the master on the client's behalf
[Diagram: local Flint server, Spark master, SSHServer (shells), Spark slave(s), S3, SimpleDB]
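From the user's point of view the whole sequence collapses to a single SSH connection; the host name, port, and key file below are illustrative.

    # Flint locates an appropriate client instance; connecting drops you
    # straight into the shell chosen by that client's shell factory.
    ssh -i flint-key.pem -p 2222 flint@ec2-54-0-0-1.compute-1.amazonaws.com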

11. Flint: Client Asynchronous Requests
• Flint clients can also make asynchronous requests
• Each Flint master runs a service that pulls requests from an SQS queue
• Request progress/results are stored in SimpleDB
• Requests include:
  • Move data between HDFS and S3
  • Mount an EBS volume and cache it in HDFS (AWS public data sets)
  • Run a batch job
• A client can make a request even if the cluster is not alive
  • Simplifies startup sequencing
  • Monitoring of the "cluster queues" can be used to start a cluster "on demand"
(A sketch of both ends of the queue follows below.)
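A sketch of both ends of the request path, again assuming the AWS SDK for Java; the queue name and the JSON message format are hypothetical.

    import com.amazonaws.auth.BasicAWSCredentials
    import com.amazonaws.services.sqs.AmazonSQSClient
    import com.amazonaws.services.sqs.model.{ReceiveMessageRequest, SendMessageRequest}
    import scala.collection.JavaConverters._

    object AsyncRequests {
      val sqs = new AmazonSQSClient(
        new BasicAWSCredentials(sys.env("AWS_ACCESS_KEY"), sys.env("AWS_SECRET_KEY")))
      val queueUrl = sqs.getQueueUrl("flint-spark-experiments").getQueueUrl

      // Client side: enqueue a request; this works even if the cluster is down.
      def submit(): Unit =
        sqs.sendMessage(new SendMessageRequest(queueUrl,
          """{"type": "hdfsToS3", "from": "hdfs:///results", "to": "s3://flint-results/"}"""))

      // Master side: the request service pulls messages off the queue,
      // dispatches them, and records progress/results in SimpleDB.
      def poll(): Unit =
        for (msg <- sqs.receiveMessage(new ReceiveMessageRequest(queueUrl)).getMessages.asScala) {
          // ... run the request, write progress to SimpleDB ...
          sqs.deleteMessage(queueUrl, msg.getReceiptHandle)
        }
    }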

12. Flint: Where We Are Now
• We have some intrepid, curious users
• The big issue is always "Do I really want to use Spark/Shark?"
  • SQL is a big selling point
  • Scala is a mild put-off
  • Spark Streaming may help settle the issue
• Open sourcing is under discussion
• If you're interested, let me know!
