Introduction to elasticsearch on Microsoft Azure, for the Microsoft Azure Meetup Group Chris Morley (@depahelix) Microsoft NERD Center, Cambridge, MA February 20, 2014
Agenda – 10 minute lightning sections • elasticsearch general introduction • create an elasticsearch node in azure • add some data and search it • create a web front end for it • create a Windows 8 front end for it • scale out with Azure plugin for elasticsearch • If there is time – look at other plugins, plus a bit of Q & A
Goals • show you how to get set up to use elasticsearch at a very basic level • give a general, high-level overview of practical plumbing • get you rolling so you can start to evaluate ES, for real, for whatever you might be thinking of using it for • explain why the new plugin for Azure is a good start, but needs work • convince you that elasticsearch is generally pretty cool • give you a “sense” of what’s going on
Not here to… • “sell you” on using elasticsearch • convince you that you must use elasticsearch, or else • it’s really like any other technology – there are alternatives out there • demonstrate the true power and flexibility and capability of elasticsearch (not enough time to go through all that) • Not a “big data” demo, by any stretch. • drill down into minute details or get bogged down on specifics • I am just a user, not a sales agent
What is elasticsearch? • In short, it can be thought of as “search engine software” • It gives you a realistic way to run your own search engine service (like a Bing or a Google) but with, say, private, sensitive, or confidential data/documents that you don’t want on the public web • great extra capability for your company, enterprise, app, startup, client • elasticsearch is an open-source, distributed server application that runs on top of Lucene; it is written in Java and exposes a REST API • Apache Lucene is arguably the best open-source search engine library, probably one of the best search engines available, and it holds its own even against the most expensive commercial alternatives • very fast search
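To give a feel for what “sports a REST API” means in practice, here is a minimal sketch in Python that only builds the URL for a query-string search; it does not contact a server. The host `localhost:9200`, the index name `docs`, and the field `title` are assumptions made up for illustration, not something from the talk.

```python
from urllib.parse import urlencode

def uri_search_url(base_url, index, query):
    """Build the URL for a URI search (the ?q= form) against one index."""
    # urlencode percent-escapes the query so it is safe inside a URL
    return "{}/{}/_search?{}".format(base_url.rstrip("/"), index, urlencode({"q": query}))

# Hypothetical host, index, and field names
url = uri_search_url("http://localhost:9200", "docs", "title:report")
print(url)
```

Pasting a URL like this into a browser or passing it to curl is enough to run a search, which is a large part of why elasticsearch is easy to poke at.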
Where did elasticsearch come from? • Originally there was a search application project called Compass, which was primarily worked on by @kimchy • Compass also relied on Lucene, but was not distributed • kimchy decided to write elasticsearch to be distributed from the get-go, so you could say it was built with the cloud in mind • Add more servers and they play together nicely: they know how to split up the workload among themselves (search queries can be resource intensive and expensive in terms of memory/disk requirements)
Why do I know so much about elasticsearch? (didn’t it just come out?) • I help support an implementation for work • We bought a company which was an early adopter/beta site; its implementation was set up a while ago, with help from elasticsearch people • We built a new implementation somewhat based on that earlier implementation • I maintain and add on to that implementation • I attended the elasticsearch 2-day training in NYC this past September • (which I highly recommend) • I worked on a Solr project for about 9 months a couple of years ago
elasticsearch is an advanced distributed app • It has some very cool properties and abilities when it comes to operations that involve lots of nodes • It scales extremely gracefully • It has its own optimized binary protocol and makes its own “internal network” • …as long as you know what you are doing when it comes to configuration • It is open source
What elasticsearch is Not (1 of 3) • It is NOT safe as a primary persistent data store • Meaning: you should not trust it as a “system of record” • Always be prepared to reload from scratch, in case of data corruption • “Don't let yourself get attached to anything you are not willing to walk out on in 30 seconds flat if you feel the heat around the corner.” - Neil McCauley, Heat • Although Neil’s to-a-fault discipline doesn’t apply to everything in life, elasticsearch is one of those things that works well if you apply that philosophy: always be ready to drop and reload your data if something goes horribly wrong in the future
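One way to stay ready to “drop and reload” is to keep a script that rebuilds the index from the real system of record. The sketch below, in Python, only formats records into the newline-delimited body that the `_bulk` endpoint expects; the index name `people`, the type `person`, and the record fields are hypothetical, and sending the payload to a server is left out.

```python
import json

def build_bulk_payload(index, doc_type, records):
    """Format records as a _bulk body: one action line, then one source line, per record."""
    lines = []
    for rec in records:
        action = {"index": {"_index": index, "_type": doc_type, "_id": rec["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(rec))
    # The bulk body must end with a trailing newline
    return "\n".join(lines) + "\n"

# Hypothetical rows, standing in for a query against the system of record
rows = [{"id": 1, "name": "James Taylor"}, {"id": 2, "name": "Taylor James"}]
payload = build_bulk_payload("people", "person", rows)
print(payload)
```

Because the payload is regenerated from the source data each time, losing the index costs only the time it takes to run the reload.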
What elasticsearch is Not (2 of 3) • It is NOT secure (at this time) • Even though it is a nice wrapper around Lucene, it still needs to be wrapped and hardened itself against direct traffic from the Internet, in basic ways, usually with a proxy • Security has not been a focus, and that has been a design decision • You have been warned!
What elasticsearch is Not (3 of 3) • It’s not extremely well documented • There is a lot of documentation, but the sentences are sometimes difficult to parse due to grammatical errors, etc. • Plus there is a lot of jargon once you start talking about analyzers, etc. You have to do a lot of research to make use of what documentation there is. • If you want to really learn a lot, go to a 2-day seminar (it’s $1,800)
Why do you need elasticsearch if you have Solr working already? • elasticsearch is very much an alternative to Solr • It is distributed from the get-go. ES is distributed at its core. (“shard”) • SolrCloud gets Solr to act more like elasticsearch • Solr is more XML-based, but can serve JSON too • elasticsearch is more JSON-based, with configuration in simple .yml • Short answer: there may be no compelling reason to do an expensive migration off of Solr to elasticsearch, but if you are starting a brand new project, consider elasticsearch. It’s cooler and it does more things.
Each machine is a node in the cluster • You’ve heard this terminology before if you have used Hadoop, ZooKeeper, or any number of other distributed systems • Nodes can have “types” (master, data, client, and tribe) • Data nodes need disk and memory • Client nodes need memory • Master nodes need stability and to not be “stressed out” or “upset” • The Tribe node (if you create one) is sort of a “MasterMaster” node
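The node “types” above mostly come down to two boolean settings in elasticsearch.yml, `node.master` and `node.data`. As a rough sketch, the Python below renders those settings for the three basic roles; the cluster name `demo` and the helper names are made up for illustration, and the tribe node is left out because it is configured differently.

```python
# Role -> settings, using the node.master / node.data flags
ROLE_SETTINGS = {
    "master": {"node.master": True, "node.data": False},   # coordinates the cluster, holds no data
    "data": {"node.master": False, "node.data": True},     # holds shards, never elected master
    "client": {"node.master": False, "node.data": False},  # only routes requests
}

def render_yml(role, cluster_name):
    """Render the elasticsearch.yml lines for one node role."""
    settings = {"cluster.name": cluster_name}
    settings.update(ROLE_SETTINGS[role])
    lines = []
    for key in sorted(settings):
        value = settings[key]
        if isinstance(value, bool):
            value = str(value).lower()  # YAML booleans are lowercase
        lines.append("{}: {}".format(key, value))
    return "\n".join(lines)

print(render_yml("client", "demo"))
```

A single-node cluster simply leaves both flags at their default of true, so the one node plays every role at once.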
The simplest cluster: one node • It’s the master • It’s the data node • It’s the client node • There is no tribe node • Let’s set one of these up in Azure…
Let’s make a Linux box in Azure • Log in to the Windows Azure management console • manage.windowsazure.com • If you don’t already have a subscription, Google for “try Azure 90 days” • Go to Virtual Machines and click on New • From Gallery > CentOS > OpenLogic > A1 > Small > East, w/password • Open up endpoint security on port 9200 (elastic/9200/9200) • SSH to the machine using cygwin (or PuTTY, or whatever you like best)
Components to install • elasticsearch 0.90.x , currently 0.90.10 – available from www.elasticsearch.org • There is an elasticsearch 1.0 version and you are welcome to try that instead if you prefer
Components to Install (only getting the underlined ones for this demonstration) • wget, curl • elasticsearch plugins: • head • azure plugin • bigdesk • paramedic • river-jdbc • elasticsearch-service wrapper • Many more to check out • other things to get, as you need: MySQL connector, etc.
Install and Run Elasticsearch • http://manage.windowsazure.com/ (+switch subscriptions) • create an elasticsearch node in azure
Before I began • Created an extrasmall VM • Installed Node.js • Installed the cross-platform CLI tool for Azure
elasticsearch starts up • the node gets a random name from a list • it is started in the foreground right now for our simple demo purposes, but normally you would want to install it as a service • go to • http://azure-elasticsearch-cluster?.cloudapp.net:9200/ • make sure there is a response.
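The “make sure there is a response” step can also be scripted. This is a minimal Python sketch under a few assumptions: `fetch_root` is a hypothetical helper that you would point at your own node’s URL, and `SAMPLE` is an illustrative response written by hand (the node name in it is invented), so the example runs without a live server.

```python
import json
from urllib.request import urlopen

def fetch_root(base_url, timeout=5):
    """GET the node's root URL and return the raw JSON text (needs a running node)."""
    with urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def summarize_root(json_text):
    """Pull the node name and version number out of the root response."""
    info = json.loads(json_text)
    return info.get("name"), info.get("version", {}).get("number")

# Hand-written sample in the general shape of the root response; not real output
SAMPLE = '{"status": 200, "name": "Example Node", "version": {"number": "0.90.10"}, "tagline": "You Know, for Search"}'
print(summarize_root(SAMPLE))
```

Against a live node you would call `summarize_root(fetch_root(url))` with the URL of your own cloud service on port 9200.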
Add some data and search it (experiment) • Let’s load some data in and do a search on that data. • Run Experiment • Show scripts, show output • Show output files JamesTaylor.txt vs. TaylorJames.txt
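The experiment scripts themselves are not reproduced in these slides, so the following Python is only a sketch of the general shape of “load data, then search”: it builds the request for indexing one document and two query bodies. The index `people`, type `person`, and field `name` are hypothetical, and contrasting a `match` query with a `match_phrase` query is an assumption about the kind of word-order comparison the two output files hint at.

```python
import json

def index_request(base_url, index, doc_type, doc_id, doc):
    """Return (method, url, body) for indexing one document by id."""
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), index, doc_type, doc_id)
    return "PUT", url, json.dumps(doc)

def name_query(name, phrase=False):
    """Build a search body: 'match' ignores word order, 'match_phrase' requires it."""
    kind = "match_phrase" if phrase else "match"
    return {"query": {kind: {"name": name}}}

# Hypothetical document and host
method, url, body = index_request("http://localhost:9200", "people", "person", 1, {"name": "James Taylor"})
print(method, url, body)
print(json.dumps(name_query("Taylor James")))               # would match "James Taylor" too
print(json.dumps(name_query("Taylor James", phrase=True)))  # would only match that exact order
```

Either query body would be sent as the body of a request to the index’s `_search` endpoint.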
Front ends • create an Azure Websites web front end for it (1 html file, 1 js file) • create a Windows 8 front end for it
Make a simple Windows 8 app and hook it up • Run VS2013 as Admin • File > New > Project > Windows Store • “Blank App”, App1, c:\users\chris\documents\visual studio 2013\Projects • Project > Manage NuGet Packages, add JSON.NET, MicroMVVM • Add textbox, and a button. Add a stackpanel. • Double click button. • (Open Desktop/presentation/App1)
Demo Azure Plugin • ./createSome.sh • Configure Endpoints manually: 9200 load balanced. • Still to do: • automate endpoints being opened • attach listeners/pollers to process that can spin up new nodes • keep the running list of nodes somewhere (like an Azure Table, maybe) • differentiate the nodes (master, client, data, tribe) • use chef or puppet?
Links and stuff http://www.depahelix.com/elasticsearch Includes links and stuff for this talk. Check the space in case I fix errata or give more talks. • Chris Morley • firstname.lastname@example.org • Twitter: @depahelix