
Building Intelligent Search Applications with Apache Solr and PHP5

Israel Ekpo, Software Architect with Bonnier Corporation and author of the Apache Solr PECL extension. Website: http://www.israelekpo.com Email: iekpo@php.net Twitter: @israelekpo

Presentation Transcript


  1. Building Intelligent Search Applications with Apache Solr and PHP5

  2. About the Presenter • Israel Ekpo • Software Architect with Bonnier Corporation • Author of the Apache Solr PECL extension • Website: http://www.israelekpo.com • Email: iekpo@php.net • Twitter: @israelekpo

  3. Why Search? • Find a needle in a haystack. • Retrieve information quickly. • Retrieve relevant results. • Narrow down result sets. • Sell more products to customers. • Display content of interest to visitors. • Keep users on the web application longer. • Increase the number of returning users.

  4. How to Implement Search • MySQL (full-text search with MyISAM) • Sphinx • Lucene • Apache Solr

  5. Apache Solr: Search Tool of Choice?

  6. Apache Solr Features • HTTP interface for clients (any language) • Powerful standalone full-text search server • REST-like HTTP/XML and JSON APIs • Hit highlighting • Faceted search • Dynamic clustering • Database integration • Open source (free)
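
Because the interface is plain HTTP, a search request is just a GET URL with query parameters. The sketch below is illustrative only: the helper name `solr_select_url` is hypothetical, and the host, port (8983) and core path (`/solr/confoospeakers`) are assumptions borrowed from the setup described later. Adding `wt=json` asks Solr for JSON instead of its default XML.

```php
<?php
// Hypothetical helper: build a URL for Solr's standard /select handler.
// Host, port and core path are assumptions, not fixed values.
function solr_select_url(string $base, array $params): string
{
    // PHP_QUERY_RFC3986 percent-encodes spaces as %20 instead of '+'.
    return rtrim($base, '/') . '/select?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = solr_select_url('http://localhost:8983/solr/confoospeakers', [
    'q'  => 'company:Microsoft',
    'wt' => 'json', // ask for a JSON response instead of the default XML
]);

echo $url, PHP_EOL;
// http://localhost:8983/solr/confoospeakers/select?q=company%3AMicrosoft&wt=json
```

Any language with an HTTP client can issue the same request; the PECL extension covered later simply wraps this for PHP.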

  7. Where do I get Solr? • http://www.apache.org/dyn/closer.cgi/lucene/solr/ • Current version is 1.4.0

  8. How do I Install It or Set It Up? • Solr can run as a standalone search server. • For production, however, it is recommended to deploy it in a servlet container or application server such as Jetty, Tomcat or GlassFish.

  9. Setting Up Tomcat to Work with Solr • Verifying Availability of Java 1.6 Dependencies
      $ dpkg --get-selections | grep sun-java
      sun-java6-bin    install
      sun-java6-jdk    install
      sun-java6-jre    install
      If any of these packages are missing, install them:
      $ sudo aptitude install sun-java6-jdk sun-java6-bin sun-java6-jre

  10. Getting Tomcat 6 • http://tomcat.apache.org/download-60.cgi • Get the URL of the Core binary distribution from your closest mirror, then download and extract it:
      $ wget http://apache.cs.utah.edu/tomcat/tomcat-6/v6.0.20/bin/apache-tomcat-6.0.20.tar.gz
      $ tar -zxvf apache-tomcat-6.0.20.tar.gz

  11. Setting up Tomcat 6 • Move the extracted directory into place:
      $ sudo mv apache-tomcat-6.0.20 /usr/local/tomcat
      • The $JAVA_HOME and $JAVA_OPTS variables are required, so we declare them in the ~/.bashrc file. • $JAVA_HOME will be set to /usr/lib/jvm/java-6-sun • The Solr home will be set to /usr/local/tomcat/solr

  12. Contents of the .bashrc file
      export JAVA_HOME=/usr/lib/jvm/java-6-sun
      export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/usr/local/tomcat/solr -Dsolr.data.dir=/usr/local/tomcat/solr/data -Dsolr.abortOnConfigurationError=true"
      • For a multi-core configuration, remove -Dsolr.data.dir=/usr/local/tomcat/solr/data from the options.

  13. Tomcat Manual Setup Complete • The setup for Tomcat is now complete. • It can be started and stopped with the following commands:
      $ sudo /usr/local/tomcat/bin/startup.sh
      $ sudo /usr/local/tomcat/bin/shutdown.sh

  14. Setting up Tomcat Admin and Users • This is done in the $CATALINA_HOME/conf/tomcat-users.xml file:
      $ sudo vim /usr/local/tomcat/conf/tomcat-users.xml
      <tomcat-users>
        <role rolename="manager"/>
        <role rolename="admin"/>
        <role rolename="webuser"/>
        <user username="admin" password="Ch8ng3me" roles="admin,manager,webuser"/>
        <user username="frontend" password="Ch8ng3me" roles="webuser"/>
      </tomcat-users>

  15. Port Number for the HTTP Connector • Tomcat's HTTP Connector listens on port 8080 by default; here it is changed to 8983, the port used by Solr's example server. • The HTTP Connector should be the first /Server/Service/Connector element. • The second /Server/Service/Connector element is for the AJP Connector.
      $ sudo vim /usr/local/tomcat/conf/server.xml
      <Server ...>
        <Service ...>
          <Connector port="8983" ... />
          ...
        </Service>
      </Server>

  16. Character Encoding for Non-ASCII • Find the /Server/Service/Connector element and add or set its URIEncoding attribute to "UTF-8".
      $ sudo vim /usr/local/tomcat/conf/server.xml
      <Server ...>
        <Service ...>
          <Connector ... URIEncoding="UTF-8"/>
          ...
        </Service>
      </Server>
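
To see why URIEncoding matters, consider what a PHP client actually puts in the URL for a non-ASCII search term. This is a minimal sketch; the term "Müller" is just a hypothetical example. The character travels as percent-encoded UTF-8 bytes, and Tomcat only decodes them back into the right character when the Connector is told the URI is UTF-8 (Tomcat 6 otherwise assumes ISO-8859-1).

```php
<?php
// A non-ASCII term travels in the URL as percent-encoded UTF-8 bytes.
// "Müller" is a hypothetical example term, written with an escape so the
// encoding of this source file does not matter.
$term = "M\u{00FC}ller";

$encoded = rawurlencode($term);
echo $encoded, PHP_EOL;      // M%C3%BCller

// The single character "ü" is two bytes (0xC3 0xBC), so the string is 7 bytes long.
echo strlen($term), PHP_EOL; // 7
```

If Tomcat decoded `%C3%BC` as ISO-8859-1, Solr would receive two unrelated characters and the search would miss.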

  17. Automatic Startups and Shutdowns • Create the file /etc/init.d/tomcat containing the startup and shutdown commands:
      $ sudo vim /etc/init.d/tomcat

  18. Automatic Shutdown and Startups
      #!/bin/sh
      # /etc/init.d/tomcat: start, stop or restart Tomcat
      case "$1" in
        start)
          sh /usr/local/tomcat/bin/startup.sh
          ;;
        stop)
          sh /usr/local/tomcat/bin/shutdown.sh
          ;;
        restart)
          sh /usr/local/tomcat/bin/shutdown.sh
          sh /usr/local/tomcat/bin/startup.sh
          ;;
      esac
      exit 0

  19. Automatic Shutdown and Startups • Make the /etc/init.d/tomcat script executable:
      $ sudo chmod 0755 /etc/init.d/tomcat
      • The final step is to create symbolic links from the runlevel directories to the /etc/init.d/tomcat script (the K99 link stops Tomcat in runlevel 1, the S99 link starts it in runlevel 2):
      $ sudo ln -s /etc/init.d/tomcat /etc/rc1.d/K99tomcat
      $ sudo ln -s /etc/init.d/tomcat /etc/rc2.d/S99tomcat
      • We're done!

  20. Setting up Apache Solr
      1. Download Solr 1.4.0:
         $ wget http://mirror.csclub.uwaterloo.ca/apache/lucene/solr/1.4.0/apache-solr-1.4.0.zip
      2. Unzip the archive:
         $ unzip apache-solr-1.4.0.zip
      3. Copy the solr.war file to the Tomcat webapps directory:
         $ sudo cp -p apache-solr-1.4.0/example/webapps/solr.war /usr/local/tomcat/webapps/solr.war

  21. Setting Up Solr
      4. Set up the Solr home, using the example Solr home (example/solr) as a template:
         $ sudo cp -pr apache-solr-1.4.0/example/solr /usr/local/tomcat/solr
         From here on, $SOLR_HOME is /usr/local/tomcat/solr.
         The default $SOLR_HOME/conf/solrconfig.xml sets the index data directory to ./solr/data, relative to the current working directory. Change this to the absolute path $SOLR_HOME/data (the value after the colon is the default used when the solr.data.dir system property is not set):
         $ sudo vim /usr/local/tomcat/solr/conf/solrconfig.xml
         <dataDir>${solr.data.dir:/usr/local/tomcat/solr/data}</dataDir>

  22. Setting Up Solr
      5. We are almost done. Restart the servlet container:
         $ sudo /etc/init.d/tomcat restart

  23. Setting Up Solr
      6. If you are running Solr in multi-core mode, you also need a solr.xml file in the $SOLR_HOME folder. • solr.xml specifies each core's name, instance directory and data directory; each core has its own schema.xml and solrconfig.xml files. • The simplest way to do this is to copy the original conf folder into each core's instance directory and then edit the files to match your settings.

  24. Running Solr in Multi-Core Mode (solr.xml)
      <?xml version='1.0' encoding='UTF-8'?>
      <solr persistent="true" sharedLib="lib">
        <cores adminPath="/admin/cores" shareSchema="false">
          <core name="confooevents" instanceDir="confooevents">
            <property name="dataDir" value="/usr/local/tomcat/solr/data/confooevents" />
          </core>
          <core name="confoospeakers" instanceDir="confoospeakers">
            <property name="dataDir" value="/usr/local/tomcat/solr/data/confoospeakers" />
          </core>
          <core name="confoosuggest" instanceDir="confoosuggest">
            <property name="dataDir" value="/usr/local/tomcat/solr/data/confoosuggest" />
          </core>
        </cores>
      </solr>
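
The `adminPath="/admin/cores"` attribute exposes Solr's CoreAdmin handler over HTTP, which is handy for checking that every core in solr.xml loaded. The sketch below only builds the request URLs; the helper name `core_admin_url` is hypothetical, and the host and port are assumptions. STATUS and RELOAD are standard CoreAdmin actions.

```php
<?php
// Hypothetical helper: build a CoreAdmin URL from the adminPath in solr.xml.
// The base URL (host, port, /solr context) is an assumption.
function core_admin_url(string $solrBase, string $action, ?string $core = null): string
{
    $params = ['action' => $action];
    if ($core !== null) {
        $params['core'] = $core; // target a single core declared in solr.xml
    }
    return rtrim($solrBase, '/') . '/admin/cores?' . http_build_query($params, '', '&');
}

$base = 'http://localhost:8983/solr';

$status = core_admin_url($base, 'STATUS');
echo $status, PHP_EOL; // http://localhost:8983/solr/admin/cores?action=STATUS

$reload = core_admin_url($base, 'RELOAD', 'confoospeakers');
echo $reload, PHP_EOL; // http://localhost:8983/solr/admin/cores?action=RELOAD&core=confoospeakers
```

Each core is then searched under its own path, for example /solr/confoospeakers/select.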

  25. schema.xml
      <fields>
        <field name="speaker_id" type="tint" indexed="true" stored="true" required="true" />
        <field name="speaker_name" type="string" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
        <field name="company" type="string" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
        <field name="talks" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
        <field name="number_talks" type="tint" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
        <field name="bio" type="text" indexed="true" stored="true" multiValued="false"/>
        <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
        <!-- Copying from display to default search fields. see text above and defaultSearchField below -->
        <copyField source="speaker_name" dest="text" />
        <copyField source="company" dest="text" />
        <copyField source="talks" dest="text" />
        <copyField source="bio" dest="text" />
      </fields>
      <!-- This is the primary key for this index -->
      <uniqueKey>speaker_id</uniqueKey>
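
Solr applies the copyField rules itself at index time, so no client code is needed. Purely as an illustration of their effect, the hypothetical function below mimics the four rules in this schema: every value of speaker_name, company, talks and bio ends up in the multi-valued catch-all field "text". The sample document is assembled from the presenter details in slide 2.

```php
<?php
// Illustration only: Solr performs copyField on the server. This hypothetical
// function mimics the effect, gathering source values into one catch-all field.
function apply_copy_fields(array $doc, array $sources, string $dest): array
{
    $doc[$dest] = [];
    foreach ($sources as $field) {
        if (!isset($doc[$field])) {
            continue; // a field absent from the document contributes nothing
        }
        // Multi-valued fields (such as talks) contribute every value.
        foreach ((array) $doc[$field] as $value) {
            $doc[$dest][] = $value;
        }
    }
    return $doc;
}

$doc = apply_copy_fields(
    [
        'speaker_id'   => 1,
        'speaker_name' => 'Israel Ekpo',
        'company'      => 'Bonnier Corporation',
        'talks'        => ['Building Intelligent Search Applications with Apache Solr and PHP5'],
    ],
    ['speaker_name', 'company', 'talks', 'bio'],
    'text'
);

print_r($doc['text']);
```

Because "text" is indexed but not stored, it is searchable yet never returned in results; the display fields stay the source of what users see.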

  26. solrconfig.xml
      <dataDir>${solr.data.dir:/usr/local/tomcat/solr/confoospeakers/data}</dataDir>

  27. Interacting with Solr Using PHP • Apache Solr PECL Extension

  28. How to Get the Solr PECL Extension • pecl install solr-beta • Or download the source from http://pecl.php.net/package/solr and build it manually: • Extract the tarball • Enter the extension directory • phpize • ./configure • make && make install • Add extension=solr.so to php.ini • Run php -m to confirm that solr is listed
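
The same check can be made from inside a PHP script, which helps when the web server loads a different php.ini than the command line. This is a small sketch; the function name is hypothetical, while `extension_loaded()` and `phpversion()` are standard PHP functions (`phpversion()` with an extension name returns that extension's version, or false).

```php
<?php
// Hypothetical helper: report whether the solr extension is loaded in this PHP runtime.
function solr_extension_status(): string
{
    if (!extension_loaded('solr')) {
        return 'solr extension not loaded: check extension=solr.so in php.ini';
    }
    return 'solr extension loaded, version ' . phpversion('solr');
}

echo solr_extension_status(), PHP_EOL;
```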

  29. Adding Documents to Solr
      $options = array(
          'hostname' => SOLR_SERVER_HOSTNAME,
          'port'     => SOLR_SERVER_PORT,
          'path'     => SOLR_PATH_SPEAKERS,
          'timeout'  => SOLR_SERVER_TIMEOUT,
      );
      /* Creating SolrClient instance */
      $client = new SolrClient($options);
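
The slide assumes the four constants are defined elsewhere. One way to do that is sketched below; all values are hypothetical, chosen to match the Tomcat port (8983) and the confoospeakers core set up earlier, and the exact path format the client expects is an assumption. The client is only constructed when the PECL extension is present, so the file still runs without it.

```php
<?php
// Hypothetical configuration values matching the setup in the earlier slides.
define('SOLR_SERVER_HOSTNAME', 'localhost');
define('SOLR_SERVER_PORT', 8983);
define('SOLR_PATH_SPEAKERS', 'solr/confoospeakers'); // assumed path format
define('SOLR_SERVER_TIMEOUT', 10);                   // seconds

$options = array(
    'hostname' => SOLR_SERVER_HOSTNAME,
    'port'     => SOLR_SERVER_PORT,
    'path'     => SOLR_PATH_SPEAKERS,
    'timeout'  => SOLR_SERVER_TIMEOUT,
);

// Guarded so this file still runs when the PECL extension is not installed.
if (class_exists('SolrClient')) {
    $client = new SolrClient($options);
}
```

Keeping one path constant per core makes it easy to point separate SolrClient instances at confooevents, confoospeakers and confoosuggest.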

  30. Adding Documents to Solr
      /* Creating new input document */
      $doc = new SolrInputDocument();
      $doc->addField('speaker_id', $speaker_id);
      $doc->addField('speaker_name', $speaker_name);
      $doc->addField('company', $company);
      /* Adding document to the index */
      $client->addDocument($doc);
      /* Finalizing changes. Do not forget this step */
      $client->commit();
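
Under the hood, adding a document means posting an update message to Solr. The sketch below builds a message in Solr's XML update format so you can see roughly what travels over the wire; the helper name `solr_add_xml` is hypothetical, and the PECL client builds its own request for you. Values are escaped so characters such as `&` cannot break the XML.

```php
<?php
// Hypothetical helper: build an <add> message in Solr's XML update format.
function solr_add_xml(array $fields): string
{
    $xml = '<add><doc>';
    foreach ($fields as $name => $values) {
        foreach ((array) $values as $value) { // one <field> element per value
            $xml .= sprintf(
                '<field name="%s">%s</field>',
                htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
                htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8')
            );
        }
    }
    return $xml . '</doc></add>';
}

$xml = solr_add_xml([
    'speaker_id'   => 1,
    'speaker_name' => 'Israel Ekpo',
    'company'      => 'Bonnier Corporation',
]);
echo $xml, PHP_EOL;
// <add><doc><field name="speaker_id">1</field><field name="speaker_name">Israel Ekpo</field><field name="company">Bonnier Corporation</field></doc></add>
```

The commit is a separate `<commit/>` message, which is why added documents stay invisible to searches until `commit()` is called.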

  31. Removing Documents from Solr
      /* Creating SolrClient instance */
      $client = new SolrClient($options);
      /* Remove the target document if you know its uniqueKey */
      $client->deleteById($speaker_id);
      /* Finalizing changes. Do not forget this step */
      $client->commit();
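
Solr's XML update format accepts two kinds of delete: by uniqueKey, which is what `deleteById()` does, and by query, which the PECL client exposes as `deleteByQuery()`. The hypothetical helpers below only build the messages, to show the difference.

```php
<?php
// Hypothetical helpers: build <delete> messages in Solr's XML update format.
function solr_delete_by_id_xml(string $id): string
{
    return '<delete><id>' . htmlspecialchars($id, ENT_QUOTES, 'UTF-8') . '</id></delete>';
}

function solr_delete_by_query_xml(string $query): string
{
    // Removes every document matching the query, so use with care.
    return '<delete><query>' . htmlspecialchars($query, ENT_QUOTES, 'UTF-8') . '</query></delete>';
}

$byId = solr_delete_by_id_xml('42');
echo $byId, PHP_EOL;    // <delete><id>42</id></delete>

$byQuery = solr_delete_by_query_xml('company:Microsoft');
echo $byQuery, PHP_EOL; // <delete><query>company:Microsoft</query></delete>
```

As with adds, neither delete takes effect for searchers until a commit.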

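  32. Searching for Data in the Index • SolrQuery syntax: http://wiki.apache.org/solr/SolrQuerySyntax and http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
      q=PHP                    (term in the default search field)
      q=company:Microsoft      (term in a specific field)
      q=number_talks:[1 TO 3]  (inclusive range)
      q="apache solr"~10       (proximity: both words within 10 positions of each other)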
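
Characters such as `:`, `+`, `(` and `"` are part of this syntax, so raw user input placed in `q` can change the meaning of the query or cause a parse error. The PECL extension provides `SolrUtils::escapeQueryChars()` for this; the pure-PHP function below is only a sketch of the same idea, escaping the Lucene special characters with a backslash.

```php
<?php
// Sketch: backslash-escape Lucene query syntax characters in user input.
// Characters escaped: + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
function escape_query_chars(string $input): string
{
    return preg_replace('/([+\-&|!(){}\[\]^"~*?:\\\\\/])/', '\\\\$1', $input);
}

$q = 'company:' . escape_query_chars('AT&T (US)');
echo $q, PHP_EOL; // company:AT\&T \(US\)
```

Only the user-supplied value is escaped; the field name and the colon after it are deliberate syntax.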

  33. Searching for Data
      /* Creating SolrClient instance */
      $client = new SolrClient($options);
      $query = new SolrQuery();
      $query->setQuery($search_string);
      $query_response = $client->query($query);
      $response = $query_response->getResponse();
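
The response has a fixed shape: a `responseHeader`, then a `response` section with `numFound`, `start` and the `docs` array. With the PECL client, `getResponse()` returns a SolrObject with a comparable structure (for example `$response->response->numFound`). To keep this sketch runnable without a server, it decodes a hypothetical JSON response of the kind returned for `wt=json`.

```php
<?php
// Hypothetical sample of a wt=json search response.
$json = '{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 1, "start": 0,
    "docs": [{"speaker_id": 1, "speaker_name": "Israel Ekpo", "company": "Bonnier Corporation"}]
  }
}';

$response = json_decode($json, true);

printf("%d match(es)\n", $response['response']['numFound']);
foreach ($response['response']['docs'] as $doc) {
    // Only stored fields come back; "text" is stored="false" in the schema.
    printf("%s (%s)\n", $doc['speaker_name'], $doc['company']);
}
```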

  34. Highlighting Hits
      /* Creating SolrClient instance */
      $client = new SolrClient($options);
      $query = new SolrQuery();
      $query->setQuery($search_string);
      $query->setHighlight(true);
      $query->setHighlightUsePhraseHighlighter(true);
      $query->setHighlightMaxAnalyzedChars(10000);
      $query->setHighlightFragsize(5000);
      $query->addHighlightField('speaker_name_t');
      $query->addHighlightField('company_t');
      $query->addHighlightField('bio');
      $query->setHighlightSimplePre('<strong>');
      $query->setHighlightSimplePost('</strong>');
      $query_response = $client->query($query);
      $response = $query_response->getResponse();
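
Highlighted snippets do not replace the stored field values in `docs`. They come back in a separate `highlighting` section, keyed first by the document's uniqueKey (speaker_id here) and then by field name. The sketch below merges them back for display, falling back to the stored value when a field has no snippet. The sample response is hypothetical, and the `<strong>` tags correspond to the pre/post markers set above.

```php
<?php
// Hypothetical sample of a highlighted wt=json response.
$response = json_decode('{
  "response": {"docs": [{"speaker_id": 1, "bio": "Author of the Apache Solr PECL extension"}]},
  "highlighting": {"1": {"bio": ["Author of the Apache <strong>Solr</strong> PECL extension"]}}
}', true);

$lines = [];
foreach ($response['response']['docs'] as $doc) {
    $id = (string) $doc['speaker_id']; // highlighting is keyed by uniqueKey
    // Use the snippets when present, otherwise the plain stored value.
    $snippets = $response['highlighting'][$id]['bio'] ?? [$doc['bio']];
    $lines[] = implode(' ... ', $snippets);
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```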

  35. Dynamic Faceting of Results
      /* Creating SolrClient instance */
      $client = new SolrClient($options);
      $query = new SolrQuery();
      $query->setQuery($search_string);
      $query->setFacet(true);
      $query->setFacetMinCount(1);
      $query->addFacetField('company');
      $query->addFacetField('number_talks');
      $query_response = $client->query($query);
      $response = $query_response->getResponse();
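
When facets are read from a raw JSON response, `facet_counts.facet_fields` lists each field's values as a flat array of alternating value and count (`[value, count, value, count, ...]`, the default `json.nl=flat` layout). The hypothetical helper below pairs them up for display; the counts in the sample are made up.

```php
<?php
// Hypothetical helper: turn Solr's flat [value, count, ...] facet list into value => count.
function facet_pairs(array $flat): array
{
    $counts = [];
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $counts[(string) $flat[$i]] = (int) $flat[$i + 1];
    }
    return $counts;
}

// Made-up sample for the "company" facet field.
$companies = facet_pairs(['Bonnier Corporation', 2, 'Microsoft', 1]);
foreach ($companies as $company => $count) {
    echo "$company ($count)", PHP_EOL;
}
```

Because `setFacetMinCount(1)` is set, values with zero matches are already left out by Solr, so every pair can be shown as-is.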

  36. AutoSuggest when Searching
      /* Creating SolrClient instance */
      $client = new SolrClient($options);
      $query = new SolrQuery();
      $auto_complete_type = 'terms';
      if ('terms' == $auto_complete_type) {
          /* TermsComponent: list indexed terms that start with the prefix */
          $query->setTerms(true);
          if (strlen($auto_suggest_string)) {
              $query->setTermsPrefix($auto_suggest_string);
          }
          $query->setTermsField('speaker_name');
      } else {
          /* Alternative: prefix query on the speaker_name field */
          $query->addField('speaker_name');
          $query->setQuery("{!prefix f=speaker_name}$auto_suggest_string");
      }
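
The terms setters on SolrQuery correspond to TermsComponent request parameters: `terms`, `terms.fl` and `terms.prefix` (plus `terms.limit` for the number of suggestions). The sketch below builds the raw request so the mapping is visible; the helper name is hypothetical, and the host, port, core and `/terms` handler path are assumptions about how the server is configured.

```php
<?php
// Hypothetical helper: raw TermsComponent request for autosuggest.
// Host, port, core and the /terms handler path are assumptions.
function terms_suggest_url(string $coreBase, string $field, string $prefix, int $limit = 10): string
{
    $params = [
        'terms'        => 'true',
        'terms.fl'     => $field,  // field whose indexed terms are listed
        'terms.prefix' => $prefix, // only terms starting with what the user typed
        'terms.limit'  => (string) $limit,
        'wt'           => 'json',
    ];
    return rtrim($coreBase, '/') . '/terms?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = terms_suggest_url('http://localhost:8983/solr/confoospeakers', 'speaker_name', 'Isr');
echo $url, PHP_EOL;
// http://localhost:8983/solr/confoospeakers/terms?terms=true&terms.fl=speaker_name&terms.prefix=Isr&terms.limit=10&wt=json
```

Since speaker_name is a `string` field, its terms are stored exactly as indexed, so the prefix match is case-sensitive.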

  37. Downloads and Feedback • http://joind.in/1398 • http://www.israelekpo.com/works
