Tools for text indexing and searching
PeWe 2011

Tools for Text Indexing and Searching

Dušan Zeleník

FIIT STU

zelenik@fiit.stuba.sk


Searching using SQL LIKE

  • CREATE INDEX names_index ON heroes(name)

  • SELECT name FROM heroes WHERE name LIKE 'zelen%'

    • will use names_index, ok

  • SELECT name FROM heroes WHERE name LIKE '%ik'

    • won't use names_index (seriously, don't do that)

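  • CREATE FULLTEXT INDEX names_fullindex ON heroes(name)

  • SELECT name FROM heroes WHERE MATCH(name) AGAINST('%ik')

    • will use names_fullindex

  • SELECT name FROM heroes WHERE MATCH(name) AGAINST('ze%ik')

    • won't use names_fullindex (seriously, don't do that)

  • Note: MATCH ... AGAINST matches whole words; MySQL does not treat % as a wildcard there, and boolean mode supports only a trailing * (for example 'zelen*')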
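Why the prefix pattern can use names_index while the suffix pattern cannot is easier to see in a small sketch. The Ruby below is an illustration only, not how any database implements it: a sorted array stands in for the B-tree index, and the sample names and both helper methods are assumptions made up for this example.

```ruby
# A sorted array stands in for the B-tree index names_index on heroes(name).
# The sample names are hypothetical.
NAMES_INDEX = %w[bielik simko zelenak zelenik zeman].sort.freeze

# LIKE 'zelen%': binary-search to the first entry >= the prefix, then read
# consecutive entries while they still start with it.
def prefix_search(index, prefix)
  start = index.bsearch_index { |name| name >= prefix }
  return [] if start.nil?

  index.drop(start).take_while { |name| name.start_with?(prefix) }
end

# LIKE '%ik': sort order says nothing about how a name ends, so every entry
# has to be checked (the equivalent of a full scan).
def suffix_search(index, suffix)
  index.select { |name| name.end_with?(suffix) }
end
```

Calling prefix_search(NAMES_INDEX, "zelen") touches only the neighbouring entries, whereas suffix_search(NAMES_INDEX, "ik") walks the whole array regardless of its size.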


Search Engines for Text

  • Lucene

    • Lucene Core - Java (library)

      • Ferret …

    • Solr - Java (standalone server)

      • Sunspot …

    • ElasticSearch - Lucene Core

      • Tire …

  • Sphinx – C++

    • Thinking Sphinx


Lucene vs. Sphinx


Sphinx

  • Standalone server (http://sphinxsearch.com/)

  • Thinking Sphinx (Rails Gem – MVC)

    • http://freelancing-god.github.com/

    • works directly with DB and Sphinx server


Thinking Sphinx

class Hero < ActiveRecord::Base
  define_index do
    indexes description, :sortable => true
    indexes sidekick(:name), :as => :sidekick, :sortable => true
    has sidekick, summoned_at, died_at
  end
end

Hero.search "zelenik"

Hero.search :conditions => {:sidekick => "simko"},
            :match_mode => :any, # one of :all, :any, :phrase, :boolean
            :order => :died_at
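The match modes differ only in how the query words must appear in a document. A rough plain-Ruby sketch of :all, :any and :phrase follows; it does not call Sphinx, the matches? helper is hypothetical, and :boolean is left out because it needs a query parser.

```ruby
# Hypothetical helper that mimics three match modes on a single document.
def matches?(document, query, mode)
  doc_words = document.downcase.scan(/\w+/)
  query_words = query.downcase.scan(/\w+/)
  return false if query_words.empty?

  case mode
  when :all    # every query word occurs somewhere in the document
    query_words.all? { |word| doc_words.include?(word) }
  when :any    # at least one query word occurs
    query_words.any? { |word| doc_words.include?(word) }
  when :phrase # the query words occur next to each other, in order
    doc_words.each_cons(query_words.size).any? { |window| window == query_words }
  else
    raise ArgumentError, "unsupported mode: #{mode}"
  end
end
```

With the description "has abnormally gigant muscles", the query "muscles gigant" matches under :all and :any but not under :phrase, because the word order differs.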


Thinking Sphinx

Excerpts

  • heroes = Hero.search "gigant"

  • heroes.first.excerpts.description

    • … has abnormally gigant muscles …

Facets

  • indexes sidekick.name, :as => :sidekick, :facet => true

Geolocation

  • has "RADIANS(latitude)", :as => :latitude, :type => :float

  • has "RADIANS(longitude)", :as => :longitude, :type => :float

  • Place.search "zelenik",
      :geo => [@lat, @lng],
      :with => {"@geodist" => 0.0..10_000.0}
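The geolocation attributes are stored in radians and the @geodist filter is a range in metres. As a rough illustration of what such a filter computes, here is a haversine sketch in plain Ruby. It is not Sphinx's own code; the Earth radius constant and the helper names are assumptions made for this example.

```ruby
# Mean Earth radius in metres (an assumed constant for this sketch).
EARTH_RADIUS_M = 6_371_000.0

# Same conversion that RADIANS(latitude) performs on the database side.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in metres; all four arguments are in radians.
def geodist(lat1, lng1, lat2, lng2)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Mirrors the :with => {"@geodist" => 0.0..10_000.0} range check.
def within?(range, lat1, lng1, lat2, lng2)
  range.cover?(geodist(lat1, lng1, lat2, lng2))
end
```

One degree of longitude along the equator comes out at roughly 111 km, so such a point falls outside the 0.0..10_000.0 range used on the slide.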


Solr

  • Standalone server (http://lucene.apache.org/solr/)

  • Sunspot (Rails Gem)

    • http://outoftime.github.com/sunspot/

    • communicates with DB and Solr server


Sunspot

Hero.search do
  fulltext 'muscles'
  with(:died_at).less_than Time.now
  order_by :summoned_at, :desc
  paginate :page => 2, :per_page => 15
  facet :sidekick
end

class Hero < ActiveRecord::Base
  searchable do
    text :description
    string :sidekick do
      sidekick.name
    end
    time :summoned_at
    time :died_at
  end
end
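A facet on :sidekick boils down to counting matching records per distinct value. The sketch below shows that idea on in-memory data; it does not use Sunspot or Solr, and SampleHero, facet_counts and the sample names other than "simko" are hypothetical.

```ruby
# Hypothetical in-memory record; the real Hero is an ActiveRecord model.
SampleHero = Struct.new(:name, :sidekick)

# Count records per distinct value of the given field, most frequent first.
def facet_counts(records, field)
  counts = Hash.new(0)
  records.each { |record| counts[record.public_send(field)] += 1 }
  counts.sort_by { |value, count| [-count, value] }
end

heroes = [
  SampleHero.new("zelenik", "simko"),
  SampleHero.new("bielik", "simko"),
  SampleHero.new("zeman", "barla")
]
facets = facet_counts(heroes, :sidekick)
```

Each entry in facets pairs a sidekick name with the number of heroes that share it, which is the kind of data a faceted navigation sidebar is built from.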


Sunspot

  • DSL

  • Solr highlighting

  • Class hierarchy

  • Facets

  • Geographical searches

  • WillPaginate support

  • Lucene analyzers (tokenizers, filters …)
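An analyzer is a tokenizer followed by a chain of token filters. The following plain-Ruby sketch shows that shape only; it is not the Lucene API, and the stopword list, lambdas and analyze method are assumptions for illustration.

```ruby
# Assumed stopword list for this sketch.
STOPWORDS = %w[a an the has of].freeze

# Tokenizer: split text into alphanumeric runs.
TOKENIZER = ->(text) { text.scan(/[[:alnum:]]+/) }

# Filters: each one takes a token list and returns a new token list.
LOWERCASE = ->(tokens) { tokens.map(&:downcase) }
STOP_FILTER = ->(tokens) { tokens.reject { |token| STOPWORDS.include?(token) } }

# Run the tokenizer, then pass the tokens through every filter in order.
def analyze(text, tokenizer: TOKENIZER, filters: [LOWERCASE, STOP_FILTER])
  filters.reduce(tokenizer.call(text)) { |tokens, filter| filter.call(tokens) }
end
```

The order of the filters matters: lowercasing runs before the stopword filter here so that "Has" is normalised to "has" and then dropped.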


ElasticSearch

  • Standalone server based on Lucene

    • (http://www.elasticsearch.org/)

  • Tire (Rails Gem), better than nothing

    • https://github.com/karmi/tire

    • communicates with DB and ElasticSearch server


Tire

class Hero < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  mapping do
    indexes :description, :type => 'string', :analyzer => 'snowball'
    indexes :name,        :type => 'string'
    indexes :died_at,     :type => 'date'
    indexes :summoned_at, :type => 'date'
  end
end

Hero.search 'muscles'
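The 'snowball' analyzer stems words before they are indexed, which is why a search for 'muscles' can also find 'muscle'. A toy inverted index makes the effect visible. This sketch does not use Tire or ElasticSearch; toy_stem is a crude stand-in that only strips a trailing "s" and is nothing like real Snowball stemming, and ToyIndex is a hypothetical class.

```ruby
# Crude stand-in for a stemmer: strips one trailing "s" from longer words.
def toy_stem(word)
  word.length > 3 ? word.sub(/s\z/, "") : word
end

# Lowercase, split into words, and stem each one.
def terms(text)
  text.downcase.scan(/[[:alnum:]]+/).map { |word| toy_stem(word) }
end

# Hypothetical inverted index: term => list of document ids.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  def add(id, text)
    terms(text).uniq.each { |term| @postings[term] << id }
  end

  # Return ids of documents that contain every query term.
  def search(query)
    id_lists = terms(query).map { |term| @postings.fetch(term, []) }
    id_lists.empty? ? [] : id_lists.reduce(:&)
  end
end

index = ToyIndex.new
index.add(1, "has abnormally gigant muscles")
index.add(2, "a single muscle of steel")
```

Because both the documents and the query go through the same terms method, index.search("muscles") finds both documents even though only the first contains the plural form.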


ElasticSearch

ADVANTAGES OF SOLR

REST

DISTRIBUTED!!!

http://www.youtube.com/watch?v=l4ReamjCxHo

For instance, Hadoop …

http://www.elasticsearch.org/guide/reference/modules/gateway/hadoop.html

