sql server full text search using full text search in sql server 2005
Download
Skip this Video
Download Presentation
SQL Server Full-Text Search Using full-text search in SQL Server 2005

Loading in 2 Seconds...

play fullscreen
1 / 30

SQL Server Full-Text Search - PowerPoint PPT Presentation


  • 888 Views
  • Uploaded on

SQL Server Full-Text Search Using full-text search in SQL Server 2005. Edward Forgács <[email protected]>. What is full text search?. Allows language-specific queries of text data stored in SQL Server char varchar nchar nvarchar image (containing DOC or PDF data)

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'SQL Server Full-Text Search' - Philip


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
what is full text search
What is full text search?
  • Allows language-specific queries of text data stored in SQL Server
    • char
    • varchar
    • nchar
    • nvarchar
    • image (containing DOC or PDF data)
  • Supported by all editions ofSQL Server (except Express)
results of like
Results of LIKE
  • Only one result returned
  • Does not match
    • "email"
    • "e-mails"
    • "emails"
why use full text search
Why use full text search?
  • Much more powerful than LIKE
    • Used correctly, will produce more specific, relevant results
    • Better performance – LIKE queries are designed for small amounts of text data, full-text search scales to huge documents
    • Provides ranking of results
  • Common uses
    • Search through the content in a text-intensive, database driven website, e.g. a knowledge base
    • Search the contents of documents stored in BLOB fields
    • Perform advanced searches
      • e.g. with exact phrases - "to be or not to be" (however needs care!)
      • e.g. Boolean operators - AND, OR, NOT, NEAR
new for fts in 2005
New for FTS in 2005
  • XML support
  • Linked server support for queries
  • Integrated backup, restore and recovery
  • Faster queries and index building
  • Data definition language (DDL) statements for creating and altering indexes
    • System stored procedures deprecated
  • Noise Insensitivity – noise words no longer break the query
  • Accent Insensitivity (optional) – e.g. café and cafe are the same
  • Multiple columns can be included in full-text searches
  • Pre-computed ranking optimizations when using FREETEXTTABLE
  • Improved ranking algorithm
  • Catalogs can be set to populate continuously, track changes, or index when the CPU is idle
setting up full text search
Setting up full-text search
  • Enable FTS for the database – check "Use full-text indexing" on the Files page, orEXEC sp_fulltext_database \'enable\'
  • Create catalogs for the required tables and select columns to index
freetext query
FREETEXT query
  • SyntaxSELECT ColumnsFROM TableWHEREFREETEXT((Column1, Column2), \'word1 word2\')orFREETEXT(*, \'word1, word2\')
  • Returns rows where the specified columns contain ANY form of ANY of the specified words – like AltaVista
  • Usually returns hundreds of irrelevant results, often poorly ranked
  • Quick and dirty – really not suitable for production usage
freetexttable query
FREETEXTTABLE query
  • SyntaxSELECT Columns, KEY_TBL.RANKFROM CategoriesINNER JOIN FREETEXTTABLE(Categories, Column1, Column2, \'word1 word2\') AS KEY_TBL ON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC
  • Produces the same results as a FREETEXT query, but additionally provides ranking information
contains query
CONTAINS query
  • SyntaxSELECT Column1, Column2FROM TableWHERE CONTAINS((Column1, Column2), \'"exact phrase" AND FORMSOF(INFLECTIONAL, "word")\')
  • Special full-text query syntax allows you to utilize the power of full-text search
  • Allows far more sophistocated queries, with boolean operators and exact phrases
  • Customizable word matching
  • Produces far more specific results than FREETEXT
containstable query
CONTAINSTABLE query
  • SyntaxSELECT ColumnsFROM CategoriesINNER JOIN CONTAINSTABLE(Categories, Column, \'FORMSOF(INFLECTIONAL, "word")\') AS KEY_TBLON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC
  • Returns the same results as a CONTAINS query, but with ranking information
  • Necessary to sort results by relevance
slide13
Demo: Running a simple

full-text search query

writing fts terms
Writing FTS terms
  • The power of FTS is in the expression which is passed to the CONTAINS or CONTAINSTABLE function
  • Several different types of terms:
    • Simple terms
    • Prefix terms
    • Generation terms
    • Proximity terms
    • Weighted terms
simple terms
Simple terms
  • Either words or phrases
  • Quotes are optional, but recommended
  • Matches columns which contain the exact words or phrases specified
  • Case insensitive
  • Punctuation is ignored
  • e.g.
    • CONTAINS(Column, \'SQL\')
    • CONTAINS(Column, \' "SQL" \')
    • CONTAINS(Column, \'Microsoft SQL Server\')
    • CONTAINS(Column, \' "Microsoft SQL Server" \')
prefix terms
Prefix terms
  • Matches words beginning with the specified text
  • e.g.
    • CONTAINS(Column, \' "local*" \')
      • matches local, locally, locality
    • CONTAINS(Column, \' "local wine*" \')
      • matches "local winery", "locally wined"
generation terms
Generation terms
  • Two types:
    • Inflectional
      • FORMSOF(INFLECTIONAL, "expression")
    • Thesaurus
      • FORMSOF(THESAURUS, "expression")
  • Both return variants of the specified word, but variants are determined differently
inflectional
Inflectional
  • Matches plurals and words which share the same stem
  • When vague words such as "best" are used, doesn\'t match the exact word, only "good"
thesaurus
Thesaurus
  • Supposed to match synonyms of search terms – but the thesaurus seems to be very limited
  • Does not match plurals
  • Not particularly useful
proximity terms
Proximity terms
  • SyntaxCONTAINS(Column, \'local NEAR winery\')CONTAINS(Column, \' "local" NEAR "winery" \')
  • Matches words which are NEAR each other
  • Terms on either side of NEAR must be either simple or proximity terms
weighted terms
Weighted terms
  • Each word can be given a rank
  • Can be combined with simple, prefix, generation and proximity terms
  • e.g.
    • CONTAINS(Column, \'ISABOUT( performance weight(.8), comfortable weight(.4))\')
    • CONTAINS(Column, \'ISABOUT( FORMSOF(INFLECTIONAL, "performance") weight (.8), FORMSOF(INFLECTIONAL, "comfortable") weight (.4))\')
fts with blob fields
FTS with BLOB fields
  • Full text search can be used with data stored in BLOB fields, such as Word and PDF documents
  • Two columns are necessary in the table
    • A BLOB column, such as IMAGE or VARBINARY, which holds the data to search
    • A data type column, e.g. VARCHAR(3), which holds the file extension to specify the type
  • When the catalog is created, the data type field also needs to be specified
  • Needs to be used with caution to avoid a massive performance hit
managing population schedules
Managing Population Schedules
  • In SQL 2000, full text catalogs could only be populated on specified schedules
  • SQL 2005 can track database changes and keep the catalog up to date, with a minor performance hit
  • Additionally, schedules can now be set to run when the CPU is idle
  • Immediate population only accessible from the Catalog Properties dialog
backing up full text catalogs
Backing up full text catalogs
  • Full text catalogs are now included in SQL backups by default, and are retained when a database is detached and re-attached
  • Option in detach dialog to include keep the full text catalog
disadvantages
Disadvantages
  • Full text catalogs consume large amounts of disk space and must be kept up-to-date
  • If catalogs aren’t set to update continuously, results can be out-of-date
  • Continuous updating results in a performance hit
  • Queries are complicated to generate and must be generated as a string on the client
  • Requires care in stored procedures – full text clauses are always evaluated, regardless of OR operators
  • Contextual information not provided with any full-text search functions
ad