Sql server full text search using full text search in sql server 2005
Download
1 / 30

SQL Server Full-Text Search Using full-text search in SQL Server 2005 - PowerPoint PPT Presentation


SQL Server Full-Text Search Using full-text search in SQL Server 2005. Edward Forgács <EdwardForgacs@ssw.com.au>. What is full text search?. Allows language-specific queries of text data stored in SQL Server char varchar nchar nvarchar image (containing DOC or PDF data)

Related searches for SQL Server Full-Text Search Using full-text search in SQL Server 2005

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha

Download Presentation

SQL Server Full-Text Search Using full-text search in SQL Server 2005

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


SQL Server Full-Text SearchUsing full-text search in SQL Server 2005

Edward Forgács <EdwardForgacs@ssw.com.au>


What is full text search?

  • Allows language-specific queries of text data stored in SQL Server

    • char

    • varchar

    • nchar

    • nvarchar

    • image (containing DOC or PDF data)

  • Supported by all editions ofSQL Server (except Express)


What is wrong with LIKE queries for searching?


Results of LIKE

  • Only one result returned

  • Does not match

    • "email"

    • "e-mails"

    • "emails"


Why use full text search?

  • Much more powerful than LIKE

    • Used correctly, will produce more specific, relevant results

    • Better performance – LIKE queries are designed for small amounts of text data, full-text search scales to huge documents

    • Provides ranking of results

  • Common uses

    • Search through the content in a text-intensive, database driven website, e.g. a knowledge base

    • Search the contents of documents stored in BLOB fields

    • Perform advanced searches

      • e.g. with exact phrases - "to be or not to be" (however needs care!)

      • e.g. Boolean operators - AND, OR, NOT, NEAR


New for FTS in 2005

  • XML support

  • Linked server support for queries

  • Integrated backup, restore and recovery

  • Faster queries and index building

  • Data definition language (DDL) statements for creating and altering indexes

    • System stored procedures deprecated

  • Noise Insensitivity – noise words no longer break the query

  • Accent Insensitivity (optional) – e.g. café and cafe are the same

  • Multiple columns can be included in full-text searches

  • Pre-computed ranking optimizations when using FREETEXTTABLE

  • Improved ranking algorithm

  • Catalogs can be set to populate continuously, track changes, or index when the CPU is idle


Setting up full-text search

  • Enable FTS for the database – check "Use full-text indexing" on the Files page, orEXEC sp_fulltext_database 'enable'

  • Create catalogs for the required tables and select columns to index


Demo: Setting up FTSon a database


FREETEXT query

  • SyntaxSELECT ColumnsFROM TableWHEREFREETEXT((Column1, Column2), 'word1 word2')orFREETEXT(*, 'word1, word2')

  • Returns rows where the specified columns contain ANY form of ANY of the specified words – like AltaVista

  • Usually returns hundreds of irrelevant results, often poorly ranked

  • Quick and dirty – really not suitable for production usage


FREETEXTTABLE query

  • SyntaxSELECT Columns, KEY_TBL.RANKFROM CategoriesINNER JOIN FREETEXTTABLE(Categories, Column1, Column2,'word1 word2') AS KEY_TBL ON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC

  • Produces the same results as a FREETEXT query, but additionally provides ranking information


CONTAINS query

  • SyntaxSELECT Column1, Column2FROM TableWHERE CONTAINS((Column1, Column2),'"exact phrase" AND FORMSOF(INFLECTIONAL, "word")')

  • Special full-text query syntax allows you to utilize the power of full-text search

  • Allows far more sophistocated queries, with boolean operators and exact phrases

  • Customizable word matching

  • Produces far more specific results than FREETEXT


CONTAINSTABLE query

  • SyntaxSELECT ColumnsFROM CategoriesINNER JOIN CONTAINSTABLE(Categories, Column,'FORMSOF(INFLECTIONAL, "word")') AS KEY_TBLON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC

  • Returns the same results as a CONTAINS query, but with ranking information

  • Necessary to sort results by relevance


Demo: Running a simple

full-text search query


Writing FTS terms

  • The power of FTS is in the expression which is passed to the CONTAINS or CONTAINSTABLE function

  • Several different types of terms:

    • Simple terms

    • Prefix terms

    • Generation terms

    • Proximity terms

    • Weighted terms


Simple terms

  • Either words or phrases

  • Quotes are optional, but recommended

  • Matches columns which contain the exact words or phrases specified

  • Case insensitive

  • Punctuation is ignored

  • e.g.

    • CONTAINS(Column, 'SQL')

    • CONTAINS(Column, ' "SQL" ')

    • CONTAINS(Column, 'Microsoft SQL Server')

    • CONTAINS(Column, ' "Microsoft SQL Server" ')


Prefix terms

  • Matches words beginning with the specified text

  • e.g.

    • CONTAINS(Column, ' "local*" ')

      • matches local, locally, locality

    • CONTAINS(Column, ' "local wine*" ')

      • matches "local winery", "locally wined"


Generation terms

  • Two types:

    • Inflectional

      • FORMSOF(INFLECTIONAL, "expression")

    • Thesaurus

      • FORMSOF(THESAURUS, "expression")

  • Both return variants of the specified word, but variants are determined differently


Inflectional

  • Matches plurals and words which share the same stem

  • When vague words such as "best" are used, doesn't match the exact word, only "good"


Thesaurus

  • Supposed to match synonyms of search terms – but the thesaurus seems to be very limited

  • Does not match plurals

  • Not particularly useful


Proximity terms

  • SyntaxCONTAINS(Column, 'local NEAR winery')CONTAINS(Column, ' "local" NEAR "winery" ')

  • Matches words which are NEAR each other

  • Terms on either side of NEAR must be either simple or proximity terms


Weighted terms

  • Each word can be given a rank

  • Can be combined with simple, prefix, generation and proximity terms

  • e.g.

    • CONTAINS(Column, 'ISABOUT(performance weight(.8),comfortable weight(.4))')

    • CONTAINS(Column, 'ISABOUT(FORMSOF(INFLECTIONAL, "performance") weight (.8),FORMSOF(INFLECTIONAL, "comfortable") weight (.4))')


Demo: Using simple, prefix and generation terms


Demo: Using CONTAINSTABLE


FTS with BLOB fields

  • Full text search can be used with data stored in BLOB fields, such as Word and PDF documents

  • Two columns are necessary in the table

    • A BLOB column, such as IMAGE or VARBINARY, which holds the data to search

    • A data type column, e.g. VARCHAR(3), which holds the file extension to specify the type

  • When the catalog is created, the data type field also needs to be specified

  • Needs to be used with caution to avoid a massive performance hit


Demo: Creating and searching FTS catalog with a BLOB field


Managing Population Schedules

  • In SQL 2000, full text catalogs could only be populated on specified schedules

  • SQL 2005 can track database changes and keep the catalog up to date, with a minor performance hit

  • Additionally, schedules can now be set to run when the CPU is idle

  • Immediate population only accessible from the Catalog Properties dialog


Demo: Creating and managing population schedules


Backing up full text catalogs

  • Full text catalogs are now included in SQL backups by default, and are retained when a database is detached and re-attached

  • Option in detach dialog to include keep the full text catalog


Disadvantages

  • Full text catalogs consume large amounts of disk space and must be kept up-to-date

  • If catalogs aren’t set to update continuously, results can be out-of-date

  • Continuous updating results in a performance hit

  • Queries are complicated to generate and must be generated as a string on the client

  • Requires care in stored procedures – full text clauses are always evaluated, regardless of OR operators

  • Contextual information not provided with any full-text search functions


SQL Server Full-Text SearchUsing full-text search in SQL Server 2005

Edward Forgács <EdwardForgacs@ssw.com.au>


ad
  • Login