350 likes | 1.38k Views
SQL Server Full-Text Search Using full-text search in SQL Server 2005. Edward Forgács <EdwardForgacs@ssw.com.au>. What is full text search?. Allows language-specific queries of text data stored in SQL Server char varchar nchar nvarchar image (containing DOC or PDF data)
E N D
SQL Server Full-Text SearchUsing full-text search in SQL Server 2005 Edward Forgács <EdwardForgacs@ssw.com.au>
What is full text search? • Allows language-specific queries of text data stored in SQL Server • char • varchar • nchar • nvarchar • image (containing DOC or PDF data) • Supported by all editions ofSQL Server (except Express)
Results of LIKE • Only one result returned • Does not match • "email" • "e-mails" • "emails"
Why use full text search? • Much more powerful than LIKE • Used correctly, will produce more specific, relevant results • Better performance – LIKE queries are designed for small amounts of text data, full-text search scales to huge documents • Provides ranking of results • Common uses • Search through the content in a text-intensive, database driven website, e.g. a knowledge base • Search the contents of documents stored in BLOB fields • Perform advanced searches • e.g. with exact phrases - "to be or not to be" (however needs care!) • e.g. Boolean operators - AND, OR, NOT, NEAR
New for FTS in 2005 • XML support • Linked server support for queries • Integrated backup, restore and recovery • Faster queries and index building • Data definition language (DDL) statements for creating and altering indexes • System stored procedures deprecated • Noise Insensitivity – noise words no longer break the query • Accent Insensitivity (optional) – e.g. café and cafe are the same • Multiple columns can be included in full-text searches • Pre-computed ranking optimizations when using FREETEXTTABLE • Improved ranking algorithm • Catalogs can be set to populate continuously, track changes, or index when the CPU is idle
Setting up full-text search • Enable FTS for the database – check "Use full-text indexing" on the Files page, orEXEC sp_fulltext_database 'enable' • Create catalogs for the required tables and select columns to index
FREETEXT query • SyntaxSELECT ColumnsFROM TableWHEREFREETEXT((Column1, Column2), 'word1 word2')orFREETEXT(*, 'word1, word2') • Returns rows where the specified columns contain ANY form of ANY of the specified words – like AltaVista • Usually returns hundreds of irrelevant results, often poorly ranked • Quick and dirty – really not suitable for production usage
FREETEXTTABLE query • SyntaxSELECT Columns, KEY_TBL.RANKFROM CategoriesINNER JOIN FREETEXTTABLE(Categories, Column1, Column2, 'word1 word2') AS KEY_TBL ON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC • Produces the same results as a FREETEXT query, but additionally provides ranking information
CONTAINS query • SyntaxSELECT Column1, Column2FROM TableWHERE CONTAINS((Column1, Column2), '"exact phrase" AND FORMSOF(INFLECTIONAL, "word")') • Special full-text query syntax allows you to utilize the power of full-text search • Allows far more sophistocated queries, with boolean operators and exact phrases • Customizable word matching • Produces far more specific results than FREETEXT
CONTAINSTABLE query • SyntaxSELECT ColumnsFROM CategoriesINNER JOIN CONTAINSTABLE(Categories, Column, 'FORMSOF(INFLECTIONAL, "word")') AS KEY_TBLON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC • Returns the same results as a CONTAINS query, but with ranking information • Necessary to sort results by relevance
Demo: Running a simple full-text search query
Writing FTS terms • The power of FTS is in the expression which is passed to the CONTAINS or CONTAINSTABLE function • Several different types of terms: • Simple terms • Prefix terms • Generation terms • Proximity terms • Weighted terms
Simple terms • Either words or phrases • Quotes are optional, but recommended • Matches columns which contain the exact words or phrases specified • Case insensitive • Punctuation is ignored • e.g. • CONTAINS(Column, 'SQL') • CONTAINS(Column, ' "SQL" ') • CONTAINS(Column, 'Microsoft SQL Server') • CONTAINS(Column, ' "Microsoft SQL Server" ')
Prefix terms • Matches words beginning with the specified text • e.g. • CONTAINS(Column, ' "local*" ') • matches local, locally, locality • CONTAINS(Column, ' "local wine*" ') • matches "local winery", "locally wined"
Generation terms • Two types: • Inflectional • FORMSOF(INFLECTIONAL, "expression") • Thesaurus • FORMSOF(THESAURUS, "expression") • Both return variants of the specified word, but variants are determined differently
Inflectional • Matches plurals and words which share the same stem • When vague words such as "best" are used, doesn't match the exact word, only "good"
Thesaurus • Supposed to match synonyms of search terms – but the thesaurus seems to be very limited • Does not match plurals • Not particularly useful
Proximity terms • SyntaxCONTAINS(Column, 'local NEAR winery')CONTAINS(Column, ' "local" NEAR "winery" ') • Matches words which are NEAR each other • Terms on either side of NEAR must be either simple or proximity terms
Weighted terms • Each word can be given a rank • Can be combined with simple, prefix, generation and proximity terms • e.g. • CONTAINS(Column, 'ISABOUT( performance weight(.8), comfortable weight(.4))') • CONTAINS(Column, 'ISABOUT( FORMSOF(INFLECTIONAL, "performance") weight (.8), FORMSOF(INFLECTIONAL, "comfortable") weight (.4))')
FTS with BLOB fields • Full text search can be used with data stored in BLOB fields, such as Word and PDF documents • Two columns are necessary in the table • A BLOB column, such as IMAGE or VARBINARY, which holds the data to search • A data type column, e.g. VARCHAR(3), which holds the file extension to specify the type • When the catalog is created, the data type field also needs to be specified • Needs to be used with caution to avoid a massive performance hit
Managing Population Schedules • In SQL 2000, full text catalogs could only be populated on specified schedules • SQL 2005 can track database changes and keep the catalog up to date, with a minor performance hit • Additionally, schedules can now be set to run when the CPU is idle • Immediate population only accessible from the Catalog Properties dialog
Backing up full text catalogs • Full text catalogs are now included in SQL backups by default, and are retained when a database is detached and re-attached • Option in detach dialog to include keep the full text catalog
Disadvantages • Full text catalogs consume large amounts of disk space and must be kept up-to-date • If catalogs aren’t set to update continuously, results can be out-of-date • Continuous updating results in a performance hit • Queries are complicated to generate and must be generated as a string on the client • Requires care in stored procedures – full text clauses are always evaluated, regardless of OR operators • Contextual information not provided with any full-text search functions
SQL Server Full-Text SearchUsing full-text search in SQL Server 2005 Edward Forgács <EdwardForgacs@ssw.com.au>