1 / 30

SQL Server Full-Text Search Using full-text search in SQL Server 2005

SQL Server Full-Text Search Using full-text search in SQL Server 2005. Edward Forgács <EdwardForgacs@ssw.com.au>. What is full text search?. Allows language-specific queries of text data stored in SQL Server char varchar nchar nvarchar image (containing DOC or PDF data)

Philip
Download Presentation

SQL Server Full-Text Search Using full-text search in SQL Server 2005

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SQL Server Full-Text SearchUsing full-text search in SQL Server 2005 Edward Forgács <EdwardForgacs@ssw.com.au>

  2. What is full text search? • Allows language-specific queries of text data stored in SQL Server • char • varchar • nchar • nvarchar • image (containing DOC or PDF data) • Supported by all editions ofSQL Server (except Express)

  3. What is wrong with LIKE queries for searching?

  4. Results of LIKE • Only one result returned • Does not match • "email" • "e-mails" • "emails"

  5. Why use full text search? • Much more powerful than LIKE • Used correctly, will produce more specific, relevant results • Better performance – LIKE queries are designed for small amounts of text data, full-text search scales to huge documents • Provides ranking of results • Common uses • Search through the content in a text-intensive, database driven website, e.g. a knowledge base • Search the contents of documents stored in BLOB fields • Perform advanced searches • e.g. with exact phrases - "to be or not to be" (however needs care!) • e.g. Boolean operators - AND, OR, NOT, NEAR

  6. New for FTS in 2005 • XML support • Linked server support for queries • Integrated backup, restore and recovery • Faster queries and index building • Data definition language (DDL) statements for creating and altering indexes • System stored procedures deprecated • Noise Insensitivity – noise words no longer break the query • Accent Insensitivity (optional) – e.g. café and cafe are the same • Multiple columns can be included in full-text searches • Pre-computed ranking optimizations when using FREETEXTTABLE • Improved ranking algorithm • Catalogs can be set to populate continuously, track changes, or index when the CPU is idle

  7. Setting up full-text search • Enable FTS for the database – check "Use full-text indexing" on the Files page, orEXEC sp_fulltext_database 'enable' • Create catalogs for the required tables and select columns to index

  8. Demo: Setting up FTSon a database

  9. FREETEXT query • SyntaxSELECT ColumnsFROM TableWHEREFREETEXT((Column1, Column2), 'word1 word2')orFREETEXT(*, 'word1, word2') • Returns rows where the specified columns contain ANY form of ANY of the specified words – like AltaVista • Usually returns hundreds of irrelevant results, often poorly ranked • Quick and dirty – really not suitable for production usage

  10. FREETEXTTABLE query • SyntaxSELECT Columns, KEY_TBL.RANKFROM CategoriesINNER JOIN FREETEXTTABLE(Categories, Column1, Column2, 'word1 word2') AS KEY_TBL ON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC • Produces the same results as a FREETEXT query, but additionally provides ranking information

  11. CONTAINS query • SyntaxSELECT Column1, Column2FROM TableWHERE CONTAINS((Column1, Column2), '"exact phrase" AND FORMSOF(INFLECTIONAL, "word")') • Special full-text query syntax allows you to utilize the power of full-text search • Allows far more sophistocated queries, with boolean operators and exact phrases • Customizable word matching • Produces far more specific results than FREETEXT

  12. CONTAINSTABLE query • SyntaxSELECT ColumnsFROM CategoriesINNER JOIN CONTAINSTABLE(Categories, Column, 'FORMSOF(INFLECTIONAL, "word")') AS KEY_TBLON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC • Returns the same results as a CONTAINS query, but with ranking information • Necessary to sort results by relevance

  13. Demo: Running a simple full-text search query

  14. Writing FTS terms • The power of FTS is in the expression which is passed to the CONTAINS or CONTAINSTABLE function • Several different types of terms: • Simple terms • Prefix terms • Generation terms • Proximity terms • Weighted terms

  15. Simple terms • Either words or phrases • Quotes are optional, but recommended • Matches columns which contain the exact words or phrases specified • Case insensitive • Punctuation is ignored • e.g. • CONTAINS(Column, 'SQL') • CONTAINS(Column, ' "SQL" ') • CONTAINS(Column, 'Microsoft SQL Server') • CONTAINS(Column, ' "Microsoft SQL Server" ')

  16. Prefix terms • Matches words beginning with the specified text • e.g. • CONTAINS(Column, ' "local*" ') • matches local, locally, locality • CONTAINS(Column, ' "local wine*" ') • matches "local winery", "locally wined"

  17. Generation terms • Two types: • Inflectional • FORMSOF(INFLECTIONAL, "expression") • Thesaurus • FORMSOF(THESAURUS, "expression") • Both return variants of the specified word, but variants are determined differently

  18. Inflectional • Matches plurals and words which share the same stem • When vague words such as "best" are used, doesn't match the exact word, only "good"

  19. Thesaurus • Supposed to match synonyms of search terms – but the thesaurus seems to be very limited • Does not match plurals • Not particularly useful

  20. Proximity terms • SyntaxCONTAINS(Column, 'local NEAR winery')CONTAINS(Column, ' "local" NEAR "winery" ') • Matches words which are NEAR each other • Terms on either side of NEAR must be either simple or proximity terms

  21. Weighted terms • Each word can be given a rank • Can be combined with simple, prefix, generation and proximity terms • e.g. • CONTAINS(Column, 'ISABOUT( performance weight(.8), comfortable weight(.4))') • CONTAINS(Column, 'ISABOUT( FORMSOF(INFLECTIONAL, "performance") weight (.8), FORMSOF(INFLECTIONAL, "comfortable") weight (.4))')

  22. Demo: Using simple, prefix and generation terms

  23. Demo: Using CONTAINSTABLE

  24. FTS with BLOB fields • Full text search can be used with data stored in BLOB fields, such as Word and PDF documents • Two columns are necessary in the table • A BLOB column, such as IMAGE or VARBINARY, which holds the data to search • A data type column, e.g. VARCHAR(3), which holds the file extension to specify the type • When the catalog is created, the data type field also needs to be specified • Needs to be used with caution to avoid a massive performance hit

  25. Demo: Creating and searching FTS catalog with a BLOB field

  26. Managing Population Schedules • In SQL 2000, full text catalogs could only be populated on specified schedules • SQL 2005 can track database changes and keep the catalog up to date, with a minor performance hit • Additionally, schedules can now be set to run when the CPU is idle • Immediate population only accessible from the Catalog Properties dialog

  27. Demo: Creating and managing population schedules

  28. Backing up full text catalogs • Full text catalogs are now included in SQL backups by default, and are retained when a database is detached and re-attached • Option in detach dialog to include keep the full text catalog

  29. Disadvantages • Full text catalogs consume large amounts of disk space and must be kept up-to-date • If catalogs aren’t set to update continuously, results can be out-of-date • Continuous updating results in a performance hit • Queries are complicated to generate and must be generated as a string on the client • Requires care in stored procedures – full text clauses are always evaluated, regardless of OR operators • Contextual information not provided with any full-text search functions

  30. SQL Server Full-Text SearchUsing full-text search in SQL Server 2005 Edward Forgács <EdwardForgacs@ssw.com.au>

More Related