SQL Server Full-Text Search Using full-text search in SQL Server 2005 - PowerPoint PPT Presentation

sql server full text search using full text search in sql server 2005 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
SQL Server Full-Text Search Using full-text search in SQL Server 2005 PowerPoint Presentation
Download Presentation
SQL Server Full-Text Search Using full-text search in SQL Server 2005

play fullscreen
1 / 30
SQL Server Full-Text Search Using full-text search in SQL Server 2005
922 Views
Download Presentation
Philip
Download Presentation

SQL Server Full-Text Search Using full-text search in SQL Server 2005

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. SQL Server Full-Text SearchUsing full-text search in SQL Server 2005 Edward Forgács <EdwardForgacs@ssw.com.au>

  2. What is full text search? • Allows language-specific queries of text data stored in SQL Server • char • varchar • nchar • nvarchar • image (containing DOC or PDF data) • Supported by all editions ofSQL Server (except Express)

  3. What is wrong with LIKE queries for searching?

  4. Results of LIKE • Only one result returned • Does not match • "email" • "e-mails" • "emails"

  5. Why use full text search? • Much more powerful than LIKE • Used correctly, will produce more specific, relevant results • Better performance – LIKE queries are designed for small amounts of text data, full-text search scales to huge documents • Provides ranking of results • Common uses • Search through the content in a text-intensive, database driven website, e.g. a knowledge base • Search the contents of documents stored in BLOB fields • Perform advanced searches • e.g. with exact phrases - "to be or not to be" (however needs care!) • e.g. Boolean operators - AND, OR, NOT, NEAR

  6. New for FTS in 2005 • XML support • Linked server support for queries • Integrated backup, restore and recovery • Faster queries and index building • Data definition language (DDL) statements for creating and altering indexes • System stored procedures deprecated • Noise Insensitivity – noise words no longer break the query • Accent Insensitivity (optional) – e.g. café and cafe are the same • Multiple columns can be included in full-text searches • Pre-computed ranking optimizations when using FREETEXTTABLE • Improved ranking algorithm • Catalogs can be set to populate continuously, track changes, or index when the CPU is idle

  7. Setting up full-text search • Enable FTS for the database – check "Use full-text indexing" on the Files page, orEXEC sp_fulltext_database 'enable' • Create catalogs for the required tables and select columns to index

  8. Demo: Setting up FTSon a database

  9. FREETEXT query • SyntaxSELECT ColumnsFROM TableWHEREFREETEXT((Column1, Column2), 'word1 word2')orFREETEXT(*, 'word1, word2') • Returns rows where the specified columns contain ANY form of ANY of the specified words – like AltaVista • Usually returns hundreds of irrelevant results, often poorly ranked • Quick and dirty – really not suitable for production usage

  10. FREETEXTTABLE query • SyntaxSELECT Columns, KEY_TBL.RANKFROM CategoriesINNER JOIN FREETEXTTABLE(Categories, Column1, Column2, 'word1 word2') AS KEY_TBL ON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC • Produces the same results as a FREETEXT query, but additionally provides ranking information

  11. CONTAINS query • SyntaxSELECT Column1, Column2FROM TableWHERE CONTAINS((Column1, Column2), '"exact phrase" AND FORMSOF(INFLECTIONAL, "word")') • Special full-text query syntax allows you to utilize the power of full-text search • Allows far more sophistocated queries, with boolean operators and exact phrases • Customizable word matching • Produces far more specific results than FREETEXT

  12. CONTAINSTABLE query • SyntaxSELECT ColumnsFROM CategoriesINNER JOIN CONTAINSTABLE(Categories, Column, 'FORMSOF(INFLECTIONAL, "word")') AS KEY_TBLON Categories.CategoryID = KEY_TBL.[KEY]ORDER BY KEY_TBL.RANK DESC • Returns the same results as a CONTAINS query, but with ranking information • Necessary to sort results by relevance

  13. Demo: Running a simple full-text search query

  14. Writing FTS terms • The power of FTS is in the expression which is passed to the CONTAINS or CONTAINSTABLE function • Several different types of terms: • Simple terms • Prefix terms • Generation terms • Proximity terms • Weighted terms

  15. Simple terms • Either words or phrases • Quotes are optional, but recommended • Matches columns which contain the exact words or phrases specified • Case insensitive • Punctuation is ignored • e.g. • CONTAINS(Column, 'SQL') • CONTAINS(Column, ' "SQL" ') • CONTAINS(Column, 'Microsoft SQL Server') • CONTAINS(Column, ' "Microsoft SQL Server" ')

  16. Prefix terms • Matches words beginning with the specified text • e.g. • CONTAINS(Column, ' "local*" ') • matches local, locally, locality • CONTAINS(Column, ' "local wine*" ') • matches "local winery", "locally wined"

  17. Generation terms • Two types: • Inflectional • FORMSOF(INFLECTIONAL, "expression") • Thesaurus • FORMSOF(THESAURUS, "expression") • Both return variants of the specified word, but variants are determined differently

  18. Inflectional • Matches plurals and words which share the same stem • When vague words such as "best" are used, doesn't match the exact word, only "good"

  19. Thesaurus • Supposed to match synonyms of search terms – but the thesaurus seems to be very limited • Does not match plurals • Not particularly useful

  20. Proximity terms • SyntaxCONTAINS(Column, 'local NEAR winery')CONTAINS(Column, ' "local" NEAR "winery" ') • Matches words which are NEAR each other • Terms on either side of NEAR must be either simple or proximity terms

  21. Weighted terms • Each word can be given a rank • Can be combined with simple, prefix, generation and proximity terms • e.g. • CONTAINS(Column, 'ISABOUT( performance weight(.8), comfortable weight(.4))') • CONTAINS(Column, 'ISABOUT( FORMSOF(INFLECTIONAL, "performance") weight (.8), FORMSOF(INFLECTIONAL, "comfortable") weight (.4))')

  22. Demo: Using simple, prefix and generation terms

  23. Demo: Using CONTAINSTABLE

  24. FTS with BLOB fields • Full text search can be used with data stored in BLOB fields, such as Word and PDF documents • Two columns are necessary in the table • A BLOB column, such as IMAGE or VARBINARY, which holds the data to search • A data type column, e.g. VARCHAR(3), which holds the file extension to specify the type • When the catalog is created, the data type field also needs to be specified • Needs to be used with caution to avoid a massive performance hit

  25. Demo: Creating and searching FTS catalog with a BLOB field

  26. Managing Population Schedules • In SQL 2000, full text catalogs could only be populated on specified schedules • SQL 2005 can track database changes and keep the catalog up to date, with a minor performance hit • Additionally, schedules can now be set to run when the CPU is idle • Immediate population only accessible from the Catalog Properties dialog

  27. Demo: Creating and managing population schedules

  28. Backing up full text catalogs • Full text catalogs are now included in SQL backups by default, and are retained when a database is detached and re-attached • Option in detach dialog to include keep the full text catalog

  29. Disadvantages • Full text catalogs consume large amounts of disk space and must be kept up-to-date • If catalogs aren’t set to update continuously, results can be out-of-date • Continuous updating results in a performance hit • Queries are complicated to generate and must be generated as a string on the client • Requires care in stored procedures – full text clauses are always evaluated, regardless of OR operators • Contextual information not provided with any full-text search functions

  30. SQL Server Full-Text SearchUsing full-text search in SQL Server 2005 Edward Forgács <EdwardForgacs@ssw.com.au>