1 / 47

Introduction to Full-Text Searching in SQL Server 2012

Introduction to Full-Text Searching in SQL Server 2012. Adolfo J. Socorro, Ph.D. IT Impact, Inc. asocorro@itimpact.com. Outline. What can we do with FTS? How to install FTS FTS components Creating FTS indexes How to query with FTS FILESTREAM and FileTable. FTS Basics.

uri
Download Presentation

Introduction to Full-Text Searching in SQL Server 2012

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introductionto Full-Text Searching in SQL Server 2012 Adolfo J. Socorro, Ph.D.IT Impact, Inc. asocorro@itimpact.com

  2. Outline • What can we do with FTS? • How to install FTS • FTS components • Creating FTS indexes • How to query with FTS • FILESTREAM and FileTable

  3. FTS Basics FTS allows searching against character-based data • char • varchar • nchar • nvarchar • text • ntext • image • xml • varbinary • varbinary(max)

  4. Search Functionality

  5. Search Functionality

  6. A First Look • Let’s run some simple examples to get a feel for FTS!

  7. LIKE vs FTS • LIKE works on character patterns only • Cannot use the LIKE predicate to query formatted binary data • FTS is much faster against large amounts of unstructured text data

  8. Supported SQL Server Editions • Enterprise • Business Intelligence • Standard • Web • Express with Advanced Services Available since at least SQL Server 2000

  9. FTS Components

  10. Language Support • 50+ languages • Language-specific components • Word breakers and stemmers • Stoplists • Thesaurus files

  11. How to Install

  12. Default FTS Language

  13. FTS Indexes • One index per table or indexed view • Must have a unique, single-column, non-nullable index on the table • Grouped within the same database into one or more full-text catalogs (“containers”)

  14. Full-Text Catalogs • A logical construct • A way to manage FT indexes together

  15. Index Population • Population: the addition of data to full-text indexes

  16. Steps to Setup an Index on a Table

  17. Full-Text Index Wizard

  18. Example: Create Catalog and Index

  19. CONTAINS • Precise or prefix matches to single words and phrases • Proximity matches • Logical operations between conditions: AND, OR, AND NOT • Optional use of inflectional formsand thesaurus

  20. FREETEXT • Matching the meaning, but not the exact wording, of specified words or phrases • Always uses inflectional forms and thesaurus

  21. CONTAINSTABLE AND FREETEXTTABLE • Return a relevance ranking value (RANK) and full-text key (KEY) for each row • The actual RANK values are unimportant and typically differ each time the query is run • ISABOUT/WEIGHT influence the rankingin CONTAINSTABLE

  22. Example: Queries

  23. Stoplists • A mechanism to discard commonly occurring strings that do not help the search

  24. Thesaurus • Nicknames: Robert/Bob • Common misspellings: calendar/calender • Homophones: Geoff/Jeff • Technical terms: proc/procedure Very powerful if you log searches and learn what users are commonly searching for

  25. Thesaurus • One file per language

  26. Filters • Extract textual information from the document (removing the formatting) • Send the text to the word-breaker component for the language associated with the column • Need to manually install Office 2010 and PDF filters

  27. Example: FTS Components

  28. Where to Store Large Objects?

  29. Why Store in the Database? • Integrating unstructured data into the relational database provides significant benefits: • Integrated storage and data management capabilities (e.g., backup) • Ease of administration and policy management • Full-text search

  30. FILESTREAM • A database/file system hybrid • FILESTREAM is an attribute that can be assigned to a varbinary(max) column • Allows storing BLOB data in the file system • Not restricted to the 2 GB limitSQL Server imposes on BLOBs

  31. FILESTREAM • SQL Server buffer pool is not used • Isolation semantics are governed byDatabase Engine transaction isolation levels

  32. Steps to FILESTREAM

  33. OS-level Configuration ofFILESTREAM

  34. Instance-level Configuration of FILESTREAM

  35. Example: FILESTREAM

  36. FILESTREAM • All data access must be transactional • Must use specific APIs for file I/O • Do not edit the files directly!

  37. When to Use FILESTREAM • Objects that are being stored are, on average, larger than 1 MB • Store smaller objects in the database • Fast read access is important • You are using a middle tier for application logic

  38. FileTables • A special, fixed-schema kind of table • Builds on top of existing FILESTREAM capabilities • Store files and documents in in the database, but access them from Windows applications as if they were stored in the file system (WIN32 API)

  39. FileTables • Hierarchical namespace • Includes file system properties as columns • Preserves full file names • Non-transactional access through the FS

  40. FileTables • Calls to create or change a file or directory through the Windows share are intercepted by a SQL Server component and reflected in the corresponding relational data in the FileTable

  41. Example: FTS over FileTables

  42. FileTables vs FILESTREAM • File and directory hierarchy maintained in the database • Windows application compatibility • Relational access to file attributes • Both are available in all editions

  43. Wrap Up • Advanced searching on character-based data, including documents • FTS setup, components, and queries • FILESTREAM • FileTables

  44. Other Topics • Document-property search • Semantic search • Optimizations • Query plans and execution traces

  45. References • Posts and presentations by Bob Beauchemin • http://www.sqlskills.com/blogs/bobb/ • Blog: SQL Server FTS Team Blog • http://blogs.msdn.com/b/sqlfts • SQL Server 2012 Books Online • http://msdn.microsoft.com/en-us/library/cc645577(SQL.110).aspx

  46. Filter Packs • Adobe PDF Filter • http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=4025&fileID=3941 • Office 2010 Filters • http://www.microsoft.com/en-us/download/details.aspx?id=17062

More Related