Introduction to full text searching in sql server 2012
This presentation is the property of its rightful owner.
Sponsored Links
1 / 47

Introduction to Full-Text Searching in SQL Server 2012 PowerPoint PPT Presentation


  • 62 Views
  • Uploaded on
  • Presentation posted in: General

Introduction to Full-Text Searching in SQL Server 2012. Adolfo J. Socorro, Ph.D. IT Impact, Inc. [email protected] Outline. What can we do with FTS? How to install FTS FTS components Creating FTS indexes How to query with FTS FILESTREAM and FileTable. FTS Basics.

Download Presentation

Introduction to Full-Text Searching in SQL Server 2012

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Introduction to full text searching in sql server 2012

Introductionto Full-Text Searching in SQL Server 2012

Adolfo J. Socorro, Ph.D.IT Impact, Inc.

[email protected]


Outline

Outline

  • What can we do with FTS?

  • How to install FTS

  • FTS components

  • Creating FTS indexes

  • How to query with FTS

  • FILESTREAM and FileTable


Fts basics

FTS Basics

FTS allows searching against character-based data

  • char

  • varchar

  • nchar

  • nvarchar

  • text

  • ntext

  • image

  • xml

  • varbinary

  • varbinary(max)


Search functionality

Search Functionality


Search functionality1

Search Functionality


A first look

A First Look

  • Let’s run some simple examples to get a feel for FTS!


Like vs fts

LIKE vs FTS

  • LIKE works on character patterns only

  • Cannot use the LIKE predicate to query formatted binary data

  • FTS is much faster against large amounts of unstructured text data


Supported sql server editions

Supported SQL Server Editions

  • Enterprise

  • Business Intelligence

  • Standard

  • Web

  • Express with Advanced Services

    Available since at least SQL Server 2000


Fts components

FTS Components


Language support

Language Support

  • 50+ languages

  • Language-specific components

    • Word breakers and stemmers

    • Stoplists

    • Thesaurus files


How to install

How to Install


Default fts language

Default FTS Language


Fts indexes

FTS Indexes

  • One index per table or indexed view

  • Must have a unique, single-column, non-nullable index on the table

  • Grouped within the same database into one or more full-text catalogs (“containers”)


Full text catalogs

Full-Text Catalogs

  • A logical construct

  • A way to manage FT indexes together


Index population

Index Population

  • Population: the addition of data to full-text indexes


Steps to setup an index on a table

Steps to Setup an Index on a Table


Full text index wizard

Full-Text Index Wizard


Example create catalog and index

Example: Create Catalog and Index


Contains

CONTAINS

  • Precise or prefix matches to single words and phrases

  • Proximity matches

  • Logical operations between conditions: AND, OR, AND NOT

  • Optional use of inflectional formsand thesaurus


Freetext

FREETEXT

  • Matching the meaning, but not the exact wording, of specified words or phrases

  • Always uses inflectional forms and thesaurus


Containstable and freetexttable

CONTAINSTABLE AND FREETEXTTABLE

  • Return a relevance ranking value (RANK) and full-text key (KEY) for each row

  • The actual RANK values are unimportant and typically differ each time the query is run

  • ISABOUT/WEIGHT influence the rankingin CONTAINSTABLE


Example queries

Example: Queries


Stoplists

Stoplists

  • A mechanism to discard commonly occurring strings that do not help the search


Thesaurus

Thesaurus

  • Nicknames: Robert/Bob

  • Common misspellings: calendar/calender

  • Homophones: Geoff/Jeff

  • Technical terms: proc/procedure

Very powerful if you log searches and learn what users are commonly searching for


Thesaurus1

Thesaurus

  • One file per language


Filters

Filters

  • Extract textual information from the document (removing the formatting)

  • Send the text to the word-breaker component for the language associated with the column

  • Need to manually install Office 2010 and PDF filters


Example fts components

Example: FTS Components


Where to store large objects

Where to Store Large Objects?


Why store in the database

Why Store in the Database?

  • Integrating unstructured data into the relational database provides significant benefits:

    • Integrated storage and data management capabilities (e.g., backup)

    • Ease of administration and policy management

    • Full-text search


Filestream

FILESTREAM

  • A database/file system hybrid

  • FILESTREAM is an attribute that can be assigned to a varbinary(max) column

  • Allows storing BLOB data in the file system

  • Not restricted to the 2 GB limitSQL Server imposes on BLOBs


Filestream1

FILESTREAM

  • SQL Server buffer pool is not used

  • Isolation semantics are governed byDatabase Engine transaction isolation levels


Steps to filestream

Steps to FILESTREAM


Os level configuration of filestream

OS-level Configuration ofFILESTREAM


Instance level configuration of filestream

Instance-level Configuration of FILESTREAM


Example filestream

Example: FILESTREAM


Filestream2

FILESTREAM

  • All data access must be transactional

  • Must use specific APIs for file I/O

  • Do not edit the files directly!


When to use filestream

When to Use FILESTREAM

  • Objects that are being stored are, on average, larger than 1 MB

    • Store smaller objects in the database

  • Fast read access is important

  • You are using a middle tier for application logic


Filetables

FileTables

  • A special, fixed-schema kind of table

  • Builds on top of existing FILESTREAM capabilities

  • Store files and documents in in the database, but access them from Windows applications as if they were stored in the file system (WIN32 API)


Filetables1

FileTables

  • Hierarchical namespace

  • Includes file system properties as columns

  • Preserves full file names

  • Non-transactional access through the FS


Filetables2

FileTables

  • Calls to create or change a file or directory through the Windows share are intercepted by a SQL Server component and reflected in the corresponding relational data in the FileTable


Example fts over filetables

Example: FTS over FileTables


Filetables vs filestream

FileTables vs FILESTREAM

  • File and directory hierarchy maintained in the database

  • Windows application compatibility

  • Relational access to file attributes

  • Both are available in all editions


Wrap up

Wrap Up

  • Advanced searching on character-based data, including documents

  • FTS setup, components, and queries

  • FILESTREAM

  • FileTables


Other topics

Other Topics

  • Document-property search

  • Semantic search

  • Optimizations

  • Query plans and execution traces


References

References

  • Posts and presentations by Bob Beauchemin

    • http://www.sqlskills.com/blogs/bobb/

  • Blog: SQL Server FTS Team Blog

    • http://blogs.msdn.com/b/sqlfts

  • SQL Server 2012 Books Online

    • http://msdn.microsoft.com/en-us/library/cc645577(SQL.110).aspx


Filter packs

Filter Packs

  • Adobe PDF Filter

    • http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=4025&fileID=3941

  • Office 2010 Filters

    • http://www.microsoft.com/en-us/download/details.aspx?id=17062


  • Login