Sql server full text search sql server 2008
This presentation is the property of its rightful owner.
Sponsored Links
1 / 51

SQL Server Full-Text Search SQL Server 2008 PowerPoint PPT Presentation


  • 139 Views
  • Uploaded on
  • Presentation posted in: General

SQL Server Full-Text Search SQL Server 2008. Plan. Qu’est-ce que la recherche Full-Text La méthode SQL server 2008 Comment faire Requête. Introduction.

Download Presentation

SQL Server Full-Text Search SQL Server 2008

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Sql server full text search sql server 2008

SQL Server Full-Text SearchSQL Server 2008


Sql server full text search sql server 2008

Plan

  • Qu’est-ceque la recherche Full-Text

  • La méthode SQL server 2008

  • Comment faire

  • Requête


Introduction

Introduction

  • L'indexation plain texte, aussi appelée indexation de texte intégrale, propose de fournir un service de recherche rapide d'informations déstructurées basée sur des mots, parties de mots, expressions, formes fléchies, synonymes, etc. contenu dans une ligne d'une table.

  • Un service(Full textsearch) permet d’indexer les éléments contenue dans notre base de données dans le but de permettre une recherche efficace.


Type de recherche

Type de recherche

  • La technologie full-text permet de créer des indexes sur base de mots non parasites et d'utiliser ces indexes pour des recherches (avec support linguistique) ainsi que des recherches de proximité. 

  • Les mots parasites sont les mots de liaisons, les pronoms, … tels que " je ", " de ", " car " et bien d'autres. Ceci permet d'éviter d'avoir comme résultat des données faussées par ces mots. Ainsi, si l'on effectue une recherche avec comme critère la phrase " je veux une classe mfc qui permet de faire du gdi ", seuls les mots " classe ", " mfc " et " gdi " seront pris en compte. 


Type de recherche1

Type de recherche

  • Le support linguistique permet de définir, en interne, toute une série de mots qui dérivent de ceux qui forment les critères. Ainsi, pour le mot " nager ", le système effectuera la recherche avec les mots " nager ", " nage ", " nagé(e) ".

  • La recherche de proximité permet de spécifier qu'un mot doit se trouver près d'un autre. Dans le cas de Sql Server, il s'agit d'un " rayon " de 50 mots. Ce nombre est fixé et n'est pas paramétrable.


On aurait pu

On auraitpu….

  • SELECT *FROM [Northwind].[dbo].[Employees]WHERE Notes LIKE '%grad%‘

  • Cependant les performances ne sont pas aussi bonne et la possibilité… n’enparlonsmême pas


Th orie

Théorie


Full text search terminology 1 3

Full-Text Search Terminology 1/3

  • Full-text index

    • Information à propos des mots et leur emplacement dans les colonnes

    • Utilisédans les requêtes full text

  • Full-text catalog

    • Groupelogiqued’index (Container)

  • Word breaker

    • Identifie les mots pour lesquelsilexiste des dérivations(en fonction de la langue)


  • Full text search terminology 2 3

    Full-Text Search Terminology 2/3

    • Token

      • Les mots identifiés par le word breaker

    • Stemmer

      • Génère des dérivation(inflectional) d’un mots (propre à la langue)

    • Filter

      • Extrait le texte d’un fichiersauvegarderdans un varbinary(max)ouuneimage

    • Population or Crawl

      • Créer et mettre à jour l’index full-text index.


    Full text search terminology 3 3

    Full-Text Search Terminology 3/3

    • Stopwords/Stoplists/Noise Word

      • Mot sans importance

      • Par exemple: ‘le’, ‘la’, ‘les’, ‘à’ etc.

    • Accent insensitivity

      • café = cafe


    Fulltext search sous le capot

    Fulltext search – sous le capot


    Comment administrer

    CommentAdministrer


    Sql server full text search sql server 2008

    Administrer la recherche Full-Text

    Troistâche:

    • Créer/Modifier/Supprimer des catalogs full-text

    • Créer/Modifier/Supprimer des indexesfull-text

    • Céduler et maintenir les informationsdansl’index.


    Sql server full text search sql server 2008

    Comment

    Créer un Full Text Catalog


    Sql 2008

    SQL 2008


    Comment cr er un full text indexes

    Comment Créer un Full Text Indexes


    Sql server full text search sql server 2008

    Property of column


    Sql server full text search sql server 2008

    Full-text Index property window


    Comment remplir l index et le catalog

    CommentRemplirl’Index et le Catalog


    Remplir l index full text

    Remplirl’index Full-Text

    • Full

      • Créel’index

      • Utilise beaucoup de ressources

    • Incremental

      • Met à jour automatiquement l’index en mettant à jour les informations qui ont été mises à jour depuis la dernière mise à jour.


    Remplir l index full text1

    Remplirl’index Full-Text

    • Update

      • Gardeune trace des modifications (inserts, updates, and deletes)

      • Propage les mises à jour selondeuxfaçons:

        • AUTO automatic

        • MANUAL basésurunecédulemanuel des changements


    Comment rechercher full text

    CommentRechercher Full-Text

    Les mots clés:

    • FREETEXT

    • FREETEXTTABLE

    • CONTAINS

    • CONTAINSTABLE


    Sql server full text search sql server 2008

    FREETEXT

    • Recherchefloue

      • Inflectional forms (Stemming)

      • Related words (Thesaurus)

    • SELECT *

    • FROM Artiste

    • WHERE Description like '%francais%';

    • SELECT *

    • FROM Artiste

    • WHERE FreeText (*, 'francais)')

    • SELECT *

    • FROM Artiste

    • WHERE Contains (*, 'FORMSOF(INFLECTIONAL, francais)')


    Sql server full text search sql server 2008

    FREETEXTTABLE

    • La fonctionprécédenteretournetoutes les réponses qui correspondent à ;la requête. Cependant, cesréponses ne sont pas classé en ordre de pertinence.

    • Retourne un ensemble de lignes qui contientunecolonne rank

      • Valeur entre 1 et 1000

      • Valeur relative, critère de pertinence par rapport à la requête.

        Requête:

    • SELECT Artiste.*, KEYTBL.[KEY], KEYTBL.RANK FROM Artiste

    • INNER JOIN FREETEXTTABLE(Artiste, *, 'FORMSOF(INFLECTIONAL, francais)')

    • AS KEYTBL ON No = KEYTBL.[KEY]


    Sql server full text search sql server 2008

    CONTAINS

    • Identique à freetextmaislaissel’utilisateurpréciser la recherchefloue

      SELECT ProductDescriptionID, Description FROM Production.ProductDescription

      WHERE CONTAINS(Description, 'bike');

      SELECT ProductDescriptionID, Description FROM Production.ProductDescription

      WHERE CONTAINS(Description, ‘”bike*”'):

      INFLECTIONAL Considère les dérivations“ride“  “riding", “riden", ..

      THESAURUSConsidère les Synonyms"metal“  "gold", "aluminium"," steel", ..


    Sql server full text search sql server 2008

    SELECT ProductDescriptionID, Description FROM Production.ProductDescriptionWHERE CONTAINS(Description, ‘FORMSOF(INFLECTIONAL, ride) ');

    Proximité des mots NEAR ( ~ ) Combienprèssont les mots dans le texte

    SELECT ProductDescriptionID, Description FROM Production.ProductDescriptionWHERE CONTAINS(Description, 'mountain NEAR bike');

    SELECT ProductDescriptionID, Description FROM Production.ProductDescriptionWHERE CONTAINS(Description, 'mountain ~ bike');


    Sql server full text search sql server 2008

    CONTAINTABLE

    CONTAINSTABLE (table , { column_name | (column_list ) | * } ,' < contains_search_condition > ' [ , LANGUAGE language_term]   [ ,top_n_by_rank ] ) < contains_search_condition > ::=     { < simple_term >     | < prefix_term >     | < generation_term >     | < proximity_term >     | < weighted_term >     }     | { ( < contains_search_condition > )     { { AND | & } | { AND NOT | &! } | { OR | | } } < contains_search_condition > [ ...n ]     } < simple_term > ::=           word | "phrase " < prefix term > ::=      { "word *" | "phrase *" } < generation_term > ::=      FORMSOF ( { INFLECTIONAL | THESAURUS } , < simple_term > [ ,...n ] ) < proximity_term > ::=      { < simple_term > | < prefix_term > }      { { NEAR | ~ } { < simple_term > | < prefix_term > } } [ ...n ] < weighted_term > ::=      ISABOUT         ( { {   < simple_term >   | < prefix_term >   | < generation_term >   | < proximity_term >   }    [ WEIGHT (weight_value ) ]    } [ ,...n ]        )


    Sql server full text search sql server 2008

    • La recherche Full-text est plus puissanteque LIKE

      • Plus spécifique, des résultats plus pertinents

      • Meileure performance

      • Amèneune classification des résultats

    • Utilisation commune

      • Recherchedans les informations de la base de données

      • Recherchedans les fichierscontenusdans la base de données

      • Filtre les mots vide de sens

        • e.g. with exact phrases - "to be or not to be" (however needs care!)


    Sql server full text search sql server 2008

    • Integrated backup, restore and recovery

    • Faster queries and index building

    • Data definition language (DDL) statements for creating and altering indexes

      • System stored procedures deprecated

    • Noise Insensitivity – noise words no longer break the query

    • Accent Insensitivity (optional) – e.g. café and cafe are the same

    • Multiple columns can be included in full-text searches

    • Pre-computed ranking optimizations when using FREETEXTTABLE

    • Improved ranking algorithm

    • Catalogs can be set to populate continuously track changes, or index when the CPU is idle


    Generation terms

    Generation terms

    • Inflectional

      • FORMSOF(INFLECTIONAL, "expression")

      • "drive“  "drove", "driven", .. (share the same stem)

      • When vague words such as "best" are used, doesn't match the exact word, only "good"

    • Thesaurus

      • FORMSOF(THESAURUS, "expression")

      • "metal“  "gold", "aluminium"," steel", ..

    • Both return variants of the specified word, but variants are determined differently


    Thesaurus

    Thesaurus

    • Supposed to match synonyms of search terms

    • SQL_Server_install_path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\FTNoiseThesaurusBak

      http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506231


    Proximity terms

    Proximity terms

    • SyntaxCONTAINS(Column, 'local NEAR winery')CONTAINS(Column, ' "local" NEAR "winery" ')

    • Important for ranking

    • Both words must be in column like AND

    • Terms on either side of NEAR must be either simple or proximity terms


    Weighted terms

    Weighted terms

    • Each word can be given a rank

    • Can be combined with simple, prefix, generation and proximity terms

    • e.g.

      • CONTAINS(Column, 'ISABOUT(performance weight(.8),comfortable weight(.4))')

      • CONTAINS(Column, 'ISABOUT(FORMSOF(INFLECTIONAL, "performance") weight (.8),FORMSOF(INFLECTIONAL, "comfortable") weight (.4))')


    Des questions

    Des questions?


    Sql server full text search sql server 2008

    • Best practices


    Sql server full text search sql server 2008

    Administering Full-Text Search

    Guidelines in administering full-text indexes (Continued)

    • Always select the smallest unique index available for your full-text unique key. (A 4-byte, integer-based index is optimal.) This reduces the resources required by Microsoft Search service in the file system significantly. If the primary key is large (over 100 bytes), consider choosing another unique index in the table (or creating another unique index) as the full-text unique key. Otherwise, if the full-text unique key size exceeds the maximum size allowed (900 bytes), full-text population will not be able to proceed.

    • If you are indexing a table that has millions of rows, assign the table to its own full-text catalog.

    • Consider the amount of change occurring in the tables being full-text indexed, as well as the number of table rows. If the total number of rows being changed, together with the numbers of rows in the table present during the last full-text population, represents millions of rows, assign the table to its own full-text catalog.


    Sql server full text search sql server 2008

    Administering Full-Text Search

    Guidelines in administering full-text indexes (Continued)

    • A full-text index is a special type of token-based functional index that is built and maintained by theMicrosoft Full-Text Engine for SQL Server (MSFTESQL) service. The process of building a full-text index is quite different from building other types of indexes. Instead of constructing a B-tree structure based on a value stored in a particular row, MSFTESQL builds an inverted, stacked, compressed index structure based on individual tokens from the text being indexed.

    • In SQL Server 2005, the size of a full-text index is limited only by the available memory resources of the computer on which the instance of SQL Server is running.

    • You can build full-text indexes on textual data stored in char, nchar, varchar, nvarchar, varchar(max), text, ntext, image, varbinary, and varbinary(max), and xml columns. However, the image, varbinary, and varbinary(max) columns require special handling if you want to use them for full-text processing.

    • You use protocol handlers and filters when you want to create a full-text index on a varbinary, varbinary(max), or image column. These services let you extract text from Word, Excel, and PowerPoint files as well as PDF and other files that are stored in a native format inside SQL Server. For the filters to work, you need to add a column to the table to contain a value that indicates the type of document stored in the column. The filter then loads up the binary stream stored in the column, strips all formatting information, and returns the text within the document to the word-breaker routine.


    Administering full text search

    Administering Full-Text Search

    Guidelines in administering full-text indexes (Continued)

    • After the word breaker-routine has a list of valid words for a row within a column, the full-text engine calculates tokens to represent the words.

    • A token is simply a compressed form of the original word that saves space and ensures that full-text indexes can be created in as compact a form as possible.

    • The full text-text functionality then builds all tokens in a column into inverted, stacked, compressed structure within a file that is used for search operations. This unique structure allows ranking and scoring algorithms to efficiently satisfy possible queries.


    Question

    Question?

    • Est-ce que vous avez des questions?


  • Login