Sql server full text search sql server 2008
Sponsored Links
This presentation is the property of its rightful owner.
1 / 51

SQL Server Full-Text Search SQL Server 2008 PowerPoint PPT Presentation


  • 162 Views
  • Uploaded on
  • Presentation posted in: General

SQL Server Full-Text Search SQL Server 2008. Plan. Qu’est-ce que la recherche Full-Text La méthode SQL server 2008 Comment faire Requête. Introduction.

Download Presentation

SQL Server Full-Text Search SQL Server 2008

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


SQL Server Full-Text SearchSQL Server 2008


Plan

  • Qu’est-ceque la recherche Full-Text

  • La méthode SQL server 2008

  • Comment faire

  • Requête


Introduction

  • L'indexation plain texte, aussi appelée indexation de texte intégrale, propose de fournir un service de recherche rapide d'informations déstructurées basée sur des mots, parties de mots, expressions, formes fléchies, synonymes, etc. contenu dans une ligne d'une table.

  • Un service(Full textsearch) permet d’indexer les éléments contenue dans notre base de données dans le but de permettre une recherche efficace.


Type de recherche

  • La technologie full-text permet de créer des indexes sur base de mots non parasites et d'utiliser ces indexes pour des recherches (avec support linguistique) ainsi que des recherches de proximité. 

  • Les mots parasites sont les mots de liaisons, les pronoms, … tels que " je ", " de ", " car " et bien d'autres. Ceci permet d'éviter d'avoir comme résultat des données faussées par ces mots. Ainsi, si l'on effectue une recherche avec comme critère la phrase " je veux une classe mfc qui permet de faire du gdi ", seuls les mots " classe ", " mfc " et " gdi " seront pris en compte. 


Type de recherche

  • Le support linguistique permet de définir, en interne, toute une série de mots qui dérivent de ceux qui forment les critères. Ainsi, pour le mot " nager ", le système effectuera la recherche avec les mots " nager ", " nage ", " nagé(e) ".

  • La recherche de proximité permet de spécifier qu'un mot doit se trouver près d'un autre. Dans le cas de Sql Server, il s'agit d'un " rayon " de 50 mots. Ce nombre est fixé et n'est pas paramétrable.


On auraitpu….

  • SELECT *FROM [Northwind].[dbo].[Employees]WHERE Notes LIKE '%grad%‘

  • Cependant les performances ne sont pas aussi bonne et la possibilité… n’enparlonsmême pas


Théorie


Full-Text Search Terminology 1/3

  • Full-text index

    • Information à propos des mots et leur emplacement dans les colonnes

    • Utilisédans les requêtes full text

  • Full-text catalog

    • Groupelogiqued’index (Container)

  • Word breaker

    • Identifie les mots pour lesquelsilexiste des dérivations(en fonction de la langue)


  • Full-Text Search Terminology 2/3

    • Token

      • Les mots identifiés par le word breaker

    • Stemmer

      • Génère des dérivation(inflectional) d’un mots (propre à la langue)

    • Filter

      • Extrait le texte d’un fichiersauvegarderdans un varbinary(max)ouuneimage

    • Population or Crawl

      • Créer et mettre à jour l’index full-text index.


    Full-Text Search Terminology 3/3

    • Stopwords/Stoplists/Noise Word

      • Mot sans importance

      • Par exemple: ‘le’, ‘la’, ‘les’, ‘à’ etc.

    • Accent insensitivity

      • café = cafe


    Fulltext search – sous le capot


    CommentAdministrer


    Administrer la recherche Full-Text

    Troistâche:

    • Créer/Modifier/Supprimer des catalogs full-text

    • Créer/Modifier/Supprimer des indexesfull-text

    • Céduler et maintenir les informationsdansl’index.


    Comment

    Créer un Full Text Catalog


    SQL 2008


    Comment Créer un Full Text Indexes


    Property of column


    Full-text Index property window


    CommentRemplirl’Index et le Catalog


    Remplirl’index Full-Text

    • Full

      • Créel’index

      • Utilise beaucoup de ressources

    • Incremental

      • Met à jour automatiquement l’index en mettant à jour les informations qui ont été mises à jour depuis la dernière mise à jour.


    Remplirl’index Full-Text

    • Update

      • Gardeune trace des modifications (inserts, updates, and deletes)

      • Propage les mises à jour selondeuxfaçons:

        • AUTO automatic

        • MANUAL basésurunecédulemanuel des changements


    CommentRechercher Full-Text

    Les mots clés:

    • FREETEXT

    • FREETEXTTABLE

    • CONTAINS

    • CONTAINSTABLE


    FREETEXT

    • Recherchefloue

      • Inflectional forms (Stemming)

      • Related words (Thesaurus)

    • SELECT *

    • FROM Artiste

    • WHERE Description like '%francais%';

    • SELECT *

    • FROM Artiste

    • WHERE FreeText (*, 'francais)')

    • SELECT *

    • FROM Artiste

    • WHERE Contains (*, 'FORMSOF(INFLECTIONAL, francais)')


    FREETEXTTABLE

    • La fonctionprécédenteretournetoutes les réponses qui correspondent à ;la requête. Cependant, cesréponses ne sont pas classé en ordre de pertinence.

    • Retourne un ensemble de lignes qui contientunecolonne rank

      • Valeur entre 1 et 1000

      • Valeur relative, critère de pertinence par rapport à la requête.

        Requête:

    • SELECT Artiste.*, KEYTBL.[KEY], KEYTBL.RANK FROM Artiste

    • INNER JOIN FREETEXTTABLE(Artiste, *, 'FORMSOF(INFLECTIONAL, francais)')

    • AS KEYTBL ON No = KEYTBL.[KEY]


    CONTAINS

    • Identique à freetextmaislaissel’utilisateurpréciser la recherchefloue

      SELECT ProductDescriptionID, Description FROM Production.ProductDescription

      WHERE CONTAINS(Description, 'bike');

      SELECT ProductDescriptionID, Description FROM Production.ProductDescription

      WHERE CONTAINS(Description, ‘”bike*”'):

      INFLECTIONAL Considère les dérivations“ride“  “riding", “riden", ..

      THESAURUSConsidère les Synonyms"metal“  "gold", "aluminium"," steel", ..


    SELECT ProductDescriptionID, Description FROM Production.ProductDescriptionWHERE CONTAINS(Description, ‘FORMSOF(INFLECTIONAL, ride) ');

    Proximité des mots NEAR ( ~ ) Combienprèssont les mots dans le texte

    SELECT ProductDescriptionID, Description FROM Production.ProductDescriptionWHERE CONTAINS(Description, 'mountain NEAR bike');

    SELECT ProductDescriptionID, Description FROM Production.ProductDescriptionWHERE CONTAINS(Description, 'mountain ~ bike');


    CONTAINTABLE

    CONTAINSTABLE (table , { column_name | (column_list ) | * } ,' < contains_search_condition > ' [ , LANGUAGE language_term]   [ ,top_n_by_rank ] ) < contains_search_condition > ::=     { < simple_term >     | < prefix_term >     | < generation_term >     | < proximity_term >     | < weighted_term >     }     | { ( < contains_search_condition > )     { { AND | & } | { AND NOT | &! } | { OR | | } } < contains_search_condition > [ ...n ]     } < simple_term > ::=           word | "phrase " < prefix term > ::=      { "word *" | "phrase *" } < generation_term > ::=      FORMSOF ( { INFLECTIONAL | THESAURUS } , < simple_term > [ ,...n ] ) < proximity_term > ::=      { < simple_term > | < prefix_term > }      { { NEAR | ~ } { < simple_term > | < prefix_term > } } [ ...n ] < weighted_term > ::=      ISABOUT         ( { {   < simple_term >   | < prefix_term >   | < generation_term >   | < proximity_term >   }    [ WEIGHT (weight_value ) ]    } [ ,...n ]        )


    • La recherche Full-text est plus puissanteque LIKE

      • Plus spécifique, des résultats plus pertinents

      • Meileure performance

      • Amèneune classification des résultats

    • Utilisation commune

      • Recherchedans les informations de la base de données

      • Recherchedans les fichierscontenusdans la base de données

      • Filtre les mots vide de sens

        • e.g. with exact phrases - "to be or not to be" (however needs care!)


    • Integrated backup, restore and recovery

    • Faster queries and index building

    • Data definition language (DDL) statements for creating and altering indexes

      • System stored procedures deprecated

    • Noise Insensitivity – noise words no longer break the query

    • Accent Insensitivity (optional) – e.g. café and cafe are the same

    • Multiple columns can be included in full-text searches

    • Pre-computed ranking optimizations when using FREETEXTTABLE

    • Improved ranking algorithm

    • Catalogs can be set to populate continuously track changes, or index when the CPU is idle


    Generation terms

    • Inflectional

      • FORMSOF(INFLECTIONAL, "expression")

      • "drive“  "drove", "driven", .. (share the same stem)

      • When vague words such as "best" are used, doesn't match the exact word, only "good"

    • Thesaurus

      • FORMSOF(THESAURUS, "expression")

      • "metal“  "gold", "aluminium"," steel", ..

    • Both return variants of the specified word, but variants are determined differently


    Thesaurus

    • Supposed to match synonyms of search terms

    • SQL_Server_install_path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\FTNoiseThesaurusBak

      http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506231


    Proximity terms

    • SyntaxCONTAINS(Column, 'local NEAR winery')CONTAINS(Column, ' "local" NEAR "winery" ')

    • Important for ranking

    • Both words must be in column like AND

    • Terms on either side of NEAR must be either simple or proximity terms


    Weighted terms

    • Each word can be given a rank

    • Can be combined with simple, prefix, generation and proximity terms

    • e.g.

      • CONTAINS(Column, 'ISABOUT(performance weight(.8),comfortable weight(.4))')

      • CONTAINS(Column, 'ISABOUT(FORMSOF(INFLECTIONAL, "performance") weight (.8),FORMSOF(INFLECTIONAL, "comfortable") weight (.4))')


    Des questions?


    • Best practices


    Administering Full-Text Search

    Guidelines in administering full-text indexes (Continued)

    • Always select the smallest unique index available for your full-text unique key. (A 4-byte, integer-based index is optimal.) This reduces the resources required by Microsoft Search service in the file system significantly. If the primary key is large (over 100 bytes), consider choosing another unique index in the table (or creating another unique index) as the full-text unique key. Otherwise, if the full-text unique key size exceeds the maximum size allowed (900 bytes), full-text population will not be able to proceed.

    • If you are indexing a table that has millions of rows, assign the table to its own full-text catalog.

    • Consider the amount of change occurring in the tables being full-text indexed, as well as the number of table rows. If the total number of rows being changed, together with the numbers of rows in the table present during the last full-text population, represents millions of rows, assign the table to its own full-text catalog.


    Administering Full-Text Search

    Guidelines in administering full-text indexes (Continued)

    • A full-text index is a special type of token-based functional index that is built and maintained by theMicrosoft Full-Text Engine for SQL Server (MSFTESQL) service. The process of building a full-text index is quite different from building other types of indexes. Instead of constructing a B-tree structure based on a value stored in a particular row, MSFTESQL builds an inverted, stacked, compressed index structure based on individual tokens from the text being indexed.

    • In SQL Server 2005, the size of a full-text index is limited only by the available memory resources of the computer on which the instance of SQL Server is running.

    • You can build full-text indexes on textual data stored in char, nchar, varchar, nvarchar, varchar(max), text, ntext, image, varbinary, and varbinary(max), and xml columns. However, the image, varbinary, and varbinary(max) columns require special handling if you want to use them for full-text processing.

    • You use protocol handlers and filters when you want to create a full-text index on a varbinary, varbinary(max), or image column. These services let you extract text from Word, Excel, and PowerPoint files as well as PDF and other files that are stored in a native format inside SQL Server. For the filters to work, you need to add a column to the table to contain a value that indicates the type of document stored in the column. The filter then loads up the binary stream stored in the column, strips all formatting information, and returns the text within the document to the word-breaker routine.


    Administering Full-Text Search

    Guidelines in administering full-text indexes (Continued)

    • After the word breaker-routine has a list of valid words for a row within a column, the full-text engine calculates tokens to represent the words.

    • A token is simply a compressed form of the original word that saves space and ensures that full-text indexes can be created in as compact a form as possible.

    • The full text-text functionality then builds all tokens in a column into inverted, stacked, compressed structure within a file that is used for search operations. This unique structure allows ranking and scoring algorithms to efficiently satisfy possible queries.


    Question?

    • Est-ce que vous avez des questions?


  • Login