CQL – a Common Query Language

What CQL is

Motivation

Examples and explanation

Applications

Implementation

CQL – a Common Query Language

Mike Taylor <mike@indexdata.com>


Chapter 1: What CQL is

  • CQL is a query language:

    – For humans to type

    – For query forms to generate

    – For translating other languages into

  • The only query language of SRW/SRU

  • Also applicable in other contexts:

    – Z39.50 (instead of the Type-1 Query)

    – Query boxes for web searches



Chapter 2: Motivation

Most query languages fall into one of two camps:

  • Complex and powerful, but cryptic and hard to learn

    – SQL, Prefix Query Format (PQF), XML Query

  • Easy to learn and use, but lacking in power

    – Google, AltaVista, CCL

CQL aims to “make simple queries easy, and complex queries possible” (to paraphrase Larry Wall, of Perl).



[Figure: Learning curves for query languages, plotting effort in learning the query language against the power of query that can be expressed, for SQL, Google and CQL.]



Chapter 3: Examples and explanation

Important concepts

  • Simple terms

  • Quoting

  • Booleans

  • Parentheses

  • Pattern matching

  • Word anchoring

  • Indexes

  • Prefixes

  • Context sets

  • Relations

Esoteric concepts

  • Proximity

  • Relation modifiers

  • Boolean modifiers

  • Prefix mapping



CQL features: simple terms

Here are some perfectly good CQL queries:

  • fish

  • Churchill

  • dinosaur

  • comp.sources.misc



CQL features: quoting

Double-quote marks remove the special meanings of special characters like space (which otherwise separates tokens) and of keywords such as “and” and “or”.

  • "dinosaur"

  • "the complete dinosaur"

  • "ext->u.generic"

  • "and"

  • "the \"nuxi\" problem"

    (Backslash removes the special meaning of following double-quote characters.)
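The quoting rule above can be sketched in a few lines of Python. This is only an illustration: the function name `quote_term` and the exact set of characters treated as special are assumptions made here, not something the slides or the CQL specification define.

```python
# Keywords that must be quoted to be used as ordinary search terms.
# (Assumed list for illustration; "prox" is introduced later in the talk.)
CQL_KEYWORDS = {"and", "or", "not", "prox"}

# Characters assumed to be special in an unquoted term.
SPECIAL_CHARS = '"()=<>/'


def quote_term(term: str) -> str:
    """Return `term` unchanged if it is safe, otherwise double-quoted."""
    needs_quotes = (
        term == ""
        or any(ch.isspace() for ch in term)
        or any(ch in SPECIAL_CHARS for ch in term)
        or term.lower() in CQL_KEYWORDS
    )
    if not needs_quotes:
        return term
    # A backslash removes the special meaning of an embedded double quote.
    return '"' + term.replace('"', '\\"') + '"'


for raw in ["dinosaur", "the complete dinosaur", "and", 'the "nuxi" problem']:
    print(quote_term(raw))
```

A query form that builds CQL from user input could pass every typed term through a helper like this before combining the terms with booleans.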



CQL features: booleans

The keywords “and” and “or” are boolean operators. The keyword “not” is an and-not binary operator. There is no unary negation operator. Case is not significant, so “AND” and “aNd” also work.

  • dinosaur or bird

  • dinosaur not reptile

  • dinosaur and bird and reptile

  • dinosaur and bird or dinobird

  • dinosaur not theropod not ornithischian



CQL features: boolean precedence

The “and”, “or” and “not” booleans all have equal precedence and are evaluated left-to-right.

  • dinosaur and bird or dinobird

    MEANS

    (dinosaur and bird) or dinobird

  • dinosaur or bird and dinobird

    MEANS

    (dinosaur or bird) and dinobird

    NOT

    dinosaur or (bird and dinobird)



CQL features: parentheses

Parentheses may be used to override the default left-to-right parsing of boolean operators.

  • dinosaur and (bird or dinobird)

  • dinosaur or (bird and dinobird)

  • (bird or dinosaur) and (feathers or scales)

  • "feathered dinosaur" and (yixian or jehol)

  • (((a and b) or (c not d) not (e or f and g)) and h not i) or j
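To make the grouping rules concrete, here is a small recursive-descent sketch in Python. It is not one of the CQL implementations mentioned in this talk; the names `parse`, `TOKEN` and `BOOLEANS` are hypothetical, and the sketch handles only terms, quoted strings, the three booleans and parentheses. It gives all booleans equal precedence, groups them left-to-right, and lets parentheses override that.

```python
import re

# Tokens: parentheses, double-quoted strings (with backslash escapes),
# or runs of characters containing no whitespace or parentheses.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()]+')
BOOLEANS = {"and", "or", "not"}


def parse(query: str):
    """Parse a simplified CQL query into nested (operator, left, right) tuples."""
    tokens = TOKEN.findall(query)
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("query ends where a term was expected")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    def expression():
        nonlocal pos
        node = operand()
        # Equal precedence: each boolean simply wraps everything parsed so far.
        while pos < len(tokens) and tokens[pos].lower() in BOOLEANS:
            op = tokens[pos].lower()
            pos += 1
            node = (op, node, operand())
        return node

    tree = expression()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return tree


print(parse("dinosaur or bird and dinobird"))
print(parse("dinosaur or (bird and dinobird)"))
```

The first call groups the query as (dinosaur or bird) and dinobird; the second keeps the parenthesised sub-query together.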



CQL features: pattern matching

There are two pattern-matching characters:

* matches any number of characters

? matches any single character

A preceding backslash removes their special meaning.

  • dinosaur* – matches “dinosaurs”, “dinosauria”

  • *sauria – matches “dinosauria”, “carnosauria”

  • man?raptor – matches “maniraptor”, “manuraptor”

  • man?raptor* – matches the plurals of these

  • "the comp*saur" – matches “the complete dinosaur”

  • char\* – matches literal “char*”
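One way to picture these pattern characters is to translate a CQL pattern into a regular expression. The Python sketch below does that; the helper names `cql_pattern_to_regex` and `matches` are assumptions for illustration, and a real server would normally apply patterns inside its index rather than with regular expressions.

```python
import re


def cql_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate CQL's * and ? wildcards (with backslash escapes) to a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A preceding backslash makes the next character literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".")    # any single character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def matches(pattern: str, text: str) -> bool:
    """True if the whole of `text` matches the CQL pattern."""
    return cql_pattern_to_regex(pattern).fullmatch(text) is not None


print(matches("man?raptor", "maniraptor"))
print(matches("char\\*", "char*"))
```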



CQL features: word anchoring

A word beginning with “^” must occur at the start of its field. A word ending with “^” must occur at the end of its field.

  • dinosaur – matches “the complete dinosaur”

  • dinosaur^ – also matches

  • ^dinosaur – does not match

  • the – matches “the complete dinosaur”

  • ^the – also matches

  • the^ – does not match
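The anchoring examples above can be checked with a short Python sketch. It assumes a field is a plain string split into words on whitespace and compared case-insensitively; the function name `anchored_match` is hypothetical.

```python
def anchored_match(term: str, field: str) -> bool:
    """Check a single-word term, optionally anchored with ^, against a field."""
    words = field.lower().split()
    word = term.strip("^").lower()
    if not words:
        return False
    if term.startswith("^") and words[0] != word:
        return False   # must be the first word of the field
    if term.endswith("^") and words[-1] != word:
        return False   # must be the last word of the field
    return word in words


field = "the complete dinosaur"
for term in ["dinosaur", "dinosaur^", "^dinosaur", "the", "^the", "the^"]:
    print(term, anchored_match(term, field))
```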



CQL features: indexes

A term of the form name=value is a query for the specified value occurring within the named index.

  • title=Churchill – finds biographies of Churchill

  • author=Churchill – finds books written by him

  • title=dinosaur and author=farlow

  • title=(dinosaur and bird)

  • subject=(dinosaur* or pterosaur*)

Index names are case-insensitive, so “title” is the same index as “TITLE”, “Title” or “tiTLe”.



CQL features: prefixes

The meaning of an index can be specified more fully by a prefix indicating what context set it is from. The meaning of “title” is different in cross-domain searching (Dublin Core), bibliographic searching (Bath Profile) and heraldry.

  • dc.title="the complete dinosaur"

  • property.title=freehold

  • heraldry.title=(viscount or duke)

  • cql.serverChoice=fruit

  • cql.resultSetId=YXJjaGJpc2hvcAp

    Prefixes are case-insensitive.



CQL features: context sets

A context set is a set of indexes that are related to a particular area (plus some other more esoteric stuff that you can ignore).

For example, the Dublin Core context set contains indexes for searching against the fifteen DC elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights.

The prose description of each context set must define the semantics of its indexes.



CQL features: some context sets

A few core sets created by the SRW editorial board:

  • CQL – for core indexes such as resultSetId

  • DC – for metadata searching with Dublin Core

  • Rec – metadata about the record, not the resource

  • Net – network concepts such as hostname and port

    Also, many application-specific sets:

  • Bath, Zthes, CCG, Music

  • Rel – deep voodoo for relevance matching

  • GILS is in development

    Where do context sets come from?

  • You can just make them up! No-one can stop you!



A digression on the CQL context set

The CQL context set is special. It contains some “magic” indexes:

  • cql.anywhere – searches in all the indexes available

  • cql.serverChoice – allows the server to choose whatever index or indexes are suitable

  • cql.resultSetId – finds the records obtained in a previous search, e.g. for refinement by combining with other query terms.



CQL features: relations

Usually “=” connects an index with its term, but all the other obvious numeric relations are supported:

  • Height = 13

  • numberOfWheels <= 3

  • numberOfPlates < 18

  • lengthOfFemur > 2.4

  • BioMass >= 100

  • NumberOfToes <> 3 (inequality)
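As a rough illustration of how a server might evaluate these numeric relations, the Python sketch below maps each relation symbol onto a comparison function. The `evaluate` helper and the dictionary-based record are assumptions made for this example; index names are lower-cased because CQL index names are case-insensitive.

```python
import operator

# CQL relation symbols mapped onto Python comparison functions.
RELATIONS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def evaluate(clause: str, record: dict) -> bool:
    """Evaluate a clause of the form 'index relation number' against a record.

    `record` is a hypothetical mapping from lower-cased index name to a number.
    """
    index, relation, term = clause.split()
    return RELATIONS[relation](record[index.lower()], float(term))


record = {"height": 13, "numberofwheels": 4, "lengthoffemur": 2.5, "numberoftoes": 3}
for clause in ["Height = 13", "numberOfWheels <= 3", "lengthOfFemur > 2.4", "NumberOfToes <> 3"]:
    print(clause, evaluate(clause, record))
```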



CQL features: special relations

The keywords “any” and “all” can be used as relations, indicating that any one of, or all of, the words specified in the term must be found in the index:

  • author all "kernighan ritchie"

    – shorthand for

    author=kernighan and author=ritchie

  • author any "kernighan ritchie thompson"

    – shorthand for

    author=kernighan or author=ritchie or author=thompson
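The shorthand can be expanded mechanically. A minimal Python sketch follows, with the function name `expand_any_all` chosen here for illustration; it assumes the term is a whitespace-separated list of simple words that need no quoting.

```python
def expand_any_all(index: str, relation: str, term: str) -> str:
    """Expand 'index any/all "w1 w2 ..."' into the equivalent boolean query."""
    joiner = {"all": " and ", "any": " or "}[relation.lower()]
    return joiner.join(f"{index}={word}" for word in term.split())


print(expand_any_all("author", "all", "kernighan ritchie"))
print(expand_any_all("author", "any", "kernighan ritchie thompson"))
```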



CQL features: esoterica

“You are not expected to understand this.”

– comment in the Unix Version 6 source code.

The point is that new users are not required to understand this, and may happily use CQL for many years – perhaps forever – without needing to.



CQL esoterica: proximity

The “prox” boolean, by default, requires its operands to be next to each other, in either order:

  • cervical prox vertebra

    – equivalent to

    "cervical vertebra" or "vertebra cervical"

  • (cervical or dorsal) prox vertebra

    – equivalent to

    "cervical vertebra" or "dorsal vertebra" or "vertebra cervical" or "vertebra dorsal"



CQL esoterica: proximity II

Modifiers can generalise the semantics of proximity:

  • cervical prox/distance<=5/ vertebrae

    – within five words of each other

  • cervical prox/distance=0/unit=sentence vertebrae

    – within the same sentence

  • cervical prox/distance>0/unit=paragraph vertebrae

    – in different paragraphs

  • cervical prox/ordered vertebrae

    – in the specified order: exactly equivalent to "cervical vertebrae"

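The word-level cases of proximity can be sketched in Python as a check over a list of words. Everything here is an assumption for illustration: the function name `prox`, its parameters, and the convention that adjacent words are at distance 1. Sentence and paragraph units are not modelled.

```python
def prox(words, left, right, max_distance=1, ordered=False):
    """True if `left` and `right` occur within `max_distance` words of each other.

    With ordered=True, `left` must come before `right`.
    """
    left_positions = [i for i, w in enumerate(words) if w == left]
    right_positions = [i for i, w in enumerate(words) if w == right]
    for i in left_positions:
        for j in right_positions:
            gap = j - i
            if ordered and gap < 0:
                continue  # right operand appears before the left one
            if 0 < abs(gap) <= max_distance:
                return True
    return False


text = "the cervical vertebra of a sauropod".split()
print(prox(text, "cervical", "vertebra"))                  # adjacent, either order
print(prox(text, "vertebra", "cervical", ordered=True))    # wrong order
print(prox(text, "cervical", "sauropod", max_distance=5))  # within five words
```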


CQL esoterica: relation modifiers

Modifiers can refine the semantics of relations:

  • title =/stem dig

    – finds “dig”, “digging”, “dug”, etc.

  • title any/relevant "dinosaur bird reptile"

    – finds “sauropods”, “avian”, “crocodile”, “snake”, etc.

  • author =/fuzzy tailor

    – finds “Mike Taylor”

  • phoneNumber exact/fuzzy "020 8348 6768"

    – finds “020 8348 6769”



CQL esoterica: relation modifiers II

Relation modifiers can be overloaded to specify extra information about the term that the relation joins to the index:

  • createdDate >/isoDate "2004-03-12 09:45:00"

    – the term is in ISO 8601 format.

  • Location within/geom.polygon "(12,46) (15,52)"

    – the term indicates a polygon of two points (i.e. a straight line) rather than the corners of a rectangle.



CQL esoterica: boolean modifiers

Modifiers can refine the semantics of boolean operators.

We've already seen some examples of this in proximity.

  • cervical prox/distance<=5/ vertebrae

    – within five words of each other

  • cervical or/exclusive vertebrae

    – one or the other, but not both.

  • denenberg or/rel.mean "information retrieval"

  • denenberg or/rel.sum "information retrieval"

  • denenberg or/rel.max "information retrieval"

    – average, total or maximum relevance of operands
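Relation modifiers and boolean modifiers share the same slash-separated shape, so a parser can split them off in one step. The Python sketch below shows the idea; `split_modifiers` is a hypothetical helper, and it simply ignores an empty trailing modifier such as the one in prox/distance<=5/.

```python
def split_modifiers(token: str):
    """Split 'base/mod1/mod2' into the base relation or boolean and its modifiers."""
    base, *modifiers = token.split("/")
    # Drop empty pieces left behind by a trailing slash.
    return base.lower(), [m for m in modifiers if m]


for token in ["=/stem", "any/relevant", "prox/distance=0/unit=sentence", "or/rel.mean"]:
    print(split_modifiers(token))
```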



CQL esoterica: prefix mapping

So far, we have been free and easy with index prefixes such as “dc”. But how do we know what they mean? Why should “dc” mean Dublin Core rather than Deep Custard?

  • dc.custardDepth <= 20

Why should “bath” mean the Bath Profile for bibliographic searching instead of plumbing supplies?

  • bath.capacityInGallons > 45



CQL esoterica: prefix mapping II

Prefixes are just convenient, easy-to-type abbreviations.

The real identifier of a context set is its URI.

For example, the Dublin Core context set is

info:srw/cql-context-set/1/dc-v1.1

but we map that URI to a prefix for convenience.

This is exactly like XML namespaces: they are identified by URIs, but the URIs do not appear in the names of elements or attributes: short prefixes are used instead.



CQL esoterica: prefix mapping III

In XML, a prefix is associated with a namespace using:

  • <element xmlns:prefix="http://example.org/xyz/">

    In CQL, a prefix is associated with a context set using:

  • >prefix=http://example.org/xyz/

    and the rest of the query follows.

    The following queries are exactly equivalent:

  • >dc=info:srw/cql-context-set/1/dc-v1.1 dc.title=fish

  • >yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish

    Most applications will have established default mappings.



CQL esoterica: prefix mapping IV

It is possible to establish the context set from which indexes with no explicit prefix are taken by omitting the “prefix=” part from the mapping:

  • >http://example.org/heraldry/ title=baron and side=sinister

    So the following queries are exactly equivalent:

  • >info:srw/cql-context-set/1/dc-v1.1 title=fish

  • >yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish



CQL esoterica: prefix mapping V

Finally ... Finally! :-)

Prefix mappings can be stacked up:

  • >dc = info:srw/cql-context-set/1/dc-v1.1

    >bath=http://zing.z3950.org/cql/bath/2.0/

    >rec=info:srw/cql-context-set/2/rec-1.0

    rec.created < 2004-10-09 and

    dc.title=ecology and

    bath.conferenceName=dinosaur

    (Yes, this is all one query.)
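A short Python sketch shows how leading prefix mappings might be peeled off a query and used to resolve index names to context-set URIs. The names `split_prefix_mappings` and `resolve_index` and the regular expression are assumptions for illustration, not taken from any of the CQL implementations mentioned in this talk. A mapping with no prefix is stored under the key `None` and used for unprefixed index names.

```python
import re

# One mapping: ">" then an optional "prefix =" then a URI with no whitespace.
MAPPING = re.compile(r"\s*>\s*(?:(\w+)\s*=\s*)?(\S+)")


def split_prefix_mappings(query: str):
    """Return (mappings, remaining_query) for a query with stacked mappings."""
    mappings = {}
    pos = 0
    while True:
        m = MAPPING.match(query, pos)
        if m is None:
            break
        prefix, uri = m.groups()
        # Prefixes are case-insensitive; None marks the default context set.
        mappings[prefix.lower() if prefix else None] = uri
        pos = m.end()
    return mappings, query[pos:].strip()


def resolve_index(name: str, mappings: dict):
    """Resolve 'prefix.index' (or a bare index) to (context-set URI, index)."""
    prefix, dot, index = name.partition(".")
    if not dot:
        return mappings.get(None), name.lower()
    # Raises KeyError if the prefix was never mapped.
    return mappings[prefix.lower()], index.lower()


mappings, body = split_prefix_mappings(
    ">dc=info:srw/cql-context-set/1/dc-v1.1 dc.title=fish"
)
print(mappings, body)
print(resolve_index("dc.title", mappings))
```

Resolving every index to a (URI, index) pair before searching is what makes queries that differ only in their choice of prefix behave identically.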



CQL esoterica: prefix mapping VI

Don't try this at home.



Chapter 4: Applications

CQL has been deployed in many kinds of application:

  • Google-like structureless searching

  • Simple metadata searching with the Dublin Core

  • Bath Profile for bibliographic data

  • Zthes profile for hierarchical thesaurus navigation

  • CCG for collectable card games

  • Music – musicalKey, arranger, duration, etc.

  • GILS (Global Information Locator Service)

  • ... your application goes here!



Chapter 5: Implementations

There are good-quality free CQL implementations in several important languages:

  • Java (Mike Taylor's CQL-Java package)

  • C/C++ (Adam Dickmeiss in Index Data's YAZ)

  • Python (Rob Sanderson in Cheshire)

  • Perl (Ed Summers' CQL::Parser module)

  • Visual Basic is in development (Thomas Habing)

  • ... your language goes here!



Conclusion: What to take home

  • CQL makes easy queries easy and hard ones possible

  • You can use it well without learning the hard bits

  • It is used in SRW/SRU but also applicable elsewhere

  • It is extensible through context sets

  • Existing context sets support lots of applications

  • There are free implementations in several languages

  • Tutorial on-line at:

    http://zing.z3950.org/cql/intro.html



CQL esoterica: relation modifiers II

Relation modifiers can be used to define essentially new relations. Some hypothetical examples:

  • location </geom.within "(12,46) (15,52)"

    – points within the specified rectangle

  • task >/proj.prerequisite uiDesign

    – tasks that must be performed before the design of the user interface

  • location =/geography.sameState "Las Vegas"

    – places in the same state as Las Vegas
