Writing a Perl XS swig interface to the CLucene C++ text search engine

Peter Edwards

Perl XS and SWIG interface to CLucene C++ text search engine

Introduction
  • Peter Edwards ~ background
  • Subject ~ writing a Perl XS swig interface to the CLucene C++ text search engine

Aims
  • Give an idea of the process involved in selecting and using an external library from Perl
  • Introduction to extending Perl using XS, swig, GNU autotools
  • Entertainment
  • Audience: What is your background and interest?

Topics
  • Understanding the Problem
  • The Answer (at a high level)
  • Technical Options
  • Investigating Options
  • Writing a perl / C++ Interface
  • Layers and Components
  • Lessons Learned

Process

Extending Perl

Terms
  • Perl ~ Pathologically Eclectic Rubbish Lister
    $_ = "wftedskaebjgdpjgidbsmnjgc"; tr/a-z/oh, turtleneck Phrase Jar!/; print;
  • Perl XS ~ eXternal Subroutine
    allows a Perl program to call a C language subroutine
    XS is also the “glue” language specifying the calling interface
    contains complex “perlguts” stuff that will destroy your sanity
  • SWIG ~ Simplified Wrapper and Interface Generator
    makes it easy to call a C/C++ library from many languages (Perl, Python, Ruby, PHP…)
  • C++ ~ Object Oriented version of C programming language
  • text search ~ boolean searching of stemmed words, wildcards
  • CLucene ~ C++ text search engine based on Java Lucene

Understanding the Problem
  • Recruitment software written in Perl
  • 20,000+ candidate Word CVs/resumes
  • Boolean searching using words or partial words and wildcards
    e.g. (“BA” or “MA”) and “literature”
  • Combined with SQL searching
    e.g. geographic area, skill profile codes, pay rate
  • Speed < 2 seconds
  • Old system used dtSearch proprietary s/w
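
A boolean query such as (“BA” or “MA”) and “literature” can be read as set operations on the candidate numbers matched by each word. The sketch below is only an illustration of that idea, not how dtSearch or CLucene evaluate queries; the names DocSet, match_or and match_and are made up for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Candidate numbers matched by one search word (hypothetical type name).
using DocSet = std::set<int>;

// "a OR b": candidates matched by either word.
DocSet match_or(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.begin()));
    return out;
}

// "a AND b": candidates matched by both words.
DocSet match_and(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.begin()));
    return out;
}
```

With per-word sets ba, ma and literature, the example query becomes match_and(match_or(ba, ma), literature).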

The Answer (at a high level)

Load

  • Convert candidate CVs from Word to text using wvWare (OpenOffice) converter
  • Index text against candidate no.

Search

  • Search text -> cand nos -> SQL temp table
  • Normal SQL search on other criteria
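
The hand-over from text search to SQL could look roughly like the sketch below: the candidate numbers returned by the text search are turned into statements that fill a temporary table, which the normal SQL search can then join against. The table and column names (text_hits, cand_no) and the function name are assumptions for illustration, and the exact temporary-table syntax depends on the database in use.

```cpp
#include <string>
#include <vector>

// Build SQL that loads text-search hits into a temporary table.
// text_hits and cand_no are hypothetical names, not taken from the slides.
std::string hits_to_temp_table_sql(const std::vector<int>& cand_nos) {
    std::string sql =
        "CREATE TEMPORARY TABLE text_hits (cand_no INTEGER PRIMARY KEY);";
    if (cand_nos.empty()) return sql;  // no hits: leave the table empty

    sql += "\nINSERT INTO text_hits (cand_no) VALUES ";
    for (std::size_t i = 0; i < cand_nos.size(); ++i) {
        if (i > 0) sql += ", ";
        sql += "(" + std::to_string(cand_nos[i]) + ")";
    }
    sql += ";";
    return sql;
}
```

Because the values are integers produced by the search engine rather than user text, they are formatted directly here; anything user-supplied should go through bound parameters instead.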

Technical Options (at 2003/4)

Proprietary

  • dtSearch ~ cost; hard to get cand nos out; Windows interface when perl app is Web

Open Source

  • Java Lucene ~ slow but good API and power
  • C++ CLucene ~ alpha quality rewrite of Lucene in Visual C++ as degree project by Ben van Klinken
  • Perl CPAN (PLucene etc.) ~ see below
    http://search.cpan.org/modlist/String_Language_Text_Processing

Investigating Perl Options
  • Wrote test harness to load 1000 CVs then do some searches
  • Tried about 5 CPAN modules
  • PLucene search speed okay for small volumes but exponential increase in insert time: >60 seconds per insert
  • Why? Tokenises doc, multi-lingual word stemming, adds doc id to reverse lookup index for each stem token
  • Other modules faster but search options weak

Need to look further
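
The per-insert work listed above (tokenise the document, then add the doc id to a reverse lookup index for each token) can be sketched as follows. This is a minimal in-memory illustration, not PLucene's or CLucene's implementation: the class name InvertedIndex is made up, tokens are simply lowercased alphanumeric runs, and stemming is left out.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>

// Reverse lookup index: token -> ids of the documents that contain it.
// Assumption: tokens are lowercased alphanumeric runs; no stemming is done.
class InvertedIndex {
public:
    // Tokenise text and record doc_id against every token found.
    void add_document(int doc_id, const std::string& text) {
        std::string token;
        for (char ch : text) {
            unsigned char c = static_cast<unsigned char>(ch);
            if (std::isalnum(c)) {
                token += static_cast<char>(std::tolower(c));
            } else if (!token.empty()) {
                postings_[token].insert(doc_id);
                token.clear();
            }
        }
        if (!token.empty()) postings_[token].insert(doc_id);  // last token
    }

    // Ids of documents containing the (lowercase) token; empty if none.
    std::set<int> lookup(const std::string& token) const {
        auto it = postings_.find(token);
        return it == postings_.end() ? std::set<int>() : it->second;
    }

private:
    std::map<std::string, std::set<int>> postings_;
};
```

Every insert touches one posting list per distinct token, so the cost of an insert depends heavily on how those lists are stored and updated.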

Investigating CLucene
  • Wrote similar C++ test harness
  • Speed good:
    search 20,000 CVs in <1 second
    load 3 CVs per sec (mostly Word->text)
  • Code written as VC++ degree project and registered at SourceForge
  • Jimmy Pritts changed the layout and added GNU autoconf files (configure.ac, Makefile.in) so that it builds cross-platform on Windows, Cygwin and Linux
  • Had C DLL interface used by PHP wrapper

Decided to write Perl wrapper

Interfacing Perl to C++
  • When I wrote this wrapper, Perl to C++ interfacing via XS or SWIG was tricky. Despite the optimism expressed at http://www.johnkeiser.com/perl-xs-c++.html, I had difficulties mapping the CLucene API to XS
  • Reasons: C++ namespaces and name mangling; object and method mapping; C++ memory management (who owns and frees each object)
  • So I decided to go via the C DLL wrapper to hide this complexity
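
The general shape of such a C wrapper can be sketched as below: the C++ objects stay inside the library, and callers only ever see plain C functions and integer handles. Everything here (demo::Index, demo_open, demo_path, demo_close) is a hypothetical stand-in written for this illustration; it is not the CLucene DLL interface.

```cpp
#include <map>
#include <memory>
#include <string>

// Stand-in for a C++ library class; not the real CLucene API.
namespace demo {
class Index {
public:
    explicit Index(const std::string& path) : path_(path) {}
    const std::string& path() const { return path_; }
private:
    std::string path_;
};
}  // namespace demo

namespace {
// handle -> owned object; erasing an entry destroys the C++ object.
std::map<int, std::unique_ptr<demo::Index>> g_open;
int g_next_handle = 1;
}  // namespace

// extern "C" switches off C++ name mangling for these functions, so C code
// (including generated XS) can link against them by their plain names.
extern "C" {

// Returns a positive handle, or 0 if path is NULL.
int demo_open(const char* path) {
    if (!path) return 0;
    int handle = g_next_handle++;
    g_open[handle] = std::unique_ptr<demo::Index>(new demo::Index(path));
    return handle;
}

// Returns the path behind a handle, or NULL for an unknown handle.
const char* demo_path(int handle) {
    auto it = g_open.find(handle);
    return it == g_open.end() ? nullptr : it->second->path().c_str();
}

// Destroys the object behind a handle; 1 on success, 0 if unknown.
int demo_close(int handle) {
    return g_open.erase(handle) == 1 ? 1 : 0;
}

}  // extern "C"
```

The wrapper takes care of all three difficulties at once: no mangled names cross the boundary, methods become flat functions taking a handle, and object lifetime is managed in one place.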

Perl XS
  • Always start with h2xs utility
  • Code is C with macro extensions
  • Write C code (XSUBs)
  • Call internal Perl routines (perlguts) to create variables, allocate arrays…
    newSViv(IV), sv_setiv(SV*, IV) ~ scalar integer variable
  • Complicated
  • Nyarlathotep / “Crawling Chaos”

Enter SWIG
  • Creates XS for you from a .i definition file
  • Parses C/C++ .h header files to get types and function prototypes
  • Allows for inline C/XS code

Swig XS Sample

From argv.i

// Creates a new Perl array and places a NULL-terminated char ** into it
%typemap(out) char ** {
    AV *myav;
    SV **svs;
    int i = 0,len = 0;
    /* Figure out how many elements we have */
    while ($1[len])
        len++;
    svs = (SV **) malloc(len*sizeof(SV *));
    for (i = 0; i < len ; i++) {
        svs[i] = sv_newmortal();
        sv_setpv((SV*)svs[i],$1[i]);
    };
    myav = av_make(len,svs);
    free(svs);
    $result = newRV((SV*)myav);
    sv_2mortal($result);
    argvi++;
}

Diagram of Layers

  • Perl OO Wrapper ~ CLucene.pm
  • Low Level Perl ~ CLuceneWrap.pm (SWIG generated)
  • SWIG XS C Code ~ clucene_wrap.c (SWIG generated)
  • C DLL Interface ~ clucene_dll.o
  • CLucene C++ Library ~ clucene.so

CLucene C++ Interface

src/CLucene/search/SearchHeader.h:

#include "CLucene/StdHeader.h"
#ifndef _lucene_search_SearchHeader_
#define _lucene_search_SearchHeader_

#include "CLucene/index/IndexReader.h"

using namespace lucene::index;

namespace lucene{ namespace search{

    //predefine classes
    class Searcher;
    class Query;
    class Hits;

    class HitDoc {
    public:
        float_t score;
        int_t id;
        lucene::document::Document* doc;

        HitDoc* next; // in doubly-linked cache
        HitDoc* prev; // in doubly-linked cache

        HitDoc(const float_t s, const int_t i);
        ~HitDoc();
    };

CLucene C DLL Interface

src/wrappers/dll/clucene_dll.h:

#ifndef _DLL_CLUCENE
#define _DLL_CLUCENE

#include "CLucene/CLConfig.h"

#ifdef _UNICODE
//unicode methods
# define CL_UNLOCK CL_U_Unlock
# define CL_OPEN CL_U_Open
# define CL_DOCUMENT_INFO CL_U_Document_Info
# define CL_ADD_FILE CL_U_Add_File

CLUCENEDLL_API int CL_U_Unlock(const wchar_t* dir);
CLUCENEDLL_API int CL_U_Delete(const int resource, const wchar_t* query,
    const wchar_t* field);
CLUCENEDLL_API int CL_U_Add_Field(const int resource, const wchar_t* field,
    const wchar_t* value, const int value_length, const int store,
    const int index, const int token);
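
The header above maps one generic name per operation onto a wide-character (CL_U_…) variant when _UNICODE is defined. The same compile-time dispatch pattern is sketched below with made-up functions (demo_n_length, demo_u_length) and macros (DEMO_LENGTH, DEMO_T); none of these exist in CLucene.

```cpp
#include <cstring>
#include <cwchar>

// Hypothetical narrow and wide variants of the same operation.
extern "C" int demo_n_length(const char* s) {
    return static_cast<int>(std::strlen(s));
}
extern "C" int demo_u_length(const wchar_t* s) {
    return static_cast<int>(std::wcslen(s));
}

// One generic name per operation, chosen at compile time.
#ifdef _UNICODE
#  define DEMO_LENGTH demo_u_length
#  define DEMO_T(s) L##s   /* wide string literal */
#else
#  define DEMO_LENGTH demo_n_length
#  define DEMO_T(s) s      /* narrow string literal */
#endif
```

Calling code writes DEMO_LENGTH(DEMO_T("index")) and gets whichever variant the build selected. A wrapper generator that reads the header directly sees the real function names rather than the macros, which is one reason to give it a separate, simpler header.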

SWIG Definition File clucene.i

%module "FulltextSearch::CLuceneWrap"
%{
#include "clucene_dllp.h"
%}

// our definitions for CLucene variables and functions
%include "clucene_perl.h"
//%include "clucene_dll.h" // could use this but then would need to call CL_N_Search not CL_SEARCH etc.

%include typemaps.i
%include argv.i

// helper functions where pointers to result buffers are expected
// would be better done with a %typemap(out) if I knew enough about perlguts
%inline %{
int val_len;
char * val;

int CL_GetField1(int resource, char * field)
{
    return CL_GETFIELD(resource,field,&val,&val_len);
}
%}
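
The helper pattern used in the definition file (call a function that returns its result through pointer arguments, then park the result in globals that the scripting side can read) can be shown in isolation. The sketch below uses a made-up stand-in, demo_get_field, in place of the real CL_GETFIELD, and hypothetical globals g_val and g_val_len.

```cpp
#include <cstring>

// Stand-in for a C API that returns its result through pointer arguments.
// Hypothetical: this is not the real CL_GETFIELD.
extern "C" int demo_get_field(int resource, const char* field,
                              const char** value, int* value_len) {
    static const char kName[] = "Jane Doe";
    if (resource != 1 || std::strcmp(field, "name") != 0) return 0;
    *value = kName;
    *value_len = static_cast<int>(sizeof(kName) - 1);
    return 1;
}

// Globals that a generated wrapper can expose as package variables.
const char* g_val = nullptr;
int g_val_len = 0;

// Helper with a wrapper-friendly signature: no pointer-to-pointer
// arguments, the result is left in g_val / g_val_len.
int demo_get_field1(int resource, const char* field) {
    return demo_get_field(resource, field, &g_val, &g_val_len);
}
```

The price of this shortcut is shared state: each call overwrites the previous result, so the caller has to copy the value out before calling again, and the helper is not safe to use from several threads.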

SWIG-Generated XS CLuceneWrap.pm

# This file was automatically generated by SWIG
package FulltextSearch::CLuceneWrap;
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
package FulltextSearch::CLuceneWrapc;
bootstrap FulltextSearch::CLuceneWrap;
package FulltextSearch::CLuceneWrap;
@EXPORT = qw( );

# ---------- BASE METHODS -------------

package FulltextSearch::CLuceneWrap;

sub TIEHASH {
    my ($classname,$obj) = @_;
    return bless $obj, $classname;
}

sub CLEAR { }

# ------- FUNCTION WRAPPERS --------

package FulltextSearch::CLuceneWrap;

*CL_OPEN = *FulltextSearch::CLuceneWrapc::CL_OPEN;
*CL_CLOSE = *FulltextSearch::CLuceneWrapc::CL_CLOSE;

# ------- VARIABLE STUBS --------

package FulltextSearch::CLuceneWrap;

*clucene_perl = *FulltextSearch::CLuceneWrapc::clucene_perl;
*NULL = *FulltextSearch::CLuceneWrapc::NULL;
*val_len = *FulltextSearch::CLuceneWrapc::val_len;
*val = *FulltextSearch::CLuceneWrapc::val;
*errstr = *FulltextSearch::CLuceneWrapc::errstr;

SWIG-Generated XS clucene_wrap.c

#ifdef __cplusplus
extern "C" {
#endif

XS(_wrap_CL_OPEN) {
    {
        char *arg1 ;
        int arg2 = (int) 1 ;
        int result;
        int argvi = 0;
        dXSARGS;

        if ((items < 1) || (items > 2)) {
            SWIG_croak("Usage: CL_OPEN(path,create);");
        }
        if (!SvOK((SV*) ST(0))) arg1 = 0;
        else arg1 = (char *) SvPV(ST(0), PL_na);
        if (items > 1) {
            arg2 = (int) SvIV(ST(1));
        }
        result = (int)CL_OPEN(arg1,arg2);

        ST(argvi) = sv_newmortal();
        sv_setiv(ST(argvi++), (IV) result);
        XSRETURN(argvi);
        fail:
        ;
    }
    croak(Nullch);
}

CLucene.pm Perl OO Wrapper
  • Back into the realms of sanity
  • Normal OO package with methods
  • Calls XS wrapper functions

sub open
{
    my $this = shift;
    my %arg = @_;
    my $path = $arg{path} || $this->{path} || confess "path undefined";
    my $create = anyof ( $arg{create}, $this->{create}, 0 );
    # report $path here: $this->{path} is not set until the open succeeds
    $this->{resource} = FulltextSearch::CLuceneWrap::CL_OPEN ( $path, $create )
        or confess "Failed to CL_OPEN $path create $create errstr ".$this->errstrglobal();
    $this->{path} = $path;
    $this;
}

Build Environment
  • Uses GNU autotools and m4 macro processor

Definition files

  • configure.ac ~ top level build definitions
  • Makefile.am ~ makefile flags definitions

Programs

  • libtool ~ generalised library building
  • aclocal ~ builds aclocal.m4 from configure.ac
  • autoconf ~ reads configure.ac to create configure script
  • autoheader ~ creates C header defines for configure
  • automake ~ creates Makefile.in from Makefile.am
  • autoreconf ~ manually remake whole tree of GNU build files

Bootstrap shell script

#!/bin/sh
# Bootstrap the CLucene installation.

mkdir -p ./build/gcc/config

set -x
libtoolize --force --copy --ltdl --automake
aclocal
autoconf
autoheader
automake -a --copy --foreign

Autoconf configure.ac file

dnl Process this file with autoconf to produce a configure script.
dnl Written by Jimmy Pritts.

dnl initialize autoconf and automake
AC_INIT([clucene], [1])
AC_PREREQ([2.54])
AC_CONFIG_SRCDIR([src/CLucene.h])
AC_CONFIG_AUX_DIR([./build/gcc/config])
AC_CONFIG_HEADERS([config.h])
AM_INIT_AUTOMAKE

dnl Check for existence of a C and C++ compilers.
AC_PROG_CC
AC_PROG_CXX

dnl Check for headers
AC_HEADER_DIRENT

dnl Configure libtool.
AC_PROG_LIBTOOL

dnl option to use UTF-8 as internal 8-bit charset to support characters in Unicode™
AC_ARG_ENABLE(utf8,
  AC_HELP_STRING([--enable-utf8],[UTF-8 as internal 8-bit charset to support characters in Unicode™ (default=no)]),
  [AC_DEFINE([UTF8],[],[use UTF-8 as internal 8-bit charset to support characters in Unicode™])],enable_utf8=no)
AM_CONDITIONAL(USEUTF8, test x$enable_utf8 = xyes)

AC_CONFIG_FILES([Makefile src/Makefile examples/Makefile examples/demo/Makefile examples/tests/Makefile examples/util/Makefile wrappers/Makefile wrappers/dll/Makefile wrappers/dll/dlltest/Makefile])
AC_OUTPUT

Makefile.am files

src/Makefile.am:

AUTOMAKE_OPTIONS = 1.6
include_HEADERS = CLucene.h
lsrcdir = $(top_srcdir)/src/CLucene
lib_LTLIBRARIES = libclucene.la
libclucene_la_SOURCES =
include CLucene/analysis/Makefile.am
include CLucene/analysis/standard/Makefile.am
include CLucene/debug/Makefile.am
include CLucene/document/Makefile.am
include CLucene/index/Makefile.am
include CLucene/queryParser/Makefile.am
include CLucene/search/Makefile.am
include CLucene/store/Makefile.am
include CLucene/util/Makefile.am
include CLucene/Makefile.am

./Makefile.am:

## Makefile.am -- Process this file with automake to produce Makefile.in
INCLUDES = -I$(top_srcdir)
SUBDIRS = src wrappers examples .

src/CLucene/document/Makefile.am:

documentdir = $(lsrcdir)/document
dochdir = $(includedir)/CLucene/document
libclucene_la_SOURCES += $(documentdir)/DateField.cpp
libclucene_la_SOURCES += $(documentdir)/Document.cpp
libclucene_la_SOURCES += $(documentdir)/Field.cpp
doch_HEADERS = $(documentdir)/*.h

Recap
  • We saw how and why I selected an external library to use from Perl
  • We looked at GNU autotools to provide a cross-platform build environment
  • We investigated the layers of code needed to interface perl to a C++ library ~ SWIG, C, XS inline helpers, low and high level Perl modules

Lessons Learned
  • Start off a new external library using GNU autotools and keeping in mind that the API should be easy to use through SWIG
  • Use SWIG not XS to wrap a C/C++ library
  • Always use h2xs to start a Perl extension
  • Open Source feedback and testing are more valuable than you expect (2 emails this week alone)

Where to Get More Information
  • Perl XS
    http://en.wikipedia.org/wiki/XS_%28Perl%29
    http://www.perl.com/doc/manual/html/pod/perlguts.html
  • C++ / XS
    http://www.johnkeiser.com/perl-xs-c++.html
  • SWIG
    http://en.wikipedia.org/wiki/SWIG
    http://www.swig.org/
  • Lucene
    http://en.wikipedia.org/wiki/Lucene
  • CLucene
    http://sourceforge.net/projects/clucene/
  • Autoconf
    http://www.gnu.org/software/autoconf/
  • Book “Extending and Embedding Perl”, Jenness & Couzens (Manning, 2002)
  • Any Questions?
  • These slides are at http://perl.dragonstaff.com/
