Extraction and Analysis of Document Examiner Features from Vector Skeletons of Grapheme ‘th’

Vladimir Pervouchine and Graham Leedham
Forensics and Security Lab, School of Computer Engineering, Nanyang Technological University, Singapore

Outline
• Background of the problem
• Focus of our research
• Choice of features to study
• Extraction of features
• Analysis of feature usefulness
• Results and conclusions

Background
• Forensic document examination to determine authorship, disguise or forgery of handwriting has been carried out for over 100 years using standardised techniques.
• Since 1993, the scientific acceptability of forensic document examination has been successfully challenged in courts: the techniques used have no proven scientific basis.
• Srihari et al. have demonstrated that handwriting can be used to identify a person with reasonable accuracy.
• Recent experiments by Kam et al. and by Found & Rogers have shown that forensic document examiners perform better than lay people in writer identification and forgery detection.
• Thus, the techniques that examiners use may account for their better performance compared to lay people.

Our research focus
• Formalise some of the document examiner features so that they can be extracted reliably and unambiguously from handwriting.
• Study some frequent characters/graphemes and extract as many document examiner features as possible.
• Analyse the usefulness of the features for writer classification under conditions of genuine unconstrained handwriting.
• Determine whether the studied features can help to distinguish writers, that is, whether the use of these features in forensic document analysis makes sense.

Choice of document examiner features
• The features extracted correspond to some of the “21 discriminating elements of handwriting.”
• Only micro features are studied.
• In our previous experiments, the characters “d”, “y”, “f” and the grapheme “th” were used as the most frequent characters/grapheme with ascenders/descenders.
• The main contribution to writer classification accuracy was due to the features of “th”. In the current experiments, the features of “th” were studied.
• Use of vector skeletonisation allowed extraction of a number of micro features that could not be extracted previously.

Vector skeletonisation method
• 1st stage: vectorisation. Spline-approximated skeletal branches are formed.
• 2nd stage: the minimum cost configuration of branch interconnections is found. Branches are grouped into strokes.
• For each retraced segment of a stroke, restoration of the hidden loop is attempted.
• 3rd stage: near-junction and loop spline knots are adjusted to make strokes smoother.
Processing flow (diagram): Original image → Vectorisation → Binary encoding of junction points configuration → GA optimisation to find configuration with lowest cost → Adjustment of loop and near-junction knots

List of features
• Height
• Width
• Height to width ratio
• Distance HC
• Distance TC
• Distance TH
• Angle between TH and TC
• Slant of stem of t
• Slant of stem of h
• Position of t-bar
• Connected/disconnected t and h
• Average stroke width
• Average pseudo-pressure
• Standard deviation of average pseudo-pressure

List of features
• Standard deviation of stroke width
• Number of strokes
• Number of loops and retraced branches
• Straightness of t-stem
• Straightness of t-bar
• Straightness of h-stem
• Presence of loop at top of t-stem
• Presence of loop at top of h-stem
• Maximum curvature of h-knee
• Average curvature of h-knee
• Relative size (diameter) of h-knee
The position of t-bar feature is binary: 1 if the t-bar crosses the stem, and 0 if it touches the stem, is separated from it, or is missing.
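The distance and angle features in the list can be illustrated with a small sketch. The text does not define the points T, H and C, so the code below simply assumes they are three 2-D landmark points on the grapheme and that "angle between TH and TC" means the unsigned angle at T between the segments T to H and T to C. The function names are hypothetical and this is not the authors' extraction code.

```python
import math

def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def angle_between_degrees(t, h, c):
    """Unsigned angle at T between segments T->H and T->C, in degrees.

    Assumption: T, H and C are landmark points; the source does not
    say how they are located on the grapheme.
    """
    ax, ay = h[0] - t[0], h[1] - t[1]
    bx, by = c[0] - t[0], c[1] - t[1]
    cross = ax * by - ay * bx   # proportional to sin of the angle
    dot = ax * bx + ay * by     # proportional to cos of the angle
    return abs(math.degrees(math.atan2(cross, dot)))

# Hypothetical landmark coordinates, chosen only for illustration.
T, H, C = (0.0, 0.0), (3.0, 0.0), (0.0, 4.0)
features = {
    "distance_TH": distance(T, H),
    "distance_TC": distance(T, C),
    "distance_HC": distance(H, C),
    "angle_TH_TC": angle_between_degrees(T, H, C),
}
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` shows for nearly parallel segments.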

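Feature extraction
Definitions are given for the following quantities (formulas not preserved in this transcript):
• Average stroke width
• Slant of a stroke
• Average pseudo-pressure
• Standard deviation of pseudo-pressure
• Straightness of a stroke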
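The formulas for these quantities are not present in the text, so the sketch below uses common stand-in definitions that are assumptions, not the authors' formulas: slant as the angle of the end-to-end chord of a stroke, straightness as chord length divided by path length, and the average and standard deviation computed over per-point measurements. A stroke is assumed to be a list of (x, y) points with the y axis pointing up.

```python
import math
import statistics

def path_length(points):
    """Sum of the segment lengths along a polyline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def straightness(points):
    """Assumed definition: chord length / path length (1.0 = straight)."""
    length = path_length(points)
    if length == 0:
        return None  # degenerate stroke, feature not extracted
    return math.dist(points[0], points[-1]) / length

def slant_degrees(points):
    """Assumed definition: angle of the end-to-end chord from the
    horizontal axis, in degrees, with the y axis pointing up."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

def mean_and_std(values):
    """Mean and population standard deviation of per-point values,
    for example stroke widths sampled along the skeleton."""
    return statistics.mean(values), statistics.pstdev(values)

# Hypothetical stroke and width samples, for illustration only.
stroke = [(0.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
stroke_straightness = straightness(stroke)
stroke_slant = slant_degrees([(0.0, 0.0), (1.0, 1.0)])
avg_width, std_width = mean_and_std([2, 4, 4, 4, 5, 5, 7, 9])
```

The same `mean_and_std` helper could be applied to any per-point measurement list; how pseudo-pressure itself is measured is not stated in the text and is not modelled here.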

Feature extraction
Input: original image, binarised image, skeleton
• Extraction software performed analysis of shape to detect various parts of the character.
• Analysis was performed step by step.
• At each step some feature was extracted.
• If at least one feature was not extracted or was extracted incorrectly, the sample was counted as a “failure”.
Processing flow (diagram): Height, width, height to width ratio → Analysis of branches originating from top end points → Stem features → Search for t-bar → … → Feature vector
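The step-by-step scheme with failure counting can be sketched as follows. This is a minimal illustration under stated assumptions: a sample is a dict holding a list of skeleton points, only the first three features are implemented, and all step names are hypothetical. The sketch models only the "not extracted" case; detecting a feature that was extracted incorrectly would need a check that the text does not describe.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) of a list of 2-D points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def height_step(sample, features):
    pts = sample["skeleton"]
    if not pts:
        return None  # nothing to measure: extraction fails
    _, y0, _, y1 = bounding_box(pts)
    return y1 - y0

def width_step(sample, features):
    pts = sample["skeleton"]
    if not pts:
        return None
    x0, _, x1, _ = bounding_box(pts)
    return x1 - x0

def ratio_step(sample, features):
    # Uses features extracted by earlier steps.
    if features["width"] == 0:
        return None
    return features["height"] / features["width"]

# Hypothetical step list; the real pipeline continues with stem and t-bar analysis.
STEPS = [("height", height_step), ("width", width_step),
         ("height_to_width", ratio_step)]

def extract_features(sample, steps=STEPS):
    """Run the steps in order; return None as soon as one step fails."""
    features = {}
    for name, step in steps:
        value = step(sample, features)
        if value is None:
            return None
        features[name] = value
    return features

def run_batch(samples):
    """Return the extracted feature dicts and the number of failed samples."""
    vectors, failures = [], 0
    for sample in samples:
        result = extract_features(sample)
        if result is None:
            failures += 1
        else:
            vectors.append(result)
    return vectors, failures

samples = [{"skeleton": [(0, 0), (2, 4)]},   # extracts cleanly
           {"skeleton": []},                 # fails at the first step
           {"skeleton": [(1, 0), (1, 5)]}]   # zero width: ratio step fails
vectors, failures = run_batch(samples)
```

Passing the partially built feature dict to each step lets later steps reuse earlier results, which matches the sequential nature of the analysis.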

Search for the best feature sets
• The best feature subsets were searched for by a wrapper method.
• Each feature set is represented as a binary string: ‘1’ if the feature is included, ‘0’ otherwise.
• A DistAl neural network was used as the classifier.
Search loop (diagram): Initial randomly generated strings → GA with sharing: evolution of population of strings → Next generation of strings (feature subsets) → Evaluation of accuracy (string fitness) for each string by 5-fold cross validation → Array of fitness values (accuracies) → Best strings found
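A wrapper search of this kind can be sketched in a few dozen lines. The code below is an illustration under several assumptions, not the authors' setup: a nearest-centroid classifier stands in for DistAl, the data are synthetic, fitness sharing uses a simple Hamming-distance niche count, and the population size, mutation rate and other parameters are arbitrary.

```python
import random

def nearest_centroid_accuracy(train, test, mask):
    """Accuracy of a nearest-centroid classifier (stand-in for DistAl)
    using only the features whose mask bit is 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(idx))
        for j, i in enumerate(idx):
            acc[j] += x[i]
        counts[label] = counts.get(label, 0) + 1
    cents = {c: [s / counts[c] for s in sums[c]] for c in sums}
    correct = 0
    for x, label in test:
        pred = min(cents, key=lambda c: sum((x[i] - m) ** 2
                                            for i, m in zip(idx, cents[c])))
        correct += pred == label
    return correct / len(test)

def cv_fitness(data, mask, folds=5):
    """String fitness: mean accuracy over k folds (k = 5 as in the text)."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [s for i, s in enumerate(data) if i % folds != f]
        scores.append(nearest_centroid_accuracy(train, test, mask))
    return sum(scores) / folds

def shared(fitnesses, population, sigma=2):
    """Fitness sharing: divide each fitness by its niche count, where
    neighbours are strings at Hamming distance below sigma."""
    out = []
    for a, fit in zip(population, fitnesses):
        niche = sum(max(0.0, 1 - sum(p != q for p, q in zip(a, b)) / sigma)
                    for b in population)
        out.append(fit / niche)  # niche >= 1 because a string counts itself
    return out

def ga_feature_search(data, n_features, pop_size=20, generations=15, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_fit = None, -1.0
    for _ in range(generations):
        fits = [cv_fitness(data, m) for m in population]
        for m, f in zip(population, fits):
            if f > best_fit:
                best_mask, best_fit = m[:], f
        sel = shared(fits, population)
        nxt = []
        while len(nxt) < pop_size:
            parents = []
            for _ in range(2):  # binary tournament on shared fitness
                i, j = rng.randrange(pop_size), rng.randrange(pop_size)
                parents.append(population[i] if sel[i] >= sel[j] else population[j])
            cut = rng.randrange(1, n_features)           # one-point crossover
            child = parents[0][:cut] + parents[1][cut:]
            nxt.append([bit ^ (rng.random() < 0.05) for bit in child])  # mutation
        population = nxt
    return best_mask, best_fit  # best string among the evaluated generations

def make_toy_data(seed=1, per_class=30):
    """Synthetic data: features 0-1 are informative, features 2-5 are noise."""
    rng = random.Random(seed)
    data = []
    for label, centre in enumerate([0.0, 3.0, 6.0]):
        for _ in range(per_class):
            x = [rng.gauss(centre, 0.5), rng.gauss(-centre, 0.5)]
            x += [rng.gauss(0.0, 5.0) for _ in range(4)]
            data.append((x, label))
    rng.shuffle(data)
    return data

data = make_toy_data()
best_mask, best_fit = ga_feature_search(data, n_features=6)
```

Selection uses the shared fitness while the best string is tracked by raw accuracy, so sharing only affects diversity in the population, not what is reported as the result.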

Feature usefulness: results and conclusions
• Samples from 165 different writers
• Between 15 and 27 samples of ‘th’ per writer
Features extracted from ‘th’ (numbered 1..25):
• 17 indispensable features: 1, 4..12, 16..18, 20, 23..25
• 6 partially relevant features: 2, 3, 13, 19, 21, 22
• 2 irrelevant features: 14, 15
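As a quick consistency check, the three index lists can be expanded and compared with the stated counts. The helper name `expand` is hypothetical; the index strings are copied from the slide.

```python
def expand(spec):
    """Expand a spec such as '1,4..12' into a list of integers."""
    numbers = []
    for part in spec.split(","):
        if ".." in part:
            lo, hi = part.split("..")
            numbers.extend(range(int(lo), int(hi) + 1))
        else:
            numbers.append(int(part))
    return numbers

groups = {
    "indispensable": expand("1,4..12,16..18,20,23..25"),
    "partially relevant": expand("2,3,13,19,21,22"),
    "irrelevant": expand("14,15"),
}

# Every feature index from 1 to 25 should appear in exactly one group.
all_indices = sorted(i for members in groups.values() for i in members)
is_partition = all_indices == list(range(1, 26))
```

The groups contain 17, 6 and 2 indices respectively and together cover 1..25 with no overlap, which agrees with the counts given on the slide.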

Feature usefulness: results and conclusions
• A number of features commonly used by forensic document examiners do possess discriminative power. Hence, we conclude that the methods of forensic handwriting analysis are at least partially justified.
• Use of the vector skeletonisation algorithm increased the accuracy of feature extraction and gave better writer classification accuracy than our previous experiments on the same data set (58% for raster skeleton vs. 67% for vector skeleton).
