This project focuses on the Scene of Crime Information System (SOCIS), aimed at enhancing the indexing and retrieval of crime scene photographs using Natural Language Processing (NLP). By employing automated mechanisms for indexing and retrieving images, SOCIS supports crime scene officers in documenting evidence efficiently. The system facilitates fast access to relevant case information, patterns among cases, and effective storage of photographs and documentation. Preliminary evaluations demonstrate promising accuracy and usability, making SOCIS a potentially transformative tool for crime investigation.
Automatic indexing and retrieval of crime-scene photographs
Scene of Crime Information System (SOCIS)
Katerina Pastra, Horacio Saggion, Yorick Wilks
NLP group, University of Sheffield
Outline
• Application Scenario
• Project Overview
• SOCIS features
• Text-based approaches
• Using NLP:
  • The Indexing mechanism
  • The Retrieval mechanism
• Preliminary system evaluation
• Links
Cambridge 2002
Crime Scene Documentation: Current Practices
• Scene of Crime Officers:
  • attend crime scene
  • photograph the scene
  • collect evidence (package and label items)
  • write reports and create indexed photo-album(s)
• case-files piled in storage rooms
Examples
IT support for CSI
• Crime Investigation requires:
  • Fast and accurate retrieval of case-related info (and therefore efficient classification of this info)
  • Identification of “patterns” among cases
• IT support for Crime Investigation:
  • Governmental agencies’ systems (HOLMES)
  • Commercial systems (LOCARD, SOCRATES)
  (Crime Management and Administration Systems)
Needed: “Intelligent” support for Crime Investigation
Project Overview (2000-2003)
• Domain: Scene of Crime Investigation (SOC)
• Scenario: Use of digital photography and speech to populate a central police database with case-related information
• Objective: Creation of a prototype system that allows for intelligent indexing and retrieval of crime photographs
SOCIS features
• Access through the web (JSP application)
• Storage of case documentation & meta-information in central database
• Automatic indexing of photographs
• Automatic retrieval of photographs
• Automatic population of official forms
“view of deceased with computer cable removed”
Text-based image indexing & retrieval: approaches
• Manual assignment of keywords
• Automatic extraction of keywords (statistics +/ semantic expansion) [Smeaton’96, Sable’99, Rose’00]
• Extraction of logical form representations (syntactic relations and concept classification) [Rowe’99]
Precision and recall increase as indexing terms go beyond keywords and capture relational info
Text-based image indexing & retrieval: problems
• keyword barrier
• syntactic relations need to be complemented with semantic information
• Consider:
  • “view to the loft” vs. “view into loft”
  • “position of baby with no bedding”
  • “position of baby with bedding removed”
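The keyword barrier can be illustrated with a minimal sketch. It assumes a naive keyword indexer that lower-cases a caption, splits on whitespace, and drops stop words; the stop-word list and the function name `keywords` are hypothetical and not part of SOCIS. Under these assumptions the two "loft" captions receive the same index terms, although they describe different views.

```python
# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"to", "the", "into", "of", "with", "a", "an"}


def keywords(caption):
    """Return the set of non-stop-word tokens of a caption (naive keyword index)."""
    return frozenset(
        token for token in caption.lower().split() if token not in STOP_WORDS
    )


to_loft = keywords("view to the loft")
into_loft = keywords("view into loft")

# Both captions collapse to the same keyword set, so a keyword-only
# index cannot tell them apart.
print(sorted(to_loft), sorted(into_loft), to_loft == into_loft)
```

The prepositions that carry the difference are exactly the words a keyword indexer discards, which is one reason relational indexing terms are of interest here.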
Indexing-Retrieval Mechanism
• Pipeline of processing resources: tokeniser, sentence splitter, POS tagger, lemmatizer, NE recognizer, parser, discourse interpreter (+ triple extraction layer)
• Captions → indexing terms (triples: ARG1 REL ARG2)
• Free text query → query triples (ARG1 REL ARG2)
• Matching of query triples against indexing terms
• OntoCrime + KB
Corpus and Domain Model
• 1200 captions from 350 different crime cases dealt with by South Yorkshire Police (text files)
• 65 captions (transcribed speech experiment)
Different lengths but same characteristics: phrasal constructions, named entities, meta-info, “what” and “where” references
Domain model = OntoCrime and knowledge base
Role = selection restrictions for triples’ arguments and semantic expansion for retrieval
Triple Extraction
• 17 relations: AND, AROUND, MADE-OF, OF, ON, WITHOUT etc.
• Form of triples: ARG1 REL ARG2
• Restrictions and filters for arguments
• Rules for captions with multiple relations
• Inferences restricted to certain cases
Triple Extraction examples
• “body on floor surrounded by blood”
  body ON floor
  blood AROUND floor
  blood AROUND body
• “shot of footprint on top of bar”
• “photograph from behind bar of body on floor”
• “bottle, gun and ashtray on table”
• “footprint with zigzag and target on chair”
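The first example can be reproduced with a toy sketch. This is not the SOCIS extraction layer, which works on parser and discourse-interpreter output. It assumes a hypothetical cue table (`CUES`) mapping two surface phrases to relations, and one assumed inference rule: if X ON Y and Z AROUND Y, then Z AROUND X. Anything between two cues is treated as a single argument, so the sketch only suits short captions such as the first one.

```python
import re

# Hypothetical cue table: surface phrase -> (relation, swap arguments?).
# "X surrounded by Y" is read as "Y AROUND X", hence the swap.
CUES = {"on": ("ON", False), "surrounded by": ("AROUND", True)}

# Longer cues first, so multi-word phrases win in the alternation.
_ALTERNATION = "|".join(
    re.escape(cue) for cue in sorted(CUES, key=len, reverse=True)
)
CUE_RE = re.compile(r"\s+(" + _ALTERNATION + r")\s+")


def extract_triples(caption):
    """Return (ARG1, REL, ARG2) triples for a short caption (toy sketch)."""
    # Splitting on a capturing group keeps the cues: [arg, cue, arg, cue, arg]
    parts = CUE_RE.split(caption.lower().strip())
    triples = []
    for i in range(1, len(parts) - 1, 2):
        relation, swap = CUES[parts[i]]
        left, right = parts[i - 1], parts[i + 1]
        triples.append((right, relation, left) if swap else (left, relation, right))

    # Assumed inference rule: X ON Y and Z AROUND Y  =>  Z AROUND X.
    inferred = [
        (z, "AROUND", x)
        for (x, rel_on, y) in triples if rel_on == "ON"
        for (z, rel_around, y2) in triples if rel_around == "AROUND" and y2 == y
    ]
    return triples + [t for t in inferred if t not in triples]


for triple in extract_triples("body on floor surrounded by blood"):
    print(*triple)
```

Captions such as “photograph from behind bar of body on floor” need real syntactic analysis and the argument filters mentioned on the previous slide; this sketch would return the whole left-hand phrase as one argument.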
Retrieval Mechanism
• Allow for free text query
• Extract relational facts from the query
• Match the query triples with the indexing triples of each captioned photograph
• Allow for exact match of arguments or class info (ARG1, RELATION, ARG2, with a class for each argument)
• If no triples can be extracted, keyword matching takes place with semantic expansion if needed
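The matching step can be sketched as follows, under stated assumptions. `CLASS_OF` is a hypothetical mini-ontology standing in for OntoCrime, `INDEX` is an invented photo index, and "class info" is interpreted as: a query argument may name the class of the indexed argument. The keyword fallback with semantic expansion is left out.

```python
# Hypothetical mini-ontology standing in for OntoCrime: term -> class.
CLASS_OF = {"gun": "weapon", "knife": "weapon", "table": "furniture"}

# Invented index: photo id -> indexing triples extracted from its caption.
INDEX = {
    "photo_1": [("gun", "ON", "table")],
    "photo_2": [("body", "ON", "floor")],
}


def arg_matches(query_arg, index_arg):
    """Exact match, or the query argument names the class of the indexed one."""
    return query_arg == index_arg or query_arg == CLASS_OF.get(index_arg)


def triple_matches(query, indexed):
    """Relations must be identical; arguments may match exactly or by class."""
    return (
        query[1] == indexed[1]
        and arg_matches(query[0], indexed[0])
        and arg_matches(query[2], indexed[2])
    )


def retrieve(query_triples, index):
    """Return ids of photos with at least one indexing triple matching a query triple."""
    return [
        photo
        for photo, triples in index.items()
        if any(triple_matches(q, t) for q in query_triples for t in triples)
    ]


print(retrieve([("weapon", "ON", "table")], INDEX))  # class match on ARG1
print(retrieve([("body", "ON", "floor")], INDEX))    # exact match
```

Matching on the class rather than the literal term is what lets a query about a “weapon” reach a photograph captioned with “gun”; a fuller version would also handle sibling terms and the keyword fallback.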
Preliminary Evaluation
• Indexing mechanism evaluation run on 600 captions indicated that the rules need refinement (80% accuracy in extracting and inferring triples)
• Preliminary usability evaluation with real users: relational information considered to be an intuitive way of forming queries for image retrieval
• Future work: overall evaluation of free text query for image retrieval
Conclusions
• Could the SOCIS approach be ported to other domains?
  • Thorough testing and experimentation needed
  • However, it is a corpus-driven approach: not just an alternative image indexing/retrieval approach, but the one dictated by a real application
For more information on SOCIS: http://www.dcs.shef.ac.uk/nlp/socis