UGFIDD: Unstructured Geospatial File Indexer and Distributed Dissemination
Present Scenario (diagram) • Search Criteria: the user must know what to search on • Data: transported very slowly, creating a bottleneck • Users need data in low-communication (low-bandwidth) situations
UGFIDD Overview • Provides a simple-to-use Web Service interface • This allows for customized clients • Free-text, "Google"-like searches • Completely unstructured data – no data model to follow • Communication is done over HTTP using SOAP (Simple Object Access Protocol) messages • Currently supports PDFs, Microsoft Word documents and JPEGs • Provides usable return types • RSS Feeds – Allow users to subscribe to standing queries • KML Results – Allow users to visualize their data spatially • Plain Text – Give users their information quickly and reliably • BitTorrent – Allow users to distribute data quickly across many peers
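Clients talk to UGFIDD through SOAP over HTTP. In practice, any client posts an XML envelope to the service. The Java sketch below builds such an envelope. It makes several assumptions that are placeholders and do not come from the actual UGFIDD WSDL:

- the operation name "search"
- its "query" and "returnType" elements
- the service namespace
- the ReturnType enum

```java
import java.util.Locale;

// Hypothetical return types mirroring the list above; names are illustrative only.
enum ReturnType { RSS, KML, TEXT, TORRENT }

final class SoapSearchRequest {
    // Standard SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Hypothetical service namespace; the real UGFIDD WSDL may differ.
    static final String SERVICE_NS = "http://p2p.com/ugfidd";

    // Escape XML special characters so free text is safe inside an element.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Build a SOAP 1.1 envelope for a hypothetical "search" operation.
    static String build(String query, ReturnType type) {
        return "<soapenv:Envelope xmlns:soapenv=\"" + SOAP_NS + "\""
            + " xmlns:u=\"" + SERVICE_NS + "\">"
            + "<soapenv:Body><u:search>"
            + "<u:query>" + escape(query) + "</u:query>"
            + "<u:returnType>" + type.name().toLowerCase(Locale.ROOT) + "</u:returnType>"
            + "</u:search></soapenv:Body></soapenv:Envelope>";
    }
}
```

A client would POST this string to the service endpoint with Content-Type text/xml and a SOAPAction header, as the SOAP 1.1 HTTP binding expects. soapUI builds equivalent requests from a service's WSDL.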
Code Environment • Subversion • Version control provides peace of mind, given how much code and time went into the project • All of the code was written under version control, which is very important in today's commercial atmosphere • Allows many developers to work at the same time • Assists with merge conflicts • Allows easy reverts and diffs • Maven • Increasingly popular build tool • Allows for easy integration and dependency management • Builds are configured in XML (the POM file) • Repositories allow open source projects to be pulled in easily to assist in program development
Pom File (Snippet)
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.p2p</groupId>
  <artifactId>Peer2peer</artifactId>
  <packaging>war</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>Peer2peer</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <final.version>1.0</final.version>
    <!-- ${project.artifactId} replaces the deprecated bare ${artifactId} expression -->
    <artifact.name>${project.artifactId}</artifact.name>
    <java.version>1.6</java.version>
  </properties>
  <!-- every <dependency> must sit inside a <dependencies> element -->
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>jpath</groupId>
      <artifactId>jpathwatch</artifactId>
      <version>0.93</version>
    </dependency>
    <!-- further dependencies omitted from this snippet -->
  </dependencies>
</project>
High Level Architecture (diagram) • Endpoints: SoapUI • Daemon: web service endpoints; startup & shutdown • Extractors + Publishers: interfaces; Doc / JPEG parser; BitTorrent publisher; RSS feed / KML feed; file monitor • Jetty • Core Services: utilities; query; indexing • Solr
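The architecture places extractors and publishers behind interfaces, so new file types can be added without touching the core services. One hedged way to express this in Java is a hypothetical Extractor interface plus a registry that picks the first extractor accepting a given file. The names and method signatures are assumptions. The toy JPEG extractor records only basic fields; it does not parse EXIF GPS data the way a real geospatial extractor would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical plug-in interface: an extractor declares which files it handles
// and turns raw bytes into a flat metadata map destined for the index.
interface Extractor {
    boolean accepts(String fileName);
    Map<String, String> extract(String fileName, byte[] content);
}

final class ExtractorRegistry {
    private final List<Extractor> extractors = new ArrayList<>();

    void register(Extractor e) { extractors.add(e); }

    // Return the first registered extractor that accepts the file, or null if none does.
    Extractor find(String fileName) {
        for (Extractor e : extractors) {
            if (e.accepts(fileName)) return e;
        }
        return null;
    }

    // True when the lower-cased file name ends with any of the given extensions.
    static boolean hasExtension(String fileName, String... exts) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : exts) {
            if (lower.endsWith(ext)) return true;
        }
        return false;
    }
}

// Toy JPEG extractor: records name, type and size only (no EXIF/GPS parsing).
final class JpegExtractor implements Extractor {
    public boolean accepts(String fileName) {
        return ExtractorRegistry.hasExtension(fileName, ".jpg", ".jpeg");
    }

    public Map<String, String> extract(String fileName, byte[] content) {
        Map<String, String> meta = new HashMap<>();
        meta.put("name", fileName);
        meta.put("type", "image/jpeg");
        meta.put("size", Integer.toString(content.length));
        return meta;
    }
}
```

Under this design, a publisher side (RSS, KML, torrent) could mirror the same pattern, with one implementation per return type.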
Ingest of a file (diagram) • File System → File Monitor: the ingest monitor is triggered by file-system events • Ingest Orchestration → Metadata Extraction: Apache Tika document extractors extract header information along with binary data, and a JPEG parser extracts geospatial data • The resulting XML metadata is sent over HTTP to Solr • The Solr schema has been customized to store location and other valuable data
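The file monitor reacts to file-system events rather than repeatedly scanning the directory. The sketch below shows the idea with java.nio.file.WatchService, which is available from Java 7. The jpathwatch library listed in the POM offers a similar watch API for Java 6. The class name IngestMonitor is hypothetical, and passing the new files on to Tika and Solr is left out.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

// Hypothetical ingest monitor: reports files created in one watched directory.
final class IngestMonitor implements AutoCloseable {
    private final Path dir;
    private final WatchService watcher;

    IngestMonitor(Path dir) throws IOException {
        this.dir = dir;
        this.watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher, ENTRY_CREATE);  // only react to newly created entries
    }

    // Wait up to timeoutMillis for the next batch of create events and return
    // the paths of the new files (an empty list if the timeout expires).
    List<Path> pollNewFiles(long timeoutMillis) throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) return created;
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == OVERFLOW) continue;  // events were lost; a real monitor would rescan
            Path name = (Path) event.context();      // context is relative to the watched directory
            created.add(dir.resolve(name));
        }
        key.reset();  // re-arm the key so later events are delivered
        return created;
    }

    @Override
    public void close() throws IOException { watcher.close(); }
}
```

Event delivery latency depends on the platform. Some JDKs fall back to polling the directory, so timeouts should stay generous.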
Publish of results (diagram) • The user enters the query "Syracuse", which reaches the Web Service Endpoint over HTTP • Query Orchestration parses the query and uses it to search the Solr index • Core Services publish the results: depending on the requested return type, UGFIDD returns customized results (XML metadata, files, RSS, Google Earth KML or a torrent) over HTTP
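The "use query to search index" step amounts to an HTTP request to Solr's select handler. The sketch below builds that request URL using Solr's standard q, rows and wt parameters. It assumes a single-core Solr at a base URL such as http://localhost:8983/solr (port 8983 is the Solr example default). The class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SolrQueryUrl {
    // Form-encode one parameter value; spaces become '+', which Solr decodes.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // UTF-8 is always supported
        }
    }

    // Build a select URL: q = the user's free-text query, rows = max hits,
    // wt = response writer (e.g. "xml", Solr's default format).
    static String select(String baseUrl, String query, int rows, String writerType) {
        return baseUrl + "/select?q=" + enc(query)
            + "&rows=" + rows
            + "&wt=" + enc(writerType);
    }
}
```

For example, select("http://localhost:8983/solr", "IED and Iraq", 10, "xml") yields http://localhost:8983/solr/select?q=IED+and+Iraq&rows=10&wt=xml. The returned documents are then handed to whichever publisher matches the requested return type.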
GeoHash (Example GeoHash.java) • The GeoHash algorithm was developed by Gustavo Niemeyer • Publicly released in 2008 • A relatively new way of representing geospatial data • UGFIDD takes advantage of the single hash string produced by the algorithm • Implementations already existed in other languages (e.g. Python); one was ported to Java for the UGFIDD project • Distance searches • Each geohash represents a bounding box by nature • This is a good fit for UGFIDD and its free-text search capability • Geospatial searches become fast and simple to implement • No need for complicated point-radius algorithms, which slow processing down • WKT (Well-Known Text) • A text markup standard (from the OGC Simple Features specification) for representing vector geometry on a single line • Users can query with a single string instead of passing latitude and longitude separately
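GeoHash.java itself is not reproduced here. The following is an independent Java sketch of the standard geohash encoding and decoding, not the project's port. The encoder alternately halves the longitude and latitude ranges, records one bit per halving, and packs every five bits into a base-32 character. As a result, each hash names a rectangular cell.

```java
final class GeoHash {
    // Geohash base-32 alphabet (omits a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode lat/lon into a geohash of the given length. Bits alternate,
    // starting with longitude; each bit says which half of the range holds the point.
    static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder(precision);
        boolean evenBit = true;  // even bits refine longitude, odd bits latitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lonRange : latRange;
            double value = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            if (value >= mid) { ch = (ch << 1) | 1; range[0] = mid; }
            else { ch = ch << 1; range[1] = mid; }
            evenBit = !evenBit;
            if (++bit == 5) {  // five bits make one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    // Decode a geohash into its cell: {minLat, maxLat, minLon, maxLon}.
    static double[] bounds(String hash) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        boolean evenBit = true;
        for (char c : hash.toCharArray()) {
            int idx = BASE32.indexOf(c);
            if (idx < 0) throw new IllegalArgumentException("invalid geohash character: " + c);
            for (int b = 4; b >= 0; b--) {
                double[] range = evenBit ? lonRange : latRange;
                double mid = (range[0] + range[1]) / 2;
                if (((idx >> b) & 1) == 1) range[0] = mid; else range[1] = mid;
                evenBit = !evenBit;
            }
        }
        return new double[] {latRange[0], latRange[1], lonRange[0], lonRange[1]};
    }
}
```

Longer hashes refine shorter ones, so points that share a prefix fall in the same cell. This is why a prefix query on a geohash field (for example geohash:u4pru*, with a hypothetical field name) behaves like a bounding-box search. One caveat: two nearby points on opposite sides of a cell boundary can have different prefixes. Proximity searches therefore usually also check the neighboring cells.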
WKT • POINT(6 10) • LINESTRING(3 4,10 50,20 25) • POLYGON((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2)) • MULTIPOINT((3.5 5.6), (4.8 10.5)) • MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4)) • MULTIPOLYGON(((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2)),((6 3,9 2,9 4,6 3))) • http://en.wikipedia.org/wiki/Well-known_text
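Accepting WKT strings means the server must turn them back into coordinates. The sketch below handles only the simplest case, a 2D POINT, using a regular expression. The class name is hypothetical. Full WKT, including polygons and multi-geometries, is better handled by a dedicated parser such as the JTS Topology Suite's WKTReader. For geographic data, x is usually longitude and y latitude.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WktPoint {
    final double x, y;  // WKT order: x then y (typically longitude, latitude)

    WktPoint(double x, double y) { this.x = x; this.y = y; }

    // "POINT", optional whitespace, "(x y)"; numbers may be signed, decimal or exponent form.
    private static final String NUM = "([-+]?[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?)";
    private static final Pattern POINT = Pattern.compile(
        "\\s*POINT\\s*\\(\\s*" + NUM + "\\s+" + NUM + "\\s*\\)\\s*",
        Pattern.CASE_INSENSITIVE);

    // Parse a 2D WKT POINT; any other geometry type is rejected.
    static WktPoint parse(String wkt) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) throw new IllegalArgumentException("not a 2D WKT POINT: " + wkt);
        return new WktPoint(Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2)));
    }
}
```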
BitTorrent • Allows distributed downloading • Users download .torrent files, which contain the tracker URL and metadata about the file or files • Many free clients are available • BitTorrent takes pressure off the central server • Users download only the small .torrent file directly • Peers find each other through the tracker (UGFIDD uses an open-source tracker) • Users download from each other as long as a seed is available • UGFIDD is always the initial seeder • Fast downloads • Users download from each other and do not tie up the bandwidth going to the server • Uses the file pieces described in the .torrent file (pieces are downloaded from different peers)
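Pieces are what make swarm downloads safe. The .torrent file carries a SHA-1 digest for every fixed-size piece, and a peer keeps a piece received from another peer only if its digest matches. The Java sketch below shows that hashing and the check. The class name and the 16 KiB piece size used when trying it out are illustrative choices, not values taken from UGFIDD.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TorrentPieces {
    private static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    // Split the content into fixed-size pieces (the last one may be shorter) and
    // hash each piece; the .torrent "pieces" value is these 20-byte digests concatenated.
    static List<byte[]> hashPieces(byte[] content, int pieceLength) {
        if (pieceLength <= 0) throw new IllegalArgumentException("pieceLength must be positive");
        List<byte[]> hashes = new ArrayList<>();
        for (int off = 0; off < content.length; off += pieceLength) {
            int end = Math.min(off + pieceLength, content.length);
            hashes.add(sha1(Arrays.copyOfRange(content, off, end)));
        }
        return hashes;
    }

    // A downloaded piece is kept only if its digest matches the one from the .torrent file.
    static boolean verify(byte[] piece, byte[] expectedDigest) {
        return MessageDigest.isEqual(sha1(piece), expectedDigest);
    }
}
```

Because each piece is verified independently, a peer can safely fetch different pieces from different peers and reject a corrupt piece without discarding the rest.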
Torrent • The torrent file has been created and seeded • Others can now download the torrent file and join the swarm • The file is then downloaded both from the server (the initial seed) and from other clients
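Creating the torrent file means writing its metainfo in bencoding, the serialization defined by the BitTorrent specification (BEP 3). The sketch below is a minimal Java bencoder. It is not the encoder used by UGFIDD or by its open-source tracker. A real single-file torrent's info dictionary also needs "piece length" and "pieces" entries, which the small example used to exercise it omits.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class Bencode {
    // Bencoding per BEP 3: integers i<n>e, byte strings <length>:<bytes>,
    // lists l<items>e, dictionaries d<key><value>...e with keys in sorted order.
    static byte[] encode(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, value);
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, Object v) {
        if (v instanceof Integer || v instanceof Long) {
            ascii(out, "i" + v + "e");
        } else if (v instanceof String) {
            string(out, ((String) v).getBytes(StandardCharsets.UTF_8));
        } else if (v instanceof byte[]) {
            string(out, (byte[]) v);  // raw binary, e.g. concatenated SHA-1 piece digests
        } else if (v instanceof List) {
            out.write('l');
            for (Object item : (List<?>) v) write(out, item);
            out.write('e');
        } else if (v instanceof Map) {
            // Keys must be strings; TreeMap's String ordering equals raw byte
            // ordering for ASCII keys, which covers the standard torrent keys.
            Map<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                sorted.put((String) e.getKey(), e.getValue());
            }
            out.write('d');
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                write(out, e.getKey());
                write(out, e.getValue());
            }
            out.write('e');
        } else {
            throw new IllegalArgumentException("cannot bencode " + v);
        }
    }

    // Byte string: decimal length, a colon, then the raw bytes.
    private static void string(ByteArrayOutputStream out, byte[] bytes) {
        ascii(out, bytes.length + ":");
        out.write(bytes, 0, bytes.length);
    }

    private static void ascii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }
}
```

The torrent's identity on the tracker, its info-hash, is the SHA-1 of the bencoded info dictionary. This is why the keys must be emitted in a deterministic, sorted order.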
RSS Feed • Users want their information even when they aren't actively searching • An RSS feed lets the user set up a specific query and walk away • The query stays "standing" for a configurable amount of time • The feed is updated whenever newly ingested files match the query • A fast, easy-to-learn publish/subscribe system • Most users already know how to use RSS • Each RSS feed is unique to its user and query • However, a user can pass the feed URL to other users, who can then subscribe to the query too • Example: A group of users is interested in "IED and Iraq". An RSS query is set up; as products are placed into the monitored directory, matching results are passed on to the users' RSS feed
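A standing query's feed is an RSS 2.0 document that is regenerated as new files match. The sketch below renders one. The following are hypothetical:

- the Item fields
- the feed title format
- the example URLs

Storing the standing query and its expiry time is left out.

```java
import java.util.List;

final class RssFeed {
    // One matched file: a title and a link to retrieve it (hypothetical fields).
    static final class Item {
        final String title, link;
        Item(String title, String link) { this.title = title; this.link = link; }
    }

    // Escape XML special characters; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Render an RSS 2.0 document for one standing query. RSS 2.0 requires
    // <title>, <link> and <description> on the channel.
    static String render(String query, String feedLink, List<Item> hits) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
          .append("<rss version=\"2.0\"><channel>")
          .append("<title>UGFIDD: ").append(escape(query)).append("</title>")
          .append("<link>").append(escape(feedLink)).append("</link>")
          .append("<description>Files matching the standing query ")
          .append(escape(query)).append("</description>");
        for (Item it : hits) {
            sb.append("<item><title>").append(escape(it.title)).append("</title>")
              .append("<link>").append(escape(it.link)).append("</link></item>");
        }
        return sb.append("</channel></rss>").toString();
    }
}
```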
Google Earth • KML (Keyhole Markup Language) • An XML format that Google Earth can display • Visually represents data • More and more users rely on tools that let them see their data visually • Users can see relationships such as distance and location • Users can quickly find relevant data
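A KML result can be as small as one Placemark per geotagged hit. The Java sketch below builds such a document. The class and field names are hypothetical, and it targets the KML 2.2 namespace. Note that KML lists longitude before latitude.

```java
import java.util.List;

final class Kml {
    // One search hit with a known location; field names are illustrative.
    static final class Hit {
        final String name;
        final double lat, lon;
        Hit(String name, double lat, double lon) { this.name = name; this.lat = lat; this.lon = lon; }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a KML 2.2 document with one Placemark per hit.
    // KML coordinates are "longitude,latitude[,altitude]": longitude comes first.
    static String document(List<Hit> hits) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
            .append("<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Document>");
        for (Hit h : hits) {
            sb.append("<Placemark><name>").append(escape(h.name)).append("</name>")
              .append("<Point><coordinates>").append(h.lon).append(',').append(h.lat)
              .append("</coordinates></Point></Placemark>");
        }
        return sb.append("</Document></kml>").toString();
    }
}
```

Saved with a .kml extension or served over HTTP, such a document opens directly in Google Earth, with each hit shown as a pin.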
GeoCoder • UGFIDD uses geocoding web services provided by Google • Passing in a place name returns the latitude/longitude of the location, or null if nothing is found • Example: • A user searches for "Syracuse" • UGFIDD returns hits for documents that contain "Syracuse" as well as geospatial results near Syracuse, NY • http://code.google.com/apis/maps/documentation/geocoding/
Future Work • Make it faster! • Multiple Solr instances for a distributed data implementation • Java's ExecutorService allows multi-threaded workers; this has been implemented, but the thread pool still needs tuning for the host system • Create a client • Currently UGFIDD is a server-only implementation • Creating a client is easy with web services • Allow users to ingest files via HTTP and FTP upload • Distributed Queries • Currently only one server is queried at a time • Would like to add a middle "tracker" to distribute queries and results
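Two of the future-work items, multi-threaded workers through ExecutorService and distributed queries, can be sketched together: send one query to several servers in parallel and merge whatever returns in time. SearchServer and DistributedQuery are hypothetical names. This sketch is not the ExecutorService code already in UGFIDD.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Hypothetical abstraction of one UGFIDD server's search endpoint.
interface SearchServer {
    List<String> search(String query) throws Exception;
}

final class DistributedQuery {
    // Send the query to every server in parallel and merge the results that arrive
    // within the timeout; a failing or slow server simply contributes nothing.
    static List<String> searchAll(List<SearchServer> servers, String query, long timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (SearchServer s : servers) tasks.add(() -> s.search(query));
            List<String> merged = new ArrayList<>();
            // invokeAll returns futures in task order and cancels any task
            // still running when the timeout expires.
            for (Future<List<String>> f : pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS)) {
                try {
                    merged.addAll(f.get());
                } catch (ExecutionException | CancellationException e) {
                    // skip this server's results
                }
            }
            return merged;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The timeout on invokeAll keeps one slow server from stalling the whole query, at the cost of sometimes returning partial results. A middle "tracker" could own this fan-out logic.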
Demo • The server is running on my home computer with an ingest directory already set up • Will move files into the ingest directory • Demonstrate query capability • Demonstrate publishing capability • Will use soapUI, a web services test utility, to demonstrate client interaction • http://www.soapui.org/ • Code is located at: http://code.google.com/p/peer2peersuny/source/browse/
UGFIDD • Questions?