
Unstructured Geospatial File Indexer and Distributed Dissemination


Present Scenario
  • User must know what to search on
  • Very slow
  • Search Criteria
  • Users need data in low-comms situations
  • Bottleneck

UGFIDD Overview
  • Provide a simple-to-use Web Service interface
    • This allows for customized clients
    • Free-text, “Google”-like searches
    • Completely unstructured data – no data model to follow
    • Communication is done over HTTP through SOAP (Simple Object Access Protocol) messages
    • Currently supports PDFs, Microsoft Word documents, and JPEGs
  • Provide usable return types
    • RSS Feeds – Allow users to subscribe to standing queries
    • KML Results – Allow users to visually represent their data spatially
    • Plain Text – Give users their information quickly and reliably
    • BitTorrent – Allow users to distribute data quickly in a decentralized way
Code Environment
  • Subversion
    • A lot of time went into writing this code; version control provides peace of mind
    • All of the code was written under version control, which is very important in today’s commercial environment
    • Allows many developers to work at the same time
    • Assists with merge conflicts
    • Makes reverts and diffs easy
  • Maven
    • Up-and-coming build tool
    • Allows for easy integration and dependency management
    • Builds are configured entirely in XML (the POM file)
    • Repositories allow open source projects to be pulled in easily to assist in program development
POM File (Snippet)
    <project xmlns="" xmlns:xsi=""
             xsi:schemaLocation="">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.p2p</groupId>
      <artifactId>Peer2peer</artifactId>
      <packaging>war</packaging>
      <version>1.0-SNAPSHOT</version>
      <name>Peer2peer</name>
      <url></url>
      <properties>
        <>UTF-8</>
        <final.version>1.0</final.version>
        <>${artifactId}</>
        <java.version>1.6</java.version>
      </properties>

      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>jpath</groupId>
          <artifactId>jpathwatch</artifactId>
          <version>0.93</version>
        </dependency>
        ...
High Level Architecture
  • Startup &
  • Extractors + Publishers
  • Doc / JPEG Parser
  • BitTorrent Publisher
  • RSS Feed / KML Feed
  • File Monitor
  • Core Services

Ingest of a file
  • Ingest Orchestration
  • Uses Apache Tika document extractors to extract header information along with the binary data; the JPEG parser extracts the geospatial data
  • Ingest monitor is triggered by system-level events
  • File Monitor
  • Schema has been customized to store location and other valuable data
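The event-driven monitor described above can be sketched with the JDK's own `java.nio.file.WatchService` (Java 7 and later). This is only an illustration, not the UGFIDD source: the POM snippet points to the jpathwatch library on Java 1.6, and the class name `IngestMonitorSketch`, the extension list (including `docx`), and the `ingest` placeholder are all assumptions.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of an ingest monitor; names are illustrative only.
final class IngestMonitorSketch {

    // Extensions assumed from the "currently supports" list (PDF, Word, JPEG).
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("pdf", "doc", "docx", "jpg", "jpeg"));

    // Returns true when the file name ends in one of the assumed extensions.
    static boolean isSupported(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && SUPPORTED.contains(name.substring(dot + 1));
    }

    // Blocks on file system events and passes each new supported file to ingest().
    static void watch(Path ingestDir) throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            ingestDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take();           // waits for an OS-level event
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue;                        // events were dropped; no path to read
                    }
                    Path created = ingestDir.resolve((Path) event.context());
                    if (isSupported(created)) {
                        ingest(created);
                    }
                }
                if (!key.reset()) {
                    break;                               // directory is no longer watchable
                }
            }
        }
    }

    // Placeholder for the extraction and indexing steps.
    static void ingest(Path file) {
        System.out.println("ingesting " + file);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: IngestMonitorSketch <ingest-directory>");
            return;
        }
        watch(Paths.get(args[0]));
    }
}
```

Keeping the extension check separate from the watch loop means unsupported files are skipped before any extractor runs.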



Publish of results
  • Query Orchestration
  • Parse Query
  • Web Service Endpoint
  • Depending on the return type and the call, UGFIDD will query and return customized results
  • User enters the query “Syracuse”
  • Core Services
  • Google Earth
  • Use query to search index
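One way to picture "customized results per return type" is a small dispatch from the requested type to the content type of the response. The sketch below is an assumption about how such a step could look, not UGFIDD's code: the `ReturnType` enum and both method names are hypothetical, and the MIME strings are the ones conventionally used for these formats.

```java
import java.util.Locale;

// Hypothetical sketch: maps a requested return type to a response content type.
final class QueryOrchestrationSketch {

    // The four return types listed in the overview.
    enum ReturnType { RSS, KML, TEXT, TORRENT }

    // Parses a caller-supplied value such as "kml" (case-insensitive).
    static ReturnType parseReturnType(String raw) {
        return ReturnType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    // Picks the content type a client would expect for each result format.
    static String contentType(ReturnType type) {
        switch (type) {
            case RSS:     return "application/rss+xml";
            case KML:     return "application/vnd.google-earth.kml+xml";
            case TORRENT: return "application/x-bittorrent";
            case TEXT:    return "text/plain";
            default:      throw new IllegalArgumentException("unsupported: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(contentType(parseReturnType("kml")));
    }
}
```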


GeoHash (Example)
  • GeoHash algorithm recently developed by Gustavo Niemeyer
    • Publicly released in 2008
    • Very new way of representing geospatial data
    • UGFIDD takes advantage of the single hash string produced by the algorithm
    • Found many implementations in other languages (e.g. Python); ported one to Java for the UGFIDD project
  • Distance searches
    • A geohash describes a bounding box by nature; hashes that share a prefix fall inside the same box
    • This is a perfect fit for UGFIDD and its free-text search capability
    • Geospatial searches are now extremely fast and easy to implement
    • No need for complicated point-radius algorithms, which slow processing down
  • WKT (Well-Known Text)
    • A text format for representing vector geometry on a single line
    • Users can query with a single string and do not need to represent points as separate Lat, Lon values
    • POINT(6 10)
    • LINESTRING(3 4,10 50,20 25)
    • POLYGON((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2))
    • MULTIPOINT((3.5 5.6), (4.8 10.5))
    • MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))
    • MULTIPOLYGON(((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2)),((6 3,9 2,9 4,6 3)))
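As an illustration of the geohash and WKT points above, here is a minimal Java sketch of the standard geohash encoding (interleaved longitude/latitude bits, base-32 alphabet), plus a helper that accepts a WKT `POINT`. It is not the UGFIDD port: the class and method names are hypothetical, and the helper assumes the common `POINT(lon lat)` axis order.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of geohash encoding; not the UGFIDD implementation.
final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";
    private static final Pattern WKT_POINT = Pattern.compile(
            "^\\s*POINT\\s*\\(\\s*(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)\\s*\\)\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Encodes a latitude/longitude pair as a geohash with the given number of characters.
    static String encode(double lat, double lon, int length) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true;            // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            double[] range = lonBit ? lonRange : latRange;
            double coord = lonBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            value <<= 1;
            if (coord >= mid) {
                value |= 1;
                range[0] = mid;           // keep the upper half
            } else {
                range[1] = mid;           // keep the lower half
            }
            lonBit = !lonBit;
            if (++bits == 5) {            // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    // Decodes a geohash into its bounding box: {minLat, minLon, maxLat, maxLon}.
    static double[] boundingBox(String hash) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        boolean lonBit = true;
        for (char c : hash.toLowerCase(Locale.ROOT).toCharArray()) {
            int value = BASE32.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("not a geohash character: " + c);
            }
            for (int mask = 16; mask > 0; mask >>= 1) {
                double[] range = lonBit ? lonRange : latRange;
                double mid = (range[0] + range[1]) / 2;
                if ((value & mask) != 0) {
                    range[0] = mid;
                } else {
                    range[1] = mid;
                }
                lonBit = !lonBit;
            }
        }
        return new double[] {latRange[0], lonRange[0], latRange[1], lonRange[1]};
    }

    // Encodes a WKT point, assuming "POINT(x y)" means "POINT(lon lat)".
    static String encodeWktPoint(String wkt, int length) {
        Matcher m = WKT_POINT.matcher(wkt);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a WKT POINT: " + wkt);
        }
        return encode(Double.parseDouble(m.group(2)), Double.parseDouble(m.group(1)), length);
    }

    public static void main(String[] args) {
        System.out.println(encode(57.64911, 10.40744, 5));                 // prints "u4pru"
        System.out.println(encodeWktPoint("POINT(10.40744 57.64911)", 5)); // prints "u4pru"
    }
}
```

Because a longer hash only refines a shorter one, a nearby-documents search can be reduced to a prefix match on the stored hash string, which is what lets a text index serve geospatial queries.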
BitTorrent
  • Allows for distributed downloading
  • Users download .torrent files, which identify the tracker and describe the file or files
  • Many free clients are available
  • BitTorrent takes pressure off of the central server
    • Users only download the .torrent file
    • Peers communicate via the tracker (UGFIDD is using an open source tracker)
    • Users download from each other while there is a seed
    • UGFIDD will always be the initial seeder
  • Extremely fast downloads
    • Users download from each other and do not tie up the bandwidth pipe going to the server
    • Utilizes file pieces described in the .torrent file (pieces are downloaded from each other)
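The "file pieces described in the .torrent file" are fixed-size chunks, each identified by its SHA-1 digest, so a peer can verify every chunk it receives from another peer. The Java sketch below shows that hashing step only. It is an illustration with hypothetical names, and it does not produce a complete bencoded .torrent file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: computes per-piece SHA-1 hashes as used in torrent metadata.
final class TorrentPiecesSketch {

    // Splits data into pieces of pieceLength bytes and returns each piece's SHA-1 as hex.
    static List<String> pieceHashes(byte[] data, int pieceLength) {
        if (pieceLength <= 0) {
            throw new IllegalArgumentException("pieceLength must be positive");
        }
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        List<String> hashes = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += pieceLength) {
            int len = Math.min(pieceLength, data.length - offset);  // last piece may be short
            sha1.update(data, offset, len);
            hashes.add(toHex(sha1.digest()));                       // digest() resets the instance
        }
        return hashes;
    }

    // Lower-case hex encoding of a byte array.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(pieceHashes(data, 16384));
    }
}
```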

Torrent file has been created and seeded.

Others can now download the torrent file and connect to the swarm

File will then be downloaded from the server as well as clients

RSS Feed
  • Users want their information even when they aren’t there
  • An RSS feed allows the user to set up a specific query and walk away
    • Query will be “standing” for a configurable amount of time
    • Feed will be updated as the query is hit
    • Fast, easy-to-learn publish-and-subscribe system
    • Most users already know how to use RSS
  • RSS page is unique to that user and query
    • The user can, however, pass the URL to other users, who can then subscribe to the query too
    • Example: A group of users is interested in “IED and Iraq”. An RSS query is set up; as products are placed into the monitored directory, matching information is passed on to the users’ RSS feed
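A standing query of this kind can be sketched as an object that holds the query text and an expiry time, collects matching products as they are ingested, and renders them as RSS 2.0 items. The Java below is a simplified assumption of that flow (plain substring matching and hypothetical names), not UGFIDD's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a standing query that feeds an RSS channel.
final class StandingQuerySketch {
    private final String query;
    private final long expiresAtMillis;
    private final List<String> hits = new ArrayList<>();

    StandingQuerySketch(String query, long createdAtMillis, long lifetimeMillis) {
        this.query = query;
        this.expiresAtMillis = createdAtMillis + lifetimeMillis;
    }

    // The query only "stands" for its configured lifetime.
    boolean isActive(long nowMillis) {
        return nowMillis < expiresAtMillis;
    }

    // Called for each newly ingested product; records the title when the text matches.
    boolean offer(String title, String text, long nowMillis) {
        if (!isActive(nowMillis)) {
            return false;
        }
        if (!text.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT))) {
            return false;
        }
        hits.add(title);
        return true;
    }

    // Renders the hits as a minimal RSS 2.0 document.
    // A complete channel also needs <link> and <description>; they are omitted here.
    String toRss() {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<rss version=\"2.0\"><channel>");
        xml.append("<title>").append(escape(query)).append("</title>");
        for (String hit : hits) {
            xml.append("<item><title>").append(escape(hit)).append("</title></item>");
        }
        xml.append("</channel></rss>");
        return xml.toString();
    }

    // Escapes the characters that are unsafe in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        StandingQuerySketch q = new StandingQuerySketch("Iraq", 0L, 60_000L);
        q.offer("Report A", "IED activity in Iraq", 1_000L);
        System.out.println(q.toRss());
    }
}
```

Since the feed is keyed only by the query, sharing its URL is enough for other users to subscribe to the same standing query.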
Google Earth
  • KML (Keyhole Markup Language)
    • XML data that Google Earth knows how to display
  • Visually represent data
    • More and more users are using tools to see their data visually
    • Can see similarities (such as distance and location)
    • Quickly find relevant data
  • UGFIDD utilizes geocoder web services provided by Google
    • Passing in a string returns a Lat/Lon for the location, or null if nothing is found
  • Example:
    • User searches for “Syracuse”
    • UGFIDD will return hits for documents that contain “Syracuse” and also geospatial results near Syracuse, NY
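To make the KML result type concrete, here is a minimal Java sketch that turns a hit with a known location into a KML `Placemark`. The names are hypothetical and the Syracuse coordinates are approximate illustrative values. Note that KML writes coordinates as longitude,latitude, the reverse of the usual Lat, Lon ordering.

```java
import java.util.Locale;

// Hypothetical sketch: renders located hits as KML that Google Earth can display.
final class KmlSketch {

    // Builds one Placemark; KML coordinate order is longitude,latitude.
    static String placemark(String name, double lat, double lon) {
        return "<Placemark><name>" + escape(name) + "</name><Point><coordinates>"
                + String.format(Locale.ROOT, "%.4f,%.4f", lon, lat)
                + "</coordinates></Point></Placemark>";
    }

    // Wraps placemarks in a KML 2.2 document.
    static String document(String... placemarks) {
        StringBuilder kml = new StringBuilder();
        kml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        kml.append("<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Document>");
        for (String p : placemarks) {
            kml.append(p);
        }
        kml.append("</Document></kml>");
        return kml.toString();
    }

    // Escapes the characters that are unsafe in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Approximate location of Syracuse, NY, used only as sample data.
        System.out.println(document(placemark("Syracuse", 43.0481, -76.1474)));
    }
}
```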
Future Work
  • Make it faster!
    • Multiple Solr instances; distributed data implementation
    • Java ExecutorService allows for multi-threaded workers. This has been implemented but will take time to tune for each system
  • Create a client
    • Currently UGFIDD is a server-only implementation
    • Creating a client is easy with web services
    • Allow users to ingest files using HTTP and FTP upload
  • Distributed Queries
    • Currently only one server is queried at a time
    • Would like to make a middle “tracker” to distribute queries and results
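The multi-threaded worker idea can be illustrated with a fixed-size `ExecutorService` pool. In this sketch the pool size is the knob that would need tuning per system. The class name and the `ingest` placeholder are hypothetical and stand in for the real extraction and indexing work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of parallel ingest with a fixed-size worker pool.
final class IngestPoolSketch {

    // Ingests all files on `workers` threads and returns results in input order.
    static List<String> ingestAll(List<String> files, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String file : files) {
                tasks.add(() -> ingest(file));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task finishes and keeps the task order.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("ingest interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("ingest failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for the per-file extraction and indexing work.
    static String ingest(String file) {
        return "indexed:" + file;
    }

    public static void main(String[] args) {
        System.out.println(ingestAll(Arrays.asList("a.pdf", "b.jpg", "c.doc"), 2));
    }
}
```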
Demo
  • Server is running on my home computer with an ingest directory already set up
  • Will move files into the ingest directory
  • Demonstrate query capability
  • Demonstrate publishing capability
  • Will use SoapUI, a web services test utility, to demonstrate client interaction
  • Code is located at:
  • Questions?