UGFIDD

Unstructured Geospatial File Indexer and Distributed Dissemination



Present Scenario

(Diagram) Labels: Search Criteria, Data, Transported, Bottleneck.

  • User must know what to search on

  • Very slow

  • Users need data in low-comm situations


UGFIDD Overview

  • Provide a simple-to-use Web Service interface

    • This allows for customized clients

    • Free-text, “Google”-like searches

    • Completely unstructured data – no data model to follow

    • Communication is done over HTTP through SOAP (Simple Object Access Protocol) messages

    • Currently supports PDFs, Microsoft Docs, JPEGs

  • Provide usable return types

    • RSS Feeds – Allow users to subscribe to standing queries

    • KML Results – Allow users to visually represent their data spatially

    • Plain Text – Give users their information fast and reliably

    • BitTorrent – Allow users to distribute data quickly, peer to peer


Code Environment

  • Subversion

    • After writing a lot of code over a lot of time, version control provides peace of mind

    • All of the code was written under version control. This is very important in today’s commercial atmosphere.

    • Allows many developers to work at the same time

    • Assists with merge conflicts

    • Makes reverts and diffs easy

  • Maven

    • New, up-and-coming build tool

    • Allows for easy integration and dependency management

    • Project configuration (the POM) is written entirely in XML

    • Repositories allow open source projects to be easily pulled in to assist in program development


POM File (Snippet)

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.p2p</groupId>
      <artifactId>Peer2peer</artifactId>
      <packaging>war</packaging>
      <version>1.0-SNAPSHOT</version>
      <name>Peer2peer</name>
      <url>http://maven.apache.org</url>

      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <final.version>1.0</final.version>
        <artifact.name>${artifactId}</artifact.name>
        <java.version>1.6</java.version>
      </properties>

      <dependencies>
        <!-- JUnit, test scope only -->
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <!-- jpathwatch: file system watch library -->
        <dependency>
          <groupId>jpath</groupId>
          <artifactId>jpathwatch</artifactId>
          <version>0.93</version>
        </dependency>
        <dependency>
        ...


High Level Architecture

(Diagram) Components by group:

  • EndPoints: SoapUI, Daemon (Startup & Shutdown), WebServiceEndpoints

  • Extractors + Publishers: Interfaces, Doc / JPEG Parser, BitTorrent Publisher, RSS Feed / KML Feed, File Monitor

  • Jetty

  • Core Services: Utilities, Query, Indexing

  • Solr


Ingest of a File

(Diagram) Ingest orchestration. Components shown: File System, File Monitor, Metadata Extraction, XML Metadata, HTTP, File, Solr.

  • The ingest monitor is triggered by system-level file events.

  • Metadata extraction uses Apache Tika document extractors to extract header information along with binary data. A JPEG parser extracts geospatial data.

  • The Solr schema has been customized to store location and other valuable data.
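
The event-driven ingest monitor described on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes Java 7+ and uses the standard `java.nio.file.WatchService` API (the POM lists jpathwatch, which targets Java 1.6 with a similar design). The class name `IngestMonitorSketch`, the helper names, and the file name `report.pdf` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a directory monitor that reports newly created files.
class IngestMonitorSketch {

    /** Waits up to timeoutMs for create events and returns the new file names. */
    static List<String> awaitNewFiles(WatchService watcher, long timeoutMs)
            throws InterruptedException {
        List<String> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (key == null) {
            return created; // timed out, nothing to ingest
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                // For create events the context is the path relative to the directory.
                created.add(event.context().toString());
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return created;
    }

    /** Creates a temporary ingest directory, drops one file in, and reports it. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("ugfidd-ingest");
            try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
                dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
                Files.createFile(dir.resolve("report.pdf"));
                return awaitNewFiles(watcher, 15_000);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("New files to ingest: " + demo());
    }
}
```

In a real daemon the poll would run in a loop, and each reported file would be handed to the metadata extractors before being posted to Solr.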


Publish of Results

(Diagram) Query orchestration. Components shown: HTTP, Web Service Endpoint, Parse Query, Core Services, Solr, XML Metadata, Publish, Files, Google Earth, RSS, Torrent.

  • User enters the query “Syracuse”.

  • The query is used to search the Solr index.

  • Depending on the call and the return type, UGFIDD will query and return customized results.


GeoHash (Example GeoHash.java)

  • GeoHash algorithm recently developed by Gustavo Niemeyer

    • Publicly released in 2008

    • Very new way of representing geospatial data

    • UGFIDD takes advantage of the single hash string produced by the algorithm

    • Many implementations exist in other languages (e.g. Python); one was ported to Java for the UGFIDD project

  • Distance searches

    • Each geohash inherently identifies a bounding box

    • This is a perfect fit for UGFIDD and its free-text search capability

    • Geospatial searches are now extremely fast and easy to implement

    • No need for complicated point-radius algorithms, which slow processing down

  • WKT (Well-Known Text)

    • A standard text format for representing vector geometry on a single line

    • Users can query with a single string and do not need to represent points as separate Lat, Lon values
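
To make the geohash bullets above concrete, here is a minimal sketch of the standard encoding: longitude and latitude intervals are halved alternately, and every five bits become one base-32 character. It is an illustration under assumptions, not the project's `GeoHash.java`; the class name `GeoHashSketch` and the sample coordinates are chosen for the example.

```java
// Hypothetical sketch of geohash encoding; not the UGFIDD implementation.
class GeoHashSketch {
    // Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair into a geohash of the given length. */
    static String encode(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean useLon = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int value = 0;
        while (hash.length() < length) {
            if (useLon) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonLo = mid; }
                else            { value = value << 1;       lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else            { value = value << 1;       latHi = mid; }
            }
            useLon = !useLon;
            if (++bitCount == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(42.6, -5.6, 5)); // prints "ezs42"
        // A nearby point falls in the same 5-character box, so a
        // bounding-box search reduces to a string prefix match.
        System.out.println(encode(42.61, -5.59, 8).startsWith("ezs42")); // prints "true"
    }
}
```

Because the hash is a plain string, a prefix query against a text index can stand in for a bounding-box filter. One caveat: two points on opposite sides of a cell boundary can be close together yet share no prefix, so practical searches usually also check neighboring cells.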


Placeholder for GeoHash Performance


WKT

  • POINT(6 10)

  • LINESTRING(3 4,10 50,20 25)

  • POLYGON((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2))

  • MULTIPOINT((3.5 5.6), (4.8 10.5))

  • MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))

  • MULTIPOLYGON(((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2)),((6 3,9 2,9 4,6 3)))

  • http://en.wikipedia.org/wiki/Well-known_text
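
As an illustration of how a query string such as `POINT(6 10)` might be turned into coordinates, the following sketch parses only the WKT `POINT` form with a regular expression. The class and method names are hypothetical, and a real system would more likely rely on a geometry library that handles every WKT type.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parses only WKT POINT strings, nothing else.
class WktPointSketch {
    private static final Pattern POINT = Pattern.compile(
            "^\\s*POINT\\s*\\(\\s*(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)\\s*\\)\\s*$",
            Pattern.CASE_INSENSITIVE);

    /** Returns {x, y} for a WKT POINT, or null if the text is not a POINT. */
    static double[] parsePoint(String wkt) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)), // x (by common convention, longitude)
            Double.parseDouble(m.group(2))  // y (by common convention, latitude)
        };
    }

    public static void main(String[] args) {
        double[] p = parsePoint("POINT(6 10)");
        System.out.println(p[0] + " " + p[1]); // prints "6.0 10.0"
        System.out.println(parsePoint("LINESTRING(3 4,10 50,20 25)")); // prints "null"
    }
}
```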


BitTorrent

  • Allows for distributed downloading

  • Users download .torrent files, which identify the tracker and describe the file or files

  • Many free clients are available

  • BitTorrent takes pressure off of the central server

    • Users only download the .torrent file from the server

    • Communicate via the tracker (UGFIDD is using an open source tracker)

    • Users download from each other while there is a seed

    • UGFIDD will always be the initial seeder

  • Extremely fast downloads

    • Users download from each other and do not tie up the bandwidth pipe going to the server

    • Utilizes file pieces described in the .torrent file (pieces are downloaded from each other)
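
The "file pieces" mentioned above can be illustrated with a short sketch. In the original BitTorrent metainfo format, a file is split into fixed-size pieces and the .torrent file stores a SHA-1 hash per piece so peers can verify what they receive from each other. The class name, method names, and the tiny piece size used in `main` are assumptions for illustration; this is not how UGFIDD's publisher is known to be written.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-piece hashing as used in .torrent metainfo.
class TorrentPiecesSketch {

    /** Splits data into pieces of pieceLength bytes and returns one SHA-1 hash per piece. */
    static List<byte[]> pieceHashes(byte[] data, int pieceLength) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        List<byte[]> hashes = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += pieceLength) {
            int len = Math.min(pieceLength, data.length - offset); // last piece may be short
            sha1.update(data, offset, len);
            hashes.add(sha1.digest()); // digest() also resets the instance for the next piece
        }
        return hashes;
    }

    /** Lowercase hex rendering of a hash, for display. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];
        // 10 bytes in 4-byte pieces: offsets 0, 4 and 8.
        System.out.println(pieceHashes(data, 4).size()); // prints "3"
    }
}
```

Real torrents use much larger pieces (hundreds of kilobytes); the small size here only keeps the example easy to follow.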


Torrent

Torrent file has been created and seeded.

Others can now download the torrent file and connect to the swarm.

The file will then be downloaded from the server as well as from other clients.


RSS Feed

  • Users want their information even when they aren’t there

  • An RSS feed lets the user set up a specific query and walk away

    • The query will be “standing” for a configurable amount of time

    • The feed is updated whenever the query is hit

    • Fast, easy-to-learn publish-and-subscribe system

    • Most users already know how to use RSS

  • The RSS page is unique to that user and query

    • However, the user can pass the URL to other users, who can then subscribe to the query too

    • Example: A group of users is interested in “IED and Iraq”. An RSS query is set up; as products are placed into the monitored directory, matching information is passed on to the users’ RSS feed
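
The standing-query idea can be sketched as a small object that holds the query terms, an expiry time, and the feed items collected so far. Everything here is hypothetical: the class name, the naive "all terms must appear" matching, and the explicit timestamps are assumptions made to keep the example self-contained, whereas the slides indicate the real matching is done against the Solr index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a standing query that feeds an RSS item list.
class StandingQuerySketch {
    private final String[] terms;
    private final long expiresAtMillis;
    private final List<String> feedItems = new ArrayList<>();

    StandingQuerySketch(String query, long createdAtMillis, long timeToLiveMillis) {
        this.terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        this.expiresAtMillis = createdAtMillis + timeToLiveMillis;
    }

    /**
     * Offers a newly ingested document to the query. The title is added to the
     * feed only if the query has not expired and every term occurs in the text.
     */
    boolean offer(String title, String text, long nowMillis) {
        if (nowMillis >= expiresAtMillis) {
            return false; // the standing query has run out
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        for (String term : terms) {
            if (!haystack.contains(term)) {
                return false;
            }
        }
        feedItems.add(title);
        return true;
    }

    List<String> items() {
        return feedItems;
    }

    public static void main(String[] args) {
        StandingQuerySketch query = new StandingQuerySketch("IED Iraq", 0L, 1000L);
        query.offer("report-1", "IED report from Iraq", 10L);  // matches
        query.offer("report-2", "Weather in Syracuse", 20L);   // no match
        query.offer("report-3", "IED report from Iraq", 2000L); // expired
        System.out.println(query.items()); // prints "[report-1]"
    }
}
```

Each entry in `items()` would become one item in the user's RSS page; sharing the page URL lets other users subscribe to the same standing query.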


Google Earth

  • KML (Keyhole Markup Language)

    • XML data that Google Earth knows how to display

  • Visually represent data

    • More and more users are using tools to see their data visually

    • Can see similarities (such as distance and location)

    • Quickly find relevant data
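
A KML result for one geolocated document might be produced along these lines. This is a minimal sketch with hypothetical class and method names and approximate coordinates for Syracuse, NY; the only KML details relied on are the 2.2 namespace and the fact that `<coordinates>` lists longitude before latitude.

```java
// Hypothetical sketch of publishing search hits as KML placemarks.
class KmlSketch {

    /** Escapes the characters that are unsafe in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds one placemark. Note that KML coordinates are written lon,lat. */
    static String placemark(String name, double lat, double lon) {
        return "<Placemark><name>" + escape(name) + "</name>"
                + "<Point><coordinates>" + lon + "," + lat + "</coordinates></Point>"
                + "</Placemark>";
    }

    /** Wraps placemarks in a KML document that Google Earth can open. */
    static String document(String... placemarks) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Document>");
        for (String p : placemarks) {
            sb.append(p);
        }
        sb.append("</Document></kml>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Approximate coordinates, used only for illustration.
        System.out.println(document(placemark("Syracuse", 43.0481, -76.1474)));
    }
}
```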


JPEG Product displayed via published KML


GeoCoder

  • UGFIDD utilizes geocoder web services provided by Google

  • Passing in a string returns a Lat, Lon for the location, or null if nothing is found

  • Example:

    • User searches for “Syracuse”

    • UGFIDD will return hits for documents that contain “Syracuse” and also geospatial results near Syracuse, NY

    • http://code.google.com/apis/maps/documentation/geocoding/


Future Work

  • Make it faster!

    • Multiple Solr instances with a distributed data implementation

    • Java ExecutorService allows for multi-threaded workers. This has been implemented but will take time to tune for each system

  • Create a client

    • Currently UGFIDD is a server-only implementation

    • Creating a client is easy with web services

    • Allow users to ingest files using HTTP and FTP upload

  • Distributed Queries

    • Currently only one server is queried at a time

    • Would like to add a middle “tracker” to distribute queries and collect results
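
The multi-threaded workers mentioned under "Make it faster!" could look roughly like this. It is a sketch under assumptions: the class name, the fixed pool size, and the placeholder "indexed:" result stand in for the real extraction and indexing work, which the slides do not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of parallel ingest with a fixed-size worker pool.
class ParallelIngestSketch {

    /** Processes every file on a pool of worker threads, keeping input order. */
    static List<String> ingestAll(List<String> fileNames, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String name : fileNames) {
                // Placeholder task; a real worker would extract metadata and index it.
                Callable<String> task = () -> "indexed:" + name;
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that worker finishes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // let worker threads exit once tasks are done
        }
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        files.add("a.pdf");
        files.add("b.jpg");
        System.out.println(ingestAll(files, 2)); // prints "[indexed:a.pdf, indexed:b.jpg]"
    }
}
```

The pool size is the tuning knob the slide refers to: the best value depends on the cores and disk throughput of the host system.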


Demo

  • Server is running on my home computer with an ingest directory already set up

  • Will move files into ingest directory

  • Demonstrate query capability

  • Demonstrate publishing capability

  • Will use SoapUI, a web services test utility, to demonstrate client interaction

  • http://www.soapui.org/

  • Code is located at: http://code.google.com/p/peer2peersuny/source/browse/


UGFIDD

  • Questions?

