Rohini K. Srihari State University of New York at Buffalo May 6, 2003


Presentation Transcript


  1. UIR Alert Agent: An alert system for identifying suspicious web-site browsing leading to unintended information revelation (UIR)
     Rohini K. Srihari, State University of New York at Buffalo, May 6, 2003

  2. Tracking suspicious web browsing
     Should we let him see it? Should we monitor his next moves?
     • What information has the user obtained so far?
     • What was inferred from the visited pages?
     • What additional information can the user infer from this new web page?
     • Did we intend to reveal this information?
     • Should we be alerted if this revelation is unintended?
     User has visited these pages:
     http://www.faa.gov/apa/safer_skies/fsstats.htm
     http://www.faa.gov/certification/aircraft/sfar88/01hstry2.pps
     User is requesting:
     http://www.awp.faa.gov/fsdo/docs/spm_info/what/fy2000/sdplan00.doc
     Measuring unintended information revelation (UIR) for the visited and requested pages will answer these questions.
     FAA Workshop May 2003

  3. Outline
     • Unintended Information Revelation
     • Problem Definition
     • Solutions with Existing Technology
     • Proposed Solution
     • UIR System Architecture
     • Extracting Concepts and Associations
     • Creating Concept Chain Graphs (CCG)
     • Mining and Visualization of CCGs
     • Evaluation Methodology
     • Preliminary Results
     • Summary

  4. User’s previous request
     Important concepts
     • safer skies, fatal accidents, runway incursions, hijack, etc.
     Interesting information
     • Number and percentage of fatal accidents in 1996
     • Runway incursions
     • Ice/snow
     • In-flight fire
     Page: Fact Sheet: Aviation Accident Statistics
     http://www.faa.gov/apa/safer_skies/fsstats.htm

  5. User’s current request
     Page: Fuel tank ignition events
     http://www.faa.gov/certification/aircraft/sfar88/01hstry2.pps
     Important concepts
     • fatalities, fuel tank ignition, hull loss, electrostatics, etc.
     Interesting information
     • Identifies causes of fuel tank ignition accidents:
       • Small bomb
       • Faulty wiring
       • Pump faults

  6. Synthesized Information
     • In-flight fire can cause accidents.
     • Fuel-tank ignitions are caused by small bombs, faulty pumps/wiring, etc.
     • Domain knowledge: in-flight fires and fuel-tank ignitions are aviation hazards.
     • Inference: faulty wiring can cause in-flight fires.

  7. UIR Alert Agent
     UIR is a phenomenon where the information synthesized from multiple documents is more than the sum of the information provided by the individual documents.
     Goal: generate alerts for unintended information revelation based on the user’s browsing history and requested pages.
     [Figure: users A, B and C browse the site; the UIR Alert Agent examines each user's browsing history and writes to an Alerts Log, here showing an alert generated on user B.]
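The slide defines UIR only informally ("more than the sum"). One way to make that concrete, sketched here under stated assumptions, is to reduce each page to a set of concept-to-concept associations and count the concept pairs that become connected only when the user's history and the requested page are combined; an alert is logged when that count reaches a threshold. The names `connected_pairs`, `uir_score` and `UIRAlertAgent`, the connectivity-based score and the threshold are hypothetical illustrations, not the system described in the talk.

```python
from collections import defaultdict
from itertools import combinations


def connected_pairs(edges):
    """Return every unordered concept pair joined by some chain of edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    pairs, seen = set(), set()
    for start in list(graph):
        if start in seen:
            continue
        # Collect one connected component with an iterative depth-first search.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        pairs |= {frozenset(p) for p in combinations(sorted(component), 2)}
    return pairs


def uir_score(history_edges, request_edges):
    """Hypothetical measure: pairs linked only when history and request are combined."""
    separate = connected_pairs(history_edges) | connected_pairs(request_edges)
    combined = connected_pairs(set(history_edges) | set(request_edges))
    return combined - separate


class UIRAlertAgent:
    """Hypothetical agent keeping per-user browsing history and an alerts log."""

    def __init__(self, threshold=1):
        self.threshold = threshold
        self.history = defaultdict(set)  # user -> associations seen so far
        self.alerts_log = []             # (user, page, revealed pairs)

    def on_request(self, user, page, page_edges):
        revealed = uir_score(self.history[user], page_edges)
        if len(revealed) >= self.threshold:
            self.alerts_log.append((user, page, revealed))
        # The requested page becomes part of the user's history.
        self.history[user] |= set(page_edges)
        return revealed
```

For example, if user B first views a page linking in-flight fire to accident and then requests a page linking faulty wiring to in-flight fire, the pair (faulty wiring, accident) is connected only through the combination, so it is reported and logged. Association weights from the CCG are ignored in this simplification.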

  8. Architecture of UIR System
     [Figure: architecture diagram. Components: pre-existing domain ontology/lexicon (e.g. aviation ontology); document collection (web pages); information extraction; Concept Chain Graphs (CCG); document subset; CCG instantiated for subset of interest; UIR Alert Module; user alerts/logs. Example chains: accident-hazard-fuel tank-…; ice/snow-hazard-fatalities-…]
     • Input: user surfing web pages on sites of interest to national security
     • UIR output: web pages that reveal too much information; a human monitor can visualize paths in the CCG

  9. Proposed Solution
     Step 1: Determine significant concepts and associations in the target domain (offline, semi-automatic)
     • use existing ontologies such as the DAML ontology on aviation
     • use information extraction to automatically extract concepts and associations from a representative document collection
     Step 2: Create a Concept Chain Graph (CCG)
     • consolidates underlying domain knowledge and specific documents
     • weights concepts and associations using both domain weights and individual document weights
     Step 3: Visualization and text mining operations on the CCG
     Step 4: UIR Alert Agent invoked
     • tracking user surfing patterns
     • what-if scenarios

  10. Evaluation Methodology
      Typical IR evaluation
      • Evaluate precision and recall of the IR system (the IR system includes query expansion)
      • TREC query: find pages that discuss ways of causing air disasters
      • Output: ranked web pages
      UIR evaluation
      • Evaluate the ability of the UIR system (using the CCG) to generate the narrative from the relevant web pages
      • TREC narrative: pages that are relevant to causing air disasters will mention aircraft maintenance operations or passenger screening procedures
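The typical IR evaluation on this slide rests on precision and recall over the ranked pages returned for a TREC-style query. A minimal sketch of the standard set-based definitions, plus precision over the top k of a ranked list, follows; the function names and page identifiers are hypothetical, and the talk does not say which cutoffs were used.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked pages that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for page in ranked[:k] if page in relevant) / k
```

With ranked pages p1 to p4 and relevant set {p1, p3, p5}, precision is 2/4 and recall is 2/3.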

  11. Step 1: Extracting Concepts and Associations
      Extracting concepts
      • Use the InfoXtract engine from Cymfony
      • The Named Entity (NE) tagger identifies common entities such as Date, Time, Location, State, Country, Organization and Person
      • InfoXtract also identifies significant noun groups and verb groups, e.g. fuel tanker, runway de-icing
      Extracting associations
      • Concept co-occurrence in documents
      • Concept proximity in sentences/paragraphs
      • Advanced techniques using machine learning
      Example input: … The designation for one end of the runway should be used on the sign only when the taxiway intersects the beginning of that runway. Taxiways that intersect the runway at intermediate points must have the designations for both runway ends. ...
      Association Learning output: (runway, taxiway): 0.85
      The output implies that the system has 85% confidence that runway and taxiway are associated by some relation.
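The two basic association signals named on the slide, co-occurrence and proximity, can be sketched as follows, assuming concepts have already been extracted into per-sentence lists (the input format is an assumption; InfoXtract itself is not modeled). The Dice coefficient and the inverse-distance score are standard choices picked here for illustration; the talk does not give its formulas, and the 0.85 on the slide comes from its learned model, not from these functions.

```python
from collections import defaultdict
from itertools import combinations


def association_scores(sentences):
    """Dice co-occurrence score for every concept pair seen in a common sentence.

    sentences: list of lists of concept strings (hypothetical extractor output).
    """
    occurs = defaultdict(set)  # concept -> indices of sentences containing it
    for i, concepts in enumerate(sentences):
        for c in concepts:
            occurs[c].add(i)
    scores = {}
    for a, b in combinations(sorted(occurs), 2):
        both = len(occurs[a] & occurs[b])
        if both:
            scores[(a, b)] = 2 * both / (len(occurs[a]) + len(occurs[b]))
    return scores


def proximity_score(tokens, a, b):
    """1 / (1 + smallest token distance between a and b); 0 if either is absent."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    if not pos_a or not pos_b:
        return 0.0
    return 1 / (1 + min(abs(i - j) for i in pos_a for j in pos_b))
```

Pairs are keyed in sorted order, so the runway/taxiway association is looked up as ("runway", "taxiway").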

  12. Sample Information Extraction output
      Concepts and named entities are marked up during information extraction.
      DATE: October 23, 1992
      NO. 92-03
      TO: AIRPORT CERTIFICATION PROGRAM INSPECTORS
      TOPIC: Effects Of Type II Deicing Fluid On Runway Friction
      The FAA's Technical Center in conjunction with the Port Authority of New York and New Jersey conducted tests to determine the effects of Type II aircraft deicing fluids on runway friction. The tests were conducted this past July and August at La Guardia and John F. Kennedy International Airports on grooved asphaltic pavement. Since the tests were conducted in the summer no attempt was made to simulate ice or snow on the pavement surface. (See future test programs.) Two specially instrumented B-727's and two Saab friction devices were used to measure the runway friction. The purpose of this effort was to test the premise that Type II deicing fluid deposited on a runway poses a hazard to aircraft landing on the runway. At the present time it is unknown to what extent Type II actually falls off a departing aircraft and what portion of it is deposited on the runway. (See future test programs.)

  13. Step 2: Create Concept Chain Graph
      • Create the concept chain graph based on underlying domain knowledge (concepts, associations).
      • Weight concept nodes based on frequency, type and user-defined importance.
      • Weight associations based on proximity, the importance of the concepts they link, and uniqueness.
      • Project/map documents viewed by the user onto the CCG.
      • A document is represented as a probabilistic sub-graph in the CCG.
      • Proximity and other metrics are used to assign weights to the concepts (nodes) and associations (edges) discovered in a document.
      [Figure: Aviation Ontology graph overlaid with document-specific concepts and associations, each annotated with a weight.]
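The slide lists the weighting factors but not the formulas. The sketch below projects one document onto the CCG under simple assumed formulas: node weight is mention frequency times an optional user-defined importance, normalized so the node weights sum to 1 (loosely echoing the "probabilistic sub-graph"), and edge weight is proximity times the mean weight of the two linked concepts. Concept type and uniqueness are omitted. Each edge is also flagged as already present in the domain knowledge or not, mirroring the legend of the instantiated CCG. `project_document` and its parameters are hypothetical.

```python
from collections import Counter


def project_document(domain_edges, mentions, doc_edges, importance=None):
    """Represent one document as a weighted sub-graph of the CCG (assumed weighting).

    domain_edges: iterable of (a, b) associations from the domain ontology
    mentions: concept mentions in the document, repeats included
    doc_edges: dict mapping (a, b) -> proximity score in [0, 1]
    importance: optional dict mapping concept -> user-defined multiplier
    """
    importance = importance or {}
    counts = Counter(mentions)
    # Node weight: frequency times importance, normalized to sum to 1.
    raw = {c: n * importance.get(c, 1.0) for c, n in counts.items()}
    total = sum(raw.values()) or 1.0
    nodes = {c: w / total for c, w in raw.items()}

    domain = {frozenset(e) for e in domain_edges}
    edges = {}
    for (a, b), prox in doc_edges.items():
        # Edge weight: proximity scaled by the mean weight of the linked concepts.
        weight = prox * (nodes.get(a, 0.0) + nodes.get(b, 0.0)) / 2
        # Second field: is this association already part of the domain knowledge?
        edges[(a, b)] = (weight, frozenset((a, b)) in domain)
    return nodes, edges
```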

  14. Step 2: Instantiated Concept Chain Graph
      [Figure: CCG instantiated for the document "Fuel tank ignition events", distinguishing associations in the document from domain knowledge. Concept nodes: Accident Statistics, Lightning, AIRPLANE, Fuel Tank, HAZARD, Wiring, AVIATION, Statistics, Windshear, Fuel Tank Ignition events, ACCIDENT, Pumps, Air_traffic_control_tower, Ice/snow, Small Bomb, In-flight fire, Fatalities, Runway Incursions, hull losses.]

  15. Step 3: Mining the CCG
      • Goals
        • detecting information-rich concept chains, e.g. air disaster - onboard explosion - fuel tanker
        • quantifying information revealed
        • issuing alerts when too much information is revealed
        • “what-if” scenarios to enable dissemination of benign information
      • Graph traversal
        • generate a CCG representing the documents viewed by the user
        • start with explicit query/search terms as seed concepts; there could be multiple terms
        • strategies:
          • try to find the best paths/chains that connect the “seed” concepts; this could generate multiple chains
          • try to find the best subgraph
        • various graph traversal algorithms are suitable
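One way to "find the best chain connecting seed concepts", sketched here as an assumption rather than the talk's algorithm: if association weights lie in (0, 1], the chain maximizing the product of its edge weights is the shortest path under cost -log(weight), which is non-negative, so Dijkstra's algorithm applies. `best_chain` and the example weights are hypothetical; multiple seeds could be handled by running it for each pair of seeds.

```python
import heapq
import math


def best_chain(edges, source, target):
    """Chain from source to target maximizing the product of edge weights.

    edges: dict mapping (a, b) -> weight in (0, 1]; treated as undirected.
    Returns (path, product) or (None, 0.0) if the concepts are not connected.
    """
    graph = {}
    for (a, b), w in edges.items():
        cost = -math.log(w)  # high weight -> low cost
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])
```

With hypothetical weights 0.8 (air disaster, onboard explosion), 0.5 (onboard explosion, fuel tanker) and 0.3 for a direct air disaster/fuel tanker link, the two-step chain wins with product 0.4.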

  16. Graph Traversal Techniques
      • Minimum vertex cover techniques
        • INSTANCE: Graph $G = (V, E)$
        • SOLUTION: A vertex cover for $G$, i.e., a subset $V' \subseteq V$ such that, for each edge $(u, v) \in E$, at least one of $u$ and $v$ belongs to $V'$.
        • MEASURE: Cardinality of the vertex cover, i.e., $|V'|$ (to be minimized).
      • Flow networks
        • given a network $(G, s, t, c)$ where $G = (V, E)$ is a directed graph with $n$ vertices and $m$ edges, $s$ and $t$ are two vertices (source and sink), and $c: E \to \mathbb{R}^+$ is a function that defines the capacities of the edges
        • find the maximum flow from $s$ to $t$ that satisfies the capacity constraints
      • Energy minimization (used in image processing)
        • active contours (e.g. snakes) used for tracking various shapes, including road detection
        • dynamic programming solutions available
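The slide lists vertex cover and maximum flow as candidate techniques without saying how they are applied to the CCG. For reference, below are textbook versions of the first two: the greedy maximal-matching heuristic, which returns a vertex cover at most twice the minimum size, and the Edmonds-Karp maximum-flow algorithm (augmenting along shortest residual paths). Energy minimization is not sketched. The function names and the example graphs are illustrative only.

```python
import math
from collections import defaultdict, deque


def vertex_cover_2approx(edges):
    """Greedy maximal-matching heuristic: a cover at most twice the minimum size."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            # Edge is still uncovered: take both endpoints.
            cover.update((u, v))
    return cover


def max_flow(capacity, s, t):
    """Edmonds-Karp maximum flow from s to t.

    capacity: dict mapping a directed edge (u, v) to a non-negative capacity.
    """
    if s == t:
        raise ValueError("source and sink must differ")
    residual = defaultdict(float)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse edge lets flow be cancelled later
    flow = 0.0
    while True:
        # Breadth-first search for a shortest path with spare residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Bottleneck: smallest residual capacity along the path.
        bottleneck, v = math.inf, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        # Push the bottleneck amount and update the residual graph.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck
```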

  17. Step 4: Track user surfing with UIR module
      [Figure: the instantiated CCG (nodes: Lightning, AIRPLANE, Fuel Tank, HAZARD, Wiring, AVIATION, Statistics, Fuel Tank Ignition events, Windshear, ACCIDENT, Pumps, Air_traffic_control_tower, Ice/snow, Small Bomb, In-flight fire, Fatalities, hull losses, Runway Incursions), with the regions covered by the previously viewed page(s) and by the requested page marked.]
      The UIR module determines that these two documents together reveal a new association between wiring and accidents.
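To explain an alert like the wiring/accident one, the module needs chains that exist only because of both pages. A minimal sketch, assuming each edge is labeled with its source ("viewed", "requested" or "domain"; hypothetical labels): enumerate simple chains of bounded length between two concepts and keep those that use at least one edge from a viewed page and one from the requested page. `cross_document_chains` and the example edges are illustrative, not read off the slide's figure.

```python
from collections import defaultdict


def cross_document_chains(labeled_edges, start, goal, max_len=4):
    """Simple chains start -> goal (at most max_len edges) needing viewed and requested pages.

    labeled_edges: iterable of (a, b, source), source in {"viewed", "requested", "domain"}.
    """
    graph = defaultdict(list)
    for a, b, src in labeled_edges:
        graph[a].append((b, src))
        graph[b].append((a, src))
    found = []

    def walk(node, path, sources):
        if node == goal:
            if {"viewed", "requested"} <= sources:
                found.append(path)
            return
        if len(path) > max_len:  # path already holds max_len edges
            return
        for nxt, src in graph[node]:
            if nxt not in path:  # keep chains simple (no repeated concepts)
                walk(nxt, path + [nxt], sources | {src})

    walk(start, [start], frozenset())
    return found
```

With hypothetical edges wiring/fuel tank ignition events (requested page), fuel tank ignition events/hazard and hazard/in-flight fire (domain knowledge), and in-flight fire/accident (viewed page), the only chain from wiring to accident crosses both pages and is reported.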

  18. Preliminary Experiments

  19. Summary
      Benefits to FAA
      • Automated monitoring of the information acquired by users of the FAA website, and an alert mechanism for unintentionally revealed information
      • Shortlist and identify the documents and concepts seen by the user that reveal unintended information
      • Domain map visualization tool facilitates concept- and association-based queries
      Claims
      • New, richer representation for information retrieval that combines keyword statistics (bag-of-words model) with NLP-based information extraction
      • The solution is general to any domain; only the domain map needs to be customized/retrained
      • Experts can intervene and guide the process if desired; tools are provided
