


  1. Secure Semantic Information Grid for Network Centric Operations. Dr. Bhavani Thuraisingham, Principal Investigator, The University of Texas at Dallas, bhavani.thuraisingham@utdallas.edu, June 2009

  2. Outline • Objectives • Tasks/Team Members • History • Accomplishments • Directions

  3. Objectives • Develop technologies for a secure information grid to support the DoD’s Network Centric Operations • Complement current AFOSR-funded projects in handling large data sets • Dependable information sharing • Dual-use technologies

  4. Tasks/Team Members • Policy based accountability for secure grids • Purdue University (Elisa Bertino, Lorenzo Martino) • Secure pervasive infrastructures • University of Texas at Arlington (Sajal Das, Yonghe Liu) • Secure Distributed Storage • University of Texas at Dallas (I-Ling Yen) • Encrypted Data Storage • University of Texas at Dallas (Murat Kantarcioglu) • Secure Query Processing in Clouds • University of Texas at Dallas (Latifur Khan) • Other • Secure Geosocial Information Grid • University of Texas at Dallas (Latifur Khan, Murat Kantarcioglu) • Dependability Issues • University of Texas at Dallas (Latifur Khan, Kevin Hamlen, Eric Wong) • Research and Integrated Demo (FY10 and Beyond) • Administrative Assistant • Ms. Jamie McDonald, The University of Texas at Dallas

  5. Recent History • Information Operations through Infospheres: Assured Information Sharing, AFOSR, 2005-2008 • Assured Information Sharing, AFOSR MURI (Tim Finin), UMBC/Purdue/UTD/UIUC/UTSA/UM • Secure Information Grid (Congressional Funds) • Policy based Semantic Web, NSF • Semantic Framework with Blackbook, IARPA • Dependable Information Sharing, IARPA • Open Science Grid, DOE/NSF (Planned)

  6. Accomplishments • UT Dallas • UT Arlington • Purdue

  7. Layered Architecture

  8. Secure Semantic Framework (architecture diagram): BLACKBOOK services over documents, including entity and relationship extraction; ontology based heuristic reasoning, rule based reasoning, and data mining; other services such as security and integrity; and RDF graph store management (storage, query, and integration) over multiple RDF graph stores.

  9. Storing RDF Data in Hadoop and Retrieval. Dr. Latifur Khan, lkhan@utdallas.edu

  10. Objectives/Environment • Objectives • To build efficient storage in Hadoop for petabytes of RDF data • To build an efficient, secure query mechanism • Possible outcomes • Open-source framework for RDF • Integration with Jena • Environment • 4-node cluster in the Semantic Web Lab • 10-node cluster in the Cloud Computing Lab • 4 GB main memory • Intel Pentium IV 3.0 GHz processor • 640 GB hard drive • OpenCirrus HP Labs test bed • Collaboration with Andy Seaborne, HP Labs

  11. Preprocessing Steps
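The preprocessing pipeline itself appears only as a figure on this slide. As a hedged illustration of one plausible step, the sketch below is a minimal Hadoop MapReduce job that groups N-Triples by predicate so that queries with a bound predicate only need to read the matching output; the class name, paths, and the predicate-based split are assumptions for illustration, not the project's exact pipeline.

```java
// Hedged sketch: group N-Triples lines by predicate as a preprocessing step
// for Hadoop-based RDF storage. Class and path names are illustrative.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PredicateSplit {

  // Maps each N-Triples line to (predicate, "subject object").
  public static class TripleMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      // N-Triples: <subject> <predicate> <object> .
      String[] parts = value.toString().trim().split("\\s+", 3);
      if (parts.length < 3) return;                       // skip malformed lines
      String object = parts[2].replaceAll("\\s*\\.\\s*$", "");
      ctx.write(new Text(parts[1]), new Text(parts[0] + " " + object));
    }
  }

  // Writes all (subject, object) pairs of one predicate together.
  public static class TripleReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text predicate, Iterable<Text> pairs, Context ctx)
        throws IOException, InterruptedException {
      for (Text so : pairs) {
        ctx.write(predicate, so);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "rdf-predicate-split");
    job.setJarByClass(PredicateSplit.class);
    job.setMapperClass(TripleMapper.class);
    job.setReducerClass(TripleReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // raw N-Triples input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // predicate-grouped output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With such a split, a SPARQL triple pattern whose predicate is bound can be answered by scanning only the data for that predicate, which is the kind of access-pattern saving the query results on the next slide measure.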

  12. Some Query Results (chart): horizontal axis, number of triples; vertical axis, time in milliseconds

  13. Design and Analysis of Querying Encrypted Data in Relational Databases. Dr. Murat Kantarcioglu, muratk@utdallas.edu

  14. Our Contributions • Storage • The performance of cipher modes is analyzed under different granularities and disk access patterns • A CTR-based page-level encryption method is proposed • Query • Query performance is compared across different query types • We propose a vertical partitioning approach to avoid unnecessary cryptographic operations over non-sensitive attributes • We first focus on single-table partitioning, then generalize the problem to the entire schema and propose a heuristic that avoids an exhaustive search of the partitioning space
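A minimal sketch of page-level encryption with AES in CTR mode, in the spirit of the CTR-based page-level method mentioned above. The page size, key handling, and the way the counter block is derived from the page id are assumptions; the paper's exact CTR4 construction is not reproduced here.

```java
// Hedged sketch: page-level AES-CTR encryption. Page size, key handling, and
// the counter derivation from the page id are illustrative assumptions.
import java.nio.ByteBuffer;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class PageCtrCipher {
  private static final int PAGE_SIZE = 8192;   // assumed database page size
  private final SecretKey key;

  public PageCtrCipher(SecretKey key) { this.key = key; }

  // Derive a unique 16-byte counter block from the page id so each page gets
  // an independent keystream (a counter must never be reused with one key).
  private static IvParameterSpec counterFor(long pageId) {
    ByteBuffer iv = ByteBuffer.allocate(16);
    iv.putLong(pageId);      // high 8 bytes: page id
    iv.putLong(0L);          // low 8 bytes: block counter, starts at 0
    return new IvParameterSpec(iv.array());
  }

  public byte[] encryptPage(long pageId, byte[] plaintextPage) throws Exception {
    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
    c.init(Cipher.ENCRYPT_MODE, key, counterFor(pageId));
    return c.doFinal(plaintextPage);
  }

  public byte[] decryptPage(long pageId, byte[] ciphertextPage) throws Exception {
    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
    c.init(Cipher.DECRYPT_MODE, key, counterFor(pageId));
    return c.doFinal(ciphertextPage);
  }

  public static void main(String[] args) throws Exception {
    KeyGenerator kg = KeyGenerator.getInstance("AES");
    kg.init(128);
    PageCtrCipher cipher = new PageCtrCipher(kg.generateKey());
    byte[] page = new byte[PAGE_SIZE];                        // blank demo page
    byte[] enc = cipher.encryptPage(42L, page);
    byte[] dec = cipher.decryptPage(42L, enc);
    System.out.println(java.util.Arrays.equals(page, dec));   // true
  }
}
```

Because CTR turns the block cipher into a stream cipher, pages need no padding and blocks within a page can be decrypted independently, which is one reason counter modes do well in the decryption benchmark on the following slides.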

  15. Threat Models (diagram): the client application issues a plain query, which passes through authentication and query transformation; the transformed query is executed by the query engine, which performs disk accesses against the hard disk, and query results flow back to the client application. The diagram marks which of these components are trusted and which are untrusted.

  16. Counter (CTR4)

  17. Decrypting 1 GB of data • CTR4 is faster than CBC, CTR, OFB, and CFB

  18. Accountability Mechanisms for Grid Systems. Dr. Elisa Bertino, Research Director, CERIAS, Computer Science Department, Purdue University, Bertino@cs.purdue.edu

  19. Contributions • What is accountability? • Accountability is defined as “A is accountable to B when A is obliged to inform B about A’s past or future actions and decisions, to justify them, and to suffer punishment in the case of eventual misconduct” • Accountability is an important aspect of any computer system for assuring that every action executed in the system can be traced back to some entity • The dynamic and multi-organizational nature of grid systems requires an effective and efficient accountability system • Contributions • We have developed a distributed mechanism to capture provenance information available during the distributed execution of jobs in a grid • Our approach is based on the notion of accountability agents • We have developed a simple yet effective language to specify the accountability data to collect • We have implemented a prototype of the accountability system on an emulated grid testbed

  20. Overall Architecture of Accountable Grid Systems

  21. Two Approaches, Combined • Job-flow based approach • Jobs flow across different organizational units • Long computations are often divided into many sub-jobs to be run in parallel • A possible approach is to employ point-to-point agents that collect data at each node the job traverses • Grid-node based approach • It focuses on a given location in the flow, at a given instant of time, for all jobs • The viewpoint is fixed • Combining the two approaches allows us to collect complementary information

  22. Accountability Policies Example. A job is submitted to the Purdue University SP and then assigned for execution to the RPs, A-state University and B-state University. Purdue agrees to send job relation data (handle, job-id, subjob-id, RP-id, timestamp) to A-state and B-state when the processed job enters the active state. Additionally, A-state locally collects resource data (memory consumption, cpu time, network bandwidth, disk bandwidth) every day during the week. The policies for this scenario are as follows:
[at Purdue University]
shared_policy_Purdue := send_job_data(agent@Purdue, agents_in_job_relation_Purdue, active, dataSet_active, job-id) collect_job_data(agent@Purdue, active, dataSet_active, DB_Purdue)
agents_in_job_relation_Purdue := agent@A-state (AND) agent@B-state
dataSet_active := handle (AND) job-id (AND) subjob-id (AND) RP-id (AND) timestamp
[at A-state University]
local_policy_A-state := collect_resource_data(agent@A-state, dataSet_local, time_constraints_A-state, DB_A-state)
dataSet_local := memory consumption (AND) cpu time (AND) network bandwidth (AND) disk bandwidth
time_constraints_A-state := weekdays (AND) all.days
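As a hedged companion to the policy above, the sketch below shows how an accountability agent might represent the dataSet_active tuple and act on the send_job_data / collect_job_data directives when a job enters the active state. The class, record, and method names are illustrative assumptions, not the prototype's actual policy language or agent API.

```java
// Hedged sketch: an accountability agent forwarding the job-relation record
// (handle, job-id, subjob-id, RP-id, timestamp) named in the policy above.
// All names are illustrative assumptions.
import java.time.Instant;
import java.util.List;

public class AccountabilityAgent {

  // The dataSet_active tuple from the shared policy at the SP.
  public record JobRelationRecord(String handle, String jobId, String subJobId,
                                  String rpId, Instant timestamp) {}

  public interface AgentEndpoint {              // a peer agent, e.g. agent@A-state
    void receive(JobRelationRecord record);
  }

  private final List<AgentEndpoint> agentsInJobRelation;

  public AccountabilityAgent(List<AgentEndpoint> agentsInJobRelation) {
    this.agentsInJobRelation = agentsInJobRelation;
  }

  // Fired when a processed job enters the 'active' state, as the policy requires.
  public void onJobActive(String handle, String jobId, String subJobId, String rpId) {
    JobRelationRecord rec =
        new JobRelationRecord(handle, jobId, subJobId, rpId, Instant.now());
    storeLocally(rec);                          // collect_job_data into the local DB
    for (AgentEndpoint peer : agentsInJobRelation) {
      peer.receive(rec);                        // send_job_data to agents in the job relation
    }
  }

  private void storeLocally(JobRelationRecord rec) {
    // In the prototype this would write to DB_Purdue; printing stands in here.
    System.out.println("collected: " + rec);
  }
}
```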

  23. Experimental Evaluations. Exp. 1: Scalability with respect to the number of computing nodes (diagram: job submission from the SP to two RPs, each with an HN and CNs) • The response time is computed as the difference between the time at which the user receives the result and the time at which the user submits the job • The blue bars show the overhead introduced by accountability, which is negligible

  24. A Framework for Pervasively Secure Grid Infrastructure. Sajal K. Das, Director, Center for Research in Wireless Mobility and Networking (CReWMaN), Department of Computer Science and Engineering, The University of Texas at Arlington, das@uta.edu, http://crewman.uta.edu

  25. Mobile / Pervasive Grid: A New Paradigm • Next-generation information / knowledge grid • Huge resource pool of laptops, mobile devices, and wireless sensors • A pervasive computing infrastructure of smart devices connected to the grid across heterogeneous wireless networks and service providers • Context awareness (e.g., activity; user / device / node mobility) is the key • Applications: e-Learning, e-Health, banking, power grid, security, border control, disaster / crisis management, emergency response and rescue, … (diagram: computational grid, data grid, wireless access points, grid community)

  26. Research Objectives and Challenges • Objectives • Dynamic resource management in (wireless) pervasive grids to handle multi-mission, often conflicting tasks • Development of a multi-level security framework for high-assurance information sharing in pervasive grids • Context / situation-aware data collection, aggregation (fusion), and mining from heterogeneous sensors, surveillance, monitoring, and tracking devices • Learning patterns via information fusion, leading to anomaly detection and hence to detection of potential security threats • Intelligent decision making in an integrated, adaptive, autonomous, and scalable manner for high information assurance, safety, and security • Security Challenges • Limited resources in wireless mobile devices and sensors → limited defense capability • Uncertain, often unattended or hostile environments • Node compromises (insider attacks) → revealed secrets • Post-deployment access and control is an issue • Lack of centralized control → potential loss of integrity and confidentiality due to information fusion • Multiple attacking angles → a single-level defense mechanism is highly vulnerable

  27. Research Contributions • Novel game-theoretic framework for pervasive grid infrastructure that tracks node mobility and optimizes resource usage (e.g., wireless bandwidth, response time) for single- and multi-class tasks (diagram: a pool of mobile users under wireless access points WAP1 through WAPp, with task allocation and resource assignment between the users and the grid community; WAP: wireless access point)

  28. Research Contributions (cont’d) • Mathematical models and framework for multi-level security in pervasive grids (wireless sensor networks) • Distributed key management (among a cluster of nodes) • Secure information gathering, fusion, and routing • Modeling smart adversaries • Detection of compromised and replicated nodes • Controlling propagation of internal attacks (diagram: a compromised node may discredit normal nodes, selectively dump packets, report false data, infect other nodes, forge commands, or inject false routing information)

  29. Experimental Results (plots for low-, medium-, and high-bandwidth grid systems) • COOP: cooperative algorithm • OPTIM: optimal algorithm • PRIMOB: pricing-based mobile algorithm (game)

  30. Multi-Level Security Framework (diagram). Goal: highly assured grid network operation over an uncertainty-characterized, resource-limited pervasive grid environment subject to node compromise. Architectural components: key management, secure aggregation, secure routing, topology control, trust / reputation model, intrusion detection, DoS defense, digital watermarking. Theoretical foundations: epidemic theory, compromise process modeling, information theory, statistical learning and classification. Responses: contain the outbreak, detect compromise, revoke revealed secrets, self-correct and purge tampered data.

  31. Trust Model: Reputation Results under Attack Scenarios • With no malicious nodes, all nodes’ reputation stays close to 1 • The reputation of malicious nodes is significantly lower than that of legitimate nodes • The reputation of a malicious node is proportional to the amount of true data it sends

  32. Secure, Highly Available, and High-Performance Peer-to-Peer Cloud Storage Systems. Dr. I-Ling Yen, The University of Texas at Dallas

  33. What is a good storage design for the cloud environment? • The role of the storage system in assured information sharing • It is the basis for availability, security, and access-performance assurance • If the storage system is not available or does not offer good access performance, then the upper-layer data applications cannot have high availability or good performance • If some nodes in the storage system are compromised and the system cannot tolerate it, then the secure data are compromised • Various storage system designs • Cluster-based systems versus a widely distributed cloud environment • Replication-and-encryption-based strategies and secret-sharing-based strategies • Secret sharing, erasure coding, short secret sharing (SSS) • Directory management
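For readers unfamiliar with the secret-sharing baseline that SSS is compared against, here is a minimal sketch of plain (t, n) Shamir secret sharing. It is not the project's implementation, and short secret sharing itself (encrypt the data, then share only the key and disperse the ciphertext) is not shown; the prime size and randomness source are assumptions.

```java
// Hedged sketch: plain (t, n) Shamir secret sharing over a prime field.
// Prime size and randomness source are illustrative assumptions.
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.*;

public class ShamirSharing {
  // A prime larger than any secret we intend to split (assuming 128-bit secrets).
  private static final SecureRandom RNG = new SecureRandom();
  private static final BigInteger P = BigInteger.probablePrime(160, RNG);

  public record Share(int x, BigInteger y) {}

  // Split 'secret' into n shares, any t of which reconstruct it.
  public static List<Share> split(BigInteger secret, int t, int n) {
    BigInteger[] coeff = new BigInteger[t];            // random degree-(t-1) polynomial
    coeff[0] = secret.mod(P);
    for (int j = 1; j < t; j++) coeff[j] = new BigInteger(P.bitLength() - 1, RNG);
    List<Share> shares = new ArrayList<>();
    for (int x = 1; x <= n; x++) {
      BigInteger y = BigInteger.ZERO;
      for (int j = t - 1; j >= 0; j--) {               // Horner evaluation mod P
        y = y.multiply(BigInteger.valueOf(x)).add(coeff[j]).mod(P);
      }
      shares.add(new Share(x, y));
    }
    return shares;
  }

  // Lagrange interpolation at x = 0 over any t shares.
  public static BigInteger reconstruct(List<Share> shares) {
    BigInteger secret = BigInteger.ZERO;
    for (Share si : shares) {
      BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
      for (Share sj : shares) {
        if (si.x() == sj.x()) continue;
        num = num.multiply(BigInteger.valueOf(-sj.x())).mod(P);
        den = den.multiply(BigInteger.valueOf(si.x() - sj.x())).mod(P);
      }
      secret = secret.add(si.y().multiply(num).multiply(den.modInverse(P))).mod(P);
    }
    return secret;
  }

  public static void main(String[] args) {
    BigInteger secret = new BigInteger(128, RNG);
    List<Share> shares = split(secret, 5, 10);         // (10, 5) sharing as on slide 38
    System.out.println(secret.equals(reconstruct(shares.subList(0, 5))));  // true
  }
}
```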

  34. Good storage design for the cloud? Widely distributed solutions perform better • In security and access costs: better • In availability: almost the same (plots; parameter settings shown in the figure include m=40, t=5, pnf=0.01, with x = log10(pef) in one plot and x = log10(data size) in the other)

  35. Good storage design for the cloud? SSS-based storage performs better • In security: same as secret sharing, better than replication • In availability: almost the same for all schemes • In access cost: better than secret sharing, mostly better than replication • In storage cost: SSS and replication are similar and both are better than secret sharing (plots; same parameter settings as the previous slide)

  36. Major design issues • Use a widely distributed infrastructure and SSS • Directory management • Explicitly maintaining directories is costly in widely distributed systems and adds directory access cost • Use a P2P solution: design a DHT for SSS • Access protocols • In SSS, consistency is much more important than in conventional storage systems: if a share is inconsistent, the reconstructed data is fully incorrect • Access protocol design is therefore also much more important: one may keep retrieving inconsistent shares • If only one share version is maintained at each server, data may be lost because the stored shares may not be consistent • Need to design efficient access protocols

  37. DHT for SSS • Hash algorithm for SSS: d.s_i.identifier = (d.identifier + i * 2^m / n) mod 2^m • d: data object • d.s_i: the ith share of d • 2^m: the size of the identifier space of the DHT • n: the number of shares • Adopt one-hop lookup • Reduced routing time • Method to support accessing shares from nearby servers • Each server stores the geographical locations and IPs of all other servers • Use geographical distance to approximate the real latency when no other information is available • Conducted experimental studies to understand the relationship between geographical distance and ping latency, and the potential error rate
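A small sketch of the share-placement hash on this slide, d.s_i.identifier = (d.identifier + i * 2^m / n) mod 2^m, which spreads the n shares of one data object evenly around the m-bit DHT identifier ring; the parameter values in the example are illustrative.

```java
// Hedged sketch of the share-placement hash from the slide. BigInteger is
// used so the full m-bit identifier space fits; example values are illustrative.
import java.math.BigInteger;

public class ShareHash {
  // Identifier of the i-th share of a data object in an m-bit DHT ring with n shares.
  public static BigInteger shareIdentifier(BigInteger dataId, int i, int m, int n) {
    BigInteger ringSize = BigInteger.TWO.pow(m);                       // 2^m
    BigInteger offset = BigInteger.valueOf(i).multiply(ringSize)
                                  .divide(BigInteger.valueOf(n));      // i * 2^m / n
    return dataId.add(offset).mod(ringSize);                           // wrap around the ring
  }

  public static void main(String[] args) {
    BigInteger dataId = new BigInteger("123456789");
    for (int i = 0; i < 10; i++) {                                     // e.g. 10 shares
      System.out.println("share " + i + " -> " + shareIdentifier(dataId, i, 160, 10));
    }
  }
}
```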

  38. Efficient Access Protocols • Use version numbers to validate share consistency • When updating, generate and attach a version number to the shares • During retrieval, validate the consistency of the shares • Problem: how to efficiently generate version numbers • Fully decentralized approach: the newest update may not have the largest version number • Centralized version server: bottleneck, single point of failure • Solution: distributed version servers; for each data object, use the hash value of its first share to determine its version server • Very efficient, and the impact of a failure is localized • Maintain share history to avoid data losses • Problem: e.g., a (10, 5) sharing scheme with current version number 10; client x updates 3 shares and fails (version 11); client y updates 4 shares and fails (version 12); result: 3 shares with version 10, 3 with version 11, 4 with version 12 • Solution: maintain multiple versions • Need to properly determine which share versions to retrieve • Need a share-removal protocol
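A hedged sketch of the version-number machinery described above: the version server for a data object is chosen by hashing its first share, and on retrieval only the newest version with at least k consistent shares is accepted, with older versions kept in the share history. The names, the hash choice, and the details of the acceptance rule are assumptions for illustration.

```java
// Hedged sketch: version-server assignment and share-consistency checking
// for an (n, k) sharing scheme. Names and thresholds are illustrative.
import java.util.*;

public class VersionedShareStore {
  private final int n;                      // total shares, e.g. 10
  private final int k;                      // shares needed to reconstruct, e.g. 5
  private final String[] versionServers;    // the distributed version servers

  public VersionedShareStore(int n, int k, String[] versionServers) {
    if (k > n) throw new IllegalArgumentException("need k <= n");
    this.n = n; this.k = k; this.versionServers = versionServers;
  }

  // The version server for a data object is chosen by hashing its first share,
  // so version-number generation is distributed but deterministic per object.
  public String versionServerFor(byte[] firstShare) {
    int idx = Math.floorMod(Arrays.hashCode(firstShare), versionServers.length);
    return versionServers[idx];
  }

  public record Share(int index, long version, byte[] payload) {}

  // Accept only the newest version that has at least k consistent shares;
  // older versions stay in the share history until such a version exists.
  public Optional<List<Share>> consistentShares(List<Share> retrieved) {
    Map<Long, List<Share>> byVersion = new TreeMap<>(Comparator.reverseOrder());
    for (Share s : retrieved) {
      byVersion.computeIfAbsent(s.version(), v -> new ArrayList<>()).add(s);
    }
    for (Map.Entry<Long, List<Share>> e : byVersion.entrySet()) {
      if (e.getValue().size() >= k) {
        return Optional.of(e.getValue());   // newest reconstructible version
      }
    }
    return Optional.empty();                // only interrupted updates were found
  }
}
```

In the slide's (10, 5) example, versions 11 and 12 each have fewer than 5 shares, so this rule falls back to the 10 shares of version 10 still kept in the history rather than reconstructing from inconsistent shares.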

  39. Comparison with other access methods (plots): update latency versus k, read latency versus k, and space cost, comparing our approach with getting the version number from all servers and with the fully decentralized approach

  40. Directions • Integrated demonstration • E.g., integrate accountability into the Hadoop prototype • Host the prototype on a secure infrastructure • Integrate with Blackbook • Feed results into other projects • AFOSR MURI, IARPA • Participate in the DOE/NSF Open Science Grid • Transfer technology to DoD/NCES
