Semantically Enhanced and Efficient Enforcement of Mobile Consumer’s Privacy Preferences

Semantically Enhanced and Efficient Enforcement of Mobile Consumer’s Privacy Preferences Nabil R. Adam, Mahmoud Youssef, and Vijay Atluri Center for Information Management Integration and Connectivity (CIMIC) Rutgers University Presentation at SAP Research Labs, Palo Alto 4/5/2005

Outline • Introduction • Research Problem • Part I: A Solution with Focus on Efficiency • Controlling Information Flow • Access Control Model • Evaluation and Enforcement Mechanism • Comments on the System Design • Performance Evaluation Study • Summary (Part I) • Part II: A Solution with Focus on Expressiveness • Another Privacy Criterion • Quick Look at Description Logics, OWL, and RDF • Preferences Ontology • Query Processing • Implementation • Performance Evaluation Study • Summary (Part II)

Scenario: Location-based Advertising • With the availability of positioning and tracking technology, it is possible to • Track different entities, e.g., vehicles, containers, and individuals • The Location Service (LS) aggregates location information • Location information is managed as a Moving Objects Database (MOD) • Merchants customize offers based on consumer profile and location.

The Moving Objects Problem Moving objects need special data modeling due to: • The rate of update • Too many customers sending continuous updates. • Traditional databases are not designed for intensive updates • The same problem exists in the RFID domain. • Queries usually need to address the future • Type of queries • Queries submitted by the consumers are also moving [Figure: panels (a) and (b), objects 1 to 8 grouped into R-tree rectangles R1 to R4] The structure of the R-tree has to change drastically due to the movement of the objects

The Moving Objects Spatio-temporal Model (MOST) In the MOST [Sistla et al., 1997]: • A database attribute that is continuously changing is considered a dynamic attribute • That requires fewer updates • Linear change is assumed. • MO indexing schemes index objects in the projections, in the d-dimensional space, or in a transformed space. • We use projected trajectories in our computations • How good is the linearity assumption? [Figure: trajectory of object O1; axes X, Y, and T (Time), with t_now marked]
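The dynamic-attribute idea can be illustrated with a minimal sketch, assuming a linear motion function. The class and field names below are hypothetical and are not taken from [Sistla et al., 1997].

```python
from dataclasses import dataclass


@dataclass
class DynamicAttribute:
    """A value assumed to change linearly with time (hypothetical names)."""
    value_at_update: float  # value recorded at the last explicit update
    update_time: float      # time of the last explicit update
    rate: float             # assumed constant rate of change

    def value_at(self, t: float) -> float:
        # Extrapolate linearly from the last update; no new update is
        # needed while the object keeps changing at the stored rate.
        return self.value_at_update + self.rate * (t - self.update_time)


# X coordinate of one object: at x = 10.0 when t = 0, moving 2 units per time step.
x = DynamicAttribute(value_at_update=10.0, update_time=0.0, rate=2.0)
future_x = x.value_at(5.0)  # a query about a future time
```

Only a change in the rate triggers an update, which is why such a model needs fewer updates than storing every position.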

The Tradeoff Between Personalization and Privacy • Personalization involves • collection of profile and location information, which raises privacy concerns • Studies by Chellappa et al., Harn et al., and Spiekermann et al. show: • consumers do not opt in to online services when they do not trust merchants with their profiles. • consumers are willing to trade off their information with trusted vendors for convenience. • This holds even among privacy-concerned consumers.

The Tradeoff Between Personalization and Privacy (Cont’d) Analysis of over 120 surveys shows [Westin, 2002]: • A change in the attitude of 3/4 of American consumers towards privacy, from a modest to a high-intensity matter • Three segments of consumers: fundamentalists, unconcerned, and pragmatic. • The size of the pragmatic group is 125 million. • The challenge to businesses is to address the needs of this group • Provide convenience • Protect privacy

One Trusted Third Party The Problem: The consumer has to trust too many merchants with her profile. A Proposed Solution: The consumer trusts only one third party. It stands to reason then that the LS assumes that role.

Basic Approaches to Privacy Protection • Device-based Approach (e.g., [Schilit et al., 2003]) • Advantage: consumer does not have to trust anyone • Limitations • Consumer receives all the messages. Who is being charged for transmission? • Too much load on the network. • Requires powerful devices. • Trusted Third Party • Anonymity Approaches [The Anonymizer Project] • Do not support identity-based analysis (e.g., purchase history) • Consumers still have to trust the anonymizer. • Our Proposed Approach (Access Control with Controlled Information Flow)

The Environment: The Players The players in this environment are: • The LS: maintains consumer information, enforces consumers’ privacy policies, and provides answers to queries • Information Requester: a merchant or a marketing intermediary • Location Information Providers: e.g., the wireless networks • Information Owners: the consumers

The Research Problem: Policy Requirements • Preventing unauthorized sharing of consumer information among information requesters • Consider the spam problem • Preventing misuse of permitted access to consumer information • If access policies are based on merchant identity, merchants can violate consumer preferences in terms of time and location. • Access policies need to have spatio-temporal constraints

The Research Problem: User Interface Requirements Consumers need a user-friendly approach to defining policy rules: • Access rules should be defined at different granularities. • However, such a representation creates granularity conflicts • Example: • R1: (Hilton, c1_info, read, -) <EssexCounty, all_time> • R2: (Hotels, c1_info, read, +) <NJ, week_days>

The Research Problem: Policy Enforcement Requirements Two capabilities are required: • Addressing the impact of consumer motion and its interaction with the spatio-temporal constraints. • Spatio-temporal conflict: The location query may intersect with the spatio-temporal constraints of more than one access rule, e.g., • During the time interval of the query, the customer will pass through two locations (Hudson County and NYC) for which she has different permissions. • Translating between geospatial coordinates, as expressed in the MOD, and civil names, as expressed in the constraints, e.g., • MOD: Current Location of Customer C1 (74.32145, 40.75321) • Access Rule: (Hotels, No Access) <New York City, All times>

The Research Problem: Scalability and Efficiency Requirements • The system has to accommodate growth in the number of consumers and merchants • without adversely impacting the overall performance of query processing.

Summary of the Challenges • How to prevent the illegal sharing of consumer information? • How to efficiently resolve spatio-temporal and granularity conflicts? • How to efficiently compute the interaction among the spatio-temporal constraints and the location information? • How to translate between geospatial coordinates and civil names?

Part I: A Solution with Focus on Efficiency

Overview of the Proposed Solutions • 1. Control information flow to merchants • 2. Develop an access control model that allows: • Specification of spatio-temporal policies • Example: merchant Hilton has access to my information when I am outside New Jersey on Weekdays. • Representation of merchants, location, and time at different levels of granularity. • 3. Efficient enforcement of access control • Turn the problem into a string search problem

1. Controlling Information Flow Solution • Merchants send information related to a specific offer along with query to the LS; • The LS runs the query producing a list of consumers’ IDs who satisfy the merchant criteria; • The LS enforces the access control which filters the IDs; • The filtered IDs are then forwarded with the advertisement to the wireless networks to deliver them to the consumers; • The wireless network sends the offers to the consumer devices and reports to the LS; then • The LS sends pseudonyms to the merchants.

2. The Proposed Access Control Model • An access rule consists of an authorization triple and a constraint: (s, o, +/-), <stc> where • s ∈ S is a subject, i.e., a merchant at some granularity. • o ∈ O is an object, i.e., a consumer ID plus a non-empty subset of {l, p}, where l and p are location and profile information. • +/- is a flag, i.e., grant/deny. • stc is a spatio-temporal constraint consisting of a civil location and a time interval. • Spatial and temporal constraints are generalized to stc • The only access mode is “read”. • There is no need to represent it in the model. • Generic access rule: (s, [ID+{l+p}], +/-), <stc>

2.1 Model Components Representation • All components are represented as hierarchies (except the ID and the flag) • These hierarchies hold several properties: • At every level in a hierarchy, the nodes are an exact decomposition of their parents (i.e., the parent is the union of its children and the children are disjoint). Thus: • the root always represents “All Members” • the leaves are the members at their most specific representation. • No multiple inheritance. [Figure: Subject Hierarchy]

2.2 Order of Hierarchies and Precedence • We adopt the following order: ID → Object → Subject → Location → Time → Flag • The order among hierarchies implies precedence • Precedence has no impact on the model behavior as long as the same order is followed in the specification and evaluation of access rules • However, it has an impact on the notion of relative specificity, as we will see later

2.3 The System State • The system state includes partial instantiations of the hierarchies. • Each instance of a hierarchy includes only the nodes that have access rules defined on them. • The instances belonging to the same consumer can be seen as a tree

2.4 Conflict Resolution • For spatio-temporal conflict → denial precedes grants • I.e., being conservative. • For granularity conflict → inheritance with overriding • Nodes not in the instance are • Assumed to virtually exist, and • Inherit permissions from the next existing ancestor • Nodes in the instance • More specific rules override less specific ones • The semantics must be conveyed to the consumer
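The two resolution policies above can be sketched in a few lines. This is an illustration under stated assumptions, not the system's implementation: the location hierarchy, the rule table, and the function names are hypothetical, and treating an empty flag list as a denial is an assumption made here for conservatism.

```python
from typing import Dict, List, Optional

# Hypothetical location hierarchy: child -> parent (the root has no parent).
PARENT: Dict[str, Optional[str]] = {
    "AllLocations": None,
    "NJ": "AllLocations",
    "EssexCounty": "NJ",
    "Newark": "EssexCounty",
    "HudsonCounty": "NJ",
}

# Partial instantiation: only nodes that carry a rule are stored.
# '+' means grant, '-' means deny.
RULES: Dict[str, str] = {"NJ": "+", "EssexCounty": "-"}


def effective_flag(node: str) -> Optional[str]:
    """Inheritance with overriding: the nearest node with a rule wins."""
    current: Optional[str] = node
    while current is not None:
        if current in RULES:
            return RULES[current]
        current = PARENT[current]
    return None  # no rule on the node or on any ancestor


def combine(flags: List[str]) -> str:
    """Denial precedes grants: grant only if every flag is a grant."""
    # Assumption: an empty list is treated as a denial.
    return "+" if flags and all(f == "+" for f in flags) else "-"
```

Here Newark has no rule of its own, so it inherits the denial from EssexCounty, which in turn overrides the grant on NJ.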

2.4.1 Relative Specificity Among Rules For two rules R1 and R2 for the same consumer, R1 is more specific than R2 if: • R1 has a more specific object than R2; i.e., R2 has {l+p} and R1 has l or p; • R1 and R2 have the same object AND R1 has a more specific subject; • R1 and R2 have the same object and subject AND R1 has a more specific location; or • R1 and R2 have the same object, subject, and location AND R1 has a more specific time.
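The definition above amounts to a lexicographic comparison in the order object, subject, location, time. A minimal sketch follows, assuming both rules match the same search key, so that each component of the two rules lies on one root-to-leaf path and can be compared by its depth in the hierarchy. The type name and the depth values are hypothetical.

```python
from typing import NamedTuple


class RuleDepth(NamedTuple):
    """Depth of each rule component in its hierarchy (0 = root, least specific)."""
    obj: int       # 0 for {l+p}, 1 for l or p alone
    subject: int
    location: int
    time: int


def more_specific(r1: RuleDepth, r2: RuleDepth) -> bool:
    # Python compares tuples lexicographically, which mirrors the definition:
    # the first component that differs, in precedence order, decides.
    return r1 > r2


# Illustrative depths for the two rules of the user interface example:
# R1: (Hilton, c1_info, -) <EssexCounty, all_time>
# R2: (Hotels, c1_info, +) <NJ, week_days>
r1 = RuleDepth(obj=0, subject=2, location=2, time=0)
r2 = RuleDepth(obj=0, subject=1, location=1, time=1)
r1_wins = more_specific(r1, r2)
```

Under these assumed depths R1 is the more specific rule because its subject is more specific, even though R2 has the more specific time.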

2.5 Advantages of the Model • Support for more efficient search. • Overriding motivates starting the search from the most specific representation. • We exploit that by adaptively searching for the most specific rule that matches some search key. • The system size is kept small • Since instances are partial instantiations. • Component representation is granular. • This streamlines the user interface, and • Provides support for aggregate queries.

3. Evaluation of Access Control • Evaluation involves: • Composing search keys • Matching them against the access rules. • Each consumer in the query result can generate multiple search keys • based on the intersection between the query and the consumer’s motion line. • Granular representation is another source of search keys. • Definition: A spatio-temporal window is a combination of a time leaf and a location leaf.

3.1 The Evaluation Procedure The evaluation proceeds as follows*: • For each consumer and for each spatio-temporal window that the consumer passes through, a search key is created. • For each of the created keys, an adaptive search operation is performed and a flag is retrieved. • The flags that belong to the same consumer are combined using the ‘denial precedes grants’ rule. * Check [YAA05] for detailed computations
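The three steps above can be sketched as a loop over consumers and windows. This is a simplified illustration, not the detailed computation of [YAA05]: the function names, the window labels, and the toy search function standing in for the adaptive search are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Window = Tuple[str, str]  # (location leaf, time leaf)


def permitted_consumers(
    windows_by_consumer: Dict[str, List[Window]],
    subject: str,
    search: Callable[[str, str, Window], str],
) -> List[str]:
    """Return the consumer IDs for which every window yields a grant ('+')."""
    permitted = []
    for consumer_id, windows in windows_by_consumer.items():
        # One search key, and hence one flag, per spatio-temporal window.
        flags = [search(consumer_id, subject, w) for w in windows]
        # Denial precedes grants: a single '-' removes the consumer.
        if flags and all(f == "+" for f in flags):
            permitted.append(consumer_id)
    return permitted


# Toy stand-in for the adaptive search: deny Hotels in NYC, grant otherwise.
def toy_search(consumer_id: str, subject: str, window: Window) -> str:
    location, _time = window
    return "-" if (subject == "Hotels" and location == "NYC") else "+"


windows = {
    "c1": [("HudsonCounty", "Mon-09"), ("NYC", "Mon-10")],
    "c2": [("HudsonCounty", "Mon-09")],
}
result = permitted_consumers(windows, "Hotels", toy_search)
```

Consumer c1 passes through both Hudson County and NYC during the query interval, so the denial for NYC removes c1 from the result.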

3.2 Components of the Evaluation and Enforcement Mechanism • A spatio-temporal module: • Provides computations for interaction between moving objects and consumer location information. • Translates geospatial coordinates to civil names. • Built on top of Oracle Spatial using Oracle Pre-compiler (Pro*C/C++ and PL/SQL) • An encoder • Encodes both access rules and search keys into equal-length alphabetical strings. • The ASM-trie (the Adaptive Search Multiway-trie) • Performs the adaptive search on specially encoded strings.

3.2.1 The Encoder • In the access rule (search key), each hierarchy substring is drawn from a table that encodes that hierarchy. • Depending on the maximum cardinality of children, one or more letters are used for each level, e.g., one letter for region and two letters for state. • There is no one-to-one relationship between nodes in an access rule and the nodes in the ASM-trie. • Adaptive search is not just backtracking

The Encoder’s Support for the Adaptive Search • Letter ‘a’ is used as padding to give equal length to all substrings: • This way, it also represents the ‘parent’ node in the access rule. • Letter ‘a’ is never used in encoding a child. • The ID substring is encoded in uppercase to indicate that adaptive search is not supported.
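The padding scheme can be sketched as follows for a single hierarchy substring. The function name, the level widths, and the sample codes are hypothetical; the actual code tables are not given in the slides.

```python
from typing import Sequence

PAD = "a"  # padding letter; per the slide it also stands for the 'parent' node


def encode_node(level_codes: Sequence[str], level_widths: Sequence[int]) -> str:
    """Encode a hierarchy node as a fixed-length lowercase substring.

    level_codes  -- codes along the path from below the root down to the node
    level_widths -- number of letters reserved for each level of the hierarchy
    """
    if len(level_codes) > len(level_widths):
        raise ValueError("path is deeper than the hierarchy")
    for code, width in zip(level_codes, level_widths):
        # A child code never contains the padding letter.
        if len(code) != width or PAD in code or not code.islower():
            raise ValueError(f"invalid code {code!r} for width {width}")
    body = "".join(level_codes)
    return body + PAD * (sum(level_widths) - len(body))


WIDTHS = [1, 2]                                 # region: 1 letter, state: 2 letters
state_rule = encode_node(["c", "bd"], WIDTHS)   # a rule on one state
region_rule = encode_node(["c"], WIDTHS)        # a rule on the whole region
root_rule = encode_node([], WIDTHS)             # a rule on "All Members"
```

With these assumed codes, the three substrings come out as "cbd", "caa", and "aaa": every coarser rule is the finer one with its tail replaced by padding, which is what lets a search fall back from a node to its ancestors.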

3.2.2 The ASM-trie • The ASM-trie is a main memory structure that supports adaptive search. • In the ASM-trie, the node includes • 27 pointers-to-node to represent the alphabet and a null character • Letters are implied by their order (radix) (e.g., 0=null, 1=a, 2=b, …) • A pointer to its parent for adaptive search, • A pointer to the previous-letter for backward traversal, and • A Boolean variable to indicate whether adaptive search is supported in this level. • For the Insert and Search algorithms, check [YAA05]
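A rough sketch of the node layout and of one possible adaptive lookup is given below. It is not the Insert or Search algorithm of [YAA05]: it is a recursive stand-in that tries the exact letter first and then falls back to the padding letter 'a', and it omits the previous-letter pointer. All names are hypothetical.

```python
from typing import List, Optional

ALPHABET_SIZE = 27  # slot 0 = null, slot 1 = 'a', ..., slot 26 = 'z'


def slot(ch: str) -> int:
    """Map a letter to its child slot; case is ignored for slot selection."""
    return ord(ch.lower()) - ord("a") + 1


class Node:
    def __init__(self, parent: Optional["Node"] = None) -> None:
        self.children: List[Optional["Node"]] = [None] * ALPHABET_SIZE
        self.parent = parent             # mirrors the slide; unused by this recursive sketch
        self.adaptive = True             # may the search fall back to 'a' at this level?
        self.flag: Optional[str] = None  # '+' or '-' stored at the end of a rule string


def insert(root: Node, rule: str, flag: str) -> None:
    node = root
    for ch in rule:
        # Uppercase letters (the ID substring) switch the fallback off at this level.
        node.adaptive = ch.islower()
        i = slot(ch)
        if node.children[i] is None:
            node.children[i] = Node(parent=node)
        node = node.children[i]
    node.flag = flag


def adaptive_search(node: Node, key: str, i: int = 0) -> Optional[str]:
    """Return the flag of the most specific rule matching key, or None."""
    if i == len(key):
        return node.flag
    candidates = [key[i]]
    if node.adaptive and key[i] != "a":
        candidates.append("a")  # fall back to the 'parent' representation
    for ch in candidates:
        child = node.children[slot(ch)]
        if child is not None:
            found = adaptive_search(child, key, i + 1)
            if found is not None:
                return found
    return None


root = Node()
insert(root, "Xcaa", "+")  # consumer X: grant for the whole region 'c'
insert(root, "Xcbd", "-")  # consumer X: deny for one state in that region
```

Because the exact letter is tried before the padding letter at every position, the first complete match found is the most specific one in the precedence order of the encoded hierarchies.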

Performance Evaluation Study • ASM-trie vs. main memory trie with linear scan vs. Oracle linear scan. • Machine: Xeon 2.4 GHz with 2 GB RAM. • 100 search key sets and 30 data replicas • The ASM-trie had a constant search time, around 32,000 keys/sec. • The ASM-trie exhibited linear space utilization, around 1,200 access rules per MB. • The difference between the ASM-trie and the regular trie can be attributed to the adaptive search. [Figure legend: ASM-trie, Main Memory Linear scan, Oracle Linear scan]

Comments on the Design The choice of a memory-resident approach. • The limit on main memory size should not affect the scalability of the LS for several reasons: • The LS is implemented as a distributed system where every node is responsible for a specific service area • 64-bit processors are becoming commonplace. • New directions in implementing large-scale services, e.g., Google: • rely on multiple cheap servers • all the data is indexed in memory. • This year, a first conference on data management on new hardware

Summary (Part I) Contribution: • An access control model for moving objects and consumer profiles that supports granular representation. • An efficient enforcement mechanism that utilizes a new data structure, the ASM-trie. • A design of information flow that prevents merchants from sharing consumer information. Future work • Disk-based ASM-trie

Part II: A Solution with Focus on Expressiveness

Another Criterion for Privacy • Why do customers accept receiving advertisements? • Convenience of timely and location-based offers • Offers are related to their interests. • Offers carry incentives. • The current privacy policy considers the first two reasons, but not the third. • Can we add incentives to the privacy criteria? • Yes, but this type of domain is difficult to model with some data structures like the hierarchies in Part I. • In general, it is difficult to model exceptions in such hierarchies. • Consider NYC (a city that is composed of five counties). It violates the hierarchy’s structural properties.

KR Techniques • Knowledge Representation (KR) Techniques • Modeling approaches based on KR techniques are more expressive • KR techniques can be broadly classified into • logic-based and non-logic-based. • Description Logics (DLs) are a class of logic-based KR formalisms that have been used recently as a basis for designing the Web Ontology Language (OWL) We propose a solution based on modeling incentives and the other preferences as an ontology and enforcing these preferences using DL reasoning techniques.

Overview • 1. A brief overview of DLs • 2. Preferences Ontology • 3. Query Processing • 4. Implementation • 5. Performance Evaluation Study

1. DLs – A Brief Overview • The basic building block of KR in DLs is • The concept -- defined as a set of individuals • Concepts and the IS-A relationship are used to build hierarchical terminologies (taxonomies). • Terminologies are the intensional knowledge • The extensional knowledge comes from assertions about individuals • In addition to the IS-A, DLs can represent other types of relationships • “roles”

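1.1 A Minimal DL Language and Its Interpretation DLs have a well-defined model-theoretic interpretation. The following is the interpretation of the AL language, for an interpretation $\mathcal{I}$ with domain $\Delta^{\mathcal{I}}$:
$\top^{\mathcal{I}} = \Delta^{\mathcal{I}}$ (the universal concept, thing)
$\bot^{\mathcal{I}} = \emptyset$ (bottom concept, nothing)
$(\neg A)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus A^{\mathcal{I}}$ (atomic negation)
$(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$ (conjunction)
$(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$ (disjunction)
$(\forall R.C)^{\mathcal{I}} = \{ a \in \Delta^{\mathcal{I}} \mid \forall b.\ (a, b) \in R^{\mathcal{I}} \rightarrow b \in C^{\mathcal{I}} \}$ (value restriction)
$(\exists R.\top)^{\mathcal{I}} = \{ a \in \Delta^{\mathcal{I}} \mid \exists b.\ (a, b) \in R^{\mathcal{I}} \}$ (limited existential quantification)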
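The interpretation rules above can be checked on a small finite domain. The sketch below is illustrative only: the tuple encoding of concepts, the function name, and the sample domain are assumptions made here, not part of any DL reasoner.

```python
from typing import Dict, Set, Tuple

Concept = tuple  # e.g. ("and", ("atom", "A"), ("not", "B"))


def interpret(
    concept: Concept,
    domain: Set[str],
    atoms: Dict[str, Set[str]],
    roles: Dict[str, Set[Tuple[str, str]]],
) -> Set[str]:
    """Return the extension of a concept under a finite interpretation."""
    kind = concept[0]
    if kind == "top":
        return set(domain)
    if kind == "bottom":
        return set()
    if kind == "atom":
        return set(atoms[concept[1]])
    if kind == "not":  # atomic negation: the argument is an atomic concept name
        return set(domain) - atoms[concept[1]]
    if kind == "and":
        return interpret(concept[1], domain, atoms, roles) & interpret(
            concept[2], domain, atoms, roles)
    if kind == "or":
        return interpret(concept[1], domain, atoms, roles) | interpret(
            concept[2], domain, atoms, roles)
    if kind == "all":  # value restriction: every R-successor is in the filler
        filler = interpret(concept[2], domain, atoms, roles)
        pairs = roles[concept[1]]
        return {a for a in domain
                if all(b in filler for (x, b) in pairs if x == a)}
    if kind == "some":  # limited existential: at least one R-successor exists
        pairs = roles[concept[1]]
        return {a for a in domain if any(x == a for (x, _b) in pairs)}
    raise ValueError(f"unknown constructor {kind!r}")


# Hypothetical interpretation with three individuals.
DOMAIN = {"m1", "c1", "c2"}
ATOMS = {"Merchant": {"m1"}, "Consumer": {"c1", "c2"}}
ROLES = {"offersTo": {("m1", "c1")}}

has_offer = interpret(("some", "offersTo"), DOMAIN, ATOMS, ROLES)
only_consumers = interpret(("all", "offersTo", ("atom", "Consumer")), DOMAIN, ATOMS, ROLES)
```

Note that the value restriction holds vacuously for individuals with no offersTo successors, so only_consumers covers the whole domain here.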

1.2 SHIQ(D) and OWL • SHIQ(D) is equivalent to AL plus full concept negation, transitive roles, qualified cardinality restrictions, role hierarchies, inverse roles, and datatypes. • SHIQ(D) has a good balance between expressiveness and computational efficiency (computability and decidability) • SHIQ(D) is almost equivalent to OWL • For an excellent reference on DLs, check the Description Logic Handbook (DLHB).

1.3 Important Features of DLs • Two types of terminology axioms: inclusion (of the form C ⊑ D) and equality (of the form C ≡ D). • A definition is an equality with an atomic left side. • A finite set of definitions T is called a terminology or TBox. • A finite set of assertions about individuals is called an ABox. • The open-world semantics • The unique name assumption

1.4 Reasoning in DLs Assuming a knowledge base K, concepts C and D, and an individual a: • TBox reasoning includes: • Class subsumption queries: determine if C is a subclass of D with respect to K. • Class hierarchy queries: given a class C, return all or the most-specific (most-general) superclasses (subclasses) of C in K. • Class satisfiability queries: given a class C, determine if C is satisfiable (consistent) with respect to K. • ABox reasoning includes: • Ground: determine whether a given individual a is an instance of C. • Open: determine all the individuals in K that are instances of C. • All-classes: given an individual a, determine all the classes in K that have element a.

2. Preferences Ontology • The ontology includes six taxonomies: • IncentiveType, • IncentiveValue, • Location, • Time, • Products, • Merchants. • Both Consumer preferences and merchant queries are • Subsumed by a class called CPP (Consumer Privacy Preferences).

Modeling Promotions Promotion Techniques: • Price reduction, • Happy hour (i.e., price reduction for a short time), • No payment for a specific period, • Payment in installments, • More items for free, • Bundle (which could be homogeneous, i.e., a reduced price for a second item, or heterogeneous, i.e., another product at a reduced price), • Premium (i.e., a free non-related product or service, e.g., free miles), • Prize, • Contest (i.e., based on a skill), • Sweepstakes (i.e., based on chance), and • Rebates or refund (i.e., cash refund, coupon refund, or escalating refund). • We analyzed these techniques and found that a promotion includes: • Incentive type → IncentiveType Taxonomy • Incentive value → IncentiveValue Taxonomy • Conditions → Property Restrictions on the IncentiveType Taxonomy

2.1 Incentive Type Taxonomy IncentiveType ⊑ ⊤; Monetary ⊑ IncentiveType; Coupon ⊑ IncentiveType; TimeSlack ⊑ IncentiveType; ExtraItems ⊑ IncentiveType; PayOnInstallments ⊑ IncentiveType; InstantRefund ⊑ Monetary; DelayedRefund ⊑ Monetary • For each of these subclasses, an object property is defined. • Promotion conditions are expressed as property restrictions on the class IncentiveType and its subclasses • Example: product condition • Property: hasProduct • Range: Products_Services • Restriction: allValuesFrom AllProduct and with cardinality = 1.
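A class subsumption query over the taxonomy above can be illustrated by walking the told subclass links. This is a deliberately simplified sketch: a DL reasoner also takes property restrictions and definitions into account, and the dictionary and function names are hypothetical.

```python
from typing import Dict, Optional

# Told subclass links of the IncentiveType taxonomy: class -> direct superclass.
SUPERCLASS: Dict[str, str] = {
    "Monetary": "IncentiveType",
    "Coupon": "IncentiveType",
    "TimeSlack": "IncentiveType",
    "ExtraItems": "IncentiveType",
    "PayOnInstallments": "IncentiveType",
    "InstantRefund": "Monetary",
    "DelayedRefund": "Monetary",
}


def is_subsumed_by(c: str, d: str) -> bool:
    """True if c equals d or d is reachable from c through superclass links."""
    current: Optional[str] = c
    while current is not None:
        if current == d:
            return True
        current = SUPERCLASS.get(current)  # None once the top of the taxonomy is passed
    return False


refund_is_incentive = is_subsumed_by("InstantRefund", "IncentiveType")
```

An instant refund is an incentive because it is subsumed by Monetary, which is in turn subsumed by IncentiveType; a coupon is not subsumed by Monetary.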

2.2 Incentive Value Taxonomy • The IncentiveValue taxonomy includes five subclasses: • PercentageReduction, • ScalarReduction, • Price, • TimeSlack, and • NumberOfInstallments • The taxonomy also includes five datatype properties hasPercentageValue, hasScalarValue, etc. • The range for these properties is the XML integer data type. • Example: 20 IncentiveValue.hasPercentageValue

2.3 Location Taxonomy • The main class in the location taxonomy is AllLocations, whose semantics is the set of all cities. • Since we are using class subsumption for reasoning, we represented cities as primitive classes instead of individuals [Figure: Example]

2.4 Products and Services Taxonomy • Used the United Nations Standard Products and Services Code (UNSPSC). • UNSPSC provides a five-level taxonomy: Segment, Family, Class, Commodity, and Business Function. • Imported from XML to OWL.

2.5 Time Taxonomy • The main class in the time taxonomy is AllTimes, whose semantics is the set of all hours (in one year) • You can express things like Labor Day even though it does not have a fixed date.
