Online Mechanisms

Seminar on game theory, David C Parkes. Presentation by Alon Baram.


  1. Online Mechanisms Seminar on game theory, David C Parkes. Presentation by Alon Baram.

  2. Talk Layout • Review of mechanism design. • Online mechanism definitions. • Single value domains.

  3. Review – Mechanism design • A mechanism for n players is given by: • Players' type spaces • Players' action spaces • A set of alternatives A • Valuation functions • An outcome function • Payment functions • Quasi-linear utility: a player's valuation minus the price that player pays • Social choice function: maps each profile of types to an alternative in A • Strategy: maps a player's type to an action • A mechanism implements a social choice function

  4. Review – Mechanism design(2) • A mechanism implements a social choice function in dominant strategies if, for some dominant-strategy equilibrium, the mechanism's outcome equals the function's choice for every profile of player types. • A mechanism is incentive compatible (truthful) if reporting one's true type is a dominant strategy. • Revelation principle: any mechanism that implements a function in dominant strategies can be converted into a truthful one. • Individual rationality: at equilibrium, all agents have non-negative utility.

  5. Ice Cream Stand • Moshe has an ice cream stand. He is very lazy and makes one cone per hour. • If no one buys the cone within that hour, the ice cream melts and he is very sad. • Buyers come and stand in line. Upon arrival they declare how many hours they are willing to wait and how much the cone is worth to them.

  6. Ice Cream Stand – Cont. • Every buyer holds private information: • The true time he is willing to wait for the cone. • The true value of the cone to him. • Thus a player's type is a triplet: (time of arrival in line, time he has to leave, value of the cone). • The buyer declares this triplet (not necessarily truthfully) to Moshe on arrival. • Each hour, Moshe decides who gets that hour's cone.

  7. Ice Cream Stand - example • There are 3 buyers, with types (9:00,11:00,100), (9:00,11:00,80), (10:00,11:00,60). • Let's say Moshe uses a generalization of the Vickrey auction for his decision: every hour he sells to the highest bidder at the second-highest price. • If every bidder is truthful: • Buyer 1 gets his cone for 80 in the first hour. • Buyer 2 gets his cone for 60 in the second hour. • But what if buyer 1 chooses to lie?

  8. Ice Cream Stand – Manipulation • Buyer 1 may declare (9:00,11:00,61): • The cone in the first hour is won by buyer 2 for 61. • The cone in the second hour is won by buyer 1 for 60 instead of 80. • Alternatively, buyer 1 may join the line at 10:00 instead of 9:00: • Buyer 2 wins the first cone for 0. • Buyer 1 wins the second cone for 60. • The Vickrey auction is not truthful in the online setting, because buyers can choose which hour's auction to participate in.
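The two manipulations above can be replayed with a short simulation. This is a minimal sketch, not the presenter's code: the function name `hourly_second_price`, the buyer labels, and the convention that a buyer with declared times (a, d) can take the cones of hours a, a+1, ..., d-1 are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Bid = Tuple[int, int, int]  # (arrival hour, departure hour, declared value)


def hourly_second_price(bids: Dict[str, Bid], hours: List[int]) -> Dict[str, Tuple[int, int]]:
    """Run one second-price auction per hour; return {buyer: (hour served, price paid)}."""
    served: Dict[str, Tuple[int, int]] = {}
    for hour in hours:
        # Buyers who are in line this hour and have not been served yet, highest bid first.
        active = sorted(
            ((value, name) for name, (arrive, depart, value) in bids.items()
             if arrive <= hour < depart and name not in served),
            reverse=True,
        )
        if not active:
            continue  # the cone melts
        winner = active[0][1]
        price = active[1][0] if len(active) > 1 else 0  # second-highest bid, or 0 if alone
        served[winner] = (hour, price)
    return served


hours = [9, 10]
truthful = {"buyer1": (9, 11, 100), "buyer2": (9, 11, 80), "buyer3": (10, 11, 60)}
value_lie = dict(truthful, buyer1=(9, 11, 61))      # buyer 1 shades his value
late_arrival = dict(truthful, buyer1=(10, 11, 100))  # buyer 1 shows up an hour late

print(hourly_second_price(truthful, hours))      # buyer 1 pays 80, buyer 2 pays 60
print(hourly_second_price(value_lie, hours))     # buyer 2 pays 61, buyer 1 pays 60
print(hourly_second_price(late_arrival, hours))  # buyer 2 pays 0, buyer 1 pays 60
```

In both misreports buyer 1 still receives a cone but pays 60 instead of 80, which is exactly the incentive problem the slide describes.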

  9. Dynamic auction with expiring items • Formal model for our example. • Discrete time periods: $T = \{1, 2, \ldots\}$ • $N$ agents, with types denoted $\theta_i = (a_i, d_i, w_i)$ • $a_i$ is the arrival time. • $d_i$ is the departure time. • $w_i$ is the value for allocation of the single item. • The value is for allocating one unit during $[a_i, d_i]$ • Payment $p$ is collected from the agent. • Utility is quasi-linear: $w_i - p$ if the agent is allocated, $-p$ otherwise

  10. Online Mechanisms motivation • Model dynamic environments: • Selling seats on airplanes to buyers arriving over time. • Allocating computational resources to jobs arriving over time. • Selling ads on a search engine to a changing group of buyers with uncertain future supply (AdWords). • Allocating tasks to a dynamically changing team of agents.

  11. Online Mechanisms challenges • Agents may arrive or depart at any time. • There is uncertainty about which actions will be feasible in the future. • The types of agents who have not yet arrived are unknown. • Agents can lie about their arrival and departure times as well as their valuations. • We may restrict the kinds of lies an agent is capable of. • Example: an agent may not report an arrival time earlier than its true arrival time.

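  12. Online Mechanisms formal model • Set of discrete time periods $T = \{1, 2, \ldots\}$ • Set of feasible outcomes $K$, where $K^t$ is the set of possible outcomes at time $t$. • Sequence of decisions $k = (k^1, k^2, \ldots)$, where $k^t \in K^t$ is the decision at time $t$. • Agent $i$'s type is denoted $\theta_i = (a_i, d_i, v_i)$: arrival period, departure period and valuation • Valuation function $v_i(k)$ • Quasi-linear utility function $u_i = v_i(k) - p$ for payment $p$ • The arrival period is the first time the agent may report its type. • The valuation component may depend on both the choices and the time at which they are made.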

  13. Direct Revelation Mechanisms • Each agent sends a single message to the mechanism, reporting its type. • The agent gets no information before making this report. • For the ice cream example: • Buyers reported their values and departure times upon arrival. • Buyers didn’t know about the other buyers before arriving.

  14. Direct Revelation Mechanisms - Formally • The mechanism state $h^t$ captures all information relevant to the decision at time $t$. • It allows for stochastic events $\omega$. • The state is $h^t \in H^t$ • State space $H^t$: finite, countable or continuous. • $K^t(h^t)$: feasible decisions at time $t$.

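  15. Direct Revelation Mechanisms – Formally 2 • Mechanism $M = (\pi, p)$ • Each agent makes a single claim about its type. • Decision policy $\pi$: picks a feasible decision in every period. • Payment policy $p$: sets a payment for every active agent. • The decision policy may be stochastic. • The payment policy may collect payments over several periods.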

  16. Ice cream revisited • The ice cream stand is a direct revelation mechanism. • The state is the list of currently active agents. • The decision policy is to allocate to the highest active bidder who has not yet been allocated. • The payment policy is to charge the bid of the second-highest active bidder.

  17. Limited Misreports • Scenarios where agents are restricted in the lies they can tell. • Formally, $C(\theta_i) \subseteq \Theta_i$, for $\theta_i \in \Theta_i$, is the set of available misreports for an agent with type $\theta_i$ • No early arrival misreports: an agent cannot report an arrival time before it actually arrives. • No late departure misreports: an agent cannot report a departure time after the time it actually leaves. • In the ice cream example, an agent couldn’t misreport an early arrival, because he wasn’t there.
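As a rough illustration of the restricted misreport set, the check below decides whether a report is available to an agent of a given true type. It is only a sketch: the tuple encoding (arrival, departure, value) and the name `is_available_misreport` are assumptions for this example, not notation from the talk.

```python
from typing import Tuple

# Hypothetical type encoding for this sketch: (arrival, departure, value).
AgentType = Tuple[int, int, float]


def is_available_misreport(
    true_type: AgentType,
    report: AgentType,
    no_early_arrival: bool = True,
    no_late_departure: bool = True,
) -> bool:
    """Return True if an agent with `true_type` is able to claim `report`."""
    true_arrival, true_departure, _ = true_type
    arrival, departure, _ = report
    if arrival > departure:
        return False  # not a well-formed type at all
    if no_early_arrival and arrival < true_arrival:
        return False  # cannot claim to be present before actually arriving
    if no_late_departure and departure > true_departure:
        return False  # cannot claim to stay after actually leaving
    return True  # the reported value itself is unrestricted


# Buyer 1 from the ice cream example may arrive late and shade his value...
print(is_available_misreport((9, 11, 100), (10, 11, 61)))  # True
# ...but buyer 3 cannot claim to have been in line since 9:00.
print(is_available_misreport((10, 11, 60), (9, 11, 60)))   # False
```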

  18. Truthful online mechanisms • An online mechanism is truthful if: • For each agent i, • given a known set of available misreports for i, • for every fixed choice of the other players' reported types and every stochastic event, • the utility of i when reporting its true type is greater than or equal to its utility when reporting any available lie. • Here the utility is: • The agent's valuation, under its true type, of the decisions the policy makes given its reported type and the other agents' types, • minus the payment charged in the resulting state.

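  19. Truthful online mechanisms – cont • Formally: $v_i(\theta_i, \pi(\theta_i, \theta_{-i}, \omega)) - p_i(\theta_i, \theta_{-i}, \omega) \ge v_i(\theta_i, \pi(\theta'_i, \theta_{-i}, \omega)) - p_i(\theta'_i, \theta_{-i}, \omega)$ for every available misreport $\theta'_i$, all $\theta_{-i}$ and all events $\omega$, where $\pi$ is the decision policy and $p_i$ is agent $i$'s payment. • For a stochastic decision policy, the expected utility of being truthful should be maximal over all other reports and events. • For Bayes-Nash incentive compatibility: • All agents know the distribution of agent types and events. • For every agent, the expected utility of truth-telling is at least that of any available lie, provided the other agents are truthful. • Weaker than the previous definition.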

  20. Online revelation principle • The revelation principle for offline mechanism design (MD) states that any mechanism implementing a social choice function in dominant strategies can be emulated by a truthful one. • In the general setting this might not be true for online MD. • But it is true when we limit misreports to no early arrivals and no late departures.

  21. Single value Domains • Introduction • Dynamic auction with expiring items • Adaptive limited supply auction

  22. Interesting groups • “Moshe city travel” sells equipment for getting around the city. • As we all know, you can’t get around the city by car. • They sell the following items: • Bicycle. • Bicycle pumps. • Segway. • There are 4 types of agents, written as (arrival, departure, (interesting items, value)): • (1,1,({Bicycle,Bicycle pumps},20)) • (2,2,({Bicycle,Bicycle pumps},15)) • (1,1,({Bicycle pumps},30)) • (1,2,({Segway},40))

  23. Interesting groups-cont • Assume Moshe is lazy and works only 2 hours a day. • Today Moshe has one of each item to offer. • Say we have 4 buyers, one of each type. • Possible allocation outcomes, where each position represents one buyer's allocation: • From the point of view of agent 1 (bicycle + pump) we can define an order:

  24. Single value domain • In a single value domain, each agent wants its preference to be part of the decision at some time while it is present in the mechanism. • The value an agent gets is a constant if its request is satisfied, and 0 otherwise. • We can use the language of interesting sets to define what is interesting to a player and to find all decisions that include this interest. • This is done by creating, for every agent, a collection L of sets, each representing some subset of decisions. By defining a partial order on these sets, we can check when the agent is satisfied.

  25. Single value domain formally • Let $\mathcal{L}_i$ be the set of interesting sets for agent $i$; each interesting set is a subset of decisions. • Define a partial order $\succeq_L$ on interesting sets. • Now $r_i$ is the value on an interesting set. • Formal definition of a single value domain: $v_i(\theta_i, k) = r_i$ if $k^t \in L$ for some $L \in \mathcal{L}_i$ and some $t \in [a_i, d_i]$, and $v_i(\theta_i, k) = 0$ otherwise. • Let's assume the mechanism knows the interesting sets (IS) of each type.

  26. Under this assumption define: • A partial order on types which sorts conflicting types by their value.

  27. Single value combinatorial auction • Multiple units of indivisible items. • Uncertain supply, no storage between periods. • Single value preferences: the interesting sets of agent i are all allocations that give agent i its interesting items. • Partial order • Agent i has a type of the form (arrival, departure, (interesting items, value))

  28. Critical value • Revisit the last example: (1,1,({Bicycle,Bicycle pumps},20)), (2,2,({Bicycle,Bicycle pumps},15)), (1,1,({Bicycle pumps},30)), (1,2,({Segway},40)). • The store policy is to use Vickrey pricing, where each price is now for the smallest contained subset, i.e. for {Bicycle,Bicycle pumps} and {Bicycle pumps}, the smallest contained subset is {Bicycle pumps}. • After choosing some policy, we can calculate the critical value for an agent: fixing the other agents, it is the minimum value the agent must have in order to win his interest at some point. • In our example: • Agent 1 needs to beat agent 3 in the first round, so his critical value is 30+epsilon. • Agent 2 cannot win, so his critical value is infinite. • Agent 3 has to beat agent 1, so his critical value is 20+epsilon. • Agent 4 wins anyway, so his critical value is 0.

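  29. Critical value - formally • $v^c(\theta_{-i}) = \min r'_i$ such that $\pi_i(\theta'_i, \theta_{-i}) = 1$ for $\theta'_i = (a_i, d_i, (L_i, r'_i))$, and $v^c(\theta_{-i}) = \infty$ if no such $r'_i$ exists. • Where $\pi_i(\theta_i, \theta_{-i}) = 1$ means that the agent was allocated at some period.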

  30. Monotonic policy • A deterministic policy is monotonic if: • Fix agent i. • For every choice of types for all other agents, replace agent i's type with a “bigger” one (i.e. a larger value and a time interval that contains the original). • If i was allocated, then so is the new type. • Formally: $\pi_i(\theta_i, \theta_{-i}) = 1 \Rightarrow \pi_i(\theta'_i, \theta_{-i}) = 1$ for all $\theta'_i \succ \theta_i$, where $\pi_i = 1$ means agent $i$ is allocated. • The strictness is to ensure the value is higher. • The previous policy is monotonic (I think).

  31. Non monotonic example • Types: • 1=(1,1,(10,{bike})), 2=(1,2,(15,{bike})) • 2 agents, one bike. • Under this policy type 2 is bigger than type 1, but in our example type 2 isn’t allocated when faced with another type 2.

  32. Lemma • Meaning: the critical value is determined by the other agents and the time interval. • Proof: fix the other players and events. Assume, for 2 types of agent i, that • critical_value(theta’) < critical_value(theta) • but theta’ <= theta. • Replace both values with critical_value(theta’); still theta’ <= theta.

  33. Proof-cont • But critical_value(theta’) < critical_value(theta), so theta isn’t allocated and theta’ is, contradicting monotonicity.

  34. Truthfulness in single value domains • Any monotonic deterministic policy can be implemented truthfully, given no early arrival and no late departure misreports. • This is done by charging an allocated agent his critical value at his time of departure. • Formally: $p_i = v^c$ if agent $i$ was allocated (collected at $t = d_i$), and $p_i = 0$ otherwise.

  35. Truthfulness in single value domains • Proof: fix the other agents' types and the events, and fix agent i's type. • Case a, the agent is allocated when truthful. • Any legal misreport either • narrows the time interval, giving a type with a (weakly) larger critical value, by the previous lemma, or • only changes the reported value r(i), which does not change the critical value. • In either case the payment cannot decrease, so the agent cannot gain utility. • Case b, the agent is not allocated when truthful. • The critical value is larger than the agent's value. • To be allocated he must report a larger value (narrowing the interval only raises the critical value); he then pays the critical value, which exceeds his true value, giving negative utility.

  36. Necessary conditions for truthfulness • A mechanism satisfies individual rationality when every agent has non-negative utility in equilibrium. • We now examine the necessary conditions for truthfulness. • Reasonable misreporting: an agent can at least report a later arrival time, an earlier departure time, and any value.

  37. Necessary conditions for truthfulness • Proof: fix the other players and events. Assume monotonicity fails: • theta <= theta’ • theta is allocated, theta’ isn’t. • r(i) > vc(theta) • => an agent of type theta has strictly positive utility.

  38. Necessary conditions for truthfulness • An agent of type theta’, which is not allocated, has non-positive utility (he might even be charged). • Such an agent should lie and report type theta, a reasonable misreport since theta <= theta’, and thereby make a profit. • Thus the mechanism is not truthful.

  39. Dynamic auction with expiring items • Examples: • Ice cream stand. • Time on a shared computer. • Network resources. • Model assumptions: • No early arrivals. • No late departures. • Can be justified by withholding the item or result until departure.

  40. Competitive analysis • Use the adversary model of competitive analysis. • Competitive: how good is our algorithm versus the optimal offline algorithm with full information. • Optimality criterion: the value of the best possible offline allocation. • Adversary: chooses the worst input he can find. • Has a model indicating its power to select bad input.

  41. Competitive analysis formally • $y_i \in \{0, 1\}$: whether bid $i$ was allocated. • $x_{it} \in \{0, 1\}$: whether bid $i$ was allocated in period $t$. • Define $c$-competitive: for every input $z \in Z$, the expected value of the online allocation is at least $1/c$ times the value of the best offline allocation. • $Z$: set of available inputs • $c \ge 1$

  42. Ice Cream Stand - Reloaded • There are 3 buyers, with types (9:00,11:00,100), (9:00,11:00,80), (10:00,11:00,60). • Moshe complained that the customers lie all the time and asked Agent Smith to help him choose a better policy. • Smith suggested that he use the critical value to decide the payments of his customers. • Also, ties between highest bidders should be broken randomly. • The cone is sold to agent 1 for 60 in round one. • The cone is sold to agent 2 for 60 in round two.
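The two payments of 60 can be reproduced by computing each winner's critical value under the "highest active bidder" policy. The sketch below is illustrative only and rests on several assumptions: integer bids (so probing just above each rival bid is enough), deterministic tie-breaking instead of the random tie-breaking on the slide, and the convention that a buyer with times (a, d) can take the cones of hours a through d-1. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Bid = Tuple[int, int, float]  # (arrival hour, departure hour, declared value)


def greedy_allocation(bids: Dict[str, Bid], hours: List[int]) -> Dict[str, int]:
    """Each hour, give the cone to the highest active buyer not yet served."""
    served: Dict[str, int] = {}
    for hour in hours:
        active = [(value, name) for name, (arrive, depart, value) in bids.items()
                  if arrive <= hour < depart and name not in served]
        if active:
            served[max(active)[1]] = hour
    return served


def critical_value(name: str, bids: Dict[str, Bid], hours: List[int]) -> float:
    """Smallest value `name` could report (others fixed) and still be served."""
    arrive, depart, _ = bids[name]
    others = {n: b for n, b in bids.items() if n != name}
    # The outcome can only change when the report crosses another buyer's bid,
    # so it suffices to probe just above 0 and just above each rival bid.
    thresholds = sorted({0} | {value for _, _, value in others.values()})
    for threshold in thresholds:
        trial = dict(others)
        trial[name] = (arrive, depart, threshold + 0.5)
        if name in greedy_allocation(trial, hours):
            return threshold
    return float("inf")  # the buyer cannot win with any value


def critical_value_payments(bids: Dict[str, Bid], hours: List[int]) -> Dict[str, float]:
    """Winners pay their critical value; everyone else pays nothing."""
    winners = greedy_allocation(bids, hours)
    return {name: critical_value(name, bids, hours) for name in winners}


hours = [9, 10]
truthful = {"buyer1": (9, 11, 100), "buyer2": (9, 11, 80), "buyer3": (10, 11, 60)}
print(critical_value_payments(truthful, hours))  # both winners pay 60
```

Because a winner's payment no longer depends on his own reported value, buyer 1 pays 60 whether he reports 100 or 61, so the value-shading manipulation from the earlier example stops being profitable.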

  43. The auction formally

  44. Truthfulness and 2 competitive • In our setting the auction is truthful and 2-competitive. • Proof: • For truthfulness, it is enough to see that the policy is monotonic. • If agent i wins in some period, he will still win if he extends his interval and/or increases his value. • For competitiveness, look at each allocation made by the offline algorithm: • If agent i is allocated offline in some period but is not allocated online, charge i's value to the agent allocated online in that period, whose value is at least as large. • If the same agent is allocated by both, charge his value to himself in the online allocation. • Every online agent is charged at most twice, each time for at most his own value. • Therefore the total value of the offline algorithm is at most twice that of the online algorithm.
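The factor-2 bound can be sanity-checked numerically on small instances. The sketch below compares the greedy online policy with a brute-force offline optimum; the helper names, the random instance generator and the hour convention (a buyer with times (a, d) is active in hours a through d-1) are assumptions for illustration, and a passing check on random inputs is of course not a proof.

```python
import random
from typing import Dict, List, Tuple

Bid = Tuple[int, int, int]  # (arrival hour, departure hour, value)


def online_greedy_value(bids: Dict[str, Bid], hours: List[int]) -> int:
    """Total value when each hour's item goes to the highest active unserved bid."""
    served = set()
    total = 0
    for hour in hours:
        active = [(value, name) for name, (arrive, depart, value) in bids.items()
                  if arrive <= hour < depart and name not in served]
        if active:
            value, name = max(active)
            served.add(name)
            total += value
    return total


def offline_optimum_value(bids: Dict[str, Bid], hours: List[int]) -> int:
    """Best total value with full knowledge of all bids (brute force, small inputs only)."""
    def best(index: int, served: frozenset) -> int:
        if index == len(hours):
            return 0
        hour = hours[index]
        result = best(index + 1, served)  # option: leave this hour's item unsold
        for name, (arrive, depart, value) in bids.items():
            if name not in served and arrive <= hour < depart:
                result = max(result, value + best(index + 1, served | {name}))
        return result
    return best(0, frozenset())


def random_instance(rng: random.Random, n_agents: int, hours: List[int]) -> Dict[str, Bid]:
    """Draw a small random instance; purely illustrative."""
    bids = {}
    for i in range(n_agents):
        arrive = rng.choice(hours)
        depart = rng.randint(arrive + 1, hours[-1] + 1)
        bids[f"agent{i}"] = (arrive, depart, rng.randint(1, 100))
    return bids


hours = [9, 10, 11]
# A patient high bidder crowds out an impatient one: online gets 10, offline gets 19.
tight = {"patient": (9, 11, 10), "impatient": (9, 10, 9)}
print(online_greedy_value(tight, hours), offline_optimum_value(tight, hours))  # 10 19
```

The `tight` instance shows why the ratio can approach 2: serving the patient bidder first wastes the only hour in which the impatient bidder could have been served.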

  45. Lower bounds

  46. The secretary problem • There are N job applicants. • Each has a rank. • While interviewing, the rank of the current applicant is learnt relative to the applicants already interviewed. • The interviewer must decide on the spot whether to hire. • The adversary may choose the qualities but not the order. • The applicants arrive in uniformly random order. • The optimal policy is to interview and reject the first t-1 applicants, then hire the next one who is better than all of them.

  47. The secretary problem - cont • t is chosen so that $\sum_{j=t+1}^{N} \frac{1}{j-1} \le 1 < \sum_{j=t}^{N} \frac{1}{j-1}$ • As N goes to infinity: • The probability of hiring the best applicant goes to 1/e. • So does the ratio t/N. • The policy is to sample N/e applicants and then accept the next one who is better than all those interviewed so far.
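The 1/e success probability is easy to check empirically. The following Monte Carlo sketch samples the first ⌊N/e⌋ applicants and then hires the next one who beats all of them; the function names, the choice N = 50 and the fixed random seed are assumptions made for this illustration.

```python
import math
import random
from typing import List


def hires_best(ranks: List[int], sample_size: int) -> bool:
    """Skip `sample_size` applicants, hire the next one better than all seen so far."""
    best_seen = max(ranks[:sample_size], default=-1)
    for rank in ranks[sample_size:]:
        if rank > best_seen:
            return rank == max(ranks)  # hired; was it the overall best?
    return False  # nobody beat the sample, so nobody is hired


def success_rate(n: int, trials: int, seed: int = 0) -> float:
    """Estimate the probability of hiring the best of n applicants."""
    rng = random.Random(seed)
    sample_size = int(n / math.e)
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))  # larger number = better applicant
        rng.shuffle(ranks)      # applicants arrive in uniformly random order
        wins += hires_best(ranks, sample_size)
    return wins / trials


print(success_rate(50, 20000), 1 / math.e)  # the estimate should land close to 1/e
```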

  48. Adaptive limited supply auction • N agents. • Single indivisible item. • No early arrival misreports. • The differences between the secretary problem and the auction: • Bidders have entry and exit times. • Bidders are strategic and can misreport. • The adversary chooses a set of arrival-departure intervals and a set of values; types are formed by sampling uniformly at random, without replacement, from both sets. • Revenue optimality criterion: compare total payments against the offline Vickrey (second-price) auction.

  49. Adaptive limited supply auction • The competitive ratio is: • The policy is divided into: • Learning phase. • Accepting phase. • Naïve solution: • Observe the first $\lfloor N/e \rfloor$ reports and set the price p to the maximum value seen. • Sell at price p to the first subsequent agent who reports a value of at least p.

  50. Adaptive limited supply auction • Consider the following example: • Six agents, with types 1=(1,6,6), 2=(3,7,2), 3=(4,8,4), 4=(6,7,8) and two more arriving later. • The transition to the accepting phase happens after 2 bids. • If everyone is truthful, agent 4 wins in period 6 and pays 6. • If agent 1 reports (5,7,6), he wins in period 5 and pays 4. • So the naïve solution is not truthful. But it would have worked if all agents were impatient.
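The example can be replayed with a few lines of code. This is a minimal sketch of the naïve two-phase rule only: the function name `naive_auction` and the agent labels are hypothetical, and the two unnamed late arrivals are left out on the assumption that they arrive after the item is already sold (the sample size of 2 is passed in directly).

```python
from typing import Dict, Optional, Tuple

Report = Tuple[int, int, int]  # (reported arrival, reported departure, reported value)


def naive_auction(reports: Dict[str, Report], sample_size: int) -> Optional[Tuple[str, int, int]]:
    """Learn a price from the first `sample_size` reports, then sell to the first
    later report that meets it. Returns (winner, period, price), or None if unsold."""
    ordered = sorted(reports.items(), key=lambda item: item[1][0])  # by reported arrival
    price = max(value for _, (_, _, value) in ordered[:sample_size])
    for name, (arrival, _, value) in ordered[sample_size:]:
        if value >= price:
            return name, arrival, price
    return None


truthful = {"agent1": (1, 6, 6), "agent2": (3, 7, 2), "agent3": (4, 8, 4), "agent4": (6, 7, 8)}
misreport = dict(truthful, agent1=(5, 7, 6))  # agent 1 delays his reported arrival

print(naive_auction(truthful, sample_size=2))   # ('agent4', 6, 6)
print(naive_auction(misreport, sample_size=2))  # ('agent1', 5, 4)
```

By arriving late, agent 1 moves himself out of the learning phase, lowers the learned price from 6 to 4, and wins with utility 6 - 4 = 2 instead of 0.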
