Building the Knowledge Base of a Buyer Agent Using Reinforcement Learning Techniques


Presentation Transcript


  1. Building the Knowledge Base of a Buyer Agent Using Reinforcement Learning Techniques. George Boulougaris, Kostas Kolomvatsos, Stathes Hadjiefthymiades. Pervasive Computing Research Group, Department of Informatics and Telecommunications, University of Athens, Greece. WCCI – IJCNN 2010, Barcelona, Spain

  2. Outline • Introduction • Market Members • Scenario • Buyer Q-Table • Buyer Purchase Behavior • Results

  3. Introduction • Intelligent Agents • Autonomous software components • Represent users • Learn from their owners • Electronic Markets • Places where entities not known in advance can negotiate over the exchange of products • Reinforcement Learning • A general framework for sequential decision making • Learns which actions lead to the maximum long-term reward at every state of the world

  4. Market Members • Buyers • Sellers • Middle entities (matchmakers, brokers, market entities) • Intelligent agents may represent each of these entities • Entities have no prior information about the other members of the market

  5. Scenario (1/2) • Buyers: • could interact with sellers • could interact with brokers or matchmakers (matchmakers cannot sell products) • want to buy the most appropriate product at the most profitable price • We focus on the interaction between buyers and selling entities (sellers or brokers) • Most research efforts focus only on the reputation of entities • We utilize Q-Learning, which is well suited to deriving the actions that lead to the maximum long-term reward (based on a number of parameters) at every state of the world

  6. Scenario (2/2) • The product parameters for each selling entity are: • ID • Time validity • Price • Time availability • Relevance • Each selling entity represents a state that the buyer can be in
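One way to picture these parameters is as a simple record per selling entity. The field names below are illustrative assumptions for this sketch, not identifiers from the paper:

```python
from dataclasses import dataclass

# Illustrative record for the product parameters listed above; the field
# names are assumptions for this sketch, not taken from the paper's code.
@dataclass
class ProductOffer:
    product_id: int           # ID
    time_validity: float      # how long the offer remains valid
    price: float              # asking price
    time_availability: float  # time for which the product stays available
    relevance: float          # relevance to the buyer's request, e.g. in [0, 1]
```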

  7. Buyer Q-Table (1/3) • The buyer keeps one Q-Table for each product • Rows represent states and columns represent actions • There are M+1 columns (M is the number of selling entities) • Actions [1..M] represent the transition to the corresponding entity [1..M] (row of the Q-Table) • The transition to another entity corresponds to a ‘not-buy-from-this-entity’ action • Action M+1 represents the purchase action (from the current entity) • The buyer’s final Q-Table is therefore a 3D table (one 2D table per product)
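A minimal sketch of this layout, assuming a NumPy array indexed as product × state × action; the shapes and names are our own, not the paper's:

```python
import numpy as np

M = 5           # number of selling entities (assumed for illustration)
N_PRODUCTS = 3  # number of products the buyer tracks (assumed)

# The buyer's "3D" Q-Table: one (M x (M+1)) table per product.
# Rows: the entity the buyer is currently negotiating with (state).
# Columns 0..M-1: 'not-buy-from-this-entity', move to entity j.
# Column M: buy from the current entity.
Q = np.zeros((N_PRODUCTS, M, M + 1))
BUY = M  # index of the purchase action
```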

  8. Buyer Q-Table (2/3) • The buyer takes the following information into consideration in order to build the Q-Table: • Relevancy factor • Price • Response time • Number of transitions • The update equation is the standard Q-Learning rule: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + l \big[ r + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \big]$, where l is the learning rate, r is the reward, γ is the future reward discount factor, and s_t and a_t are the state and the action at time t
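Transcribing this update rule into code is straightforward. The sketch below assumes the product × state × action table from the previous slide; the parameter values in the example are hypothetical:

```python
import numpy as np

def q_update(Q, p, s_t, a_t, s_next, r, l=0.1, gamma=0.9, terminal=False):
    """One Q-Learning step on the buyer's table for product p."""
    # max_a Q(s_{t+1}, a); no future reward once the purchase is made
    best_next = 0.0 if terminal else Q[p, s_next].max()
    Q[p, s_t, a_t] += l * (r + gamma * best_next - Q[p, s_t, a_t])

# Example: 3 products, 5 entities, 5 + 1 actions (illustrative sizes)
Q = np.zeros((3, 5, 6))
q_update(Q, p=0, s_t=2, a_t=4, s_next=4, r=0.8)
```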

  9. Buyer Q-Table (3/3) • Issues concerning the reward: • it is decremented by 5% when the buyer deals with entities that do not have the product • it is based on: • the reward for the relevancy • the reward for the price • the reward for the response time • the reward for the required transitions • the greater the relevancy, the greater the reward • the smaller the price, the greater the reward • the smaller the response time, the greater the reward • the smaller the number of transitions, the greater the reward
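The exact weights and functional forms of the four partial rewards are not given in the slides; the sketch below is one plausible shape that respects the monotonic relations listed above, including the 5% decrement. All normalisation constants are assumptions:

```python
# Hypothetical composite reward; the weights and the normalisation
# constants (max_price, max_time, max_transitions) are assumptions.
def reward(relevance, price, response_time, transitions,
           max_price=100.0, max_time=10.0, max_transitions=10,
           has_product=True):
    r = (relevance                                  # greater relevancy -> greater reward
         + (1.0 - price / max_price)                # smaller price -> greater reward
         + (1.0 - response_time / max_time)         # smaller response time -> greater reward
         + (1.0 - transitions / max_transitions))   # fewer transitions -> greater reward
    if not has_product:
        r *= 0.95  # 5% decrement when the entity does not have the product
    return r
```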

  10. Buyer Purchase Behavior • The buyer relies on the Q-Table for the purchase action • There are two phases in its behavior • First Phase • It creates the Q-Table • It uses a specific number of episodes in the training phase • Second Phase • It utilizes the Q-Table for its purchases • It first randomly selects an entity (row) for a specific product • It then repeatedly selects the action with the highest reward • If the best action is to return to a previously visited entity that is unable to deliver the product, the purchase is not feasible
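A sketch of the second (exploitation) phase under these rules, assuming the NumPy Q-Table from the earlier sketch; the `can_deliver` helper and the iteration cap are assumptions added for the sketch, not part of the paper's interface:

```python
import random

M, BUY = 5, 5  # M entities; action index M is the purchase action (assumed)

def purchase(Q, p, can_deliver):
    """Exploitation phase for product p: follow the learned Q-Table.

    can_deliver[s] says whether entity s can actually supply the
    product; it is a hypothetical helper for this sketch."""
    s = random.randrange(M)        # start at a randomly selected entity (row)
    visited = {s}
    for _ in range(2 * M):         # iteration cap added for this sketch
        a = int(Q[p, s].argmax())  # greedy: action with the highest value
        if a == BUY:
            return s               # buy from the current entity
        if a in visited and not can_deliver[a]:
            return None            # best action revisits an entity that
                                   # cannot deliver: purchase not feasible
        visited.add(a)
        s = a                      # 'not-buy' action: move to entity a
    return None
```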

  11. Results (1/4) • We consider a dynamic market where the number and the characteristics of entities are not static • In our experiments we take the following probabilities into consideration: • 2% that a new product becomes available in an entity • 5% that a product is totally new in the market • 5% that a product is no longer available in an entity • 2% that an entity is totally new in the market • 1% that an entity is no longer available for negotiations • We examine the purchases of 400 products in each experiment
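These probabilities can be read as independent Bernoulli trials per simulation step; how the paper actually schedules the events is not specified, so the sketch below is only an assumption:

```python
import random

# Per-step event probabilities from the experiments; treating them as
# independent Bernoulli trials is an assumption of this sketch.
EVENTS = [
    (0.02, "new product becomes available in an entity"),
    (0.05, "totally new product enters the market"),
    (0.05, "product is no longer available in an entity"),
    (0.02, "totally new entity enters the market"),
    (0.01, "entity is no longer available for negotiations"),
]

def market_step():
    """Return the names of the events that fire at this simulation step."""
    return [name for prob, name in EVENTS if random.random() < prob]
```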

  12. Results (2/4) • Q-Table creation time results

  13. Results (3/4) • Q-Learning reduces the required purchase steps

  14. Results (4/4) • Q-Learning reduces the average price and the average response time as the number of entities increases • Q-Learning leaves these basic parameters unaffected as the number of products increases

  15. Thank you http://p-comp.di.uoa.gr
