JuKeCB

JuKeCB Justin Karneeb

What is JuKeCB? • A Case Based Reasoning system developed for use in DOM, a Domination style game • A Research Project which has been under development over the past three years by: • Justin Karneeb • Kellen Gillespie • Stephan Lee-Urban • Professor Munoz Avila

How about some more detail • JuKeCB is a CBR system that learns stochastic policies by observation • Stochastic Policy: A non-deterministic “strategy” • Imitates winning strategies that it has observed in the past in order to win future battles • Continues to learn as it observes more games

A short aside: DOM Game • In order to understand JuKeCB, you first need to understand the game it is playing. • SCREEN SHOT OF DOM

DOM: The Rules • DOM is a Domination style game • Team based gameplay • Scoring based on holding or “dominating” key points on the map • Easily visible abstract strategies • Basic Strategy • Capture enemy Dom points • Defend owned Dom points • Own more Dom points than opponent

DOM: Winning • Score is updated every five game turns • Each team is awarded 1*NumberDomPointsHeld points • Two possible game modes • Score Limit: Game ends when one team’s score exceeds X points • Turn Limit: Game ends when X number of turns have passed
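As a rough illustration of the scoring rule above, here is a minimal sketch. The function name `run_scoring` and its input format are made up for this example and are not taken from the DOM code.

```python
def run_scoring(holdings_per_turn, update_every=5, score_limit=None, turn_limit=None):
    """Return (final_scores, turns_played) for a two-team game.

    holdings_per_turn: one (team0_points_held, team1_points_held) pair per turn.
    This input format is hypothetical; it only serves the example.
    """
    scores = [0, 0]
    turns_played = 0
    for turn, held in enumerate(holdings_per_turn, start=1):
        turns_played = turn
        # Score is updated every `update_every` turns: 1 point per dom point held.
        if turn % update_every == 0:
            scores = [s + 1 * h for s, h in zip(scores, held)]
        # Score Limit mode: stop once a team's score exceeds the limit.
        if score_limit is not None and max(scores) > score_limit:
            break
        # Turn Limit mode: stop once the given number of turns has passed.
        if turn_limit is not None and turn >= turn_limit:
            break
    return scores, turns_played
```

With team 0 holding two points and team 1 holding one for the whole game, a turn limit of 10 gives two scoring updates and a 4 to 2 result.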

DOM: Meet the easy teams! • DomOneHuggerTeam • All bots go to domination point 1 • FirstHalfOfDomPointsTeam • Evenly distribute all bots across the first half of the domination points • SecondHalfOfDomPointsTeam • Evenly distribute all bots across the second half of the domination points

DOM: Meet the tough teams… • EachBotToOneDomTeam • Sends each bot to a different domination point • GreedyDistanceTeam • Sends each bot to its closest unowned domination point • SmartOpportunisticTeam • Sends each bot to a different unowned domination point

Questions about DOM Game? • DOM PICTURE

Meanwhile… Back at JuKeCB • What was all that stochastic policy nonsense you were talking about? • Each case in JuKeCB stores two stochastic policies • WinningStrategy • LosingStrategy • JuKeCB can employ a winning strategy against similar losing strategies • Example: SecondHalf beats Dom1Hugger • Why use stochastic policies at all? • Why not plans, single actions, scripts, etc.?

So what makes a good Policy? • Feature selection can be very difficult • It took us almost a full semester to get a set of features that seemed to work • Each feature must supply information about the strategy or game state • Do not include unnecessary information • Each feature must be reproducible • Able to reproduce similar results when run again • As a whole, features must completely identify a strategy and the game state (ideally)

Brainstorm! • SCREENSHOT OF DOM GAME

Here’s what we came up with • All features based on a timeframe or window • Domination Point Destinations • Probability each bot went to specific dom points • Unowned • Probability each bot went to an unowned point • Closest • Probability each bot went to its closest point • Score Difference • Difference in scores during the time window • Is that enough?

Features Failure • The DomOneHugger issue • Who owns those other points! • Not enough information on game state • Makes us think the strategy is similar when in fact it is not • Needed more features! • Domination Point Held Ratios • The probability that Team0 held a given dom point • Still not perfect, good enough

The Case Base Cycle • Retain • Observe game state • Store in case base • Retrieve • Observe game state • Forward similar case to JuKeCBTeam for reuse • Reuse • Enact strategy found in case

Observation: Retain • JuKeCB does all of its learning by observation • Game Window Monitoring • Most features built over the course of the window • DomPointDest • DomHeldRatios • Unowned/Closest • Some features are created at the very end • DeltaScore • Some are static for a game • Num Domination points • Num Bots per team • Dom Point Distances

Retain Continued • Once the window ends, the case is created
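To make the window-based features concrete, here is a minimal sketch of how a per-bot DomPointDest feature could be built from the orders observed during one window. The function name and the idea of logging one destination index per observation are assumptions for illustration, not the actual JuKeCB code.

```python
from collections import Counter


def destination_probabilities(observed, num_dom_points):
    """Turn one bot's observed destinations into a probability per dom point.

    observed: list of dom point indices the bot was seen heading to during
    the window (hypothetical logging format).
    """
    counts = Counter(observed)
    total = len(observed)
    if total == 0:
        # Nothing observed in this window: no information about the bot.
        return [0.0] * num_dom_points
    return [counts[d] / total for d in range(num_dom_points)]
```

For example, a bot seen heading to dom points `[0, 2, 2, 2, 1]` over a window yields 20%, 20% and 60% for Dom0, Dom1 and Dom2.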

Case Retrieval • JuKeCB uses a three-stage retrieval process to reduce search time • Stage One • Runs only at game start • Remove all cases that do not pertain to the map • Stage Two • Runs at every retrieve update • Get all cases that are 95% similar or higher • Stage Three • Runs after stage two • Gets the case with the highest delta score

Stage One • Left out some features earlier… • Number of Domination points • Number of Bots per team • Distance between Domination points • These features only pertain to Stage One similarity • Temporarily remove all cases that • Have a different number of dom points • Have a different number of bots per team • Do not have a similar set of dom point distances
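The Stage One filter can be sketched as below. The `MapInfo` record, the `tolerance` value and the element-wise comparison of distances are all assumptions; the slides only say the distance sets must be "similar".

```python
from dataclasses import dataclass


@dataclass
class MapInfo:
    """Static, per-game features (hypothetical container for this sketch)."""
    num_dom_points: int
    num_bots: int
    distances: tuple  # dom point distances in a fixed order


def same_map(a, b, tolerance=0.1):
    """True if b plausibly describes the same map setup as a."""
    if a.num_dom_points != b.num_dom_points or a.num_bots != b.num_bots:
        return False
    if len(a.distances) != len(b.distances):
        return False
    # Assumed notion of "similar": every distance within a relative tolerance.
    return all(abs(x - y) <= tolerance * max(x, y, 1e-9)
               for x, y in zip(a.distances, b.distances))


def stage_one(cases, current):
    """Keep only the (map_info, payload) cases that pertain to the current map."""
    return [case for case in cases if same_map(case[0], current)]
```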

Stage Two • Responsible for finding cases pertinent to the situation • Computes the similarity between the enemy’s current strategy and every stored losing strategy • If no case is more than 60% similar, run randomly • If no case is more than 95% similar, return the most similar case • Otherwise, return all cases more than 95% similar
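The three-way decision above can be sketched as follows. The function name, the dictionary input and the use of an empty list to mean "run randomly" are choices made for this example only.

```python
def stage_two(similarities, low=0.60, high=0.95):
    """Pick the cases worth passing on to Stage Three.

    similarities: dict mapping a case id to its similarity in [0, 1].
    Returns a list of case ids; an empty list means "run randomly".
    """
    if not similarities:
        return []
    best_id = max(similarities, key=similarities.get)
    # No case is more than 60% similar: nothing usable.
    if similarities[best_id] <= low:
        return []
    # All cases more than 95% similar, in insertion order.
    close = [cid for cid, s in similarities.items() if s > high]
    # If none clears 95%, fall back to the single most similar case.
    return close if close else [best_id]
```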

Stage Two Similarity • Compares only the following features • Dom point destinations (per bot) • ToUnowned (per bot) • ToClosest (per bot) • Dom Held Ratios • All features are real numbers • Similarity formula: • Weight*( 1-|(V1-V2)/(VMax-VMin)| ); • VMax and VMin are 1 and 0 for all features

Stage Two Similarity Cont • Weights were something we tweaked for a long time • Currently, they are as follows (per-feature weight in parentheses) • Destinations: 40% (.4/numdompoints/numbots) • Dom Point Ratios: 20% (.2/numdompoints) • ToUnowned: 20% (.2/numbots) • ToClosest: 20% (.2/numbots) • Could probably still be further refined
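Putting the formula and the weights together, a minimal sketch of the Stage Two similarity might look like this. The dictionary layout of a policy is hypothetical, and the sketch assumes each 40/20/20/20 share is split evenly across its features (so the dom held ratio weight is 0.2 divided by the number of dom points), which makes the weights sum to 1.

```python
def feature_similarity(v1, v2, weight, vmin=0.0, vmax=1.0):
    """Weight * (1 - |(V1 - V2) / (VMax - VMin)|), as given on the slide."""
    return weight * (1 - abs((v1 - v2) / (vmax - vmin)))


def policy_similarity(p, q):
    """Similarity in [0, 1] between two policies.

    Each policy is a dict (hypothetical layout):
      'dest':    per bot, a list of probabilities per dom point
      'held':    per dom point, probability that Team0 held it
      'unowned': per bot, probability of going to an unowned point
      'closest': per bot, probability of going to the closest point
    """
    num_bots = len(p["dest"])
    num_doms = len(p["held"])
    total = 0.0
    for b in range(num_bots):
        for d in range(num_doms):
            total += feature_similarity(p["dest"][b][d], q["dest"][b][d],
                                        0.4 / num_doms / num_bots)
        total += feature_similarity(p["unowned"][b], q["unowned"][b], 0.2 / num_bots)
        total += feature_similarity(p["closest"][b], q["closest"][b], 0.2 / num_bots)
    for d in range(num_doms):
        total += feature_similarity(p["held"][d], q["held"][d], 0.2 / num_doms)
    return total
```

Comparing a policy with itself gives 1.0, and comparing an all-zeros policy with an all-ones policy gives 0.0.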

Stage Three • Only run if stage two returns more than one case • Looks at the DeltaScoreWinning feature of each returned case • Returns the case with the highest delta score • This prevents a perfectly similar case from beating a slightly less similar case that achieved better results
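Stage Three reduces to a single maximum over the candidates. In this sketch the `(case_id, similarity, delta_score_winning)` tuple layout is an assumption for illustration.

```python
def stage_three(candidates):
    """Return the id of the candidate with the highest winning delta score.

    candidates: list of (case_id, similarity, delta_score_winning) tuples
    that already passed Stage Two (hypothetical layout).
    """
    # Similarity is deliberately ignored here: every candidate is already
    # "similar enough", so only the observed result decides.
    return max(candidates, key=lambda c: c[2])[0]
```

For instance, a 96% similar case that won by 40 points is preferred over a 100% similar case that won by 5.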

Reuse: JuKeCB Team • JuKeCBTeam receives a strategy as input • Not a full case, just the winning policy of the case • Uses a random number generator to try to follow the given distribution as closely as possible • Randomly roll numbers for each feature and act accordingly • Rank each destination by how many criteria it meets • If tied, choose one at random

Reuse: Example • Bot0: • To Dom0: 20% -- Is owned • To Dom1: 20% -- Is unowned • To Dom2: 60% -- Is owned • To Unowned: 80% • To Closest: 10% • Dom Roll (0-100): 68 • Unowned Roll (0-100): 27 • Closest Roll (0-100): 92 • Dom0 Score = 0 • Dom1 score = 1 • Dom2 score = 1
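The example above can be reproduced with a short sketch. It assumes the dom roll is mapped onto cumulative bands in the listed order (0-20 for Dom0, 20-40 for Dom1, 40-100 for Dom2) and that a roll below a probability counts as a hit. The function names are made up, and since the slide does not say which point is closest to Bot0, the caller has to supply it.

```python
import random


def pick_by_roll(distribution, roll):
    """Return the name whose cumulative percentage band contains the roll."""
    upper = 0
    for name, pct in distribution:
        upper += pct
        if roll < upper:
            return name
    return distribution[-1][0]


def score_destinations(dest_dist, owned, closest, p_unowned, p_closest, rolls):
    """Count how many criteria each destination meets for one bot."""
    dom_roll, unowned_roll, closest_roll = rolls
    scores = {name: 0 for name, _ in dest_dist}
    scores[pick_by_roll(dest_dist, dom_roll)] += 1
    if unowned_roll < p_unowned:      # bot should go to an unowned point
        for name in scores:
            if not owned[name]:
                scores[name] += 1
    if closest_roll < p_closest:      # bot should go to its closest point
        scores[closest] += 1
    return scores


def choose(scores, rng=random):
    """Pick the top-ranked destination, breaking ties at random."""
    best = max(scores.values())
    return rng.choice([name for name, s in scores.items() if s == best])


bot0 = score_destinations(
    dest_dist=[("Dom0", 20), ("Dom1", 20), ("Dom2", 60)],
    owned={"Dom0": True, "Dom1": False, "Dom2": True},
    closest="Dom0",                   # assumed; not stated on the slide
    p_unowned=80, p_closest=10,
    rolls=(68, 27, 92),
)
```

With the rolls from the slide, Dom2 earns a point from the dom roll and Dom1 earns one from the unowned roll, so the two tie and `choose` picks between them at random.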

Maintenance • At the end of every game, JuKeCB compiles the list of all recently created cases • Attempts to add them to the case base • If no case is 95% similar, add it • If a case is 95% similar and the new case has a higher delta score, swap them • On demand, run a full check • Over time, swapping cases can leave redundant cases in the case base, and running a full check can be very time intensive
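A minimal sketch of the add-or-swap rule follows. The case layout (a dict with a `delta_score` key), the injected `similarity` function and the choice to stop at the first sufficiently similar case are assumptions for illustration.

```python
def add_case(case_base, new_case, similarity, threshold=0.95):
    """Add new_case to case_base in place, or swap it in for a similar case.

    case_base: list of dicts, each with a 'delta_score' entry.
    similarity: function (case_a, case_b) -> float in [0, 1].
    Returns 'added', 'swapped' or 'discarded'.
    """
    for i, old in enumerate(case_base):
        if similarity(old, new_case) >= threshold:
            # A similar case exists: keep whichever had the better result.
            if new_case["delta_score"] > old["delta_score"]:
                case_base[i] = new_case
                return "swapped"
            return "discarded"
    case_base.append(new_case)
    return "added"
```

Because a swapped-in case is only compared against the case it replaces, it may end up within 95% of some other stored case, which is one way the redundancy mentioned above can arise.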

Performance • Able to beat all ‘easy’ teams with ease • DomOneHugger • FirstHalfDomPoints • SecondHalfDomPoints • Able to win or be competitive against ‘hard’ teams • EachBotToOneDom • GreedyDistance • SmartOpportunistic

Performance • The following results were run on this map • IMAGE OF MAP

Untrained GreedyDistance

Trained GreedyDistance

Untrained DynamicTeam

Trained DynamicTeam

Problems • Retrieval can take a long time • Num Cases: 129, Average Time Taken: 221ms • Num Cases: 258, Average Time Taken: 721ms • Num Cases: 516, Average Time Taken: 3,063ms • Num Cases: 1032, Average Time Taken: 12,231ms • Can’t beat SmartOpportunistic • We lack the features to properly define its strategy • No ‘defend’ features • All cases appear to be random
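A quick bit of arithmetic on the timings above shows how fast retrieval time grows. This only uses the four measurements from the slide; the variable names are arbitrary.

```python
# Average retrieval time in milliseconds, keyed by number of cases (from the slide).
timings_ms = {129: 221, 258: 721, 516: 3063, 1032: 12231}

sizes = sorted(timings_ms)
# How much slower retrieval gets each time the case base doubles.
ratios = [timings_ms[b] / timings_ms[a] for a, b in zip(sizes, sizes[1:])]

for (a, b), r in zip(zip(sizes, sizes[1:]), ratios):
    print(f"{a} -> {b} cases: {r:.2f}x slower")
```

Each doubling of the case base makes retrieval roughly 3.3 to 4.2 times slower, which looks closer to quadratic than linear growth, at least over these four data points.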

Additional Work • Parallelizing retrieval • Direct speed up by using more CPUs • Clustering the Case Base • Greedy clustering • Similarity clustering • Using Asynchronous retrieval • Hide the delay

Clustering JuKeCB • Speed up retrieval by dividing up the parts of the case base needed for any given retrieval • Greedy Clustering • Create new ‘clusters’ depending on the greedy policy • 010: Bot0-Dom0, Bot1-Dom1, Bot2-Dom0 • This clustering scheme got us very poor results • Too much data loss
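One way to read the "010" example is that each bot is reduced to its single most likely destination. The sketch below follows that reading; it is an interpretation of the slide, and the function name is made up.

```python
def greedy_key(dest):
    """Build a cluster key from each bot's most probable dom point.

    dest: per bot, a list of destination probabilities per dom point.
    Only meaningful as a compact string for fewer than ten dom points.
    """
    return "".join(
        str(max(range(len(row)), key=row.__getitem__))  # index of the largest probability
        for row in dest
    )
```

Two quite different policies, such as a bot going to Dom0 55% of the time and one going there 100% of the time, collapse onto the same key, which would be consistent with the data loss noted above.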

Clustering JuKeCB • Similarity Clustering • Each cluster gets a representative case • New cases are added to a cluster if the similarity is over a certain threshold • New clusters are created if no similar clusters found • Quite good results • Moderate speedup (sorry, can’t find numbers!) • Only slight performance drop
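Similarity clustering can be sketched as below. The threshold value of 0.8, the cluster layout and the choice of the first case as the representative are assumptions; the slides do not give those details.

```python
def assign_to_cluster(clusters, case, similarity, threshold=0.8):
    """Put `case` into the most similar cluster, or start a new one.

    clusters: list of dicts {'rep': case, 'members': [cases]}, updated in place.
    similarity: function (case_a, case_b) -> float in [0, 1].
    """
    best, best_sim = None, threshold
    for cluster in clusters:
        s = similarity(cluster["rep"], case)
        if s >= best_sim:
            best, best_sim = cluster, s
    if best is None:
        # No representative is similar enough: this case founds a new cluster.
        best = {"rep": case, "members": []}
        clusters.append(best)
    best["members"].append(case)
    return best
```

At retrieval time only the representatives need to be compared first, and the full Stage Two similarity then runs inside the matching cluster.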

Parallelizing JuKeCB • Divide up the Case Base into X number of chunks • X = number of processors on the machine • Have each processor run stage two on its own chunk • Run stage three on the results of all chunks • Speedup was almost optimal • OldRetrievalTime/NumProcessors
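The chunk-then-merge structure can be sketched with Python's standard `concurrent.futures` module. This is a simplified stand-in for the real system: Stage Two is reduced to a single threshold test, the function names are made up, and a thread pool is used only to show the structure (for CPU-bound similarity code in CPython, a process pool would be needed to see a real speedup).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunked(cases, n):
    """Split cases into n nearly equal contiguous chunks."""
    size, extra = divmod(len(cases), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(cases[start:end])
        start = end
    return chunks


def stage_two_chunk(chunk, query, similarity, threshold):
    """Simplified Stage Two: keep the cases above the similarity threshold."""
    return [c for c in chunk if similarity(c, query) > threshold]


def parallel_retrieve(cases, query, similarity, delta_score, threshold=0.95, workers=None):
    """Run Stage Two per chunk, then Stage Three on the merged survivors."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda chunk: stage_two_chunk(chunk, query, similarity, threshold),
            chunked(cases, workers),
        )
        survivors = [c for part in parts for c in part]
    # Stage Three: best observed result among everything that survived.
    return max(survivors, key=delta_score) if survivors else None
```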

Asynchronous Retrieval • Hide the retrieval delay by running retrieval in a separate thread • Works only if the game is running at ‘human playable speed’ • Gets near identical results to the normal system with no visible delay • Similar to parallelizing; sacrifices a slight speedup gain for better responsiveness
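A minimal sketch of the asynchronous idea, using Python's standard `threading` module: the game loop asks for a retrieval and keeps playing with the last known result until the new one arrives. The `AsyncRetriever` class and its method names are hypothetical.

```python
import threading


class AsyncRetriever:
    """Runs a slow retrieve function in the background."""

    def __init__(self, retrieve):
        self._retrieve = retrieve      # function: game_state -> strategy
        self._lock = threading.Lock()
        self._latest = None
        self._worker = None

    def request(self, game_state):
        """Start a background retrieval unless one is already running."""
        if self._worker is not None and self._worker.is_alive():
            return False
        self._worker = threading.Thread(
            target=self._run, args=(game_state,), daemon=True
        )
        self._worker.start()
        return True

    def _run(self, game_state):
        result = self._retrieve(game_state)
        with self._lock:
            self._latest = result

    def latest(self):
        """Most recent finished result, or None if nothing has finished yet."""
        with self._lock:
            return self._latest

    def wait(self, timeout=None):
        """Block until the current retrieval finishes (mainly for testing)."""
        if self._worker is not None:
            self._worker.join(timeout)
```

The game loop would call `request(state)` at each retrieve update and read `latest()` every turn, so the bots never stall while a retrieval is in flight.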

Possible additional work • Combining all the previous methods into one ultra fast case based reasoning machine! • A clustered case base whose retrieval is done asynchronously and in parallel • Optimising JuKeCB, JuKeCBTeam, and DOM Game • Some things were coded somewhat sloppily and could easily be improved, such as the reuse phase • Adding more features • Like we discussed earlier, we do not have enough features to properly define some strategies

Closing/Questions • Overall JuKeCB was a great system for me to work on • It gave me substantial knowledge in the CBR field • A paper: Imitating Inscrutable Enemies: Learning from Stochastic Policy Observation, Retrieval and Reuse was published and presented at the ICCBR 2010 conference • ….I swear I did not name the system….
