E Pluribus Unum Matchmaking in Halo 3

E Pluribus Unum Matchmaking in Halo 3. Chris Butcher, Bungie Studios, butcher@bungie.com. Game Developers Conference 2008. Overview: What Is Matchmaking?; Matchmaking Basics; Lessons from Halo 2; Halo 3 Design Goals; Voice, Identity, Community Reinforcement; Skill Measurement and Reward Systems; Technical Design; TrueSkill; Matchmaking Algorithms; Recommendations; Results from Halo 3 Live Operation.

Presentation Transcript


    2. E Pluribus Unum Matchmaking in Halo 3

    3. Overview: What Is Matchmaking?; Matchmaking Basics; Lessons from Halo 2; Halo 3 Design Goals; Voice, Identity, Community Reinforcement; Skill Measurement and Reward Systems; Technical Design; TrueSkill; Matchmaking Algorithms; Recommendations; Results from Halo 3 Live Operation

    4. What Is Matchmaking?

    5. Manual Game Browsing. User is presented with a list of possible games. Tries to find an open slot. Tries to find a fair game. Inconsistent experience. Not good for casual gamers: “I Just Want To Play!” Notes: Player is responsible for finding a slot themselves, which is tedious. Lots of people are all trying to join the same games you are. Can’t join up with friends. Each game is the luck of the draw regarding difficulty of opponents.

    6. Terminology. Manual game browsing is a standard technique. Host Game / Join Game options in UI are common. Xbox LIVE refers to this as the “Matchmaking” API (Quick Match, Custom Match). In this presentation, “matchmaking” means an automated peer-to-peer system that organizes players into groups based on user preference. Game could still be client / server once the game starts. Could even use dedicated servers.

    7. Vision of Matchmaking. Provide an experience that is: Fast, Reliable, Consistent.

    8. Matchmaking Basics

    9. Matchmaking Ecosystem. Continuous stream of groups entering matchmaking. Some groups decide to start gathering a game. The remainder search for games to join. Each group can be multiple machines and players.

    10. Xbox LIVE Matchmaking Service. Gatherers register with XBL service. Each group has unique matchmaking desires: type of game, skill level, spoken language, etc. Searchers query service with parameter filters. Service returns matching candidates. Notes: Lots of parameters depending on your game. I’ll talk about these in detail later.

    11. Candidate Evaluation. Searcher evaluates all candidates in parallel, best matches first. Ping network connectivity, get current group state. Measure quality of connection using Xbox QoS probes. Group-to-group XNetConnect. Group-to-group session join. Network layer handles these as asynchronous processes. Notes: Join protocol has multiple phases, and each of these is an asynchronous process. You need to architect your network layer so that they can execute in parallel to multiple targets. Lots of work here to make this seamless and robust.
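The parallel, best-first evaluation described on this slide can be sketched roughly as follows. This is a minimal illustration, not the Halo 3 network layer: the `Candidate` fields, the `probe` coroutine (a stand-in for a QoS probe plus a group-state query) and the latency threshold are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    match_score: float   # hypothetical metric: higher means a better match
    latency_ms: float    # stand-in for what a QoS probe would measure
    has_open_slot: bool  # stand-in for the candidate's current group state


async def probe(candidate: Candidate) -> Optional[Candidate]:
    """Simulated probe: waits one round trip, then reports whether a slot is open."""
    await asyncio.sleep(candidate.latency_ms / 1000)
    return candidate if candidate.has_open_slot else None


async def evaluate(candidates: List[Candidate], max_latency_ms: float) -> Optional[Candidate]:
    # Sort best matches first, then launch every probe at once rather than serially.
    ordered = sorted(candidates, key=lambda c: c.match_score, reverse=True)
    results = await asyncio.gather(*(probe(c) for c in ordered))
    # gather() preserves input order, so the first viable result is the best match.
    for result in results:
        if result is not None and result.latency_ms <= max_latency_ms:
            return result
    return None


def pick_candidate(candidates: List[Candidate], max_latency_ms: float = 100.0) -> Optional[Candidate]:
    return asyncio.run(evaluate(candidates, max_latency_ms))
```

Because all probes run concurrently, the total wait is bounded by the slowest probe rather than the sum of all of them, which matters when each candidate list is only viable for a short time.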

    12. Matchmaking Life Cycle. Groups enter Matchmaking continuously. Each group chooses to gather or search. Gather: register session with XBL service. Search: query service for candidates. Search, evaluate candidates, try to join. If no suitable candidates, search again. Halo-specific game flow: gatherer waits until game is full; determine game settings, host selection; start game. Notes: First section: Matchmaking principles (true for any game). Second section: Halo-specific usage.

    13. Lessons from Halo 2

    14. Halo 2 has had good longevity. Year-on-year retention is > 80%. Notes: Players seem to like Matchmaking. Provides enjoyment for many thousands of games.

    15. Game is well suited to Matchmaking. Small-group gameplay (2-5 per team). Interact with friends in your group. Both coordinated effort and individual skill required. Opponents are anonymous and interchangeable. Long-term goals are self-driven rather than peer-driven: “I want to reach Level 30”, not “I want to be the best on my server”.

    16. Lessons Learned - Matchmaking. Received well by the majority of players. Always something to do, a mix of novelty and the familiar. Configurable experience allows longevity. Required several early updates to operate robustly. DLC maps locking people out was a problem. People don’t like feeling they have no control. International experience was poor.

    17. Lessons Learned - Skill System. Modified Elo rating system. Both a skill measurement and a reward for investment. Non-zero-sum for levels 1-20 to give a “hill-climbing” experience. Was abused through boosting. Zero-sum competition for advancement. Skill level achievement is always in jeopardy. Leads to anxiety, anger and frustration in players: “WTF I lost my level 30, my team sucks”. Notes: Players are locked in a continuous struggle to get and then retain their skill level, as the only visible sign of achievement. This creates tension because it’s hard to attain and easy to lose. Tends to manifest as negative emotions. Playing Halo 2 is stressful!

    18. Ranked Matchmaking in Halo 2. Hyper-Competition + Anonymity + Loss Anxiety = Negative Emotional Pressure. Notes: We’ve made several online multiplayer games, so we know that people tend to be jerks online. But even we were surprised at how this combination of factors led to a pretty negative emotional tone in the community.

    19. Ranked Matchmaking in Halo 2. Notes: And we’re not the only ones who noticed this.

    20. Design Goals for Halo 3

    21. Overall Goals. Make the online experience approachable. Provide accountability and identity. Give players a reason to keep coming back. Tools: Voice, Identity, Skill System, Reward System, New Player Experience.

    22. Voice Design. Can’t predict how players will use voice. Give listeners control over what they hear. Remove temptation to use voice negatively. Allow time for socialization that isn’t under pressure. Make it easy for players to opt out or mute. Positive: chatting idly with friendly strangers. Negative: being abused by hostile anonymous bigots. Notes: Everyone has a different opinion on the correct use of voice communication online. There is no way to matchmake based on this, since we don’t know if someone is a mellow player or a foulmouthed bigot. You have to put control in the hands of the listener. We only allow you to communicate with your enemies after the game. This was a tough decision for us, but the right one. The difference between a positive and a negative voice interaction is often one of control. Put control in the hands of the listener.

    23. Identity Design. Every player has a public Service Record. Persistent individual identity reduces anonymity. Goal is to reduce anonymity and provide long-term identification. Publicly accessible in-game to everyone. Reduce sock-puppeting that was prevalent in Halo 2. Rewards are individual. Success recognized directly, or via social comparison with friends. No global leaderboards! Primarily competing with yourself. Notes: Making the Service Record visible to everyone reduces anonymity. Also want to reduce the sock-puppeting that was prevalent in Halo 2. Global leaderboards just encourage cheaters. Make rewards individual and direct, rather than forcing players to compete for global recognition. Acts to change the tone of competition.

    24. Skill System Design. Range 1-50; everyone starts at level 1. Almost everyone gains levels quickly, providing positive feedback. After 50-100 games, skill level stabilizes. Needs to still feel dynamic and not stagnant, but shouldn’t “lose a level” from one bad game. Skill should be a statistic, not a reward. Notes: Over time the skill system will converge to accurately measure your skill. It still needs to move as your ability changes, or you go on a winning / losing streak. But it shouldn’t be so reactive that you worry about “losing a level”. Note that the goal is no longer to provide a reward for players; we are introducing a separate system for that.

    25. Reward System Design. Reward for playing: “Experience points” (XP). Only for wins, to prevent boosting. Penalty for quitting games early. Experience rating hierarchy. Ratings require both skill and XP. Emphasized in UI over skill. Permanent; no loss anxiety.
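The reward rules on this slide can be sketched as a small bookkeeping function. Everything numeric here is an assumption made for illustration: the tier names and thresholds in `RATINGS`, the one-XP-per-win award, and the form of the quit penalty (the slide only says that a penalty exists).

```python
from dataclasses import dataclass

# Hypothetical tiers: (name, XP required, skill level required).
RATINGS = (("Tier 0", 0, 1), ("Tier 1", 10, 1), ("Tier 2", 20, 10))
QUIT_PENALTY_XP = 1  # hypothetical size of the penalty for quitting early


@dataclass
class Player:
    xp: int = 0
    skill_level: int = 1
    rating_index: int = 0  # highest rating reached so far; never decreases


def record_game(player: Player, won: bool, quit_early: bool = False) -> str:
    """Apply one game result and return the player's (permanent) rating name."""
    if quit_early:
        player.xp = max(0, player.xp - QUIT_PENALTY_XP)
    elif won:
        player.xp += 1  # only wins award XP; a loss awards nothing
    # A rating needs both enough XP and a high enough skill level.
    for index, (_, xp_needed, skill_needed) in enumerate(RATINGS):
        if player.xp >= xp_needed and player.skill_level >= skill_needed:
            player.rating_index = max(player.rating_index, index)
    return RATINGS[player.rating_index][0]
```

Keeping `rating_index` monotonic is what makes the rating permanent: XP can drop after a quit, but a rating already earned is not taken away.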

    26. New Player Experience. Separate “Boot Camp” playlist for new players only. Limited set of maps and weapons to ease players in. Small groups for socialization. Reward early and often! Move skilled players out quickly: 5 wins triggers ‘graduation’.

    27. Technical Design (Skill)

    28. Xbox LIVE Skill System – TrueSkill. A mathematical library implemented on XBL back end. Bayesian estimation techniques developed by Microsoft Research Cambridge. Models player skills as probability density functions [μ, σ]: μ is mean (current estimate), σ is standard deviation (uncertainty). TrueSkill is stored and updated invisibly by XBL back end. Notes: Start in the middle as a wide possibility band, μ=0, std dev σ0. As games are scored the skill adjusts rapidly and its band shrinks as the system is more certain of a player’s skill. Once the skill converges (σ is small) it moves much slower.

    29. Using TrueSkill in Halo 3. Don’t show players the raw mathematics of [μ, σ]. Use skill lower bound: s = μ - kσ (we chose k=4). Transform by remap function into range 1-50 for display in UI.
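A minimal sketch of the display calculation on this slide: take the lower bound μ - kσ with k=4 and remap it into 1-50. The slide does not specify the remap function or the raw scale, so the linear clamp and the `raw_min` / `raw_max` bounds below are assumptions chosen only for illustration.

```python
import math


def display_level(mu: float, sigma: float, k: float = 4.0,
                  raw_min: float = 0.0, raw_max: float = 50.0) -> int:
    """Map a [mu, sigma] estimate to a displayed level in 1..50."""
    lower_bound = mu - k * sigma  # conservative skill estimate from the slide
    # Hypothetical remap: clamp to [raw_min, raw_max], then scale linearly.
    fraction = (lower_bound - raw_min) / (raw_max - raw_min)
    fraction = min(1.0, max(0.0, fraction))
    return 1 + math.floor(fraction * 49)
```

One consequence of using a lower bound is that the displayed level can rise while μ stays put, simply because σ shrinks: `display_level(30, 5)` is lower than `display_level(30, 2)`.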

    30. Customizing TrueSkill. Mathematical configuration variables: β (performance factor), τ (dynamics factor), draw probability. Left β alone: dangerous, affects final skill distribution. Increased τ so that players’ skill never fully converges. Draw probability must be accurate; if it is set too low then ties will be considered highly significant. Update Weight: modifies rate of change of [μ, σ]. We used this to give players a “hill-climbing” experience by initially decreasing their TrueSkill update weight. Weights start out small and return to normal over 50-100 games in a playlist. Even though we can identify good or bad players after 8 games, it is more satisfying for them to feel they earned their skill over time. Notes: Performance factor (β) describes the randomness of players’ performance. High values of β mean that each game has less effect on skill. You might be tempted to modify this so that convergence is faster or slower. This is dangerous because it affects the final distribution of all players. We left it alone. Dynamics factor (τ) is safe to modify. Higher values mean the skill never fully converges and remains ‘live’ even after many games. Draw probability is important to get right, as otherwise ties will be treated as highly significant and cause unexpected updates.
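The “hill-climbing” update weight can be illustrated with a simple ramp. This is a sketch under stated assumptions, not the Xbox LIVE implementation: the linear ramp shape, the starting weight of 0.25, the 75-game ramp length (picked from the 50-100 range on the slide), and the idea of blending between the old and the fully updated [μ, σ] are all hypothetical.

```python
from typing import Tuple


def update_weight(games_played: int, ramp_games: int = 75,
                  start_weight: float = 0.25) -> float:
    """Weight starts small and returns to normal (1.0) after ramp_games games."""
    if games_played >= ramp_games:
        return 1.0
    return start_weight + (1.0 - start_weight) * games_played / ramp_games


def apply_weighted_update(mu: float, sigma: float,
                          full_mu: float, full_sigma: float,
                          weight: float) -> Tuple[float, float]:
    """Move only part of the way from the old estimate to the full update."""
    return (mu + weight * (full_mu - mu),
            sigma + weight * (full_sigma - sigma))
```

For example, with a weight of 0.5 an update that would have moved [25, 8] to [27, 7] lands on [26.0, 7.5] instead, so early games move the estimate more gently.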

    32. TrueSkill Summary. Advantages: Already implemented for you by Xbox LIVE. Converges quickly. Provides good estimate of player skill for matchmaking. TrueSkill developers are very helpful and knowledgeable. Disadvantages: Complex mathematics, takes an expert to understand and tweak. Hard to predict overall convergence of system. Default behavior does not fit our ideals for a skill system. Vulnerable to exploitation, both real and perceived (ties increase rank). This is a hard problem with no clear solution. Notes: TrueSkill is a good system for what it does. However, when you extend it outside its domain of mathematical estimation and try to turn it into something you can show to players, you will run into trouble. It is possible, but it takes a lot of work. And the details will never really be under your control. Default behavior does not fit our ideals, and changing the settings makes its behavior hard to predict, overconstraining the system. Multivariate representation leads to perceived issues, e.g. ties increasing rank. Showing skill to players is valuable, but was it worth all this? Hard problem. We don’t know what the right answer is here.

    33. Technical Design (Matchmaking)

    34. Search Criteria. Use precise initial query parameters to find an ideal match: Skill, Experience, Network Connection Quality. Initial queries are less likely to find a match. Allows tight matches in large populations. Query parameters must include all selection criteria. Halo 2 had some criteria that were not stored in XBL service. Searcher spent time querying candidates that they would never want to join, e.g. due to spoken language. Wastes bandwidth and also wastes precious search time. Notes: We start out with precise filters where we are looking for an ideal match. Queries are less likely to succeed but the penalty for an empty query is relatively low. Lets us scale up to large populations and provide tight matchmaking. Very important that you should not have the client doing any post-filtering on candidates. Did this in Halo 2 for a few attributes (spoken language). Wasting time is VERY bad because each set of candidates is only viable for a short period of time.

    35. Search Expansion. ‘Fuzzy match’ in many dimensions. Analog parameters (skill, experience, network connection). Binary parameters (language, country, DLC maps). Treat binary parameters as “soft filters”. Expansion has multiple phases: Look for ideal match, expand analog filter a bit. Remove binary “soft filters”. Expand analog filter out to max, relax connection quality. Keep trying intermittently, switch to gathering.
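The phased expansion above can be sketched as an ordered list of progressively looser filters. Only the ordering of the phases comes from the slide; the `Group` and `Phase` types and every number in `PHASES` are hypothetical. The filter is also applied locally here purely to keep the example self-contained, whereas the talk stresses that in practice each phase should be expressed in the service query rather than post-filtered on the client.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Group:
    skill: int          # analog parameter
    ping_ms: int        # analog parameter: connection quality to the searcher
    language: str       # binary "soft filter"
    has_dlc_maps: bool  # binary "soft filter"


@dataclass(frozen=True)
class Phase:
    skill_range: int
    max_ping_ms: int
    soft_filters_on: bool


PHASES = (
    Phase(skill_range=1, max_ping_ms=100, soft_filters_on=True),    # ideal match
    Phase(skill_range=3, max_ping_ms=100, soft_filters_on=True),    # analog filter widened a bit
    Phase(skill_range=3, max_ping_ms=100, soft_filters_on=False),   # soft filters removed
    Phase(skill_range=49, max_ping_ms=300, soft_filters_on=False),  # analog at max, connection relaxed
)


def accepts(phase: Phase, searcher: Group, candidate: Group) -> bool:
    if abs(searcher.skill - candidate.skill) > phase.skill_range:
        return False
    if candidate.ping_ms > phase.max_ping_ms:
        return False
    if phase.soft_filters_on:
        return (searcher.language == candidate.language
                and searcher.has_dlc_maps == candidate.has_dlc_maps)
    return True


def search(searcher: Group, candidates: Sequence[Group]) -> Optional[Tuple[int, Group]]:
    """Return (phase index, candidate) for the tightest phase that finds a match."""
    for index, phase in enumerate(PHASES):
        for candidate in candidates:
            if accepts(phase, searcher, candidate):
                return index, candidate
    return None  # caller keeps retrying intermittently or switches to gathering
```

Starting tight and loosening in steps means a large population gets close matches, while a small one still finds a game after a few empty queries.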

    36. Ecosystem Balance. Must have good balance of searchers and gatherers. Halo 2: Easy to model in theory, impossible in practice. Global Internet network properties. Latency in Live service updating. Network engine internals (time to discover, time to join). Expire lists of candidates quickly. Searchers are in a race to join limited set of active games. Make the ecosystem adaptive. Gatherers can also search. If nobody is joining you, you have a chance to join someone. Ecosystem can adaptively balance for low-population scenario. Notes: This bit us hard when we launched Halo 2. Required multiple updates to fix. No way to diagnose these problems in the wild. (H2 matchmaking still unknowably broken.) Trying to make H3 ecosystem more fault tolerant.
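Two of the ideas on this slide, expiring candidate lists quickly and letting an idle gatherer also search, reduce to simple time checks. The sketch below assumes hypothetical names and timeouts (`CANDIDATE_TTL_S`, `GATHER_PATIENCE_S`); the talk does not give the actual values.

```python
from dataclasses import dataclass
from typing import List

CANDIDATE_TTL_S = 5.0     # hypothetical: how long a search result stays trustworthy
GATHER_PATIENCE_S = 20.0  # hypothetical: how long a gatherer waits before also searching


@dataclass
class CandidateEntry:
    session_id: str
    received_at: float  # time (in seconds) at which the search result arrived


def prune_stale(entries: List[CandidateEntry], now: float) -> List[CandidateEntry]:
    """Drop candidates old enough that other searchers have probably taken the slot."""
    return [e for e in entries if now - e.received_at <= CANDIDATE_TTL_S]


def gatherer_should_search(seconds_since_last_join: float) -> bool:
    """If nobody is joining this gatherer, give it a chance to join someone else."""
    return seconds_since_last_join >= GATHER_PATIENCE_S
```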

    37. Recommendations

    38. Is Matchmaking Right For You? Works with different genres. Works with different game models: could match into games in progress, could use dedicated servers. Scales to wide range of user populations: Halo 3 playlists range from < 1k to 100k concurrent users. Significant investment in client software: 2-3 developers for project lifecycle. Back end functionality optional but helps a lot. Payoff comes from building a lasting community.

    39. Design For Security. Users have no control, so you must provide safe games for them. Every aspect of your game will be attacked: hardware attacks, network attacks (bridging, standby, DoS), game attacks (modified content, LSP interception), exploits (skill de-leveling, out-of-map), many more. Halo 2 required five updates over three years for security. This is an entire talk in itself.

    40. Test Early, Test Often. MS-Internal Alpha and Beta (10k), 11/06 and 4/07: Rich text data mining as primary feedback. Searchable centralized logging system with event severities. Find hard bugs in client (edge cases, crashes, network protocol). Transcontinental Matchmaking and network testing. Public Beta (900k), 5/07: PR boost, some gameplay feedback also. Tune TrueSkill distribution curves on real player skill mix. Load balancing of LSP servers to avoid Day 1 meltdown. High population games must involve XBL in testing. Easy to create a scalability problem on back end.

    41. Collect Data From Production. You will need retail environment instrumentation. Can’t use data mining: too much data to store and transmit. For Halo 2 launch we had no alternative. Halo 3 uses special-purpose binary uploads: end-of-game report for bungie.net analysis, matchmaking status report for ecosystem diagnosis, network QoS report for research. Volume of data is massive; we discard 90%+. Slim fire-and-forget stateless HTTP-over-XLSP upload. Per-machine settings for deep investigation.

    42. Results

    43. Results – Overall. Deployed successfully, no client update needed. Some normal LSP scalability balancing in first few days. Reviews mention MP as “streamlined, transparent”. High penetration of online multiplayer: 5.9M unique users observed on Xbox LIVE, 5.2M have played in Matchmaking (88%). Approximately 3x Halo 2 peak concurrency: 560k peak concurrent users in 15 playlists. Longevity is an open question: tracking steady at 1.2M unique users per week. Notes: Deployment of Halo 3 matchmaking was successful compared to Halo 2, which required 2 early client updates.

    44. Results – Launch. Notes: This graph shows Halo 2 and Halo 3 MP games/day recorded by bungie.net. Looks like a sharp decline in H3, but we think this is actually the expected seasonal decline that happens every year in Oct/Nov.

    45. Results – Player Community. Skill numbers that you can actually believe in! Perception is that level 50 means skilled, not a cheater. Some experience boosting. We assumed it would be possible to circle boost XP. There were ways that let you do it many times faster than normal. No way to advance to higher ratings without ranked play. This was probably a mistake. Player identity features very well received. Social community seems to be better than Halo 2.

    46. Results – MP Game Selection. Custom Games: 16%. Matchmaking: 84%. Notes: These figures are in terms of player-games, so one 4v4 game counts as 8 and a 1v1 game counts as 2.
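The player-games unit used for these percentages can be made concrete with a short calculation. The function names and the toy data are illustrative only; the 16% / 84% split itself comes from the slide, not from this code.

```python
from typing import Sequence


def player_games(team_sizes: Sequence[int]) -> int:
    """One game contributes one player-game per participant."""
    return sum(team_sizes)


def matchmaking_share(matchmade: Sequence[Sequence[int]],
                      custom: Sequence[Sequence[int]]) -> float:
    """Fraction of all player-games that were played in Matchmaking."""
    mm = sum(player_games(game) for game in matchmade)
    cg = sum(player_games(game) for game in custom)
    return mm / (mm + cg)
```

With this unit, a single 4v4 game weighs four times as much as a single 1v1 game, so the split reflects where players spend their games rather than how many sessions were hosted.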

    47. Results – Player Retention. Notes: 5.9M connected to Live. 5.2M enter Matchmaking at least once. 60% play at least 100 games; this indicates that online multiplayer is no longer a niche.

    48. Results – Skill in Team Slayer

    49. Results – Overall Skill

    50. Results – Experience Rating

    51. Future Design Thoughts. New player experience was good, not great. Implemented late, needs goal-driven UI flow. 58% of players went on to play 100 games or more. But 19% of players stopped after < 20 games. Online model is focused on skill improvement. But most players don’t care about skill; they like reward better. Negative behavior was reduced somewhat. Tremendous amount of room for improvement. Reputation and social history as part of public player identity. Empowering players to change their experience is the right path.

    52. Future Technical Thoughts. Starting to feel like a solved problem technically. Ecosystem could be more self-adjusting. Still some search / gather balance issues. Move more of the ecosystem to a centralized service? Ubiquitous Matchmaking? Not just as explicit UI. Invisible fabric of online experience. Peer-to-peer is the future.

    53. Credits. Bungie: This system is the work of many people (Design, Networking, UI, bungie.net, more). Microsoft Research: MSR Cambridge Applied Games Group (TrueSkill), MSR Networking Research Group (QoS data analysis). Microsoft Game Studios. Xbox Platform. XDC (XNA Developer Connection). Xbox LIVE Team. Xbox LIVE Operations Team.
