1 / 62

From Theory To Real-World Applications: Technology Transfer in Practice

From Theory To Real-World Applications: Technology Transfer in Practice. Ralf Herbrich. Although all of the information in these slides has been published and is non-confidential, it may contain intellectual property owned by my prior employers. Overview. Background

platt
Download Presentation

From Theory To Real-World Applications: Technology Transfer in Practice

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. From Theory To Real-World Applications: Technology Transfer in Practice Ralf Herbrich Although all of the information in these slides has been published and is non-confidential, it may contain intellectual property owned by my prior employers.

  2. Overview • Background • Technology Transfer Case Studies • TrueSkill: Gamer Rating and Matchmaking • Click-Through Rate Prediction in Online Advertising • Technology Transfer Lessons • Process • Technical

  3. Overview • Background • Technology Transfer Case Studies • TrueSkill: Gamer Rating and Matchmaking • Click-Through Rate Prediction in Online Advertising • Technology Transfer Lessons • Process • Technical

  4. Background

  5. Overview • Background • Technology Transfer Case Studies • TrueSkill: Gamer Rating and Matchmaking • Click-Through Rate Prediction in Online Advertising • Technology Transfer Lessons • Process • Technical

  6. Technology Transfer • Definition (Wiki): Process of moving promising research topics into a level of maturity ready for bulk manufacturing or production • Practice: Often failing due to • Different Success Criteria (Product vs. Publication) • No Training Programs for Technology Transfer • Processes Are Hard to Generalize (Structure?)

  7. TrueSkill™ Joint work with ThoreGraepel, Tom Minka & Phillip Trelford

  8. TrueSkill Technology Transfer Halo 2 Beta Test Jul 2004 Sep 2004

  9. Given: Match outcomes: Orderings among k teams consisting of n1, n2 , ..., nk players, respectively Questions: Skill si for each player such that Global ranking among all players Fair matches between teams of players The Skill Rating Problem

  10. TrueSkill Technology Transfer Halo 2 Beta Test Research Jul 2004 Sep 2004 Dec 2004

  11. Latent Gaussian performance model for fixed skills Possible outcomes: Player 1 wins over 2 (and vice versa) Two Player Match Outcome Model s1 s2 p1 p2 y12

  12. Skill of a team is the sum of the skills of its members Two Team Match Outcome Model s1 s2 s3 s4 t1 t2 y12

  13. Possible outcomes: Permutations of the teams Multiple Team Match Outcome Model s1 s2 s3 s4 The total order relation among the teams is not reflected in the graph structure! t1 t2 t3 y

  14. But we are interested in the (Gaussian) posterior! Multiple Team Match Outcome Model s1 s2 s3 s4 t1 t2 t3 y12 y23

  15. Leaderboard Global ranking of all players Matchmaking For gamers: Most uncertain outcome For inference: Most informative Both are equivalent! Applications to Online Gaming

  16. Experimental Setup • Data Set: Halo 2 Beta • 3 game modes • Free-for-All • Two Teams • 1 vs. 1 • > 60,000 match outcomes • ≈ 6,000 players • 6 weeks of game play • Publically available

  17. Convergence Speed 40 35 30 25 Level 20 15 char (TrueSkill™) 10 SQLWildman (TrueSkill™) char (Halo 2 rank) 5 SQLWildman (Halo 2 rank) 0 0 100 200 300 400 Number of Games

  18. Convergence Speed (ctd.) 100% char wins SQLWildman wins 80% Both players draw 60% Winning probability 40% 20% 5/8 games won by char 0% 0 100 200 300 400 500 Number of games played

  19. TrueSkill Technology Transfer Code Devel Work with Game Devs Research Halo 2 Beta Test Research Jul 2004 Dec 2005 Mar 2005 Sep 2004 Dec 2004 Nov 2005

  20. Directed Models: Bayesian Networks • Definition: Graphical representation of joint probability distribution (Pearl, 1988) • Nodes: = Variables • Directed Edges: Conditional probability distribution • Semantic: • Ancestral relationship of dependency a b c

  21. Factor Graphs • Definition: Graphical representation of product structure of a function (Wiberg, 1996) • Nodes: = Factors = Variables • Edges: Dependencies of factors on variables. • Semantic: • Local variable dependency of factors a b c

  22. Factor Graphs and Bayes’ Law • Bayes’ law • Factorising prior • Factorising likelihood • Inference: Sum out latent variables s1 s2 s t1 t2 d y

  23. Factor Trees: Separation y Observation: Sum of products becomes product of sums of all messages from neighbouring factors to variable! f3(x,y) v w x f1(v,w) f2(w,x) z f4(x,z)

  24. Messages: From Factors To Variables y Observation: Factors only need to sum out all their local variables! f3(x,y) w x f2(w,x) z f4(x,z)

  25. Messages: From Variables To Factors y Observation: Variables pass on the product of all incoming messages! f3(x,y) x f2(w,x) z f4(x,z)

  26. The Sum-Product Algorithm • Three update equations (Aji & McEliece, 1997) • Update equations can be directly derived from the distributive law. • Calculate all marginals at the same time! • Only need to pass messages twice along each edge!

  27. Approximate Message Passing * = -5 0 5 -5 0 5 -5 0 5 ≈ ? * = -5 0 5 -5 0 5 -5 0 5

  28. Efficient Approximate Inference Gaussian Prior Factors s1 s2 s3 s4 Ranking Likelihood Factors Fast and efficient approximate message passing using Expectation Propagation t1 t2 t3 y12 y23

  29. TrueSkill Technology Transfer Code Devel Work with Game Devs Develop Research Halo 2 Beta Test Research Jul 2004 Dec 2005 Mar 2005 Mar 2006 Sep 2004 Dec 2004 Nov 2005

  30. Code size: 1400 LOC + 1400 LOC Project size: 2 project / 21 files Development time: 2 month Features Parser: High performance (> 2GB logs in 1 hour) Parser: Recreation of matchmaking server status Viewer: SQL database integration (deep schema) Xbox Live Activity viewer

  31. Skill Distributions of Online Games Golf (18 holes): 60 levels Car racing (3-4 laps): 40 levels UNO (chance game): 10 levels

  32. TrueSkill Technology Transfer Code Devel Work with Game Devs Work with Game Devs Simulation work and Tool Development for Halo 3 Develop Research Halo 2 Beta Test Research Jul 2004 Dec 2005 Jun 2006 Mar 2005 Mar 2006 Mar 2007 Sep 2004 Dec 2004 Nov 2005

  33. Halo 3 in Action

  34. Questions Controllable player skill progression (slow-down!) Controllable skill distributions (re-ordering) Simulations Large scale simulation of > 8,000,000,000 matches Distributed application written in C# using .Netremoting Tools Result viewer (Logged results: 52 GB of data) Real-time simulator of partial update Tools for Halo 3

  35. Code size: 1800 LOC Project size: 11 files Development time: 2 month Features Multithreaded histogram viewer (due to file size) Real-time spline editor (monotonically increasing) Based on WinForms (compatability) Halo 3 Simulation Result Viewer

  36. TrueSkill Technology Transfer Code Devel Work with Game Devs Work with Game Devs Simulation work and Tool Development for Halo 3 Work with Bungie Tool Develop (Halo 3) Develop Research Research Halo 2 Beta Test Research Jul 2004 Jul 2007 Dec 2005 Jun 2006 Mar 2005 Mar 2006 Mar 2007 Sep 2004 Dec 2004 Nov 2005 Nov 2007 May 2007

  37. Code size: 2600 LOC Project size: 10 files Development time: 1 month Features SQL database integration (analysis of beta test data) Full integration of C# TrueSkill code (.Net library) Real time changes Halo 3 Partial Update Analyser

  38. Halo 3 Public Beta Analysis

  39. Xbox 360 Live Launched in September 2005 Every game uses TrueSkill™ to match players > 10 million players > 2 million matches per day > 2 billion hours of gameplay Halo 3 Launched on 25th September 2007 Largest entertainment launch in history > 200,000 player concurrently (peak: 1,000,000) Xbox 360 & Halo 3

  40. TrueSkill Technology Transfer • Lessons Learned: • Pure research takes a short amount of time • Most of development was tool development • A platform feature only lives with a community • Mathematical Optimality ≠ Fun Experience Code Devel Work with Game Devs Work with Game Devs Simulation work and Tool Development for Halo 3 Work with Bungie Tool Develop (Halo 3) Develop Research Research Halo 2 Beta Test Research Jul 2004 Jul 2007 Dec 2005 Jun 2006 Mar 2005 Mar 2006 Mar 2007 Sep 2004 Dec 2004 Nov 2005 Nov 2007 May 2007

  41. adPredictor Joint work with ThoreGraepel, Joaquin Quiñonero Candela, OnnoZoeter, Tom Borchert , Phillip Trelford

  42. AdPredictor Technology Transfer Problem Identify Mar 2007 Jan 2007

  43. Display (according to expected revenue) Charge (per click) Why Predict Probability-of-Click? • Advantages of improved probability estimates: • Increase user satisfaction by better targeting • Fairer charges to advertisers • Increase revenue by showing ads with high click-thru rate $1.00 * 10% =$0.10 $0.80 $2.00 * 4% =$0.08 $1.25 $0.10 * 50% =$0.05 $0.05

  44. AdPredictor Technology Transfer Problem Identify AdCenter Compete Mar 2007 Jan 2007 May 2007

  45. DayLog08FebSimplePageView * Column Name Data Type Allow Nulls DayLog08FebSimplePageView... bigint Category * MatchType * UserID uniqueidentifier CategoryId MatchTypeId Time datetime Description Name ClientIP varchar(15) ClientTZ int UserAgent varchar(255) AgeGroup * DistributionChannel int AgeGroupId DomainID int Name FormCode varchar(16) SearchVID varchar(16) DayLog08FebSimplePageViewReturnedAds * AdTokenRatio * HasPassport bit Column Name Data Type Allow Nu... AdTokenRatioId PassportGender int DayLog08FebSimplePageView... bigint Name PassportAgeGroupId int ImpressionHash bigint PassportZipcode varchar(15) MatchedKeyword varchar(5... PassportCountry varchar(2) AdvertiserId int PassportBDay varchar(15) OrderId int PassportBirthYear int ListingId int PassportOccupation char(1) AdId int LocationCountryId int The Flow of Information • Why structured data? • Data validation and cleaning • Principled feature transformations User interaction Raw Logs Structured Data

  46. Code size: 500 LOC Project size: 1 file Development time: 2 weeks Features Code defines the schema (unlike LINQ)! High-performance insertion via computed bulk-insertion with automated key propagation Code sample is now part of the F# distribution SQL Schema Generator

  47. Strong Typing and SQL Datastores /// Different types of media type MediumType = | PaidSearch | ContextualSearch /// A single displayed advertisement type Advertisement = { AdId : int OrderItemId : int CampDayId : int16 CampHourNum : byte ProductId : ProductType MatchType : MatchType AdLayoutId : AdLayout RelativePosition : byte DeliveryEngineRank : int16 ActualBid : int ProbabilityOfClick : int16 MatchScore : int ImpressionCnt : int ClickCnt : int ConversionCnt : int TotalCost : int } /// A single page-view type PageView = { ClientDateTime : DateTime GmtSeconds : int TargetDomainId : int16 Medium : MediumType option StartPosition : int PageNum : byte [<SqlStringLengthAttribute(256)>] Query : string Gender : Gender option AgeBucket : AgeGroup option ReturnedAdCnt : byte AbTestingType : byte option AlgorithmId : int option ANID : int128 option GUID : int128 option [<SqlStringLengthAttribute(15)>] PassportZipCode : string option [<SqlStringLengthAttribute(2)>] PassportCountry : string option PassportRegion : int [<SqlStringLengthAttribute(2)>] PassportOccupation : char LocationCountry : int LocationState : int LocationMetroArea : int CategoryId : int16 SubCategoryId : int16 FormCode : int16 ReturnedAds : Advertisement array } /// Create the SQL schema let schema = bulkBuild ("cpidssdm18",“Cambridge",“June10") /// Try to open the CSV file and read it pageview by pageview File.OpenTextReader“HourlyRelevanceFeed.csv" |> Seq.map (fun s -> s.Split [|','|]) |> Seq.chunkBy (fun xs -> xs.[0]) |> Seq.iteri (fun i (rguid,xss) -> /// Write the current in-memory bulk to the Sql database if i % 10000 = 0then schema.Flush () /// Get the strongly typed object from the list of CSV file lines let pageView = PageView.Parse xss /// Insert it pageView |> schema.Insert ) /// One final flush schema.Flush ()

  48. Uncertainty: Bayesian Probabilities 102.34.12.201 15.70.165.9 Client IP 221.98.2.187 92.154.3.86 p(pClick) + Match Type Exact Match Broad Match ML-1 Position SB-1 SB-2

  49. Training Algorithm in Action w2 w1 + z c No Click Prediction Training/Update Click

  50. Inference: An Optimization View

More Related