LEARNING WITH F#
Phillip Trelford, Applied Games, Microsoft Research
Overview
 Learning Probabilistic Models
 Factor Graphs
 Inference in Factor Graphs
 Projects
 TrueSkill Analysis
 Internal adCenter competition
 Benefits of F#
Factor Graphs
 Bipartite graphs
 Random variables
 Factors
 Two purposes:
 Represent the structure of a probability distribution (more fine-grained than Bayes nets)
 Represent an algorithm whose computations are performed along the edges (schedules)
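A minimal sketch of the bipartite structure described above, written as F# types. All names here (Variable, Factor, FactorGraph, jointWeight) are assumptions made for illustration and are not taken from the talk or from the TrueSkill code base.

```fsharp
/// A variable node: a named discrete random variable with a finite number of states.
type Variable = { Name : string; States : int }

/// A factor node: a non-negative function of the variables it is connected to.
type Factor =
    { Label     : string
      Scope     : Variable list         // the edges from this factor to variable nodes
      Potential : int list -> float }   // takes one state index per variable in Scope

/// Bipartite graph: every edge joins a factor to a variable in its scope.
type FactorGraph = { Variables : Variable list; Factors : Factor list }

/// Unnormalised joint weight of a full assignment (variable name -> state index):
/// the product of all factor potentials.
let jointWeight (graph : FactorGraph) (assignment : Map<string, int>) =
    graph.Factors
    |> List.map (fun f ->
        f.Scope
        |> List.map (fun v -> assignment.[v.Name])
        |> f.Potential)
    |> List.fold (fun acc w -> acc * w) 1.0

// Hypothetical example: two binary variables and one factor that prefers agreement.
let a = { Name = "a"; States = 2 }
let b = { Name = "b"; States = 2 }
let agree =
    { Label = "agree"
      Scope = [ a; b ]
      Potential = (fun xs -> if xs.[0] = xs.[1] then 2.0 else 1.0) }
let graph = { Variables = [ a; b ]; Factors = [ agree ] }

let weightSame = jointWeight graph (Map.ofList [ "a", 1; "b", 1 ])
let weightDiff = jointWeight graph (Map.ofList [ "a", 1; "b", 0 ])
```

Keeping the potential as a function of the factor's own scope is what makes the representation finer grained: each factor only ever sees the variables it is connected to.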
TrueSkill™ Factor Graph
[Figure: factor graph over the variables s1 to s4, t1 to t3, y12 and y23]
Inference in Factor Graphs
 Computational questions:
 What are the marginals of the joint probability?
 What is the mode of the joint probability?
 The naive approach requires exponential run-time:
 Marginals:
 Mode:
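The formulas for these two bullets did not survive extraction. As a sketch of the standard definitions, assuming discrete variables x_1, ..., x_n with joint distribution p (this notation is an assumption, not the slide's own):

```latex
% Marginal of x_i: sum the joint over every other variable.
p(x_i) = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_n} p(x_1, \ldots, x_n)

% Mode: the jointly most probable assignment.
(x_1^{*}, \ldots, x_n^{*}) = \operatorname*{arg\,max}_{x_1, \ldots, x_n} p(x_1, \ldots, x_n)
```

With K states per variable, evaluating either expression directly visits on the order of K^n joint configurations, which is where the exponential run-time comes from.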
Message Passing in Factor Graphs
[Figure: factor graph with nodes w1, w2, +, s and c]
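To illustrate why passing messages along edges avoids the exponential cost, here is a small sketch on a chain of three binary variables x1, x2, x3. The chain, the factor tables and every name below are hypothetical; this is not the model shown on the slide.

```fsharp
// prior is a factor on x1; pair12 and pair23 are pairwise factors indexed [source, target].
let prior  = [| 0.6; 0.4 |]
let pair12 = array2D [ [ 0.9; 0.1 ]; [ 0.2; 0.8 ] ]
let pair23 = array2D [ [ 0.7; 0.3 ]; [ 0.4; 0.6 ] ]
let states = [ 0; 1 ]

/// Scale a vector of non-negative weights so that it sums to one.
let normalise (ws : float[]) =
    let z = Array.sum ws
    ws |> Array.map (fun w -> w / z)

/// Naive marginal of x3: sum the full joint over every (x1, x2) combination.
let naiveMarginalX3 =
    [| for x3 in states ->
        List.sum
            [ for x1 in states do
                for x2 in states do
                    yield prior.[x1] * pair12.[x1, x2] * pair23.[x2, x3] ] |]
    |> normalise

/// One message along an edge: sum out the source variable of a pairwise factor.
let passMessage (incoming : float[]) (factor : float[,]) =
    [| for target in states ->
        List.sum [ for source in states -> incoming.[source] * factor.[source, target] ] |]

let messageToX2 = passMessage prior pair12         // x1 summed out
let messageToX3 = passMessage messageToX2 pair23   // x2 summed out
let marginalX3  = normalise messageToX3

printfn "naive:   %A" naiveMarginalX3
printfn "message: %A" marginalX3
```

Both routes give the same marginal for x3 (about 0.586 and 0.414), but on a chain of n variables with K states each the naive sum has K^n terms, while the message-passing version does on the order of n × K² multiplications.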
TrueSkill Rating Problem
 Given:
 Match outcomes: orderings among k teams consisting of n1, n2, ..., nk players, respectively
 Questions:
 Skill si for each player such that
 Global ranking among all players
 Fair matches between teams of players
Xbox 360 Live
 Launched in September 2005
 Every game uses TrueSkill™ to match players
 > 6 million players
 > 1 million matches per day
 > 2 billion hours of gameplay
Xbox Live Activity Viewer
 Code size: 1400 LOC + 1400 LOC
 Project size: 2 projects / 21 files
 Development time: 2 months
 Features
 Parser: High performance (> 2 GB of logs in 1 hour)
 Parser: Recreation of matchmaking server status
 Viewer: SQL database integration (deep schema)
Xbox 360 & Halo 3
 Xbox 360 Live
 Launched in September 2005
 Every game uses TrueSkill™ to match players
 > 6 million players
 > 1 million matches per day
 > 2 billion hours of gameplay
 Halo 3
 Launched on 25th September 2007
 Largest entertainment launch in history
 > 500,000 players playing concurrently
F# Tools for Halo 3
 Questions
 Controllable player skill progression (slow-down!)
 Controllable skill distributions (re-ordering)
 Simulations
 Large-scale simulation of > 8,000,000,000 matches
 Distributed application written in C# using .NET Remoting
 Tools
 Result viewer (logged results: 52 GB of data)
 Real-time simulator of partial update
Halo 3 Simulation Result Viewer
 Code size: 1800 LOC
 Project size: 11 files
 Development time: 2 months
 Features
 Multithreaded histogram viewer (due to file size)
 Real-time spline editor (monotonically increasing)
 Based on WinForms (compatibility)
Halo 3 Partial Update Analyser
 Code size: 2600 LOC
 Project size: 10 files
 Development time: 1 month
 Features
 SQL database integration (analysis of beta test data)
 Full integration of C# TrueSkill code (.NET library)
 Real-time changes
The adCenter Problem
 Cash cow of Search
 Selling “web space” at www.live.com and www.msn.com
 “Paid Search” (prices set by auctions)
 The internal competition focuses on Paid Search.
The Internal adCenter Competition
 Start of competition: February 2007
 Start of training phase: May 2007
 End of training phase: June 2007
 Task:
 Predict the click probability for a few days of real data from several weeks of training data (logged page views)
 Resources:
 4 (2 x 2) 64-bit CPU machine
 16 GB of RAM
 200 GB HD
The Scale of Things
 Weeks of data in training: 7,000,000,000 impressions
 2 weeks of CPU time during training: 2 wks × 7 days × 86,400 sec/day = 1,209,600 seconds
 Learning algorithm speed requirement:
 5,787 impression updates / sec
 172.8 μs per impression update
Tool Chain: Existing Tools
 Excel 2007
 Scientific Visualisation
 Small Scale Simulations
 SQL Server 2005
 1.6 TB of “active” data (for 2 weeks of data + indices)
 Ad-Hoc Queries and Stored Procedures
 Visual Studio 2005 & F#
 54-project solution (many small tools)
 FSI for rapid development and code testing
 Strong typing as a surrogate for correctness
SQL Schema Generator
 Code size: 500 LOC
 Project size: 1 file
 Development time: 2 weeks
 Features
 Code defines the schema (unlike LINQ)!
 High-performance insertion via computed bulk insertion with automated key propagation
 Code sample is now part of the F# distribution
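One way the "code defines the schema" idea could work is to derive a CREATE TABLE statement from an F# record type by reflection. The sketch below is not the generator described on the slide: the Click record, the sqlType mapping and the fixed NVARCHAR(256) length are all assumptions made for illustration.

```fsharp
open System
open Microsoft.FSharp.Reflection

/// Hypothetical record used only for this illustration.
type Click = { AdId : int; Query : string; Position : byte option }

/// Map a .NET type to a SQL column type; option<T> becomes a nullable column.
let rec sqlType (t : Type) : string =
    if t.IsGenericType && t.GetGenericTypeDefinition() = typedefof<option<_>> then
        (sqlType (t.GetGenericArguments().[0])).Replace(" NOT NULL", " NULL")
    elif t = typeof<int> then "INT NOT NULL"
    elif t = typeof<int16> then "SMALLINT NOT NULL"
    elif t = typeof<byte> then "TINYINT NOT NULL"
    elif t = typeof<string> then "NVARCHAR(256) NOT NULL"
    else failwithf "No SQL mapping for %s" t.Name

/// Build a CREATE TABLE statement from the fields of an F# record type.
let createTable<'T> () =
    let tableName = typeof<'T>.Name
    let columns =
        FSharpType.GetRecordFields(typeof<'T>)
        |> Array.map (fun field -> sprintf "  %s %s" field.Name (sqlType field.PropertyType))
    sprintf "CREATE TABLE %s (\n%s\n)" tableName (String.concat ",\n" columns)

let clickTable = createTable<Click> ()
printfn "%s" clickTable
```

Because the record is the single source of truth, changing a field's type or making it an option changes the generated column without a separate schema file to keep in sync.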
Strong Typing and SQL Datastores
/// A single page-view
type PageView =
    {
        ClientDateTime : DateTime
        GmtSeconds : int
        TargetDomainId : int16
        Medium : MediumType option
        StartPosition : int
        PageNum : byte
        [<SqlStringLengthAttribute(256)>]
        Query : string
        Gender : Gender option
        AgeBucket : AgeGroup option
        ReturnedAdCnt : byte
        AbTestingType : byte option
        AlgorithmId : int option
        ANID : int128 option
        GUID : int128 option
        [<SqlStringLengthAttribute(15)>]
        PassportZipCode : string option
        [<SqlStringLengthAttribute(2)>]
        PassportCountry : string option
        PassportRegion : int
        [<SqlStringLengthAttribute(2)>]
        PassportOccupation : char
        LocationCountry : int
        LocationState : int
        LocationMetroArea : int
        CategoryId : int16
        SubCategoryId : int16
        FormCode : int16
        ReturnedAds : Advertisement array
    }
/// Different types of media
type MediumType =
    | PaidSearch
    | ContextualSearch

/// A single displayed advertisement
type Advertisement =
    {
        AdId : int
        OrderItemId : int
        CampDayId : int16
        CampHourNum : byte
        ProductId : ProductType
        MatchType : MatchType
        AdLayoutId : AdLayout
        RelativePosition : byte
        DeliveryEngineRank : int16
        ActualBid : int
        ProbabilityOfClick : int16
        MatchScore : int
        ImpressionCnt : int
        ClickCnt : int
        ConversionCnt : int
        TotalCost : int
    }
/// Create the SQL schema
let schema = bulkBuild ("cpidssdm18", "Cambridge", "June10")

/// Try to open the CSV file and read it pageview by pageview
File.OpenTextReader "HourlyRelevanceFeed.csv"
|> Seq.map (fun s -> s.Split [|','|])
|> Seq.chunkBy (fun xs -> xs.[0])
|> Seq.iteri (fun i (rguid,xss) ->
    /// Write the current in-memory bulk to the Sql database
    if i % 10000 = 0 then
        schema.Flush ()
    /// Get the strongly typed object from the list of CSV file lines
    let pageView = PageView.Parse xss
    /// Insert it
    pageView |> schema.Insert
)

/// One final flush
schema.Flush ()
Benefits of F#
 Four main reasons:
1. A language that both developers and researchers speak!
2. It leads to
   1. “Correct” programs
   2. Succinct programs
   3. Highly performant code
3. Interoperability with .NET
4. It’s fun to program!
