High-performance computing



High performance
computing in Artificial
Intelligence & Optimization
Olivier.Teytaud@inria.fr + many people

TAO, Inria-Saclay IDF, Cnrs 8623,
Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal
Network of Excellence.


NCHC, Taiwan.
November 2010.
Disclaimer

Many works on parallelism are about
technical tricks: SMP programming,
message-passing, network organization.
==> often moderate improvements, but
    for all users of a given
    library/methodology.
Here, the opposite point of view:
  Don't worry about a 10% loss due to
   suboptimal programming;
  Try to benefit from huge machines.
Outline

Parallelism
Bias & variance
AI & Optimization
  Optimization
  Supervised machine learning
  Multistage decision making
Conclusions
Parallelism

Basic principle (here!):
  Using more CPUs for being faster
Parallelism

Basic principle (here!):
  Using more CPUs for being faster
Various cases:
  Many cores in one machine (shared memory)
Parallelism

Basic principle (here!):
  Using more CPUs for being faster
Various cases:
  Many cores in one machine (shared memory)
  Many cores on a same fast network
    (explicit fast communications)
Parallelism

Basic principle (here!):
  Using more CPUs for being faster
Various cases:
  Many cores in one machine (shared memory)
  Many cores on a same fast network
    (explicit fast communications)
  Many cores on a network
    (explicit slow communications)
Parallelism

Various cases:
  Many cores in one machine (shared memory)
   ==> your laptop
  Many cores on a same fast network
   (explicit fast communications)
     ==> your favorite cluster
  Many cores on a network
   (explicit slow communications)
     ==> your grid or your lab or internet
Parallelism

Definitions:
  p = number of processors
  Speed-up(P) = ratio


   Time for reaching precision ε when p=1
  -------------------------------------------------------------
   Time for reaching precision ε when p=P
  Efficiency(p) = speed-up(p)/p
    (usually at most 1)
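These definitions translate directly into code; a minimal sketch with made-up timings (the measurement values are illustrative, not from the talk):

```python
def speedup(time_1, time_p):
    """Speed-up(P): time to reach precision eps with 1 processor,
    divided by the time to reach the same precision with P."""
    return time_1 / time_p

def efficiency(time_1, time_p, p):
    """Efficiency(p) = speed-up(p) / p, usually at most 1."""
    return speedup(time_1, time_p) / p

# Hypothetical measurements: 120 s sequentially, 20 s on 8 cores.
print(speedup(120.0, 20.0))         # 6.0
print(efficiency(120.0, 20.0, 8))   # 0.75
```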
Outline

Parallelism
Bias & variance
AI & Optimization
  Optimization
  Supervised machine learning
  Multistage decision making
Conclusions
Bias and variance

I compute x on a computer.
It's imprecise, I get x'.
How can I parallelize this to
    make it faster ?
Bias and variance

I compute x on a computer.
It's imprecise, I get x'.
What happens if I compute x
   1000 times,
   on 1000 different machines ?
I get x'1,...,x'1000.
x' = average( x'1,...,x'1000 )
Bias and variance

x' = average( x'1,...,x'1000 )
If the algorithm is deterministic:
   all x'i are equal
   no benefit
   Speed-up = 1, efficiency → 0
   ==> not good!       (trouble=bias!)
Bias and variance

x' = average( x'1,...,x'1000 )
If the algorithm is deterministic:
   all x'i are equal
   no benefit
   Speed-up = 1, efficiency → 0
   ==> not good!
If unbiased Monte-Carlo estimate:
    - speed-up=p, efficiency=1
   ==> ideal case! (trouble = variance)
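The Monte-Carlo case is the one that parallelizes perfectly; a small simulation (estimating E[U] = 0.5 for U uniform on [0,1] is my toy target):

```python
import random

def mc_estimate(n):
    """Unbiased Monte-Carlo estimate of E[U] = 0.5 from n samples."""
    return sum(random.random() for _ in range(n)) / n

# p machines each draw n samples; averaging their p estimates is
# statistically the same as one machine drawing p*n samples, so the
# variance shrinks by a factor p: speed-up = p, efficiency = 1.
random.seed(0)
p, n = 1000, 100
x_prime = sum(mc_estimate(n) for _ in range(p)) / p
# x_prime is close to 0.5: the error comes only from variance.
```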
Bias and variance, concluding

Two classical notions for an estimator x':
  Bias = E(x' – x)
  Variance = E[(x' – E x')²]
Parallelism can easily reduce variance;
parallelism cannot easily reduce the bias.
AI & optimization: bias &
variance everywhere

Parallelism
Bias & variance
AI & Optimization
  Optimization
  Supervised machine learning
  Multistage decision making
Conclusions
AI & optimization: bias &
variance everywhere

Many (parts of) algorithms can be rewritten
as follows:


  Generate sample x1,...,xλ using current
   knowledge
  Work on x1,...,xλ, get y1,...,yλ.
  Update knowledge.
Example 1: evolutionary
optimization




While (I have time)
  Generate sample x1,...,xλ using current
   knowledge
  Work on x1,...,xλ, get y1,...,yλ.
  Update knowledge.
Example 1: evolutionary
optimization

Initial knowledge = Gaussian
distribution
While (I have time)
  Generate sample x1,...,xλ using current
   knowledge
  Work on x1,...,xλ, get y1,...,yλ.
  Update knowledge.
Example 1: evolutionary
optimization

Initial knowledge = Gaussian
distribution G (mean m, variance σ²)
While (I have time)
  Generate sample x1,...,xλ using G
  Work on x1,...,xλ, get y1,...,yλ.
  Update knowledge.
Example 1: evolutionary
optimization

Initial knowledge = Gaussian
distribution G (mean m, variance σ²)
While (I have time)
  Generate sample x1,...,xλ using G
  Work on x1,...,xλ, get
   y1=fitness(x1),...,yλ=fitness(xλ).
  Update knowledge.
Example 1: evolutionary
optimization

Initial knowledge = Gaussian
distribution G (mean m, variance σ²)
While (I have time)
  Generate sample x1,...,xλ using G
  Work on x1,...,xλ, get
   y1=fitness(x1),...,yλ=fitness(xλ).
  Update G (rank the xi's):
    m = mean(x1,...,xλ)
    σ² = var(x1,...,xλ)
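The loop above, in a one-dimensional sketch (λ, μ and the sphere fitness x² are my illustrative choices, not the authors' setup):

```python
import random

def emna_step(mean, sigma, lam, mu, fitness):
    """One iteration of the loop above: sample lambda points from
    N(mean, sigma^2), rank them by fitness, refit the Gaussian on
    the mu best (unweighted, exactly as on the slide)."""
    xs = [random.gauss(mean, sigma) for _ in range(lam)]
    xs.sort(key=fitness)                         # minimization
    best = xs[:mu]
    new_mean = sum(best) / mu
    new_var = sum((x - new_mean) ** 2 for x in best) / mu
    return new_mean, max(new_var ** 0.5, 1e-12)  # keep sigma positive

random.seed(0)
mean, sigma = 5.0, 2.0
for _ in range(50):
    mean, sigma = emna_step(mean, sigma, lam=200, mu=50,
                            fitness=lambda x: x * x)
# The mean drifts toward the optimum of x^2 at 0.
```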
Example 1: evolutionary
optimization

Initial knowledge = Gaussian
distribution G (mean m, variance σ²)
While (I have time)
  Generate sample x1,...,xλ using G
  Work on x1,...,xλ, get
   y1=fitness(x1),...,yλ=fitness(xλ).
  Update G (rank the xi's):
    m = mean(x1,...,xλ)
    σ² = var(x1,...,xλ)

MANY EVOLUTIONARY ALGORITHMS ARE
WEAK FOR LAMBDA LARGE.
CAN BE EASILY OPTIMIZED
BY A BIAS / VARIANCE ANALYSIS.
Ex. 1: bias & variance for EO

Initial knowledge = Gaussian
distribution G (mean m, variance σ²)
While (I have time)
  Generate sample x1,...,xλ using G
  Work on x1,...,xλ, get
   y1=fitness(x1),...,yλ=fitness(xλ).
  Update G (rank the xi's):
    m = mean(x1,...,xλ) <== unweighted!
    σ² = var(x1,...,xλ)
Ex. 1: bias & variance for EO

Huge improvement in EMNA for λ
large just by taking into account the
bias/variance decomposition: reweighting
is necessary for cancelling the bias.
Other improvements by classical statistical
tricks:
   Reducing the selective pressure for λ large;
   Using quasi-random mutations.

==> really simple and crucial for large
 population sizes (not just for publishing :-) ).
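As an illustration of the reweighting idea, here is a rank-based weighted mean; the log-linear weights are one common choice (my pick for the sketch, not necessarily the scheme used in the cited work):

```python
import math

def weighted_mean(points, fitness, mu):
    """Rank-based reweighting: the better a point's rank, the larger
    its weight, instead of the unweighted mean of the mu best."""
    ranked = sorted(points, key=fitness)[:mu]
    w = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    total = sum(w)
    return sum(wi * x for wi, x in zip(w, ranked)) / total

pts = [3.0, -1.0, 0.5, 2.0, -0.2]
# Minimizing x^2: the update is dominated by the best-ranked points.
m = weighted_mean(pts, lambda x: x * x, mu=3)
```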
Ex. 1: bias & variance for EO

Initial knowledge = Gaussian
distribution G (mean m, variance σ²)
While (I have time)
  Generate sample x1,...,xλ using G
  Work on x1,...,xλ, get
   y1=fitness(x1),...,yλ=fitness(xλ).
  Update G (rank the xi's):
    m = mean(x1,...,xλ) <== unweighted!
    σ² = var(x1,...,xλ)
Example 2: supervised machine
learning (huge dataset)




  Generate sample x1,...,xλ using current
   knowledge
  Work on x1,...,xλ, get y1,...,yλ.
  Update knowledge.
Example 2: supervised machine
learning (huge dataset D)
 Generate data sets D1,...,Dλ using current
  knowledge (subsets of the database)
 Work on D1,...,Dλ, get f1,...,fλ (by learning)
 Average the fi's.
 ==> (su)bagging: Di = subset of D
 ==> random subspace: Di = projection of D on a
  random vector space
 ==> random noise: Di = D + noise
 ==> random forest: Di = D, but noisy algo
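These variants share one pattern: build independent datasets, learn independently (hence trivially in parallel), average. A toy subagging sketch; the "learner" here is just a sample mean, and all names are mine:

```python
import random

def subbag_fit(data, k, frac, learn):
    """(Su)bagging: fit k models on random subsets of the data.
    The subsets are independent, so the k fits can run on k machines."""
    n = max(1, int(frac * len(data)))
    return [learn(random.sample(data, n)) for _ in range(k)]

def subbag_predict(models):
    """Aggregate by averaging: variance drops, bias is untouched."""
    return sum(models) / len(models)

random.seed(0)
data = [random.gauss(10.0, 3.0) for _ in range(1000)]
# Toy 'learner': the model is just the sample mean of its subset.
models = subbag_fit(data, k=50, frac=0.1, learn=lambda s: sum(s) / len(s))
# The averaged prediction sits close to the target value 10.
```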
Example 2: supervised machine
learning (huge dataset D)
 Generate data sets D1,...,Dλ using current
  knowledge (subsets of the database)
 Work on D1,...,Dλ, get f1,...,fλ (by learning)
 Average the fi's.
 ==> (su)bagging: Di = subset of D
 ==> random subspace: Di = projection of D on a
  random vector space
 ==> random noise: Di = D + noise
 ==> random forest: Di = D, but noisy algo

Easy tricks for parallelizing supervised
machine learning:
  - use (su)bagging
  - use random subspaces
  - use averages of randomized algorithms
    (random forests)
  - do the cross-validation in parallel
==> from my experience, complicated parallel tools
    are not that important …
  - polemical issue: many papers on sophisticated
    parallel supervised machine learning algorithms;
  - I might be wrong :-)
Example 2: active supervised
machine learning (huge dataset)


While I have time
  Generate sample x1,...,xλ using current
   knowledge (e.g. sample the
   maxUncertainty region)
  Work on x1,...,xλ, get y1,...,yλ (labels by
   experts / expensive code)
  Update knowledge (approximate model).
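The maxUncertainty "generate" step can be sketched as below; the toy model and uncertainty measure are stand-ins of mine:

```python
def pick_queries(candidates, uncertainty, batch):
    """The 'generate' step of the loop above: query the batch of
    points where the current model is most uncertain."""
    return sorted(candidates, key=uncertainty, reverse=True)[:batch]

# Toy model: predictions in [0, 1]; uncertainty peaks at 0.5.
preds = {0.1: 0.95, 0.4: 0.55, 0.7: 0.48, 0.9: 0.02}
queries = pick_queries(list(preds),
                       lambda x: 1.0 - 2.0 * abs(preds[x] - 0.5),
                       batch=2)
# The two most ambiguous inputs get sent to the expert / expensive code.
```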
Example 3: decision making
under uncertainty


While I have time
  Generate simulations x1,...,xλ using
   current knowledge
  Work on x1,...,xλ, get y1,...,yλ (get
   rewards)
  Update knowledge (approximate model).
UCT (Upper Confidence Trees)




Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis Szepesvari (06)
UCT (figure slides: incremental growth of the search tree)
      Kocsis & Szepesvari (06)
Exploitation ...
            SCORE =
               5/7
             + k.sqrt( log(10)/7 )
... or exploration ?
              SCORE =
                 0/2
               + k.sqrt( log(10)/2 )
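The score above is a UCB1-style rule: win rate plus an exploration bonus. A sketch with the slide's numbers (10 parent simulations; k is the exploration constant):

```python
import math

def uct_score(wins, visits, parent_visits, k):
    """Exploitation (win rate) + exploration bonus, as on the slides."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

# 5/7 vs 0/2 under 10 parent simulations:
exploit_small_k = uct_score(5, 7, 10, 1.0) > uct_score(0, 2, 10, 1.0)
exploit_large_k = uct_score(5, 7, 10, 2.0) > uct_score(0, 2, 10, 2.0)
# A small k favors the proven move; a large k favors the rarely tried one.
```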
Example 3: decision making
under uncertainty


While I have time
  Generate simulations x1,...,xλ using
   current knowledge (= scoring rule based
   on statistics)
  Work on x1,...,xλ, get y1,...,yλ (get
   rewards)
  Update knowledge (= update statistics in
   memory ).
Example 3: decision making
under uncertainty: parallelizing

While I have time
  Generate simulations x1,...,xλ using
   current knowledge (= scoring rule based
   on statistics)
  Work on x1,...,xλ, get y1,...,yλ (get
   rewards)
  Update knowledge (= update statistics in
   memory ).
==> “easily” parallelized on multicore
 machines
Example 3: decision making
under uncertainty: parallelizing

While I have time

   Generate simulations x1,...,xλ using current knowledge (= scoring rule
     based on statistics)
   Work on x1,...,xλ, get y1,...,yλ (get rewards)
   Update knowledge (= update statistics in memory).

==> parallelized on clusters: one
 knowledge base per machine,
 average statistics only for crucial
 nodes:
   nodes with more than 5 % of the sims
   nodes at depth < 4
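A sketch of the cluster scheme: each machine keeps its own statistics, and only crucial nodes are merged (summed counts here; combining the two criteria with "or" is my reading of the slide, which lists them without a connective):

```python
def crucial(node, total_sims, depth):
    """The slide's filter for which nodes are worth synchronizing:
    > 5% of the simulations, or close to the root (depth < 4)."""
    return node["sims"] > 0.05 * total_sims or depth < 4

def merge(stats_per_machine):
    """Combine one crucial node's (wins, sims) across all machines."""
    return {"wins": sum(s["wins"] for s in stats_per_machine),
            "sims": sum(s["sims"] for s in stats_per_machine)}

# Two machines share their counts for a heavily simulated node:
a = {"wins": 60, "sims": 100}
b = {"wins": 30, "sims": 80}
merged = merge([a, b])
```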
Example 3: decision making
under uncertainty: parallelizing

Good news first: it's simple and it
works on huge clusters ! ! !




       Comparison with voting schemes;
       40 machines, 2 seconds per move.
Example 3: decision making
under uncertainty: parallelizing

Good news first: it's simple and it
works on huge clusters ! ! !




     Comparing N machines and P machines
  ==> consistent with linear speed-up in 19x19 !
Example 3: decision making
under uncertainty: parallelizing

Good news first: it's simple and it
works on huge clusters ! ! !

When we produced these numbers, we
believed we were ready to play Go against
very strong players.

Unfortunately, not at all :-)
Go: from 29 to 6 stones
1998: loss against amateur (6d) 19x19 H29
2008: win against a pro (8p) 19x19, H9        MoGo
2008: win against a pro (4p) 19x19, H8    CrazyStone
2008: win against a pro (4p) 19x19, H7    CrazyStone
2009: win against a pro (9p) 19x19, H7        MoGo
2009: win against a pro (1p) 19x19, H6        MoGo
2010: win against a pro (4p) 19x19, H6          Zen

2007: win against a pro (5p) 9x9 (blitz)     MoGo
2008: win against a pro (5p) 9x9 white       MoGo
2009: win against a pro (5p) 9x9 black       MoGo
2009: win against a pro (9p) 9x9 white       Fuego
2009: win against a pro (9p) 9x9 black       MoGoTW

==> still 6 stones at least!
Go: from 29 to 6 stones
1998: loss against amateur (6d) 19x19 H29
2008: win against a pro (8p) 19x19, H9        MoGo
2008: win against a pro (4p) 19x19, H8    CrazyStone
2008: win against a pro (4p) 19x19, H7    CrazyStone
2009: win against a pro (9p) 19x19, H7        MoGo
2009: win against a pro (1p) 19x19, H6        MoGo
2010: win against a pro (4p) 19x19, H6          Zen
                                        (Wins with H6 / H7
                                         are lucky, rare, wins)
2007: win against a pro (5p) 9x9 (blitz)     MoGo
2008: win against a pro (5p) 9x9 white       MoGo
2009: win against a pro (5p) 9x9 black       MoGo
2009: win against a pro (9p) 9x9 white       Fuego
2009: win against a pro (9p) 9x9 black       MoGoTW

==> still 6 stones at least!
Example 3: decision making
under uncertainty: parallelizing

So what happened ?

great speed-up + moderate results;
= contradiction ? ? ?
Example 3: decision making
under uncertainty: parallelizing

So what happened ?

great speed-up + moderate results;
= contradiction ? ? ?


Ok, we can simulate the sequential algorithm very
quickly = success.
But even the sequential algorithm is limited, even
with huge computation time!
Example 3: decision making
under uncertainty: parallelizing

(figure: Go position)
A poorly handled situation,
even with 10 days of CPU!
Example 3: decision making
under uncertainty: limited
scalability

(game of Havannah)




==> killed by the bias!
Example 3: decision making
under uncertainty: limited
scalability

(game of Go)




==> bias trouble ! ! !
we reduce the variance but not the
 systematic bias.
Conclusions

We have seen that “good old”
 bias/variance analysis is
  quite effective;
  not widely known / used.
Conclusions

easy tricks for evolutionary optimization on
 grids
==> we published papers with great
 speed-ups with just one line of code:
  Reweighting mainly,
  and also
    quasi-random,
    selective pressure modified for large pop size.
Conclusions
easy tricks for supervised machine
 learning:
==> bias/variance analysis here boils
 down to: choose an algorithm with more
 variance than bias and average:
  random subspace;
  random subset (subagging);
  noise introduction;
  “hyper”parameters to be tuned (cross-
   validation).
Conclusions

For sequential decision making under
 uncertainty, disappointing results:
 the best algorithms are not
“that” scalable.


A systematic bias remains.
Conclusions and references

Our experiments: often on Grid5000:
  ~5000 cores, Linux
  homogeneous environment
  union of high-performance clusters
  contains multi-core machines
Monte-Carlo Tree Search for decision
 making and uncertainty: Coulom, Kocsis
 & Szepesvari, Chaslot et al,...
For parallel evolutionary algorithms: Beyer
 et al, Teytaud et al (this Teytaud is not me...).

Parallel Artificial Intelligence and Parallel Optimization: a Bias and Variance Point of View

  • 1. High-performance computing High performance computing in Artificial Intelligence & Optimization Olivier.Teytaud@inria.fr + many people TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence. NCHC, Taiwan. November 2010.
  • 2. Disclaimer Many works in parallelism are about technical tricks on SMP programming, message-passing, network organization. ==> often moderate improvements, but for all users using a given library/methodology Here, opposite point of view: Don't worry for 10% loss due to suboptimal programming Try to benefit from huge machines
  • 3. Outline Parallelism Bias & variance AI & Optimization Optimization Supervised machine learning Multistage decision making Conclusions
  • 4. Parallelism Basic principle (here!): Using more CPUs for being faster
  • 5. Parallelism Basic principle (here!): Using more CPUs for being faster Various cases: Many cores in one machine (shared memory)
  • 6. Parallelism Basic principle (here!): Using more CPUs for being faster Various cases: Many cores in one machine (shared memory) Many cores on a same fast network (explicit fast communications)
  • 7. Parallelism Basic principle (here!): Using more CPUs for being faster Various cases: Many cores in one machine (shared memory) Many cores on a same fast network (explicit fast communications) Many cores on a network (explicit slow communications)
  • 8. Parallelism Various cases: Many cores in one machine (shared memory) ==> your laptop Many cores on a same fast network (explicit fast communications) ==> your favorite cluster Many cores on a network (explicit slow communications) ==> your grid or your lab or internet
  • 9. Parallelism Definitions: p = number of processors. Speed-up(p) = (time for reaching precision ε when using 1 processor) / (time for reaching precision ε when using p processors). Efficiency(p) = speed-up(p)/p (usually at most 1).
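In Python, the two definitions above amount to a pair of one-line helpers (a minimal sketch; the timing numbers below are illustrative, not measurements):

```python
def speedup(time_1, time_p):
    """Speed-up(p) = T(1) / T(p), at a fixed target precision."""
    return time_1 / time_p

def efficiency(time_1, time_p, p):
    """Efficiency(p) = speed-up(p) / p; usually at most 1."""
    return speedup(time_1, time_p) / p

# Hypothetical run: 100 s on 1 core, 8 s on 16 cores.
print(speedup(100.0, 8.0))         # 12.5
print(efficiency(100.0, 8.0, 16))  # 0.78125
```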
  • 10. Outline Parallelism Bias & variance AI & Optimization Optimization Supervised machine learning Multistage decision making Conclusions
  • 11. Bias and variance I compute x on a computer. It's imprecise, I get x'. How can I parallelize this to make it faster ?
  • 12. Bias and variance I compute x on a computer. It's imprecise, I get x'. What happens if I compute x 1000 times, on 1000 different machines ? I get x'1,...,x'1000. x' = average( x'1,...,x'1000 )
  • 13. Bias and variance x' = average( x'1,...,x'1000 ) If the algorithm is deterministic: all x'i are equal no benefit Speed-up = 1, efficiency → 0 ==> not good! (trouble=bias!)
  • 14. Bias and variance x' = average( x'1,...,x'1000 ) If the algorithm is deterministic: all x'i are equal no benefit Speed-up = 1, efficiency → 0 ==> not good! If unbiased Monte-Carlo estimate: - speed-up=p, efficiency=1 ==> ideal case! (trouble = variance)
  • 15. Bias and variance, concluding Two classical notions for an estimator x': Bias = E(x' − x); Variance = E((x' − E x')²). Parallelism can easily reduce the variance; parallelism cannot easily reduce the bias.
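A toy simulation makes the point concrete. Each "machine" returns an estimate carrying both a systematic bias and random noise; averaging 1000 of them (the parallel scheme of the previous slides) kills the noise term but leaves the bias untouched. The bias and noise values are illustrative:

```python
import random

random.seed(0)

def biased_noisy_estimate(x, bias=0.5, noise=1.0):
    # One machine's estimate x': systematic bias + random noise.
    return x + bias + random.gauss(0.0, noise)

x = 3.0
# Average 1000 independent estimates, one per "machine".
estimates = [biased_noisy_estimate(x) for _ in range(1000)]
x_prime = sum(estimates) / len(estimates)

# The variance term shrinks like 1/p; the bias (0.5) remains.
print(abs(x_prime - x))  # close to 0.5, not to 0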
  • 16. AI & optimization: bias & variance everywhere Parallelism Bias & variance AI & Optimization Optimization Supervised machine learning Multistage decision making Conclusions
  • 17. AI & optimization: bias & variance everywhere Many (parts of) algorithms can be rewritten as follows: Generate sample x1,...,xλ using current knowledge Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 18. Example 1: evolutionary optimization While (I have time) Generate sample x1,...,xλ using current knowledge Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 19. Example 1: evolutionary optimization Initial knowledge = Gaussian distribution While (I have time) Generate sample x1,...,xλ using current knowledge Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 20. Example 1: evolutionary optimization Initial knowledge = Gaussian distribution G (mean m, variance σ²) While (I have time) Generate sample x1,...,xλ using G Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 21. Example 1: evolutionary optimization Initial knowledge = Gaussian distribution G (mean m, variance σ²) While (I have time) Generate sample x1,...,xλ using G Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update knowledge.
  • 22. Example 1: evolutionary optimization Initial knowledge = Gaussian distribution G (mean m, variance σ²) While (I have time) Generate sample x1,...,xλ using G Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update G (rank the xi's): m=mean(x1,...,xμ) σ²=var(x1,...,xμ)
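The loop above can be sketched in a few lines of Python, here in one dimension (an EMNA-style Gaussian EDA; the λ and μ values and the test fitness are illustrative, and the unweighted mean is exactly the biased update the next slides discuss):

```python
import random

def eda_step(fitness, m, sigma, lam=100, mu=25):
    # Generate lambda samples from N(m, sigma^2), keep the mu best
    # (ranking by fitness), and refit the Gaussian on the survivors.
    xs = [random.gauss(m, sigma) for _ in range(lam)]
    xs.sort(key=fitness)
    sel = xs[:mu]
    new_m = sum(sel) / mu                                # unweighted mean
    new_var = sum((x - new_m) ** 2 for x in sel) / mu    # empirical variance
    return new_m, new_var ** 0.5

random.seed(1)
m, sigma = 10.0, 5.0
for _ in range(30):
    m, sigma = eda_step(lambda x: (x - 2.0) ** 2, m, sigma)
print(m)  # close to the optimum x* = 2
```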
  • 23. Example 1: evolutionary optimization MANY EVOLUTIONARY ALGORITHMS ARE WEAK FOR LAMBDA LARGE; THEY CAN BE EASILY OPTIMIZED BY A BIAS / VARIANCE ANALYSIS. Initial knowledge = Gaussian distribution G (mean m, variance σ²) While (I have time) Generate sample x1,...,xλ using G Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update G (rank the xi's): m=mean(x1,...,xμ) σ²=var(x1,...,xμ)
  • 24. Ex. 1: bias & variance for EO Initial knowledge = Gaussian distribution G (mean m, variance σ²) While (I have time) Generate sample x1,...,xλ using G Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update G (rank the xi's): m=mean(x1,...,xμ) <== unweighted! σ²=var(x1,...,xμ)
  • 25. Ex. 1: bias & variance for EO Huge improvement in EMNA for lambda large just by taking into account the bias/variance decomposition: reweighting is necessary for cancelling the bias. Other improvements by classical statistical tricks: Reducing σ for λ large; Using quasi-random mutations. ==> really simple and crucial for large population sizes. (not just for publishing :-) )
  • 26. Ex. 1: bias & variance for EO Initial knowledge = Gaussian distribution G (mean m, variance σ²) While (I have time) Generate sample x1,...,xλ using G Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update G (rank the xi's): m=mean(x1,...,xμ) <== unweighted! σ²=var(x1,...,xμ)
  • 27. Example 2: supervised machine learning (huge dataset) Generate sample x1,...,xλ using current knowledge Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 28. Example 2: supervised machine learning (huge dataset D) Generate data sets D1,...,Dλ using current knowledge (subsets of the database) Work on D1,...,Dλ, get f1,...,fλ (by learning) Average the fi's. ==> (su)bagging: Di = subset of D ==> random subspace: Di = projection of D on a random vector space ==> random noise: Di = D + noise ==> random forest: Di = D, but noisy algo
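The subagging variant is the simplest to sketch: train λ learners on random subsets of D and average their predictions; each learner can run on its own machine. In this toy version each "learner" is just the mean of its subset's labels, a stand-in for any real learning algorithm (all names and sizes are illustrative):

```python
import random

def subagging_fit(D, lam=10, subset_size=None):
    # Di = random subset of D, one per learner; fits run independently,
    # so they parallelize trivially.
    n = len(D)
    subset_size = subset_size or n // 2
    models = []
    for _ in range(lam):
        Di = random.sample(D, subset_size)
        models.append(sum(y for _, y in Di) / subset_size)  # toy "learner"
    return models

def predict(models):
    # Average the f_i's.
    return sum(models) / len(models)

random.seed(2)
D = [(x, 2.0) for x in range(100)]  # toy dataset: every label is 2.0
pred = predict(subagging_fit(D))
print(pred)  # 2.0
```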
  • 29. Example 2: supervised machine learning (huge dataset D) Easy tricks for parallelizing supervised machine learning: - use (su)bagging - use random subspaces - use averages of randomized algorithms (random forests) - do the cross-validation in parallel ==> from my experience, complicated parallel tools are not that important … - polemical issue: many papers on sophisticated parallel supervised machine learning algorithms; - I might be wrong :-)
  • 30. Example 2: active supervised machine learning (huge dataset) While I have time Generate sample x1,...,xλ using current knowledge (e.g. sample the max-uncertainty region) Work on x1,...,xλ, get y1,...,yλ (labels by experts / expensive code) Update knowledge (approximate model).
  • 31. Example 3: decision making under uncertainty While I have time Generate simulations x1,...,xλ using current knowledge Work on x1,...,xλ, get y1,...,yλ (get rewards) Update knowledge (approximate model).
  • 32. UCT (Upper Confidence Trees) Coulom (06) Chaslot, Saito & Bouzy (06) Kocsis & Szepesvari (06)
  • 33. UCT
  • 37. UCT Kocsis & Szepesvari (06)
  • 39. Exploitation ... SCORE = 5/7 + k.sqrt( log(10)/7 )
  • 42. ... or exploration ? SCORE = 0/2 + k.sqrt( log(10)/2 )
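The scores on these slides are the usual UCB formula: an exploitation term (empirical win rate) plus an exploration term growing with the parent's visit count. A minimal sketch, using the numbers from the slides (7 and 2 visits out of 10 parent simulations; the constant k is a tuning parameter, taken as 1 here for illustration):

```python
import math

def ucb_score(wins, visits, parent_visits, k=1.0):
    # exploitation term + exploration term
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

exploit = ucb_score(5, 7, 10)  # 5/7 + k*sqrt(log(10)/7)
explore = ucb_score(0, 2, 10)  # 0/2 + k*sqrt(log(10)/2)
print(exploit, explore)
# With k=1 the well-explored, high-win-rate move still scores higher;
# increasing k shifts the balance toward the rarely tried move.
```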
  • 43. Example 3: decision making under uncertainty While I have time Generate simulations x1,...,xλ using current knowledge (= scoring rule based on statistics) Work on x1,...,xλ, get y1,...,yλ (get rewards) Update knowledge (= update statistics in memory).
  • 44. Example 3: decision making under uncertainty: parallelizing While I have time Generate simulations x1,...,xλ using current knowledge (= scoring rule based on statistics) Work on x1,...,xλ, get y1,...,yλ (get rewards) Update knowledge (= update statistics in memory). ==> “easily” parallelized on multicore machines
  • 45. Example 3: decision making under uncertainty: parallelizing While I have time Generate simulations x1,...,xλ using current knowledge (= scoring rule based on statistics) Work on x1,...,xλ, get y1,...,yλ (get rewards) Update knowledge (= update statistics in memory). ==> parallelized on clusters: one knowledge base per machine, average statistics only for crucial nodes: nodes with more than 5% of the sims, nodes at depth < 4
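The cluster-level trick can be sketched as follows: each machine keeps its own tree of node statistics, and only "crucial" nodes (shallow, frequently visited) get their statistics averaged across machines. All names and data structures here are hypothetical, chosen only to illustrate the selection rule from the slide:

```python
def is_crucial(node, total_sims, max_depth=4, min_share=0.05):
    # Crucial = depth < 4 and more than 5% of the simulations.
    return node["depth"] < max_depth and node["visits"] >= min_share * total_sims

def average_crucial(trees, total_sims):
    # trees: one dict per machine, mapping node id -> {depth, visits, wins}.
    # Only crucial nodes are exchanged and averaged; deep, rarely-visited
    # nodes stay local, keeping communication cheap.
    merged = {}
    for key in trees[0]:
        nodes = [t[key] for t in trees]
        if is_crucial(nodes[0], total_sims):
            merged[key] = {
                "depth": nodes[0]["depth"],
                "visits": sum(n["visits"] for n in nodes) / len(nodes),
                "wins": sum(n["wins"] for n in nodes) / len(nodes),
            }
    return merged

# Two machines' local statistics after ~1000 simulations each:
trees = [
    {"root": {"depth": 0, "visits": 900, "wins": 500},
     "leaf": {"depth": 7, "visits": 10, "wins": 6}},
    {"root": {"depth": 0, "visits": 1100, "wins": 520},
     "leaf": {"depth": 7, "visits": 30, "wins": 20}},
]
merged = average_crucial(trees, total_sims=1000)
print(sorted(merged))  # only 'root' qualifies: shallow and >5% of the sims
```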
  • 46. Example 3: decision making under uncertainty: parallelizing Good news first: it's simple and it works on huge clusters ! ! ! Comparison with voting schemes; 40 machines, 2 seconds per move.
  • 47. Example 3: decision making under uncertainty: parallelizing Good news first: it's simple and it works on huge clusters ! ! ! Comparing N machines and P machines ==> consistent with linear speed-up in 19x19 !
  • 48. Example 3: decision making under uncertainty: parallelizing Good news first: it's simple and it works on huge clusters ! ! ! When we produced these numbers, we believed we were ready to play Go against very strong players. Unfortunately, not at all :-)
  • 49. Go: from 29 to 6 stones 1998: loss against amateur (6d) 19x19 H29 2008: win against a pro (8p) 19x19, H9 MoGo 2008: win against a pro (4p) 19x19, H8 CrazyStone 2008: win against a pro (4p) 19x19, H7 CrazyStone 2009: win against a pro (9p) 19x19, H7 MoGo 2009: win against a pro (1p) 19x19, H6 MoGo 2010: win against a pro (4p) 19x19, H6 Zen 2007: win against a pro (5p) 9x9 (blitz) MoGo 2008: win against a pro (5p) 9x9 white MoGo 2009: win against a pro (5p) 9x9 black MoGo 2009: win against a pro (9p) 9x9 white Fuego 2009: win against a pro (9p) 9x9 black MoGoTW ==> still 6 stones at least!
  • 50. Go: from 29 to 6 stones 1998: loss against amateur (6d) 19x19 H29 2008: win against a pro (8p) 19x19, H9 MoGo 2008: win against a pro (4p) 19x19, H8 CrazyStone 2008: win against a pro (4p) 19x19, H7 CrazyStone 2009: win against a pro (9p) 19x19, H7 MoGo 2009: win against a pro (1p) 19x19, H6 MoGo 2010: win against a pro (4p) 19x19, H6 Zen ==> wins with H6 / H7 are lucky (rare) wins 2007: win against a pro (5p) 9x9 (blitz) MoGo 2008: win against a pro (5p) 9x9 white MoGo 2009: win against a pro (5p) 9x9 black MoGo 2009: win against a pro (9p) 9x9 white Fuego 2009: win against a pro (9p) 9x9 black MoGoTW ==> still 6 stones at least!
  • 51. Example 3: decision making under uncertainty: parallelizing So what happened ? great speed-up + moderate results; = contradiction ? ? ?
  • 52. Example 3: decision making under uncertainty: parallelizing So what happened ? great speed-up + moderate results; = contradiction ? ? ? Ok, we can simulate the sequential algorithm very quickly = success. But even the sequential algorithm is limited, even with huge computation time!
  • 53. Example 3: decision making under uncertainty: parallelizing Poorly handled situation, even with 10 days of CPU !
  • 54. Example 3: decision making under uncertainty: limited scalability (game of Havannah) ==> killed by the bias!
  • 55. Example 3: decision making under uncertainty: limited scalability (game of Go) ==> bias trouble ! ! ! we reduce the variance but not the systematic bias.
  • 56. Conclusions We have seen that “good old” bias/variance analysis is quite efficient, yet not widely known / used.
  • 57. Conclusions easy tricks for evolutionary optimization on grids ==> we published papers with great speed-ups with just one line of code: reweighting mainly, and also quasi-random mutations and selective pressure modified for large population sizes.
  • 58. Conclusions easy tricks for supervised machine learning: ==> bias/variance analysis here boils down to: choose an algorithm with more variance than bias and average: random subspace; random subset (subagging); noise introduction; “hyper”parameters to be tuned (cross-validation).
  • 59. Conclusions For sequential decision making under uncertainty, disappointing results: the best algorithms are not “that” scalable. A systematic bias remains.
  • 60. Conclusions and references Our experiments: often on Grid5000: ~5000 cores - Linux homogeneous environment union of high-performance clusters contains multi-core machines Monte-Carlo Tree Search for decision making and uncertainty: Coulom, Kocsis & Szepesvari, Chaslot et al,... For parallel evolutionary algorithms: Beyer et al, Teytaud et al (this Teytaud is not me...).