MAPREDUCE FOR THE
PARTIAL LEAST SQUARES
REGRESSION METHOD
(MRPLS)

MACHINE LEARNING I
LEANDRO ALVIM
PROF. RUY MILIDIÚ

MOTIVATION

BUILD MORE ROBUST MODELS

USE OF PLS

PROBLEM

  PERFORMANCE

[FIGURE: TWO PLOTS OF PLS TIME AS A SHARE OF TOTAL TIME (PLS/TOTAL, SCALE 0 TO 100), ONE AGAINST THE NUMBER OF FACTORS (1, 10, 20, 30) AND ONE AGAINST THE NUMBER OF EXAMPLES (27K, 54K, 108K, 216K)]

MOTIVATION

PROBLEM

 PLS - TWO PHASES

   TRAINING
   (COSTLY)

   TEST

[DIAGRAM LABELS: T, X, Y, Q, B]

OBJECTIVE

PLS MODEL

 LARGE DATA VOLUME

 TRAINING PHASE

   ALGORITHMS: PLS1 (USES NIPALS), PLS2

   MAPREDUCE PARADIGM

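One reason the training phase can fit the MapReduce paradigm is that PLS1 builds its weight vector from the product w = X^T y, which is a sum of per-example contributions. The sketch below illustrates that idea only; it assumes the data is partitioned by rows, and the names `map_row`, `reduce_sum` and `xty_mapreduce` are hypothetical. It is not the MRPLS implementation described in these slides.

```python
from collections import defaultdict


def map_row(x_row, y_value):
    """Map step: for one example, emit (feature index, x_ij * y_i)."""
    for j, x in enumerate(x_row):
        yield j, x * y_value


def reduce_sum(pairs):
    """Reduce step: sum all emitted values that share a key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def xty_mapreduce(X, y):
    """Compute w = X^T y as a map over rows followed by a sum per feature."""
    emitted = (pair for row, target in zip(X, y) for pair in map_row(row, target))
    totals = reduce_sum(emitted)
    return [totals[j] for j in range(len(X[0]))]


# Tiny illustrative data: 3 examples, 2 features, 1 dependent variable.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 0.0, -1.0]
w = xty_mapreduce(X, y)
```

Because each row contributes independently, the map step could run on separate machines and only the per-feature sums need to be combined.
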
OBJECTIVE

INVESTIGATE                 PLS   MRPLS

    PERFORMANCE

    EFFICIENCY

    DATA VOLUME

    MODEL

MAPREDUCE

DEVELOPED BY GOOGLE

PROGRAMMING PARADIGM (CLOUD COMPUTING)

  GOAL

    SIMPLIFY PROGRAMMING FOR LARGE
    DATA VOLUMES

      HIDE THE MASTER/SLAVE PARADIGM

MAPREDUCE

PROBLEM

 WORD COUNT

   INPUT = [BANANA, MELON, APPLE, MELON, APPLE]

   DESIRED OUTPUT = {BANANA: 1, MELON: 2, APPLE: 2}

MAPREDUCE

MAP                       REDUCE

 (BANANA,1);(MELON,1);     (BANANA,[1]);
 (APPLE,1);(MELON,1);      (MELON,[1,1]);
 (APPLE,1)                 (APPLE,[1,1])

                           SUM VALUES
                           BY KEY

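The word-count example can be sketched in plain Python to make the three stages explicit: map emits (word, 1) pairs, a grouping step collects values by key, and reduce sums them. This is a single-process illustration, not a distributed implementation; the function names are hypothetical and the word list is the slide's input rendered in English.

```python
from collections import defaultdict


def map_words(words):
    """Map step: emit a (word, 1) pair for every word in the input."""
    for word in words:
        yield word, 1


def group_by_key(pairs):
    """Grouping step: collect every value emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(groups):
    """Reduce step: sum the list of values for each key."""
    return {key: sum(values) for key, values in groups.items()}


words = ["BANANA", "MELON", "APPLE", "MELON", "APPLE"]
groups = group_by_key(map_words(words))
counts = reduce_counts(groups)
```

In a real MapReduce run the grouping step is done by the framework between the map and reduce phases; here it is written out so the intermediate lists are visible.
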
HADOOP

DEVELOPED BY APACHE

  INSPIRED BY GFS/MAPREDUCE

PLATFORM

    GOALS

      RUN APPLICATIONS ON LARGE
      DATA VOLUMES

      LOW-COST MACHINES

      EFFICIENT (LOCAL PARALLELISM)

      RELIABLE (HDFS)

DATASET

TOY DATASET (MEAT)

  APPROX. 200 EXAMPLES, 100 FEATURES, AND 3 DEPENDENT
  VARIABLES

TOY DATASET

  REPLICATE THE SET OF EXAMPLES

  1M EXAMPLES X 100 FEATURES AND 3 DEPENDENT VARIABLES

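Replicating a small dataset up to a target size can be sketched as cycling through the original examples until enough rows are produced. The slides do not say how the replication was actually done (for example, whether any noise was added), so this is only an assumed, simplest-case version; `replicate` and the tiny `base` list are hypothetical.

```python
from itertools import cycle, islice


def replicate(examples, target_size):
    """Repeat the examples in their original order until target_size rows exist."""
    return islice(cycle(examples), target_size)


# Each example is (features, dependent variables); the values are made up.
base = [
    ([0.1, 0.2], [1.0]),
    ([0.3, 0.4], [2.0]),
    ([0.5, 0.6], [3.0]),
]
big = list(replicate(base, 7))
```

Because `replicate` returns a lazy iterator, the same approach could stream rows to a file without holding the full replicated dataset in memory.
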
METHODOLOGY

DEVELOP THE MAPREDUCE VERSION OF PLS

ANALYZE THE CORRECTNESS OF THE ALGORITHMS

PREPARE THE DATASET

SIMULATION

 PSEUDO-DISTRIBUTED ENVIRONMENT

METHODOLOGY

CHOOSE/PREPARE A REAL ENVIRONMENT (CLUSTER)

ANALYZE PROCESSING TIME - METRICS

  SPEEDUP (SP = TS/TP)

    LINEAR? (SP = P)

  EFFICIENCY (EP = SP/P)

REPORT

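The two metrics can be computed directly from measured run times. In this sketch TS is the serial time, TP the time on P processors, and the timings are hypothetical values chosen for illustration, not results from this project.

```python
def speedup(t_serial, t_parallel):
    """SP = TS / TP: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """EP = SP / P: fraction of the ideal (linear) speedup achieved."""
    return speedup(t_serial, t_parallel) / processors


# Hypothetical timings in seconds, for illustration only.
ts = 120.0
tp = 40.0
p = 4

sp = speedup(ts, tp)
ep = efficiency(ts, tp, p)
is_linear = sp == p
```

With these numbers the speedup is 3 on 4 processors, so the run is sublinear and the efficiency is 0.75; a linear speedup would give SP = P and EP = 1.
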
TOOLS/EXPERIMENTS

HADOOP (HDFS)

  HADOOP STREAMING

LEARNTRADE FRAMEWORK

TECGRAF CLUSTER

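Hadoop Streaming runs any executable as mapper or reducer, passing records as text lines on standard input and output, with key and value separated by a tab by default and reducer input sorted by key. The sketch below shows that line protocol with a word count; it is a generic illustration, not the project's code, and the `map`/`reduce` command-line argument and `simulate_locally` helper are hypothetical conventions.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word found in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts per key; the input lines must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def simulate_locally(lines):
    """Stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical convention: pass "map" or "reduce" as the first argument.
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role in ("map", "reduce"):
        step = mapper if role == "map" else reducer
        for out in step(sys.stdin):
            print(out)
```

The same script could then be supplied to the streaming job through its `-mapper` and `-reducer` options, while `simulate_locally` allows checking the logic without a cluster.
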
SCHEDULE

 TASK                                             STATUS   DATES
 DEVELOP THE MAPREDUCE VERSION OF PLS             ok       07/09/08 - 20/09/08
 ANALYZE THE CORRECTNESS OF THE ALGORITHMS        ok       20/09/08 - 22/09/08
 PREPARE A TEST DATASET                           ok       01/10/08
 SIMULATION IN A PSEUDO-DISTRIBUTED ENVIRONMENT   ok       01/10/08 - 03/10/08
 CHOOSE/PREPARE THE TEST ENVIRONMENT              ok       20/09/08 - 07/09/08
 ANALYZE PROCESSING TIME - METRICS                not ok   08/09/08 - ??/??/08
 WRITE A REPORT                                   not ok   ??/??/08 - ??/??/08

REFERENCES

MILIDIU, R. L.; RENTERIA, Raul. DPLS and PPLS: Two PLS Algorithms for Large Data Sets. Computational Statistics and Data Analysis, v. 48, p. 125-138, 2005.

MapReduce: Simplified Data Processing on Large Clusters

Hadoop Distributed File System

Hadoop Map/Reduce
