
Big data: trends and opportunities - Speaker: Ana Oliveira

  • 1. Copyright © 2013 EMC Corporation. All Rights Reserved. Data Scientists: Who and How? Ana Oliveira, EMC R&D Center, ana.oliveira@emc.com
  • 2. EMC’s Big Data R&D Center. Applied research center in the Technology Park on Ilha do Fundão.
      • Close to the university
      • Ecosystem: Schlumberger, Halliburton, BG, Siemens, GE, Petrobras, and others
  • 3. Reality
  • 4. Are you ready to create, consume, and manage 40 ZETTABYTES of data?
  • 5. Where does this deluge come from?
      • Sources: geophysical exploration, medical imaging, video surveillance, mobile sensors, video rendering, gene sequencing, smart grids, social media
      • Reading meters every 15 minutes is 3,000x more data intensive
      • The cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011
      • Facebook grows by 250 million photos per day
  • 6. Scenario
      • Structured data, unstructured data, sensor data
      • Big Data is not only about the “data”; we also need to consider new technologies such as:
          • Scale-out storage
          • Massively parallel database architectures
          • Hadoop and its ecosystem
          • In-database analytics
          • In-memory computing
          • Virtualization
          • Visualization
  • 7. “The Sexiest Job of the 21st Century”, Harvard Business Review (October 2012)
  • 8. Role: Role Description
      • Deep Analytical Talent: People with advanced training in quantitative disciplines, such as mathematics, statistics, and machine learning.
      • Data Savvy Professionals: People with a basic knowledge of statistics and/or machine learning, who can define key questions that can be answered using advanced analytics.
      • Technology & Data Enablers: People providing technical expertise to support analytical projects. Skill sets include computer programming and database administration.
      • Projected U.S. talent gap, Data Scientists: 140,000 to 190,000
      • Projected U.S. talent gap, Analysts & Data Savvy Managers: 1.5 million
      • Note: Figures above reflect a projected talent gap in the US in 2018, as shown in the McKinsey May 2011 article “Big Data: The next frontier for innovation, competition, and productivity”.
  • 9. What is the difference? [Chart: business value (low to high) over time (past to future), plotting Business Intelligence and Data Science.]
      • Business Intelligence: standard reporting; “What happened?”
      • Data Science: predictive analysis; “What if…?”
  • 10. Analytics Lifecycle
  • 11. Analytic Process: Traditional. Administrator bottleneck; opaque, no collaboration; reactive, unresponsive.
  • 12. Analytic Process: Big Data. Self-service; transparent, real-time collaboration; iterative, agile.
  • 13. Profile of a Data Scientist: technical; quantitative; curious and creative; communicative and collaborative; skeptical.
  • 14. Skills (Oscar Olmedo, NASA, data scientist)
      • Capture: programming and databases; domain knowledge; data modeling, data warehousing (DW), unstructured data
      • Analysis: statistics, ML, mathematics, operations research…
      • Presentation: visualization; storytelling
  • 15. “Analyzing the Analyzers”, Harris, Murphy, Vaisman. How do you see yourself?
  • 16. “Analyzing the Analyzers”, Harris, Murphy, Vaisman
  • 17. “Analyzing the Analyzers”, Harris, Murphy, Vaisman
  • 19. THANK YOU!

Editor's notes

  1. Hiring of 50+ nationals. Hiring and sponsorship of just as many interns and grants. First EBC in Latin America. This is an artist’s rendering of what the facility will look like. [click] This is what it really looks like right now. I like the pretty picture better. [click]
  2. Source: IDC
  3. The data deluge is enabled by the economies of cloud, but is driven by the connected-era avalanche of devices… The sources of information are expanding. Many new sources are machine generated. It’s also big files (seismic scans can be 5 TB per file) and massive numbers of small files (email, social media).
  4. The new data ecosystem driven by the arrival of big data will require three archetypal roles to provide services. Here are some professions that represent illustrative examples of each of the three main categories. Deep Analytical Talent: technically savvy, with strong analytical skills; a combination of skills to handle raw data, unstructured data, and complex analytical techniques at massive scales; needs access to a magnetic, analytic sandbox. Examples of professions: data scientists, statisticians, economists, mathematicians. Data Savvy Professionals: examples of professions are financial analysts, market research analysts, life scientists, operations managers, and business and functional managers. Technology & Data Enablers: examples of professions are computer programmers, database administrators, and computer system analysts.
  5. As background, it is important to understand that Business Intelligence is different from data science and analytics. BI deals with reporting on history: What happened last quarter? How many did we sell? Data science is about predicting the future and understanding why things happen: What is the optimal solution? What will happen next? For many companies, data science is a new approach to understanding the business, yet an important one to undertake today. Gartner states that enterprises that are embracing Big Data and Data Science will outperform their peers by over 20% in the next five years.
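The contrast in note 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quarterly sales figures are invented, and a least-squares straight line stands in for "predictive analysis", which in practice would use richer models.

```python
from statistics import mean

# Hypothetical quarterly unit sales (illustrative numbers, not from the slides).
sales = [100.0, 110.0, 120.0, 130.0]

# BI-style question, "What happened?": summarize history.
total_sold = sum(sales)
last_quarter = sales[-1]

# Data-science-style question, "What will happen next?":
# fit a straight-line trend by ordinary least squares and extrapolate one step.
def fit_line(ys):
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(sales)
next_quarter_forecast = slope * len(sales) + intercept

print(f"Sold so far: {total_sold}, last quarter: {last_quarter}")
print(f"Forecast for next quarter: {next_quarter_forecast}")
```

The first two values are reports on existing data; the forecast is a model-based statement about data that does not exist yet, which is the distinction the note draws.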
  6. Here are five main competency and behavioral characteristics for data scientists. Quantitative skills, such as mathematics or statistics. Technical aptitude, such as software engineering, machine learning, and programming skills. Skeptical: this may seem a counterintuitive trait, but it is important that data scientists can examine their work critically rather than in a one-sided way. Curious & Creative: data scientists must be passionate about data and about finding creative ways to solve problems and portray information. Communicative & Collaborative: it is not enough to have strong quantitative skills or engineering skills. To make a project resonate, you must be able to articulate the business value in a clear way, and work collaboratively with project sponsors and key stakeholders.
  7. June 2013
  8. The message here is to make your enterprise “extraordinary” through IT transformation. EMC helps organizations transform their business through: A) cloud enablement (reduce the 85% maintenance to 60%), reducing the time and cost of “keeping the lights on” while improving business agility (i.e., the basic cloud message). We can expand (or not) on this message and link it to Oil & Gas by discussing our leadership in virtualization (the first step): most major Oil & Gas companies are already implementing VMware, EMC offers leading infrastructure platforms, we are building Oil & Gas solutions with our network of Service Provider Partners, etcetera. B) Leading a path to business innovation and competitive advantage by leveraging Big Data solutions (shift your spend from 15% innovation to 40% innovation). Later, we can describe our vision of Big Data for Big Oil through the Volume/Variety/Velocity slide along with our $100M investment at the BRDC. C) My recommendation is to put Security below as the “pillar of trust”. It is a prerequisite to achieving A) and B), and EMC offers the most secure solutions. The key message is that EMC has an end-to-end offering that allows Oil & Gas companies to transform IT from a cost center to an innovation center.
  9. This slide illustrates the maturity of an organization and the amount of time spent on data collection, analysis, and decision making. Clearly, the less mature an organization, the more time it spends gathering data and the less it spends analyzing and making decisions. The goal is to move to the more mature model, where your organization can spend more time using the data to make business decisions.
  10. We think cloud is the next wave of massive disruption in IT that started with mainframes. It’s disruptive because of the dramatic benefits it delivers to organizations in both cost efficiency and agility (bottom line and top line). It’s disruptive because it’s built on disruptive technologies, and disruptive technology leads to lasting change. In the case of cloud, it’s arguably the most disruptive because we are seeing the IT cloud wave and the consumer cloud wave happen simultaneously.
  11. In the meantime, while everyone was asleep… a revolution was happening… Multi-thread: this is OLTP data.
  12. To unlock applications from the physical devices
  13. There are multiple characteristics of big data, but three stand out as defining characteristics: huge volume of data (for instance, tools that can manage billions of rows and billions of columns); complexity of data types and structures, with an increasing volume of unstructured data (80-90% of the data in existence is unstructured), part of the Digital Shadow or “Data Exhaust”; and speed or velocity of new data creation. In addition, the data, due to its size or level of structure, cannot be efficiently analyzed using only traditional databases or methods. There are many examples of emerging big data opportunities and solutions. Netflix suggesting your next movie rental, dynamic monitoring of embedded sensors in bridges to detect real-time stresses and longer-term erosion, and retailers analyzing digital video streams to optimize product and display layouts and promotional spaces on a store-by-store basis are a few real examples of how big data is involved in our lives today. These kinds of big data problems require new tools and technologies to store and manage the data and to realize the business benefit. The new architectures this necessitates are supported by new tools, processes and procedures that enable organizations to create, manipulate and manage these very large data sets and the storage environments that house them. Big data can come in multiple forms: everything from highly structured financial data, to text files, to multimedia files and genetic mappings. The high volume of the data is a consistent characteristic of big data. As a corollary, because of the complexity of the data itself, the preferred approach for processing big data is in parallel computing environments and Massively Parallel Processing (MPP), which enable simultaneous, parallel ingest, data loading, and analysis.
As we will see in the next slide, most big data is unstructured or semi-structured in nature, which requires different techniques and tools to process and analyze. Let us examine the most prominent characteristic: its structure.
  14. Data from the new 2011 Digital Universe study from IDC, sponsored by EMC. Data is growing 44X, but IT staff is growing only 1.5X by the end of the decade. The only way to stay ahead of the data deluge is to increase the volume of information that can be managed per person, using new technologies and productivity tools.
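The per-person implication of note 14 follows from a one-line division. The sketch below uses only the two growth factors quoted in the note; the variable names are illustrative.

```python
# Growth factors quoted in note 14 (IDC Digital Universe, as cited there).
data_growth = 44.0    # data volume grows 44X
staff_growth = 1.5    # IT staff grows 1.5X

# If staff grows 1.5X while data grows 44X, each person must manage
# 44 / 1.5 times as much data as before.
per_person_growth = data_growth / staff_growth

print(f"Data managed per person must grow about {per_person_growth:.1f}X")
```

In other words, keeping pace means each IT staff member handling roughly 29 times as much information as at the start of the period.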
  15. The graphic shows different types of data structures, with 80-90% of future data growth coming from non-structured data types (semi-, quasi- and unstructured). Although the image shows four different, separate types of data, in reality these can be mixed together at times. For instance, you may have a classic RDBMS storing call logs for a software support call center. In this case, you may have typical structured data such as date/time stamps, machine types, problem type, and operating system, which were probably entered by the support desk person from a pull-down menu GUI. In addition, you will likely have unstructured or semi-structured data, such as free-form call log information, taken from an email ticket of the problem or an actual phone call description of a technical problem and a solution. The most salient information is often hidden in there. Another possibility would be voice logs or audio transcripts of the actual call that might be associated with the structured data. Until recently, most analysts would have been able to analyze only the most common and highly structured data in this call log history RDBMS, since mining the textual information is very labor intensive and could not be easily automated.
  16. Here are examples of what each of the four main types of data structures may look like. People tend to be most familiar with analyzing structured data, while semi-structured data (shown as XML here), quasi-structured data (shown as a clickstream string), and unstructured data present different challenges and require different techniques to analyze. For each data type shown, answer these questions: What types of analytics are performed on these data? Who analyzes this kind of data? What types of data repositories are suited for each, and what requirements may you have for storing and cataloguing this kind of data? Who consumes the data? Who manages and owns the data?
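As a rough illustration of why the four data types in note 16 call for different techniques, the Python sketch below handles one tiny sample of each using only the standard library. All sample records (the support-ticket fields, the XML snippet, the `example.com` URL, and the free-text sentence) are invented for illustration and do not come from the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Structured: fixed columns, parsed directly into records.
structured = "ticket_id,os,problem_type\n1,Linux,network\n2,Windows,disk\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing tags (XML), navigated by element name.
semi = "<ticket id='1'><os>Linux</os><note>VPN drops after resume</note></ticket>"
ticket = ET.fromstring(semi)
ticket_os = ticket.findtext("os")

# Quasi-structured: a clickstream URL has a loose format that must be picked apart.
click = "http://example.com/search?q=vpn+drops&page=2"
query = parse_qs(urlparse(click).query)

# Unstructured: free text has no schema; here a naive keyword check stands in
# for real text mining.
text = "Customer says the VPN drops whenever the laptop wakes up."
words = text.lower().rstrip(".").split()
mentions_vpn = "vpn" in words

print(rows[0]["os"], ticket_os, query["q"], mentions_vpn)
```

Each step needs a different parser, and the unstructured case has no parser at all, which is the practical meaning of "different challenges and different techniques".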
  17. … describe or refer to NoSQL and KVP. Everyone and everything is leaving a digital footprint. The graphic above provides a perspective on sources of big data generated by new applications and the scale and growth rate of the data. These applications provide opportunities for new analytics and for driving value for organizations. These data come from multiple sources, including: medical information, such as genomic sequencing and MRIs; increased use of broadband on the Web, including the 2 billion photos each month that Facebook users currently upload as well as the innumerable videos uploaded to YouTube and other multimedia sites; video surveillance; increased global use of mobile devices (the torrent of texting is not likely to cease); smart devices, meaning sensor-based collection of information from smart electric grids, smart buildings and much other public and industry infrastructure; and non-traditional IT devices, including the use of RFID readers, GPS navigation systems, and seismic processing. The Big Data trend is generating an enormous amount of information that requires advanced analytics and new market players to take advantage of it.
  18. People tend to both love and hate spreadsheets. With their introduction, business users were able to create simple logic on data structured in rows and columns and create their own analyses of business problems. Users do not need heavy training as a database administrator to create spreadsheets, meaning business users could set these up quickly and independently of IT groups. Two main spreadsheet benefits are that they are easy to share and that end users have control over the logic involved. However, their proliferation caused organizations to struggle with “many versions of the truth”, i.e. it was impossible to determine if you had the right version of a spreadsheet, with the most current data and logic in it. Moreover, if a user lost a laptop or it became corrupted, that was the end of the data and its logic. Many organizations still suffer from this challenge (Excel is still on millions of PCs worldwide), which gave rise to the need for centralizing the data. As data needs grew, companies such as Oracle, Teradata, and Microsoft (via SQL Server) offered more scalable data warehousing solutions. These technologies enabled the data to be managed centrally, providing benefits of security, failover, and a single repository where users could rely on getting an “official” source of data for financial reporting or other mission-critical tasks. This structure also enabled the creation of OLAP cubes and business intelligence analytical tools, which gave users the ability to access dimensions within this RDBMS quickly and find answers to streamline reporting needs. Some providers also packaged more advanced logic and the ability to perform more in-depth analytical techniques such as regression and neural networks. <Continued>
  19. Enterprise data warehouses (EDW) are critical for reporting and Business Intelligence (BI) tasks, although from an analyst’s perspective they tend to restrict the flexibility that a data analyst has for performing robust analysis or data exploration. In this model, data is managed and controlled by IT groups and DBAs, and analysts must depend on IT for access and for changes to the data schemas. This tighter control and oversight also means longer lead times for analysts to get data, which generally must come from multiple sources. Another implication is that EDW rules restrict analysts from building data sets, which can cause shadow systems to emerge within organizations, containing critical data for constructing analytic data sets and managed locally by power users. Analytic sandboxes enable high-performance computing using in-database processing. This approach creates relationships to multiple data sources within an organization and saves the analyst the time of creating these data feeds on an individual basis. In-database processing for deep analytics enables faster turnaround time for developing and executing new analytic models, while reducing (though not eliminating) the cost associated with data stored in local, "shadow" file systems. In addition, rather than the typical structured data in the EDW, analytic sandboxes can house a greater variety of data, such as web-scale data, raw data, and unstructured data.
  20. The graphic shows a typical data warehouse and some of the challenges that it presents. (1) For source data to be loaded into the EDW, data needs to be well understood, structured and normalized with the appropriate data type definitions. While this kind of centralization enables organizations to enjoy the benefits of security, backup and failover of highly critical data, it also means that data must go through significant pre-processing and checkpoints before it can enter this sort of controlled environment, which does not lend itself to data exploration and iterative analytics. (2) As a result of this level of control on the EDW, shadow systems emerge in the form of departmental warehouses and local data marts that business users create to accommodate their need for flexible analysis. These local data marts do not have the same constraints for security and structure as the EDW does, and allow users across the enterprise to do some level of analysis. However, these one-off systems reside in isolation, often are not networked or connected to other data stores, and are generally not backed up. (3) Once in the data warehouse, data is fed to enterprise applications for business intelligence and reporting purposes. These are high-priority operational processes getting critical data feeds from the EDW. <Continued>
  21. (4) At the end of this workflow, analysts get data provisioned for their downstream analytics. Since users cannot run custom or intensive analytics on production databases, analysts create data extracts from the EDW to analyze offline in R or other local analytical tools. Many times these tools are limited to in-memory analytics on desktops, analyzing samples of data rather than the entire population of a data set. Because these analyses are based on data extracts, they live in a separate location, and the results of the analysis, along with any insights on the quality of the data or anomalies, are rarely fed back into the main EDW repository. Lastly, because of the rigorous validation and data structuring process, data is slow to move into the EDW and the schema is slow to change. EDWs may have been originally designed for a specific purpose and set of business needs, but over time they evolve to house more and more data and to enable business intelligence and the creation of OLAP cubes for analysis and reporting. EDWs provide limited means to accomplish these goals: they achieve the objective of reporting, and sometimes the creation of dashboards, but they generally limit the ability of analysts to iterate on the data in an environment separate from production where they can conduct in-depth analytics or analyze unstructured data.
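The difference between analyzing a local extract and computing inside the database (notes 19 and 21) can be sketched as follows. This is a toy illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse or sandbox, and the `calls` table and its values are hypothetical. SQLite is not an MPP system; the point is only where the computation runs and how much data it sees.

```python
import sqlite3

# Stand-in for a warehouse table (hypothetical data: 1,000 call durations).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, duration REAL)")
conn.executemany(
    "INSERT INTO calls (duration) VALUES (?)",
    [(float(i),) for i in range(1, 1001)],
)

# Extract-based workflow: pull a limited sample out and analyze it locally.
extract = [row[0] for row in
           conn.execute("SELECT duration FROM calls ORDER BY id LIMIT 100")]
extract_mean = sum(extract) / len(extract)

# In-database workflow: push the computation to where the full data lives.
(full_mean,) = conn.execute("SELECT AVG(duration) FROM calls").fetchone()
conn.close()

print(f"Mean from 100-row extract: {extract_mean}")
print(f"Mean over full table, computed in-database: {full_mean}")
```

The extract here is deliberately unrepresentative (the first 100 rows), so its mean differs sharply from the full-table result. A random sample would do better, but the in-database query avoids the question by using the entire population.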