BigData 4 Hadoop
BigData


  Agnieszka Zdebiak
             @AZdebiak
  Agnieszka.Zdebiak.com


                          15 years of experience
                          software designer
                          data scientist
                          Prokom Software
                          Asseco Poland
                          Unizeto Technologies
BigData definition



Big Data describes methods & technologies
  for highly scalable integration,
         storage & analysis
      of poly-structured data.
Using Hadoop for…


   • ads and recommendations
   • online travel
   • processing mobile data
   • energy savings and discovery
   • infrastructure management
   • image processing
   • fraud detection
   • IT security
   • health care
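The slide lists use cases but not how Hadoop processes the data behind them. As a hedged illustration only (plain Python, not Hadoop's API), here is a minimal single-process sketch of the map, shuffle and reduce phases that Hadoop MapReduce distributes across a cluster, applied to the "ads and recommendations" case. The click-log record format `user_id,ad_id` and all function names are hypothetical assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (ad_id, 1) for each click record in the hypothetical "user_id,ad_id" format."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _user_id, ad_id = line.split(",", 1)
        yield ad_id, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key (Hadoop's real shuffle also sorts keys and spreads groups over reducers)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, giving the click count per ad."""
    return {key: sum(values) for key, values in groups.items()}


def count_clicks(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence in a single process."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    log = ["u1,ad42", "u2,ad7", "u1,ad42", "u3,ad42"]
    print(count_clicks(log))  # {'ad42': 3, 'ad7': 1}
```

In a real Hadoop job, the same mapper and reducer logic would be written against Hadoop's Java MapReduce API (or run through Hadoop Streaming). The framework, not user code, then handles partitioning, shuffling and fault tolerance across nodes.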
NoSQL Database




Not Only SQL Database
Hive Query Language (HQL) differs from the SQL of common RDBMSs about as much as Oracle SQL differs from MySQL SQL:
http://support.karmasphere.com/customer/portal/articles/776853-hive-differences-from-common-rdbmss
BigData - new specialists


     Source System Specialists
     Hadoop Specialists
     Data Warehouse Specialists
     BI Developers
     Data Scientists
     IT Infrastructure Specialists
Hadoop workshops
Karmasphere VM:
http://karmasphere.com/try
                      The VM contains:
                                CentOS Linux
                                Cloudera CDH3u4
                                Karmasphere 2.0
                                MySQL 5
VMware Player:
http://www.vmware.com/products/player/

Configuration:
http://support.karmasphere.com/customer/portal/articles/792010-trial-vm-configuration

Tutorial:
http://support.karmasphere.com/customer/portal/articles/787008-karmasphere-orientation-videos
Big Data Seminar

Data Scientist - The Sexiest Job of the 21st Century
Looking for a team

Kaggle.com competitions (next deadline in 35 days)
Looking for a team
Application development
http://apps.facebook.com/findgiftsforfriends/
Big Data



  Let's talk
 Agnieszka.Zdebiak.com
           @AZdebiak
  Agnieszka.Zdebiak@gmail.com
