Apache Spark: Usage and Roadmap in Hadoop
Cloudera Japan
Presentation to the Tokyo Hadoop User Group (HUG) on Spark
1.
© Cloudera, Inc. All rights reserved.
Apache Spark: Usage and Roadmap in Hadoop
Jai Ranganathan
2.
Spark will replace MapReduce
To become the standard execution engine for Hadoop
3.
The Future of Data Processing on Hadoop
Spark complemented by specialized fit-for-purpose engines
• General Data Processing w/Spark: Fast Batch Processing, Machine Learning, and Stream Processing
• Analytic Database w/Impala: Low-Latency Massively Concurrent Queries
• Full-Text Search w/Solr: Querying textual data
• On-Disk Processing w/MapReduce: Jobs at extreme scale and extremely disk IO intensive
Shared:
• Data Storage
• Metadata
• Resource Management
• Administration
• Security
• Governance
4.
Cloudera Leading the Spark Movement
Timeline, 2013 to 2016:
• Identified Spark’s early potential
• Ships and Supports Spark with CDH 4.4
• Spark on YARN integration
• Announces initiative to make Spark the standard execution engine
• Launches first Spark training
• Added security integration
• Cloudera engineers publish O’Reilly Spark book
• Leading effort to further performance, usability, and enterprise-readiness
5.
Community Initiative: Spark Supersedes MapReduce
Community development to port components to Spark:
Stage 1
• Crunch on Spark
• Search on Spark
Stage 2
• Hive on Spark (beta)
• Spark on HBase (beta)
Stage 3
• Pig on Spark (alpha)
• Sqoop on Spark
6.
Cloudera Customer Use Cases
Core Spark
• Financial Services
  • Portfolio Risk Analysis
  • ETL Pipeline Speed-Up
  • 20+ years of stock data
• Health
  • Identify disease-causing genes in the full human genome
  • Calculate Jaccard scores on health care data sets
• ERP
  • Optical Character Recognition and Bill Classification
• Data Services
  • Trend analysis
  • Document classification (LDA)
  • Fraud analytics
Spark Streaming
• Financial Services
  • Online Fraud Detection
• Health
  • Incident Prediction for Sepsis
• Retail
  • Online Recommendation Systems
  • Real-Time Inventory Management
• Ad Tech
  • Real-Time Ad Performance Analysis
7.
Apache Spark
Flexible, in-memory data processing for Hadoop
Easy Development
• Rich APIs for Scala, Java, and Python
• Interactive shell
Flexible Extensible API
• APIs for different types of workloads:
  • Batch
  • Streaming
  • Machine Learning
  • Graph
Fast Batch & Stream Processing
• In-Memory processing and caching
8.
The Spark Ecosystem & Hadoop
Hadoop Integration
• Spark-on-YARN integration
• Shares data, metadata, administration, security, & governance
Stack diagram:
• Spark components: Spark Streaming, MLlib, SparkSQL, GraphX, Dataframes, SparkR
• Engines: Spark, Impala, MR, Others
• Resource management: YARN
• Storage: HDFS, HBase
9.
Logistic Regression Performance (Data Fits in Memory)
Chart: running time (s) versus number of iterations (1, 5, 10, 20, 30)
• MapReduce: 110 s/iteration
• Spark: first iteration = 80 s; further iterations 1 s due to caching
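The per-iteration figures on this slide can be turned into total running times with a little arithmetic. The sketch below is only an illustration: it assumes the cost per iteration stays constant (110 s for MapReduce, 80 s for Spark's first pass, 1 s for each cached pass), and the function names are hypothetical, not taken from the talk.

```python
# Total running time implied by the slide's per-iteration figures.
# Assumption: per-iteration cost is constant, so totals grow linearly.

def mapreduce_total_seconds(iterations, per_iteration=110):
    """MapReduce re-reads its input every iteration: 110 s each."""
    return iterations * per_iteration


def spark_total_seconds(iterations, first=80, cached=1):
    """Spark pays 80 s to load the data once, then 1 s per cached iteration."""
    if iterations <= 0:
        return 0
    return first + (iterations - 1) * cached


if __name__ == "__main__":
    # Same iteration counts as the chart's x-axis.
    for n in (1, 5, 10, 20, 30):
        print(n, mapreduce_total_seconds(n), spark_total_seconds(n))
```

Under these assumptions, 30 iterations take 3300 s on MapReduce and 109 s on Spark, which is consistent with the 0 to 4000 s range of the chart's vertical axis.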
10.
Apache Spark Streaming
What is it?
• Run continuous processing of data using Spark’s core API
• Extends Spark concepts to fault-tolerant, transformable streams
• Adds “rolling window” operations
  • Example: Compute rolling averages or counts for data over last five minutes
Benefits:
• Reuse knowledge and code in both contexts
• Same programming paradigm for streaming and batch
• Simplicity of development
• High-level API with automatic DAG generation
• Excellent throughput
• Scale easily to support large volumes of data ingest
• Combine elements like MLlib and Oryx into streaming applications
Common Use Cases:
• “On-the-fly” ETL as data is ingested into Hadoop/HDFS
• Detect anomalous behavior and trigger alerts
• Continuous reporting of summary metrics for incoming data
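The "rolling window" idea mentioned on this slide (averages or counts over the last five minutes) can be sketched in plain Python. This is not Spark Streaming's API; it is a single-process illustration of the concept, the class name `RollingWindow` is hypothetical, and it assumes events arrive in non-decreasing timestamp order.

```python
from collections import deque


class RollingWindow:
    """Keeps (timestamp, value) events that fall in the last `window_seconds`."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.events = deque()  # oldest event on the left
        self.total = 0.0       # running sum of values currently in the window

    def add(self, timestamp, value):
        """Record a new event, then drop events that have left the window."""
        self.events.append((timestamp, value))
        self.total += value
        self._evict(timestamp)

    def _evict(self, now):
        # The window covers (now - window_seconds, now]; older events expire.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def count(self):
        return len(self.events)

    def average(self):
        return self.total / len(self.events) if self.events else None


if __name__ == "__main__":
    window = RollingWindow(window_seconds=300)  # five minutes
    window.add(0, 10.0)
    window.add(120, 20.0)
    window.add(310, 30.0)  # the event at t=0 is now older than five minutes
    print(window.count(), window.average())
```

Keeping a running sum means each new event costs only the work of evicting expired entries, rather than re-summing the whole window.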
11.
Spark Streaming Architectures
Diagram components:
• Data Sources
• Ingest
• Integration Layer: Flume, Kafka
• Spark Stream Processing: Data Prep, Aggregation / Scoring
• HDFS
• Spark Long-Term Analytics / Model Building
• HBase
• Real-Time Result Serving
12.
SparkSQL + Dataframes
Machine Learning Applications
• Goal:
  • Spark/Java Developers and Data Scientists can inline SQL into Spark apps
• Designed for:
  • Ease of development for Spark developers
  • Handful of concurrent Spark jobs
• Strengths:
  • Ease of embedding SQL into Java or Scala applications
  • SQL for common functionality in developer flow (e.g. aggregations, filters, samples)
13.
Execution Pipeline
Diagram stages: SQL, AST, Logical Plan, Optimized Logical Plan, Physical Plans, CBO, Selected Plan, RDDs
Dataframes feed into the Logical Plan.
14.
Uniting Spark and Hadoop
The One Platform Initiative
• Management: Leverage Hadoop-native resource management.
• Security: Full support for Hadoop security and beyond.
• Scale: Enable 10k-node clusters.
• Streaming: Support for 80% of common stream processing workloads.
15.
Detailed Roadmap: One Platform Initiative
Legend: Completed Work; Planned Future Work
Management
• Spark on YARN Integration
• HBase integration
• Improved metrics for monitoring/troubleshooting
• Dynamic Resource Allocation
• Spark on YARN: Container resizing
• Dynamic Resource Allocation for Streaming
• Simplified resource configuration
• Improved WebUI for debugging
• Improved metrics for visibility into resource utilization
• Smart auto-tuning of job parameters
Security
• Kerberos Integration
• HDFS Sync (Sentry)
• Secure data at rest
• Secure data over the wire
• Audit/Lineage (Navigator)
• Spark PCI compliance
• Integration with Intel’s advanced encryption libraries
• Enable column and view level security
Scale
• Revamp Scheduler handling of node failure
• Sort based shuffle improvements
• Task Scheduling based on HDFS data locality and caching
• Scheduler improvements for performance at scale
• Stress test at scale with mixed multi-tenant workloads
• HDFS DDM Integration
• Dynamic resource utilization & prioritization
• Scale Spark History Server for 1000s of jobs
Streaming
• Zero Data Loss with Spark Streaming Resilience
• Flume integration
• Kafka integration
• SQL semantics for expressing streaming jobs (Business Users)
• New streaming specific API extensions
• Streaming application management (pause, update, redeploy) via CM
• Optimized state updates: efficient point lookups and delta updates
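As an illustration of the Dynamic Resource Allocation item on this roadmap, the sketch below assembles a `spark-submit` command line for a job on YARN. The `spark.dynamicAllocation.*` and `spark.shuffle.service.enabled` property names are Spark configuration keys; the executor bounds, the helper name `spark_submit_args`, and the application file `my_job.py` are assumptions chosen for the example, not values from the talk.

```python
def spark_submit_args(app, min_executors=2, max_executors=20):
    """Build a spark-submit argument list with dynamic allocation turned on."""
    conf = {
        # Let Spark grow and shrink the executor count with the workload.
        "spark.dynamicAllocation.enabled": "true",
        # The external shuffle service is commonly enabled alongside it on YARN,
        # so shuffle files outlive executors that are released.
        "spark.shuffle.service.enabled": "true",
        # Illustrative bounds (assumed values, tune per cluster).
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    args = ["spark-submit", "--master", "yarn"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # my_job.py is a hypothetical application file.
    print(" ".join(spark_submit_args("my_job.py")))
```

Building the arguments as a list rather than a single string keeps each `--conf` pair intact if the list is later passed to a process launcher.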
16.
Spark Resources
• Learn Spark
  • O’Reilly Advanced Analytics with Spark eBook (written by Clouderans)
  • Cloudera Developer Blog
  • cloudera.com/spark
• Get Trained
  • Cloudera Spark Training
• Try it Out
  • Cloudera Live Spark Tutorial
17.
Try It With Cloudera Live
cloudera.com/live
Featuring tutorials on: CDH
18.
Thank You
Jairam Ranganathan
jairam@cloudera.com