Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.
Client Data
Fluency
Office
Skype
Bing
Modern Data
Capability
Instrumentation & Ingestion
Processing & Storage
Reporting & ...
Data Size	
Query
Latency
Get results inline
in Zeppelin
Need to open the
results in Excel
0 20 40 60 80 100 120 140 160 180 200
Cosmos
SparkSQL
SparkSQL with Cache
Write and Compile Query Submit and Wait in Job Q...
Mesos Cluster/HDFS	
Job Manager	
 Zookeeper	
Job Frontend Web API	
Spark Driver Host Pool	
Spark Hive Thrift Server	
Zeppe...
Partition	
  1
Partition	
  2
...
Partition	
  n
Export	
  Cosmos	
  
Partition
Partition	
  1
Partition	
  2
...
Partitio...
<Database2>
<Table1>
<Database1>
<Partition1>
<Table2>
<Partition2>
MetastoreDB
Hive	
  Thrift	
  Server
Hive	
  Loader
Ze...
Data
Ingest
Services
Clients
Transform
Compute
Transform
Compute
Data
Streams
Data
Sets
Store
Event
Processing
HDFSData
Tr...
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Próxima SlideShare
Cargando en…5
×

Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)

5.041 visualizaciones

Publicado el

Presentation at Spark Summit 2015

Publicado en: Datos y análisis
  • Inicia sesión para ver los comentarios

Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)

  1. 1. Client Data Fluency Office Skype Bing Modern Data Capability Instrumentation & Ingestion Processing & Storage Reporting & Analytics Information Management Mobile-First Analytics Experience Experimentation
  2. 2. Data Size Query Latency
  3. 3. Get results inline in Zeppelin Need to open the results in Excel
  4. 4. 0 20 40 60 80 100 120 140 160 180 200 Cosmos SparkSQL SparkSQL with Cache Write and Compile Query Submit and Wait in Job Queue Job Run Time
  5. 5. Mesos Cluster/HDFS Job Manager Zookeeper Job Frontend Web API Spark Driver Host Pool Spark Hive Thrift Server Zeppelin Server Avocado (Hive Query + Schedule Task) Rover (Drag & Drop BI tool with Hive Code Gen) Zeppelin Web UI MetastoreDB Hive Loader Cosmos Storage
  6. 6. Partition  1 Partition  2 ... Partition  n Export  Cosmos   Partition Partition  1 Partition  2 ... Partition  n Task  2 HDFS.copyFrom LocalFile ... Task  n Partition  1 Partition  2 ... Partition  n saveAsParquetFile Task  2 ... Task  n
  7. 7. <Database2> <Table1> <Database1> <Partition1> <Table2> <Partition2> MetastoreDB Hive  Thrift  Server Hive  Loader Zeppelin  Server UserQuery Query
  8. 8. Data Ingest Services Clients Transform Compute Transform Compute Data Streams Data Sets Store Event Processing HDFSData Transportation Spark Streaming Receiver Analyst Zeppelin Notebooks Avocado Simple query Query language “Analyze” “Debug” “Mine” “Glance” Dat a Unified platform Intelligence Interactive analytics Data Products Better Digital Experience s Dual users “Bing” “Office”

×