Big Data Infrastructure. 
Appliance, Cloud, or Do-it-Yourself. 
Daniel Steiger 
Discipline Manager Infrastructure Engineering 
BASEL BERN BRUGG GENF LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN 
2014 © Trivadis 
Big Data Infrastructure 
DOAG Jahreskonferenz 2014 
Our Company
Trivadis is a leader in IT consulting, system integration, solution engineering and the provision of IT services, with a focus on … and … technologies in the D-A-CH region. Our strategic business areas...
With more than 600 IT and domain experts on site with you
12 Trivadis branch offices with more than 600 employees
200 service level agreements
More than 4'000 training participants
Research and development budget: CHF 5.0 million / EUR 4.0 million
Financially independent and sustainably profitable
Experience from more than 1'900 projects per year with over 800 customers
(as of 12/2013)
Map: Trivadis offices in Hamburg, Düsseldorf, Frankfurt, Freiburg, München, Stuttgart, Wien, Basel, Brugg, Bern, Zürich and Lausanne
Agenda
1. Big Data Infrastructure Challenges
2. Hadoop on an Appliance
3. Hadoop in the Cloud
4. Hadoop Do-it-Yourself
5. Conclusion
Big Data Infrastructure Challenges
Trailwise – a "quantified self" use case 
11'000 data points rendered in 165ms 
47'295 data points rendered in 643ms
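The two Trailwise measurements can be turned into a throughput figure to see how rendering time grew with the number of points. This is only arithmetic on the two data points reported on the slide, not a benchmark; the helper name `throughput` is ours.

```python
def throughput(points: int, millis: float) -> float:
    """Rendered data points per millisecond."""
    return points / millis


# The two measurements reported on the Trailwise slide: (data points, milliseconds).
measurements = [(11_000, 165), (47_295, 643)]

for points, millis in measurements:
    print(f"{points:>6} points in {millis:>3} ms -> {throughput(points, millis):.1f} points/ms")

# Growth factors between the two runs.
point_factor = measurements[1][0] / measurements[0][0]
time_factor = measurements[1][1] / measurements[0][1]
print(f"{point_factor:.1f}x the points took {time_factor:.1f}x the time")
```

With these inputs the rate rises from about 66.7 to about 73.6 points/ms: roughly 4.3x the points took about 3.9x the time, so over this range rendering cost grew no faster than linearly. Two samples are too few to generalize beyond that.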
Trailwise – Infrastructure for a Proof of Concept 
§ Hadoop HDFS as data store
§ HBase for real-time data access
§ Hadoop Map/Reduce
Trailwise – Infrastructure Lessons Learned
For a proof of concept, Hadoop in the cloud (e.g. on Amazon EC2) is perfect...
+ Fast and easy deployment
+ Optimized Hadoop/HBase setup
+ HBase real-time performance
+ Map/Reduce scalability
+ Affordable, ca. EUR 15.-/day
Concerns…
§ Scalability
§ Costs for "always up"
§ Setup and administration of a large cluster on AWS
§ Break-even cloud vs. on-premise
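The "break-even cloud vs. on-premise" concern can be made concrete with a simple linear cost model: cloud cost grows per day, while on-premise has an upfront cost plus a smaller daily running cost. A minimal sketch, assuming costs are linear in time; only the EUR 15/day figure comes from the slide, and the on-premise numbers are hypothetical placeholders.

```python
import math


def break_even_days(cloud_per_day: float, onprem_upfront: float, onprem_per_day: float) -> float:
    """Days after which cumulative cloud cost exceeds cumulative on-premise cost.

    cloud(d)  = cloud_per_day * d
    onprem(d) = onprem_upfront + onprem_per_day * d
    Both are equal at d = onprem_upfront / (cloud_per_day - onprem_per_day).
    """
    if cloud_per_day <= onprem_per_day:
        return math.inf  # the cloud option never becomes more expensive
    return onprem_upfront / (cloud_per_day - onprem_per_day)


CLOUD_EUR_PER_DAY = 15.0       # PoC figure from the slide
ONPREM_UPFRONT_EUR = 5_000.0   # hypothetical hardware purchase
ONPREM_EUR_PER_DAY = 2.0       # hypothetical share of power, space and administration

days = break_even_days(CLOUD_EUR_PER_DAY, ONPREM_UPFRONT_EUR, ONPREM_EUR_PER_DAY)
print(f"Always-up cloud cost per year: EUR {CLOUD_EUR_PER_DAY * 365:,.0f}")
print(f"Break-even after about {days:.0f} days")
```

With these placeholder values an always-up PoC costs EUR 5,475 per year and the break-even lies at about 385 days; replacing the placeholders with real quotes is what turns this into a decision aid.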
§ Big Data means big data volume
  § Petabytes and exabytes
§ Scalability
  § 10, 20, 50, 100, ... cluster nodes
  § Costs should scale as well...
§ High demands on machine-to-machine networks
  § In Big Data, for every single client interaction there may be hundreds or thousands of server and data node interactions
  § This generates far more east-west (server-to-server or server-to-storage) network traffic than north-south (server-to-client or server-to-outside) network traffic
§ And many others, such as integration, data protection, operations, etc.
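Two well-known Hadoop mechanisms illustrate why east-west traffic dominates: in the MapReduce shuffle each reduce task fetches its partition from every map task's output, and an HDFS write pipeline forwards each block from DataNode to DataNode until the replication factor is reached. The sketch below only counts transfers under those two rules; the job sizes are hypothetical, and data locality keeps some transfers on the same node, so the shuffle count is an upper bound.

```python
def shuffle_fetches(num_map_tasks: int, num_reduce_tasks: int) -> int:
    """Each reduce task pulls its partition from every map task's output: M * R fetches.
    Fetches between tasks on the same node stay local, so this is an upper bound
    on server-to-server transfers."""
    return num_map_tasks * num_reduce_tasks


def pipeline_copies(num_blocks: int, replication: int = 3) -> int:
    """In an HDFS write pipeline each DataNode forwards the block to the next replica:
    (replication - 1) DataNode-to-DataNode copies per block. 3 is the HDFS default."""
    return num_blocks * (replication - 1)


# Hypothetical job: one client request starts a job with 200 map and 50 reduce tasks
# whose output occupies 8 HDFS blocks.
print("north-south requests:", 1)
print("shuffle fetches (east-west):", shuffle_fetches(200, 50))
print("replication copies (east-west):", pipeline_copies(8))
```

One north-south request thus fans out into up to 10,000 shuffle fetches plus 16 replication copies inside the cluster.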
Big Data Infrastructure Challenges
§ Infrastructure must be engineered to scale
§ The network has to provide high bandwidth and low latency, and should scale seamlessly with Hadoop clusters to provide predictable performance
§ And many more, like
  § Integration with operational data systems
  § Authentication, authorization, encryption
  § Centralized management
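One common way to check whether a rack network can deliver predictable performance is the oversubscription ratio of a top-of-rack switch: server-facing bandwidth divided by uplink bandwidth. A minimal sketch; the port counts and speeds in the example are hypothetical and not taken from any configuration in this talk.

```python
def oversubscription_ratio(servers: int, server_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Server-facing bandwidth divided by the uplink bandwidth of one rack switch.
    1.0 means the uplinks can carry all server traffic at once (non-blocking);
    higher values mean rack-to-rack (east-west) traffic can contend for the uplinks."""
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("need at least one uplink with positive bandwidth")
    return (servers * server_gbps) / (uplinks * uplink_gbps)


# Hypothetical rack: 18 servers at 10 Gb/s, 4 uplinks at 40 Gb/s.
ratio = oversubscription_ratio(servers=18, server_gbps=10, uplinks=4, uplink_gbps=40)
print(f"oversubscription {ratio:.3f}:1")
```

Here 180 Gb/s of server bandwidth meets 160 Gb/s of uplinks, a ratio of 1.125:1; the closer the ratio stays to 1 as nodes are added, the more predictable cross-rack shuffle and replication traffic remains.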
Infrastructure Requirements 
Figure 1.2: Picture of a row of servers in a Google WSC
Will my infrastructure meet my needs now and in the future without putting my business at risk?
Where to Deploy your Hadoop Cluster? 
When enterprises adopt Hadoop, one of the decisions they must make is the deployment model. There are four options, as illustrated in Figure 1:
§ On-premise full custom. With this option, businesses purchase commodity hardware, then they install software and …
There have existed two divergent views related to the price-performance ratio for Hadoop deployments. One view is that a virtualized Hadoop cluster is slower because Hadoop's workload has intensive I/O operations, which tend to run slowly on virtualized environments.
A related and fourth area is data enrichment, which involves leveraging multiple datasets to uncover new insights. For example, combining a consumer's purchase history and social-networking activities can yield a deeper understanding of the consumer's lifestyle and key personal …
Figure 1. The spectrum of Hadoop deployment options, from bare metal to cloud: on-premise full custom, Hadoop appliance, Hadoop hosting, Hadoop-as-a-Service
Reference: Hadoop Deployment Comparison Study, Price-Performance Comparison, Accenture Technology Labs, 2013
Hadoop on an Appliance 
Oracle Big Data Appliance
Overview: Oracle's Big Data Solution 
§ A complete and optimized solution for big data
§ Tight integration with Exadata, Exalogic, Exalytics and SPARC SuperCluster using an InfiniBand network
§ Single-vendor support for both hardware and software
Full Rack Configuration (scalable up to 18 racks)
§ 18 x compute/storage nodes
Per Node:
§ 2 x Eight-Core Intel® Xeon® E5-2650 v2 Processors
§ 64 GB Memory (up to 512 GB)
§ 48 TB Raw Storage Capacity
§ 40 Gb/sec InfiniBand Network
§ 10 Gb/sec Data Center Connectivity
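Multiplying the per-node figures by the 18 nodes of a full rack gives the rack totals. The capacity line assumes the HDFS default replication factor of 3 and ignores space for the operating system and intermediate data, so the real usable figure is lower; it is an illustration, not an Oracle specification.

```python
NODES_PER_RACK = 18
CORES_PER_NODE = 2 * 8        # 2 x eight-core Xeon E5-2650 v2
MEMORY_GB_PER_NODE = 64       # base configuration (expandable to 512 GB)
RAW_TB_PER_NODE = 48
HDFS_REPLICATION = 3          # assumption: HDFS default replication factor

total_cores = NODES_PER_RACK * CORES_PER_NODE
total_memory_gb = NODES_PER_RACK * MEMORY_GB_PER_NODE
total_raw_tb = NODES_PER_RACK * RAW_TB_PER_NODE
usable_tb_upper_bound = total_raw_tb / HDFS_REPLICATION

print(f"cores: {total_cores}, memory: {total_memory_gb} GB, raw storage: {total_raw_tb} TB")
print(f"HDFS capacity at replication {HDFS_REPLICATION}: at most {usable_tb_upper_bound:.0f} TB")
```

A full rack therefore provides 288 cores, 1'152 GB of base memory and 864 TB of raw disk, of which at most about 288 TB hold distinct data at triple replication.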
Oracle Big Data Appliance X4-2 HW 
Source: Oracle®
Oracle Big Data Appliance Internal Network Connectivity 
Source: Oracle Big Data Appliance: Datacenter Network Integration, Oracle White Paper, 2012
Big Data Appliance Software Stack
§ Oracle Linux 6.4 with UEK
§ Oracle Java JDK 7
§ Cloudera Enterprise Data Hub Edition
  § Apache Hadoop HDFS
  § HBase
  § Cloudera Impala
  § Cloudera Search
  § Cloudera Manager
  § Apache Spark
§ Oracle R Distribution
§ Oracle NoSQL DB Community Ed.
§ BDA Enterprise Manager Plug-In
§ Optional Software*
  § Oracle Big Data SQL
  § Oracle Big Data Connectors
  § Oracle Audit Vault and Database Firewall for Hadoop Auditing
  § Oracle Data Integrator
  § Oracle NoSQL Database EE
*Connectors are licensed separately from Oracle Big Data Appliance
BDA Specific Software Features
§ Oracle R Support for Big Data
  § R is an open-source language and environment for statistical analysis and graphing
  § The standard R distribution is installed on all nodes of Oracle Big Data Appliance
  § Oracle R Connector for Hadoop provides R users with high-performance, native access to HDFS and the MapReduce programming framework
  § Oracle R Enterprise is a separate package that provides real-time access to Oracle Database
§ Oracle NoSQL Database
  § Oracle NoSQL Database is a distributed key-value database built on the storage technology of Berkeley DB Java Edition
  § An intelligent driver on top of Berkeley DB keeps track of the underlying storage topology, shards the data and knows where data can be placed with the lowest latency
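The idea of a driver that "keeps track of the storage topology, shards the data and knows where data can be placed with the lowest latency" can be illustrated with a small model: keys hash to a fixed number of partitions, a topology table maps partitions to shards, and reads go to the replica with the lowest observed latency. This is a hypothetical, simplified sketch; the class and method names are ours and it is not the Oracle NoSQL Database API or its actual algorithm.

```python
import hashlib


class TopologyAwareDriver:
    """Hypothetical model of a topology-aware key-value client (not Oracle's API)."""

    def __init__(self, num_partitions: int, shards: list[str]):
        self.num_partitions = num_partitions
        # Topology table: every partition lives on exactly one shard (round-robin here).
        self.partition_to_shard = {p: shards[p % len(shards)] for p in range(num_partitions)}

    def partition_for(self, key: str) -> int:
        # Stable hash; Python's built-in hash() for str is salted per process.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_partitions

    def shard_for(self, key: str) -> str:
        return self.partition_to_shard[self.partition_for(key)]

    def read_target(self, key: str, replica_latency_ms: dict[str, dict[str, float]]) -> str:
        """Pick the replica of the key's shard with the lowest observed latency."""
        replicas = replica_latency_ms[self.shard_for(key)]
        return min(replicas, key=replicas.get)


shard_names = ["shard-1", "shard-2", "shard-3"]
driver = TopologyAwareDriver(num_partitions=30, shards=shard_names)
latencies = {s: {f"{s}/node-a": 1.8, f"{s}/node-b": 0.6} for s in shard_names}
print(driver.shard_for("user/4711"), driver.read_target("user/4711", latencies))
```

Keeping the partition count fixed means that adding shards only requires moving whole partitions and updating the topology table, rather than rehashing every key.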
§ Oracle SQL Connector for HDFS
§ Oracle Loader for Hadoop
§ Oracle R Connector for Hadoop
§ Oracle Data Integrator Application Adapter for Hadoop
§ Data in HDFS (and Oracle NoSQL Database) is accessible through the relational database external table mechanism (HDFS as a cluster file system)
Oracle Big Data Connectors 
Reference: Oracle Big Data Connectors Data Sheet. Source: Oracle®
Oracle Big Data SQL: one tool for all data sources 
Reference: https://www.oracle.com/webfolder/s/delivery_production/docs/FY15h1/doc6/1-T2-BigData.pdf
Oracle Big Data Appliance Resources
§ Oracle Big Data Lite VM
  § http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
§ MOS Notes
  § Information Center: Oracle Big Data Appliance (Doc ID 1445762.2)
  § Big Data Connectors (Doc ID 1487399.2)
  § Sqoop Frequently Asked Questions (FAQ) (Doc ID 1510470.1)
Hadoop in the Cloud
There are five key areas to consider when choosing the right deployment model*:
§ Price-performance ratio
§ Data privacy
§ Data gravity
§ Data enrichment
§ Productivity of developers and data scientists
*Public Cloud, Private Cloud, Community Cloud or Hybrid Cloud
Deployment Considerations
The second area of consideration is data privacy, which is a common concern when storing data outside of corporate-owned infrastructure. Cloud-based deployment requires a comprehensive cloud-data privacy strategy that encompasses areas such as proper implementation of legal requirements, well-orchestrated …
… and therefore enable companies to introduce new services and products of interest. The primary challenge is that the storage of these multiple datasets increases the volume of data, resulting in slow connectivity. Therefore, many organizations choose to co-locate these datasets. Given volume and portability …
For the experiment, we first built the total cost of ownership (TCO) model to control two environments at the matched cost level. Then, using Accenture Data Platform Benchmark as real-world workloads, we compared the performance of both a bare-metal Hadoop cluster and Amazon …
Reference: Where to Deploy your Hadoop Cluster?, Executive Summary, Accenture Technology Labs, 2013
Amazon EMR with the MapR Distribution for Hadoop
EC2 Instance for Hadoop/MapReduce
Storage optimized – current generation
§ Instance hs1.8xlarge
§ 16 vCPUs (Intel Xeon)
§ 117 GB RAM
§ 24 x 2'000 GB = 48 TB
§ 10 Gigabit network
§ MapR as an option
  § M3, M5 or M7 edition
Reference: http://aws.amazon.com/elasticmapreduce/mapr/
Amazon EMR with the MapR Distribution for Hadoop
Costs for hs1.8xlarge Instance
§ Medium Utilization Reserved Instances
  § 1-year term: upfront $9'200, $1.809 per hour
  § 3-year term: upfront $14'109, $1.581 per hour
§ Data transfer IN to Amazon EC2 from the internet: $0.00 per GB
§ Data transfer OUT from Amazon EC2 to the internet: $0.12 per GB up to 10 TB/month ($120 per TB)
§ MapR M7: $1.49 per hour
§ Total: $2'600/month, $31'200/year (24/365 utilization)
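The yearly cost of one hs1.8xlarge instance can be reconstructed from the listed prices by spreading the upfront fee over the reservation term and adding the hourly instance and MapR charges for 8'760 hours. A minimal sketch; the function names are ours, and data transfer is handled separately. By this calculation the slide's total of $31'200/year is closest to the 3-year term (about $31'600/year before rounding, versus about $38'100/year for the 1-year term); that reading is an inference, as the slide does not say which term it used.

```python
HOURS_PER_YEAR = 24 * 365  # 24/365 utilization


def annual_cost(upfront: float, term_years: int, instance_per_hour: float,
                mapr_per_hour: float) -> float:
    """Upfront reservation fee spread evenly over the term, plus a full year of
    hourly instance and MapR charges. Data transfer is excluded."""
    return upfront / term_years + (instance_per_hour + mapr_per_hour) * HOURS_PER_YEAR


def egress_cost(tb_per_month: float) -> float:
    """Data transfer OUT at $0.12/GB (1 TB = 1'000 GB), valid for up to 10 TB/month."""
    if tb_per_month > 10:
        raise ValueError("the $0.12/GB rate on the slide covers up to 10 TB/month")
    return tb_per_month * 1_000 * 0.12


one_year_term = annual_cost(9_200, 1, 1.809, 1.49)
three_year_term = annual_cost(14_109, 3, 1.581, 1.49)
print(f"1-year term: ${one_year_term:,.0f}/year")
print(f"3-year term: ${three_year_term:,.0f}/year (${three_year_term / 12:,.0f}/month)")
print(f"egress of 5 TB/month: ${egress_cost(5):,.0f}/month")
```

The 3-year term works out to about $2'634/month per instance before any outbound data transfer, which is added on top (for example about $600/month for 5 TB).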
Hadoop on Do-It-Yourself Infrastructure
Do-it-Yourself (experimental setup) 
Source: http://blog.ittoby.com/
HP ProLiant DL380p Gen8
§ 2 x Eight-Core Intel® Xeon® E5-2650 v2
§ 64 GB Memory (up to 512 GB)
§ 48 TB Raw Storage Capacity
§ 40 Gb/sec InfiniBand Network
§ 10 Gb/sec Data Center Connectivity
§ About $20'000 + rack + network + labour
Do-it-Yourself (enterprise class setup) 
HP ProLiant DL380e Gen8
The HP ProLiant DL380e Gen8 (2U) is an excellent choice as the server platform for the worker nodes.
Figure 6. HP ProLiant DL380e Gen8 Server
§ Cloudera Enterprise Data Hub Edition 5.x
§ ca. $2'500/node + support
Conclusion
Appliance, Cloud or DIY?
Oracle BDA
+ High-performance, scalable network architecture
+ Highly integrated into the Oracle ecosystem
+ Complete software stack (Oracle and Hadoop)
+ Single point of support
+ Competitive price/performance ratio for enterprise-class demands
Amazon EC2 Instances
+ Fast and easy deployment
+ Scales from very small to very large cluster setups
+ Capacity on demand on an hourly basis
+ Optional enterprise-class Hadoop distribution
+ Interesting price model for volatile utilisation and capacity on demand
Servers running the node processes should have sufficient memory for either HBase or for the amount of Map/Reduce configured on the server. A server with a larger RAM configuration will deliver optimum performance for both HBase and Map/Reduce. To ensure optimal memory performance and bandwidth, we recommend using 8 GB or 16 GB DIMMs to populate each of the 6 memory channels as needed.
Network configuration
The DL380e includes four 1GbE NICs onboard. MapR automatically identifies the available NICs on the server and bonds them via the MapR software to increase throughput.
MapR Benefit
Each of the reference architecture configurations below specifies an additional Top of Rack Switch for redundancy. To make use of this, we recommend cabling the ProLiant DL380e Worker Nodes so that NIC 1 is cabled to Switch 1 and NIC 2 is cabled to Switch 2, repeating the same process for NICs 3 and 4. Each NIC in the server should have its own IP subnet instead of sharing the same subnet with other NICs.
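The cabling recommendation in the HP/MapR excerpt (alternate NICs across the two top-of-rack switches, one IP subnet per NIC) can be written down as a small plan generator. A minimal sketch using Python's standard `ipaddress` module; the base network `10.10.0.0/16` and the /24 subnet size are hypothetical choices, and the scheme assumes NIC n of every worker node shares subnet n.

```python
import ipaddress


def cabling_plan(num_nics: int = 4, num_switches: int = 2, base: str = "10.10.0.0/16"):
    """Alternate NICs across top-of-rack switches (NIC 1 -> Switch 1, NIC 2 -> Switch 2,
    NIC 3 -> Switch 1, ...) and give each NIC its own /24 subnet carved from `base`."""
    subnets = ipaddress.ip_network(base).subnets(new_prefix=24)  # generator of /24 networks
    plan = []
    for nic in range(1, num_nics + 1):
        switch = (nic - 1) % num_switches + 1
        plan.append((f"NIC {nic}", f"Switch {switch}", next(subnets)))
    return plan


for nic, switch, subnet in cabling_plan():
    print(f"{nic} -> {switch}, subnet {subnet}")
```

With the defaults, NICs 1 and 3 land on Switch 1 and NICs 2 and 4 on Switch 2, in subnets 10.10.0.0/24 through 10.10.3.0/24, so losing one switch still leaves every node with two working links.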
HP ProLiant DL380e Gen8 
The HP ProLiant DL380e Gen8 (2U) is an excellent choice as the server platform for the worker nodes. 
Figure 6. HP ProLiant DL380e Gen8 Server 
Do-it-Yourself
+ Low entry point
+ Free choice of hardware
+ Free choice of software stack
Conclusion
§ Building an enterprise-class Hadoop infrastructure is a challenge
§ Analyzing and prioritizing your requirements (business and IT) is crucial
§ Start "small and fast" with a proof of concept
§ Consider various deployment models (on-premise, appliance, IaaS, PaaS, HaaS, ...)
§ The Oracle Big Data Appliance is a very competitive offering, especially as an extension to your existing Oracle operational data systems
Thank you. 
Daniel Steiger 
Discipline Manager Infrastructure Engineering
Tel: +41 58 459 50 88 
daniel.steiger@trivadis.com 
Trivadis at DOAG
Level 3, right next to the escalator
We look forward to your visit.
Because with Trivadis, you always win.
Cost comparison
Attribute              Oracle BDA   Amazon EMR    DIY
Type                   X4-2         hs1.8xlarge   DL380
CPU                    2x8-Core     16 vCPU       2x8-Core
RAM                    64 GB        117 GB        64 GB
Storage                48 TB        48 TB         8 TB
Network                10/40 Gb/s   10 Gb/s       10/40 Gb/s
Hadoop distribution    Cloudera     MapR          Cloudera
Price*                 525'000      562'256       405'000
Maintenance / year     63'000       -             40'000
Total, 1st year        588'000      562'256       445'000
Total, 3 years         714'000      1'686'768     525'000
*One-time purchase price for Oracle BDA and DIY; annual cost for Amazon EMR.
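The totals in the table follow a simple model: a one-time price plus a recurring yearly amount (maintenance for Oracle BDA and DIY, the full annual cost for Amazon EMR), with the first year's recurring amount included. A minimal sketch that reproduces the table and fills in year 2; like the table, it ignores discounting, hardware refresh and staff costs.

```python
# Figures from the cost comparison table (currency as on the slide).
OPTIONS = {
    "Oracle BDA": {"purchase": 525_000, "per_year": 63_000},   # price + maintenance/year
    "Amazon EMR": {"purchase": 0, "per_year": 562_256},        # annual cost only
    "DIY": {"purchase": 405_000, "per_year": 40_000},          # price + maintenance/year
}


def total_cost(purchase: int, per_year: int, years: int) -> int:
    """One-time purchase plus the recurring yearly amount for each year, year 1 included."""
    return purchase + years * per_year


for name, cost in OPTIONS.items():
    totals = [total_cost(cost["purchase"], cost["per_year"], y) for y in (1, 2, 3)]
    print(name, totals)
```

Year 1 and year 3 match the table; the year-2 column (651'000 for Oracle BDA, 1'124'512 for Amazon EMR, 485'000 for DIY) shows that Amazon EMR is cheaper than the appliance only in the first year under these figures.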

Azure Days 2019: Azure@Helsana: Die Erweiterung von Dynamics CRM mit Azure Po...Azure Days 2019: Azure@Helsana: Die Erweiterung von Dynamics CRM mit Azure Po...
Azure Days 2019: Azure@Helsana: Die Erweiterung von Dynamics CRM mit Azure Po...
 
TechEvent 2019: Kundenstory - Kein Angebot, kein Auftrag – Wie Du ein individ...
TechEvent 2019: Kundenstory - Kein Angebot, kein Auftrag – Wie Du ein individ...TechEvent 2019: Kundenstory - Kein Angebot, kein Auftrag – Wie Du ein individ...
TechEvent 2019: Kundenstory - Kein Angebot, kein Auftrag – Wie Du ein individ...
 
TechEvent 2019: Oracle Database Appliance M/L - Erfahrungen und Erfolgsmethod...
TechEvent 2019: Oracle Database Appliance M/L - Erfahrungen und Erfolgsmethod...TechEvent 2019: Oracle Database Appliance M/L - Erfahrungen und Erfolgsmethod...
TechEvent 2019: Oracle Database Appliance M/L - Erfahrungen und Erfolgsmethod...
 
TechEvent 2019: Security 101 für Web Entwickler; Roland Krüger - Trivadis
TechEvent 2019: Security 101 für Web Entwickler; Roland Krüger - TrivadisTechEvent 2019: Security 101 für Web Entwickler; Roland Krüger - Trivadis
TechEvent 2019: Security 101 für Web Entwickler; Roland Krüger - Trivadis
 
TechEvent 2019: Trivadis & Swisscom Partner Angebote; Konrad Häfeli, Markus O...
TechEvent 2019: Trivadis & Swisscom Partner Angebote; Konrad Häfeli, Markus O...TechEvent 2019: Trivadis & Swisscom Partner Angebote; Konrad Häfeli, Markus O...
TechEvent 2019: Trivadis & Swisscom Partner Angebote; Konrad Häfeli, Markus O...
 
TechEvent 2019: DBaaS from Swisscom Cloud powered by Trivadis; Konrad Häfeli ...
TechEvent 2019: DBaaS from Swisscom Cloud powered by Trivadis; Konrad Häfeli ...TechEvent 2019: DBaaS from Swisscom Cloud powered by Trivadis; Konrad Häfeli ...
TechEvent 2019: DBaaS from Swisscom Cloud powered by Trivadis; Konrad Häfeli ...
 
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...
 
TechEvent 2019: More Agile, More AI, More Cloud! Less Work?!; Oliver Dörr - T...
TechEvent 2019: More Agile, More AI, More Cloud! Less Work?!; Oliver Dörr - T...TechEvent 2019: More Agile, More AI, More Cloud! Less Work?!; Oliver Dörr - T...
TechEvent 2019: More Agile, More AI, More Cloud! Less Work?!; Oliver Dörr - T...
 
TechEvent 2019: Kundenstory - Vom Hauptmann zu Köpenick zum Polizisten 2020 -...
TechEvent 2019: Kundenstory - Vom Hauptmann zu Köpenick zum Polizisten 2020 -...TechEvent 2019: Kundenstory - Vom Hauptmann zu Köpenick zum Polizisten 2020 -...
TechEvent 2019: Kundenstory - Vom Hauptmann zu Köpenick zum Polizisten 2020 -...
 
TechEvent 2019: Vom Rechenzentrum in die Oracle Cloud - Übertragungsmethoden;...
TechEvent 2019: Vom Rechenzentrum in die Oracle Cloud - Übertragungsmethoden;...TechEvent 2019: Vom Rechenzentrum in die Oracle Cloud - Übertragungsmethoden;...
TechEvent 2019: Vom Rechenzentrum in die Oracle Cloud - Übertragungsmethoden;...
 
TechEvent 2019: The sleeping Power of Data; Eberhard Lösch - Trivadis
TechEvent 2019: The sleeping Power of Data; Eberhard Lösch - TrivadisTechEvent 2019: The sleeping Power of Data; Eberhard Lösch - Trivadis
TechEvent 2019: The sleeping Power of Data; Eberhard Lösch - Trivadis
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Big Data Infrastructure

  • 1. Big Data Infrastructure. Appliance, Cloud, or Do-it-Yourself. Daniel Steiger, Discipline Manager Infrastructure Engineering. BASEL BERN BRUGG GENF LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN. 2014 © Trivadis. Big Data Infrastructure, DOAG Jahreskonferenz 2014
  • 2. Our company: Trivadis is a leader in IT consulting, systems integration, solution engineering and the delivery of IT services, focusing on … and … technologies in the D-A-CH region (Germany, Austria, Switzerland). Our strategic business areas...
  • 3. More than 600 IT and subject-matter experts on site with you
      § 12 Trivadis offices with more than 600 employees
      § 200 service level agreements
      § More than 4'000 training participants
      § Research and development budget: CHF 5.0 million / EUR 4.0 million
      § Financially independent and sustainably profitable
      § Experience from more than 1'900 projects per year for over 800 customers (as of 12/2013)
      Locations: Hamburg, Düsseldorf, Frankfurt, Freiburg, Stuttgart, München, Wien, Basel, Brugg, Bern, Zürich, Lausanne
  • 4. Agenda
      1. Big Data Infrastructure Challenges
      2. Hadoop on an Appliance
      3. Hadoop in the Cloud
      4. Hadoop Do-it-Yourself
      5. Conclusion
  • 5. Big Data Infrastructure Challenges
  • 6. Trailwise – a "quantified self" use case: 11'000 data points rendered in 165 ms; 47'295 data points rendered in 643 ms
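The two Trailwise measurements suggest that rendering time grows roughly linearly with the number of points: about 4.3 times as many points take about 3.9 times as long. The quick check below uses only the two figures from the slide. Two measurements are an indication, not a benchmark.

```python
# Rendering figures quoted on the Trailwise slide: (data points, milliseconds).
measurements = [(11_000, 165), (47_295, 643)]

def points_per_ms(points: int, millis: float) -> float:
    """Rendering throughput in data points per millisecond."""
    return points / millis

for points, millis in measurements:
    print(f"{points:>6} points in {millis} ms -> {points_per_ms(points, millis):.1f} points/ms")
```

Throughput stays in the same range for both sizes (roughly 67 and 74 points/ms), which is consistent with near-linear scaling.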
  • 7. Trailwise – Infrastructure for a Proof of Concept
      § Hadoop HDFS as data store
      § HBase for real-time data access
      § Hadoop MapReduce
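The proof of concept uses Hadoop MapReduce for batch processing. The sketch below shows the programming model in plain Python: map, shuffle (group by key), then reduce. The record layout `(trail_id, latitude, longitude)` and the trail IDs are hypothetical and are not the actual Trailwise schema. In Hadoop the same phases run distributed across the cluster nodes, reading from and writing to HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical GPS record: (trail_id, latitude, longitude).
Record = tuple[str, float, float]

def map_phase(records: Iterable[Record]) -> Iterator[tuple[str, int]]:
    """Emit (trail_id, 1) for every data point, like a MapReduce mapper."""
    for trail_id, _lat, _lon in records:
        yield trail_id, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key; Hadoop performs this step between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the values per key, like a MapReduce reducer."""
    return {key: sum(values) for key, values in groups.items()}

records = [
    ("trail-a", 47.37, 8.54),
    ("trail-a", 47.38, 8.55),
    ("trail-b", 46.95, 7.45),
]
print(reduce_phase(shuffle(map_phase(records))))  # points per trail
```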
  • 8. Trailwise – Infrastructure Lessons Learned
      For a proof of concept, Hadoop in the cloud (e.g. on Amazon EC2) is perfect...
      + Fast and easy deployment
      + Optimized Hadoop/HBase setup
      + HBase real-time performance
      + MapReduce scalability
      + Affordable, ca. EUR 15/day
      Concerns...
      § Scalability
      § Costs for "always up"
      § Setup and administration of a large cluster on AWS
      § Break-even cloud vs. on-premise
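The "always up" and break-even concerns come down to simple arithmetic. The sketch below uses the slide's ca. EUR 15/day. The EUR 20'000 on-premise figure is a hypothetical placeholder, and the model deliberately ignores on-premise running costs (power, space, administration), so it only gives a rough first break-even point.

```python
import math

CLOUD_EUR_PER_DAY = 15.0  # "ca. EUR 15/day" for the PoC cluster, from the slide

def cloud_cost(days: int, eur_per_day: float = CLOUD_EUR_PER_DAY) -> float:
    """Cost of keeping the PoC cluster running ("always up") for a number of days."""
    return days * eur_per_day

def break_even_days(on_premise_eur: float, eur_per_day: float = CLOUD_EUR_PER_DAY) -> int:
    """First day on which cumulative cloud cost reaches a one-off on-premise cost."""
    return math.ceil(on_premise_eur / eur_per_day)

print(cloud_cost(365))          # one year, always up
print(break_even_days(20_000))  # hypothetical EUR 20'000 on-premise setup
```

At this price, a year of continuous operation costs about EUR 5'475. That is cheap for a proof of concept, but the cost scales linearly with the number of nodes and the running time.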
  • 9. Big Data Infrastructure Challenges
      § Big Data means big data volume: petabytes and exabytes
      § Scalability: 10, 20, 50, 100, ... cluster nodes, and costs should scale as well...
      § High demands on machine-to-machine networks: in Big Data, a single client interaction may trigger hundreds or thousands of server and data node interactions. This generates far more east-west (server-to-server or server-to-storage) network traffic than north-south (server-to-client or server-to-outside) network traffic.
      § And many others, such as integration, data protection, operations, etc.
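Block replication alone already tilts traffic toward east-west. By default HDFS keeps three replicas of each block (`dfs.replication`), and the DataNodes forward each block to one another in a write pipeline. The sketch below assumes the client runs outside the cluster. It counts only the ingest itself: the MapReduce shuffle and re-replication after failures add further east-west traffic and are not modelled.

```python
def write_traffic_tb(data_tb: float, replication: int = 3) -> tuple[float, float]:
    """Split the network traffic of one HDFS write into (north-south, east-west) TB.

    The client sends each block once to the first DataNode (north-south);
    the DataNode pipeline then forwards it to the remaining replicas (east-west).
    """
    if replication < 1:
        raise ValueError("replication must be >= 1")
    north_south = data_tb
    east_west = data_tb * (replication - 1)
    return north_south, east_west

ns, ew = write_traffic_tb(10.0)
print(f"north-south: {ns} TB, east-west: {ew} TB")
```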
  • 10. Infrastructure Requirements
      Will my infrastructure meet my needs now and in the future without putting my business at risk?
      § Infrastructure must be engineered to scale
      § The network has to provide high bandwidth and low latency, and should scale seamlessly with Hadoop clusters to provide predictable performance
      § And many more, such as integration with operational data systems; authentication, authorization and encryption; centralized management
      [Figure 1.2: Picture of a row of servers in a Google WSC]
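The bandwidth requirement becomes concrete when you estimate how long it takes to move one node's worth of data. For example, this happens when a failed node has to be re-replicated. The sketch assumes full line rate with no protocol overhead, so real transfers take longer. It uses 48 TB of raw capacity per node and the 1, 10 and 40 Gb/s link speeds that appear in the hardware configurations.

```python
def transfer_hours(data_tb: float, link_gbit_s: float) -> float:
    """Hours needed to move data_tb (decimal) terabytes over a link at full line rate."""
    bits = data_tb * 1e12 * 8              # terabytes -> bits
    seconds = bits / (link_gbit_s * 1e9)   # Gb/s -> bits per second
    return seconds / 3600

for link in (1, 10, 40):
    print(f"{link:>2} Gb/s: {transfer_hours(48, link):.1f} h for 48 TB")
```

Moving 48 TB takes on the order of days at 1 Gb/s, about half a day at 10 Gb/s and a few hours at 40 Gb/s.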
  • 11. Where to Deploy your Hadoop Cluster?
      "When enterprises adopt Hadoop, one of the decisions they must make is the deployment model. There are four options as illustrated in Figure 1: On-premise full custom. With this option, businesses purchase commodity hardware, then they install software and …"
      [Figure 1. The spectrum of Hadoop deployment options, from bare-metal to cloud: on-premise full custom, Hadoop appliance, Hadoop hosting, Hadoop-as-a-Service]
      "There have existed two divergent views related to the price-performance ratio for Hadoop deployments. One view is that a virtualized Hadoop cluster is slower because Hadoop's workload has intensive I/O operations, which tend to run slowly on virtualized environments. …"
      "A related and fourth area is data enrichment, which involves leveraging multiple datasets to uncover new insights. For example, combining a consumer's purchase history and social-networking activities can yield a deeper understanding of the consumer's lifestyle and key personal …"
      Reference: Hadoop Deployment Comparison Study, Price-Performance Comparison, Accenture Technology Labs, 2013
  • 12. Hadoop on an Appliance: Oracle Big Data Appliance
  • 13. Overview: Oracle's Big Data Solution
      § A complete and optimized solution for big data
      § Tight integration with Exadata, Exalogic, Exalytics and SPARC SuperCluster over the InfiniBand network
      § Single-vendor support for both hardware and software
  • 14. Oracle Big Data Appliance X4-2 Hardware (Source: Oracle®)
      Full rack configuration (up to 18 racks):
      § 18 compute/storage nodes
      Per node:
      § 2 x eight-core Intel® Xeon® E5-2650 v2 processors
      § 64 GB memory (up to 512 GB)
      § 48 TB raw storage capacity
      § 40 Gb/s InfiniBand network
      § 10 Gb/s data center connectivity
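Multiplying out the per-node figures gives the capacity of a full rack. The usable HDFS figure assumes Hadoop's default 3-way replication and is an upper bound, because the operating system and intermediate MapReduce data also need disk space.

```python
NODES_PER_RACK = 18
CORES_PER_NODE = 2 * 8     # 2 x eight-core Xeon E5-2650 v2
RAM_GB_PER_NODE = 64       # base configuration (expandable to 512 GB)
RAW_TB_PER_NODE = 48

def rack_capacity(racks: int = 1, replication: int = 3) -> dict[str, float]:
    """Aggregate compute, memory and storage for a number of full racks."""
    nodes = racks * NODES_PER_RACK
    raw_tb = nodes * RAW_TB_PER_NODE
    return {
        "nodes": nodes,
        "cores": nodes * CORES_PER_NODE,
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "raw_tb": raw_tb,
        "hdfs_tb": raw_tb / replication,  # upper bound before OS/temp space
    }

print(rack_capacity())
```

One full rack yields 288 cores, 1'152 GB RAM and 864 TB raw storage. With 3-way replication, that leaves at most about 288 TB of data.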
  • 15. Oracle Big Data Appliance Internal Network Connectivity (Source: Oracle Big Data Appliance: Datacenter Network Integration, Oracle White Paper, 2012)
  • 16. 2014 © Trivadis § Oracle R Distribution § Oracle NoSQL DB Community Ed. § BDA Enterprise Manager Plug-In § Optional Software* § Oracle Big Data SQL § Oracle Big Data Connectors § Oracle Audit Vault Database Firewall for Hadoop Auditing § Oracle Data Integrator § Oracle NoSQL Database EE § Oracle Linux 6.4 with UEK § Oracle Java JDK 7 § Cloudera Enterprise Data Hub Edition § Apache Hadoop HDFS § HBase § Cloudera Impala § Cloudera Search § Cloudera Manager § Apache Spark Big Data Infrastructure DOAG Jahreskonferenz 2014 16 Big Data Appliance Software Stack *Connectors are licensed separately from Oracle Big Data Appliance
  • 17. 2014 © Trivadis § Oracle R Support for Big Data § R is an open-source language and environment for statistical analysis and graphing § The standard R distribution is installed on all nodes of Oracle Big Data Appliance § Oracle R Connector for Hadoop provides R users with high-performance, native access to HDFS and the MapReduce programming framework § Oracle R Enterprise is a separate package that provides real-time access to Oracle Database. § Oracle NoSQL Database § Oracle NoSQL Database is a distributed key-value database built on storage technology of Berkeley DB Java Edition. § An intelligent driver on top of Berkeley DB keeps track of the underlying storage topology, shards the data and knows where data can be placed with the lowest latency Big Data Infrastructure DOAG Jahreskonferenz 2014 17 BDA Specific Software Features
  • 18. Oracle Big Data Connectors
      § Oracle SQL Connector for HDFS
      § Oracle Loader for Hadoop
      § Oracle R Connector for Hadoop
      § Oracle Data Integrator Application Adapter for Hadoop
      § Data in HDFS (and NoSQL) is accessible through the relational database external table mechanism (HDFS as cluster file system)
      Reference: Oracle Big Data Connectors Data Sheet. Source: Oracle®
  • 19. Oracle Big Data SQL: one tool for all data sources. Reference: https://www.oracle.com/webfolder/s/delivery_production/docs/FY15h1/doc6/1-T2-BigData.pdf
  • 20. Oracle Big Data Appliance Resources
      § Oracle Big Data Lite VM: http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
      § MOS (My Oracle Support) notes:
          § Information Center: Oracle Big Data Appliance (Doc ID 1445762.2)
          § Big Data Connectors (ID 1487399.2)
          § Sqoop Frequently Asked Questions (FAQ) (Doc ID 1510470.1)
  • 21. Hadoop in the Cloud
  • 22. Hadoop in the Cloud
  • 23. Deployment Considerations
      There are five key areas to consider when choosing the right deployment model*:
      § Price-performance ratio
      § Data privacy
      § Data gravity
      § Data enrichment
      § Productivity of developers and data scientists
      *Public cloud, private cloud, community cloud or hybrid cloud
      "The second area of consideration is data privacy, which is a common concern when storing data outside of corporate-owned infrastructure. Cloud-based deployment requires a comprehensive cloud-data privacy strategy that encompasses areas such as proper implementation of legal requirements, well-orchestrated …"
      "… and therefore enable companies to introduce new services and products of interest. The primary challenge is that the storage of these multiple datasets increases the volume of data, resulting in slow connectivity. Therefore, many organizations choose to co-locate these datasets. Given volume and portability …"
      "For the experiment, we first built the total cost of ownership (TCO) model to control two environments at the matched cost level. Then, using Accenture Data Platform Benchmark as real-world workloads, we compared the performance of both a bare-metal Hadoop cluster and Amazon …"
      Reference: Where to Deploy your Hadoop Cluster?, Executive Summary, Accenture Technology Labs, 2013
  • 24. Amazon EMR with the MapR Distribution for Hadoop
      EC2 instance for Hadoop/MapReduce, storage optimized, current generation:
      § Instance hs1.8xlarge
      § 16 vCPUs (Intel Xeon)
      § 117 GB RAM
      § 24 x 2'000 GB = 48 TB
      § 10 Gigabit network
      § MapR as option: M3, M5 or M7 edition
      Reference: http://aws.amazon.com/elasticmapreduce/mapr/
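A rough way to size a cloud cluster from this instance type is to start from the data volume. The sketch assumes 3-way replication and reserves a quarter of the raw disk for the OS and intermediate data (`usable_fraction=0.75`). Both values are illustrative assumptions rather than MapR or AWS figures, so check the defaults of the distribution you actually use.

```python
import math

def nodes_needed(data_tb: float, raw_tb_per_node: float = 48.0,
                 replication: int = 3, usable_fraction: float = 0.75) -> int:
    """Worker nodes needed to hold data_tb of replicated data.

    usable_fraction reserves raw disk for the OS and intermediate MapReduce data;
    0.75 is an illustrative assumption, not a vendor figure.
    """
    required_raw_tb = data_tb * replication
    return math.ceil(required_raw_tb / (raw_tb_per_node * usable_fraction))

print(nodes_needed(100))  # 100 TB of data on hs1.8xlarge-sized nodes
```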
  • 25. Amazon EMR with the MapR Distribution for Hadoop: Costs for the hs1.8xlarge Instance
      § Medium Utilization Reserved Instances
          § 1-year term: upfront $9'200, $1.809 per hour
          § 3-year term: upfront $14'109, $1.581 per hour
      § Data transfer IN to Amazon EC2 from the internet: $0.00 per GB
      § Data transfer OUT from Amazon EC2 to the internet: $0.12 per GB up to 10 TB/month ($120 per TB)
      § MapR M7: $1.49 per hour
      § Total: $2'600/month, $31'200/year (24/365 utilization)
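The slide does not show how its total was derived. The sketch below makes one plausible set of assumptions explicit: the upfront fee is spread evenly over the term, the instance runs 8'760 hours per year, and data transfer OUT and any other fees are excluded. Under these assumptions the 3-year term comes to about $31'600 per node and year, close to the slide's $31'200, while the 1-year term is about $38'100.

```python
HOURS_PER_YEAR = 24 * 365  # "24/365 utilization"

def annual_cost(upfront: float, term_years: int, instance_per_hour: float,
                license_per_hour: float = 0.0, hours: int = HOURS_PER_YEAR) -> float:
    """Yearly cost of one reserved instance, with the upfront fee spread over the term."""
    return upfront / term_years + (instance_per_hour + license_per_hour) * hours

one_year = annual_cost(9_200, 1, 1.809, license_per_hour=1.49)     # + MapR M7
three_year = annual_cost(14_109, 3, 1.581, license_per_hour=1.49)  # + MapR M7
print(f"1-year term: ${one_year:,.0f}/year, 3-year term: ${three_year:,.0f}/year")
```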
  • 26. Hadoop on Do-It-Yourself Infrastructure
  • 27. Do-it-Yourself (experimental setup). Source: http://blog.ittoby.com/
  • 28. Do-it-Yourself (enterprise class setup)
      HP ProLiant DL380p Gen8:
      § 2 x eight-core Intel® Xeon® E5-2650 v2
      § 64 GB memory (up to 512 GB)
      § 48 TB raw storage capacity
      § 40 Gb/s InfiniBand network
      § 10 Gb/s data center connectivity
      § About $20'000 + rack + network + work
      Cloudera Enterprise Data Hub Edition 5.x:
      § ca. $2'500/node + support
      "The HP ProLiant DL380e Gen8 (2U) is an excellent choice as the server platform for the worker nodes." [Figure 6. HP ProLiant DL380e Gen8 Server]
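Scaling the per-node DIY figures to the node count of a full Big Data Appliance rack (18 nodes) gives $405'000. That matches the DIY price in the cost comparison table, so it appears to be how that figure was derived, although the slides do not say so. Rack, network and labour are excluded here, as they are on the slide.

```python
HARDWARE_USD_PER_NODE = 20_000  # "About $20'000" (rack, network and work excluded)
CLOUDERA_USD_PER_NODE = 2_500   # "ca. $2'500/node + support"

def diy_cluster_cost(nodes: int) -> int:
    """Server hardware plus Cloudera subscription for a DIY cluster (USD)."""
    return nodes * (HARDWARE_USD_PER_NODE + CLOUDERA_USD_PER_NODE)

print(diy_cluster_cost(18))  # same node count as a full BDA rack
```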
  • 29. Conclusion
  • 30. Appliance, Cloud or DIY?
      Oracle BDA:
      + High-performance, scalable network architecture
      + Highly integrated into the Oracle ecosystem
      + Complete software stack (Oracle, Hadoop)
      + Single point of support
      + Competitive price/performance ratio for enterprise-class demands
      Amazon EC2 instances:
      + Fast and easy deployment
      + Scales from very small to very large cluster setups
      + Capacity on demand, billed hourly
      + Optional enterprise-class Hadoop distribution
      + Attractive pricing model for volatile utilization and capacity on demand
      Do it Yourself:
      + Low entry point
      + Free choice of hardware
      + Free choice of software stack
      Excerpt from an HP reference architecture for MapR: "HP ProLiant DL380e Gen8: The HP ProLiant DL380e Gen8 (2U) is an excellent choice as the server platform for the worker nodes. [Figure 6. HP ProLiant DL380e Gen8 Server] Servers running the node processes should have sufficient memory for either HBase or for the amount of Map/Reduce configured on the server. A server with larger RAM configuration will deliver optimum performance for both HBase and Map/Reduce. To ensure optimal memory performance and bandwidth, we recommend using 8GB or 16GB DIMMs to populate each of the 6 memory channels as needed. Network configuration: The DL380e includes four 1GbE NICs onboard. MapR automatically identifies the available NICs on the server and bonds them via the MapR software to increase throughput. MapR benefit: Each of the reference architecture configurations below specifies an additional Top of Rack Switch for redundancy. To make use of this, we recommend cabling the ProLiant DL380e Worker Nodes so that NIC 1 is cabled to Switch 1 and NIC 2 is cabled to Switch 2, repeating the same process for NICs 3 and 4. Each NIC in the server should have its own IP subnet instead of sharing the same subnet with other NICs."
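The cabling rule in the HP/MapR excerpt says to alternate NICs between the two top-of-rack switches and to give each NIC its own subnet. It can be turned into a per-node plan. The sketch below is hypothetical: the `10.10.0.0/16` range, one /24 per NIC position, and the host number equal to the node index are illustrative choices, not part of the reference architecture. NIC bonding itself is done by MapR and is not shown.

```python
import ipaddress

def nic_plan(node_index: int, nics: int = 4, switches: int = 2,
             base: str = "10.10.0.0/16") -> list[tuple[str, str, str]]:
    """Cabling and addressing for one worker node as (nic, switch, ip/prefix).

    NICs alternate between top-of-rack switches (NIC 1 -> switch 1,
    NIC 2 -> switch 2, NIC 3 -> switch 1, ...) and each NIC gets its own /24.
    """
    if not 1 <= node_index <= 254:
        raise ValueError("node_index must fit into a /24 host range")
    subnets = list(ipaddress.ip_network(base).subnets(new_prefix=24))
    plan = []
    for nic in range(1, nics + 1):
        switch = (nic - 1) % switches + 1
        subnet = subnets[nic - 1]                   # one subnet per NIC position
        host = subnet.network_address + node_index  # same host number in every subnet
        plan.append((f"nic{nic}", f"switch{switch}", f"{host}/{subnet.prefixlen}"))
    return plan

for row in nic_plan(node_index=11):
    print(row)
```

Using the same host number in every subnet keeps the plan easy to read: the node's addresses differ only in the third octet, which identifies the NIC.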
  • 31. Conclusion
      § Building an enterprise-class Hadoop infrastructure is a challenge
      § Analysing and prioritizing your requirements (business and IT) is crucial
      § Start "small and fast" with a proof of concept
      § Consider various deployment models (on-premise, appliance, IaaS, PaaS, HaaS, ...)
      § The Oracle Big Data Appliance is a very competitive offering – especially as an extension to your existing Oracle operational data systems
  • 32. Thank you. Daniel Steiger, Discipline Manager Infrastructure Engineering, Tel: +41 58 459 50 88, daniel.steiger@trivadis.com
  • 33. Trivadis at DOAG: Level 3, right next to the escalator. We look forward to your visit. Because with Trivadis, you always win.
  • 34. Cost comparison
      Attribute          Oracle BDA      Amazon EMR      DIY
      Type               X4-2            hs1.8xlarge     DL380
      CPU                2 x 8-core      16 vCPU         2 x 8-core
      RAM                64 GB           117 GB          64 GB
      Storage            48 TB           48 TB           8 TB
      Network            10 / 40 Gb/s    10 Gb/s         10 / 40 Gb/s
      Hadoop distr.      Cloudera        MapR            Cloudera
      Price / year       525'000         562'256         405'000
      Maintenance / year 63'000          -               40'000
      Total year 1       588'000         562'256         445'000
      Total 3 years      714'000         1'686'768       525'000
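The table's totals are consistent with a simple model, inferred from the numbers since the slide does not state it. For Oracle BDA and DIY, the price is a one-off purchase plus yearly maintenance. For Amazon EMR, the price recurs every year. The sketch reproduces every total in the table under that model, with Amazon's "-" maintenance treated as 0.

```python
# (price, maintenance per year, does the price recur every year?)
OFFERS: dict[str, tuple[int, int, bool]] = {
    "Oracle BDA": (525_000, 63_000, False),
    "Amazon EMR": (562_256, 0, True),
    "DIY": (405_000, 40_000, False),
}

def total_cost(price: int, maintenance: int, recurring: bool, years: int) -> int:
    """Total cost over `years` under the model inferred from the table."""
    if recurring:
        return (price + maintenance) * years  # cloud: pay the yearly price every year
    return price + maintenance * years        # purchase once, then yearly maintenance

for name, (price, maintenance, recurring) in OFFERS.items():
    print(f"{name}: year 1 = {total_cost(price, maintenance, recurring, 1):,}, "
          f"3 years = {total_cost(price, maintenance, recurring, 3):,}")
```

Under this model the cloud option is cheapest only in the first year, and it becomes the most expensive well before year three.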