Trusted Analytics as a Service
Vin Sharma, Intel Corporation
November 12, 2013
Data-Driven discoveries depend on analytics
• Operational Efficiency
• Consumer Behavior
• Security & Risk Management
• Traffic Optimization
• Location Aware Ad Placement
• Personalized Preventive Care
• Smart Energy Grid
• Buyer Protection Program
• Claim Fraud Reduction
Machine-generated data requires end-to-end analytics

• 1990: Traditional Analytics
– Descriptive analysis, business intelligence, and reporting
– Internally sourced, relatively small, structured data
– Analysts and quants huddled in back rooms
• 2000: Big Data Analytics
– Interactive analysis, complex queries, and data-intensive models
– Fast and large amounts of poly-structured data from multiple sources
– Data scientists at the fore
• 2010: End-to-End Analytics
– Real-time analysis of streaming data from IoT
– Predictive and prescriptive analysis integrated into organizational processes
– Widespread access to tools
End-to-end analytics for the Internet of things era
• Verticals: help build lighthouse solutions for targeted verticals
• Analytics Platform: enable horizontal platform for e2e analytics
• Data Platform: accelerate evolution of Apache Hadoop
• Servers, Storage, Network: catalyze architectural transitions to drive growth
End-to-end analytics needs software-defined infrastructure
[Diagram: software-defined infrastructure stack]
• Datacenter Operating Systems: Processing, Orchestration, API, File System, Security, Scheduler, Compliance, Service Assurance, Intelligent Workload Placement
• Composable Resource Pools: Compute, Storage, Network
• Datacenter Facilities: Thermals, Power, Location
Apache Hadoop as a Datacenter Operating System

• API: Hadoop, Storm, GraphLab, Spark, Shark, MPI
• Expressway
• Memory Mgmt: Future NVM
• Process Mgmt
• Scheduler: YARN + SLURM | Moab
• I/O: Future Fabric Controller
• Security: TXT, AES-NI, Rhino, Data Governance
• File Systems: HDFS, LustreFS, GlusterFS, Ceph + Kafka
Intel leadership in foundational technologies of big data
• HPC: Enabling technical computing on massive data sets
• Cloud: Helping organizations build open interoperable clouds
• Open Source: Contributing code and fostering ecosystem


Intel employs over 10,000 software developers
* Other names and brands may be claimed as the property of others.
Hadoop in a virtualized infrastructure
• Good
– Agility: Lets you bring up and tear down resources quickly on demand.
– Fault Tolerance: Protects against SPOF in Hadoop/HDFS (NN, JT, Zookeeper) and reduces downtime for planned updates.
– Resource Efficiency: Run multiple Hadoop clusters or other applications.
– Security: Isolate clusters or nodes.
– Simpler management of the datacenter.
• Bad
– Performance hit of virtualization is indeterminate and hard to optimize.
– Storage configuration with SAN and NAS is very different from the direct-attached disk storage of a typical Hadoop cluster.
– Nested virtualization with a JVM in a VM is philosophically uncomfortable.
Hadoop in the cloud
• Good
– If your data is stored in a cloud
provider's storage infrastructure,
moving compute to data is
logical.
– If your analytics jobs are
infrequent, you can rent the
cluster only when you need it.
– Isolation offers security.
– Easy to use. Easy to expand.
– Pay as you go.

• Bad
– Cost of storage rises at the rate
of ingest and storage.
– Cost of compute rises with
cluster time. There is no "spare
cluster time" for low priority
work.
– Hadoop makes assumptions
about running in a fixed physical
infrastructure.
Deploying IDH on AWS
• Use a hop machine to connect into the VPC (private network)
for IDH. This is the only machine that allows inbound SSH
connections from clients on the internet. You must SSH into
the hop machine to gain access to machines in the VPC.
• The hop machine hosts the aws_system scripts.
• Although data may be retained on AWS, do not expect data to
always be saved. Assume machines and data will be removed at
any time. Save any needed data or results to another
location.
Deploying IDH on AWS
createIDHCluster.sh
• Picks a management node. This should be the first IP address in the
list of IPs that you specify in the nodeips argument.
• After the nodes are running, verifies it can SSH in as the root user
on the management node and as either the root user or some other
non-root user on the other nodes.
• Checks that IDH is NOT installed on any of the nodes. If it cannot
SSH in or IDH is installed, the script exits with a failure.
• Copies over the IDH tarball and the idhscripts.tar to the
management node.
• On the management node, sets up the yum repository and installs
Intel Manager. Then installs and configures IDH on all the nodes.
Script options
bash ./createIDHCluster.sh \
  --nodeips=10.0.20.240,10.0.20.241,10.0.20.242,10.0.20.243 \
  --idhtarball=/share/dev_builds/intelhadoop-3.0+19555-en-commercial-without-reg.el6.x86_64.tar.gz \
  --scripttarball=/home/vin/idhscripts.tar
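The rule that the management node is the first address in the nodeips list can be sketched in a few lines of Java. This is only an illustration of that rule: the class and method names (NodeIpsSketch, parse, managementNode) are hypothetical and are not part of the IDH scripts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mirrors the "first IP is the management node" rule.
final class NodeIpsSketch {

    /** Splits a comma-separated nodeips value into trimmed, non-empty entries. */
    static List<String> parse(String nodeips) {
        List<String> ips = new ArrayList<>();
        for (String part : nodeips.split(",")) {
            String ip = part.trim();
            if (!ip.isEmpty()) {
                ips.add(ip);
            }
        }
        if (ips.isEmpty()) {
            throw new IllegalArgumentException("nodeips must list at least one address");
        }
        return ips;
    }

    /** The management node is the first address in the list. */
    static String managementNode(List<String> ips) {
        return ips.get(0);
    }

    public static void main(String[] args) {
        List<String> ips = parse("10.0.20.240,10.0.20.241,10.0.20.242,10.0.20.243");
        System.out.println("management node: " + managementNode(ips));
        System.out.println("other nodes: " + ips.subList(1, ips.size()));
    }
}
```

Because order decides which host receives the tarballs and Intel Manager, it is worth checking the list before launching the script.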
Why Intel Distribution for Apache Hadoop
Intel® Distribution for Apache Hadoop* software

Hardware-enhanced performance & security
Enables partner innovation in analytics
Strengthens Apache Hadoop* ecosystem

Intel employs over 300 people developing and supporting big data software
Hadoop Security and Compliance Challenges

[Diagram: Hadoop component stack]
• Flume: log data collector
• Sqoop: RDB data collector
• HDFS 2.0: Hadoop Distributed File System
• YARN (MRv2): distributed processing framework
• HBase, HBase Coprocessors: real-time distributed BigTable
• HCatalog: metadata
• Zookeeper: coordination
• Giraph: graph analysis framework
• Mahout: data mining
• R connectors: statistics
• Pig, Hive (HiveQL), Oozie
• Other labels: data flow; data manipulation; interactive query; data execution engine; (compiler, planner, driver)

Hadoop is an ecosystem of loosely coupled components
Components sharing an authentication framework
Components capable of access control
Components capable of admission control
Components capable of (transparent) encryption
Components sharing a common policy engine
Components sharing a common audit log format
Project Rhino
• Strategic Objectives
– Framework support for encryption and key management
– Token-based authentication and SSO for internal cluster services
– Role-based access control for simpler administration of authorizations
– A common authorization framework, optional but easy to adopt
– Consistent audit logging, enhanced for compliance support
• Current Projects
– Develop crypto framework in Hadoop Common
– Enable transparent encryption in HBase
– Extend HBase support for ACLs to the cell level
Intel Distribution: Security
[Diagram: Intel Distribution component stack]
• Connectors: Netezza, Oracle, SAP, SQLServer, Teradata, DB2
• Vertical Accelerators, Behavior Model, Recommendation Engine, Analytics Workbench, Heat Map, HBase Explorer
• Oozie: Workflow
• Zookeeper: Coordination
• Lucene, Solr: Search
• Tribeca: Graph Mining
• Gryphon: Low-latency SQL-92
• Pig: Scripting
• Mahout: Machine Learning
• R: Stats
• Hive: Query
• HCatalog: Metadata
• YARN (+MapReduce): Distributed Processing Framework
• SLURM: Scheduler
• Job Profiler, Resource Monitor
• HBase
• Sqoop: Data Transfer
• Flume: Log Collector
• Kafka: Event Bus
• HDFS | Lustre | GlusterFS: Hadoop Compatible File Systems
• Rhino (Security) [Encryption, Authentication, Authorization, Auditing]
• High Availability and Disaster Recovery
• Deployment, Configuration, Tuning, Security Controls, Upgrade, Alerts, Unified Logging

All external names and brands are claimed as the property of others.
Enterprise data requires defense in depth

• Firewall
• Gateway
• Isolation
• Authn
• AuthZ
• Encryption
• Audit & Alerts
Intel Expressway protects Hadoop APIs
• Enforces consistent security policies across all Hadoop services
• Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs
• Complies with Common Criteria EAL4+, HSM, FIPS 140-2 certifications
• Deploys as software, virtual appliance, or hardware appliance
[Diagram labels: Firewall; Containment; Authn; RBAC; Encryption; REST APIs (HCatalog, Stargate, WebHDFS)]
Kerberos authenticates Hadoop services
• Wizard enables setup of secure cluster with encrypted key exchange
• Manager generates principal and keytab for Hadoop services
• Manager enables batch upload of keytab files
[Diagram: ticket exchange involving the APIs, the KDC, and Intel Manager]
1. Request ticket
2. Send service ticket
3. Request service
4. Validate ticket
5. Send response
[Other diagram labels: Firewall; Containment; Authentication; Encryption]
Intel Manager simplifies role-based access control

• File, table, and service-level controls
• Intel Manager pushes ACLs to each node
[Diagram labels: Firewall; AuthZ]
Intel Distribution provides HDFS encryption

• Extends compression codec into crypto codec
• Provides an abstract API for general use
[Diagram: encryption and decryption points in a MapReduce job. Labels in order: HDFS; Derivative; Decrypt; MapReduce (RecordReader, Map, Combiner, Partitioner); Encrypt; Merge & Sort; Reduce; Decrypt; Derivative; Encrypt; RecordWriter; Local. Sidebar labels: Firewall; RBAC]
Crypto Codec Framework
• Extends the compression codec and establishes a common API-level
abstraction that is shared by all crypto codec implementations and by
the code that uses them
CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(codecClass, conf);
CryptoContext cryptoContext = new CryptoContext();
...
cryptoCodec.setCryptoContext(cryptoContext);
CompressionInputStream input = cryptoCodec.createInputStream(inputStream);
...

• Provides a foundation for other components in Hadoop* such as
MapReduce or HBase* to support encryption features
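To make the "codec that wraps a stream" idea concrete, here is a minimal sketch that uses only the JDK's javax.crypto classes. It is not the Intel Distribution CryptoCodec API: the class name StreamCryptoSketch, the choice of AES in CTR mode, and the way the key and IV are supplied are all assumptions made for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical stand-in for a crypto codec: wraps streams the way a
// compression codec does, but encrypts instead of compressing.
final class StreamCryptoSketch {
    private final SecretKeySpec key;
    private final IvParameterSpec iv;

    StreamCryptoSketch(byte[] keyBytes, byte[] ivBytes) {
        this.key = new SecretKeySpec(keyBytes, "AES");
        this.iv = new IvParameterSpec(ivBytes);
    }

    private Cipher cipher(int mode) throws GeneralSecurityException {
        // Assumption: AES/CTR, a stream-friendly mode that needs no padding.
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, key, iv);
        return c;
    }

    OutputStream createOutputStream(OutputStream out) throws GeneralSecurityException {
        return new CipherOutputStream(out, cipher(Cipher.ENCRYPT_MODE));
    }

    InputStream createInputStream(InputStream in) throws GeneralSecurityException {
        return new CipherInputStream(in, cipher(Cipher.DECRYPT_MODE));
    }

    byte[] encrypt(byte[] plain) throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = createOutputStream(sink)) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    byte[] decrypt(byte[] encrypted) throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = createInputStream(new ByteArrayInputStream(encrypted))) {
            byte[] buf = new byte[4096];
            for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                sink.write(buf, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        SecureRandom random = new SecureRandom();
        random.nextBytes(key);
        random.nextBytes(iv); // a real system must never reuse an IV with the same key
        StreamCryptoSketch codec = new StreamCryptoSketch(key, iv);
        byte[] plain = "sensitive record".getBytes(StandardCharsets.UTF_8);
        byte[] restored = codec.decrypt(codec.encrypt(plain));
        System.out.println("round trip ok: " + Arrays.equals(plain, restored));
    }
}
```

The design point this illustrates is the one the slide makes: callers keep working with ordinary input and output streams, and the key material travels separately in a context object.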
Crypto Codec Framework: Class Hierarchy
[Class diagram]
• Compression interfaces: Compressor, CompressionCodec, Decompressor
• Crypto interfaces: Encryptor, CryptoCodec, Decryptor
• Classes: CryptoContext, Key, KeyProfile
• Other interfaces: KeyProfileResolver, KeyProvider
• Three associations in the diagram carry the multiplicity 0..1
Crypto Codec: API Example
Usage mirrors the compression codec API, with the addition of a crypto context that carries the key:
Configuration conf = new Configuration();
CryptoCodec cryptoCodec =
(CryptoCodec) ReflectionUtils.newInstance(AESCodec.class, conf);
CryptoContext cryptoContext = new CryptoContext();
cryptoContext.setKey(Key.derive(password));
cryptoCodec.setCryptoContext(cryptoContext);
DataInputStream input = inputFile.getFileSystem(conf).open(inputFile);
DataOutputStream outputStream = outputFile.getFileSystem(conf).create(outputFile);
CompressionOutputStream output = cryptoCodec.createOutputStream(outputStream);
// encrypt the stream
writeStream(input, output);
input.close();
output.close();
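The example above obtains its key with Key.derive(password), whose internals the slides do not show. As a hedged illustration of what password-based key derivation typically involves, the sketch below uses the JDK's PBKDF2 implementation. The class name PasswordKeySketch, the salt handling, and the iteration count are assumptions, not a description of the Intel API.

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical helper: derives a 128-bit AES key from a password with PBKDF2.
final class PasswordKeySketch {

    static SecretKeySpec derive(char[] password, byte[] salt, int iterations)
            throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 128);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            byte[] raw = factory.generateSecret(spec).getEncoded();
            return new SecretKeySpec(raw, "AES");
        } finally {
            spec.clearPassword(); // wipes the spec's internal copy of the password
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt); // the salt must be stored to re-derive the key
        SecretKeySpec key = derive("correct horse".toCharArray(), salt, 100_000);
        System.out.println("derived " + (key.getEncoded().length * 8) + "-bit " + key.getAlgorithm() + " key");
    }
}
```

The same password, salt, and iteration count always yield the same key, which is what lets a reader of the encrypted file reconstruct it later.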
Crypto Codec: A Simple MapReduce Example
Usage mirrors compression codec usage in a MapReduce job, with the
addition of key context resolution:
Job job = Job.getInstance(conf, "example");
JobConf jobConf = (JobConf)job.getConfiguration();
FileMatches fileMatches = new FileMatches(
KeyContext.refer("KEY00", Key.KeyType.SYMMETRIC_KEY, "AES", 128));
fileMatches.addMatch("^.*/input1.intelaes$",
KeyContext.refer("KEY01", Key.KeyType.SYMMETRIC_KEY, "AES", 128));
String keyStoreFile = "file:///" + secureDir + "/my.keystore";
String keyStorePasswordFile = "file:///" + secureDir + "/my.keystore.passwords";
KeyProviderConfig keyProviderConfig =
KeyProviderCryptoContextProvider.getKeyStoreKeyProviderConfig(
keyStoreFile, "JCEKS", null, keyStorePasswordFile, true);
KeyProviderCryptoContextProvider.setInputCryptoContextProvider(
jobConf, fileMatches, true, keyProviderConfig);
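In the example above, FileMatches pairs a default key reference with regular expressions that select a different key for matching files. The matching semantics are not spelled out on the slide, so the following is only a sketch of one plausible reading: rules are tried in insertion order, the whole path must match, and the default applies otherwise. KeyMatchSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of per-file key selection by path pattern.
final class KeyMatchSketch {

    private static final class Rule {
        final Pattern pattern;
        final String keyName;

        Rule(Pattern pattern, String keyName) {
            this.pattern = pattern;
            this.keyName = keyName;
        }
    }

    private final String defaultKeyName;
    private final List<Rule> rules = new ArrayList<>();

    KeyMatchSketch(String defaultKeyName) {
        this.defaultKeyName = defaultKeyName;
    }

    KeyMatchSketch addMatch(String regex, String keyName) {
        rules.add(new Rule(Pattern.compile(regex), keyName));
        return this;
    }

    /** Returns the key name of the first rule whose pattern matches the whole path. */
    String resolve(String path) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(path).matches()) {
                return rule.keyName;
            }
        }
        return defaultKeyName;
    }

    public static void main(String[] args) {
        KeyMatchSketch matches = new KeyMatchSketch("KEY00")
                .addMatch("^.*/input1.intelaes$", "KEY01");
        System.out.println(matches.resolve("/data/input1.intelaes"));
        System.out.println(matches.resolve("/data/input2.intelaes"));
    }
}
```

Resolving key names rather than key bytes keeps the job configuration free of secrets; the actual key lookup is left to the key provider.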
Key Distribution and Protection for MapReduce
• Targets
– A framework on the MapReduce side that enables the crypto codec in
MapReduce jobs, covering key context resolution, distribution, and protection
– Letting different key storage or key management systems plug in to
provide keys
– Meeting the common requirement that different stages and files of a
single job may use different keys

• A complete key management system is not part of Intel®
Distribution for Apache Hadoop* software
– An API to integrate with an external key management system is included
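A pluggable key source of the kind described above can be pictured as a small interface with interchangeable back ends. The sketch below backs that interface with a JDK JCEKS keystore held in memory, since JCEKS is the keystore type named in the MapReduce example. The names KeySource and KeyStoreKeySource are hypothetical; the integration API shipped with the distribution is not shown in the slides.

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.util.Arrays;

// Hypothetical plug-in point: anything that can return a key by name.
interface KeySource {
    SecretKey getKey(String name) throws GeneralSecurityException;
}

// One possible back end: a password-protected JCEKS keystore.
final class KeyStoreKeySource implements KeySource {
    private final KeyStore store;
    private final char[] password;

    private KeyStoreKeySource(KeyStore store, char[] password) {
        this.store = store;
        this.password = password.clone();
    }

    /** Creates an empty in-memory keystore (no file is read or written). */
    static KeyStoreKeySource inMemory(char[] password)
            throws GeneralSecurityException, IOException {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, password); // a null stream initializes an empty keystore
        return new KeyStoreKeySource(ks, password);
    }

    void put(String name, SecretKey key) throws GeneralSecurityException {
        store.setEntry(name, new KeyStore.SecretKeyEntry(key),
                new KeyStore.PasswordProtection(password));
    }

    @Override
    public SecretKey getKey(String name) throws GeneralSecurityException {
        Key key = store.getKey(name, password);
        if (!(key instanceof SecretKey)) {
            throw new KeyStoreException("no secret key named " + name);
        }
        return (SecretKey) key;
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey original = generator.generateKey();

        KeyStoreKeySource source = inMemory("changeit".toCharArray());
        source.put("KEY01", original);
        SecretKey loaded = source.getKey("KEY01");
        System.out.println("same key: " + Arrays.equals(original.getEncoded(), loaded.getEncoded()));
    }
}
```

Swapping in an external key manager would then mean writing another KeySource implementation, with no change to the code that asks for keys by name.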
Secrets Distribution
[Diagram: Node A and Node B, each with several tasks and an IM Agent. Numbered steps 1 to 3 show the job credentials and data encryption key reaching each node, from shared storage or distributed in each node.]

IM Agent: Intel® Manager for Apache Hadoop* is a service resident in each cluster node.
Pig* & Hive* Encryption: Overview
[Diagram: Pig* and Hive* encryption flow]
• Client: Hive*, Pig*
• MapReduce: encrypted job input/output data in the HDFS* cluster; encrypted intermediate data on local disk
• Intel® Manager for Apache Hadoop* software: Secrets Protection Service (encrypted secrets, decrypt secrets)
• The master key is uploaded over HTTPS and is itself encrypted
Pig* & Hive* Encryption
• Pig* Encryption Capabilities
– Support of text file and Avro* file format
– Intermediate job output file protection
– Pluggable key retrieving and key resolving
– Protection of key distribution in cluster

• Hive* Encryption Capabilities
– Support of RC file and Avro file format
– Intermediate and final output data encryption
– Encryption is transparent to end user without changing existing SQL
HBase* Encryption
• Transparent table/CF encryption – HBase-7544
• Transparent encryption for ZooKeeper* commit log – ZooKeeper-1688
Crypto Software Optimization

Multi-Buffer
• Processes multiple independent data buffers in parallel
• Improves performance of cryptographic functions by 2-9X
Intel® Data Protection Technology
Advanced Encryption Standard New Instructions
(AES-NI)
• Processor assistance for performing AES encryption
• Makes enabled encryption software faster and stronger
• Data in Motion: Secure transactions used pervasively in ecommerce, banking, etc.
• Data at Rest: Full disk encryption software protects data while saving to disk
• Data in Process: Most enterprise and cloud applications offer encryption options to secure information and protect confidentiality
[Chart: AES-NI Accelerated Encryption. Series: Non Intel® AES-NI; With Intel® AES-NI; Intel® AES-NI Multi-Buffer. Other labels: Internet; Encryption; Decryption]
AES-NI: Advanced Encryption Standard New Instructions. See slide in backup for test environment.
hadoop.intel.com
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS
OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING
TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE,
MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU
PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS
SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS
COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT
LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS
SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or
instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising
from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go
to: http://www.intel.com/design/literature.htm
Intel, Xeon, Look Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright ©2013 Intel Corporation.
Legal Disclaimer
Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the instructions in
the correct sequence. AES-NI is available on select Intel® processors. For availability, consult your reseller or system manufacturer. For
more information, see Intel® Advanced Encryption Standard Instructions (AES-NI).
• Software Source Code Disclaimer: Any software source code reprinted in this document is furnished under a software license and may
only be used or copied in accordance with the terms of that license.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following
conditions:
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Risk Factors
The above statements and any others in this document that refer to plans and expectations for the third quarter, the year and the future are forward-looking
statements that involve a number of risks and uncertainties. Words such as “anticipates,” “expects,” “intends,” “plans,” “believes,” “seeks,” “estimates,” “may,” “will,”
“should” and their variations identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify
forward-looking statements. Many factors could affect Intel’s actual results, and variances from Intel’s current expectations regarding such factors could cause actual
results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be the important factors that could
cause actual results to differ materially from the company’s expectations. Demand could be different from Intel's expectations due to factors including changes in
business and economic conditions; customer acceptance of Intel’s and competitors’ products; supply constraints and other disruptions affecting customers; changes
in customer order patterns including order cancellations; and changes in the level of inventory at customers. Uncertainty in global economic and financial conditions
poses a risk that consumers and businesses may defer purchases in response to negative financial events, which could negatively affect product demand and other
related matters. Intel operates in intensely competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short
term and product demand that is highly variable and difficult to forecast. Revenue and the gross margin percentage are affected by the timing of Intel product
introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product offerings and introductions,
marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly to technological developments and to incorporate
new features into its products. The gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation,
including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the
manufacturing ramp and associated costs; start-up costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or
resources; product manufacturing quality/yields; and impairments of long-lived assets, including manufacturing, assembly/test and intangible assets. Intel's results
could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including
military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Expenses,
particularly certain marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel's
products and the level of revenue and profits. Intel’s results could be affected by the timing of closing of acquisitions and divestitures. Intel's results could be affected
by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual
property, stockholder, consumer, antitrust, disclosure and other issues, such as the litigation and regulatory matters described in Intel's SEC reports. An unfavorable
ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices,
impacting Intel’s ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. A detailed discussion of these and
other factors that could affect Intel’s results is included in Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings
release.

Rev. 7/17/13
We are sincerely eager to hear
your feedback on this
presentation and on re:Invent.
Please fill out an evaluation form
when you have a chance.

More Related Content

What's hot

SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...Splunk
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTCloudera, Inc.
 
Build and manage private and hybrid cloud
Build and manage private and hybrid cloudBuild and manage private and hybrid cloud
Build and manage private and hybrid cloudSyed Shaaf
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester WebinarCloudera, Inc.
 
Oracle database in cloud, dr in cloud and overview of oracle database 18c
Oracle database in cloud, dr in cloud and overview of oracle database 18cOracle database in cloud, dr in cloud and overview of oracle database 18c
Oracle database in cloud, dr in cloud and overview of oracle database 18cAiougVizagChapter
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingCloudera, Inc.
 
Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseCloudera, Inc.
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Cloudera, Inc.
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningCloudera, Inc.
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionCloudera, Inc.
 
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...DataStax
 
The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019Timothy Spann
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
 
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)Cloudera, Inc.
 
Self-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureSelf-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureCloudera, Inc.
 
The Big Picture: Learned Behaviors in Churn
The Big Picture: Learned Behaviors in ChurnThe Big Picture: Learned Behaviors in Churn
The Big Picture: Learned Behaviors in ChurnCloudera, Inc.
 
Transforms Document Management at Scale with Distributed Database Solution wi...
Transforms Document Management at Scale with Distributed Database Solution wi...Transforms Document Management at Scale with Distributed Database Solution wi...
Transforms Document Management at Scale with Distributed Database Solution wi...DataStax Academy
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataDataWorks Summit/Hadoop Summit
 
Microsoft azure infrastructure essentials course manual
Microsoft azure infrastructure essentials   course manualMicrosoft azure infrastructure essentials   course manual
Microsoft azure infrastructure essentials course manualmichaeldejene4
 

Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013

  • 1. Trusted Analytics as a Service Vin Sharma, Intel Corporation November 12, 2013
  • 2. Data-Driven discoveries depend on analytics Operational Efficiency Consumer Behavior Security & Risk Management Traffic Optimization Location Aware Ad Placement Personalized Preventive Care Smart Energy Grid Buyer Protection Program Claim Fraud Reduction
  • 3. Machine-generated data requires end-to-end analytics. Traditional Analytics (1990): descriptive analysis, business intelligence, and reporting; internally sourced, relatively small, structured data; analysts and quants huddled in back-rooms. Big Data Analytics (2000): interactive analysis, complex queries, and data-intensive models; fast and large amounts of poly-structured data from multiple sources; data scientists at the fore. End-to-End Analytics (2010): real-time analysis of streaming data from IoT; predictive and prescriptive analysis integrated into organizational processes; widespread access to tools.
  • 4. End-to-end analytics for the Internet of Things era. Verticals: help build lighthouse solutions for targeted verticals. Analytics Platform: enable horizontal platform for e2e analytics. Data Platform: accelerate evolution of Apache Hadoop. Servers, Storage, Network: catalyze architectural transitions to drive growth.
  • 5. End-to-end analytics needs software-defined infrastructure Processing Orchestration Compute API File System Security Scheduler Compliance Storage Service Assurance Datacenter Operating Systems Intelligent Workload Placement Network Composable Resource Pools Thermals Power Location Datacenter Facilities
  • 6. Apache Hadoop as a Datacenter Operating System API Hadoop, Storm, GraphLab, Spark, Shark, MPI Expressway Future NVM Memory Mgmt Process Mgmt Scheduler YARN + SLURM | Moab Future Fabric Controller I/O TXT, AES-NI Rhino Data Governance Security File Systems HDFS, LustreFS, GlusterFS, Ceph + Kafka
  • 7. Intel leadership in foundational technologies of big data. HPC: enabling technical computing on massive data sets. Cloud: helping organizations build open interoperable clouds. Open Source: contributing code and fostering ecosystem. Intel employs over 10,000 software developers. * Other names and brands may be claimed as the property of others.
  • 8. Hadoop in a virtualized infrastructure • Good – Agility: Lets you bring up and tear down resources quickly on demand. – Fault Tolerance: Protect against SPOF in Hadoop/HDFS (NN, JT, Zookeeper) and reduce downtime for planned updates. – Resource Efficiency: Run multiple Hadoop clusters or other applications – Security: Isolate clusters or nodes – Simpler management of datacenter • Bad – Performance hit of virtualization is indeterminate and hard to optimize – Storage configuration with SAN and NAS is very different from the disk attached storage of typical Hadoop – Nested virtualization with JVM in a VM is philosophically uncomfortable
  • 9. Hadoop in the cloud • Good – If your data is stored in a cloud provider's storage infrastructure, moving compute to data is logical. – If your analytics jobs are infrequent, you can rent the cluster only when you need it. – Isolation offers security. – Easy to use. Easy to expand. – Pay as you go. • Bad – Cost of storage rises at the rate of ingest and storage. – Cost of compute rises with cluster time. There is no "spare cluster time" for low priority work. – Hadoop makes assumptions about running in a fixed physical infrastructure.
  • 10. Deploying IDH on AWS • Use a hop machine to connect into the VPC (private network) for IDH. This is the only machine that allows inbound SSH connections from clients on the internet. You must SSH into the hop machine to gain access to machines in the VPC. • The hop machine hosts the aws_system scripts. • Although data may be retained on AWS, do not expect data to always be saved. Assume machines and data will be removed at any time. Save any needed data or results to another location.
  • 11. Deploying IDH on AWS createIDHCluster.sh • Picks a management node. This should be the first IP address in the list of IPs that you specify in the nodeips argument. • After the nodes are running, verifies it can SSH in as the root user on the management node and as either the root user or some other non-root user on the other nodes. • Checks that IDH is NOT installed on any of the nodes. If it cannot SSH in or IDH is installed, the script exits with a failure. • Copies over the IDH tarball and the idhscripts.tar to the management node. • On the management node, sets up the yum repository and installs intel manager. Then installs and configures IDH on all the nodes.
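The checks that createIDHCluster.sh runs before installing can be sketched as follows. This is a minimal illustration of the logic described on the slide, not the script itself: the class name PreflightSketch and the two predicates (canSsh, idhInstalled) are hypothetical stand-ins for the real SSH and installation checks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PreflightSketch {
    /**
     * Returns the management node (assumed to be the first IP in the list)
     * if every node is reachable over SSH and none already has IDH installed;
     * otherwise throws, mirroring "the script exits with a failure".
     */
    static String pickManagementNode(List<String> nodeIps,
                                     Predicate<String> canSsh,
                                     Predicate<String> idhInstalled) {
        if (nodeIps.isEmpty()) {
            throw new IllegalArgumentException("nodeips must not be empty");
        }
        for (String ip : nodeIps) {
            if (!canSsh.test(ip)) {
                throw new IllegalStateException("cannot SSH into " + ip);
            }
            if (idhInstalled.test(ip)) {
                throw new IllegalStateException("IDH already installed on " + ip);
            }
        }
        return nodeIps.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical addresses; the predicates stand in for real probes.
        List<String> ips = Arrays.asList("10.0.0.11", "10.0.0.12", "10.0.0.13");
        String mgmt = pickManagementNode(ips, ip -> true, ip -> false);
        System.out.println("management node: " + mgmt);
    }
}
```

Failing before anything is copied keeps a partially installed cluster from being left behind, which is presumably why the script checks all nodes first.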
  • 13. Why Intel Distribution for Apache Hadoop
  • 14. Intel® Distribution for Apache Hadoop* software Hardware-enhanced performance & security Enables partner innovation in analytics Strengthens Apache Hadoop* ecosystem Intel employs over 300 people developing and supporting big data software
  • 15. Hadoop Security and Compliance Challenges Data manipulation Log Data Collector Data flow (compiler, planner, driver) Giraph HCatalog Metadata Graph analysis framework HBase Coprocessors HBase Mahout Data mining YARN (MRv2) Data execution engine Flume Oozie Hive HiveQL Interactive Query R connectors Distributed Processing Framework Real-time Distributed BigTable HDFS 2.0 Hadoop Distributed File System statistics Coordination Pig Zookeeper Sqoop RDB Data Collector Hadoop is an ecosystem of loosely coupled components
  • 16. Hadoop Security and Compliance Challenges (same component diagram as slide 15): Components sharing an authentication framework
  • 17. Hadoop Security and Compliance Challenges (same component diagram as slide 15): Components capable of access control
  • 18. Hadoop Security and Compliance Challenges (same component diagram as slide 15): Components capable of admission control
  • 19. Hadoop Security and Compliance Challenges (same component diagram as slide 15): Components capable of (transparent) encryption
  • 20. Hadoop Security and Compliance Challenges (same component diagram as slide 15): Components sharing a common policy engine
  • 21. Hadoop Security and Compliance Challenges (same component diagram as slide 15): Components sharing a common audit log format
  • 22. Project Rhino • Strategic Objectives – Framework support for encryption and key management – Token based authentication and SSO for internal cluster services – Role-based access control for simpler administration of authorizations – A common authorization framework, optional but easy to adopt – Consistent audit logging, enhanced for compliance support • Current Projects – Develop crypto framework in Hadoop Common – Enable transparent encryption in HBase – Extend HBase support for ACLs to the cell level
  • 23. Intel Distribution: Security Connectors Netezza, Oracle, SAP, SQLServer, Teradata, DB2 Vertical Accelerators Behavior Model Recommendation Engine Analytics Workbench Heat Map HBase Explorer Oozie Workflow Zookeeper Coordination Lucene, Solr Tribeca Gryphon Search Graph Mining Low-latency SQL-92 Pig Scripting Mahout Machine Learning R Stats Hive Query Hcatalog Metadata YARN (+MapReduce) Distributed Processing Framework SLURM Scheduler Job Profiler Resource Monitor HBase Sqoop Data Transfer Flume Log Collector Kafka Event Bus Security Controls Upgrade Alerts Unified Logging HDFS | Lustre | GlusterFS Hadoop Compatible File Systems Tuning High Availability and Disaster Recovery Configuration Rhino (Security) [Encryption, Authentication, Authorization, Auditing] Deployment All external names and brands are claimed as the property of others. 23
  • 24. Enterprise data requires defense in depth Firewall Gateway Isolation Authn AuthZ Encryption Audit & Alerts
  • 25. Intel Expressway protects Hadoop APIs (diagram labels: Firewall, Containment, Authn, RBAC, Encryption; APIs shown: HCatalog, Stargate REST APIs, WebHDFS) • Enforces consistent security policies across all Hadoop services • Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs • Complies with Common Criteria EAL4+, HSM, FIPS 140-2 certifications • Deploys as software, virtual appliance, or hardware appliance
  • 26. Kerberos authenticates Hadoop services (diagram labels: Firewall, APIs, Authentication, Containment, Encryption, KDC, Intel Manager; five numbered steps: request ticket, send service ticket, request service, validate ticket, send response) • Wizard enables setup of cluster with secure encrypted key exchange • Manager generates principal and keytab for Hadoop services • Manager enables batch upload of keytab files
  • 27. Intel Manager simplifies role-based access control Firewall AuthZ • File, table, and service-level controls • Intel Manager pushes ACLs to each node
  • 28. Intel Distribution provides HDFS encryption • Extends compression codec into crypto codec • Provides an abstract API for general use (diagram labels: Firewall, RBAC, HDFS, Derivative, Decrypt, MapReduce, RecordReader, Map, Combiner, Partitioner, Encrypt, Merge & Sort, Reduce, Decrypt, Derivative, Encrypt, RecordWriter, Local)
  • 29. Crypto Codec Framework • Extends compression codec and establishes a common abstraction of the API level that can be shared by all crypto codec implementations as well as users that use the API CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(codecClass, conf); CryptoContext cryptoContext = new CryptoContext(); ... cryptoCodec.setCryptoContext(cryptoContext); CompressionInputStream input = cryptoCodec.createInputStream(inputStream); ... • Provides a foundation for other components in Hadoop* such as MapReduce or HBase* to support encryption features
  • 30. Crypto Codec Framework: Class Hierarchy <<Java Interface>> Compressor, CompressionCodec, Decompressor; <<Java Interface>> Encryptor, CryptoCodec, Decryptor; <<Java Class>> CryptoContext, with 0..1 associations to <<Java Class>> Key, <<Java Interface>> KeyProfileResolver, and <<Java Class>> KeyProfile; <<Java Interface>> KeyProvider
  • 31. Crypto Codec: API Example The usage is aligned with compression codec but with context supporting Configuration conf = new Configuration(); CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(AESCodec.class, conf); CryptoContext cryptoContext = new CryptoContext(); cryptoContext.setKey(Key.derive(password)); cryptoCodec.setCryptoContext(cryptoContext); DataInputStream input = inputFile.getFileSystem(conf).open(inputFile); DataOutputStream outputStream = outputFile.getFileSystem(conf).create(outputFile); CompressionOutputStream output = cryptoCodec.createOutputStream(outputStream); // encrypt the stream writeStream(input, output); input.close(); output.close();
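The stream-wrapping pattern on slide 31 (create a codec, attach key material, wrap the output stream, write through it) can be tried without the Intel distribution. The sketch below is an analogy built only on the standard javax.crypto API; it is not the CryptoCodec API from the slides. The class name StreamEncryptionSketch, the choice of AES/CTR/NoPadding, and the random key are all assumptions made for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

class StreamEncryptionSketch {
    // Assumed mode: CTR turns AES into a stream cipher, so no padding is needed.
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";

    /** Encrypts by writing through a wrapping stream, as createOutputStream does on the slide. */
    static byte[] encrypt(byte[] plain, SecretKey key, byte[] iv)
            throws GeneralSecurityException, IOException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new CipherOutputStream(sink, cipher)) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    /** Decrypts by reading through a wrapping stream. */
    static byte[] decrypt(byte[] encrypted, SecretKey key, byte[] iv)
            throws GeneralSecurityException, IOException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(new ByteArrayInputStream(encrypted), cipher)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] keyBytes = new byte[16];   // 128-bit AES key
        byte[] iv = new byte[16];         // AES block-sized IV for CTR mode
        random.nextBytes(keyBytes);
        random.nextBytes(iv);
        SecretKey key = new SecretKeySpec(keyBytes, "AES");

        byte[] message = "sensitive record".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decrypt(encrypt(message, key, iv), key, iv);
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```

Because the caller only sees an OutputStream, existing code that writes to a stream needs no changes, which is the same property the slide highlights by aligning the crypto codec with the compression codec.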
  • 32. Crypto Codec: A Simple MapReduce Example The usage is aligned with compression codec usage in MapReduce job but with context resolving Job job = Job.getInstance(conf, "example"); JobConf jobConf = (JobConf)job.getConfiguration(); FileMatches fileMatches = new FileMatches( KeyContext.refer("KEY00", Key.KeyType.SYMMETRIC_KEY, "AES", 128)); fileMatches.addMatch("^.*/input1.intelaes$", KeyContext.refer("KEY01", Key.KeyType.SYMMETRIC_KEY, "AES", 128)); String keyStoreFile = "file:///" + secureDir + "/my.keystore"; String keyStorePasswordFile = "file:///" + secureDir + "/my.keystore.passwords"; KeyProviderConfig keyProviderConfig = KeyProviderCryptoContextProvider.getKeyStoreKeyProviderConfig( keyStoreFile, "JCEKS", null, keyStorePasswordFile, true); KeyProviderCryptoContextProvider.setInputCryptoContextProvider( jobConf, fileMatches, true, keyProviderConfig);
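The FileMatches object on slide 32 appears to map input paths to key references, with a default key for paths that match no pattern. A minimal sketch of that idea follows. The class KeyMatchSketch, its first-match-wins rule, and the use of plain key names instead of KeyContext objects are assumptions for illustration; the real FileMatches semantics are not stated on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class KeyMatchSketch {
    /** One rule: paths matching the pattern use the named key. */
    private static final class Rule {
        final Pattern pattern;
        final String keyName;

        Rule(Pattern pattern, String keyName) {
            this.pattern = pattern;
            this.keyName = keyName;
        }
    }

    private final String defaultKeyName;
    private final List<Rule> rules = new ArrayList<>();

    KeyMatchSketch(String defaultKeyName) {
        this.defaultKeyName = defaultKeyName;
    }

    /** Registers a regex; rules are evaluated in insertion order (assumed). */
    KeyMatchSketch addMatch(String regex, String keyName) {
        rules.add(new Rule(Pattern.compile(regex), keyName));
        return this;
    }

    /** Returns the key name for a path: first matching rule, else the default. */
    String resolve(String path) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(path).matches()) {
                return rule.keyName;
            }
        }
        return defaultKeyName;
    }

    public static void main(String[] args) {
        KeyMatchSketch matches = new KeyMatchSketch("KEY00")
                .addMatch("^.*/input1.intelaes$", "KEY01");
        System.out.println(matches.resolve("/data/input1.intelaes"));
        System.out.println(matches.resolve("/data/input2.intelaes"));
    }
}
```

Resolving keys by path is what lets different files of a single job use different keys without changing the job code.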
  • 33. Key Distribution and Protection for MapReduce • Targets – A framework on the MapReduce side for enabling the crypto codec in MapReduce jobs, covering key context resolution, distribution, and protection – Enabling different key storage or management systems to plug in as key providers – Satisfying the common requirement that different stages and files of a single job may use different keys • A complete key management system is not part of Intel® Distribution for Apache Hadoop* software – An API to integrate with an external key management system is included
  • 34. Secrets Distribution Node A Node B task 2 task IM Agent task 1 Job credentials & data encryption key task 3 task task IM Agent task Job credentials & data encryption key task Shared storage or distributed in each node IM Agent: Intel® Manager for Apache Hadoop* is a service resident in each cluster node.
  • 35. Pig* & Hive* Encryption: Overview Intel Client MapReduce Encrypted Job input/output data HDFS* Cluster https for uploading master key Master key is also encrypted Local Disk Encrypted secrets Decrypt secrets Encrypted secrets Encrypted Intermediate data Intel® Manager for Apache Hadoop* software Hive* Secrets Protection Service Pig*
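Slides 34 and 35 describe data encryption keys travelling to nodes as encrypted secrets protected by a master key. The slides do not say which mechanism is used, so the following is only a sketch of the general envelope pattern using the standard JCE "AESWrap" key-wrap algorithm; the class KeyWrapSketch and its method names are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.GeneralSecurityException;
import java.util.Arrays;

class KeyWrapSketch {
    /** Generates a fresh 128-bit AES key. */
    static SecretKey newAesKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    /** Encrypts (wraps) the data key under the master key so it can be stored or shipped. */
    static byte[] wrap(SecretKey masterKey, SecretKey dataKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, masterKey);
        return cipher.wrap(dataKey);
    }

    /** Recovers the data key on a node that holds the master key. */
    static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    public static void main(String[] args) throws Exception {
        SecretKey masterKey = newAesKey();
        SecretKey dataKey = newAesKey();

        byte[] secret = wrap(masterKey, dataKey);          // what would be distributed
        SecretKey recovered = unwrap(masterKey, secret);   // what a task would use

        System.out.println("wrapped length: " + secret.length);
        System.out.println("keys match: "
                + Arrays.equals(dataKey.getEncoded(), recovered.getEncoded()));
    }
}
```

With this pattern only the wrapped bytes need to sit on shared storage or local disk; the master key itself must still be protected separately, which is the role the slides assign to the secrets protection service.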
  • 36. Pig* & Hive* Encryption • Pig* Encryption Capabilities – Support of text file and Avro* file format – Intermediate job output file protection – Pluggable key retrieving and key resolving – Protection of key distribution in cluster • Hive* Encryption Capabilities – Support of RC file and Avro file format – Intermediate and final output data encryption – Encryption is transparent to end user without changing existing SQL
  • 37. HBase* Encryption • Transparent table/CF encryption – HBASE-7544 • Transparent encryption for ZooKeeper* commit log – ZOOKEEPER-1688
  • 38. Crypto Software Optimization: Multi-Buffer • Processes multiple independent data buffers in parallel • Improves cryptographic performance by 2-9X
  • 39. Intel® Data Protection Technology: Advanced Encryption Standard New Instructions (AES-NI) • Processor assistance for performing AES encryption • Makes enabled encryption software faster and stronger • Data in Motion: secure transactions used pervasively in ecommerce, banking, etc. • Data at Rest: full disk encryption software protects data while saving to disk • Data in Process: most enterprise and cloud applications offer encryption options to secure information and protect confidentiality
  • 40. AES-NI Accelerated Encryption (chart: encryption and decryption compared for Non Intel® AES-NI, With Intel® AES-NI, and Intel® AES-NI Multi-Buffer). AES-NI: Advanced Encryption Standard New Instructions. See slide in backup for test environment.
  • 42. Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm Intel, Xeon, Look Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright ©2013 Intel Corporation.
  • 43. Legal Disclaimer Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. AES-NI is available on select Intel® processors. For availability, consult your reseller or system manufacturer. For more information, see Intel® Advanced Encryption Standard Instructions (AES-NI). • Software Source Code Disclaimer: Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. •
  • 44. Risk Factors The above statements and any others in this document that refer to plans and expectations for the third quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Words such as “anticipates,” “expects,” “intends,” “plans,” “believes,” “seeks,” “estimates,” “may,” “will,” “should” and their variations identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel’s actual results, and variances from Intel’s current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be the important factors that could cause actual results to differ materially from the company’s expectations. Demand could be different from Intel's expectations due to factors including changes in business and economic conditions; customer acceptance of Intel’s and competitors’ products; supply constraints and other disruptions affecting customers; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers. Uncertainty in global economic and financial conditions poses a risk that consumers and businesses may defer purchases in response to negative financial events, which could negatively affect product demand and other related matters. Intel operates in intensely competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly variable and difficult to forecast. 
Revenue and the gross margin percentage are affected by the timing of Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product offerings and introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly to technological developments and to incorporate new features into its products. The gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; start-up costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; product manufacturing quality/yields; and impairments of long-lived assets, including manufacturing, assembly/test and intangible assets. Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Expenses, particularly certain marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel's products and the level of revenue and profits. Intel’s results could be affected by the timing of closing of acquisitions and divestitures. 
Intel's results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues, such as the litigation and regulatory matters described in Intel's SEC reports. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. A detailed discussion of these and other factors that could affect Intel’s results is included in Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release. Rev. 7/17/13
  • 45. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.