EMC WHITE PAPER
THE EMC ISILON SCALE-OUT
DATA LAKE
Key capabilities
ABSTRACT
This white paper provides an introduction to the EMC Isilon scale-out data lake as the
key enabler to store, manage, and protect unstructured data for traditional and
emerging workloads. Business decision makers and architects can leverage the
information provided here to make key strategy and implementation decisions for
their storage infrastructure.
December 2014
Copyright © 2015 EMC Corporation. All Rights Reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without
notice.
The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with
respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a
particular purpose.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
All other trademarks used herein are the property of their respective owners.
EMC², EMC, the EMC logo, Isilon, InsightIQ, OneFS, SmartConnect, SmartLock, SmartPools, SmartQuotas, SnapshotIQ, SyncIQ, and
Vblock are registered trademarks or trademarks of EMC Corporation in the United States and other countries.
Part Number H13172
TABLE OF CONTENTS
EXECUTIVE SUMMARY
Audience
Terminology
OVERVIEW
DATA FLOW
Ingest
Storage
Analysis
Application: Surface and Act
BUSINESS CHALLENGES
Silos or islands of data
Inefficiencies across the system
Security and compliance
Time to insights
DATA LAKE
ISILON SCALE-OUT DATA LAKE
Multiple access methods
Cost efficiency
Reduce risks
Protect and secure data assets
Faster time to insights
STORAGE AND DATA SERVICES
CONSUMPTION MODELS
CONCLUSION
REFERENCES
ABOUT EMC
EXECUTIVE SUMMARY
Data is growing rapidly as the number of people who use digital devices to interact with systems and networks grows across the globe. A
majority of this data growth is unstructured and, as organizations are discovering, contains valuable insights that could
be used to improve business results. The technology to capture, store, and analyze this data is maturing rapidly as enterprises
look for ways to handle the data growth effectively. By some industry estimates, over 80 percent of the new storage capacity
deployed in organizations around the world will be for unstructured data.
EMC® Isilon® scale-out network-attached storage (NAS) provides a simple, scalable, and efficient platform to store massive amounts
of unstructured data and enable various applications to create a scalable and accessible data repository without the overhead
associated with traditional storage systems. Isilon enables organizations to build a scale-out data lake where they can store their
current data, and scale capacity, performance, or protection as their business data grows in the future. The scale-out data lake helps
lower storage costs by efficient storage utilization, eliminate islands or silos of storage, and lower storage management costs of
migration, security and protection.
EMC Isilon OneFS®, the intelligence behind Isilon systems, is the industry’s leading scale-out NAS operating system, which is known
for its massive capacity, operational simplicity, extreme performance, and unmatched storage utilization. With proven enterprise-
grade protection and security capabilities, Isilon is an ideal platform to meet a wide variety of storage needs.
AUDIENCE
This white paper is intended for business decision makers, IT managers, architects, and implementers. By leveraging the Isilon
scale-out data lake, a key enabler for storing and managing massive quantities of unstructured data, enterprises can build their
storage strategies and implement their infrastructure to maximize the return on their IT investments.
TERMINOLOGY
The acronyms used in this paper are summarized in Table 1.
Table 1. Acronyms used in this paper
ACRONYM   DESCRIPTION
CIFS      Common Internet File System
DAS       direct-attached storage
HDFS      Hadoop Distributed File System
LAN       local area network
NAS       network-attached storage
NFS       Network File System
SAN       storage area network
SMB       Server Message Block
OVERVIEW
According to the IDC study “The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things,”
sponsored by EMC, the digital universe will grow from 4.4 zettabytes in 2013 to 44 zettabytes in 2020, roughly doubling
in size every two years (one zettabyte is 1 trillion gigabytes). Although enterprises create only 30 percent of this data (1.5 ZB in 2013), they come in contact with 85
percent (2.3 ZB in 2013) of it, and have some liability associated with the data. IDC further estimates that 22 percent of the data is
a candidate for analysis and contains valuable information that organizations can use to make critical decisions. Figure 1 shows this
trend for enterprise data.
Figure 1. Enterprise data (see footnote 1)
Not all of this data is stored by the enterprises in the datacenter for cost, liability or process reasons.
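The growth figures quoted in the overview can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes simple compound growth between 2013 and 2020 (seven years), which the study may or may not model exactly.

```python
import math


def projected_size(start_zb: float, years: float, doubling_years: float = 2.0) -> float:
    """Size after `years` if the total doubles every `doubling_years`."""
    return start_zb * 2 ** (years / doubling_years)


def implied_doubling_time(start_zb: float, end_zb: float, years: float) -> float:
    """Doubling period implied by growing from start_zb to end_zb over `years`."""
    return years * math.log(2) / math.log(end_zb / start_zb)


if __name__ == "__main__":
    # Doubling every two years from 4.4 ZB, projected over seven years.
    print(f"{projected_size(4.4, 7):.1f} ZB")
    # Doubling period that would take 4.4 ZB to exactly 44 ZB in seven years.
    print(f"{implied_doubling_time(4.4, 44, 7):.2f} years")
```

Under these assumptions, a tenfold increase over seven years corresponds to a doubling period of slightly more than two years, so the two statements in the overview are consistent as a rounded rule of thumb.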
Isilon scale-out NAS provides key capabilities for building a data lake that physically decouples compute from storage without affecting
the seamless nature of the data flow outlined below. You can leverage a data lake to get the most value from your data. Data from a
variety of sources can be converged using native protocols, then protected, secured, and exposed to analytics systems that drive value.
Organizations can thus plan, implement, and manage loosely coupled but seamless storage and applications.
DATA FLOW
Organizational data typically follows a linear flow: it starts at various consumer and corporate sources, is ingested into a
store, and is then analyzed and surfaced for actions that create value for the organization, as shown in Figure 2. The five distinct
stages are typically grouped into three broad categories (data, analytics, and application) for easy reference from a holistic point
of view.
Figure 2. Data flow
1. EMC Digital Universe Study, with Research and Analysis by IDC, http://www.emc.com/leadership/digital-universe/index.htm
The following sections detail each stage of the data flow and its implications. The choices made at each stage have a profound
impact on the value an organization derives from its data and, ultimately, from the data store.
INGEST
Data ingestion is the process of obtaining and processing data for later use by storing it in the most appropriate system for the
application. An effective ingestion methodology validates the data, prioritizes the sources, and commits data to storage with
reasonable speed and efficiency. The velocity, volume, and variety of data tax the capture speed, throughput, and efficiency of the ingest systems
and therefore of the store, as discussed in the following section. The ingest process can become the starting point of a storage silo if the
organization's planners do not look at the dataset holistically.
High-velocity data, characterized by a continuous feed, requires a specialized ingestion mechanism that typically captures the
continuous stream and commits the records to storage in batches of varying sizes for further processing. Capture strategies come in
two flavors: commit records to storage directly, or use higher-speed buffers as an intermediate step before storing to a more
persistent downstream system. Examples of high-velocity data include clickstreams and video surveillance feeds.
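The batching step described above can be sketched in a few lines. This is a minimal illustration rather than any product's ingestion mechanism: the function name `batch_records` and the limits `max_records` and `max_bytes` are assumptions introduced for the example, and a real capture system would typically also flush on a timer.

```python
from typing import Iterable, Iterator, List


def batch_records(
    records: Iterable[bytes],
    max_records: int = 1000,
    max_bytes: int = 1_000_000,
) -> Iterator[List[bytes]]:
    """Group a continuous stream into batches bounded by record count and total size."""
    batch: List[bytes] = []
    size = 0
    for record in records:
        # Flush first if adding this record would exceed the byte budget.
        if batch and size + len(record) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += len(record)
        # Flush as soon as the batch reaches the record limit.
        if len(batch) >= max_records:
            yield batch
            batch, size = [], 0
    if batch:
        # Commit whatever remains when the stream ends.
        yield batch


if __name__ == "__main__":
    clicks = (f"click-{i}".encode() for i in range(5))
    for batch in batch_records(clicks, max_records=2):
        print(len(batch), "records committed")
```

Each yielded batch stands in for one commit to storage. The buffered flavor of capture mentioned above would place a faster intermediate store between the stream and this commit step.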
The volume of data is a function not only of velocity but also of the size of data sets generated in batches by non-streaming sources.
High-volume data typically takes time to move from one point in the system to another, depending on the network speed, and
places burdens on both the network and storage. Optimizing for volume, as with large files, can affect velocity and vice versa.
High-definition video and audio files, as distinct from streams, are typical examples of high-volume data.
Variety poses translation challenges at the data, file, and protocol levels, because disparate systems may need varying levels of
translation before their data is of value to downstream processes. A combination of customer relationship management (CRM),
line-of-business (LOB), social media, web, and mobile information on customers is a good example of data variety.
STORAGE
How data is stored is typically dictated by the storage strategy (block or file), the flow mechanism, and the application. Over
the years, storage has evolved into an optimization problem between cost and performance. As the volume and variety of
data grow, the cost to reliably store, manage, secure, and protect it grows as well, particularly for data that is subject to compliance
and regulatory mandates, such as personally identifiable information (PII), financial transactions, and medical records. Adding
availability requirements adds cost pressure at the expense of performance, and vice versa.
The segregation started by the choice of ingestion system is further exacerbated by storage, bringing about true silos or islands of
information that cater to the various application requirements of real-time, interactive, or batch processing. As silos permeate the IT
infrastructure, hot spots arise in heavily used systems while capacity goes unused elsewhere. Downstream applications,
compliance, security, and data policies can create further silos within silos, giving rise to a very complex system.
ANALYSIS
Data analysis technologies load, process, surface, secure, and manage data in ways that enable organizations to mine value from
their data. Traditional data analysis systems are expensive, and extending them beyond their critical purposes can place a heavy
burden on IT resources and costs. Integrating data from disparate sources adds complexity and management overhead that can be a
deterrent to most organizations looking to derive value from their data.
APPLICATION: SURFACE AND ACT
Post analysis, results and insights have to be surfaced for actions like e-discovery, post mortem analysis, business process
improvements, decision making or a host of other applications. Traditional systems use traditional protocols and access mechanisms
while new and emerging systems are redefining access requirements to data already stored within an organization. A system is not
complete unless it caters to the range of requirements placed by traditional and next-generation workloads, systems and processes.
BUSINESS CHALLENGES
As organizations face large datasets and growth in data and applications, they observe a tremendous increase in both capital (CAPEX) and
operating (OPEX) costs, limitations in their ability to realize the value of the data, and lapses in protection and compliance, to name a few. These
challenges can be attributed to a combination of factors throughout the data flow process, but they fall into the following broad categories:
SILOS OR ISLANDS OF DATA
The ingestion strategy employed for real-time, interactive, or batch applications creates a silo that diverges further as the data flows
downstream. For example, an organization might set up a combination of SAN (to buffer the data stream) and NAS (for persistent
storage) for a real-time application like customer targeting, while elsewhere CRM analysis happens on a combination of DAS and
cloud. A third silo, consisting of archived data, uses a combination of DAS and NAS to run batch analytics. This is a typical scenario in
organizations today.
In the first silo, the compliance requirement for handling PII can force administrators to choose between applying a policy to the
entire silo and carving out a protected zone, if the technology supports it. This adds complexity to the system even before
access control for financial data (to meet SEC requirements) or medical data (to meet HIPAA requirements) is included. If the design of the system does not support these
protections, organizations risk falling out of compliance, with large fines at a minimum and, at the other extreme, liability for data leaks that lead to massive
lawsuits.
Figure 3. Silos or islands of data
Silos of data also add cost pressure through the inefficiencies addressed in the following section, lock insights within silos at
the expense of the business, and lead to inconsistent management and protection. In many organizations, applying learnings from a batch
processing system to a real-time system may involve lengthy change management processes, creating friction across departments
or adding barriers to value generation.
INEFFICIENCIES ACROSS THE SYSTEM
If one dataset of, say, 10 terabytes is used for three different analyses by three systems, you will have a minimum of three copies
requiring three times the storage capacity. If this dataset grows by 30 percent a year, each copy grows by 3 terabytes, so the three copies
together need 9 terabytes of new capacity in the first year, which is 90 percent of the size of the original dataset. This
demonstrates the inefficiencies organizations experience at the most basic level. If one silo has lower utilization than another,
hot spots arise in the system while capacity goes unutilized elsewhere. Budget is also used inefficiently when low-value
data resides on high-cost, high-performance storage.
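The copy arithmetic in this example can be worked through explicitly. The sketch below assumes simple compound growth and that every silo keeps a full copy of the dataset. The function name is hypothetical, and the figures are the ones used in the paragraph above.

```python
def total_capacity_tb(dataset_tb: float, copies: int, annual_growth: float, years: int) -> float:
    """Capacity needed when `copies` full copies of a dataset each grow at `annual_growth` per year."""
    return dataset_tb * copies * (1 + annual_growth) ** years


if __name__ == "__main__":
    siloed_now = total_capacity_tb(10, copies=3, annual_growth=0.30, years=0)
    siloed_next = total_capacity_tb(10, copies=3, annual_growth=0.30, years=1)
    shared_now = total_capacity_tb(10, copies=1, annual_growth=0.30, years=0)
    shared_next = total_capacity_tb(10, copies=1, annual_growth=0.30, years=1)

    # New capacity required in the first year, with and without silos.
    print(f"three silos: +{siloed_next - siloed_now:.0f} TB")
    print(f"one shared copy: +{shared_next - shared_now:.0f} TB")
```

With three silos, the first year of growth requires three times the new capacity that a single shared copy would need, before any protection overhead is counted.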
Organizations are faced with management, datacenter footprint, power and cooling inefficiencies due to silos and hotspots in addition
to reconfigurations, migrations and complicated maintenance activities.
SECURITY AND COMPLIANCE
Organizations with silos face duplicated and inconsistent application of policies, security measures, and governance,
whereas organizations with shared infrastructure face access control violations. Working around these issues is typically time-consuming and
painful, and it diverts resources away from standard procedures. Securing data against leaks or destruction, whether
accidental or malicious, presents a separate set of challenges. These challenges are compounded in regulated sectors like finance,
healthcare, and government.
TIME TO INSIGHTS
More users than ever before are mobile or geographically dispersed, and they need access to a larger dataset as a basic requirement to perform
their duties effectively. Traditional systems operated within the confines of corporate IT through regulated and measured
interconnections with the external world. With the growth of cloud, mobility, and devices, the capabilities of traditional systems are
being challenged in new ways, increasing the time required to access, process, and consume insights.
Figure 4. Enterprise storage challenges
Providing access while enforcing policies consistently across a wide range of approved and unapproved devices and platforms is a
growing challenge for today’s storage administrators. Traditional systems, silos and implementation strategies add latencies as
protection and security layers are added around the application or sources of information.
DATA LAKE
The data lake represents a paradigm shift from the linear data flow model. As data and the insights gleaned from it increase in value,
enterprise-wide consolidated storage is transformed into a hub around which the ingestion and consumption systems work (see
Figure 5). This enables enterprises to bring analytics to the data and avoid the cost of multiple systems, additional storage, and the time spent on
ingestion and analysis.
Figure 5. Scale-out data lake
By eliminating a number of parallel linear data flows, enterprises can consolidate vast amounts of their data into a single store, a
data lake, through a native and simple ingestion process. This data can be secured, analysis performed, insights surfaced, and
actions taken in an iterative manner as the organization and technology mature. Enterprises can thus eliminate the cost of having
silos or islands of information spread across their enterprises.
The scale-out data lake further enhances this paradigm by providing scaling capabilities in terms of capacity, performance, security,
and protection. The key characteristics of a scale-out data lake are that it:
• Accepts data from a variety of sources like file shares, archives, web applications, devices, and the cloud, in both streaming and
batch processes
• Enables access to this data for a variety of uses from conventional purposes to next-gen mobile, analytics, and cloud
applications
• Secures and safeguards data with the appropriate level of protection from highly critical data like medical records, financial
transactions, credit card data, and PII to website logs and temporary data that might not require any security
• Scales to meet the demands of future consolidation and growth as technology evolves and new possibilities emerge for applying
data to gain competitive advantage in the marketplace
• Provides a tiering ability that enables organizations to manage their costs without setting up specialized infrastructures for cost
optimization
• Maintains simplicity, even at the petabyte scale
ISILON SCALE-OUT DATA LAKE
Isilon enhances the data lake concept by enriching your storage with improved cost efficiency, reduced risk, data protection,
security, compliance, and governance, while enabling you to get to insights faster. You can reduce the risks and operational expenses of your big data project
implementation, and try out pilot projects on real business data before investing in a solution that meets your
exact business needs. Isilon is based on a fully distributed architecture that consists of modular hardware nodes arranged in a
cluster. As nodes are added, the file system expands dynamically, scaling out capacity and performance without adding corresponding
administrative overhead.
Multiple access methods
Isilon natively supports multiple protocols, such as SMB, NFS, File Transfer Protocol (FTP), and Hypertext Transfer Protocol (HTTP) for
traditional workloads, and HDFS for emerging workloads like Hadoop analytics. By its very nature, shared storage with multiple
access methods eliminates storage silos, bringing efficiency and consistency to your IT infrastructure. This enables batch, real-time,
or interactive applications and systems to store and access data in one shared storage pool without the need for any migration,
loading, or conversion. Access to data for reading or writing is achieved at the protocol level. This means that data can be
created by any of a myriad of systems, ingested into the data lake using a natively supported protocol such as SMB for Windows or
Mac, and accessed or modified seamlessly using another protocol like NFS, FTP, or HDFS.
Multiprotocol support enables the storage infrastructure to provide access to applications that leverage third platform protocols like
HTTP or HDFS, which drive emerging workloads. Adding future protocol support, data access mechanisms, or interfaces can be easily
achieved to scale the interoperability of the data lake. Without a data lake, interoperability would require an expensive and time-
consuming sequence of operations on data across multiple silos, or even costly and inefficient data duplication.
Cost efficiency
You can achieve significant cost efficiencies by investing only in the storage capacity and capability required today, which is smaller than traditional or
Hadoop deployments require; scaling in smaller steps in proportion to data growth; tiering data according to its value; and simplifying
management.
Isilon scale-out NAS can scale from 18 terabytes to over 50 petabytes in a single cluster, so you do not have to overprovision your
storage infrastructure. With over 80 percent utilization, you can keep your CAPEX in check with a smaller footprint. SmartDedupe
reduces the physical data footprint by locating and sharing common elements across files, with minimal impact on
performance during writes or concurrent reads. You will observe a reduction in storage expansion costs, typically in the range of 30 percent,
through smaller storage capacity, lower power and cooling needs, and less rack space.
Figure 6. Isilon Scale-out data lake
Isilon enables you to tier the data lake to further drive cost efficiencies by optimizing performance, throughput, and density. Using
policy-based tiering, you can reduce the cost of storing your data based on the inherent value and utility of the data. Tiering using
EMC Isilon S-Series, X-Series, NL-Series and HD-Series nodes is shown in Figure 7. High-value, readily needed data can be kept at a
high-performance tier, whereas low-value, infrequently used data can be moved to a more cost-effective active archive without
having any impact on your application or analytics infrastructure.
Figure 7. Tiering using Isilon storage nodes
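Policy-based tiering of this kind boils down to a rule that maps a property of the data to a storage tier. The sketch below illustrates the idea using time since last access as a stand-in for value. The tier labels, the thresholds, and the `FileInfo` type are hypothetical choices for this example. They are not SmartPools policy syntax and do not correspond to specific node series.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    days_since_access: int


def choose_tier(info: FileInfo, hot_days: int = 30, warm_days: int = 180) -> str:
    """Pick a tier label from how recently the file was used (thresholds are illustrative)."""
    if info.days_since_access <= hot_days:
        return "performance"
    if info.days_since_access <= warm_days:
        return "capacity"
    return "archive"


if __name__ == "__main__":
    files = [
        FileInfo("/data/clickstream/today.log", 0),
        FileInfo("/data/reports/last-quarter.csv", 95),
        FileInfo("/data/video/2012-footage.mov", 900),
    ]
    for info in files:
        print(info.path, "->", choose_tier(info))
```

Because the rule is evaluated against file metadata rather than by the application, data can move between tiers without changes to the applications or analytics that read it, which is the property the text above relies on.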
Under the covers, an Isilon cluster provides automatic storage balancing and deduplication, which not only enhance storage efficiency
but also help storage administrators weather hardware outages better.
Reduce risks
The risks facing any large and growing data infrastructure can be classified as follows:
• Inability to implement an initial solution successfully
• Inability to scale the solution flexibly as the environment changes
• Current and future data loss
• Data theft
These risks are amplified when the dependency between the ingestion, storage, and analytics systems remains strong. With a
loosely coupled system, you can build mitigation strategies. A data lake based on Isilon can provide mitigation for all
of these risks. By decoupling storage from compute and using multiple access methods, you can deploy a storage system
capable of effortlessly ingesting the data you need for your solution. By using the protection and security features outlined in the
following sections, you can help ensure that your data is safe and resilient to external forces.
Isilon is the only scale-out NAS to support multiple distributions of Hadoop through a native HDFS implementation. This gives you the
flexibility to analyze, surface, and act on data using the best tool for your application, and to scale storage, compute, or both as your
needs change.
Protect and secure data assets
Isilon storage systems are highly resilient and provide unmatched data protection and availability. Isilon uses the proven Reed-Solomon
erasure coding algorithm rather than RAID to provide a level of data protection that goes far beyond traditional storage
systems. With N+1 protection, data remains 100 percent available even if a single drive or a complete node fails, which is comparable to
RAID 5 in conventional storage. You can also deploy Isilon with N+2 protection, which allows two components to fail within the
system, similar to RAID 6, or with N+3 or N+4 protection, where three or four components can fail while the data remains 100 percent
available.
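To make the N+1 case concrete, the sketch below uses plain XOR parity, the same idea behind RAID 5, to rebuild one lost block from the survivors. It is a deliberate simplification and not the Reed-Solomon code the text describes, which is what allows more than one failure to be tolerated. The function names are made up for this example, and `protection_overhead` assumes that N counts data units and M counts protection units.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(blocks: List[Optional[bytes]], parity: bytes) -> bytes:
    """Recover the single missing data block (marked None) from the survivors and the parity block."""
    survivors = [b for b in blocks if b is not None]
    if len(survivors) != len(blocks) - 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    return xor_blocks(survivors + [parity])


def protection_overhead(n_data: int, m_protection: int) -> float:
    """Fraction of raw capacity used for protection in an N+M layout (assumed interpretation)."""
    return m_protection / (n_data + m_protection)


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_blocks(data)
    # Simulate losing the second block and rebuilding it from the rest.
    recovered = rebuild_missing([data[0], None, data[2]], parity)
    print(recovered == data[1])
    print(f"{protection_overhead(8, 2):.0%} of raw capacity used for protection")
```

Only the lost block is recomputed here, which mirrors the point made in the next paragraph that recovery rebuilds lost data rather than an entire disk.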
Isilon can recover from hardware failures faster than traditional systems because recovery entails rebuilding only the lost data, as opposed to
the entire disk. In a data lake, this can have a huge impact on operations, as storage is shared across systems with varying levels
of performance demands. A truly distributed storage system like Isilon can orchestrate all the nodes to participate in the restoration or
recovery of data after an outage over a dedicated back-end InfiniBand network, speeding up recovery without impacting front-end
performance.
As organizations view data as an asset, securing data for its inherent value is even more important than meeting regulatory compliance
and corporate governance requirements. Isilon helps organizations address security needs by providing the robust and flexible security
features outlined below (see Figure 8).
Figure 8. Authentication zones
• Secure role separation is provided by role-based access control (RBAC), which enforces a clear separation between storage administration and
file system access.
• Authentication zones serve as secure, isolated storage pools that insulate departments within the organization from access
to data that they are not supposed to see. For example, legal, financial, PII, and employee data can be seen only by authorized
employees in the authentication zone.
• Write once, read many (WORM) protection is achieved by using SmartLock software, which prevents accidental or malicious
alteration or deletion of data, a key requirement for governance and compliance.
• Data-at-rest encryption through the use of self-encrypting drives helps organizations ensure that physical loss of hardware
does not result in a data leak.
Faster time to insights
By utilizing a shared infrastructure, you can consolidate data from multiple islands into one single storage system based on Isilon.
This is made possible through the multiple access mechanisms at the protocol level, where data can be ingested into the Isilon store
from a wide variety of sources and surfaced for use by others. This eliminates the need for the data migration or extract, transform, and
load (ETL) operations typical of data analytics solutions, saving you precious time and resources. You can then
run Hadoop analytics on your dataset in place, as depicted in Figure 9. By using large, loosely coupled, shared, scalable
storage for a typical dataset of around 100 terabytes, you can save on the order of 24 hours of data movement time on a 10 Gbps network, time
that you can use to actually generate insights from your data.
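The data movement estimate can be reproduced with simple arithmetic. The sketch below assumes that the network figure means 10 gigabits per second and uses decimal units (1 terabyte = 10^12 bytes). The `efficiency` parameter is a hypothetical allowance for protocol overhead and contention, not a measured value.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move dataset_tb terabytes over a link_gbps link at the given efficiency."""
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Ideal line rate, then an assumed 90 percent effective throughput.
    print(f"{transfer_hours(100, 10):.1f} hours at line rate")
    print(f"{transfer_hours(100, 10, efficiency=0.9):.1f} hours at 90% efficiency")
```

At ideal line rate, the copy takes a little over 22 hours, and any realistic overhead pushes it to about a day or more, which is the time that analyzing the data in place avoids.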
Isilon is the only scale-out NAS that works with multiple distributions of Hadoop from a variety of vendors. This enables you to try
out tools from all of these vendors at the same time if necessary to find the best solution to meet your business requirements.
The data lake is the key enabler for driving business value into customer environments through the multiprotocol, multi-access,
tiered, single namespace, protected and scalable data repository. Leveraging the scale-out data lake, customers can consolidate
multiple disparate islands of storage into a single cohesive and unified data repository that is easier to manage and more cost-
effective.
Figure 9. Faster insights
STORAGE AND DATA SERVICES
A scale-out data lake leverages industry-leading, enterprise-grade storage and data services that extend the business value of
your data. Isilon Infrastructure Software provides powerful storage management software that helps protect your data assets,
control costs, and optimize the storage resources and system performance of your scale-out data lake.
The data management capabilities include EMC Isilon InsightIQ®, MobileIQ, SmartDedupe, SmartPools®, and SmartQuotas™, which
together help you improve the ROI of your data lake infrastructure. For more information on these data management services,
review the EMC Isilon OneFS Operating System white paper listed in the References section.
The scale-out data lake is further strengthened by proven data protection capabilities provided by EMC Isilon SmartConnect™,
SmartLock®, SnapshotIQ™, and SyncIQ®. For more information on these data protection services, review the High Availability and
Data Protection with EMC Isilon scale-out NAS white paper listed in the References section.
CONSUMPTION MODELS
Isilon provides a number of consumption models that enable you to choose the strategy that is best for your business. The simplest
and most common is an appliance: a preinstalled combination of hardware and software. You can choose a converged
infrastructure solution in conjunction with EMC Vblock®. Or you may choose a cloud-based, utility-style infrastructure as a service, in
which you pay for the service based on usage. The key is that you have choices in the storage purchasing model and the flexibility to fit
your business procurement needs.
CONCLUSION
Given that unstructured data will be doubling every two years, enterprises need higher efficiency, architectural simplicity, and more
protection as they scale capacity and capabilities. A scale-out data lake provides key capabilities to eliminate silos of data, secure and
protect information assets, and support existing and next-generation workloads while speeding time to insights. Starting with a scale-out
data lake, organizations can:
1. Invest in the infrastructure today to get started;
2. Realize the value of data by storing, processing, and analyzing it in the most cost-effective manner; and
3. Grow capabilities as needs grow in the future.
This enables organizations to store everything, analyze anything, and build a solution with the best ROI. By decoupling storage from
analysis and application, organizations gain the flexibility to choose among a larger number of strategies for deploying solutions without
risking data loss, cost overruns, or dataset leaks. The data lake based on Isilon offers organizations this flexibility along with the capability to
simplify the IT infrastructure; tier, secure, and protect data efficiently; and get to insights faster.
Because a large dataset is the reason big data exists, organizations can start with the data. They can understand the value locked in their large and
growing datasets by running pilots, without worrying about ingesting or surfacing data, and can pilot applications or emerging technologies to make
better-informed strategic decisions.
REFERENCES
EMC Digital Universe Study, with Research and Analysis by IDC: http://www.emc.com/leadership/digital-universe/index.htm
EMC Isilon OneFS Operating System: http://www.emc.com/collateral/hardware/white-papers/h8202-isilon-onefs-wp.pdf
High Availability and Data Protection with EMC Isilon scale-out NAS: http://www.emc.com/collateral/hardware/white-papers/h10588-isilon-data-availability-protection-wp.pdf
ABOUT EMC
EMC Corporation is a global leader in enabling businesses and service providers to transform their operations and deliver IT as a
service. Fundamental to this transformation is cloud computing. Through innovative products and services, EMC accelerates the
journey to cloud computing, helping IT departments to store, manage, protect, and analyze their most valuable asset—information—
in a more agile, trusted, and cost-efficient way. Additional information about EMC can be found at www.EMC.com.

The EMC Isilon Scale-Out Data Lake

  • 1. EMC WHITE PAPER THE EMC ISILON SCALE-OUT DATA LAKE Key capabilities ABSTRACT This white paper provides an introduction to the EMC Isilon scale-out data lake as the key enabler to store, manage, and protect unstructured data for traditional and emerging workloads. Business decision makers and architects can leverage the information provided here to make key strategy and implementation decisions for their storage infrastructure. December 2014
  • 2. Copyright © 2015 EMC Corporation. All Rights Reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
EMC², EMC, the EMC logo, Isilon, InsightIQ, OneFS, SmartConnect, SmartLock, SmartPools, SmartQuotas, SnapshotIQ, SyncIQ, and Vblock are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.
Part Number H13172
  • 4. EXECUTIVE SUMMARY
Data is growing rapidly as the number of people using digital devices to interact with systems and networks grows across the globe. The majority of this data growth is unstructured and, as organizations are discovering, it contains valuable insights that could be used to improve business results. The technology to capture, store, and analyze this data is maturing rapidly as enterprises look for ways to handle the data growth effectively. By some industry estimates, over 80 percent of the new storage capacity deployed in organizations around the world will be for unstructured data.

EMC® Isilon® scale-out network-attached storage (NAS) provides a simple, scalable, and efficient platform to store massive amounts of unstructured data and enable various applications to create a scalable and accessible data repository without the overhead associated with traditional storage systems. Isilon enables organizations to build a scale-out data lake where they can store their current data and scale capacity, performance, or protection as their business data grows in the future. The scale-out data lake helps lower storage costs through efficient storage utilization, eliminates islands or silos of storage, and lowers the storage management costs of migration, security, and protection.

EMC Isilon OneFS®, the intelligence behind Isilon systems, is the industry’s leading scale-out NAS operating system, known for its massive capacity, operational simplicity, extreme performance, and unmatched storage utilization. With proven enterprise-grade protection and security capabilities, Isilon is an ideal platform to meet a wide variety of storage needs.

AUDIENCE
This white paper is intended for business decision makers, IT managers, architects, and implementers.
By leveraging the Isilon scale-out data lake, a key enabler for storing and managing massive quantities of unstructured data, enterprises can build their storage strategies and implement their infrastructure to maximize the return on their IT investments.

TERMINOLOGY
The acronyms used in this paper are summarized in Table 1.

Table 1. Acronyms used in this paper
ACRONYM   DESCRIPTION
CIFS      Common Internet File System
DAS       direct-attached storage
HDFS      Hadoop Distributed File System
LAN       local area network
NAS       network-attached storage
NFS       network file system
SAN       storage area network
SMB       Server Message Block

OVERVIEW
According to the IDC study “The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things,” sponsored by EMC, the digital universe will grow from 4.4 zettabytes in 2013 to 44 zettabytes in 2020 (a zettabyte is 1 trillion gigabytes), doubling in size every two years. Although enterprises create only 30 percent of this data (1.5 ZB in 2013), they come in contact with 85 percent (2.3 ZB in 2013) of it, and have some liability associated with the data. IDC further estimates that 22 percent of the data is a candidate for analysis and contains valuable information that organizations can use to make critical decisions. Figure 1 shows this trend for enterprise data.
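The growth figures cited in the overview (4.4 ZB in 2013 to 44 ZB in 2020) can be checked against the "doubling every two years" description with a short calculation. The sketch below assumes constant exponential growth, which is a simplification introduced here and not a claim from the IDC study; the function names are illustrative.

```python
import math


def doubling_time_years(start: float, end: float, years: float) -> float:
    """Doubling time implied by growing from `start` to `end` over `years`,
    assuming constant exponential growth (an assumption of this sketch)."""
    return years * math.log(2) / math.log(end / start)


def projected_size(start: float, years: float, doubling_years: float) -> float:
    """Size after `years` if the quantity doubles every `doubling_years`."""
    return start * 2 ** (years / doubling_years)


if __name__ == "__main__":
    implied = doubling_time_years(4.4, 44.0, 2020 - 2013)
    print(f"Implied doubling time: {implied:.2f} years")
    exact_two_year = projected_size(4.4, 2020 - 2013, 2.0)
    print(f"Strict two-year doubling would give {exact_two_year:.1f} ZB in 2020")
```

Under this assumption, a tenfold increase over seven years corresponds to a doubling time slightly longer than two years, so "doubling every two years" reads as a rounded description of the same trend.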
  • 5. Figure 1. Enterprise data [1]

Not all of this data is stored by enterprises in the datacenter, for cost, liability, or process reasons. Isilon scale-out NAS provides key capabilities to build a data lake that physically decouples compute from storage without affecting the seamless nature of the data flow outlined below. You can leverage a data lake to get the most value from your data. Because data from a variety of sources can be converged using native protocols, protected, secured, and exposed to analytics systems that drive value, organizations can plan, implement, and manage loosely coupled but seamless storage and applications.

DATA FLOW
Organizational data typically follows a linear flow: it starts at various sources, both consumer and corporate, is ingested into a store, and is then analyzed and surfaced for actions that augment value creation for the organization, as shown in Figure 2. The five distinct stages are typically grouped into three broad categories (data, analytics, and application) for easy reference from a holistic point of view.

Figure 2. Data flow

[1] EMC Digital Universe Study, with Research and Analysis by IDC, http://www.emc.com/leadership/digital-universe/index.htm
  • 6. In the following section we detail each stage of the data flow and its implications. The choices made at each stage have a profound impact on the value derived from the organization's data and, ultimately, on the data store.

INGEST
Data ingestion is the process of obtaining and processing data for later use by storing it in the most appropriate system for the application. An effective ingestion methodology validates the data, prioritizes the sources, and commits data to storage with reasonable speed and efficiency. The velocity, volume, and variety of the data tax the capture speed, throughput, and efficiency of the ingest systems, and therefore of the store, as discussed in the following section. The ingest process can become the starting point of a storage silo if the organization's planners do not look at the dataset holistically.

High-velocity data, characterized by a continuous feed, requires a specialized ingestion mechanism that typically captures the continuous stream and commits the records to storage in batches of varying sizes for further processing. Capture strategies come in two flavors: commit records to storage directly, or use higher-speed buffers as an intermediate step before storing to a more persistent downstream system. Examples of high-velocity data include clickstreams and video surveillance feeds.

The volume of data depends not only on velocity but also on the size of the data sets generated in batches from non-streaming origins. High-volume data typically takes time to move from one point in the system to another, depending on the network speed, and places burdens on both the network and storage. Optimizing for volume, as with large files, can affect velocity and vice versa. High-definition video and audio files, as distinct from streams, are typical examples of high-volume data.
Variety poses translation challenges at the data, file, and protocol levels, because disparate organic systems may need varying levels of translation before their data is of value to downstream processes. A combination of CRM, line-of-business (LOB), social media, web, and mobile information on customers is a good example of data variety.

STORAGE
How data is stored is typically dictated by the type of storage strategy (block or file), the flow mechanism, and the application. Over the years, storage has evolved into an optimization problem between storage costs and performance. As the volume and variety of data grow, the cost to reliably store, manage, secure, and protect it grows as well, particularly for data that is subject to compliance and regulatory mandates, such as personally identifiable information (PII), financial transactions, and medical records. Adding an availability requirement increases cost pressure at the expense of performance, and vice versa.

The segregation started by the choice of ingestion system is further exacerbated by storage, bringing about true silos or islands of information that cater to the application requirements of real-time, interactive, or batch processing. As silos permeate the IT infrastructure, hot spots arise in heavily used systems while capacity goes unused elsewhere. Downstream applications, compliance, security, and data policies can create further silos within silos, giving rise to a very complex system.

ANALYSIS
Data analysis technologies load, process, surface, secure, and manage data in ways that enable organizations to mine value from their data. Traditional data analysis systems are expensive, and extending them beyond their critical purposes can place a heavy burden on IT resources and costs. Integrating data from disparate sources adds complexity and management overhead that can deter most organizations looking to derive value from their data.
APPLICATION: SURFACE AND ACT
After analysis, results and insights have to be surfaced for actions such as e-discovery, post-mortem analysis, business process improvement, decision making, or a host of other applications. Traditional systems use traditional protocols and access mechanisms, while new and emerging systems are redefining the access requirements for data already stored within an organization. A system is not complete unless it caters to the range of requirements placed on it by traditional and next-generation workloads, systems, and processes.

BUSINESS CHALLENGES
As organizations face large datasets and growth in data and applications, they are observing a tremendous increase in both capital (CAPEX) and operating (OPEX) costs, limitations in their ability to realize the value of the data, and protection and compliance lapses, to name a few. These challenges can be attributed to a combination of factors throughout the data flow process, but they fall into the following broad categories:
  • 7. SILOS OR ISLANDS OF DATA
The ingestion strategy employed for real-time, interactive, or batch applications originates a silo that diverges as the data flows downstream. For example, an organization may set up a combination of SAN (to buffer the data stream) and NAS (for persistent storage) for a real-time application like customer targeting, while elsewhere CRM analysis happens on a combination of DAS and cloud. A third silo, consisting of archived data, uses a combination of DAS and NAS to run batch analytics. This is a typical scenario in organizations today.

In the first silo, the compliance requirements for handling PII can pressure administrators to choose between applying policy to the entire silo and carving out a protected zone, if the technology supports it. This adds complexity even before access controls for financial data (to meet SEC requirements) or medical data (to meet HIPAA requirements) are considered. If the design of the system does not support these protections, organizations risk going out of compliance, with large fines at a minimum and, at the other extreme, liability for data leaks that lead to massive lawsuits.

Figure 3. Silos or islands of data

Silos of data also add cost pressures through the inefficiencies addressed in the following section, lock insights within silos at the expense of the business, and create management and protection inconsistencies. In many organizations, applying learnings from a batch processing system to a real-time system may involve lengthy change management processes, creating friction across departments or adding barriers to value generation.

INEFFICIENCIES ACROSS THE SYSTEM
If one dataset of, say, 10 terabytes is used for three different analyses by three systems, you will have a minimum of three copies, requiring three times the storage capacity. If this dataset grows by 30% a year, the three copies together add 9 terabytes annually, which is 90% of the original dataset's size instead of 30%. This demonstrates the inefficiencies that organizations experience at the most basic level.
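The copy arithmetic in the preceding example can be sketched in a few lines. This is a minimal illustration under the text's own numbers (10 TB, three copies, 30% annual growth); the function name and the assumption that every silo holds a full copy growing at the same rate are introduced here for illustration.

```python
def total_capacity_tb(dataset_tb: float, copies: int,
                      annual_growth: float, years: int) -> float:
    """Capacity needed after `years` when each of `copies` systems keeps
    its own full copy of a dataset growing at `annual_growth` per year."""
    return dataset_tb * copies * (1 + annual_growth) ** years


if __name__ == "__main__":
    dataset_tb, growth = 10.0, 0.30
    for copies, label in [(3, "three silos"), (1, "one shared store")]:
        now = total_capacity_tb(dataset_tb, copies, growth, 0)
        next_year = total_capacity_tb(dataset_tb, copies, growth, 1)
        added = next_year - now
        print(f"{label}: {now:.0f} TB today, {next_year:.0f} TB next year "
              f"(+{added:.0f} TB, {added / dataset_tb:.0%} of the original dataset)")
```

With three silos, the first year adds 9 TB, or 90% of the original 10 TB dataset; with a single shared copy, the same growth adds 3 TB.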
If one silo has lower utilization than another, hotspots arise in the system while capacity goes unutilized elsewhere. Budget is also used inefficiently when low-value data resides on high-cost, high-performance storage. Because of silos and hotspots, organizations face inefficiencies in management, datacenter footprint, power, and cooling, in addition to reconfigurations, migrations, and complicated maintenance activities.

SECURITY AND COMPLIANCE
Organizations with silos face duplicate and inconsistent application of policies, security measures, and governance, whereas organizations with shared infrastructure face access control violations. Working around these problems is typically time-consuming and painful, and it diverts resources away from standard procedures as issues arise. Securing data against leaks or destruction, both accidental and malicious, presents a separate set of challenges. These challenges are compounded in regulated sectors like finance, healthcare, and government.

TIME TO INSIGHTS
More users than ever before are mobile or geographically dispersed, and they need access to a larger dataset as a basic requirement to perform their duties effectively. Traditional systems operated within the confines of corporate IT, with regulated and measured interconnections to the external world. With the growth of cloud, mobility, and devices, the capabilities of traditional systems are being challenged in new ways, increasing the time required to access, process, and consume insights.
  • 8. Figure 4. Enterprise storage challenges

Providing access while enforcing policies consistently across a wide range of approved and unapproved devices and platforms is a growing challenge for today’s storage administrators. Traditional systems, silos, and implementation strategies add latency as protection and security layers are added around the application or the sources of information.

DATA LAKE
The data lake represents a paradigm shift from the linear data flow model. As data and the insights gleaned from it increase in value, enterprise-wide consolidated storage is transformed into a hub around which the ingestion and consumption systems work (see Figure 5). This enables enterprises to bring analytics to the data and avoid the cost of multiple systems, extra storage, and the time spent on ingestion and analysis.

Figure 5. Scale-out data lake

By eliminating a number of parallel linear data flows, enterprises can consolidate vast amounts of their data into a single store, a data lake, through a native and simple ingestion process. This data can be secured and analysis performed, insights surfaced, and actions taken in an iterative manner as the organization and technology mature.

  • 9. Enterprises can thus eliminate the cost of having silos or islands of information spread across their enterprises. The scale-out data lake further enhances this paradigm by providing scaling capabilities in terms of capacity, performance, security, and protection. The key characteristics of a scale-out data lake are that it:
• Accepts data from a variety of sources like file shares, archives, web applications, devices, and the cloud, in both streaming and batch processes
• Enables access to this data for a variety of uses, from conventional purposes to next-gen mobile, analytics, and cloud applications
• Secures and safeguards data with the appropriate level of protection, from highly critical data like medical records, financial transactions, credit card data, and PII to website logs and temporary data that might not require any security
• Scales to meet the demands of future consolidation and growth as technology evolves and new possibilities emerge for applying data to gain competitive advantage in the marketplace
• Provides a tiering ability that enables organizations to manage their costs without setting up specialized infrastructures for cost optimization
• Maintains simplicity, even at the petabyte scale

ISILON SCALE-OUT DATA LAKE
Isilon enhances the data lake concept by enriching your storage with improved cost efficiency, reduced risk, data protection, security, compliance, and governance, while enabling you to get to insights faster. You can reduce the risks of your big data project implementation, lower operational expenses, and try out pilot projects on real business data before investing in a solution that meets your exact business needs.

Isilon is based on a fully distributed architecture that consists of modular hardware nodes arranged in a cluster.
As nodes are added, the file system expands dynamically, scaling out capacity and performance without adding corresponding administrative overhead.

Multiple access methods
Isilon natively supports multiple protocols, including SMB, NFS, File Transfer Protocol (FTP), and Hypertext Transfer Protocol (HTTP) for traditional workloads, and HDFS for emerging workloads like Hadoop analytics. By its very nature, shared storage with multiple access methods eliminates storage silos, bringing efficiency and consistency to your IT infrastructure. This enables batch, real-time, or interactive applications and systems to store and access data in one shared storage pool without any migration, loading, or conversion.

Access to data for reading or writing is handled at the protocol level. This means that data can be created by any of a myriad of systems, ingested into the data lake using a natively supported protocol such as SMB for Windows or Mac, and accessed or modified seamlessly using another protocol like NFS, FTP, or HDFS. Multiprotocol support enables the storage infrastructure to provide access to applications that use third-platform protocols like HTTP or HDFS, which drive emerging workloads. Support for future protocols, data access mechanisms, or interfaces can be added easily to scale the interoperability of the data lake. Without a data lake, interoperability would require an expensive and time-consuming sequence of operations on data across multiple silos, or even costly and inefficient data duplication.

Cost efficiency
You can achieve significant cost efficiencies by investing only in the storage capacity and capability required today, which is smaller than traditional or Hadoop requirements; scaling in smaller steps, in proportion to data growth; tiering data according to its value; and simplifying management.
Isilon scale-out NAS can scale from 18 terabytes to over 50 petabytes in a single cluster, so you do not have to overprovision your storage infrastructure. With over 80% utilization, you can keep your CAPEX in check with a smaller footprint. SmartDedupe reduces the physical data footprint by locating and sharing common elements across files, with minimal impact on performance during writes or concurrent reads. You will observe a reduction in storage expansion costs, typically in the range of 30%, through smaller storage capacity, lower power and cooling needs, and less rack space.
  • 10. Figure 6. Isilon scale-out data lake

Isilon enables you to tier the data lake to further drive cost efficiencies by optimizing performance, throughput, and density. Using policy-based tiering, you can reduce the cost of storing your data based on its inherent value and utility. Tiering using EMC Isilon S-Series, X-Series, NL-Series, and HD-Series nodes is shown in Figure 7. High-value, readily needed data can be kept on a high-performance tier, whereas low-value, infrequently used data can be moved to a more cost-effective active archive without any impact on your application or analytics infrastructure.

Figure 7. Tiering using Isilon storage nodes

Under the covers, an Isilon cluster provides automatic storage balancing and deduplication, which not only enhance storage efficiency but also help storage administrators weather hardware outages.

Reduce risks
The risks of any large and growing data infrastructure can be classified as:
  • 11.
• Ability to successfully implement an initial solution
• Ability and flexibility to scale the solution as the environment changes
• Protection against current and future data loss
• Protection against data theft

These risks are amplified when the dependency between the ingestion, storage, and analytics systems remains strong. With a loosely coupled system, you can build mitigation strategies. A data lake based on Isilon is able to provide mitigation for all of these risks. By decoupling storage from compute and using multiple access methods, you can deploy a storage system capable of ingesting the data you need for your solution effortlessly. By using the protection and security features outlined in the following sections, you can ensure your data is safe and resilient to external forces. Isilon is the only scale-out NAS to support multiple distributions of Hadoop through a native HDFS implementation. This gives you the flexibility to analyze, surface, and act on data using the best tool for your application, and to scale storage, compute, or both as your needs change.

Protect and secure data assets
Isilon storage systems are highly resilient and provide unmatched data protection and availability. Isilon uses the proven Reed-Solomon erasure encoding algorithm rather than RAID to provide a level of data protection that goes far beyond traditional storage systems. With N+1 protection, data is 100 percent available even if a single drive or a complete node fails, which is comparable to RAID 5 in conventional storage. You can also deploy Isilon with N+2 protection, which allows two components to fail within the system, similar to RAID 6, or with N+3 or N+4 protection, where three or four components can fail while the data remains 100 percent available. Isilon is able to recover from hardware failures faster than traditional systems because recovery entails rebuilding only the lost data, as opposed to the entire disk.
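As a rough illustration of the N+M protection levels described above, the sketch below computes how much raw capacity remains usable when a stripe of N data units carries M protection units. The ratio N/(N+M) is the general property of this kind of erasure coding; the stripe widths in the example are hypothetical values chosen for illustration, since the paper does not state how OneFS sizes its protection groups.

```python
def usable_fraction(data_units: int, protection_units: int) -> float:
    """Share of raw capacity holding user data in an N+M protected stripe."""
    if data_units <= 0 or protection_units < 0:
        raise ValueError("need data_units > 0 and protection_units >= 0")
    return data_units / (data_units + protection_units)


if __name__ == "__main__":
    # Hypothetical stripe widths (assumptions, not OneFS defaults).
    for n, m in [(8, 1), (8, 2), (16, 2), (16, 4)]:
        print(f"{n}+{m}: tolerates {m} failure(s), "
              f"{usable_fraction(n, m):.1%} of raw capacity usable")
```

The trade-off is visible directly: raising M tolerates more simultaneous failures but lowers the usable share, while wider stripes recover some of that share at the same protection level.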
In a data lake, faster recovery can have a huge impact on operations, because storage is shared across systems with varying levels of performance demands. Truly distributed storage like Isilon can orchestrate all of the nodes to participate in restoring or recovering data after an outage over a dedicated back-end InfiniBand network, speeding up recovery without impacting front-end performance.

As organizations view data as an asset, securing data for its inherent value is even more important than meeting regulatory compliance and corporate governance requirements. Isilon helps organizations address security needs by providing the robust and flexible security features outlined below and illustrated in Figure 8.
  • 12. Figure 8. Authentication zones

• Secure role separation enables role-based access control (RBAC), where a clear separation between storage administration and file system access is enforced.
• Authentication zones serve as secure, isolated storage pools that insulate departments within the organization from data they are not supposed to see. For example, legal, financial, PII, and employee data can be seen only by authorized employees in the authentication zone.
• Write once, read many (WORM) protection is achieved by using SmartLock software, which prevents accidental or malicious alteration or deletion of data, a key requirement for governance and compliance.
• Data-at-rest encryption through the use of self-encrypting drives enables organizations to ensure that physical loss of hardware does not equate to a data leak.

Faster time to insights
By utilizing a shared infrastructure, you can consolidate data from multiple islands into a single storage system based on Isilon. This is made possible by the multiple access mechanisms at the protocol level, through which data can be ingested into the Isilon store from a wide variety of sources and surfaced for use by others. This eliminates the need for the data migration or extract, transform, load (ETL) operations typical of any data analytics solution, saving you precious time and resources. You can then run Hadoop analytics on your dataset in place, as depicted in Figure 9. By using a large, loosely coupled, shared, scalable storage system with a typical dataset of around 100 terabytes, you can save over 24 hours of data-moving time on a 10 Gbps network, time that you can use to actually generate insights from your data.

Isilon is the only scale-out NAS that works with multiple distributions of Hadoop from a variety of vendors. This enables you to try out tools from all of these vendors at the same time, if necessary, to find the best solution to meet your business requirements.
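The data-moving estimate above (about 100 terabytes, more than 24 hours) can be reproduced with simple arithmetic. The sketch below reads the network figure as 10 gigabits per second, uses decimal terabytes, and introduces an `efficiency` parameter for protocol and system overhead; the function name, that reading, and the 90% value are assumptions made here, not figures from the paper.

```python
def transfer_hours(dataset_tb: float, link_gbps: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move `dataset_tb` decimal terabytes over a link of
    `link_gbps` gigabits per second at the given fraction of line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    print(f"At line rate:      {transfer_hours(100, 10):.1f} hours")
    print(f"At 90% efficiency: {transfer_hours(100, 10, 0.9):.1f} hours")
```

At full line rate the copy takes a little over 22 hours, and modest overhead pushes it past 24 hours, which is consistent with the estimate in the text.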
The data lake is the key enabler for driving business value into customer environments through a multiprotocol, multi-access, tiered, single-namespace, protected, and scalable data repository. By leveraging the scale-out data lake, customers can consolidate multiple disparate islands of storage into a single cohesive and unified data repository that is easier to manage and more cost-effective.

Figure 9. Faster insights
  • 13. STORAGE AND DATA SERVICES
A scale-out data lake leverages industry-leading, enterprise-grade storage and data services that extend the business value of your data. Isilon Infrastructure Software provides powerful storage management software that helps protect your data assets, control costs, and optimize the storage resources and system performance of your scale-out data lake.

The data management capabilities include EMC Isilon InsightIQ®, MobileIQ, SmartDedupe, SmartPools®, and SmartQuotas™, which together help you improve the ROI of your data lake infrastructure. For more information on these data management services, review the Isilon OneFS white paper listed in References.

The scale-out data lake is further strengthened by proven data protection capabilities provided by EMC Isilon SmartConnect™, SmartLock®, SnapshotIQ™, and SyncIQ®. For more information on these data protection services, review the Isilon data availability white paper listed in References.

CONSUMPTION MODELS
Isilon provides a number of consumption models that enable you to choose the strategy that is best for your business. The simplest and most common is an appliance: a preinstalled combination of hardware and software. You can choose a converged infrastructure solution in conjunction with EMC Vblock®. Or you can choose a cloud-based utility infrastructure-as-a-service, in which you pay for the service based on usage. The key is that you have choices in the storage purchasing model and the flexibility to fit your business procurement needs.

CONCLUSION
Given that unstructured data will be doubling every two years, enterprises need higher efficiency, architectural simplicity, and more protection as they scale capacity and capabilities. A scale-out data lake provides key capabilities to eliminate silos of data, secure and protect information assets, and support existing and next-generation workloads while speeding time to insights.
Starting with a scale-out data lake, organizations can:
1. Invest in the infrastructure today to get started
2. Realize the value of data by storing, processing, and analyzing it in the most cost-effective manner
3. Grow capabilities as needs grow in the future

This enables organizations to store everything, analyze anything, and build a solution with the best ROI. By decoupling storage from analysis and application, organizations gain the flexibility to choose among a larger number of deployment strategies without risking data loss, cost overruns, or dataset leaks. The data lake based on Isilon offers organizations this flexibility, along with the ability to simplify the IT infrastructure; tier, secure, and protect data efficiently; and get to insights faster.

Because a large dataset is the reason big data exists, organizations can start with the data. They can understand the value locked in their large and growing datasets by running pilots, without worrying about ingesting or surfacing the data, and they can pilot applications or emerging technologies to make better-informed strategic decisions.

REFERENCES
EMC Digital Universe Study, with Research and Analysis by IDC
http://www.emc.com/leadership/digital-universe/index.htm
EMC Isilon OneFS Operating System
http://www.emc.com/collateral/hardware/white-papers/h8202-isilon-onefs-wp.pdf
High Availability and Data Protection with EMC Isilon Scale-Out NAS
http://www.emc.com/collateral/hardware/white-papers/h10588-isilon-data-availability-protection-wp.pdf

ABOUT EMC
EMC Corporation is a global leader in enabling businesses and service providers to transform their operations and deliver IT as a service. Fundamental to this transformation is cloud computing. Through innovative products and services, EMC accelerates the journey to cloud computing, helping IT departments to store, manage, protect, and analyze their most valuable asset, information, in a more agile, trusted, and cost-efficient way. Additional information about EMC can be found at www.EMC.com.