Traditionally, applications have dictated a set of requirements that translated directly into choices of infrastructure technologies. Each application type had its own vertical stack of CPU and operating system, storage pool, networking and security, and even management systems. As storage has evolved over the past few decades, it has met the requirements of new application workloads by providing arrays with distinctive characteristics. This approach has led to the development of a diverse set of powerful storage arrays, each with its own unique value. But while needs have been met, it has created massive complexity for storage administrators. The data center environment grows ever more divergent, increasing complexity and driving the need for more and more resources to manage and keep the infrastructure up to date, which ultimately drives a significant amount (as much as 70%) of operating expenses. This picture is not sustainable over time and puts IT further and further into the hole, turning storage administrators into storage managers who spend most of their time simply managing the storage arrays instead of optimizing information storage for their business.
The software-defined data center (SDDC) aims to break down data center silos. The SDDC is a concept driven by VMware and compute virtualization, but its value is broader than server virtualization alone. The SDDC abstracts the functionality of all the hardware components (compute, networking, and storage) and provides the ability to drive your data center programmatically, realizing true end-to-end automation across the entire data center. The less human interaction, the lower the costs and the less room for error.
Yet storage virtualization has lagged behind. Looking at enterprise data centers today, compute is the most mature: analysts estimate that enterprises have virtualized between 30% and 75% of their compute infrastructure. Network virtualization in the form of VLANs has existed for some time and is just starting to accelerate with sophisticated new network virtualization solutions such as Nicira. Storage trails with a mere 5-10% of storage infrastructure virtualized. This is due in part to the fact that, unlike network and compute, which focus on configuration and resources, storage laden with data is inherently heavy. Storage also lacks a set of clearly defined protocols and standardization across tools. As a result, storage evolved into a heterogeneous environment in which application workloads are tied to unique storage and array types. All of this has prevented storage virtualization from evolving as fast as network and compute. To realize the full value of the SDDC, all three (compute, network, and storage) must be virtualized.
This matters because the world around us is changing at record speed: it is a data-centric world. Data is outpacing the staff available to manage it, and data must be stored for longer periods of time. Mobile devices are the preferred access medium and provide instant access to cloud applications, which are the major contributor to data growth. In fact, 75% of data is end-user generated, 90% of which is unstructured, and 80% is managed by IT at some point in its lifecycle. Users demand instant Web access, and developers demand cloud scale and speed from IT. If they don't get it, they'll go outside of IT for easy-to-access, swipe-and-go resources. As a result, storage is at the center of an IT transformation. Enterprises must rethink how storage is delivered and managed in order to store massive amounts of content, continue to provide access via traditional methods (on the LAN via CIFS/NFS), provide support for new Web, mobile, and cloud methods (HTTP, REST), and ensure storage can evolve, adapt, and respond to new workloads on demand.
That's where a Software-Defined Storage solution fits in: a solution completely based on software, which automates storage provisioning and management and transforms existing heterogeneous physical storage into a simple, extensible, and open virtual storage platform. Simple: storage virtualization, automated storage workflows, and centralized management. Extensible: support for multivendor storage arrays, integration with cloud stacks, and open standard APIs. Open: global data services and an open community.
Companies look for a solution to manage their heterogeneous storage environments, even across different vendors and commodity storage such as Amazon S3 or even Dropbox. Now, what would be new compared to other approaches in the market? The control plane (which manages the storage arrays) needs to be decoupled from the data service plane (which manages the data). Why? This ensures the platform does not sit in the data path for file and block stores, so all applications can directly access storage with all its underlying value and the data services embedded in the storage arrays. That is especially important for low-latency applications. By abstracting the control path, storage management can operate at the virtual layer, which gives customers the ability to partition a storage pool into virtual storage arrays. This is analogous to partitioning a server into a number of virtual machines. Control-path data services provide multi-tenancy, service cataloging, metering, and monitoring across all arrays. They also enable administrators to centralize data provisioning and data management tasks and allow any application to access file and block data.
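The control-plane/data-plane split described above can be sketched in a few lines. This is a minimal illustration with hypothetical class and method names, not any product's actual API: the controller does bookkeeping and hands back a direct handle, so reads and writes never pass through it.

```python
# Sketch of control-plane / data-plane separation (assumed names).

class PhysicalArray:
    """Stand-in for a block array; applications talk to it directly."""
    def __init__(self, name):
        self.name = name
        self.volumes = {}

    def write(self, volume_id, data):   # data path: direct array access
        self.volumes[volume_id] = data

    def read(self, volume_id):
        return self.volumes[volume_id]

class ControlPlane:
    """Out-of-band controller: provisioning and bookkeeping only."""
    def __init__(self):
        self.catalog = {}               # volume_id -> backing array

    def provision(self, volume_id, array):
        self.catalog[volume_id] = array # control path: metadata only
        array.volumes[volume_id] = b""
        return array                    # hand back a direct handle

controller = ControlPlane()
array = PhysicalArray("array-01")
handle = controller.provision("vol-1", array)
handle.write("vol-1", b"payload")       # controller is not in this path
print(handle.read("vol-1"))
```

The point of the sketch is the last two lines: once provisioned, I/O goes straight to the array object, which is why the abstraction adds no latency to the data path.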
The Software-Defined Storage solution should be simple: it should simplify storage management and delivery through storage virtualization, automation, and centralized management.
The control plane should be separated from the data plane so that low-latency applications are not impacted. The controller recognizes all arrays; virtualizes and configures your storage; automates storage tasks and delivers them through a self-service catalog; and centralizes management across physical and virtual environments. Let's take a look at how, in three easy steps, you can deliver a fully automated, self-service storage model across arrays.
Discovering and registering arrays is the first of three easy steps to virtualize, automate, and centralize storage. Storage administrators define the storage environment they want the virtualization platform to manage: they point it at storage arrays, SAN switches, and data protection devices. The controller discovers and abstracts the physical storage arrays, with all their unique capabilities, into a single pool of virtual storage. This step only needs to be done at the beginning, or whenever the administrator wants to add or change the configuration. All of the functions exposed through the portal should also be accessible through REST APIs.
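As a sketch of what "register an array via REST" might look like, the snippet below builds (without sending) a hypothetical registration request. The endpoint path `/api/arrays` and the payload fields are illustrative assumptions, not a documented product API.

```python
import json

def register_array_request(base_url, array):
    """Return (method, url, body) for registering one physical array.
    Endpoint and fields are hypothetical, for illustration only."""
    body = json.dumps({
        "name": array["name"],
        "type": array["type"],       # e.g. "block" or "file"
        "mgmt_ip": array["mgmt_ip"],
    })
    return ("POST", f"{base_url}/api/arrays", body)

method, url, body = register_array_request(
    "https://controller.example.com",
    {"name": "filer-01", "type": "file", "mgmt_ip": "10.0.0.5"},
)
print(method, url)
```

The same request shape would be reused for SAN switches and protection devices; only the resource path and payload fields would differ.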
The next two steps are to define virtual storage arrays and configure virtual storage pools. In step 2, the storage administrator creates virtual storage arrays, managed at the virtual layer according to automated policies. This is very similar to how server administrators create many virtual machines with unique characteristics from one or more physical servers. Remember, these are abstract arrays: a virtual storage array can span multiple physical arrays. In the final step, the storage administrator configures virtual storage pools. Virtual storage pools represent sets of storage capabilities required by unique application workloads. Rather than provisioning capacity on storage arrays, storage administrators give users the ability to subscribe to virtual storage pools that meet their unique requirements.
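Steps 2 and 3 can be sketched as two small data structures. The names and capability labels here are assumptions for illustration: a virtual array spans physical arrays, and a virtual pool is a set of capabilities that a workload's requirements can be matched against.

```python
# Sketch of virtual storage arrays and pools (illustrative names).

class VirtualArray:
    """An abstract array that may span multiple physical arrays."""
    def __init__(self, name, physical_arrays):
        self.name = name
        self.physical_arrays = list(physical_arrays)

class VirtualPool:
    """A named set of storage capabilities workloads subscribe to."""
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = set(capabilities)

    def matches(self, requirements):
        """True if this pool offers every required capability."""
        return set(requirements) <= self.capabilities

varray = VirtualArray("tier1", ["array-01", "array-02"])
pool = VirtualPool("high-perf-block", {"block", "ssd", "replication"})
print(pool.matches({"block", "ssd"}))
```

The subset test in `matches` is the essence of policy-based placement: a workload states what it needs, not which physical array should serve it.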
Requested storage capacity is then delivered quickly to users and tenants via a service catalog. Users subscribe to a virtual storage pool that meets their workload's demands; they do not need to know or care about the underlying hardware and software providing the data service to their application. For example, a transactional workload would subscribe to a virtual storage pool with the characteristics of a high-performance block store, while a cloud application such as online file and content sharing would subscribe to one with the characteristics of a distributed object or file-based storage cloud. When the customer selects a virtual storage pool, the virtualization platform automatically provisions the right hardware and software to meet that need. The result: no more manual provisioning. Storage is instantly available, on demand. This helps storage administrators minimize user-IT interactions, automate the process of identifying available storage capacity, and better map an application workload's requirements to the right combination of software and hardware storage resources, instead of the repetitive, time-consuming, labor-intensive task of provisioning storage manually. For enterprise IT, such a solution delivers a huge advantage: they can now provide access to storage in less time than it takes to go to an external cloud vendor, all while utilizing their existing storage infrastructure.
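The self-service subscription step reduces to a catalog lookup. The sketch below is illustrative (the pool names and capability labels are invented for the example): a user states requirements, and the first pool satisfying all of them is returned instantly.

```python
# Sketch of subscribing to a virtual storage pool from a catalog.
# Pool names and capability labels are illustrative assumptions.

CATALOG = {
    "high-perf-block": {"block", "ssd", "snapshots"},
    "object-cloud":    {"object", "rest", "geo-distributed"},
    "bulk-file":       {"file", "nfs", "cifs"},
}

def subscribe(requirements):
    """Return the first catalog pool satisfying all requirements."""
    for name, capabilities in CATALOG.items():
        if set(requirements) <= capabilities:
            return name     # provisioning becomes a lookup,
    return None             # not a weeks-long manual workflow

print(subscribe({"object", "rest"}))
```

The transactional workload in the text maps to `{"block", "ssd"}`-style requirements; the content-sharing app maps to `{"object", "rest"}`. Neither user ever names a physical array.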
What makes this so compelling is the significant time savings. Such a solution lets you simplify storage management and delivery by virtualizing storage and automating storage requests, making storage more responsive to the changing needs of the business. Storage resources become available instantly: provisioning tasks that typically take 29 days to complete are finished immediately. With such a storage virtualization solution, developers select the class of storage through storage service catalogs to rapidly build and deploy apps, and management is centralized. This ensures that storage services are well defined and that the service levels required by applications and users are met, all while removing error-prone manual processes. IT monitors and meters storage usage and charges only for what is being used.
Such a solution needs to be an open cloud platform that is completely extensible. An open architecture provides choice: it enables customers and partners to integrate all sorts of storage arrays, cloud stacks, VMware, data services, and more.
Such a solution must be an open cloud platform in which the hardware is abstracted, so it is easy to write connectors: code that understands the underlying arrays and exposes them to the virtualization platform. Ideally, the interface specification is published by the supplier so that any third party can easily write adapters.
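A published connector specification might look like a small abstract interface that each array family implements. The method set below is an assumption for illustration, not any vendor's actual specification.

```python
from abc import ABC, abstractmethod

class ArrayAdapter(ABC):
    """Hypothetical third-party connector contract (illustrative)."""

    @abstractmethod
    def discover(self):
        """Return the capabilities the platform should expose."""

    @abstractmethod
    def create_volume(self, name, size_gb):
        """Provision capacity on the backing array."""

class DemoFilerAdapter(ArrayAdapter):
    """Example implementation for an imaginary NAS filer."""

    def discover(self):
        return {"file", "nfs", "snapshots"}

    def create_volume(self, name, size_gb):
        return {"name": name, "size_gb": size_gb, "backend": "demo-filer"}

adapter = DemoFilerAdapter()
print(adapter.discover())
```

With a contract like this, supporting a new array type means writing one adapter class rather than changing the platform itself.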
All data and resources managed by the virtualization platform should be accessible via the open API, which can then integrate with VMware and other cloud environments such as OpenStack or Microsoft. That means the storage layer becomes another programmatic virtual resource in an SDDC, and an organization can easily integrate the virtualization platform into its existing data center operations. It can provide, for example, specific VMware integration with interfaces into the VMware vStorage API for Storage Awareness (VASA), vCenter Orchestrator, and vCenter Operations, so that a vCenter administrator has end-to-end visibility from the virtual machine to physical storage.
Open access to a set of de facto industry-standard APIs, including Amazon S3, OpenStack Swift, and others, is essential for such a virtualization solution. This fosters an open development community for the delivery of global data and automation services. That benefits not only the enterprise but also the many startups, because it gives them an open underlying platform to leverage: they can extend it, build new data services on top of it, and gain the value of those data services as well. And when you can write once and run everywhere using data services, you gain ultimate freedom, flexibility, and most importantly choice.
Standard interfaces, such as RESTful APIs, for both the data path and the control path provide the essence of the choice we discussed earlier. With such broad and open API support, any API-driven storage requirement from private, public, or hybrid clouds can be handled, and you can use your preferred management tool. Developers can write applications to multiple cloud APIs and execute those workloads on the virtualization platform in an enterprise data center or a service provider's cloud.
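To make "multiple cloud APIs over the same storage" concrete, the snippet below builds the request layout for the same object under the public S3 path-style convention and the OpenStack Swift convention. The endpoint hostname is illustrative; the path shapes follow the published S3 and Swift APIs.

```python
def s3_put(endpoint, bucket, key):
    """S3 path-style PUT Object: /{bucket}/{key}."""
    return ("PUT", f"{endpoint}/{bucket}/{key}")

def swift_put(endpoint, account, container, obj):
    """Swift object PUT: /v1/{account}/{container}/{object}."""
    return ("PUT", f"{endpoint}/v1/{account}/{container}/{obj}")

print(s3_put("https://storage.example.com", "media", "clip.mp4"))
print(swift_put("https://storage.example.com", "AUTH_demo", "media", "clip.mp4"))
```

An application written against either layout could target the same backing storage; that interchangeability is the "write once, run everywhere" claim in practice.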
What, then, is meant by "software-defined storage"? Block, file, and object storage are defined in software as global data services. Specifically, file and block data services at the data plane provide all the functionality of physical block and file storage arrays. Block and file data services allow users to manage block volumes, NFS file systems, or CIFS shares, and provide advanced protection services such as snapshots, cloning, and replication; they offer full storage functionality as if the user were accessing a physical array. And since file and block services do not operate in the data path, users retain and can leverage all the unique attributes of the underlying block and file arrays, while still getting all the benefits of centralized provisioning, management, reporting, and self-service access. Applications access file and block data directly. So all block and file data services should deliver operational simplicity while maintaining the arrays' advanced features, such as mirrors, clones, snapshots, multi-site high availability, and replication. And like object, any storage service can become a global data service.
Today, cloud-based apps are fundamentally different and built in a completely different way. Rather than being written at a very low level, they are increasingly written using new frameworks and require a cloud-scale architecture to manage their volume and demands. This is essentially a higher-level paradigm, one that focuses on simplicity and component reuse. In the Java world, more than 50% of all the applications running today are written with Spring. But it is not limited to Java: emerging languages are all based on frameworks, such as Rails for Ruby, Node.js for JavaScript, and Grails. The development process for these kinds of apps is also very different: rather than nine-month development cycles, they tend to have the rapid iterations this new paradigm allows; they are developed, tweaked, deployed, tweaked, deployed, and so on. This has implications for the kinds of technologies being used. Web, mobile, and cloud applications written with these new frameworks only care about byte streams and metadata; the file system construct is overkill and a poor architectural fit. Object-based storage accessible via a REST API is ideally suited to these new applications. Together, these trends are driving a real transition to API-driven storage.
Block and file storage functionality are basic data services, but additional data services that span heterogeneous arrays can be incorporated. These global data services extend additional storage functionality to the underlying arrays. For instance, an object data service could provide the ability to store, access, and manipulate unstructured data (e.g., images, video, audio, online documents) as objects on file-based storage without having to rewrite or rework existing file-based applications. Such an object data service is a software layer that works transparently with different hardware platforms; in other words, the virtualization platform accommodates the short development cycles of newly developed applications. By building object as a data service, all the underlying file arrays gain the ability to store objects and access them as files or objects. For example, an enterprise can ingest objects from a REST-based cloud application directly into a traditional or scale-out NAS filer. A file-based application written to that file system can then access and manipulate those objects as files and save them as objects. As a result, the enterprise can access the same data from a REST-based application and a file-based application without having to move or copy the data or recode applications. The object data service provides a different semantic view of the same data, and the application owner toggles between "object mode" and "file mode". In object mode, access is through the virtualization platform (which is in the data path) and has all the capabilities, performance, and qualities of object storage. In file mode, the file-based application accesses the data directly (the virtualization platform is not in the data path). Enterprises get the flexibility and simplicity of object-based storage and REST-based access while maintaining all the enterprise-class features of a traditional or scale-out NAS, such as replication and snapshots.
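The dual semantic view can be sketched as one store addressed two ways. This is a toy model with invented names, not product internals: the same bytes are reachable by object key ("object mode") or by file path ("file mode"), with no copy in between.

```python
# Sketch of an object data service's dual view (illustrative only).

class DualViewStore:
    """One copy of the data, two addressing schemes."""

    def __init__(self):
        self._data = {}                     # (bucket, key) -> bytes

    def put_object(self, bucket, key, data):
        """Object mode: REST-style ingest by bucket and key."""
        self._data[(bucket, key)] = data

    def get_object(self, bucket, key):
        return self._data[(bucket, key)]

    def read_file(self, path):
        """File mode: the same data seen as /bucket/key, no copy made."""
        bucket, _, key = path.lstrip("/").partition("/")
        return self._data[(bucket, key)]

store = DualViewStore()
store.put_object("docs", "report.pdf", b"%PDF...")
print(store.read_file("/docs/report.pdf") == store.get_object("docs", "report.pdf"))
```

The single `_data` dictionary is the point: both access modes resolve to the same stored bytes, mirroring how the text's object service avoids moving or duplicating data.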
And, critically, they don't have to move data or recode applications. Another scenario is HDFS, which is becoming increasingly popular as a file system layer for distributed applications beyond Hadoop. This allows customers to scale analytics beyond appliances. Today, doing analytics on data requires copying it to a Hadoop appliance. The issue is that data is heavy, and you want your data where your compute is; likewise with analytics, you want the data close to the analytics engine, and getting the data over to the appliance is often the problem. With such a virtualization platform, an HDFS data service can be set up on those arrays and in-place analytics can run across the environment: processing is done on the worker node where the data resides, without unnecessarily traversing the network, thereby reducing backbone traffic. This opens up a huge opportunity in the Big Data space by providing hyperscale analytics across heterogeneous platforms within existing environments.
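The data-locality idea behind in-place analytics can be sketched as a scheduler that assigns each task to the node already holding the block. The block-to-node map and naming are assumptions for illustration, not HDFS internals.

```python
# Sketch of locality-aware scheduling (illustrative names only).

BLOCK_LOCATIONS = {          # which worker node holds each data block
    "block-a": "node-1",
    "block-b": "node-2",
    "block-c": "node-1",
}

def schedule(blocks):
    """Run each task on the node where its block already resides,
    so no data crosses the network to reach the compute."""
    return {b: BLOCK_LOCATIONS[b] for b in blocks}

plan = schedule(["block-a", "block-b"])
print(plan)
```

Contrast this with the appliance model in the text, where every block would first be copied over the backbone to a central cluster before processing could begin.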
Software-Defined Storage moves the needle in the right direction for true storage virtualization.
EMC can help you lead your transformation.
That's why we developed EMC ViPR, our Software-Defined Storage solution.