The International Journal of Engineering and Science (IJES)
||Volume|| 1 ||Issue|| 1 ||Pages|| 13-17 ||2012||
ISSN: 2319 – 1813 ISBN: 2319 – 1805
Research of elastic management strategy for cloud storage
1 SHAO Bi-lin, 2 BIAN Gen-qing, 3 ZHU Xu-dong
1, First Author: School of Management, Xi’an University of Architecture and Technology
*2, Corresponding Author: School of Information and Control Engineering, Xi’an University of Architecture and Technology
3: School of Information and Control Engineering, Xi’an University of Architecture and Technology
-------------------------------------------------------------Abstract-------------------------------------------------------------
Traditional HDFS suffers from limited storage capacity, high storage cost, and costly fault recovery. Cloud storage based on HDFS can address these issues by using the virtual resources of an IaaS platform, but this alone does not ensure that cloud storage uses those virtual resources efficiently. To solve this problem, the paper proposes an elastic cloud storage framework based on HDFS, introduces the idea of virtual resource management into the framework, and proposes an elastic management strategy for virtual resource allocation and scheduling based on this framework. The simulation experiment shows that the resulting cloud storage uses virtual resources more efficiently.
Keywords: HDFS; Elastic Cloud Storage; Virtual Resource; Resource Allocation and Scheduling; feedback control theory.
-------------------------------------------------------------------------------------------------------------------------
Date of Submission: 19, October, 2012 Date of Publication: 10, November 2012
------------------------------------------------------------------------------------------------------------------------
I Introduction
With the development of Web 2.0 technology, and especially the popularity of social networks, traditional storage technologies (network storage or distributed storage) struggle to meet the demand for storing vast amounts of unstructured data [1]. More and more commercial websites, such as Google, Amazon, and Facebook, have therefore begun to use cloud storage architectures based on HDFS, and this development is gathering momentum. The design of HDFS-based cloud storage derives from virtual cluster computing: a single virtual storage server is expanded into a cluster of virtual storage servers (i.e., a Hadoop virtual cluster). A Hadoop virtual cluster combines multiple virtual storage servers and presents them to users through a transparent access interface, as if they were one storage server with great computing power and high throughput [2].
However, existing cloud storage has certain limitations in both research and implementation: it concentrates excessively on high throughput and high reliability and ignores how efficiently cloud storage actually uses its virtual resources, which seriously hinders the application and adoption of cloud storage. Load-handling capability and disturbance switching capability are important for cloud storage, but the efficiency with which cloud storage actually uses virtual resources deserves even more attention. Therefore, by introducing the idea of virtual resource management, this article proposes an elastic storage architecture and an elastic management strategy for virtual resources. The simulation results show that the resulting cloud storage uses virtual resources more efficiently.
II HDFS on Elastic Cloud Storage
The paper takes the Slot as the unit of virtual resource scheduling. A Slot characterizes the computing capacity of a virtual resource, which is determined by the CPU and memory size of that resource. Let $cpu$ be the unit CPU capacity, taken as 1 GHz, and $mem$ the unit memory capacity, taken as 1 GB. A virtual server $P_i$ can then provide the following number of Slots:
$$Slot_i = \min\left( \left\lfloor \frac{cpu_i}{cpu} \right\rfloor, \left\lfloor \frac{mem_i}{mem} \right\rfloor \right) \qquad (1)$$
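As a minimal sketch of Eq. (1), the following Python snippet computes the number of Slots for a few servers. The function name `slots`, the constant names, and the server sizes are hypothetical and chosen only for illustration; only the 1 GHz and 1 GB units come from the text.

```python
import math

UNIT_CPU_GHZ = 1.0  # unit CPU capacity "cpu" (1 GHz in the text)
UNIT_MEM_GB = 1.0   # unit memory capacity "mem" (1 GB in the text)


def slots(cpu_ghz: float, mem_gb: float) -> int:
    """Eq. (1): Slots one virtual server provides, limited by its scarcer resource."""
    return min(math.floor(cpu_ghz / UNIT_CPU_GHZ),
               math.floor(mem_gb / UNIT_MEM_GB))


# Hypothetical servers as (CPU in GHz, memory in GB); values are illustrative only.
servers = [(2.4, 4.0), (3.0, 2.0), (0.8, 8.0)]
per_server = [slots(cpu, mem) for cpu, mem in servers]
print(per_server)  # [2, 2, 0]
```

In this example the third server contributes no Slot: its CPU is below one unit, so its large memory does not help.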
www.theijes.com The IJES Page 13
Here $\lfloor \cdot \rfloor$ denotes rounding down, $cpu_i$ is the CPU size of virtual server $P_i$, and $mem_i$ is the memory size of virtual server $P_i$. If at time $t$ the HDFS load is $load_t$ and the number of virtual servers is $N$, then the available virtual resources are:
$$Slot_{total} = \sum_{i=1}^{N} Slot_i \qquad (2)$$
and the HDFS virtual resource use rate is:
$$u = \frac{load_t}{Slot_{total}} = \frac{load_t}{\sum_{i=1}^{N} \min\left( \left\lfloor \frac{cpu_i}{cpu} \right\rfloor, \left\lfloor \frac{mem_i}{mem} \right\rfloor \right)} \qquad (3)$$
If the HDFS virtual resource use rate lies between the expected low and high water levels, the resources are judged to be used effectively. The problem to be solved is to plan the scheduling rules R dynamically such that, for a given load $load_t$, $0 < u_{low} \le u \le u_{high} < 1$. Therefore, for HDFS to schedule resources elastically according to the real-time load state and resource use rate, an elastic management strategy for virtual resources must be introduced. This strategy should include a flexible resource allocation strategy and a flexible resource scheduling strategy. However, adding a new resource management component or resource scheduling strategy to the existing HDFS would not only increase the complexity of HDFS but also impose additional cost and load on the HDFS manager, and thus affect the overall load capacity and stability of HDFS.
This paper introduces a virtual resource management platform and puts forward an elastic cloud storage system based on HDFS. First, the HDFS cluster is encapsulated as a service on the virtual resource management platform, with each service instance representing an HDFS data node. The platform's lifetime management of service instances thereby becomes lifetime management of HDFS data nodes: dynamically starting, stopping, or moving a service instance is equivalent to dynamically starting, stopping, or migrating an HDFS data node. Second, the platform's virtual resource management capabilities provide the HDFS data nodes with a flexible virtual resource allocation strategy, which dynamically adjusts the resource size of the Hadoop virtual cluster and allows multiple HDFS clusters to share resources. At the same time, an encapsulated virtual resource scheduling service monitors the HDFS and virtual resource load information in real time and schedules HDFS data nodes elastically according to that load information and the elastic scheduling strategy. In this way, resource waste and resource sharing are addressed without affecting the existing HDFS structure and function. Many existing virtual resource management platforms, such as OpenNebula [3][4] and Platform [5], can be used to implement this design. For the simulation test, the virtual resource management product ISF of Platform is used.
[Figure 1: layered architecture of Hadoop-based elastic cloud storage. From top to bottom: application programs with HDFS clients; Internet/Intranet; the elastic scheduling service and the Hadoop cluster services (HDFS1, HDFS2); the virtual resource management platform (VM Orchestra) with its service instances, each running in a VM; Xen hypervisors and Amazon EC2; the physical resource pool.]
Fig 1. Elastic Cloud Storage based on HDFS
As shown in Figure 1, elastic cloud storage based on HDFS is made up of four parts: virtual resources, the virtual resource management platform (VM Orchestra), the Hadoop virtual cluster service, and the Hadoop virtual cluster elastic scheduling service.
Virtual resources are the carriers of the HDFS data nodes and are managed by the virtual resource
management platform. The virtual resource management platform can be deployed to run a range of services; for each service configuration, the platform can assign virtual resources and start one or more service instances.
The Hadoop virtual cluster service, namely HVCS. Each HVCS represents a particular HDFS Hadoop cluster and consists of a configuration and one or more instances. In this paper, each HVCS instance represents an HDFS data node, and the HVCS configuration specifies the information required to run HDFS together with a group of flexible resource allocation strategies, namely the flexible resource allocation strategy for HDFS data nodes.
The Hadoop virtual cluster elastic scheduling service, namely HVRS. HVRS adds, removes, or transfers data nodes of the HDFS cluster. To perform this elastic scheduling, it uses the current HDFS resource information and load information, the lifetime control and service scheduling interface for HDFS data nodes, and the configured elasticity rules. The HVCS resource usage is obtained through the interface provided by the virtual resource management platform, and the HDFS load information is provided by HDFS.
III Virtual Resource Elastic Management Strategy of Elastic Cloud Storage
The virtual resource elastic management strategy of elastic cloud storage includes two parts: a flexible resource allocation strategy and a flexible resource scheduling strategy. The allocation strategy specifies a set of virtual resources for HDFS, ensuring that resources are available to be scheduled elastically; the scheduling strategy schedules HDFS data nodes elastically according to the usage of virtual resources, so that virtual resources are used effectively.
IV Flexible Allocation Strategy on Virtual Resource
The virtual resource allocation strategy of elastic cloud storage consists of a reservation strategy, a sharing strategy, a borrowing and lending strategy, and a recycling strategy.
The reservation and sharing strategies provide a Hadoop cluster service (HCS) with a group of static exclusive resources and shared resources. The reservation strategy allocates reserved resources directly to an HCS, whether or not the HCS uses them and even if its load is small. The sharing strategy provides shared resources for an HCS in two parts: the shared resources belonging to the HCS and the shared resources it may use when overloaded. Shared resources are not allocated to the HCS immediately: the HCS can use them, according to the configured sharing proportion, only when it has more demand, requires more virtual nodes, and has exhausted its reserved resources. When its reserved and shared resources still cannot meet its demand for virtual nodes, the sharing strategy allows the HCS to use the free shared resources of other HCSs.
The borrowing and lending strategy lets HCSs share reserved resources; it takes effect only when the shared resources are overloaded. By borrowing and lending idle reserved resources, an HCS can make effective use of the free reserved resources in the Hadoop cluster, which improves the resource use rate of the whole environment.
The recycling strategy reclaims the idle resources that an HCS has shared or lent out. While the HCS is idle, others can use the resources that belong to it; when its load increases and it needs more virtual nodes, those resources are reclaimed.
The results show that, with the virtual resource allocation strategy, the resources available to HDFS in cloud storage are regulated dynamically according to the load state, whereas conventional HDFS uses a fixed group of allocated resources.
3.2 Virtual Resource Elastic Scheduling Strategy
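The scheduling strategy builds on the use rate $u$ of Eq. (3) and on the water levels $u_{low}$ and $u_{high}$ introduced in Section II. As a minimal sketch, assuming the load is expressed in Slot-equivalents, the snippet below computes $u$ from per-server Slot counts and compares it with the water levels. The function names, the load value, and the Slot counts are hypothetical; the default thresholds of 40% and 85% are the values the paper uses in its simulation.

```python
from typing import Sequence


def utilization(load: float, slots_per_server: Sequence[int]) -> float:
    """Eq. (2)-(3): u = load_t / Slot_total, where Slot_total sums the per-server Slots."""
    slot_total = sum(slots_per_server)
    if slot_total <= 0:
        raise ValueError("Slot_total must be positive")
    return load / slot_total


def classify(u: float, u_low: float = 0.40, u_high: float = 0.85) -> str:
    """Compare the use rate u with the low and high water levels."""
    if u > u_high:
        return "overloaded"
    if u < u_low:
        return "wasted"
    return "effective"


u = utilization(3.0, [2, 2, 0])  # hypothetical load of 3 Slot-equivalents on 4 Slots
print(u, classify(u))  # 0.75 effective
```

A use rate inside the band is left alone; only values outside it call for adding or removing resources.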
For every HDFS, the elastic scheduling strategy for data nodes is a three-tuple P(S, A, R), where S is the sampling space (Sampling), A is the action space (Actions), and R (Rule) represents the scheduling rules.
S, the sampling space, consists of the virtual node load information $\{u(l(h_i))\}_{i=1}^{N}$, where $i$ indexes a virtual node, $h_i$ is the load capacity of that virtual node, $l(h_i)$ is its load, and $u(l(h_i))$ is its resource consumption. Then
$$S = \{ S(h_i) \mid \{ u(l(h_i)) \mid 1 \le i \le N \} \} \qquad (4)$$
A is the set of actions executed by elastic scheduling, $A = (A_{over}, A_{wasted})$, where $A_{over}$ is the action that handles resource overload and $A_{wasted}$ is the action that handles resource waste.
Based on Eq. (3), the scheduling rules R are defined as follows:
* If the current HDFS resource utilization rate $u$ is higher than $u_{high}$, resource usage is judged to be overloaded, and HCRS performs the overload action $A_{over}$ on the HDFS data nodes; that is, if $u > u_{high}$, then $A_{over}$.
* If the current HDFS resource utilization rate $u$ is lower than $u_{low}$, HDFS is judged to be wasting resources, and HCRS performs the waste-handling action $A_{wasted}$ on the HDFS data nodes; that is, if $u < u_{low}$, then $A_{wasted}$.
Elastic scheduling is the process of tracking the HDFS load information $\{l(h_i)\}_{i=1}^{N}$ and, based on the scheduling rules R, adjusting the $Slot_{total}$ of the HDFS in a timely manner, thereby dispatching virtual resources. To automate this scheduling, classical feedback control theory [6][7] is introduced to model the relation between the HDFS load information $L = \{l(h_i)\}_{i=1}^{N}$ and $Slot_{total}$, i.e.:
$$L(t) = s_1 L(t-1) + r_1 Slot_{total}(t-1) + r_2 Slot_{total}(t-2) \qquad (5)$$
The model parameters $s_1$, $r_1$, and $r_2$ can be estimated with the recursive least-squares (RLS) method. To realize elastic control based on this model, Eq. (5) is shifted forward by one time step and solved for $Slot_{total}(t)$, which gives the control equation (6):
$$Slot_{total}(t) = \frac{1}{r_1} L_{ref} - \frac{s_1}{r_1} L(t) - \frac{r_2}{r_1} Slot_{total}(t-1) \qquad (6)$$
where $L_{ref} = L(t+1)$ is the HDFS load expected at the next time step.
Introducing the scheduling rules R into the control equation (6) gives the elastic control equation (7):
$$Slot_{total}(t) = \begin{cases} \dfrac{1}{r_1} L_{high} - \dfrac{s_1}{r_1} L(t) - \dfrac{r_2}{r_1} Slot_{total}(t-1) & \text{if } L_{high} < L(t) \\ \dfrac{1}{r_1} L_{low} - \dfrac{s_1}{r_1} L(t) - \dfrac{r_2}{r_1} Slot_{total}(t-1) & \text{if } L_{low} > L(t) \\ Slot_{total}(t-1) & \text{otherwise} \end{cases} \qquad (7)$$
That is, the HSC acts only when $L_{high} < L(t)$ (load higher than supply) or $L_{low} > L(t)$ (load lower than supply); it then sets a new $Slot_{total}(t)$, i.e., executes $A_{over}$ or $A_{wasted}$.
V Simulation test
The virtual resource elastic management strategy is tested as follows. The virtual resource management platform is the virtual resource management product ISF of Platform, with 8 adjustable virtual resource Slots. The HDFS1 cluster has 2 Slots of reserved resources and 1 Slot of shared resources, while the HDFS2 cluster has 1 Slot of resource
reservation and 1 Slot of shared resources. Load is sampled by measuring the CPU utilization rate; the high water level $u_{high}$ and the low water level $u_{low}$ are set to 85% and 40%, respectively; $A_{over}$ adds one Slot and $A_{wasted}$ removes one Slot. The HDFS testing tool is HDFS Bench. The test tool is run against each of the 2 HDFS clusters over a period of time, and the usage of virtual resources (Slots) by the 2 virtual clusters is observed as the load changes. The test results are shown in Figure 2.
Fig 2. Simulation Result
The results show that when the load on HDFS1 is small, it uses only one Slot, and the remaining resources are used by the highly loaded HDFS2. As the load on HDFS1 increases, the shared and borrowed Slots used by HDFS2 are gradually returned to HDFS1, until HDFS1 uses its maximum number of Slots. As the load on HDFS2 gradually decreases, HDFS2 uses fewer Slots, and the released Slots are gradually taken up by the fully loaded HDFS1, which reduces its load by sharing and borrowing Slots from HDFS2. As its load falls, HDFS2 continues to reduce its Slot usage and tends toward a steady level.
VI Conclusion
A study of the existing methods for constructing HDFS and controlling its data nodes shows that HDFS utilizes system resources poorly because resources are wasted and are difficult to share among multiple HDFS clusters; both problems are caused by the lack of a flexible resource management pattern. To address this, the paper proposes an elastic cloud storage framework based on HDFS and puts forward a flexible resource allocation strategy and an elastic scheduling strategy for HDFS data nodes. The simulation experiment shows that the elastic management strategy mitigates these problems and effectively improves the efficiency with which virtual resources are used.
Acknowledgment
This research is supported by the National Natural Science Foundation of China (61073196 & 61272458) and the Natural Science Research Foundation of Shanxi Province, China (2011JM 8026).
Reference
[1] M. Armbrust, A. Fox, D. A. Patterson, N. Lanham, B. Trushkowsky, J. Trutna, and H. Oh. Scads: Scale-independent storage for social computing applications. In Proc. of CIDR, 2009.
[2] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In Proc. of SOSP, 2003. Volume 37, Issue 5, December 2003, Pages: 29-43.
[3] Open Nebula. http://opennebula.org.
[4] B. Sotomayor, R. Montero, I. Llorente, and I. Foster. Capacity Leasing in Cloud Systems using the Open Nebula Engine. In Workshop on Cloud Computing and its Applications (CCA08), 2009.
[5] Platform: http://www.platform.com/
[6] X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, P. Padala, and K. Shin. What does control theory bring to systems research? SIGOPS Operating Systems Review, 2009, 43(1):62-69.
[7] H. C. Lim, S. Babu, J. S. Chase, and S. S. Parekh. Automated control in cloud computing: Challenges and opportunities. ACDC '09 Proceedings of the 1st workshop on Automated control for datacenters and clouds. Volume: C, Issue: 09, Publisher: ACM, Pages: 13-18.