Welcome to distributing data the Aerospike way. This is an introductory presentation on the basics of how data can be parceled out in a distributed database. There are many ways to do this, and we will be covering the pluses and minuses of each. This is a pretty dry concept, so to make it a LITTLE more entertaining, we have an alternative, less dry title for this presentation. Which is…
You may ask: “So why this alternate title?” When you look at the picture, there may not seem to be much similarity between the two. But this is a situation most, if not all, of us have been through, and there are some real similarities.
So let’s think about how you might do this on a small scale. Assume there are only 20 registrants. You would have a single table and give each person their registration package. It includes information specific to him or her: their own class schedule and, of course, a name tag. So what happens as conference attendance grows? If you reach the point where there are thousands of registrants, how do you scale up?
We have all stood in lines and seen how much the system in use affects the wait. One way to handle more people is to simply scale up the hardware on your single server. Another is to distribute the load, and there are a number of ways to do this. Let’s start by taking a look at vertical scaling.
Vertical scaling has the virtue of being relatively simple. By making full use of the best hardware, you can handle more transactions. Because you are still dealing with a single server, you don’t have to worry about questions like: How do you distribute and map the data to different registration desks? What happens if two registrants check in at different booths? However, these benefits only go so far. Past that point, the minuses of this strategy become an issue.
Generally, upgrading a single server is simple, but the added hardware costs can be considerable. A 64-core server is not cheap. Everyone does what they can to keep a monolithic database up. This includes redundant power supplies, sometimes independent power sources, RAID disks, and redundant network connections. But servers may still go down. In some cases this is intentional, as with maintenance and upgrades. How many of us have tried to pay a bill online on a Saturday night and found that the database is down? Even with all this, you may find that the business requirements still go beyond what you can do on a single server. So let’s look at what horizontal scaling offers.
Horizontal scaling means that the load and the data are distributed among many servers. There are several different strategies for this, along with different spins on each. If you are thinking about using a distributed database, there are a number of things you should look for.
Horizontal scaling can provide many benefits. Let’s take a look at some of the major features. This might seem odd, but first, you want features that keep you from having to think about the fact that your database is distributed.
Simple sharding distributes data according to simple patterns, like grouping people by last name. Hashed sharding is similar, but it runs the key through a hash function and uses the result to pick a server. Master-slave means that there is a master that determines what data goes where. The master controls how the data is distributed and must be consulted when accessing the data again.
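The three strategies can be sketched in Python using the registration-desk analogy. This is a minimal illustration under stated assumptions, not how any particular database implements them. The four desks, the alphabetical ranges, and every function and class name are hypothetical. MD5 from the standard library is used only because it gives the same hash on every run.

```python
import hashlib
from collections import Counter

NUM_DESKS = 4  # hypothetical number of registration desks (servers)

def simple_shard(last_name: str) -> int:
    """Simple sharding: fixed alphabetical ranges A-F, G-L, M-R, S-Z."""
    first = last_name[0].upper()
    for desk, last_letter in enumerate("FLRZ"):
        if first <= last_letter:
            return desk
    return NUM_DESKS - 1  # anything sorting after "Z" goes to the last desk

def hashed_shard(last_name: str) -> int:
    """Hashed sharding: a stable hash of the key, modulo the desk count."""
    digest = hashlib.md5(last_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_DESKS

class Master:
    """Master-slave style: a central directory decides where each key goes
    and must be consulted again on every later lookup."""

    def __init__(self) -> None:
        self.directory: dict[str, int] = {}
        self.load: Counter = Counter()

    def place(self, key: str) -> int:
        # Put the new key on the least-loaded desk (ties go to the lowest number).
        desk = min(range(NUM_DESKS), key=lambda d: self.load[d])
        self.directory[key] = desk
        self.load[desk] += 1
        return desk

    def lookup(self, key: str) -> int:
        return self.directory[key]  # every read goes through the master

# Usage: many registrants share the initial "S".
names = ["Smith", "Schmidt", "Sanchez", "Stone", "Adams", "Nguyen"]
print("simple:", Counter(simple_shard(n) for n in names))  # four of six land on desk 3
print("hashed:", Counter(hashed_shard(n) for n in names))
master = Master()
for n in names:
    master.place(n)
print("master:", Counter(master.lookup(n) for n in names))
```

The alphabetical ranges are fixed, so a crowd of registrants whose names start with S all queue at one desk. Hashing removes that dependence on how names are distributed. The master-based approach trades fixed rules for an extra lookup on every access.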
The cluster
How do you run long-running tasks (data rebalancing and backup) while also handling short-term transactions, so that neither is starved? Real-time prioritization lets the long tasks make progress while short-term transactions still meet their SLAs.
Shared nothing:
Allows rolling software upgrades (releases are backward compatible).
A robust way of building clustered database systems.
Each of our cluster nodes is identical: a set of nodes and a set of clients.
Even though every node is identical, each node sends and receives data during rebalancing and may hold the second or third copy of a record. As you add nodes, each one takes a proportional share of the data.
Makes systems operations much simpler.
Clusters are tightly coupled: nodes are close to each other, with latencies on the order of a millisecond.
Clusters are self-configuring.
Nodes find each other via multicast (in the cloud, we use mesh).
Everything is synchronous within a cluster.
Remote clusters are asynchronous.
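The notes mention identical nodes, second and third copies of the data, and each node taking a proportional share as the cluster grows. These ideas can be illustrated with a fixed set of partitions spread across nodes. The Python sketch below is a simplified model, not Aerospike's actual assignment algorithm. It assumes 4096 partitions and a replication factor of 2, and it uses SHA-1-based rendezvous (highest-random-weight) hashing, which is swapped in here for illustration. The node names and all function names are hypothetical.

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096     # fixed partition count (the number Aerospike uses)
REPLICATION_FACTOR = 2  # master copy plus one replica; an assumption for this sketch

def weight(node: str, partition: int) -> int:
    """Stable pseudo-random weight of a node for one partition (rendezvous hashing)."""
    digest = hashlib.sha1(f"{node}:{partition}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def partition_map(nodes: list[str]) -> dict[int, list[str]]:
    """Rank every node for every partition; the top node holds the master copy
    and the next REPLICATION_FACTOR - 1 nodes hold replicas."""
    table = {}
    for p in range(N_PARTITIONS):
        ranked = sorted(nodes, key=lambda n: weight(n, p), reverse=True)
        table[p] = ranked[:REPLICATION_FACTOR]
    return table

def partition_of(key: str) -> int:
    """Hash a record key to one of the fixed partitions."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:2], "big") % N_PARTITIONS

def masters_per_node(table: dict[int, list[str]]) -> Counter:
    return Counter(replicas[0] for replicas in table.values())

def moved_masters(before: dict[int, list[str]], after: dict[int, list[str]]) -> int:
    return sum(1 for p in before if before[p][0] != after[p][0])

before = partition_map(["node-a", "node-b", "node-c"])
after = partition_map(["node-a", "node-b", "node-c", "node-d"])
print(masters_per_node(before))  # roughly a third of the partitions per node
print(masters_per_node(after))   # roughly a quarter per node
print(moved_masters(before, after), "master copies moved, all of them to node-d")
key = "registrant:smith"
print(key, "-> partition", partition_of(key), "on", after[partition_of(key)])
```

Rendezvous hashing is used here because it makes the proportional-share property easy to see. When node-d joins, the only partitions whose master changes are the ones node-d now wins, about one in four. Every other partition stays where it was. A real cluster would also have to migrate the data for those partitions in the background, which is exactly the kind of long-running task that must not starve short-term transactions.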