A Plumber's Guide To SaaS (and multi-tenanting, and the Cloud)
[Author : Kinshuk Adhikary - software plumber.
LinkedIn : http://in.linkedin.com/in/kinshuka
Blog : http://me-plumber.blogspot.com/
Email: kinshuk-in@yahoo.com ]
Disclaimer # 1 : This write-up is arbitrarily lengthy. I have added to it whenever I wanted to.
Disorganized too, because I have jumped zillions of light-years between topics. That is the
way I am made, plus it is not easy to compress 5-6 years of experience into neat topics.
Disclaimer # 2 : Sometimes I go very "business" and sometimes very "techie". I can't help
it. I am a great believer in mixing up those two.
Introduction :
Quite recently, I asked a question on a SaaS forum: "what the hell is SaaS ?". Some said it
was any web application with a log-in, the same as ASP. Really ?!! Some said it had to be
multi-tenanted if it was SaaS. I will go with the second one for now; it serves as a basis.
About 7-8 years ago, deep into a multi-tenanted web based collaboration platform (we
didn't call it SaaS back then), I had an interesting conversation with an architect (a proper
architect, not the mushrooming fakes you find these days).
"What is all this multi-tenanting rubbish ?" he asked. A database is a logical space, not a
physical thing. It is anyway bifurcated at a "root" level, so why bifurcate it further at the
level of "organization" ? Why share a single database between many subscribing
organizations ? They all have the same structure. The web application is the same. Why
not have 100 instances of the application, and 100 instances of the database ? After
all, a single script and a single deployment engineer can "manage" all of them; there is no
extra maintenance cost involved. In fact, the bigger headache is in trying to "box" all these
100 organizations into the same logical space.
Multi-tenant SaaS applications, a few pictures (non-tech people, please bear with me; this
could be important later on):
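Without multi-tenanting :
[Figure: A hosting company / infrastructure provider runs a separate DB instance and a separate App instance for each of Org1 and Org2. The DB instances have the same structure; the App instances run the same code version. One set of scripts, written for 1 app and 1 logical database, manages them all.]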
Remember what the SaaS concept promises. It is the same application that is "shared".
There is no difference in what the different organizations "see", neither in the application
nor in the database structure. To the developer, all that really means is 1 script.
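To make the "1 script" point concrete, here is a minimal sketch, assuming SQLite in-memory databases and a hypothetical `orders` table and migration. The same maintenance script is simply looped over every organization's instance.

```python
import sqlite3

# Hypothetical schema change: the single maintenance script every instance receives.
MIGRATION = "ALTER TABLE orders ADD COLUMN currency TEXT DEFAULT 'USD'"

def new_instance() -> sqlite3.Connection:
    """One organization's database instance (same structure for every org)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    return conn

def run_everywhere(instances: dict[str, sqlite3.Connection], script: str) -> list[str]:
    """Run the same script against every instance; return the orgs it was applied to."""
    applied = []
    for org, conn in instances.items():
        conn.execute(script)
        conn.commit()
        applied.append(org)
    return applied

instances = {"Org1": new_instance(), "Org2": new_instance()}
applied = run_everywhere(instances, MIGRATION)
print(applied)  # ['Org1', 'Org2']
```

Adding Org3 only adds an entry to the `instances` mapping; the script itself does not change, which is the architect's argument.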
Alternative with multi-tenanting :
[Figure: A licensing company / infrastructure provider runs a single DB instance holding both Org1 and Org2, and a single App instance. The same scripts, written for 1 app and 1 logical database, manage them.]
Ok. So what really happened here ? We put two orgs into the same logical database.
1. Lower maintenance costs from this kind of sharing ? : If there is absolutely no difference
between the versions of the applications that the two organizations are using, then the
"effort at maintenance" faced by the developer is the same in both cases. 1 script.
[This is a very big if, but it is some sort of a basis. As we will see later, it never stays the
same application and the same database for very long.]
A single codebase, and a single database maintenance script. In the first alternative it is
"run" twice over the two instances, in the second one it is run once over a single instance.
That is all.
2. Clear and marketable ownership is now fuzzed up : Notice that in the first case, each
organization was able to point clearly and say "this is MY database instance, and that is
MY application instance", whereas in the second case the database is now "shared",
and the instance too is "shared". Responsibilities, too, are a bit fuzzy here.
This has major ramifications. Enter the SaaS Licensor. This is the guy who has probably
hosted the SaaS app, has built it, knows it well, and is now contracting with the many
organizations who will use the instance. In many cases it will be the same entity as the
"cloud provider". It may not be, but we are considering simple scenarios here.
The 2 organizations now "share" the database, which means that the same "structure" still
applies to them, but it also means that every piece of data is now "tagged" with the
organization's name, so that the data of the two orgs can be cleanly separated when needed.
It is the Licensor's responsibility to ensure that this "tagging" really happens, and that one
organization (viewing the data through the application's window) does not end up "seeing"
another organization's data.
That would rarely happen. I can assure you that it is rare, unless the Licensor is an
amateur. Most concerns of "data security" raised in cloud computing scenarios are
somewhat unfounded.
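A minimal sketch of the "tagging" described above, assuming SQLite and a hypothetical `contracts` table with a `tenant_id` column. Every read goes through one helper that always adds the tenant filter, so a forgotten `WHERE` clause in some screen cannot leak another organization's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Every row carries the owning organization's tag.
conn.execute("CREATE TABLE contracts (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, title TEXT)")
conn.executemany(
    "INSERT INTO contracts (tenant_id, title) VALUES (?, ?)",
    [("org1", "Pump supply"), ("org1", "Pipe fitting"), ("org2", "Valve repair")],
)

def contracts_for(tenant_id: str) -> list[str]:
    """The only read path: the tenant filter is always applied here."""
    rows = conn.execute(
        "SELECT title FROM contracts WHERE tenant_id = ? ORDER BY id", (tenant_id,)
    )
    return [title for (title,) in rows]

print(contracts_for("org1"))  # ['Pump supply', 'Pipe fitting']
print(contracts_for("org2"))  # ['Valve repair']
```

Keeping the filter in one place is the design choice here; databases such as PostgreSQL also offer row-level security policies that can enforce the same rule inside the database itself.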
The real issues around "sharing" are somewhere else, and more subtle. Solvable too.
Do not imagine I am against SaaS, or against the Cloud. Each thing has its proper use in
its proper place. The right knowledge is essential, that is all.
Common data : The above data-sharing picture is more correctly visualized like this:
[Figure: a single DB instance holding the data of both Org1 and Org2.]
There is always some common data. This could be master data, it could be configurations.
When an organization decides to "subscribe" to a SaaS application, it is allowed to share
this already existing common stuff, else the rest of its "own" data would be meaningless.
What happens when the organization "leaves" the SaaS provider/licensor, or joins
another competing one ? Is it freely allowed to carry its own data as well as the common
data ? I have no idea, this exact situation has not occurred in my experience.
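One common way to model such shared master data, sketched here assuming SQLite and a hypothetical `product_categories` table, is to leave `tenant_id` empty (`NULL`) for common rows. Each organization then sees the common rows plus its own, and the leaving question above becomes concrete: only the rows carrying its own tag are unambiguously its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# tenant_id is NULL for common (master) data shared by every organization.
conn.execute("CREATE TABLE product_categories (id INTEGER PRIMARY KEY, tenant_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO product_categories (tenant_id, name) VALUES (?, ?)",
    [(None, "Pumps"), (None, "Valves"), ("org1", "Org1 special kits"), ("org2", "Org2 spares")],
)

def visible_categories(tenant_id: str) -> list[str]:
    """Common rows plus the organization's own rows, never another org's."""
    rows = conn.execute(
        "SELECT name FROM product_categories "
        "WHERE tenant_id IS NULL OR tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [name for (name,) in rows]

def own_categories(tenant_id: str) -> list[str]:
    """Only the rows tagged with this organization: what it could clearly take away."""
    rows = conn.execute(
        "SELECT name FROM product_categories WHERE tenant_id = ? ORDER BY id", (tenant_id,)
    )
    return [name for (name,) in rows]

print(visible_categories("org1"))  # ['Pumps', 'Valves', 'Org1 special kits']
print(own_categories("org1"))      # ['Org1 special kits']
```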
What else is the organization "sharing" ?
Something that many people cannot see in all these pictures. And yet, that something is
probably far more important than data.
It is "business logic". It is the "way your business processes are run".
When an organization enters the SaaS environment, this is what the picture often is :
[Figure: a timeline in three stages. Entering: the new org brings its existing legacy data and its current business logic. Inside the SaaS "sharing" environment: it adapts to the new business logic, getting changes as the SaaS version enhances. Leaving (org + 3 years): increased data, and business logic that is enhanced, unchanged, or unwittingly shared.]
What exactly is this animal called business logic ? If you will believe me, it could be a lot of
things :
- the convenient way your screens look, the way information is presented in them
- all the if-then-else conditional logic scattered all over the place, yes/nos, configurations
- the workflows your users follow as they achieve a business objective
- the way your data is structured
- the way you organize your sub-departments
- the way Jim gets permissions to travel when Tom is on leave
The "best practices" argument :
SaaS companies often entice organizations to join up, citing "best practices". Here we
have these 200 companies, and our application incorporates the best of their best
practices, so if you join up you will be able to avail of them.
Quite true. When a newbie company joins up, it is probably "sharing" with the best of
breed companies the same screens, the same if-else logic, same capabilities.
And how exactly do these best practices get incorporated into the SaaS applications ?
From the best organizations that join up, most of the time. To start with, the SaaS
application is probably a rough thing. But software applications are all about upgrades,
new versions, new feature additions.
Let's say an organization joins up which has a best practice X that the existing SaaS
version does not have. So the organization is told: we do not have this. If you want it, it
will come out in our next version, and you have to pay extra.
The new organization in its eagerness to join up, says, no problems. And there you are.
Right after the new version is deployed, about 200 other organizations are now able to
avail of this new feature X. Everyone is happy.
No one has put a price on the "business logic". And, as grizzled application builders know
only too well, it is the best, or rather the only, part of an application that has any value, and
it comes straight from the business guys in the best run businesses.
Too much sharing of best practices between everyone can make best practices look silly.
The data analytics future : (This is one of those light-year jumps I warned you about. But
it is relevant. If you have difficulties on this part please ask someone).
Mature organizations know what to do with 5 years worth of accumulated data. They do
analytics on it. Although still a bit unproven, data analytics yields some very significant
conclusions to those who can do it well.
Let's say that you can somehow get hold of a large volume of customer purchase data.
When you "classify" the data properly and give it to a data analytics engine, it will probably
be able to tell you things like - "any customer that has purchased more than 200 dollars
from this store during January is very likely to be spending 500 dollars during Christmas
shopping". Your marketing people would have guessed that anyway, but now you have
some solid figures.
And the bigger the volume of data you have available for the analytics, the more
complete the data, and the more current it is, the higher the chances that such analytics
predictions are correct.
So, now you are a small company with 200 customers in the SaaS environment. Your data isn't
really good enough. And now a company with 50000 customers enters, and stays in the SaaS
environment for 3 years.
And someone now does data analytics and arrives at important business conclusions (we
do not specify who that will be, but it would be equally stupid to prevent people from doing
data analytics when so much good data is all around).
There are several really important issues here :
- who really owns "ALL of the data"? I guess whoever has "control" over it. Since
organizations are entering the SaaS environment primarily to be free of all the hassles,
they are relinquishing their control.
- can the Licensor do analytics on the data and derive such conclusions, and sell such
conclusions to others ?
- even the small company that had 200 customers did contribute somewhat, or maybe
even significantly, to such an analytics result. It depends on how the data was classified
before doing the analytics, maybe it had all customers of a certain type that were not
"outliers", maybe its data was less patchy and therefore better suited to analytics.
This single phrase, "data analytics", has the potential to force a complete re-thinking of
how applications of today contribute to a data pool, and how pricing for SaaS applications
is done.
There is good and bad in this, as always. One has to manage both.
A semi-technical dive, on things that make or break SaaS apps : (this one
is more relevant for the developer and the SaaS vendor, but it provides some insight into
the jugglery inside).
After 1,3 and 5 years of a SaaS application : So my architect friend and I chewed the
cud over "logical vs physical" sharing aspects of the databases, and we decided not to
worry about these things, because the people who paid us seemed to think that a single
instance meant less complexity, less costs, whatever.
Only thing is, we later wished we had, because the increasing complexities fell right on top
of us, after 1 year into the app, and then after 3 years. After 5 years there was no one left
to worry further, because the product had been entirely too successful and the company
was sold off, and techies like us were asked to fade away. I hear the guys who purchased it are still
grappling with the mess. It is not easy, you have to know exactly what multi-tenanting
implies.
Let me state that "a business application is all about requirements". Smart SaaS vendors
ought to select a group of requirements that are more or less common across the entire
business community. Only then can it become "a service".
Remember the "ideal" case we discussed ? That the database structure is the same, that
the application is the same. In about a year's time, after the SaaS vendor has added about
10 organizations, all that idealism starts coming apart.
In 3 years' time, when about 200 organizations have been added, a second phase of
changes is also to be expected. These are more about performance and speed issues.
"Boxing" the real world into a single logical space : Consider these (few) examples :
• One organization has 2 users (a small mom-and-pop company). The other joins
with about 80000 users. Various shades in between. Do you seriously imagine they
will have the same business requirements, and fit into the same database
structure ?
• One organization has its departments/divisions organized functionally. The other
has organized them regionally. The domain requirements could be very similar in
both cases (both follow the same sales tracking process, say), but the user
permissioning could be completely different. For example, users in Timbuctoo are
not, repeat not, allowed to see orders that pertain to nuclear spare parts.
• A single big company breaks up into two. In the real world, they divide up the
existing contracts neatly, assets neatly, employees neatly, and so on. But remember
there is a lot of historical data related to these contracts. So what does one do ?
Make copies ?
• An employee leaves one organization and joins another in the same SaaS
environment. Whose responsibility is it to ensure that there is no overlap during the
login changes, that the same employee is not able to view/use/download both
organizations' data simultaneously?
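One way a Licensor might take on that last responsibility, sketched in plain Python with hypothetical class and method names: record each person's membership with start and end dates, and refuse a new membership while an old one is still active.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Membership:
    user_id: str
    tenant_id: str
    start: date
    end: Optional[date] = None  # None means the membership is still active

class MembershipRegistry:
    """Allows a person to be active in at most one organization on any given day."""

    def __init__(self) -> None:
        self._memberships: list[Membership] = []

    def active_tenant(self, user_id: str, on: date) -> Optional[str]:
        for m in self._memberships:
            if m.user_id == user_id and m.start <= on and (m.end is None or on < m.end):
                return m.tenant_id
        return None

    def join(self, user_id: str, tenant_id: str, on: date) -> None:
        current = self.active_tenant(user_id, on)
        if current == tenant_id:
            return  # already a member; nothing to do
        if current is not None:
            raise ValueError(f"{user_id} is still active in {current}; end that membership first")
        self._memberships.append(Membership(user_id, tenant_id, on))

    def leave(self, user_id: str, tenant_id: str, on: date) -> None:
        for m in self._memberships:
            if m.user_id == user_id and m.tenant_id == tenant_id and m.end is None:
                m.end = on

reg = MembershipRegistry()
reg.join("tom", "org1", date(2010, 1, 4))
try:
    reg.join("tom", "org2", date(2010, 6, 1))  # refused: still active in org1
except ValueError as err:
    print(err)  # tom is still active in org1; end that membership first
reg.leave("tom", "org1", date(2010, 6, 1))
reg.join("tom", "org2", date(2010, 6, 1))
print(reg.active_tenant("tom", date(2010, 5, 31)), reg.active_tenant("tom", date(2010, 6, 1)))  # org1 org2
```

A real system would also have to end the person's open sessions at the cutover date; that part is not shown.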
Models break quickly : The experience is - "models break very easily". The moment an
organization in the real world re-structures itself, or starts calling itself "a vendor who also
happens to be a customer to other vendors", you have many technical problems
overnight. No point aiming for "flexibility", "extensibility" etc., seasoned developers know all
the limits.
The whole "shared" concept depends very finely on "the model being invariant over time".
However, organizations are different, or they change under business pressures, or grow
wiser. Simple models become complex. Often there is total conflict with the logical
structures and spaces that the SaaS application envisages. In such cases, the patchy
solutions often go against common sense and good software design practices. For
example, storing Strings in big flat tables as "custom" fields is not a good practice. Yet
custom fields are often touted as "a feature" in many SaaS applications.
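A minimal sketch of the "custom fields" pattern being criticized, assuming SQLite and a hypothetical `credit_days` field. Because every value is stored as a string, sorting silently uses text order unless each query remembers to cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One flat table holds every tenant's "custom" fields as name/value strings.
conn.execute(
    "CREATE TABLE custom_fields (tenant_id TEXT, entity_id INTEGER, field_name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO custom_fields VALUES (?, ?, ?, ?)",
    [("org1", 1, "credit_days", "9"), ("org1", 2, "credit_days", "45"), ("org1", 3, "credit_days", "120")],
)

def top_credit_entities(cast: bool) -> list[int]:
    """Order entities by credit_days, either as stored strings or cast to integers."""
    # order_expr is chosen from two fixed strings, never from user input.
    order_expr = "CAST(value AS INTEGER)" if cast else "value"
    rows = conn.execute(
        "SELECT entity_id FROM custom_fields WHERE tenant_id = 'org1' "
        f"AND field_name = 'credit_days' ORDER BY {order_expr} DESC"
    )
    return [eid for (eid,) in rows]

print(top_credit_entities(cast=False))  # [1, 2, 3]  text order: '9' > '45' > '120'
print(top_credit_entities(cast=True))   # [3, 2, 1]  numeric order: 120 > 45 > 9
```

The type of `credit_days` lives nowhere in the schema, so every report and integration has to rediscover it.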
If you want your application to be fine tuned to the business, one of a kind, and easily
amenable to drastic changes if you wish to (often happens), you may not like the simplicity
that SaaS may try to "box" you into. On the other hand, if the SaaS licensor is a great
"pleaser of all customers", then very soon each organization's "logical space" starts
looking distinctly individualistic, and very soon you may end up having the 100 separate
applications and 100 separate databases anyway, or much worse, a mix and a hotch-
potch which is a nightmare for your developers to maintain and update.
Fine grained authorizations and permissions :
SaaS applications need extremely fine grained authentication and authorization modules.
That is because you have none of the usual comfort that organizational firewalls and LANs
provide. So you must positively identify and separate each resource, each piece of data,
each operation, each screen. Some have their own special permissioning requirements.
Individual object level ACLs are almost a given in SaaS applications.
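A minimal sketch of an object-level ACL check in plain Python; the `Resource` type, the `user:`/`tenant` key convention, and the names are hypothetical. Access is denied by default, a grant to the owning tenant applies only to that tenant's users, and an explicit per-user grant can cross tenants (a consultant, say).

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """Any protected object: an order, a document, a screen."""
    resource_id: str
    tenant_id: str  # the owning organization
    # Per-object ACL. Keys: "tenant" (all users of the owning org) or "user:<id>".
    acl: dict[str, set[str]] = field(default_factory=dict)

def is_allowed(user_id: str, user_tenant: str, resource: Resource, operation: str) -> bool:
    """Deny by default; allow only what this object's own ACL grants."""
    # 1. An explicit per-user grant, which may cross tenants.
    if operation in resource.acl.get(f"user:{user_id}", set()):
        return True
    # 2. A tenant-wide grant applies only to users of the owning tenant.
    if user_tenant == resource.tenant_id and operation in resource.acl.get("tenant", set()):
        return True
    return False

# Org1's nuclear spare parts order: only Tom may see it.
nuclear_order = Resource("order-17", "org1", {"user:tom": {"read", "approve"}})
# A design document readable by all of Org1 and by one outside consultant.
design_doc = Resource("doc-3", "org1", {"tenant": {"read"}, "user:ann": {"read"}})

print(is_allowed("jim", "org1", nuclear_order, "read"))  # False
print(is_allowed("ann", "org3", design_doc, "read"))     # True
```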
Inherently collaborative : The application we worked on was all about "collaboration"
(remember, it wasn't called SaaS back then). I still maintain that the key benefit that SaaS
and multi-tenanting provide is collaboration between organizations and between people in
those organizations, both intra-organization and inter-organization. Silo applications
without linkages are trivial.
At times, there are things one can appreciate in the multi-tenant design, the 60-70%
commonality, the ability to link one organization to another easily without needing to go out
of the system.
For example, the same "vendor organization" can collaborate with multiple "customer
organizations". So can "consultants".
But usually such collaborative apps take a huge amount of design effort, and I wonder if it is
worth it.
Fat content sharing : Think also of "fat content". For example, let us say huge video files
available more or less to all organizations, some kind of training material, say. Are you
sure each organization should maintain its own copy of such big files ? Or should you have
just one copy, with some kind of "sharing" logic pointing to the fat resource?
And once again, I do not know if such permission and such content sharing is good, or
bad. Simplicity is so much better always :-)
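The single-copy alternative can be sketched as a content-addressed store: each distinct file is kept once, keyed by its SHA-256 hash, and each organization holds only a reference to it. This is a plain-Python sketch with hypothetical names, not any particular product's design.

```python
import hashlib

class SharedContentStore:
    """Keeps one physical copy per distinct file; tenants hold references to it."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}           # content hash -> bytes, stored once
        self._refs: dict[tuple[str, str], str] = {}  # (tenant_id, name) -> content hash

    def put(self, tenant_id: str, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # identical bytes are not stored twice
        self._refs[(tenant_id, name)] = digest
        return digest

    def get(self, tenant_id: str, name: str) -> bytes:
        # A tenant can reach content only through its own reference.
        return self._blobs[self._refs[(tenant_id, name)]]

    def physical_copies(self) -> int:
        return len(self._blobs)

store = SharedContentStore()
video = b"...training video bytes..."
store.put("org1", "safety-training.mp4", video)
store.put("org2", "induction.mp4", video)
print(store.physical_copies())  # 1
```

Deleting content then needs reference counting, which this sketch leaves out.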
Feeding data in and out : For various reasons, a stream of data must go in and out of all
applications. When an organization initially joins up, there is a fairly big task of getting all
its legacy data to conform to the SaaS model. Periodically, data extracts need to be taken
out to feed other systems the organization needs. And of course, there is the
"integration" requirement: your CRM needs to talk to your bespoke accounting application.
The "tagging" of data that we did ensures that you see only your organization's data, and
not some other organization's. It is, however, a mixed blessing. The "tag" must be
forever preserved. I guess this hint is enough for those who know what ETL
(extract-transform-load) is and how badly it can screw up :-)
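A minimal sketch of preserving the tag through extract and load, using Python's `csv` module and hypothetical field names. The tag travels as an explicit column, and the load step refuses any row whose tag is missing or belongs to another organization, so a transform that drops it fails loudly instead of silently mixing data.

```python
import csv
import io

FIELDS = ["tenant_id", "order_id", "amount"]

def extract(rows: list[dict], tenant_id: str) -> str:
    """Export one tenant's rows as CSV, keeping tenant_id as an explicit column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        if row["tenant_id"] == tenant_id:
            writer.writerow(row)
    return buf.getvalue()

def load(csv_text: str, expected_tenant: str) -> list[dict]:
    """Re-import an extract, refusing any row whose tag is missing or foreign."""
    loaded = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("tenant_id") != expected_tenant:
            raise ValueError(f"row {row.get('order_id')} has tenant tag {row.get('tenant_id')!r}")
        loaded.append(row)
    return loaded

shared_table = [
    {"tenant_id": "org1", "order_id": "A1", "amount": "200"},
    {"tenant_id": "org2", "order_id": "B7", "amount": "950"},
]
exported = extract(shared_table, "org1")
print(load(exported, "org1"))  # [{'tenant_id': 'org1', 'order_id': 'A1', 'amount': '200'}]
```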
Identity Management : For some reason I cannot quite pin-point, the issues of "object
identity management" seem to hit a multi-tenant SaaS application with unusual ferocity.
We often find a large number of "duplicates" (duplicated companies, duplicated people)
being created, requiring frequent removal, and messing up the collaborative nature of the
platform. Like I said, it ought to be the same intensity in a silo application too, but my
experience is that it is not the same, and I do not know why.
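A simple rule-based sketch of spotting duplicate companies: normalize each name (lowercase, strip punctuation, drop common legal suffixes) and group names that end up with the same key. The suffix list and the names are illustrative; real data usually needs fuzzier matching than this.

```python
import re
from collections import defaultdict

# Illustrative, incomplete list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "ltd", "llc", "pvt", "limited", "corp", "co"}

def normalize_company(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)

def find_duplicates(names: list[str]) -> list[list[str]]:
    """Group names that normalize to the same key; groups of 2+ are likely duplicates."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[normalize_company(name)].append(name)
    return [g for g in groups.values() if len(g) > 1]

print(find_duplicates(["Acme Pumps Ltd.", "ACME PUMPS", "acme pumps, ltd", "Zenith Valves Inc"]))
# [['Acme Pumps Ltd.', 'ACME PUMPS', 'acme pumps, ltd']]
```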
Split databases : Many organizations join, but soon start asking for "a separate database
for all of OUR data". The SaaS developer may oblige, by routing each request to its own
specific database, after parsing each request, finding out which organization the request
was for. Preserving the illusion of "a single application instance", at least.
A roundabout way of having your own application instance with own database in the first
place !
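The routing of each request to its own database can be sketched as a small router that picks a connection per request. The request shape (a dict with a `tenant_id` key) and the class name are hypothetical; in a real web application the tenant would be derived from something like the login, a subdomain, or a header.

```python
import sqlite3

class TenantRouter:
    """One application instance in front of a shared DB plus a few dedicated ones."""

    def __init__(self, dedicated: set[str]) -> None:
        self._shared = sqlite3.connect(":memory:")
        # Tenants that asked for "a separate database for all of OUR data".
        self._dedicated = {t: sqlite3.connect(":memory:") for t in dedicated}

    def connection_for(self, request: dict) -> sqlite3.Connection:
        # "Parse" the request to find which organization it belongs to.
        tenant_id = request["tenant_id"]
        # Dedicated tenants get their own database; everyone else stays shared.
        return self._dedicated.get(tenant_id, self._shared)

router = TenantRouter(dedicated={"org7"})
print(router.connection_for({"tenant_id": "org1"}) is router.connection_for({"tenant_id": "org2"}))  # True
print(router.connection_for({"tenant_id": "org7"}) is router.connection_for({"tenant_id": "org1"}))  # False
```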
My own reasons in the Cloud, why, what : (this is complete speculation !!!).
Adoption of cloud during budget approvals : The cloud will be adopted. Can you
imagine any CTO/CIO saying "yes" to a departmental head's request for "buying one more
server for the new set of applications"? They will almost certainly say: try it out on the
cloud. We will save Capex, see if it works, and weigh carefully whether "hired" infrastructure
is cheaper than buying more servers.
Of course, there will also be a drive to "utilize better the existing CPU cycles in the current
set of servers". I mean, 20% server utilization. What a waste (as if server utilization was an
accounting number, and as if people know or ever knew how to size servers against
arbitrary unpredictable application loads).
It is always about "saving dollars", isn't it ? For some odd reason, CTOs tend to adopt the
"accounting" perspective. I have met very few who have the "investor" attitude, unlike
other business heads.
A stronger reason for going to the cloud : It is the old old story - lack of available
knowledge and skillsets. There is a great shortage of DBAs, and of IT staff who
know enough about security, database replication, load balancing, maintaining multiple
applications, squeezing the maximum out of servers.
Whereas the cloud provider folks would definitely know. You are safer trusting security and
your data to them, than to your own in-house staff. It is the sheer vast knowledge of cloud
providers that is one very good reason why folks will go to the cloud.
Who is the "best" cloud provider ? : That is a strange question.
If you wanted rapidly built applications, obviously the ones who provide ready-to-cook-and-
eat APIs. The taste will be a little insipid, but you can get by.
If you wanted more or less your own custom application, but wouldn't mind some ready-
built common services, there are many others.
Watch this space. The biggies are moving, in their own ways.
One of them is moving very slowly, in spite of provocation. This one "knows the mind of the
hacker". So its current offering does not allow you to do this, or do that. This one also
knows that a database on the cloud is not exactly the same thing as a database
elsewhere.
Another has just realized that the desktop is a goner, it better have the applications made
for the browser. This is a good guy too, except that in the last few years it has been living
off its past glory, and now it is the past that may pull it back.
Some are offering pricing as their only feature.
Everyone is struggling with issues like session handling, insidious APIs that are vulnerable
to leet haxors, and of course the circuit breakers.
All in all, post-2010, the cloud story is gathering momentum. A good thing too, IMHO, as
long as one is mindful of the basic definitions of the cloud, and does not go by jargon.