Why serve an undercooked Big Data solution? Go for the well-baked one
Introduction
Today, Big Data is the buzzword in the information technology domain. There is a great deal of discussion, and at the same time confusion, about this revolution called Big Data. In the client's mind (most of the time), Big Data can do wonders on their data with its intelligent processing power and analytical features. Most often, the client is on cloud nine, visualizing a utopian, ideal environment where everything gets resolved.
Through this article, I want to deconstruct that belief. I want to share and pass on my learning on how to rightly approach a Big Data solution in order to get insights in line with the client's expectations. The intention of this blog is not to provide a technical solution for Big Data problems, but a process to approach a Big Data solution in a smart way.
Understand the Client Requirement
Often we come across a successfully implemented Big Data solution with optimal processing time and jazzy visualization and analytical capabilities, yet it still does not meet the client's expectations. Therefore, we need to understand why a client wants to go for a Big Data solution in the first place. Without this insight, we may not be able to produce the desired output in line with their anticipation.
Image 1: Client mind mapping
After the initial step of understanding why the client is going for a Big Data solution, it's time to extrapolate from that study. We have to come up with a blueprint that highlights our suggestions and recommendations for executing the Big Data solution.
Image 2: The inference from client's expectation & outcome
A First Person Account
Let me explain this concept with a real-time example and walk you through a smart way to execute the solution and present the outcomes. The table below gives a brief overview of a client's problem and how they expect it to be resolved.
Client Situation: A leading retailer in the USA wants to enhance its existing Decision Support System (DSS) application. The DSS application presently serves a large user base (50+) on a data volume of about 250 TB. It runs on a traditional RDBMS and is not able to scale to the said expectation.

Client Requirement: According to the client's IT Director, if their existing application is moved to Big Data (by hosting on Microsoft Azure Cloud), it should be able to resolve their problem, both on cost and performance.
Image 3: Smart Way Inference
Points to ponder for implementing Big Data Solution
i) Baseline the Solution Coverage
Based on the client situation mentioned above, we need to come up with solutions for the following challenges:
Reduce cost of data storage
Provide a stable code base
Implement a scalable solution
Deliver a cost-effective application
From the inference above, I am trying to come up with two use cases (scenarios):
Use Case 1
i. Ingest one unit of the data set (1 TB) into the system. Repeat this procedure for 30 units.
ii. Run this use case on 10, 50 and 100 node clusters.
Use Case 2
i. Read one unit of data through multiple requests. Consider these as supplier requests, which can range from 1 to 40K.
ii. Run this use case on 10, 50 and 100 node clusters.
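The two use cases above multiply out into a fixed set of benchmark runs. As a minimal sketch, assuming illustrative request levels sampled from the 1–40K range (the function and field names below are my own, not from the original exercise), the run plan can be enumerated up front:

```python
from itertools import product

# Hypothetical sketch of the benchmark matrix implied by the two use cases.
CLUSTER_SIZES = [10, 50, 100]                    # node counts, per both use cases
INGEST_UNITS = 30                                # Use Case 1: 30 x 1 TB units
READ_REQUEST_LEVELS = [1, 100, 10_000, 40_000]   # Use Case 2: sampled from 1..40K

def build_run_plan():
    """Enumerate every (use case, cluster size, workload) combination to execute."""
    plan = []
    for nodes in CLUSTER_SIZES:
        plan.append({"use_case": 1, "nodes": nodes, "ingest_units_tb": INGEST_UNITS})
    for nodes, requests in product(CLUSTER_SIZES, READ_REQUEST_LEVELS):
        plan.append({"use_case": 2, "nodes": nodes, "concurrent_requests": requests})
    return plan

plan = build_run_plan()
print(len(plan))  # 3 ingest runs + 12 read runs = 15
```

Listing the matrix explicitly keeps the POC honest: every claimed data point maps to a planned run.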
ii) Configuring the Environment
This is yet another crucial phase of implementation. Identify the optimal infrastructure to validate the solution. Once we have identified the vendor (e.g., Microsoft) or an infrastructure provider (e.g., Big Decisions or the client's IT department) to implement the Big Data solution, keep the following aspects in mind:
Is the infrastructure stable? Validate the following:
Validate CPU and memory
Validate I/O
Validate the storage (in our case it should hold 250 TB)
Are there adequate access rights for you to execute?
Edge node access to the cluster
Root folder access
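Parts of this checklist can be automated as a pre-flight script on the edge node. Below is a minimal, hypothetical sketch: the thresholds and function name are illustrative assumptions, with only the 250 TB figure taken from the client situation above.

```python
import os
import shutil

def validate_environment(data_path=".", min_cores=8, min_storage_tb=250):
    """Hypothetical pre-flight check mirroring the checklist above."""
    report = {}
    # CPU check; memory and I/O benchmarking would need platform tools.
    report["cpu_ok"] = (os.cpu_count() or 0) >= min_cores
    # Storage check: in our case the target must hold 250 TB.
    usage = shutil.disk_usage(data_path)
    report["storage_ok"] = usage.total >= min_storage_tb * (1024 ** 4)
    # Access rights: can we read and write where the data will live?
    report["access_ok"] = os.access(data_path, os.R_OK | os.W_OK)
    return report

print(validate_environment())
```

Running such a script once per candidate cluster turns the checklist into evidence you can attach to the POC report.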
iii) Identifying the Core Team
No solution can be productive unless we engage the right stream of resources. We need to engage the spot-on resources to run the solution. In this sample POC (Proof of Concept), we would need an expert or a COE to support the core team. There is certainly no time for training, therefore we have to identify a core team who can meet the expectations in no time.
iv) All Set to Go
Once there is clarity on the objective of the exercise (here, the POC), with the right resources and an optimal environment, you can expect the outcome to align with the client's expectations. It is now time to draft a plan for implementing the solution.
For instance, you can determine the storage types (Blob, Azure Data Lake Store) and tool stacks (Spark, Hive, ADLF) for data processing and querying. Create a code base and execute it over the environments.
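Such a draft plan can be captured as a simple configuration artifact. The mapping below is purely illustrative, an assumption of how the storage types and tools named above might be assigned, not a recommendation from the original exercise:

```python
# Illustrative draft plan only; the stage-to-technology assignments are
# assumptions. The technologies are the ones named in the text above.
draft_plan = {
    "storage": {
        "raw_landing": "Azure Blob Storage",  # low-cost landing zone for ingest
        "curated": "Azure Data Lake Store",   # analytics-optimized layer
    },
    "processing": {
        "batch_ingest": "Spark",              # Use Case 1: 30 x 1 TB loads
        "ad_hoc_query": "Hive",               # Use Case 2: supplier read requests
    },
    "environments": [10, 50, 100],            # node counts to execute against
}

for stage, choice in draft_plan["storage"].items():
    print(f"{stage}: {choice}")
```

Keeping the plan in one reviewable structure makes it easy to agree on scope with the client before any code runs.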
NOTE: It's mandatory to complete all of the above steps before moving to the next phase; however, the above activities themselves can be executed in parallel with one another.
v) Present the Insights and Recommendations
Having clarity on the objectives makes it easier to match or meet the client's expectations when implementing the Big Data solution. The last phase of the exercise captures statistics by running the code on the configured environments, derives metrics and insights from the execution outcomes, and presents them to the client with recommendations. Refer to the image below for a sample outcome.
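One metric worth deriving from the captured statistics is scaling efficiency across the 10-, 50- and 100-node runs. Here is a minimal sketch; the elapsed times are made-up placeholders, not measured results from the exercise:

```python
def scaling_efficiency(runs):
    """runs: {nodes: elapsed_seconds}; efficiency is measured speedup
    divided by the ideal linear speedup relative to the smallest cluster."""
    base_nodes = min(runs)
    base_time = runs[base_nodes]
    metrics = {}
    for nodes, elapsed in sorted(runs.items()):
        speedup = base_time / elapsed
        ideal = nodes / base_nodes
        metrics[nodes] = {"speedup": round(speedup, 2),
                          "efficiency": round(speedup / ideal, 2)}
    return metrics

# Placeholder elapsed times (seconds) for one use case across clusters.
sample = {10: 3600, 50: 900, 100: 600}
print(scaling_efficiency(sample))
```

Presenting efficiency rather than raw runtimes helps the client see where adding nodes stops paying for itself, which speaks directly to their cost-and-performance expectation.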
I would like to conclude with the saying, "Don't just meet the expectations. Exceed them." With this approach, you are bound to exceed the expectations of the client.