Is the elephant in the room
1. Is the Elephant in the room?
Regunath B
regunathb@gmail.com
Twitter : @RegunathB
2. Quick read 1.8 million words?
The story is about a battle between great kings and sons, with the principal characters being
Arjuna, Pandu, Bhishma, Bharata, Karna, Duryodhana, Yudhishthira etc.
Source : The Gramener blog for visualizations –
Analysis of the entire text contained in the Mahabharatha
(http://blog.gramener.com/category/visualisations)
3. Insights from Social Media
Source : ttwick Billionaires page (Bill Gates' Twitter Social Media profile)
(http://ttwick.com/blog/bill-gates-twitter-social-media/)
4. Insights from Social Media
Source : Impact page of Satyamevjayate
(http://www.satyamevjayate.in/impact/impact.php/)
5. What is Big Data?
● Big Data challenges and opportunities arise when information in an enterprise demonstrates the following characteristics:
– Volume
● Transaction data from enterprise systems
– For example : Financial transactions, Orders
– Variety
● Structured and Unstructured data
– For example : Customer contact, Social Media, Biometrics
– Velocity
● High information arrival rates
– For example : Application events, Tagging, Rating of content
● Big Data opportunities arise when the enterprise is able to derive Value from the
data characteristics defined above
6. Food for thought.... on theorems and laws
● Do hardware and technology trends affect your technology selection?
– CPU, RAM and disk size double every 18-24 months [Moore’s law]
– Disk seek time improves by only around 5% per year, so it remains nearly constant
● Data Seek vs. Data transfer
– Software that leverages one of the above, or a combination: B+ tree index, LSM tree index, “Fractal tree” index
● CAP theorem effect – ability to achieve only 2 of 3 properties of shared-data systems : data Consistency, system Availability and tolerance to network Partitions
● Bandwidth is the most scarce commodity in a Data Center
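The seek vs. transfer trade-off above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the hardware figures (10 ms seek, 100 MB/s transfer, 1 TB data set, 100-byte records) are assumptions, not numbers from these slides. It compares updating records in place, which costs one seek per record (the pattern a B+ tree index follows), with streaming the whole data set once (the sequential pattern an LSM tree merge follows).

```python
# Assumed, illustrative hardware figures (not taken from the slides).
SEEK_TIME_S = 0.010          # 10 ms per random disk seek
TRANSFER_RATE_BPS = 100e6    # 100 MB/s sequential transfer
DATASET_BYTES = 1e12         # 1 TB data set
RECORD_BYTES = 100           # 100-byte records


def seek_update_seconds(fraction: float) -> float:
    """Time to update a fraction of the records in place, one seek per record."""
    records = DATASET_BYTES / RECORD_BYTES
    return records * fraction * SEEK_TIME_S


def rewrite_seconds() -> float:
    """Time to stream the whole data set once: read it all, write it all back."""
    return 2 * DATASET_BYTES / TRANSFER_RATE_BPS


def break_even_fraction() -> float:
    """Fraction of records above which a full sequential rewrite is faster."""
    return rewrite_seconds() / seek_update_seconds(1.0)


if __name__ == "__main__":
    print(f"update 1% by seeking : {seek_update_seconds(0.01) / 86400:.1f} days")
    print(f"rewrite everything   : {rewrite_seconds() / 3600:.1f} hours")
    print(f"break-even fraction  : {break_even_fraction():.4%}")
```

Under these assumptions, seeking wins only when a tiny fraction of the records changes. Because seek time improves so slowly, that break-even point keeps moving in favor of sequential transfer.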
7. Aadhaar Patterns & Technologies
● Principles
– POJO based application implementation
– Light-weight, custom application container
– HTTP gateway for APIs
● Compute Patterns
– Data Locality
– Distribute compute (within an OS process and across)
● Compute Architectures
– SEDA : Staged Event Driven Architecture
– Master-Worker(s) Compute Grid
● Data Access types
– High throughput streaming : bio-dedupe, analytics
– High volume, moderate latency : workflow, UID records
– High volume, low latency : auth, demo-dedupe, search – eAadhaar, KYC
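As a rough illustration of the SEDA pattern named above, the sketch below wires three stages together with queues, each stage drained by its own small pool of worker threads. It is a minimal sketch and not the Aadhaar implementation: the `Stage` class, the `run_pipeline` helper and the stage names ("parse", "enrich", "collect") are all hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage's workers to shut down


class Stage:
    """One SEDA stage: an input queue drained by a pool of worker threads."""

    def __init__(self, name, handler, workers=2, downstream=None):
        self.name = name
        self.handler = handler
        self.downstream = downstream
        self.inbox = queue.Queue()
        self.threads = [threading.Thread(target=self._run) for _ in range(workers)]
        for t in self.threads:
            t.start()

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            result = self.handler(event)
            if self.downstream is not None:
                # Hand the result to the next stage's queue instead of calling it.
                self.downstream.inbox.put(result)

    def submit(self, event):
        self.inbox.put(event)

    def shutdown(self):
        # One sentinel per worker, queued behind any pending events.
        for _ in self.threads:
            self.inbox.put(_STOP)
        for t in self.threads:
            t.join()


def run_pipeline(raw_events):
    """Push raw events through parse -> enrich -> collect and return the results."""
    results = []

    def collect(event):
        results.append(event)  # only one collector thread, so no lock is needed
        return event

    sink = Stage("collect", collect, workers=1)
    enrich = Stage("enrich", lambda e: {**e, "checked": True}, downstream=sink)
    parse = Stage("parse", lambda raw: {"id": raw.strip()}, downstream=enrich)

    for raw in raw_events:
        parse.submit(raw)
    # Shut down in pipeline order so each stage drains before the next one stops.
    for stage in (parse, enrich, sink):
        stage.shutdown()
    return results
```

Because stages only communicate through queues, each one can be given a different number of workers, and a slow stage shows up as a growing queue rather than as blocked callers. Result order is not guaranteed when a stage has more than one worker.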
8. Aadhaar Architecture
● Real-time monitoring using Events
● Work distribution using SEDA & Messaging
● Ability to scale within JVM and across
● Recovery through check-pointing
● Sync Http based Auth gateway
● Protocol Buffers & XML payloads
● Sharded clusters
● Near Real-time data delivery to warehouse
● Nightly data-sets used to build dashboards, data marts and reports
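Recovery through check-pointing, mentioned above, can be sketched as follows. This is an assumption-laden illustration, not the Aadhaar mechanism: the JSON checkpoint file, the `next_index` field and the `fail_at` crash simulation are all hypothetical. The worker records how far it has got after each item, so a restart resumes from the last checkpoint instead of from the beginning.

```python
import json
import os


def load_checkpoint(path):
    """Return the index of the next unprocessed item (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint to a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)  # a crash never leaves a half-written checkpoint


def process_batch(items, path, handle, fail_at=None):
    """Process items in order, resuming from the last saved checkpoint.

    fail_at is a hypothetical hook that simulates a crash before that index.
    """
    start = load_checkpoint(path)
    for i in range(start, len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        handle(items[i])
        save_checkpoint(path, i + 1)
```

A crash between `handle` and `save_checkpoint` means the item is handled again after restart, so this gives at-least-once processing and the handler should be idempotent.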
11. Big Data at Flipkart
● Website traffic
– Millions of page hits per day – product catalogs, item availability, promotions, search
– Millions of active sessions and shopping carts
– Latencies measured in low single-digit milliseconds
● Growing list of categories (Books, Mobiles, Toys, Personal, Home, Baby, Digital music...)
– Electronic inventory – MP3, eBooks, movies
● New business models, newer channels
● Understanding users, user profiles, social media, experience
– Terabytes of logs containing browsing behavior, data from multiple engagement channels
– Recommendations based on millions of possible item matches and relevance algorithms
13. Is the Elephant in the room?
From Wikipedia:
"Elephant in the room" is an English metaphorical idiom for an obvious truth that is being ignored
or goes unaddressed.
Big Data opportunities and challenges are real and present -
It is the Elephant in the room.
14. Some takeaways from experience
● Make everything API based
● Everything fails (hardware, software, network, storage)
– System must recover, retry transactions, and sort of self-heal
● Security and privacy should not be an afterthought
● Scalability does not come from one product
– Watch out for solution and technology stereotyping
● Open scale out is the only way to go
– Heterogeneous, multi-vendor, commodity compute, growing in linear fashion.
Nothing else can adapt!
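The "everything fails" takeaway above asks for systems that recover and retry. One common way to do that is retry with exponential backoff and jitter, sketched below. The `retry` helper and its parameters are hypothetical and not from the talk. The `sleep` argument is injectable so the delay can be skipped in tests.

```python
import random
import time


def retry(operation, attempts=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits a random time up to base_delay * 2**attempt (capped at max_delay)
    between tries, so many clients do not all retry at the same moment.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Retrying is only safe when the operation is idempotent or the receiver can detect duplicates. Otherwise a request that succeeded but whose reply was lost would be applied twice.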