SlideShare una empresa de Scribd logo
1 de 14
Is the Elephant in the room?

                                         Regunath B

                               regunathb@gmail.com
                                Twitter : @RegunathB
Quick read 1.8 million words?




The story is about a battle between great kings and sons, with the principal characters being
Arjuna, Pandu, Bhishma, Bharata, Karna, Duryodhana, Yudhishthira etc.
                                                            Source : The Gramener blog for visualizations –
                                                Analysis of the entire text contained in the Mahabharatha
                                                       (http://blog.gramener.com/category/visualisations)
Insights from Social Media




                         Source : ttwick Billionaires page (Bill Gates' Twitter Social Media profile)
                                         (http://ttwick.com/blog/bill-gates-twitter-social-media/)
Insights from Social Media




                                         Source : Impact page of Satyamevjayate
                             (http://www.satyamevjayate.in/impact/impact.php/)
What is Big Data?

●   Big Data challenges and opportunities arise when information in an enterprise
    demonstrates following characteristics:

     –   Volume
          ●   Transaction data from enterprise systems
                   –   For example : Financial transactions, Orders
     –   Variety
          ●   Structured and Unstructured data
                   –   For example : Customer contact, Social Media, Biometrics
     –   Velocity
          ●   High information arrival rates
                   –   For example : Application events, Tagging, Rating of content



●   Big Data opportunities arise when the enterprise is able to derive Value from the
    data characteristics defined above
Food for thought.... on theorems and laws
●   Do hardware and technology trends affect your technology selection?
     –   CPU, RAM and disk size double every 18-24 months [Moore’s law]
     –   Disk seek time remains nearly constant at around 5% speed-up per year


●   Data Seek vs. Data transfer
     –   Software that leverage one of the above (or) a combination
         B+ tree index, LSM tree index, “Fractal tree”


●   CAP theorem effect – ability to achieve only 2 of 3 properties of shared-
    data systems : data Consistency, system Availability and tolerance to
    network Partitions


●   Bandwidth is the most scare commodity in a Data Center
Aadhaar Patterns & Technologies
•
    Principles
      •
         POJO based application implementation
      •
         Light-weight, custom application container
      •
         Http gateway for APIs

•
    Compute Patterns
     •
       Data Locality
     •
       Distribute compute (within a OS process and across)

•
    Compute Architectures
     •
       SEDA – Staged Event Driven Architecture
     •
       Master-Worker(s) Compute Grid

•
    Data Access types
     •
        High throughput streaming : bio-dedupe, analytics
     •
        High volume, moderate latency : workflow, UID records
     •
        High volume , low latency : auth, demo-dedupe,
                         search – eAadhaar, KYC
Aadhaar Architecture
                              •
                                  Real-time monitoring using Events


•
    Work distribution
    using SEDA &
    Messaging
•
    Ability to scale within
    JVM and across
•
    Recovery through
    check-pointing




•
    Sync Http based Auth
    gateway
•
    Protocol Buffers &
    XML payloads
•
    Sharded clusters

                                                   •
                                                       Near Real-time data delivery to warehouse
                                                   •
                                                       Nightly data-sets used to build dashboards, data
                                                       marts and reports
Putting data to work at Aadhaar
Deployment Monitoring
Big Data at Flipkart
 ●   Website traffic
      –   Millions of page hits per day – product catalogs, item availability, promotions,
          search
      –   Millions of active sessions and shopping carts
      –   Latencies measured in low digit milliseconds
 ●   Growing list of categories (Books, Mobiles, Toys, Personal,Home,Baby, Digital music...)
      –   Electronic inventory – MP3, eBooks, movies
 ●   New business models, newer channels
 ●   Understanding users, user profiles, social media, experience
      –   Tera bytes of logs containing browsing behavior, data from multiple
          engagement channels
      –   Recommendations based on millions of possible item matches and relevance
          algorithms
Is the Elephant in the room?




From Wikipedia:

"Elephant in the room" is an English metaphorical idiom for an obvious truth that is being ignored
or goes unaddressed.




Big Data opportunities and challenges are real and present -
It is the Elephant in the room.
Some takeaways from experience


●   Make everything API based
●   Everything fails (hardware, software, network, storage)
     –   System must recover, retry transactions, and sort of self-heal
●   Security and privacy should not be an afterthought
●   Scalability does not come from one product
     –   Watch out for solution and technology stereotyping
●   Open scale out is the only way to go
     –   Heterogeneous, multi-vendor, commodity compute, growing linear fashion.
         Nothing else can adapt!

Más contenido relacionado

Destacado

practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome themsaipriyadonthula
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 
E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres Regunath B
 
Oss as a competitive advantage
Oss as a competitive advantageOss as a competitive advantage
Oss as a competitive advantageRegunath B
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Ali Raw
 

Destacado (7)

practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome them
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 
E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres
 
What database
What databaseWhat database
What database
 
Aadhaar
AadhaarAadhaar
Aadhaar
 
Oss as a competitive advantage
Oss as a competitive advantageOss as a competitive advantage
Oss as a competitive advantage
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)
 

Último

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Último (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Is the elephant in the room

  • 1. Is the Elephant in the room? Regunath B regunathb@gmail.com Twitter : @RegunathB
  • 2. Quick read 1.8 million words? The story is about a battle between great kings and sons, with the principal characters being Arjuna, Pandu, Bhishma, Bharata, Karna, Duryodhana, Yudhishthira etc. Source : The Gramener blog for visualizations – Analysis of the entire text contained in the Mahabharatha (http://blog.gramener.com/category/visualisations)
  • 3. Insights from Social Media Source : ttwick Billionaires page (Bill Gates' Twitter Social Media profile) (http://ttwick.com/blog/bill-gates-twitter-social-media/)
  • 4. Insights from Social Media Source : Impact page of Satyamevjayate (http://www.satyamevjayate.in/impact/impact.php/)
  • 5. What is Big Data? ● Big Data challenges and opportunities arise when information in an enterprise demonstrates following characteristics: – Volume ● Transaction data from enterprise systems – For example : Financial transactions, Orders – Variety ● Structured and Unstructured data – For example : Customer contact, Social Media, Biometrics – Velocity ● High information arrival rates – For example : Application events, Tagging, Rating of content ● Big Data opportunities arise when the enterprise is able to derive Value from the data characteristics defined above
  • 6. Food for thought.... on theorems and laws ● Do hardware and technology trends affect your technology selection? – CPU, RAM and disk size double every 18-24 months [Moore’s law] – Disk seek time remains nearly constant at around 5% speed-up per year ● Data Seek vs. Data transfer – Software that leverage one of the above (or) a combination B+ tree index, LSM tree index, “Fractal tree” ● CAP theorem effect – ability to achieve only 2 of 3 properties of shared- data systems : data Consistency, system Availability and tolerance to network Partitions ● Bandwidth is the most scare commodity in a Data Center
  • 7. Aadhaar Patterns & Technologies • Principles • POJO based application implementation • Light-weight, custom application container • Http gateway for APIs • Compute Patterns • Data Locality • Distribute compute (within a OS process and across) • Compute Architectures • SEDA – Staged Event Driven Architecture • Master-Worker(s) Compute Grid • Data Access types • High throughput streaming : bio-dedupe, analytics • High volume, moderate latency : workflow, UID records • High volume , low latency : auth, demo-dedupe, search – eAadhaar, KYC
  • 8. Aadhaar Architecture • Real-time monitoring using Events • Work distribution using SEDA & Messaging • Ability to scale within JVM and across • Recovery through check-pointing • Sync Http based Auth gateway • Protocol Buffers & XML payloads • Sharded clusters • Near Real-time data delivery to warehouse • Nightly data-sets used to build dashboards, data marts and reports
  • 9. Putting data to work at Aadhaar
  • 11. Big Data at Flipkart ● Website traffic – Millions of page hits per day – product catalogs, item availability, promotions, search – Millions of active sessions and shopping carts – Latencies measured in low digit milliseconds ● Growing list of categories (Books, Mobiles, Toys, Personal,Home,Baby, Digital music...) – Electronic inventory – MP3, eBooks, movies ● New business models, newer channels ● Understanding users, user profiles, social media, experience – Tera bytes of logs containing browsing behavior, data from multiple engagement channels – Recommendations based on millions of possible item matches and relevance algorithms
  • 12.
  • 13. Is the Elephant in the room? From Wikipedia: "Elephant in the room" is an English metaphorical idiom for an obvious truth that is being ignored or goes unaddressed. Big Data opportunities and challenges are real and present - It is the Elephant in the room.
  • 14. Some takeaways from experience ● Make everything API based ● Everything fails (hardware, software, network, storage) – System must recover, retry transactions, and sort of self-heal ● Security and privacy should not be an afterthought ● Scalability does not come from one product – Watch out for solution and technology stereotyping ● Open scale out is the only way to go – Heterogeneous, multi-vendor, commodity compute, growing linear fashion. Nothing else can adapt!