A Technical Introduction to Big Data Analytics
Pethuru Raj PhD
Infrastructure Architect
IBM Global Cloud Center of Excellence (CoE)
IBM India, Bangalore
E-mail: peterindia@gmail.com
The Business Intelligence (BI) in the Pre-Big Data Era
The Business Intelligence (BI) in the Post-Big Data Era
The Classification of the IT Trends
• The Technology Space - There is a cornucopia of technologies (Computing, Connectivity,
Miniaturization, Middleware, Sensing, Actuation, Perception, Analyses, Knowledge Engineering, etc.)
• The Process Space – With new kinds of services, applications, data, infrastructures, and devices
joining mainstream IT, fresh process consolidation, orchestration, governance and
management mechanisms are emerging. That is, process excellence is the ultimate aim.
• Infrastructure Space – Infrastructure consolidation, convergence, centralization, federation,
automation and sharing methods clearly indicate the infrastructure trends in the computing and
communication disciplines. Physical infrastructures are turning into virtual infrastructures. The two
major infrastructure types are
• System Infrastructure (Compute, Storage, & Network)
• Application Infrastructure – Integration Backbones, Platforms (Design, Development, Deployment,
Delivery, Management, etc.), Messaging Middleware, Databases (SQL and NoSQL), etc.
• Architecture Space – Service oriented architecture (SOA), event-driven architecture (EDA), model-
driven architecture (MDA), resource oriented architecture (ROA) and so on are the leading
architectural patterns
• The Device Space is evolving fast (slim & sleek, handy & trendy, mobile, wearable, implantable,
portable, etc.). Everyday machines are connected to one another as well as to the remote Web / Cloud
• Data Space – Data are being produced in an automated and massive manner
The Tectonic Trends Towards the Ensuing Knowledge Era
1. Data is being positioned as the strategic asset for any organization
2. Analytics has been an important ingredient for worldwide business enterprises to
• Strategize and Plan Ahead
• Take Informed Decisions
• Proceed with Confidence and Clarity (Insights-driven Enterprises)
With the arrival of newer technologies, the capabilities and competencies of
Analytics have been consistently on the climb.
In sync with big data, platforms and infrastructures, big insights will become the
norm for worldwide organizations
For any Strategic and Sustainable Transformation
• Leverage Data Assets Insightfully
• Optimize Infrastructure Technologically
• Innovate Processes Consistently
• Assimilate Architectures Appropriately
• Choose Technologies Carefully
• Ensure Accessibility, Simplicity & Consumability Cognitively
The Principal Sources for Big Data
The Convergence of Technologies lays a profound foundation for Large-scale Data Generation
• Social Media
• Cloud Computing
• Mobile
• Internet of Things
The Extreme Connectivity enables Data Generation in Heaps
The Deeper and Broader Integration pours out Big Data
• Device to Device (D2D) Integration
• Device to Enterprise (D2E) Integration - In order to have remote and real-time
monitoring, management, repair, and maintenance, and for enabling decision-
support and expert systems, ground-level heterogeneous devices have to be
synchronized with control-level enterprise packages such as ERP, SCM, CRM,
KM etc.
• Device to Cloud (D2C) Integration - As most of the enterprise systems are
moving to clouds, device to cloud (D2C) connectivity is gaining importance.
• Cloud to Cloud (C2C) Integration – Disparate, distributed and decentralised
clouds are getting connected to provide better prospects
The Interconnectivity of Devices generates Large-scale Fast Data
The Technology Cluster Stack
Layers spanning the Physical World and the Cyber World:
• Physical Devices – Sensors, Actuators, Controllers, Tags, Stickers, consumer
electronics, appliances, Devices, Machines, Utensils, instruments,
gadgets, smart materials
• Device Middleware – Service oriented device middleware for message routing,
enrichment, adaptation etc.
• Virtual Applications & Platforms – Applications, Services, Data sources, Packages, Platforms,
Middleware, etc.
• Virtual Infrastructures – Clouds (Consolidated, Centralized / Federated, Virtualized,
Automated and Shared Infrastructures)
Some Tidbits on the Enormity of Data
The Unequivocal Result: the Data-driven World
• Business Transactions, Interactions, Operations, and Analytical data
• System Infrastructure Log files
• Social & People data
• Customer, Product, Sales and other business data
• Machine and Sensor Data
• Scientific Experimentation & Observation Data (Genetics, Particle
Physics, Climate modeling, Drug Discovery, etc.)
Why Is Big Data Strategically Significant for Businesses?
Big Data brings in
• Enhanced Business Value through better performance and productivity
• Bigger and Bigger Insights through a host of newer Analytics and Use Cases
Big Data: The Business Value
What to Do with Big Data?
Big Data → Big Insights
• Aggregate all kinds of distributed, different and decentralized data
• Analyze the formatted and formalized data
• Articulate the extracted actionable intelligence
• Act based on the insights delivered and raise the bar for futuristic analytics
(Real-time, predictive, prescriptive and personal analytics)
• Accentuate business performance and productivity
Big Data Analytics: Key Drivers and Applications
The Drivers for Big Data Analysis
1. The Exponential Growth in Data Generation, due to
◦ The continued increase in diverse and distributed data sources
2. The Maturity, Stability and Convergence of Technologies - Data Virtualization, Management,
Storage, Transmission, Analysis and Visualization Techniques, Tips, and Tools
3. The Massive Adoption and Adaptation of Cloud Infrastructures (Compute, Storage and Network)
4. The Realization of more comprehensive, accurate, and speedier Knowledge Discovery and
Dissemination Platforms and Processes
5. Enhanced Business Value
6. Newer Types of Analytics
◦ Domain-specific Analytics (Customer Sentiment, Social, Security, Retail, Fraud Detection
Analysis, etc.) and
◦ Generic Analytics (Predictive, Prescriptive, High-Performance, Real-time, Smarter
Analytics, etc.)
The Reference Architectures for Big Data Analytics
The Emerging and Evolving Analytics
The Traditional Business Analytics
The Next-Generation Business Analytics
Social Media and Network Analytics
Machine Data Analytics - Use Cases
Here are a few ROI examples from a 1% improvement in productivity across different industries:
• Commercial aviation industry – a 1% fuel saving would yield savings of $30
billion over 15 years.
• Utilities – across the global gas-fired power plant fleet, a 1% improvement could yield $66 billion
in fuel savings.
• Global health care industry – a 1% efficiency gain from the reduction of process inefficiencies
globally could yield more than $63 billion in health care savings.
• Railway Networks – a 1% improvement in freight movement across the world's rail networks could
yield another $27 billion in fuel savings.
• Upstream Oil and Gas Exploration – a 1% improvement in capital utilization in upstream oil and
gas exploration and development could total $90 billion in avoided or deferred capital expenditures.
The convergence of intelligent devices, intelligent networks and intelligent decisioning (Insight vs. Hindsight
analytics) is laying the foundation for the next spurt of growth and productivity gains.
Machine Data Analytics – Use Cases
Machine Data Analytics
Batch vs. Real-time Analytics
How Does Real-Time Analytics Work?
The Real-time Analytics Architecture
In-Memory Data Analytics
In-Memory Computing Reference Architecture
Context-Aware Analytics
Big Data Analytics: The Key Platforms
Big Data Analytics: The Platforms
• Analytical, Distributed, Scalable and Parallel Databases
• Data warehouses, Data Marts, etc.
• In-Memory Systems (SAP HANA, etc.)
• In-Database Systems (SAS, etc.)
• Distributed File Systems (HDFS)
• Hadoop Implementations (Cloudera, MapR, Hortonworks, Apache
Hadoop, DataStax, etc.)
• NoSQL & Hybrid Databases
Parallel DBMS
• Standard relational tables and SQL
◦ Indexing, compression, caching, I/O sharing
◦ Tables partitioned over nodes
◦ Transparent to the user
• Meets performance requirements
◦ But needs a highly skilled DBA
• Flexible query interfaces
◦ UDF support varies across implementations
• Fault tolerance
◦ Does not score so well
◦ Assumption: failures are rare
◦ Assumption: dozens of nodes in clusters
MapReduce Programming Model & Hadoop Platforms
• MapReduce is a programming model which specifies:
◦ A map function that processes a key/value pair to generate a set of intermediate key/value pairs,
◦ A reduce function that merges all intermediate values associated with the same intermediate key.
• Hadoop comprises large-scale, distributed, elastic, and fault-tolerant data processing and storage
modules
◦ It is a MapReduce implementation for processing large data sets over 1000s of nodes.
◦ Map tasks run independently of one another, as do reduce tasks, over blocks of data distributed
across a cluster
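The map and reduce functions described above can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Hadoop's API: the names `map_word_count`, `reduce_word_count` and `run_mapreduce`, and the word-count task itself, are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(_key: str, line: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one input key/value pair into intermediate (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, group intermediate pairs by key (the shuffle), then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


# Hypothetical input: (record id, line of text) pairs.
records = [
    ("line-1", "big data big insights"),
    ("line-2", "data drives insights"),
]
word_counts = run_mapreduce(records, map_word_count, reduce_word_count)
print(word_counts)
```

Because each call to the map function sees only its own record, and each call to the reduce function sees only one key's values, a framework is free to spread those calls across many nodes; the grouping step in the middle is the part that requires moving data between them.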
The Hadoop Architecture
How Does Hadoop Function?
The Hadoop-based Big Data Business Analytics
Why Hadoop?
• Better application development productivity through a more flexible data model
• Greater ability to scale dynamically to support more users and data
• Improved performance to satisfy expectations of users wanting highly responsive
applications and to allow more complex processing of data
• Scalability to large data volumes:
◦ Scan 100 TB on 1 node @ 50 MB/sec = 23 days
◦ Scan on 1000-node cluster = 33 minutes
• Divide-And-Conquer (i.e., data partitioning)
• Cost-efficiency
◦ Commodity nodes (cheap, but unreliable)
◦ Commodity network
◦ Automatic fault-tolerance (fewer administrators)
◦ Easy to use (fewer programmers)
• Provides fault tolerance
• Works in heterogeneous environments
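The scan-time figures in the list above can be checked with a short calculation. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly linear scaling with no coordination overhead; the function name `scan_seconds` is introduced here for illustration.

```python
TB = 10**12  # bytes, decimal units (assumption)
MB = 10**6   # bytes, decimal units (assumption)


def scan_seconds(data_bytes: int, bytes_per_sec_per_node: int, nodes: int = 1) -> float:
    """Time to scan the data once if it is split evenly across the nodes."""
    return data_bytes / (bytes_per_sec_per_node * nodes)


single_node_days = scan_seconds(100 * TB, 50 * MB) / 86_400
cluster_minutes = scan_seconds(100 * TB, 50 * MB, nodes=1000) / 60

print(f"1 node: {single_node_days:.1f} days")
print(f"1000 nodes: {cluster_minutes:.1f} minutes")
```

A single node needs 2,000,000 seconds, which is about 23 days; dividing the same work over 1000 nodes gives 2,000 seconds, or about 33 minutes. Real clusters fall somewhat short of this ideal because of scheduling, stragglers and data movement.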
NoSQL Databases
NoSQL encompasses a wide variety of database technologies that were developed in response
to a rise in the volume of data stored about users, objects and products, the frequency with which this data
is accessed, and performance and processing needs.
Document databases pair each key with a complex data structure known as a document. Documents
can contain many different key-value pairs, or key-array pairs, or even nested documents.
Graph stores are used to store information about networks, such as social connections. Graph stores
include Neo4J and HyperGraphDB.
Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an
attribute name (or "key"), together with its value. Examples of key-value stores are Riak and Voldemort.
Some key-value stores, such as Redis, allow each value to have a type, such as "integer", which adds
functionality.
Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and
store columns of data together, instead of rows.
• Cassandra (Facebook) (CQL is the query language)
• BigTable (Google)
• Dynamo (Amazon)
• Riak (SoftLayer) (Apache Lucene)
• MongoDB
• CouchDB (UNQL is the query language)
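The four NoSQL families described above differ mainly in the shape of the data they hold. The following sketch mimics each shape with plain Python dictionaries; it uses no real database client, and every key, name and value in it is made up for illustration.

```python
# Key-value store: each item is a key together with a single value.
key_value_store = {"user:42:name": "Asha", "user:42:visits": 17}

# Document store: each key maps to a nested document (key-value pairs,
# arrays, and sub-documents).
document_store = {
    "user:42": {
        "name": "Asha",
        "emails": ["asha@example.com"],
        "address": {"city": "Bangalore"},
    }
}

# Wide-column layout: the values of one column are kept together,
# so scanning a single column touches no other column's data.
column_store = {
    "name": {"user:42": "Asha", "user:43": "Ravi"},
    "city": {"user:42": "Bangalore", "user:43": "Chennai"},
}

# Graph store: nodes plus their connections, here as adjacency sets.
graph_store = {
    "Asha": {"Ravi"},
    "Ravi": {"Asha", "Meena"},
    "Meena": {"Ravi"},
}


def friends_of_friends(graph: dict, person: str) -> set:
    """People two hops away who are not already direct connections."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


cities = list(column_store["city"].values())
suggestions = friends_of_friends(graph_store, "Asha")
print(cities, suggestions)
```

The traversal in `friends_of_friends` is the kind of query a graph store is built for, while the `cities` scan shows why a column-oriented layout suits analytical queries that read a few columns over many rows.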
Relational vs. NoSQL Databases
SQL Databases
The relational model takes data and separates it into many interrelated tables. Tables reference each
other through foreign keys.
The relational model minimizes the amount of storage space required, because each piece of data is only
stored in one place. However, space efficiency comes at the expense of increased complexity when looking
up data. The desired information needs to be collected from many tables (often hundreds in today’s
enterprise applications) and combined before it can be provided to the application. When writing data,
the write needs to be coordinated and performed on many tables.
Developers generally use object-oriented programming languages to build applications. It’s usually most
efficient to work with data that’s in the form of an object with a complex structure consisting of nested
data, lists, arrays, etc. The relational data model provides a very limited data structure that doesn’t
map well to the object model. Instead, data must be stored in and retrieved from tens or even hundreds of
interrelated tables. Object-relational frameworks provide some relief, but the fundamental impedance
mismatch still exists between the way an application would like to see its data and the way it’s
actually stored in a relational database.
NoSQL Databases
NoSQL databases have a very different model. For example, a document-oriented NoSQL database takes
the data you want to store and aggregates it into documents using the JSON format. Each JSON document
can be thought of as an object to be used by your application. A JSON document might, for example, take
all the data stored in a row that spans 20 tables of a relational database and aggregate it into a single
document/object.
Aggregating this information may lead to duplication of information, but since storage is no longer cost
prohibitive, the resulting data model flexibility, the ease of efficiently distributing the resulting
documents, and the read and write performance improvements make it an easy trade-off for web-based
applications.
Document databases, on the other hand, can store an entire object in a single JSON document and support
complex data structures. This makes it easier to conceptualize data as well as write, debug, and evolve
applications, often with fewer lines of code.
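The contrast between the two columns can be illustrated with Python's standard `sqlite3` and `json` modules. This is a minimal sketch with a hypothetical two-table schema (`customers`, `orders`) and invented rows: the relational side needs a join to reassemble one customer, while the document side keeps the same information as one nested object.

```python
import json
import sqlite3

# Relational side: data is split across tables linked by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'Asha');
    INSERT INTO orders VALUES (10, 1, 'sensor'), (11, 1, 'gateway');
    """
)

# Reading one customer with their orders requires combining both tables.
rows = conn.execute(
    """
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = ?
    ORDER BY o.id
    """,
    (1,),
).fetchall()

# Document side: the same information aggregated into one JSON document.
customer_document = {
    "_id": 1,
    "name": rows[0][0],
    "orders": [item for _name, item in rows],
}
print(json.dumps(customer_document, indent=2))
conn.close()
```

With only two tables the join is trivial; the trade-off described above becomes visible when an application object is spread over tens of tables and every read has to put it back together.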
Relational vs. NoSQL Databases (continued)
SQL Databases
Relational technology requires strict definition of a schema prior to storing any data into a database.
Changing the schema once data is inserted is a big deal. Want to start capturing new information not
previously considered? Want to make rapid changes to application behavior requiring changes to data
formats and content? With relational technology, changes like these are extremely disruptive and
frequently avoided.
An RDBMS scales up rather than out, which follows from the fundamentally centralized,
shared-everything architecture of relational database technology.
Enhancement techniques include
1. Sharding
2. Denormalizing
3. Distributed caching
NoSQL Databases
NoSQL databases, especially document databases, are typically schemaless, allowing you to freely add
fields to JSON documents without having to first define changes. The format of the data being inserted
can be changed at any time, without application disruption. This allows application developers to move
quickly to incorporate new data into their applications.
NoSQL databases use a cluster of standard, physical or virtual servers to store data and support
database operations. They support the following:
• Auto-sharding
• Data Replication
• Distributed query support – “Sharding” a relational database can reduce, or eliminate in certain
cases, the ability to perform complex data queries. NoSQL database systems are designed to retain
their query expressiveness even when distributed across hundreds of servers.
• Integrated caching – Data is cached in system memory, transparently to the application developer
and the operations team. With relational technology, by contrast, a caching tier is usually a separate
infrastructure tier that must be developed against, deployed on separate servers, and explicitly
managed by the ops team.
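Auto-sharding, replication and schemaless documents can be sketched together in a toy in-memory store. This is an illustration under stated assumptions, not how any particular NoSQL product works: the class `ShardedStore`, the modulo-of-hash placement rule and the "next shard holds the replica" rule are all invented for the example.

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard from a stable hash of the key (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def replica_shards(key: str, shard_count: int, replicas: int) -> List[int]:
    """Primary shard plus the following shards, which hold the copies."""
    primary = shard_for(key, shard_count)
    return [(primary + offset) % shard_count for offset in range(replicas)]


class ShardedStore:
    """Toy document store: each dict stands in for one server."""

    def __init__(self, shard_count: int = 4, replicas: int = 2) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(shard_count)]
        self.replicas = replicas

    def put(self, key: str, document: Dict[str, Any]) -> None:
        # Write the document to the primary shard and to each replica.
        for index in replica_shards(key, len(self.shards), self.replicas):
            self.shards[index][key] = document

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        # Read from the primary shard only.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4, replicas=2)
store.put("user:1", {"name": "Asha"})
# Schemaless: a later document adds a field without any schema change.
store.put("user:2", {"name": "Ravi", "twitter": "@ravi"})
copies_of_user_1 = sum(1 for shard in store.shards if "user:1" in shard)
print(store.get("user:2"), copies_of_user_1)
```

The application only calls `put` and `get`; which server holds a document is decided by the store. One known weakness of the modulo rule used here is that changing the number of shards moves most keys, which is why production systems tend to use schemes such as consistent hashing or range-based partitioning instead.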
The Capability Comparison of Different Analytical Platforms
The Big Data Analytics Infrastructures
Big Data Analytics – The Emerging Infrastructures
• Analytic, Scalable, Parallel and Distributed Databases & Data Warehouses -
Hardware Appliances (MPP and SMP)
• In-Memory Compute Infrastructures (SAP HANA on IBM Power 7)
• In-Database Compute Infrastructures (SAS Teradata, etc.)
• Expertly Integrated Systems (IBM PureData System for Hadoop, Analytics,
etc.)
• Clouds (public, private and hybrid) comprising bare metal servers and
virtual machines (VMs)
In-Memory Data Grid (IMDG)
• An IMDG is a distributed non-relational data or object store. It can be distributed to
span more than one server.
• Reading from memory is more than 3,300 times faster than reading from disk. A
simple calculation would suggest that if it takes an hour to read a set of information
from disk, it would take just over a second to read it from memory.
• This approach brings data to the cloud, where the application can interact with it,
and the application is completely shielded from the complexity of having to persist
or replicate data back to the on-premise store.
• The use of an IMDG also means that while the data is available on the cloud, it is
only available in memory and is never stored on a disk in the cloud.
• IMDGs usually support linear scaling to support high loads, data partitioning,
redundancy, and automatic data recovery in case of failures.
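The "hour on disk, just over a second in memory" claim above follows directly from the quoted 3,300x ratio. The sketch below simply carries out that division; the ratio is taken from the slide as given, and the function name is introduced for illustration.

```python
MEMORY_SPEEDUP = 3_300  # memory-vs-disk read ratio as quoted on the slide


def memory_read_seconds(disk_seconds: float, speedup: float = MEMORY_SPEEDUP) -> float:
    """Estimated in-memory read time for a job that takes disk_seconds on disk."""
    return disk_seconds / speedup


one_hour = 60 * 60
in_memory = memory_read_seconds(one_hour)
print(f"{in_memory:.2f} seconds")
```

An hour is 3,600 seconds, so the estimate is 3,600 / 3,300, roughly 1.09 seconds. The actual ratio depends heavily on the storage device and the access pattern, so this is a back-of-the-envelope figure rather than a benchmark.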
The Big Data Analytics in Clouds
The Types of Big Data Analytics in Cloud
Big Data Analytics in Clouds
Why Big Data Analytics in Clouds?
• Agility & Affordability - No large capital investment in infrastructure. Just use
and pay
• Hadoop Platforms in Clouds - Deploying and using any Hadoop platform (generic or
specific, open or commercial-grade, etc.) is fast
• NoSQL Databases in Clouds - NoSQL databases are made available in Clouds
• WAN Optimization Technologies - There are WAN optimization products and
platforms for efficiently transmitting data over the Internet infrastructure
• Business Applications in Clouds - As enterprise information systems (EISs), high-
performance computing systems, and data storage, social, device and
sensor clouds move to public clouds, running big data analytics in remote, Internet-scale clouds
makes sense.
• Cloud Integrators, Brokers & Orchestrators – There are products and platforms for
seamless interoperability among different and distributed systems, services and data
Entering into the Hybrid World
1. The Traditional Analytical Systems (Data Warehouse) vs. the
Big Data Analytical Systems (Hadoop)
2. The Traditional Databases (RDBMS) vs. the NoSQL
Databases
3. The Scalable, Distributed, Parallel RDBMS vs. the NoSQL
Databases
The Hybrid World
The Data Analytics: the Converged Architecture
Big Data Analytics Solution Architectures for Different Industry
Segments
Big Data Insights for Media Industry – A Solution Architecture
Social Network Analytics – A Solution Architecture
Big Data Analytics: the Summary
• Digitalization, service-enablement, extreme connectivity, distribution,
commoditization, Consumerization, Industrialization, etc. are the
brewing trends towards big data
• Data Volume, Variety, Velocity and Variability are on the rise, signalling
a heightened Data Value. This development is due to the diversity
and multiplicity of data sources.
• Data Capturing, Transmission, Cleansing, Filtering, Formatting, and
Storage Tasks, Tools, and Technologies are maturing fast
• Big Data platforms, patterns, practices, products, processes and
infrastructures are being developed to streamline big data analytics
The Big Picture
Enterprise Space
Embedded Space
Cloud Space
Integration Bus
A Sample List of Book Chapters
Pethuru Raj PhD
peterindia@gmail.com
www.peterindia.net
http://www.linkedin.com/in/peterindia
https://www.facebook.com/sweetypeter

 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
 
final oracle presentation
final oracle presentationfinal oracle presentation
final oracle presentation
 
Future of Big Data
Future of Big DataFuture of Big Data
Future of Big Data
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Último (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

A technical Introduction to Big Data Analytics

  • 1. A Technical Introduction to Big Data Analytics Pethuru Raj PhD Infrastructure Architect IBM Global Cloud Center of Excellence (CoE) IBM India, Bangalore E-mail: peterindia@gmail.com
  • 2. The Business Intelligence (BI) in the Pre-Big Data Era
  • 3. The Business Intelligence (BI) in the Post-Big Data Era
  • 4. The Classification of the IT Trends • The Technology Space - There is a cornucopia of technologies (Computing, Connectivity, Miniaturization, Middleware, Sensing, Actuation, Perception, Analyses, Knowledge Engineering, etc.) • The Process Space - With new kinds of services, applications, data, infrastructures, and devices joining mainstream IT, fresh process consolidation, orchestration, governance and management mechanisms are emerging. That is, process excellence is the ultimate aim • Infrastructure Space - Infrastructure consolidation, convergence, centralization, federation, automation and sharing methods clearly indicate the infrastructure trends in the computing and communication disciplines. Physical infrastructures are turning into virtual infrastructures. The two major infrastructure types are • System Infrastructure (Compute, Storage, & Network) • Application Infrastructure - Integration Backbones, Platforms (Design, Development, Deployment, Delivery, Management, etc.), Messaging Middleware, Databases (SQL and NoSQL), etc. • Architecture Space - Service-oriented architecture (SOA), event-driven architecture (EDA), model-driven architecture (MDA), resource-oriented architecture (ROA) and so on are the leading architectural patterns • The Device Space is fast evolving (Slim & Sleek, handy & trendy, mobile, wearable, implantable, portable, etc.). Everyday machines are connected with one another as well as with the remote Web / Cloud • Data Space - Data are being produced in an automated and massive manner
  • 5. The Tectonic Trends Towards the Ensuing Knowledge Era 1. Data is being positioned as the strategic asset for any organization 2. Analytics has been an important ingredient for business enterprises worldwide to • Strategize and Plan Ahead • Take Informed Decisions • Proceed with Confidence and Clarity (Insights-driven Enterprises) With the arrival of newer technologies, the capabilities and competencies of Analytics have been climbing steadily. In sync with big data, platforms and infrastructures, big insights will become the norm for organizations worldwide
  • 6. For any Strategic and Sustainable Transformation • Leverage Data Assets Insightfully • Optimize Infrastructure Technologically • Innovate Processes Consistently • Assimilate Architectures Appropriately • Choose Technologies Carefully • Ensure Accessibility, Simplicity & Consumability Cognitively
  • 7. The Principal Sources for Big Data
  • 8. The Convergence of Technologies lays a profound foundation for Large-scale Data Generation: Social Media, Cloud Computing, Mobile, Internet of Things
  • 9. The Extreme Connectivity enables Data Generation in Heaps
  • 10.
  • 11. The Deeper and Broader Integration pours out Big Data • Device to Device (D2D) Integration • Device to Enterprise (D2E) Integration - In order to have remote and real-time monitoring, management, repair, and maintenance, and for enabling decision-support and expert systems, ground-level heterogeneous devices have to be synchronized with control-level enterprise packages such as ERP, SCM, CRM, KM, etc. • Device to Cloud (D2C) Integration - As most enterprise systems are moving to clouds, device to cloud (D2C) connectivity is gaining importance. • Cloud to Cloud (C2C) Integration - Disparate, distributed and decentralised clouds are getting connected to provide better prospects
  • 12. The Interconnectivity of Devices generates Large-scale Fast Data
  • 13. The Technology Cluster Stack (spanning the Physical World and the Cyber World) • Physical Devices - Sensors, Actuators, Controllers, Tags, Stickers, consumer electronics, appliances, Devices, Machines, Utensils, instruments, gadgets, smart materials • Device Middleware - Service-oriented device middleware for message routing, enrichment, adaptation, etc. • Virtual Applications & Platforms - Applications, Services, Data sources, Packages, Platforms, Middleware, etc. • Virtual Infrastructures - Clouds (Consolidated, Centralized / Federated, Virtualized, Automated and Shared Infrastructures)
  • 14. Some Tidbits on the Enormity of Data
  • 15. The Unequivocal Result: the Data-driven World • Business Transactions, Interactions, Operations, and Analytical data • System Infrastructure Log files • Social & People data • Customer, Product, Sales and other business data • Machine and Sensor Data • Scientific Experimentation & Observation Data (Genetics, Particle Physics, Climate modeling, Drug Discovery, etc.)
  • 16. Why Is Big Data Strategically Significant for Businesses?
  • 17. Big Data brings in • Enhanced Business Value through better performance and productivity • Bigger and Bigger Insights through a host of newer Analytics and Use Cases
  • 18. Big Data: The Business Value
  • 19.
  • 20. What to Do with Big Data?
  • 21.
  • 22. Big Data → Big Insights • Aggregate all kinds of distributed, different and decentralized data • Analyze the formatted and formalized data • Articulate the extracted actionable intelligence • Act based on the insights delivered and raise the bar for futuristic analytics (Real-time, predictive, prescriptive and personal analytics) • Accentuate business performance and productivity
  • 23. Big Data Analytics: Key Drivers and Applications
  • 24. The Drivers for Big Data Analysis 1. There is an Exponential Growth in Data Generation due to ◦ The continued increase in diverse and distributed data sources 2. The Maturity, Stability and Convergence of Technologies - Data Virtualization, Management, Storage, Transmission, Analysis and Visualization Techniques, Tips, and Tools 3. The Massive Adoption and Adaptation of Cloud Infrastructures (Compute, Storage and Network) 4. The Realization of more comprehensive, accurate, and speedier Knowledge Discovery and Dissemination Platforms and Processes 5. Enhanced Business Value 6. Newer Types of Analytics ◦ Domain-specific Analytics (Customer Sentiment, Social, Security, Retail, Fraud Detection Analysis, etc.) and ◦ Generic Analytics (Predictive, Prescriptive, High-Performance, Real-time, Smarter Analytics, etc.)
  • 25.
  • 26. The Reference Architectures for Big Data Analytics
  • 27.
  • 28.
  • 29. The Emerging and Evolving Analytics
  • 32. Social Media and Network Analytics
  • 33. Machine Data Analytics - Use Cases Here are a few ROI examples from a 1% improvement in productivity across different industries: • Commercial aviation industry: a 1% improvement in fuel savings would yield a savings of $30 billion over 15 years. • Utilities: across the global gas-fired power plant fleet, a 1% improvement could yield a $66 billion savings in fuel consumption. • Global health care industry: a 1% efficiency gain from reducing process inefficiencies globally could yield more than $63 billion in health care savings. • Railway networks: freight moved across the world's rail networks, if improved by 1%, could yield another gain of $27 billion in fuel savings. • Upstream oil and gas exploration: a 1% improvement in capital utilization in upstream oil and gas exploration and development could total $90 billion in avoided or deferred capital expenditures. The convergence of intelligent devices, intelligent networks and intelligent decisioning (Insight vs. Hindsight analytics) is laying the foundation for the next spurt of growth and productivity gains.
  • 34. Machine Data Analytics – Use Cases
  • 38. How Does Real-Time Analytics Work?
  • 39. The Real-time Analytics Architecture
  • 43. Big Data Analytics: The Key Platforms
  • 44. Big Data Analytics: The Platforms • Analytical, Distributed, Scalable and Parallel Databases • Data Warehouses, Data Marts, etc. • In-Memory Systems (SAP HANA, etc.) • In-Database Systems (SAS, etc.) • Distributed File Systems (HDFS) • Hadoop Implementations (Cloudera, MapR, Hortonworks, Apache Hadoop, DataStax, etc.) • NoSQL & Hybrid Databases
  • 45. Parallel DBMS • Standard relational tables and SQL ◦ Indexing, compression, caching, I/O sharing ◦ Tables partitioned over nodes ◦ Transparent to the user • Meets performance requirements ◦ Needs a highly skilled DBA • Flexible query interfaces ◦ UDF support varies across implementations • Fault tolerance ◦ Does not score so well ◦ Assumption: failures are rare ◦ Assumption: dozens of nodes in clusters
  • 46. MapReduce Programming Model & Hadoop Platforms • MapReduce is a programming model which specifies: ◦ A map function that processes a key/value pair to generate a set of intermediate key/value pairs, ◦ A reduce function that merges all intermediate values associated with the same intermediate key. • Hadoop comprises large-scale, distributed, elastic, and fault-tolerant data processing and storage modules ◦ It is a MapReduce implementation for processing large data sets over thousands of nodes. ◦ Maps and Reduces run independently of each other over blocks of data distributed across a cluster
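The map and reduce functions described on this slide can be illustrated with a minimal, single-process sketch. It is not Hadoop code: the names `map_fn`, `reduce_fn` and `run_mapreduce` and the sample documents are assumptions made for illustration, and the grouping step that a real framework performs across a cluster is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one (document name, text) pair into intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(records: List[Tuple[str, str]]) -> Dict[str, int]:
    """Run map, group intermediate pairs by key, then run reduce once per key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc_name, text in records:
        for word, one in map_fn(doc_name, text):
            grouped[word].append(one)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())


# Hypothetical input: two tiny "documents".
documents = [("doc1", "big data big insights"), ("doc2", "big data analytics")]
word_counts = run_mapreduce(documents)
print(word_counts)
```

Because each call to `map_fn` depends only on its own record and each call to `reduce_fn` only on its own key, the calls could in principle run on different nodes, which is the property the slide attributes to Hadoop.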
  • 48.
  • 50. The Hadoop-based Big Data Business Analytics
  • 51. Why Hadoop? • Better application development productivity through a more flexible data model; • Greater ability to scale dynamically to support more users and data; • Improved performance to satisfy expectations of users wanting highly responsive applications and to allow more complex processing of data. • Scalability to large data volumes: ◦ Scan 100 TB on 1 node @ 50 MB/sec = 23 days ◦ Scan on 1000-node cluster = 33 minutes • Divide-and-Conquer (i.e., data partitioning) • Cost-efficiency ◦ Commodity nodes (cheap, but unreliable) ◦ Commodity network ◦ Automatic fault-tolerance (fewer administrators) ◦ Easy to use (fewer programmers) • Satisfies fault tolerance • Works in heterogeneous environments
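The scan-time figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly even partitioning with no coordination overhead; the function name `scan_seconds` is hypothetical.

```python
TB = 10**12  # bytes, decimal terabyte (assumption)
MB = 10**6   # bytes, decimal megabyte (assumption)


def scan_seconds(data_bytes: float, rate_bytes_per_sec: float, nodes: int = 1) -> float:
    """Time to scan the data when it is split evenly across `nodes` parallel readers."""
    return data_bytes / (rate_bytes_per_sec * nodes)


single_node_days = scan_seconds(100 * TB, 50 * MB) / 86_400            # seconds per day
cluster_minutes = scan_seconds(100 * TB, 50 * MB, nodes=1000) / 60     # seconds per minute

print(f"1 node:     {single_node_days:.1f} days")
print(f"1000 nodes: {cluster_minutes:.1f} minutes")
```

Under these assumptions the single-node scan takes about 23 days and the 1000-node scan about 33 minutes, matching the slide; a real cluster would be somewhat slower because of scheduling, network and skew effects.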
  • 52. NoSQL Databases NoSQL encompasses a wide variety of database technologies that were developed in response to a rise in the volume of data stored about users, objects and products, the frequency with which this data is accessed, and performance and processing needs. • Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents. • Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4J and HyperGraphDB. • Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value. Examples of key-value stores are Riak and Voldemort. Some key-value stores, such as Redis, allow each value to have a type, such as "integer", which adds functionality. • Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows. Examples: • Cassandra (Facebook) (CQL is the query language) • BigTable (Google) • Dynamo (Amazon) • Riak (SoftLayer) (Apache Lucene) • MongoDB • CouchDB (UNQL is the query language)
  • 53. Relational vs. NoSQL Databases
    SQL Databases: The relational model takes data and separates it into many interrelated tables. Tables reference each other through foreign keys. The relational model minimizes the amount of storage space required, because each piece of data is only stored in one place. However, space efficiency comes at the expense of increased complexity when looking up data. The desired information needs to be collected from many tables (often hundreds in today's enterprise applications) and combined before it can be provided to the application. When writing data, the write needs to be coordinated and performed on many tables. Developers generally use object-oriented programming languages to build applications. It's usually most efficient to work with data that's in the form of an object with a complex structure consisting of nested data, lists, arrays, etc. The relational data model provides a very limited data structure that doesn't map well to the object model. Instead, data must be stored in and retrieved from tens or even hundreds of interrelated tables. Object-relational frameworks provide some relief, but the fundamental impedance mismatch still exists between the way an application would like to see its data and the way it's actually stored in a relational database.
    NoSQL Databases: NoSQL databases have a very different model. For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format. Each JSON document can be thought of as an object to be used by your application. A JSON document might, for example, take all the data stored in a row that spans 20 tables of a relational database and aggregate it into a single document/object. Aggregating this information may lead to duplication of information, but since storage is no longer cost prohibitive, the resulting data model flexibility, ease of efficiently distributing the resulting documents, and read and write performance improvements make it an easy trade-off for web-based applications. Document databases, on the other hand, can store an entire object in a single JSON document and support complex data structures. This makes it easier to conceptualize data as well as write, debug, and evolve applications, often with fewer lines of code.
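The contrast between rows spread over related tables and a single aggregated JSON document can be sketched as follows. The three in-memory "tables", their field names, and the helper `build_customer_document` are all hypothetical and chosen only for illustration; no particular database product is implied.

```python
import json

# Hypothetical normalized "tables": rows linked through customer_id (a foreign key).
customers = [{"customer_id": 1, "name": "Asha"}]
addresses = [{"customer_id": 1, "city": "Bangalore"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250.0},
    {"order_id": 11, "customer_id": 1, "total": 99.5},
]


def build_customer_document(customer_id: int) -> dict:
    """Collect the rows for one customer from every table into one nested document."""
    customer = next(c for c in customers if c["customer_id"] == customer_id)
    return {
        "customer_id": customer["customer_id"],
        "name": customer["name"],
        "addresses": [
            {"city": a["city"]} for a in addresses if a["customer_id"] == customer_id
        ],
        "orders": [
            {"order_id": o["order_id"], "total": o["total"]}
            for o in orders
            if o["customer_id"] == customer_id
        ],
    }


document = build_customer_document(1)
print(json.dumps(document, indent=2))
```

Reading the customer back now needs one lookup of one document instead of a join across three tables, at the cost of duplicating data if, for example, the same address were embedded in several documents.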
  • 54. Relational vs. NoSQL Databases
    SQL Databases: Relational technology requires strict definition of a schema prior to storing any data in a database. Changing the schema once data is inserted is a big deal. Want to start capturing new information not previously considered? Want to make rapid changes to application behavior requiring changes to data formats and content? With relational technology, changes like these are extremely disruptive and frequently avoided. An RDBMS scales up, reflecting the fundamentally centralized, shared-everything architecture of relational database technology. Enhancement techniques include 1. Sharding 2. Denormalizing 3. Distributed caching
    NoSQL Databases: NoSQL databases, especially document databases, are typically schemaless, allowing you to freely add fields to JSON documents without having to first define changes. The format of the data being inserted can be changed at any time, without application disruption. This allows application developers to move quickly to incorporate new data into their applications. NoSQL databases use a cluster of standard physical or virtual servers to store data and support database operations. They support the following: • Auto-sharding • Data replication • Distributed query support - "Sharding" a relational database can reduce, or in certain cases eliminate, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power even when distributed across hundreds of servers. • Integrated caching - Data is cached transparently in system memory. This behavior is transparent to the application developer and the operations team, compared to relational technology, where a caching tier is usually a separate infrastructure tier that must be developed to, deployed on separate servers, and explicitly managed by the ops team.
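Two ideas from this slide, schemaless documents and automatic sharding, can be sketched together in a few lines. This is a toy in-memory store, not the API of any real NoSQL product: the class `ShardedStore`, the function `shard_for` and the keys are assumptions, and the simple hash-modulo placement shown here is only one possible scheme (production systems often use consistent hashing or range partitioning instead).

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard deterministically from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy document store: each shard is a dict standing in for one server."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, dict]] = [{} for _ in range(num_shards)]

    def put(self, key: str, document: dict) -> None:
        # No schema check: any JSON-like dict is accepted as-is.
        self.shards[shard_for(key, len(self.shards))][key] = document

    def get(self, key: str) -> Optional[dict]:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("user:1", {"name": "Asha"})
# A later document carries a new field; nothing had to be declared first.
store.put("user:2", {"name": "Ravi", "loyalty_tier": "gold"})
print(store.get("user:2"))
```

The caller never chooses a shard; placement follows from the key alone, which is the sense in which sharding is "automatic" in this sketch.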
  • 55. The Capability Comparison of Different Analytical Platforms
  • 56.
  • 57. The Big Data Analytics Infrastructures
  • 58. Big Data Analytics - The Emerging Infrastructures • Analytic, Scalable, Parallel and Distributed Databases & Data Warehouses - Hardware Appliances (MPP and SMP) • In-Memory Compute Infrastructures (SAP HANA on IBM Power 7) • In-Database Compute Infrastructures (SAS Teradata, etc.) • Expertly Integrated Systems (IBM PureData System for Hadoop, Analytics, etc.) • Clouds (public, private and hybrid) comprising bare metal servers and virtual machines (VMs)
  • 59. In-Memory Data Grid (IMDG) • An IMDG is a distributed non-relational data or object store. It can be distributed to span more than one server. • Reading from memory is more than 3,300 times faster than reading from disk. A simple calculation would suggest that if it takes an hour to read a set of information from disk, it would take just over a second to read it from memory. • This approach brings data to the cloud, where the application can interact with it, and the application is completely shielded from the complexity of having to persist or replicate data back to the on-premise store. • The use of an IMDG also means that while the data is available on the cloud, it is only available in memory and is never stored on a disk in the cloud. • IMDGs usually support linear scaling to support high loads, data partitioning, redundancy, and automatic data recovery in case of failures.
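The partitioning, redundancy and recovery properties listed on this slide can be illustrated with a toy grid that keeps every entry in memory on a primary node and one backup node. The class `InMemoryGrid`, its method names and the node names are hypothetical; real IMDG products use their own placement and recovery protocols, so this is only a sketch of the general idea, and it assumes at least two nodes remain after a failure.

```python
import zlib
from typing import Any, Dict, List, Tuple


class InMemoryGrid:
    """Toy IMDG: each key lives on a primary and a backup node, all in RAM."""

    def __init__(self, node_names: List[str]) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {name: {} for name in node_names}

    def _owners(self, key: str) -> Tuple[str, str]:
        """Partitioning: a stable hash picks the primary; the next node is the backup."""
        names = sorted(self.nodes)
        i = zlib.crc32(key.encode("utf-8")) % len(names)
        return names[i], names[(i + 1) % len(names)]

    def put(self, key: str, value: Any) -> None:
        for owner in self._owners(key):  # redundancy: write both copies
            self.nodes[owner][key] = value

    def get(self, key: str) -> Any:
        for owner in self._owners(key):
            if key in self.nodes[owner]:
                return self.nodes[owner][key]
        raise KeyError(key)

    def fail_node(self, name: str) -> None:
        """Simulate losing a node, then re-place surviving copies on the remaining nodes."""
        del self.nodes[name]
        survivors: Dict[str, Any] = {}
        for store in self.nodes.values():
            survivors.update(store)
            store.clear()
        for key, value in survivors.items():
            self.put(key, value)


grid = InMemoryGrid(["node-a", "node-b", "node-c"])
for n in range(10):
    grid.put(f"session:{n}", {"hits": n})
grid.fail_node("node-b")
print(grid.get("session:7"))
```

Because every entry has two copies on distinct nodes, losing any single node leaves at least one copy of each entry, and the recovery step restores the two-copy invariant on the nodes that remain.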
  • 60. The Big Data Analytics in Clouds
  • 61. The Types of Big Data Analytics in Cloud
  • 62. Big Data Analytics in Clouds
  • 63.
  • 64.
  • 65. Why Big Data Analytics in Clouds? • Agility & Affordability - No large capital investment in infrastructure. Just Use and Pay • Hadoop Platforms in Clouds - Deploying and using any Hadoop platform (generic or specific, open or commercial-grade, etc.) is fast • NoSQL Databases in Clouds - NoSQL databases are made available in Clouds • WAN Optimization Technologies - There are WAN optimization products and platforms for efficiently transmitting data over the Internet infrastructure • Business Applications in Clouds - As enterprise information systems (EISs) and high-performance computing systems move to public clouds, and data storage, social, device and sensor clouds are established there, big data analytics in remote, Internet-scale clouds makes sense. • Cloud Integrators, Brokers & Orchestrators - There are products and platforms for seamless interoperability among different and distributed systems, services and data
  • 66. Entering into the Hybrid World 1. The Traditional Analytical Systems (Data Warehouse) vs. the Big Data Analytical Systems (Hadoop) 2. The Traditional Databases (RDBMS) vs. the NoSQL Databases 3. The Scalable, Distributed, Parallel RDBMS vs. the NoSQL Databases
  • 68. The Data Analytics: the Converged Architecture
  • 69. Big Data Analytics Solution Architectures for Different Industry Segments
  • 70. Big Data Insights for Media Industry – A Solution Architecture
  • 71. Social Network Analytics – A Solution Architecture
  • 72. Big Data Analytics: the Summary • Digitalization, service-enablement, extreme connectivity, distribution, commoditization, consumerization, industrialization, etc. are the brewing trends towards big data • Data Volume, Variety, Velocity and Variability are on the rise, signalling a heightened Data Value. This development is due to the diversity and multiplicity of data sources. • Data Capturing, Transmission, Cleansing, Filtering, Formatting, and Storage Tasks, Tools, and Technologies are maturing fast • Big Data platforms, patterns, practices, products, processes and infrastructures are being developed to streamline big data analytics
  • 73. The Big Picture: Enterprise Space, Embedded Space, Cloud Space, Integration Bus
  • 74.
  • 75. A Sample List of Book Chapters