Traditional Data Warehouse Vs BDB Big Data Pipeline
Warehouse with Implementation Use Case
Background
The evolution of organizations' digital footprints has transformed the entire data warehouse architecture. The BDB (Big Data BizViz) Platform adopts this shift seamlessly through various ways of utilizing Big Data, IoT, Cognitive BI, Machine Learning, and Artificial Intelligence.
Introduction
Our intent is to show how the BDB platform has evolved data pipeline architectures based
on distributed data stores, distributed computing, real-time or near-real-time data processing
(streams), machine learning, and analytics to support the decision-making process in
today's rapidly changing business environment.
Today, Big Data has grown truly 'Big' and has become one of the pillars of any business strategy.
In a highly competitive and regulated business environment, the decision-making process must
be based on data rather than intuition.
To make good decisions, it is necessary to process huge amounts of data efficiently
(with the fewest possible computing resources and minimum processing latency), to add new data
sources (structured, semi-structured, and unstructured ones such as UI activities, logs,
performance events, sensor data, emails, documents, and social media), and to support the
decisions using machine learning algorithms as well as visualization techniques. The more an
organization invests in its proprietary algorithms, the better its chances of weathering the
storm of the digital era.
Let us see how we can transform a traditional data warehouse architecture into a
contemporary one that solves the current challenges of big data and high-performance computing.
Consider this situation through the example of an enterprise customer that has 40
million active users and an overall base of 100 million users.
Traditional data warehouse architecture
A traditional data warehouse is still a good fit for businesses that are completely off the cloud,
but they will gradually need to change their strategy to adopt the digital revolution. A traditional
data warehouse architecture comprises the following elements:
• Transactional systems (OLTP) – produce and record the transactional data/facts
resulting from business operations.
• A data warehouse (DWH) – a centralized data store that integrates and consolidates the
transactional data.
• ETL processes – scheduled batch processes that move the transactional data from the
OLTP systems to the DWH.
• Data Cleanup, Metadata Management, Data Marts, and Cubes – representing a
derived and aggregated view of the DWH.
• Analytical and Visualization Tools – enable visualization of the data stored in the data
marts and cubes through canned reports, ad-hoc queries, and dashboards for different
types of users. They perform first-generation analytics using data mining techniques.
The BDB Platform does this quite well and has many successful deployments.
This kind of architecture can be illustrated in the following figure.
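The scheduled extract–transform–load flow described above can be sketched in a few lines of Python. This is purely illustrative; the table names, fields, and aggregation are hypothetical, standing in for a real OLTP source and DWH target:

```python
# Illustrative batch ETL: OLTP facts -> aggregated DWH table.
# All table names and fields are hypothetical.
from collections import defaultdict

def extract(oltp_rows):
    """Pull committed transactional facts from the OLTP source."""
    return [r for r in oltp_rows if r.get("status") == "committed"]

def transform(rows):
    """Aggregate sales per product, as a cube/data mart would."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["product"]] += r["amount"]
    return dict(totals)

def load(dwh, aggregates):
    """Append the batch result into the warehouse store."""
    dwh["sales_by_product"] = aggregates
    return dwh

oltp = [
    {"product": "A", "amount": 10.0, "status": "committed"},
    {"product": "A", "amount": 5.0, "status": "committed"},
    {"product": "B", "amount": 7.5, "status": "pending"},
]
dwh = load({}, transform(extract(oltp)))
```

Note that the pending row is silently dropped in the extract step, which hints at the data-loss limitation discussed below.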
This architecture has limitations in today's Big Data era:
• Data sources are limited to transactional systems (OLTP).
• The major workload is ETL batch processing (jobs), and it is well known that data can
be lost in this step.
• Integrating and consolidating data is very complex due to the rigid nature of
ETL jobs.
• The data schema is not flexible enough to extend to new analytics use cases;
maintenance is high as new fields enter the business processes.
• Integrating semi-structured and unstructured data sources is very complex, so very
important information in the form of log files, sensor data, emails, and documents is lost.
• It does not naturally support real-time and interactive analytics; it is batch-oriented
by design, while data streams require push analytics.
• Scaling the solution takes considerable effort, so not all businesses can budget
such projects properly.
• It is mainly designed for on-premise environments, making it complex to extend and
deploy in cloud or hybrid environments.
• Although all the above steps can be done efficiently with the BDB Platform, the data
needs to be brought into a pipeline-based architecture to provide seamless analytics.
Modern pipeline architectures
The modern pipeline architecture is an evolution of the previous one, integrating new data
sources and new computing paradigms, as well as IoT data, artificial intelligence, machine
learning, deep learning, and cognitive computing.
In this new approach, we have a pipeline engine with the following features:
• A unified data architecture for all data sources, irrespective of their original structure.
Integration with existing data marts and data warehouses helps keep the original
data as well as the transformed data with the advent of Data Lakes.
• Flexible data schemas designed to change frequently, using NoSQL-based data
stores.
• A unified computing platform for processing any kind of workload (batch, interactive,
real-time, machine learning, cognitive, etc.), using distributed platforms such as
Hadoop, HDFS, Spark, Kafka, and Flume.
• Deployable in hybrid and cloud-based environments; private clouds are key for
enterprises.
• Horizontal scalability, so we can process unlimited data volumes by simply adding new
nodes to the cluster. (The BDB platform provides both horizontal and vertical scalability.)
Although there is a huge number of technologies related to big data and analytics, we
show here how various big data technologies are used with the BDB Platform to
provide a working modern data pipeline architecture (see figure 02). Using the BDB platform
reduces the risk of experimenting with too many options by relying on an architecture that
has proven to work.
Use Case 01. Data warehouse offloading
Data warehouses have served major companies for many years. With the exponential
growth of data, DWHs are reaching their capacity limits and batch windows are growing,
putting SLAs at risk. One approach is to migrate heavy ETL and calculation
workloads into Hadoop to achieve faster processing, a lower cost per unit of stored data,
and free DWH capacity for other workloads.
Here we have two major options:
• One option is to load raw data from the OLTP systems into Hadoop, transform the data
into the required models, and finally move it into the DWH. We can extend this
scenario by integrating semi-structured, unstructured, and sensor-based data
sources; in this sense, the Hadoop environment acts as an Enterprise Data Lake.
• Another option is to move data from the DWH into Hadoop using Kafka (a real-time
messaging service) for pre-calculations, storing the results in data marts
to be visualized with traditional tools. Because storage on Hadoop costs much
less than on a DWH, we save money and can keep the data longer. We
can further extend this use case by creating predictive models with Spark MLlib or R,
or applying cognitive computing through the BDB Predictive Tool, to support future
business decisions.
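The offloading idea behind both options can be sketched in plain Python, with the Hadoop lake and the data mart simulated as in-memory structures (all names here are invented for illustration):

```python
# Sketch of DWH offloading: raw data lands in a cheap "lake" store,
# heavy pre-calculation runs there, and only the compact result is
# pushed to the data mart. Hadoop and the mart are simulated in memory.
data_lake = []          # stands in for HDFS (cheap, append-only raw history)
data_mart = {}          # stands in for the DWH-facing data mart

def ingest(events):
    """Land raw events in the lake; nothing is discarded."""
    data_lake.extend(events)

def precalculate():
    """Heavy aggregation that would otherwise consume DWH capacity."""
    total = sum(e["value"] for e in data_lake)
    data_mart["daily_total"] = total

ingest([{"value": 3}, {"value": 4}])
precalculate()
```

The key point is that the full raw history stays in the low-cost store, while the warehouse only holds the small derived result.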
Use Case 02. Near real-time analytics on data warehouse
In this scenario, we use Kafka as a streaming data source (the front end for real-time
analytics) to hold incoming data as events/messages. As part of the ingestion
process, the data is stored directly on the Hadoop file system or in a scalable, fault-tolerant,
distributed database such as Cassandra (provided in BDB) or HBase.
The data is then processed and predictive models are created using the BDB Predictive
Tool. Results are stored in BDB's Data Store (an Elasticsearch-based tool) to improve the
platform's search capabilities; the predictive models can be stored in the BDB Predictive
Tool (which accepts R, Scala, Shell-script, and Python code), and the calculation results
can be stored in Cassandra. The data can be consumed by traditional tools as well as by web
and mobile applications via REST APIs.
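A minimal sketch of this near-real-time path, with the Kafka stream and the Cassandra-style store simulated by plain Python structures (event shapes and names are illustrative, not the platform's actual schema):

```python
# Events arrive on a stream, are persisted to a distributed store
# (simulated by a dict), and a running aggregate is maintained so
# queries answer with low latency instead of waiting for a batch job.
event_store = {}        # stands in for Cassandra/HBase
running_counts = {}     # stands in for the pre-computed result store

def consume(event):
    """Ingest one stream message: durable write + incremental update."""
    event_store[event["id"]] = event
    key = event["type"]
    running_counts[key] = running_counts.get(key, 0) + 1

def query(event_type):
    """What a REST API would serve to web/mobile clients."""
    return running_counts.get(event_type, 0)

for e in [{"id": 1, "type": "login"}, {"id": 2, "type": "login"},
          {"id": 3, "type": "purchase"}]:
    consume(e)
```

Because the aggregate is updated on every message, a dashboard polling `query()` sees fresh numbers without any batch window.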
This architectural design has the following benefits:
• Adds near real-time processing capabilities on top of batch-oriented processing.
• Lowers the latency for getting actionable insights, positively impacting the agility of
business decision-making.
• Lowers storage costs significantly compared to traditional data warehousing
technologies, because we can create low-cost clusters of commodity data and
processing nodes on-premise and in the cloud.
• The architecture is prepared to be migrated to the cloud and take advantage of
elasticity, adding computing resources when the workload increases and
releasing them when they are not needed.
• Once migrated to the cloud, it can address the analytics needs of hundreds of
thousands of customers with far less effort; horizontal scalability is
extremely high.
• Businesses can grow their user base by offering subscription-based services.
BDB Data Pipeline Use Case (02) for Education Industry.
Introduction
Big data architectures are reliable, scalable, and completely automated data pipelines.
Choosing an architecture and building the appropriate big data solution has never been easier.
Lambda Architecture
We follow the Lambda architecture here because it is fault-tolerant and linearly scalable.
We chose it for the following reasons:
• Incoming data from different applications is dispatched to both the batch layer and the speed layer
• The batch layer manages an append-only dataset and pre-computes the batch views
• The batch views are indexed by the serving layer for low-latency, ad-hoc queries
• The speed layer processes recent data and counterbalances the lower speed of the
batch layer
• Queries are answered by combining results from both batch views and real-time
views
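The interplay of the layers listed above can be shown with a toy query merge. This is a deliberately simplified illustration (the `pageviews` counter and the in-memory datasets are invented), not the platform's implementation:

```python
# Toy Lambda-architecture merge: the batch layer pre-computes a view
# over the append-only master dataset, the speed layer covers only
# recent events not yet absorbed by a batch run, and a query combines
# both views at serving time.
master_dataset = [("pageviews", 100), ("pageviews", 50)]  # immutable history
recent_events  = [("pageviews", 3), ("pageviews", 2)]     # not yet batched

def batch_view(dataset):
    """Slow, complete pre-computation over the full history."""
    view = {}
    for key, n in dataset:
        view[key] = view.get(key, 0) + n
    return view

def realtime_view(events):
    """Fast, incremental view over only the most recent data."""
    view = {}
    for key, n in events:
        view[key] = view.get(key, 0) + n
    return view

def query(key):
    """Serving layer: merge the batch view with the speed layer's view."""
    return (batch_view(master_dataset).get(key, 0)
            + realtime_view(recent_events).get(key, 0))
```

When the next batch run absorbs the recent events into the master dataset, the speed layer's view for them is discarded, which is how the architecture stays consistent.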
Let us assume you have different source systems and you want to bring their data
into the Data Lake through streaming engines. Most of your systems are in the same domain,
but their data formats differ.
In this case, you want to transform all this data into a unified, domain-based format and
store it in the Data Lake. We call this organized data storage in the Data Lake.
Once the data is in the Data Lake, you may want to run different computations on top of it
and store the results for analysis, analytics, or reporting. Our system provides a
computation framework for this: it reads data from the Data Lake, performs the
necessary calculations, and stores the results in the analytics store or report store.
From the analytics store, various tools/applications read data to create
reports/dashboards and generate data exports. An API is also provided to read data
directly from the Data Lake (if required).
All modules are API-driven, so the underlying technology is hidden behind the API;
this keeps the system loosely coupled.
How different systems' data can be sent to this platform
Data can be sent to this platform in the following ways:
• BDP (BizViz Big Data Platform) provides different buckets for accepting data from
customers. The customer can export the data as CSV/Excel and drop the files into the
FTP/SFTP/S3 buckets provided for this purpose. Our system monitors these buckets,
reads the data, and broadcasts it to our streaming engine.
Note: In this case, the customer should follow certain naming conventions recommended by
our system cookbook.
• BDP provides an API for accepting data into the platform. The customer can write data
to our API, which forwards it to the streaming engine.
Note: In this case, the customer should follow certain rules recommended by our system
cookbook.
• BDP provides different agents for fetching data directly from customer databases. If
the customer allows it, system agents fetch the data from their database and
broadcast it to our streaming engine.
Note: In this case, the customer should follow certain rules recommended by the BDP cookbook.
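The first option (bucket-based ingestion) can be sketched as follows. The bucket and the streaming engine are simulated in memory, and the CSV column names are hypothetical:

```python
# Sketch of bucket-based ingestion: a CSV export dropped into an
# FTP/SFTP/S3 bucket is parsed and each row is broadcast as a
# message to the streaming engine (simulated by a list here).
import csv, io

stream = []  # stands in for the streaming engine (e.g. a Kafka topic)

def monitor_bucket(csv_text):
    """Parse a dropped CSV file and broadcast its rows as messages."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        stream.append(row)

# A customer export following the expected naming conventions:
monitor_bucket("student_id,name\n101,Asha\n102,Ravi\n")
```

In a real deployment the monitor would poll or receive bucket notifications; here the file contents are passed in directly to keep the sketch self-contained.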
How unification and data quality are taken care of
Data should be ingested into the Data Lake as soon as customer data is available in the
stream engine; in BDP we call this the data unification process.
Sometimes data arrives with references, and before moving it to the Data Lake we may
need a lookup process to find the remaining parts of the data and polish it. Sometimes we
may need to add extra columns to the data as part of this polishing. All of this happens
in the unification engine.
Sometimes data arrives with a lookup reference that is not available in the Data Lake. In
this scenario, the system cannot push the data to the Data Lake and keeps it
in the raw area. A scheduled process later works on this raw
data and moves it to the refined area.
Sometimes data arrives with missing information that is important for the Data Lake.
Such data is also kept in the raw area and moved to the refined area once the information
becomes available.
All these processes are handled by the unification and Data Quality Engines. The BDP Data
Quality Engine alerts the customer about raw data, helping them correct the
data in their source system.
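The raw/refined split described above can be sketched in a few lines. The record shapes and the reference table are invented for illustration:

```python
# Sketch of the unification flow: incoming records carry a lookup
# reference. Records whose reference resolves are enriched ("polished")
# and moved to the refined area; the rest are parked in the raw area
# for a later scheduled retry.
reference_table = {101: {"name": "Asha", "gender": "F"}}
raw_area, refined_area = [], []

def unify(record):
    ref = reference_table.get(record.get("student_id"))
    if ref is None:
        raw_area.append(record)        # cannot be pushed to the lake yet
    else:
        record.update(ref)             # polish: add looked-up columns
        refined_area.append(record)

unify({"student_id": 101, "present": True})
unify({"student_id": 999, "present": False})   # unknown reference -> raw
```

A scheduled job would periodically re-run `unify` over `raw_area` once the missing references arrive, moving records to the refined area.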
How the Metadata Management layer works
All the above processes need certain rules and instructions. BDP is a framework that
operates according to the rules and instructions given to the system.
For example, student attendance data arrives at the stream engine with "Student ID" as a
reference. In the Data Lake, the Attendance object requires "Student Name" and "Student
Gender" as standard fields. MDS provides a facility to configure a lookup to the "Student"
object and fetch these two fields. Conversely, attendance data may arrive without "Student
ID", which is mandatory for the Data Lake; we need a quality check to ensure the Data Lake
contains only polished data, and MDS allows this check to be set up as configuration.
These rules and instructions are called metadata. BDP has an API-driven Metadata Service
Layer (MDS) that accepts rules and instructions from the administrator, and the BDP
framework operates according to them.
• Metadata is key for end-to-end data governance.
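The attendance example above can be expressed as configuration plus a generic engine. The rule format below is invented purely to illustrate the metadata-driven idea; it is not MDS's actual schema:

```python
# Sketch of metadata-driven processing: the lookup and mandatory-field
# rules from the attendance example live in configuration (metadata),
# and a generic engine applies them to any incoming record.
metadata = {
    "Attendance": {
        "mandatory": ["student_id"],
        "lookup": {"source": "Student", "key": "student_id",
                   "fetch": ["name", "gender"]},
    }
}
student_table = {101: {"name": "Asha", "gender": "F"}}  # "Student" object

def apply_rules(obj_type, record):
    rules = metadata[obj_type]
    if any(f not in record for f in rules["mandatory"]):
        return None                     # fails the quality check -> raw area
    ref = student_table.get(record[rules["lookup"]["key"]], {})
    for field in rules["lookup"]["fetch"]:
        record[field] = ref.get(field)  # enrich from the lookup source
    return record

ok = apply_rules("Attendance", {"student_id": 101})
bad = apply_rules("Attendance", {"present": True})  # missing Student ID
```

Changing a rule means editing the `metadata` dict, not the engine, which is what makes the framework configuration-driven.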
How Data Lake design and implementation are planned
The Data Lake is the major part of our platform. The BDP Data Lake is a storage repository
that holds a vast amount of raw data in its native format until it is needed. While a
hierarchical data warehouse stores data in files or folders, a data lake uses a flat
architecture. Each data element in the lake is assigned a unique identifier and tagged with
a set of extended metadata tags. When a business question arises, the Data Lake can be
queried for relevant data, and that smaller data set can then be analyzed to help answer
the question.
The BDP Data Lake is typically associated with Hadoop-oriented object storage: an
organization's data is first loaded into the Hadoop platform, and the BDP compute engine
is then applied to the data where it resides, on Hadoop cluster nodes built from
commodity computers.
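The flat, tag-based storage model can be sketched with an in-memory dict standing in for the object store (the tags and payloads are invented examples):

```python
# Sketch of the data-lake storage model: each element is stored in its
# native format under a unique identifier together with extended
# metadata tags; a question is answered by first narrowing on tags.
import uuid

lake = {}

def put(raw_bytes, tags):
    """Store one element; the lake is flat, with no folder hierarchy."""
    key = str(uuid.uuid4())            # unique identifier per element
    lake[key] = {"data": raw_bytes, "tags": set(tags)}
    return key

def query(tag):
    """Return the smaller set of elements relevant to a question."""
    return [v["data"] for v in lake.values() if tag in v["tags"]]

put(b"log line 1", ["logs", "2017"])
put(b"<doc/>", ["documents"])
```

Contrast this with a warehouse: nothing is modeled up front; structure is imposed only when a tag-narrowed subset is pulled out for analysis.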
How the ETL and Compute layers (Data Pipeline) work
The compute engine follows the UBER architecture. We provide components such as Scala,
Java, Talend, Shell, and BDB ETL, which help end users implement their own computation
logic and integrate it with our system.
We use the Spark ecosystem in BDP; Spark provides BDP's distributed computing capability.
The Compute Engine reads data from the Data Lake through the API, performs all the
required transformations and calculations, and lets you store the results in any analytical
or computing storage.
The framework is pluggable, allowing third-party engines to plug in and perform the
computation. It can also be used as an ETL tool.
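The pluggability idea can be sketched as a registry of engines behind a common interface. The engine name and its logic below are invented for illustration:

```python
# Sketch of a pluggable compute framework: third-party engines
# register behind a common callable interface, and a workflow step
# invokes whichever engine it names.
engines = {}

def register(name, fn):
    """Plug a (possibly third-party) engine into the framework."""
    engines[name] = fn

def run_step(engine_name, data):
    """Execute one pipeline step with the named engine."""
    return engines[engine_name](data)

# A "third-party" engine plugged into the framework:
register("uppercase_etl", lambda rows: [r.upper() for r in rows])
result = run_step("uppercase_etl", ["a", "b"])
```

Because steps reference engines only by name, swapping a Talend job for a Spark job (or any other plugged-in engine) does not change the workflow definition.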
All the rules and instructions needed by the Compute Engine are managed in the MDS layer.
In short, BDP is a highly scalable and powerful system completely driven by MDS.
We have a robust MDS UI that provides the facility to create compute (data pipeline)
workflows and schedule them on top of the Data Lake. All MDS governance can be managed
through the MDS UI.
Monitoring
BDB provides strong ecosystem monitoring, error handling, and troubleshooting facilities
with the help of frameworks such as Ambari.
Scalability, Benefits & ROI of the above Solution
• The BDB system works as one integrated platform/ecosystem on the cloud rather than
in silos like other vendors. No other product is needed with the BDB
Platform: it comes with a Big Data Pipeline, ETL, Query Services, a Predictive Tool with
Machine Learning, a Survey Platform, Self-Service BI, Advanced Visualization, and
Dashboards.
• Provided real-time attendance and assessment reporting to the entire base of 40
million active users: teachers, students, administrators, etc.
• Substantial reduction in the cost of operation:
o Converted all compliance reporting to the BizViz architecture, reducing
costs by ¼.
o Saved two years of time and multi-million-dollar effort in creating an analytics
platform.
• Market opportunity:
o Instead of selling others' licensing packages, customers can sell their own
subscription services and licenses, a market opportunity of 10x the investment.
• Additional analytics services revenue with BDB as the implementation team.
• The solution can be extended to the other 60 million passive users, and Data Lake
services or Data-as-a-Service can be provided to other school users. The entire
solution can be extended with big data analytics and predictive and prescriptive
analytics, such as student dropout and teacher attrition, by tracking critical student-
and teacher-related parameters built into the system. This alone gives the customer
another 20x opportunity.
Mapping BDB into the BI & Analytics Ecosystem –
A picture is worth a thousand words; the diagram below is self-explanatory about where
BDB stands in the BI and Analytics ecosystem. The other products are shown as a
template to place BDB in the right perspective. The diagram was created by our pre-sales
team, which has studied multiple BI products over the years. It depicts that BDB provides
all elements of Big Data with a single installation.
The following image displays our current thought process and feature-based maturity
vis-à-vis global names. We would like to reiterate that we compete against the best names
in the industry, and our closure rate after the POC round is 100%.
BD_Architecture and Charateristics.pptx.pdfBD_Architecture and Charateristics.pptx.pdf
BD_Architecture and Charateristics.pptx.pdf
 
Big data Question bank.pdf
Big data Question bank.pdfBig data Question bank.pdf
Big data Question bank.pdf
 

Recently uploaded

The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 

Recently uploaded (20)

The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 

Traditional data warehouse architecture

A traditional data warehouse is still a good fit for businesses that are completely off the cloud, but they will slowly need to change their strategy to adopt the digital revolution. A traditional data warehouse architecture comprises the following elements:

• Transactional systems (OLTP) – produce and record the transactional data/facts resulting from business operations.
• A data warehouse (DWH) – a centralized data store that integrates and consolidates the transactional data.
• ETL processes – scheduled batch processes that move the transactional data from the OLTP systems to the DWH.
• Data cleanup, metadata management, data marts, and cubes – representing a derived and aggregated view of the DWH.
• Analytical and visualization tools – enabling visualization of the data stored in the data marts and cubes through canned reports, ad-hoc reports, and dashboards for different types of users. These tools perform first-generation analytics using data mining techniques.

The BDB Platform does this quite well and has delivered many successful deployments. This kind of architecture can be illustrated in the following figure.

This architecture has limitations in today's Big Data era:

• Data sources are limited to transactional systems (OLTP).
• The major workload is ETL batch processing (jobs); it is well known that data is lost in this step.
• Integrating and consolidating data is very complex due to the rigid nature of ETL jobs.
• The data schema is not flexible enough to be extended for new analytics use cases, and maintenance cost is high as new fields enter the business processes.
• Integrating semi-structured and unstructured data sources is very complex, so important information in log files, sensor data, emails, and documents is lost.
• It does not naturally support real-time and interactive analytics; it is batch-oriented by design, while data streams call for push analytics.
• Scaling the solution takes considerable effort, so not all businesses can budget such projects properly.
• It is designed mainly for on-premise environments, making it complex to extend and deploy in cloud or hybrid environments.
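The batch ETL flow at the core of this traditional architecture can be sketched as a scheduled job. The sketch below is a minimal, hypothetical Python illustration; the table names, the in-memory stand-ins for the OLTP system and the DWH, and the transformation rule are all invented for the example and are not part of any BDB API.

```python
# Minimal sketch of a scheduled ETL batch: extract rows from an OLTP
# store, transform them into a fact-table shape, and load them into the DWH.
# All names and structures here are hypothetical.

oltp_orders = [  # transactional records as produced by business operations
    {"order_id": 1, "customer": "acme", "amount": "120.50"},
    {"order_id": 2, "customer": "acme", "amount": "79.50"},
    {"order_id": 3, "customer": "globex", "amount": "310.00"},
]

def extract(source):
    """Pull the day's transactions from the OLTP system."""
    return list(source)

def transform(rows):
    """Cast types and aggregate into a per-customer sales fact."""
    facts = {}
    for row in rows:
        facts[row["customer"]] = facts.get(row["customer"], 0.0) + float(row["amount"])
    return [{"customer": c, "total_sales": t} for c, t in sorted(facts.items())]

def load(dwh, facts):
    """Append the consolidated facts into the warehouse fact table."""
    dwh.setdefault("fact_sales", []).extend(facts)

dwh = {}
load(dwh, transform(extract(oltp_orders)))
```

In a real deployment this job would run on a schedule, which is exactly where the batch-window and data-loss problems listed above appear.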
Although all the above steps can be done efficiently with the BDB Platform, the data needs to be brought into a pipeline-based architecture to deliver seamless analytics.

Modern pipeline architectures

The modern pipeline architecture is an evolution of the previous one: it integrates new data sources and new computing paradigms, along with IoT data, artificial intelligence, machine learning, deep learning, and cognitive computing. In this new approach, we have a pipeline engine with the following features:

• A unified data architecture for all data sources, irrespective of their original structure. Integration with existing data marts and data warehouses, together with a data lake, keeps the original data as well as the transformed data.
• Flexible data schemas designed to change frequently, using NoSQL-based data stores.
• A unified computing platform for any kind of workload (batch, interactive, real-time, machine learning, cognitive, etc.), using distributed platforms such as Hadoop, HDFS, Spark, Kafka, and Flume.
• Deployable in hybrid and cloud environments; private clouds are key for enterprises.
• Horizontal scalability, so unlimited data volume can be processed simply by adding new nodes to the cluster. (The BDB Platform provides both horizontal and vertical scalability.)

Although there is a huge number of technologies related to big data and analytics, we show here how various big data technologies are used with the BDB Platform to provide a working, modern data pipeline architecture (see figure 02). Using the BDB Platform reduces the risk of experimenting with too many options, as the architecture has already been proven to work.
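The horizontal-scalability property above can be illustrated with a toy partitioner: records are hash-partitioned across the cluster's nodes, so the platform absorbs more volume simply by adding nodes. The node names and records below are invented for the example; real systems (HDFS, Kafka, Cassandra) each have their own partitioning schemes.

```python
import zlib

# Toy illustration of horizontal scalability: records are deterministically
# hash-partitioned across the available nodes, so each node processes only
# its own share of the data. All names here are hypothetical.

def partition(records, nodes):
    """Assign each record to exactly one node by hashing its key."""
    assignment = {node: [] for node in nodes}
    for record in records:
        node = nodes[zlib.crc32(record["key"].encode()) % len(nodes)]
        assignment[node].append(record)
    return assignment

records = [{"key": f"user-{i}", "value": i} for i in range(100)]
two_nodes = partition(records, ["node-1", "node-2"])
three_nodes = partition(records, ["node-1", "node-2", "node-3"])
```

Adding "node-3" redistributes the same 100 records over three nodes instead of two, shrinking each node's workload without any change to the processing logic.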
Use Case 01: Data warehouse offloading

Data warehouses have been in place at major companies for many years. With the exponential growth of data, DWHs are reaching their capacity limits and batch windows keep growing, putting SLAs at risk. One approach is to migrate heavy ETL and calculation workloads to Hadoop to achieve faster processing times, lower cost per unit of stored data, and free DWH capacity for other workloads. Here we have two major options:

• One option is to load raw data from OLTP systems into Hadoop, transform the data into the required models, and finally move the data into the DWH. We can extend this scenario by integrating semi-structured, unstructured, and sensor-based data sources. In this sense, the Hadoop environment acts as an enterprise data lake.
• Another option is to move data from the DWH into Hadoop using Kafka (a real-time messaging service) to perform pre-calculations; the results are then stored in data marts to be visualized with traditional tools. Because storage on Hadoop is much cheaper than on a DWH, we save money and can keep the data for a longer time.

We can also extend this use case by leveraging the platform's analytical power: creating predictive models with Spark MLlib or R, or applying cognitive computing with the BDB Predictive Tool to support future business decisions.

Use Case 02: Near real-time analytics on the data warehouse

In this scenario, we use Kafka as a streaming data source (the front end for real-time analytics) to store the incoming data as events/messages. As part of the ingestion process, the data is stored directly on the Hadoop file system or in a scalable, fault-tolerant, distributed database such as Cassandra (provided with BDB) or HBase. The data is then processed and predictive models are created using the BDB Predictive Tool.
The results are stored in BDB's Data Store (an Elasticsearch-based tool) to improve the platform's search capabilities; the predictive models can be stored in the BDB Predictive Tool (code can be written in R, Scala, shell scripts, or Python); and the calculation results can be stored in Cassandra. The data can be consumed by traditional tools as well as by web and mobile applications via a REST API. This architectural design has the following benefits:

• It adds near real-time processing capabilities on top of batch-oriented processing.
• It lowers the latency for getting actionable insights, positively impacting the agility of business decision making.
• It lowers storage costs significantly compared to traditional data warehousing technologies, because we can build low-cost clusters of commodity data and processing nodes on premise and in the cloud.
• The architecture is prepared to be migrated to the cloud and take advantage of elasticity, adding computing resources when the workload increases and releasing them when they are no longer needed.
• Once migrated to the cloud, it can address the analytics needs of hundreds of thousands of customers with much lower effort; horizontal scalability is extremely high.
• The business can grow its user base by offering subscription-based services.

BDB Data Pipeline Use Case (02) for the Education Industry

Introduction

Big data architectures are reliable, scalable, and completely automated data pipelines. Choosing the architecture and building the appropriate big data solution has never been easier.

Lambda architecture

We follow the Lambda architecture here because it is fault-tolerant and linearly scalable. We chose it for the following reasons:

• Data is served by different applications to both the batch layer and the speed layer.
• The batch layer manages an append-only dataset and pre-computes the batch views.
• The batch views are indexed by the serving layer for low-latency, ad-hoc queries.
• The speed layer processes recent data and compensates for the higher latency of the batch layer.
• Queries are answered by combining results from both batch views and real-time views.

Let us assume you have different source systems and want to bring their data into the data lake through streaming engines. Most of your systems belong to the same domain, but their data formats differ. In this case, you want to transform all of that data into a unified, domain-based format and store it in the data lake; we call this organized data storage in the data lake. Once the data is in the data lake, you may want to run different computations on top of it and store the results somewhere for analysis, analytics, or reporting. Our system provides a computation framework for this.
This computation framework helps you read data from the data lake and perform the necessary transformations and calculations. The results can be stored in the analytics store or the report store. From the analytics store, various tools and applications read data to create reports and dashboards and to generate data exports. An API is provided to read data directly from the data lake (if required).
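The query-answering step of the Lambda architecture described above, combining batch views with real-time views, can be sketched as follows. The view contents and key names are invented for the illustration; in the real pipeline the batch views would be produced by the batch layer over the append-only dataset and the speed view by the streaming engine.

```python
# Sketch of a Lambda-architecture query: the batch layer pre-computes a view
# over the master dataset, the speed layer keeps a small view of recent
# events, and a query merges the two. All names are illustrative.

batch_view = {"page_a": 1000, "page_b": 250}   # pre-computed by the batch layer
speed_view = {"page_a": 12, "page_c": 4}       # recent events, not yet in batch

def query(key):
    """Answer by combining the batch view and the real-time view."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)
```

When the batch layer catches up and re-computes its view, the corresponding entries are dropped from the speed view; the query logic itself never changes.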
All the modules are API-driven, so the technology is hidden behind the APIs; this helps us design the system to be loosely coupled.

How data from different systems can be sent to this platform

Data can be sent to this platform through the following options:

• BDP (BizViz Big Data Platform) provides different buckets for accepting customer data. The customer can export the data as CSV/Excel and place the files in the FTP/SFTP/S3 buckets provided for this purpose. These buckets are monitored by our system, which reads the data and broadcasts it to our streaming engine. Note: in this case, the customer should follow the naming conventions recommended by our system cookbook.
• BDP provides an API for accepting data into the platform. The customer can write data to our API, which forwards it to the streaming engine. Note: in this case, the customer should follow the rules recommended by our system cookbook.
• BDP provides different agents for fetching data directly from the customer's database. If the customer allows it, system agents can fetch the data from their database and broadcast it to our streaming engine. Note: in this case, the customer should follow the rules recommended by the BDP cookbook.

How unification and data quality are taken care of

Once customer data is available in the streaming engine, it is ingested into the data lake. In BDP we call this the data unification process.
Sometimes data arrives with references; before moving it to the data lake, we may need a lookup process to resolve the remaining parts of the record and polish it. Sometimes we need to add extra columns to the data as part of this polishing. All of these steps happen in the unification engine.

Sometimes data arrives with a lookup reference that is not yet available in the data lake. In this scenario, the system cannot push the data to the data lake and keeps the record in the raw area; a scheduled process works on top of this raw data and moves it to the refined area later. Sometimes data arrives with missing information that is essential for the data lake; such data is also kept in the raw area and moved to the refined area once the information becomes available. All of these cases are handled by the unification and data quality engines. The BDP Data Quality Engine generates alerts to the customer about raw data, helping them correct the data in their source systems.

How the metadata management layer works

All the above processes need certain rules and instructions. BDP is a framework that works according to the rules and instructions given to the system. For example, student attendance arrives at the streaming engine with "Student ID" as a reference, while in the data lake the Attendance object requires "Student Name" and "Student Gender" as standard fields; MDS provides the facility to configure a lookup against the "Student" object to fetch these two pieces of information. In another case, attendance data arrives without the "Student ID", which is mandatory information for the data lake; we need a quality check to make sure the data lake only receives polished data, and MDS lets us configure this check as well.

These rules and instructions are called metadata. BDP has an API-driven Metadata Service layer (MDS) that accepts rules and instructions from the administrator, and the BDP framework has the capability to work with them.
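The routing described above — enrich records whose references resolve, hold the rest in the raw area — can be sketched as follows. The Student lookup table and field names echo the attendance example, but the structures themselves are invented for illustration, not BDP's actual interfaces.

```python
# Sketch of the unification step: look up the student reference, enrich the
# record, and push it to the refined area; records with unresolvable or
# missing references stay in the raw area for a later scheduled retry.

students = {"S1": {"name": "Asha", "gender": "F"}}  # lookup data in the lake

refined, raw = [], []

def unify(record):
    sid = record.get("student_id")
    student = students.get(sid)
    if student is None:                  # missing or unknown reference
        raw.append(record)               # keep in raw area, alert the customer
        return
    enriched = dict(record, student_name=student["name"],
                    student_gender=student["gender"])
    refined.append(enriched)             # polished data goes to the data lake

unify({"student_id": "S1", "present": True})
unify({"student_id": "S9", "present": False})   # unknown reference -> raw area
unify({"present": True})                        # missing mandatory id -> raw area
```

A scheduled job would periodically re-run `unify` over the raw area, so records move to the refined area once the missing student data appears.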
Metadata information is key for end-to-end data governance.

How the data lake design and implementation are planned

The data lake is the major part of our platform. The BDP data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture. Each data element in the lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller data set can then be analyzed to help answer the question.
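The flat storage model described above — each element given a unique identifier and extended metadata tags, then retrieved by querying on those tags — can be sketched as follows. The tag vocabulary and the in-memory store are invented for the example.

```python
import uuid

# Sketch of flat data-lake storage: every element gets a unique identifier
# and a set of metadata tags; a query filters the lake by tags rather than
# by a folder hierarchy. Structures here are hypothetical.

lake = {}

def put(element, tags):
    """Store an element under a fresh unique id with its metadata tags."""
    element_id = str(uuid.uuid4())
    lake[element_id] = {"data": element, "tags": set(tags)}
    return element_id

def query_by_tags(required):
    """Return the elements whose tags contain all the required tags."""
    required = set(required)
    return [e["data"] for e in lake.values() if required <= e["tags"]]

put({"score": 91}, ["domain:education", "type:assessment"])
put({"present": True}, ["domain:education", "type:attendance"])
```

The point of the flat layout is that a business question ("all education data", "all attendance records") becomes a tag query, independent of where or in what format the element was originally landed.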
The BDP data lake is often associated with Hadoop-oriented object storage: an organization's data is first loaded into the Hadoop platform, and the BDP compute engine is then applied to the data where it resides, on Hadoop cluster nodes built from commodity computers.

How the ETL and compute layers (data pipeline) work

The compute engine follows the UBER architecture. We provide components such as Scala, Java, Talend, shell, and BDB ETL, which help end users build their own computation logic and integrate it with our system. We use the Spark ecosystem in BDP; Spark provides the distributed computing capability for the platform.

The compute engine can read data from the data lake through the API, perform all the required transformations and calculations, and store the results in any analytical or computing storage. The framework is pluggable, allowing third-party engines to plug in and complete the computation; it can also be used for ETL. All the rules and instructions needed by the compute engine are managed in the MDS layer.

In short, BDP is a highly scalable and powerful system completely driven through MDS. We have a robust MDS UI, which provides the facility to create compute (data pipeline) workflows and schedule them on top of the data lake. All MDS governance can be managed through the MDS UI.
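The pluggability mentioned above — engines registering with the compute framework and being selected by metadata rules — can be sketched as a simple registry. The engine names, the rule format, and the interface are invented for illustration; they are not BDP's actual plug-in API.

```python
# Sketch of a pluggable compute framework: engines register under a name,
# and a workflow step (driven by an MDS-style rule) picks which engine to
# run. The registry and interface shown here are hypothetical.

engines = {}

def register(name):
    """Decorator that plugs a compute function into the framework."""
    def wrap(fn):
        engines[name] = fn
        return fn
    return wrap

@register("sum")
def sum_engine(rows):
    return sum(rows)

@register("mean")
def mean_engine(rows):
    return sum(rows) / len(rows)

def run_step(rule, rows):
    """Execute one pipeline step according to a metadata rule."""
    return engines[rule["engine"]](rows)

result = run_step({"engine": "mean"}, [10, 20, 30])
```

Because steps are looked up by name at run time, a third-party engine only has to register itself; the workflow definitions stored in MDS never need to change.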
Monitoring

BDB has a strong ecosystem for monitoring, error handling, and troubleshooting, with the help of frameworks such as Ambari.

Scalability, benefits, and ROI of the above solution

• The BDB system works as one integrated platform/ecosystem on the cloud, rather than working in silos like other vendors. No other product is needed alongside the BDB Platform; it comes with a big data pipeline, ETL, query services, a predictive tool with machine learning, a survey platform, self-service BI, advanced visualization, and dashboards.
• It provided real-time attendance and assessment reporting to the entire base of 40 million active users: teachers, students, administrators, etc.
• Substantial reduction in cost of operation:
o Converting all compliance reporting into the BizViz architecture reduced the costs by ¼.
o Saved two years of time and multi-million dollars of effort that creating an analytics platform would otherwise have required.
• Market opportunity:
o Instead of reselling others' licensing packages, customers can sell their own subscription services and licenses – a market opportunity of 10x the investment.
• Additional analytics services revenue with BDB as the implementation team.
• The solution can be extended to the other 60 million passive users; data lake services or data-as-a-service can also be offered to other school users.

The entire solution can be extended with big data analytics and with predictive and prescriptive analytics, such as student dropout and teacher attrition models, by capturing the critical student- and teacher-related parameters already built into the system. This alone represents another 20x opportunity for the customer.
Mapping BDB into the BI & Analytics Ecosystem

A picture is worth a thousand words; the diagram given below is self-explanatory about where BDB stands in the BI and analytics ecosystem. The other products are shown as a template to place BDB in the right perspective. The diagram was created by our pre-sales team, which has studied multiple BI products over a period of time. It depicts that BDB provides all elements of big data within a single installation.
The following image displays our current thought process and feature-based maturity vis-à-vis global names. We would like to reiterate that we are competing against the best names in the industry and that our closure rate after the POC round is 100%.