Authorized licensed use limited to: University of Waterloo. Downloaded on May 30,2020 at 04:01:40 UTC from IEEE Xplore. Restrictions apply.
data say they are doing it for the consumer’s benefit. But data have a way of being used for purposes other than originally intended” [7].
In the early 1990s, the interconnected web of data accessible to anyone from anywhere, known as the Internet, was born. Digital storage became more cost-effective than printed documents. Michael [8] estimates that, including sounds and images, there are thousands of petabytes of information, and that the existence of 12,000 petabytes is not an unreasonable guess. The web was increasing in size roughly 10-fold each year; however, data that is never examined yields no value and no insight. By the mid 1990s, the Internet was extremely popular, but structured relational databases could not cope with the variety of data types from different non-relational sources. Thus, NoSQL systems were created to handle different languages and formats in a highly flexible way. Larry Page and Sergey Brin implemented Google’s search engine, which responds within a few seconds with the desired results by processing and analyzing Big Data in a distributed fashion [9]. Richard comments that the purpose of computing is insight, not just numbers. In 1999, Kevin introduced the term “Internet of Things” to describe the growing number of devices online that automate communication with each other without human interference; it also utilizes the Internet to empower computers to sense the world for themselves [10].
With the advent of Industry Revolution 4.0, which began developing in Germany in 2013, the concept has rapidly spread across Europe and the world as a whole. BDA is one of the key adoptions and a pillar of the IoT initiative to improve decision making [11]. It requires processing a large amount of data on the fly and storing the data in various scalable storage technologies. This lightning-fast analytics implementation allows industries to gain rapid insights, provide predictions for machinery, and share information. Intrinsically, it requires a unified architecture that caters for common operations to enable innovative applications.
B. Big Data General ‘Vs’ Concept
To understand the Big Data concept, it helps to consider the simple building blocks of the data model and how they communicate with each other effectively. In 2001, Gartner analyst Doug Laney introduced the 3Vs concept in the dimension of data management, consisting of controlling data volume, variety and velocity [12]. It characterizes the creation, storage, retrieval and analysis of data. A decade later, IBM coined two more worthy Vs, Veracity and Value. Brief descriptions of the 5Vs follow:
Volume: It refers to the enormous quantity of data being generated.
Velocity: It refers to the staggering rate at which data is created and processed.
Variety: It refers to the types and formats of the content to be analyzed.
Veracity: It focuses on the quality and trustworthiness of the captured data, given its variability.
Value: It refers to the significance of the data, delivering insights and creating useful models that answer sophisticated queries.
Inspired by the comprehensive discussion and relevant comments on IBM’s Big Data Analytics hub website, the 5Vs can be clustered into three groups [13]:
Volume + Velocity: These translate into hardware and software requirements for dealing with the data. A large-scale distributed data processing framework, such as Hadoop, is required.
Veracity + Velocity: These translate into the urgency of real-time processing. Detecting possible data corruption or manipulation is crucial and demands high-speed processing ability.
Value: This translates into the necessity of interdisciplinary cooperation, which raises the most difficult challenge for the industrial use of big data.
C. “Data at Rest” vs “Data in Motion”
Gaining insights from big data is no small task. Firstly, “Data at Rest” refers to historical data collected from various sources. Analytics is performed after the event occurs, so it is commonly used to discover behaviors and patterns from past records; this is also referred to as the “batch processing” method. To automate these tasks, a scheduler application is put in place to execute them automatically. Secondly, “Data in Motion” refers to processing and analyzing data in real time as the event happens. Latency is a key consideration, as a lag in processing can result in the loss of opportunities. Furthermore, hybrids of “Data at Rest” and “Data in Motion” are common in industry.
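The contrast between the two modes can be illustrated with a minimal sketch (the event records and function names below are illustrative, not taken from any specific system): the batch path scans the full collection after the fact, while the streaming path keeps an incrementally updated answer as each event arrives.

```python
# Hypothetical event records: (timestamp, user, amount)
events = [
    ("2019-08-01T10:00", "alice", 30.0),
    ("2019-08-01T10:05", "bob", 12.5),
    ("2019-08-01T10:07", "alice", 7.5),
]

def batch_total(records):
    """Data at rest: analyze the full historical collection after the events."""
    return sum(amount for _, _, amount in records)

class StreamTotal:
    """Data in motion: update the answer incrementally as each event arrives."""
    def __init__(self):
        self.total = 0.0
    def on_event(self, record):
        _, _, amount = record
        self.total += amount
        return self.total  # the answer is available with minimal latency

stream = StreamTotal()
running = [stream.on_event(e) for e in events]
assert batch_total(events) == running[-1] == 50.0
```

Both paths converge on the same answer; the difference is when it becomes available, which is exactly the latency trade-off discussed above.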
III. BIG COMPUTE FEATURES
For data-intensive computing [14], the system should encapsulate sophisticated design technologies for storing, managing and processing big data. There are two key focus areas, applications and frameworks, built around the concepts of data parallelism and task/application parallelism. With data parallelism, the data is distributed among servers and can therefore be processed in parallel. It has been claimed that, in contrast to task parallelism, it is often the simpler method for crafting a parallel application [15].
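The two parallelism styles can be sketched in a few lines (a toy illustration using Python's standard thread pool; real big-data frameworks distribute the same idea across servers):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

data = list(range(8))

# Data parallelism: the SAME operation applied to chunks of the data in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    squared = list(pool.map(square, data))

# Task parallelism: DIFFERENT operations run concurrently over the data.
with ThreadPoolExecutor(max_workers=2) as pool:
    total_future = pool.submit(sum, data)
    count_future = pool.submit(len, data)
    results = (total_future.result(), count_future.result())

assert squared == [0, 1, 4, 9, 16, 25, 36, 49]
assert results == (28, 8)
```

The data-parallel form is the simpler one to reason about, as the claim above suggests: partitioning the input is all that is needed, with no coordination between distinct kinds of task.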
The following describes the generic features of Big Compute:
• Being efficient in pre-processing raw data and combining relevant data from multiple sources, commonly known as ETL (Extract, Transform and Load)
• Being flexible in applying various aggregation functions and performing ad-hoc queries over a large number of sources to discover high-level insights from the data
• Being cost-effective in extending functionality at minimum cost and minimizing the maintenance cost of keeping the system running smoothly
• Being low-latency in harnessing real-time data for analytics by optimizing high-volume operations with minimal delay
• Being highly scalable to accommodate the growth of compute resources and storage, with easy plug-in support
• Being robust and fault tolerant, with the ability to cope with erroneous input and to avoid downtime on failure
IEEE Conference on Open Systems (ICOS)
• Being systematically governed to ensure data availability, usability, integrity and security in use
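The first feature above, ETL, can be sketched in miniature (the source data, field names and sink are all illustrative, not a specific tool's API): extract parses the raw source, transform validates and converts records, and load writes the clean rows to a stand-in warehouse.

```python
import csv, io, json

raw_csv = "id,temp_c\n1,21.5\n2,bad\n3,19.0\n"   # raw source with one dirty row

def extract(text):
    # Extract: parse the raw source into records.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: validate and convert; drop rows that fail validation.
    clean = []
    for row in rows:
        try:
            clean.append({"id": int(row["id"]),
                          "temp_f": float(row["temp_c"]) * 9 / 5 + 32})
        except ValueError:
            continue
    return clean

def load(rows, sink):
    # Load: write to the target store (a list stands in for a warehouse here).
    for row in rows:
        sink.append(json.dumps(row))

warehouse = []
load(transform(extract(raw_csv)), warehouse)
assert len(warehouse) == 2  # the dirty row was filtered out
```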
Identifying the required features for a specific domain can be difficult. In general, different application domains may need different types of system, and it is hard to meet all stakeholder needs with a single design. As such, Cigdem [16] attempts to use a feature modelling technique [17]. It drills down by distinguishing domain scoping, which determines the domain of interest, the stakeholders and their goals, from domain modelling, which aims to derive the features using a commonality analysis. Figure 1 shows the feature model diagram. This work provides insight into the overall feature space of a BDA system and further assists in deriving the BDA architecture.
Figure 1: Feature Model
IV. REVIEW OF BIG DATA ARCHITECTURE FRAMEWORK
A reference architecture helps to build a blueprint of the ultimate BDA system. It is based on a collection of characteristics and features that are common to a given set of problems. The design of the architecture has to support a fluent orchestration workflow that executes either synchronously or asynchronously between the application and its data. In many cases, it includes support for a hybrid mode of batch and real-time processing. The following reviews of architecture frameworks broaden the perspective and enable problem solving with the right tools.
A. Lambda ‘λ’ Architecture
In 2011, one of the popular reference BDA architecture designs was posted by Marz [18]. It is named the “Lambda λ Architecture”. It is designed to combine the batch and real-time processing paradigms in parallel. This method is capable of solving many BDA use cases. In addition, it is robust, with a fault-tolerance strategy for serving a wide range of workloads. Technically, it is now feasible to run ad-hoc queries against Big Data, but querying a petabyte dataset every time an answer is needed is prohibitively expensive. Figure 2 shows the λ architecture with its three major layers.
Figure 2: λ Architecture
The batch layer pre-computes the master dataset into batch views so that queries can be resolved with low latency. This requires striking a balance between pre-computation work and the execution time needed to complete the query. By doing a little computation on the fly to complete queries, the process is saved from having to pre-compute excessively large batch views. In addition, the views are not expected to be updated frequently. The batch views may be a set of flat files, depending on the chosen technologies. The key is to precompute just enough information so that the query can be completed quickly.
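The "precompute just enough" idea can be sketched as follows (the events and view shape are illustrative): the expensive scan over the master dataset happens once in the batch layer, and query time only needs a cheap aggregation over the small precomputed view.

```python
from collections import Counter

# Immutable raw events in the master dataset (hypothetical).
master_dataset = [
    ("2019-08-01", "page_view"), ("2019-08-01", "page_view"),
    ("2019-08-02", "page_view"), ("2019-08-03", "page_view"),
]

# Batch layer: the expensive full scan is done once, ahead of any query.
batch_view = Counter(day for day, _ in master_dataset)

def query_range(start, end):
    # Query time: only a small on-the-fly merge over the precomputed view.
    return sum(c for day, c in batch_view.items() if start <= day <= end)

assert query_range("2019-08-01", "2019-08-02") == 3
```

Storing per-day counts rather than per-query answers is the balance point: the view stays small, yet any date-range query completes with a trivial amount of work.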
The serving layer indexes the views and provides interfaces so that the pre-computed data can be queried speedily. Both the batch and speed layers execute the same processing logic, and the results are then reconciled in the serving layer. It is designed to be distributed among many servers for scalability. There is a long-standing problem where data is too normalized: some information needs to be stored redundantly to improve response times. However, denormalizing the data may create huge complexity in keeping it consistent, so this view needs to be constructed carefully [19].
The speed layer is similar to the batch layer: the objective is to construct views that can be queried efficiently. It mainly uses an incremental approach and handles real-time views. These views are updated directly when new data arrives. It compensates for the high latency of the batch layer to enable up-to-date results for queries. However, incremental computation introduces various new challenges and is significantly more complex than batch computation, especially when millisecond-level latencies must be achieved in a resource-efficient manner. Data must be indexed in order to make use of random-read/random-write databases.
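The reconciliation of the two layers at read time can be sketched in a few lines (view contents are illustrative): the batch view is complete but stale, the real-time view covers only recent data, and the serving layer merges them per query.

```python
# λ serving-layer read path: reconcile batch and real-time views.
batch_view = {"2019-08-01": 120, "2019-08-02": 80}   # high-latency, complete
realtime_view = {"2019-08-02": 5, "2019-08-03": 7}   # low-latency, recent only

def serve(day):
    # The speed layer compensates for data not yet absorbed by the batch layer.
    return batch_view.get(day, 0) + realtime_view.get(day, 0)

assert serve("2019-08-02") == 85   # partly batch, partly real-time
assert serve("2019-08-03") == 7    # too recent: real-time view only
```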
B. Kappa ‘κ’ Architecture
Jay [20] describes alternatives worth exploring in place of parts of the λ architecture. He addresses the issue of maintaining code in two complex distributed systems, which is painful in development as well as an operational burden, especially with distributed components like Storm and Hadoop. The κ architecture was introduced for this reason. In this approach, re-processing is executed only when the processing code has changed and the result sets therefore actually need to be recomputed. The job doing the re-computation is just an improved version of the same code, running on the same framework, taking the same input data. Basically, it is a simplification of the λ architecture in which the entire Batch Layer is simply removed; only the Speed layer and Serving layer remain. Figure 3 shows the diagram of the κ Architecture.
The workflow can handle real-time data processing and continuous data re-processing in a single stream computation model. A streaming job reads the data and processes it. When re-processing is required, a second instance of the streaming job is executed; it starts processing the data from the beginning of the retained log and redirects the output to a separate table. When the second job has caught up with the entire dataset, the application is simply switched to read from the new data view, the first job is stopped, and its data view is deleted [21]. Multiple streams can spin up multiple consumers in parallel, each consuming an individual part of the data.
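The re-processing flow above can be sketched as follows (the log contents, the two job versions and the "view switch" are illustrative): a second job replays the same retained log through improved code, and readers are switched to its output once it has caught up.

```python
# κ-style reprocessing: replay the retained immutable log through new code,
# build a second view, then switch readers over and discard the old view.
log = [("sensor1", 10), ("sensor1", 20), ("sensor2", 5)]  # immutable event log

def job_v1(events):
    view = {}
    for key, value in events:
        view[key] = view.get(key, 0) + value          # original logic: running sum
    return view

def job_v2(events):
    view = {}
    for key, value in events:
        view[key] = max(view.get(key, value), value)  # improved logic: keep max
    return view

serving = {"active": job_v1(log)}    # first job's view is live
candidate = job_v2(log)              # second instance replays the whole log
serving["active"] = candidate        # caught up: switch; old view is dropped
assert serving["active"] == {"sensor1": 20, "sensor2": 5}
```

The key point is that both jobs are "the same code, same framework, same input" in structure; only the processing logic differs, so no separate batch system is needed.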
Figure 3: κ Architecture
Another pillar of the κ architecture is the immutable data log. This is similar in concept to the immutable Master Dataset in the Lambda architecture, but instead of using technologies such as Hadoop/HDFS, the κ architecture’s immutable data log is (usually) Kafka2. It retains the full log of the data that it needs to re-process. Data in Kafka is persisted to disk and replicated for fault tolerance. Furthermore, growing data in Kafka does not make the system slower, as Kafka supports cluster implementations distributed across servers with over a petabyte of storage.
C. Microservices Architecture
Fully built and deployed BDA solutions often include many components, a mix of vendor software and open source software. They use physical servers, virtual machines and Docker containers. Nevertheless, an application programming interface (API) is a common method for integrating the functions, which are stitched together into a working pipeline for each data source. A container is similar to a very lightweight virtual machine; microservices are even lighter. Based on the trends in BDA, most analytics pipelines are easily deployed as immutable microservices. Each microservice executes in its own process/container and communicates in a self-regulating way without having to depend on other services or on the application as a whole. Microservices commonly adopt the Spark, Cassandra and Kafka open source technologies [22]. Figure 4 shows the generic Microservices Architecture diagram, as referenced in [23]. The batch, speed and serving layers can be built on demand as needed.
2 Apache Kafka was developed by LinkedIn and contributed to the open source community through the Apache Software Foundation.
3 For Apache Druid details, refer to “https://druid.apache.org/”.
Figure 4: Microservices Architecture
D. IOT Architecture
With the rise of Industry Revolution 4.0, the combination of IOT and BDA with Artificial Intelligence is driving the optimization and automation of production for industry. IOT is a data-driven paradigm that uses real-time pervasive connected sensors, simulations and event logs to deliver analytics for intelligent manufacturing through the Internet/Intranet to every area of the factory [24]. These IOT devices have been deployed in daily operations to deliver operational efficiencies, process innovation and environmental benefits. They also present challenges in terms of large-scale data management, processing and analysis [25]. The architecture consists of four major bases: Time Series Store/Database (TSDB), Streaming Message Queue (SMQ), Workflow Orchestration Engine (WOE) and Distributed File System (DFS).
Time Series Store/Database (TSDB): It is a data management system optimized for time-stamped or time-series data. To process a query over time-series data, the time-series segment needs to be located first. The data is then retrieved based on a combination of one or more values of the metadata, which is commonly stored in a relational database such as SQLite, PostgreSQL, MySQL or others. This mechanism enables a TSDB to provide low-latency access for tracking, monitoring, downsampling, and aggregating over time. Typically, it has auto-sharding and horizontal scaling with a store-specific API or a purpose-built connector. There are various open source TSDBs, such as Apache Druid3, InfluxDB4, OpenTSDB5 and others.
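The two-step query path described above, a metadata lookup in a relational store followed by time-bucketed aggregation over the located segment, can be sketched as follows (table layout, segment format and the mean-per-bucket downsampling policy are illustrative):

```python
import sqlite3

# Metadata lives in a relational store, as described above.
meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE series (id INTEGER, sensor TEXT, unit TEXT)")
meta.execute("INSERT INTO series VALUES (1, 'boiler_temp', 'C')")

# Time-series segments keyed by series id: lists of (seconds, value) samples.
segments = {1: [(0, 20.0), (30, 22.0), (60, 21.0), (90, 23.0)]}

def query(sensor, bucket_sec):
    # Step 1: locate the segment via the metadata store.
    (sid,) = meta.execute(
        "SELECT id FROM series WHERE sensor = ?", (sensor,)).fetchone()
    # Step 2: downsample by time bucket (mean per bucket, a typical aggregation).
    buckets = {}
    for ts, val in segments[sid]:
        buckets.setdefault(ts // bucket_sec, []).append(val)
    return {b: sum(vs) / len(vs) for b, vs in sorted(buckets.items())}

assert query("boiler_temp", 60) == {0: 21.0, 1: 22.0}
```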
Streaming Message Queue (SMQ): Machine-to-machine communication uses messaging protocols that establish publish-subscribe-based messaging to the servers, such as MQTT (Message Queue Telemetry Transport), XMPP (Extensible Messaging and Presence Protocol), DDS (Data Distribution Service) and others. The SMQ handles filtering, extraction, and simple/complex calculations on the data during the streaming process.
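The publish-subscribe pattern these protocols share can be shown with a tiny in-process broker (a toy sketch in the spirit of an MQTT-style broker, not a real protocol implementation; topic names are made up):

```python
from collections import defaultdict

class Broker:
    """Minimal in-process publish-subscribe broker with exact-topic filtering."""
    def __init__(self):
        self.subscribers = defaultdict(list)
    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)
    def publish(self, topic, payload):
        for callback in self.subscribers[topic]:  # fan out to matching subscribers
            callback(payload)

received = []
broker = Broker()
broker.subscribe("factory/line1/temp", received.append)
broker.publish("factory/line1/temp", 72.4)   # delivered to the subscriber
broker.publish("factory/line2/temp", 68.0)   # no subscriber: filtered out
assert received == [72.4]
```

Real brokers add wildcard topic matching, quality-of-service levels and persistence, but the decoupling shown here, where publisher and subscriber never reference each other directly, is the core of the pattern.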
Workflow Orchestration Engine (WOE): It is designed to orchestrate enterprise-level data processing operations, flow-based control, scheduling, and data provenance, securely and durably, for IOT and data analytics tasks. Furthermore, the orchestration framework supports distributed clusters and extensibility through plug-ins. It also offers diagrammatic views and modifiable behavior from a web browser. There are two popular open source orchestration workflow systems: Apache Nifi/MiNifi6, written in Java, and Node-RED7, written in JavaScript on top of the Node.js platform.
4 For InfluxDB details, refer to “https://www.influxdata.com/”.
5 For OpenTSDB details, refer to “http://opentsdb.net/”.
Figure 5: IOT Architecture
Distributed File System (DFS): It is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to applications. It is highly fault tolerant: the file system replicates, or copies, each piece of data multiple times and distributes the copies to individual nodes, placing at least one copy on a different server rack from the others. As a result, the data on nodes that crash can be found elsewhere within the cluster, ensuring that processing can continue while data is recovered. The choice of DFS technology depends on the “brotherhood” of applications; the most famous open source big data eco-system is Apache Hadoop [26]. There are also Ceph8, Alluxio9, OpenIO10 and others.
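The rack-aware replica placement described above can be sketched as a simple policy (node names, the rack labels and the exact policy are illustrative; production systems such as HDFS use more elaborate rules):

```python
def place_replicas(block, nodes, replication=3):
    """Place replicas of a block with at least one copy on a different rack."""
    # nodes: list of (node_name, rack) pairs, first node hosts the writer
    first_rack = nodes[0][1]
    same_rack = [n for n in nodes if n[1] == first_rack]
    other_rack = [n for n in nodes if n[1] != first_rack]
    placement = same_rack[:replication - 1] + other_rack[:1]
    # Rack diversity guarantees survival of a whole-rack failure.
    assert len({rack for _, rack in placement}) >= 2, "need rack diversity"
    return {block: [name for name, _ in placement]}

nodes = [("n1", "rackA"), ("n2", "rackA"), ("n3", "rackB")]
assert place_replicas("blk_001", nodes) == {"blk_001": ["n1", "n2", "n3"]}
```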
E. NIST Big Data Reference Architecture (NBD-RA)
The National Institute of Standards and Technology (NIST) has taken on responsibility for the United States Federal Government’s Big Data Research and Development Initiative. It develops open standards and a BDA architecture to accelerate the adoption of the most secure and effective Big Data techniques and technologies. The White House announced this initiative on March 28, 2012 [27]. It started with six federal departments and agencies, with more than 80 projects involved in the development.
NBD-RA is an elastic BDA architecture design. The conceptual model is vendor-neutral, technology-neutral, and infrastructure-agnostic. The system consists of five logical functional components: System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider and Data Consumer. In addition, there are two dimensions, “Management” and “Security & Privacy”, which overlay those five components and provide services and functionality for BDA-specific tasks. Figure 6 shows the NBD-RA architecture, as referenced in [28].
6 For Nifi/MiNifi details, refer to “https://nifi.apache.org”.
7 For Node-RED details, refer to “https://nodered.org”.
8 For Ceph details, refer to “https://ceph.io”.
9 For Alluxio details, refer to “https://github.com/Alluxio/alluxio”.
Figure 6: NBD-RA Architecture
V. TRENDS AND ANALYSIS
The architectures discussed provide a structure to be filled with a set of generic tools. However, the choice of technologies to be used and integrated carries much complexity. The first consideration is whether the BDA system is on-premise, cloud or hybrid. The second is the choice of data processing, analytics, and security-and-governance application technologies to be developed: open source, commercial or hybrid. The final consideration is the return on investment (ROI) of having the big data system, which is driven by valuable AI use cases such as descriptive, predictive and prescriptive analytics.
An on-premise BDA system provides a high-bandwidth transfer rate with more flexibility in accessing the system. Nevertheless, it requires a large capital outlay and high maintenance costs. Alternatively, big data in cloud computing or a hybrid cloud may be an alternative approach, offering high availability ranging from 99.9% to 99.99999%, as well as promising expandability of storage from gigabytes to petabytes [29]. There are some native Hadoop options available in public clouds such as AWS, Google, Oracle, AliCloud and others. However, they may not be the best fit for many applications, because under virtualization Hadoop performs workloads more slowly for intensive applications [30] [31]. Generally, all these considerations call for a comprehensive requirements analysis and cost budgeting.
Hadoop is one BDA eco-system, but it is not the only choice. Elasticsearch, named ‘Elastic’, is an alternative BDA solution. It is specialized for web search, network traffic and log analysis, and is based on Apache Lucene for low-level indexing and analysis [32] [33]. NoSQL document-oriented data stores are popular and in demand nowadays; MongoDB is one of the most widely used, providing durability with its write-ahead logging techniques [34] [35]. Apache Cassandra is a popular wide column-oriented store that enables continuous availability, tremendous scale, and data distribution across multiple data centers and cloud availability zones [36]. It has been deployed at several technology giants, such as Facebook, Netflix, Twitter, eBay and others.
Nevertheless, there are a variety of choices of cloud computing technologies: Google BigTable, Amazon S3 object storage, Azure Cosmos DB, Alibaba Cloud ApsaraDB and others.
10 For OpenIO details, refer to “https://www.openio.io”.
AI analytics is important to every aspect of the organization because it can help ROI at every level. The implemented analytics use cases need to be built around issues that are really clear, and the problems that businesses are having today, to improve efficiency, effectiveness, and specific outcomes such as customer satisfaction [37]. PWC reports that 59% of executives say big data at their company would be improved through the use of AI [38]. Developing best practices for quick ROI and momentum of scale is critical for building AI models and reusable building blocks of data sets, and for working across organizational boundaries to drive more valuable AI use cases [39].
VI. CONCLUSION
Nowadays, data is the fuel of an organization’s vehicle, driving business transformation. We are also witnessing the growth and importance of the hidden value of data. This paper has therefore contributed to several important aspects of exploring BDA concepts: the “V”s, the feature model, and key architectural components with their trade-offs. BDA is now one of the main pillars of Industry Revolution 4.0, as data analytics with AI plays a crucial algorithmic role in producing accurate results.
REFERENCES
[1] H. Asaadi, D. Khaldi, and B. Chapman, “A Comparative Survey of
the HPC and Big Data Paradigms: Analysis and Experiments,” in
2016 IEEE International Conference on Cluster Computing
(CLUSTER), 2016, pp. 423–432.
[2] J. Yang, “From Google File System to Omega: A Decade of
Advancement in Big Data Management at Google,” in 2015 IEEE
First International Conference on Big Data Computing Service and
Applications, 2015, pp. 249–255.
[3] B. Marr, Big Data in Practice. John Wiley & Sons, Inc., 2016.
[4] F. W. Kistermann, “The Invention and Development of the Hollerith
Punched Card: In Commemoration of the 130th Anniversary of the
Birth of Herman Hollerith and for the 100th Anniversary of Large
Scale Data Processing,” Ann. Hist. Comput., vol. 13, no. 3, pp. 245–
259, Jul. 1991.
[5] E. F. Codd, “A Relational Model of Data for Large Shared Data
Banks,” Commun. ACM, vol. 13, no. 6, pp. 377–387, Jun. 1970.
[6] J. Peeters, “Early MRP Systems at Royal Philips Electronics in the
1960s and 1970s,” IEEE Ann. Hist. Comput., vol. 31, no. 2, pp. 56–
69, Apr. 2009.
[7] R. Brueckner, “Where Did Big Data Come From?,” insidebigdata,
2013. [Online]. Available:
https://insidebigdata.com/2013/02/03/where-did-big-data-come-
from/. [Accessed: 12-Aug-2019].
[8] M. Lesk, “How Much Information Is There In the World?,” 1997.
[Online]. Available: http://www.lesk.com/mlesk/ksg97/ksg.html.
[Accessed: 12-Aug-2019].
[9] B. Stone, “The Education of Google’s Larry Page,” Bloomberg
Businessweek, Apr-2012.
[10] K. Ashton, “That Internet of Things,” RFID J., 2009.
[11] A. Petrillo, “Fourth Industrial Revolution: Current Practices,
Challenges, and Opportunities,” in Digital Transformation in Smart
Manufacturing, R. Cioffi and F. De Felice, Eds. Intechopen, 2018.
[12] D. Laney, “3D Data Management: Controlling Data Volume, Velocity
and Variety,” 2001.
[13] S. Yin and O. Kaynak, “Big Data for Modern Industry: Challenges
and Trends [Point of View],” Proc. IEEE, vol. 103, no. 2, pp. 143–
146, Feb. 2015.
[14] S. Jha, J. Qiu, A. Luckow, P. K. Mantha, and G. C. Fox, “A Tale of
Two Data-Intensive Paradigms: Applications, Abstractions, and
Architectures,” CoRR, vol. abs/1403.1, 2014.
[15] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad:
distributed data-parallel programs from sequential building blocks,”
ACM SIGOPS Oper. …, pp. 59–72, 2007.
[16] C. A. Salma, B. Tekinerdogan, and I. N. Athanasiadis, “Feature
Driven Survey of Big Data Systems,” in Proceedings of the
International Conference on Internet of Things and Big Data, 2016,
pp. 348–355.
[17] K. C. Kang, S. G. Cohen, J. A. Hess, W. E. Novak, and A. S. Peterson, “Feature-Oriented Domain Analysis (FODA) Feasibility Study,” Pittsburgh, 1990.
[18] N. Marz, “How to beat the CAP theorem,” 2011. [Online]. Available:
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html.
[Accessed: 13-Aug-2019].
[19] N. Marz and J. Warren, Big Data: Principles and best practices of
scalable realtime data systems. Manning Publications, 2015.
[20] J. Kreps, “Questioning the Lambda Architecture,” O’Reilly Media,
2014. [Online]. Available:
https://www.oreilly.com/ideas/questioning-the-lambda-architecture.
[Accessed: 13-Aug-2019].
[21] A. Kumar, Architecting Data-Intensive Applications. Packt
Publishing, 2018.
[22] G. Vetticaden, “Building Secure and Governed Microservices with
Kafka Streams,” Cloudera, 2018. [Online]. Available:
https://blog.cloudera.com/building-secure-and-governed-
microservices-with-kafka-streams/. [Accessed: 12-Aug-2019].
[23] J. Garrett, Data Analytics for IT Networks: Developing Innovative Use
Cases. Cisco Press, 2018.
[24] J. Davis, T. Edgar, J. Porter, J. Bernaden, and M. Sarli, “Smart
manufacturing, manufacturing intelligence and demand-dynamic
performance,” Comput. Chem. Eng., vol. 47, pp. 145–156, 2012.
[25] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep
Learning for IoT Big Data and Streaming Analytics: A Survey,” IEEE
Commun. Surv. Tutorials, vol. 20, no. 4, pp. 2923–2960, 2018.
[26] Z. Li and H. Shen, “Measuring Scale-Up and Scale-Out Hadoop with
Remote and Local File Systems and Selecting the Best Platform,”
IEEE Trans. Parallel Distrib. Syst., vol. 28, no. 11, pp. 3201–3214,
Nov. 2017.
[27] T. Kalil, “The White House Office of Science and Technology Policy:
Big Data is a Big Deal,” Office of Science and Technology Policy
(OSTP) Blog, 2012. [Online]. Available:
https://obamawhitehouse.archives.gov/blog/2012/03/29/big-data-big-
deal. [Accessed: 27-Aug-2019].
[28] “NIST Big Data Interoperability Framework: volume 8, reference
architecture interfaces,” Gaithersburg, MD, Jun. 2018.
[29] A. Zarrabi, E. K. Karuppiah, C. H. Ngo, K. K. Yong, and S. See,
“Gravitational Search Algorithm using CUDA,” in IEEE Parallel and
Distributed Computing, Applications and Technologies, PDCAT 2014,
2014, pp. 193–198.
[30] D. Nuñez, I. Agudo, and J. Lopez, “Delegated Access for Hadoop
Clusters in the Cloud,” in 2014 IEEE 6th International Conference on
Cloud Computing Technology and Science, 2014, pp. 374–379.
[31] M. E. Wendt, “Cloud-based Hadoop Deployments: Benefits and
Considerations,” 2014.
[32] J. Rosenberg, J. B. Coronel, J. Meiring, S. Gray, and T. Brown,
“Leveraging Elasticsearch to Improve Data Discoverability in Science
Gateways,” in Proceedings of the Practice and Experience in
Advanced Research Computing on Rise of the Machines (Learning),
2019, pp. 19:1--19:5.
[33] B. Dageville et al., “The Snowflake Elastic Data Warehouse,” in
Proceedings of the 2016 International Conference on Management of
Data, 2016, pp. 215–226.
[34] R. R. Shetty, A. M. Dissanayaka, S. Mengel, L. Gittner, R. Vadapalli,
and H. Khan, “Secure NoSQL Based Medical Data Processing and
Retrieval: The Exposome Project,” in Companion Proceedings of
the10th International Conference on Utility and Cloud Computing,
2017, pp. 99–105.
[35] B. Sendir, M. Govindaraju, R. Odaira, and P. Hofstee, “Low Latency
and High Throughput Write-Ahead Logging Using CAPI-Flash,”
IEEE Trans. Cloud Comput., p. 1, 2019.
[36] A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured
Storage System,” SIGOPS Oper. Syst. Rev., vol. 44, no. 2, pp. 35–40,
Apr. 2010.
[37] S. Earley, “Executive Roundtable Series: Driving Higher ROI and
Organizational Change,” IT Prof., vol. 17, no. 6, pp. 60–64, Nov. 2015.
[38] “2018 AI predictions: 8 insights to shape business strategy,” PwC AS,
2018.
[39] “2019 AI Predictions: Six AI priorities you can’t afford to ignore,”
PwC AS, 2019.