This document discusses running a Spark cluster on demand. It defines big data as data sets so large or complex that traditional data processing software cannot handle them, and cloud computing as a model for ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources. It then proposes combining the two and provides a demonstration.
5. BIG DATA
"BIG DATA IS A TERM FOR DATA SETS THAT ARE SO LARGE OR COMPLEX
THAT TRADITIONAL DATA PROCESSING APPLICATION SOFTWARE IS
INADEQUATE TO DEAL WITH THEM. CHALLENGES INCLUDE CAPTURE,
STORAGE, ANALYSIS, DATA CURATION, SEARCH, SHARING, TRANSFER,
VISUALIZATION, QUERYING, UPDATING AND INFORMATION PRIVACY." -
WIKIPEDIA
7. CLOUD COMPUTING
"CLOUD COMPUTING IS A MODEL FOR ENABLING UBIQUITOUS,
CONVENIENT, ON-DEMAND NETWORK ACCESS TO A SHARED POOL OF
CONFIGURABLE COMPUTING RESOURCES (E.G., NETWORKS, SERVERS,
STORAGE, APPLICATIONS, AND SERVICES) THAT CAN BE RAPIDLY
PROVISIONED AND RELEASED WITH MINIMAL MANAGEMENT EFFORT OR
SERVICE PROVIDER INTERACTION. THIS CLOUD MODEL IS COMPOSED OF
FIVE ESSENTIAL CHARACTERISTICS, THREE SERVICE MODELS, AND FOUR
DEPLOYMENT MODELS." - NIST DEFINITION FOR CLOUD COMPUTING