The Big Data Cloud:
Are You Ready for the Zettabyte?

Steven C. Markey, MSIS, PMP, CISSP, CIPP, CISM, CISA, STS-EV, CCSK, CompTIA Cloud Essentials
Principal, nControl, LLC
Adjunct Professor
President, Cloud Security Alliance – Delaware Valley Chapter (CSA-DelVal)
Big Data Cloud

• Presentation Overview
  – Why Should You Care?
  – Cloud Overview
  – Big Data Overview
  – Cloud-Based Big Data Offerings
  – Securing Cloud-Based DB Solutions
Big Data Cloud
• Why Should You Care?
  – Organizational Cost Reduction Requirements
     • Justify Investments
     • Improve Efficiencies (Productivity, Time to Market)
  – Digital Information: ~60% Annual Growth Rate (AGR)
  – Data Storage: 15-20% AGR, Capital Expense (CapEx)
  – Categorization, Classification & Retention Magnify
     • Compliance, Legal & Privacy Regulations
  – Prevalent & Interconnected Business Ecosystems
     • Supply Chains
     • Business Process Outsourcers (BPO)
     • Information Technology Outsourcers (ITO)
     • Vendor’s Vendors
Source: IDC
[Figure. Source: NIST]
[Figure: Service Delivery Models. Source: Swain Techs]
[Figure. Source: Matthew Gardiner, Computer Associates]
Big Data Cloud
• Big Data Overview
  – Aggregated Data from the Following Sources
     • Traditional
     • Source
     • Social
Big Data Cloud
• Traditional Data
  – Database Management Systems
     • Relational Database Management Systems (RDBMS)
     • Object-Oriented Database Management Systems (OODBMS)
     • Non-Relational, Distributed DB Management Systems (NRDBMS)
     • Mobile Databases (SQLite, Oracle Lite)
  – Online Transaction Processing (OLTP)
     • Real-Time Data Warehousing
  – Online Analytical Processing (OLAP)
     • Operational Data Stores (ODS)
     • Enterprise Data Warehouse (EDW)
Big Data Cloud
• Traditional Data
  – OLAP
     • Business Intelligence (BI)
        – Data Mining
        – Reporting
        – OLAP (Continued)
            » Relational OLAP (ROLAP)
            » Multi-Dimensional OLAP (MOLAP)
            » Hybrid OLAP (HOLAP)


     OLTP → ODS → EDW (Data Marts) → BI (Data Mining)
     OLTP → ODS → EDW (Data Marts) → BI (Reporting)
     OLTP → ODS → EDW (Data Marts) → BI (OLAP)
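
The flow from OLTP through an ODS and a data mart to a BI report can be sketched with Python's standard-library sqlite3 module. This is only an illustration under stated assumptions: the table names (oltp_orders, ods_orders, mart_daily_sales), the columns, the sample rows, and the "drop non-positive amounts" cleaning rule are all hypothetical and do not come from the slides.

```python
import sqlite3


def build_pipeline(rows):
    """Load hypothetical OLTP order rows and roll them up into a data mart."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # OLTP: one row per transaction, as the application wrote it.
    cur.execute(
        "CREATE TABLE oltp_orders "
        "(order_id INTEGER, day TEXT, region TEXT, amount REAL)"
    )
    cur.executemany("INSERT INTO oltp_orders VALUES (?, ?, ?, ?)", rows)
    # ODS: a cleaned copy (assumed rule: drop non-positive amounts).
    cur.execute(
        "CREATE TABLE ods_orders AS "
        "SELECT * FROM oltp_orders WHERE amount > 0"
    )
    # EDW data mart: totals pre-aggregated per day and region.
    cur.execute(
        "CREATE TABLE mart_daily_sales AS "
        "SELECT day, region, SUM(amount) AS total, COUNT(*) AS orders "
        "FROM ods_orders GROUP BY day, region"
    )
    conn.commit()
    return conn


def report(conn):
    """BI reporting step: read the mart in a stable display order."""
    cur = conn.execute(
        "SELECT day, region, total, orders "
        "FROM mart_daily_sales ORDER BY day, region"
    )
    return cur.fetchall()


# Hypothetical sample transactions.
SAMPLE = [
    (1, "2012-05-01", "east", 100.0),
    (2, "2012-05-01", "east", 50.0),
    (3, "2012-05-01", "west", 75.0),
    (4, "2012-05-02", "east", -10.0),  # filtered out in the ODS step
]

if __name__ == "__main__":
    for row in report(build_pipeline(SAMPLE)):
        print(row)
```

In a real deployment each stage would normally live in a separate system; a single in-memory database is used here only to keep the sketch runnable.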
Big Data Cloud
• Source Data
  – Log Files
     • Event Logs / Operating System (OS)-Level
     • Appliance / Peripherals
     • Analyzers / Sniffers
  – Multimedia
     • Image Logs
     • Video Logs
  – Web Content Management (WCM)
     • Web Logs
     • Search Engine Optimization (SEO)
        – Web Metadata
Big Data Cloud
• Big Data Overview
  – Aggregators
     • Mostly NRDBMS Implementations
        – Not Only Structured Query Language (NoSQL)
     • NRDBMS Examples
        – Column Family Stores: BigTable (Google), Cassandra & HBase (Apache)
        – Key-Value Stores: App Engine Datastore (Google), DynamoDB & SimpleDB (AWS)
        – Document Databases: CouchDB, MongoDB
        – Graph Databases: Neo4j
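
The four NRDBMS families listed above differ mainly in the shape of the data they hold. The sketch below models each shape with plain Python structures. It is not the API of any product named on the slide, and every key, field, and value is a made-up example.

```python
# Column family store: row key -> column family -> column -> value.
column_family = {
    "user:42": {
        "profile": {"name": "Ada", "city": "Wilmington"},
        "activity": {"last_login": "2012-05-01"},
    }
}

# Key-value store: the value is opaque to the store and fetched by key only.
key_value = {"session:abc123": b"serialized-session-state"}

# Document database: self-describing nested documents, queried by field.
documents = [
    {"_id": 1, "type": "log", "host": "web01", "tags": ["error", "auth"]},
    {"_id": 2, "type": "log", "host": "web02", "tags": ["info"]},
]

# Graph database: nodes plus labeled edges between them.
edges = [("alice", "FOLLOWS", "bob"), ("bob", "FOLLOWS", "carol")]


def find_documents(docs, field, value):
    """Return documents whose field equals value (a stand-in for a query)."""
    return [d for d in docs if d.get(field) == value]


def neighbors(edge_list, node, label):
    """Return nodes reachable from node over edges with the given label."""
    return [dst for src, lbl, dst in edge_list if src == node and lbl == label]


if __name__ == "__main__":
    print(column_family["user:42"]["profile"]["city"])
    print(find_documents(documents, "host", "web01"))
    print(neighbors(edges, "alice", "FOLLOWS"))
```

The access pattern follows from the shape: a key-value store can only look up by key, while the document and graph shapes allow queries on fields and relationships.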
Big Data Cloud
• Big Data Overview
  – Serial Processing
     • Hadoop
        – Hadoop Distributed File System (HDFS)
        – Hive: Data Warehouse (DW)
        – Pig: Query Language
     • Riak
  – Parallel Processing
     • HadoopDB
  – Analytics
     • Google MapReduce
     • Apache MapReduce
     • Splunk (for Security Information / Event Management [SIEM])
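
The MapReduce model named above has three steps: map each record to key/value pairs, group the pairs by key, and reduce each group to a result. The sketch below runs those steps in a single Python process on a few made-up log lines. It does not use Hadoop or Google's implementation, which distribute the same phases across many nodes, and the function names are illustrative only.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit (word, 1) for every token in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse one key's values into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over all records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical web-log lines.
    logs = ["GET /index", "GET /login", "POST /login"]
    print(map_reduce(logs))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread over many machines without changing the functions themselves.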
[Figures. Sources: Cloudera, Wikispaces, Google]
Big Data Cloud
• Cloud-Based Big Data Solutions
  – PaaS
     • DBaaS
        – Amazon Web Services (AWS)
            » DynamoDB
            » SimpleDB
            » Relational Database Service (RDS): Oracle 11g / MySQL
        – Google App Engine
            » Datastore
        – Microsoft SQL Azure
        – Oracle Public Cloud: 11g
     • Processing
        – AWS Elastic MapReduce (EMR)
        – Google App Engine MapReduce: Mapper API
        – Microsoft: Apache Hadoop for Azure
        – IBM SmartCloud Enterprise on IBM InfoSphere BigInsights Basics
Big Data Cloud
• Cloud-Based Database Solutions
  – IaaS
     • Basic Components: Compute & Storage Nodes
        – AWS Elastic Compute Cloud (EC2)
        – AWS Elastic Block Store (EBS)
        – OpenStack Compute (Nova)
        – OpenStack Storage (Swift)
     • Advanced Components
        – Apache Hadoop
        – Apache Hadoop MapReduce
     • Commercial Applications
        – Cloudera
        – DataStax
        – MapR
        – Splunk
Big Data Cloud
[Figure: AWS Cloud. EC2 instances and EBS volumes in an EC2 Availability Zone, EBS Snapshots in S3 Storage, and the Internet. Source: Amazon]
Big Data Cloud
• Big Data in the Cloud Use Cases
  – Public Cloud
     • AWS: EC2 Hadoop & S3
     • AWS: EC2 Hadoop, DynamoDB & EMR
     • AWS: EC2 Linux, Apache (w/ Tomcat), DynamoDB & EMR
     • AWS: EC2 Cloudera Hadoop & EMR
     • AWS: EC2 Splunk
  – Hybrid
     • Oracle Big Data Appliance & Connector, Google App Engine
     • OpenStack Swift, AWS EC2 Cloudera Hadoop & EMR
  – Private Cloud
     • OpenStack Nova & Swift, Apache Hadoop
     • OpenStack Nova & Swift, Cloudera Hadoop
Big Data Cloud
• Securing Cloud-Based NRDBMS Solutions
  – General
     • Focus on Application / Middleware-Level Security
        – SQL Injections Are Still Possible
        – Leverage Application IAM for NRDBMS User Rights Mgmt (URM)
        – Leverage Application & System Logging for Authentication,
          Authorization & Accounting (AAA)
     • Segregation of Duties
        – Read / Write Namespaces
        – Read-Only Namespaces
  – Specific
     • Document
        – Consistency Assurance
     • Key / Value
        – Ensure Referential Integrity
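
The point that injection is still possible applies to non-relational stores as well. One common pattern, assumed here because the slide does not name one, is operator injection against a MongoDB-style query-by-example filter: if the application copies a parsed request value straight into the filter, a client can send an object such as {"$ne": null} instead of a string and match every record. The helper below is a hypothetical application-level guard, not part of any database driver.

```python
def safe_equals_filter(field, user_value):
    """Build an equality filter, refusing anything but a plain string value.

    Rejecting dicts and lists keeps operator objects such as {"$ne": None}
    from reaching the database as part of the query.
    """
    if not isinstance(user_value, str):
        raise ValueError("query values must be plain strings")
    if field.startswith("$") or "." in field:
        raise ValueError("field name not allowed")
    return {field: user_value}


if __name__ == "__main__":
    # Normal input produces an ordinary equality filter.
    print(safe_equals_filter("username", "alice"))

    # Input shaped like a query operator is rejected before any query runs.
    try:
        safe_equals_filter("password", {"$ne": None})
    except ValueError as exc:
        print("rejected:", exc)
```

This check sits in the application or middleware layer, which is where the slide suggests focusing security effort.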
Big Data Cloud
• Securing Big Data in the Cloud
  – Identity & Access Management (IAM)
     • Security Assertion Markup Language (SAML)
     • Representational State Transfer (REST)
        – AWS IAM
        – Windows Azure Access Control Service (ACS)
     • Web Services Trust Language (WS-Trust)
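
As one concrete illustration of AWS IAM, access rules are written as JSON policy documents. The sketch below builds a read-only policy for a single DynamoDB table, which is one way to approximate a read-only namespace. The table ARN and account number are placeholders, the choice of actions is an assumption about what "read-only" should cover, and nothing here is taken from the slides.

```python
import json

# DynamoDB actions that read data but never modify it.
READ_ONLY_ACTIONS = [
    "dynamodb:GetItem",
    "dynamodb:BatchGetItem",
    "dynamodb:Query",
    "dynamodb:Scan",
]


def read_only_table_policy(table_arn):
    """Return an IAM policy document allowing only reads on one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(READ_ONLY_ACTIONS),
                "Resource": table_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN: the account ID and table name are hypothetical.
    arn = "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable"
    print(json.dumps(read_only_table_policy(arn), indent=2))
```

A separate policy listing write actions could be attached to a different role, which keeps read and write duties segregated.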
[Figures. Sources: OASIS, Intuit]
[Figure. Source: Apache]
Big Data Cloud
• Securing Big Data in the Cloud
  – Electronic Discovery (eDiscovery)
     • eDiscovery Reference Model (EDRM)
     • Legal Holds
     • Litigation Response
  – Records & Information Management (RIM)
     • Generally Accepted Recordkeeping Principles (GARP®)
     • Information Governance Reference Model (IGRM)
     • Information Lifecycle Management (ILM)
     • MIKE2.0
Big Data Cloud
• Privacy & Data Protection for Big Data Clouds
  – Jurisdictions*
     • Regional: EU DPA
     • National: PIPEDA, GLBA, HIPAA / HITECH, COPPA, Safe Harbor
     • Statutory: Bavarian, CA SB 1386 / 24, MA 201 CMR 17, NV SB 227
  – Data Flow & Jurisdictional Adherence
     • Data Sharing with Third Parties
         – Pseudonymization / De-Identification
     • Consent & Notices
  – Contract Clauses
     • Model Contracts
  – Privacy Best Practices
     • Generally Accepted Privacy Principles (GAPP)
* Not all inclusive.
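
Pseudonymization before sharing data with third parties can be sketched with a keyed hash. The example below uses HMAC-SHA256 from Python's standard library, which is one common technique and not necessarily the one the presenter had in mind. The record fields and the key are hypothetical, and a real key would be stored and rotated outside the code.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Replace an identifier with a keyed HMAC-SHA256 digest (hex string)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def share_record(record, secret_key, id_field="email"):
    """Return a copy of record with the identifying field pseudonymized."""
    shared = dict(record)
    shared[id_field] = pseudonymize(record[id_field], secret_key)
    return shared


if __name__ == "__main__":
    key = b"example-key-kept-by-the-data-owner"  # hypothetical key
    record = {"email": "user@example.com", "purchases": 3}
    print(share_record(record, key))
```

The same identifier always maps to the same token under one key, so the recipient can still join records, while reversing the mapping requires the key held by the data owner.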
Big Data Cloud
• Presentation Take-Aways
  – Big Data in the Cloud is Here to Stay
  – It Has to be Secure
      – Segregation of Data
      – Access Controls
         – Separation / Segregation of Duties
         – Federated Identities
         – Logging
• Questions?
• Contact
  – Email: steve@ncontrol-llc.com
  – Twitter: markes1
  – LinkedIn: http://www.linkedin.com/in/smarkey
  – CSA-DelVal: http://www.csadelval.org/

Editor's notes

  1. http://qugstart.com/blog/amazon-web-services/how-to-set-up-db-server-on-amazon-ec2-with-data-stored-on-ebs-drive-formatted-with-xfs/ Here’s the procedure I decided on. It involves symlinking the MySQL config files and data directories onto the EBS volume. Because I needed to migrate about 20 GiB of data to get started, I initially set up an “X-tra large” instance with 10 GiB of RAM to handle the data import. After the data was migrated and imported into my database, I simply terminated my X-Large instance and spun up a small instance connected to the same EBS volume. All the databases were preserved nicely and I did not have to waste money paying for an X-Large instance anymore. This exemplifies the value of thinking in the “cloud” mindset, where you can spin servers up and down in a matter of seconds. Hope this article helps someone else out there!