Agenda
• Quick Poll
• Overview – AIBDP / Big Data Connection
• Prasad Mavuduri – Board Member, AIBDP –
“Demystifying Big Data”
• David Sonnenschein – Vice President, & Aleks
Swerdlow – Community Manager – SAP Labs –
“HANA In-Memory – Start-ups Success Stories”
• Networking & Q&A
Welcome
• Thank you: Francis - Silicon Valley Strategy,
Innovation and Product Management group
• Thank you: Michael & Sam and the Microsoft
Store
• Thank you: Aleks & David & SAP HANA
• Thank you: All of You… You are the ‘Secret
Sauce’
Quick Poll
• Relationship & Experience w/ Big Data
• Job Role
• Industry
• Company Years - Start-up?
• Big Data Implementation Status
• Biggest Challenges / Opportunities
– Ask the right question…
• Vs Competitors?
Overview - Big Data Connections
Mission: Demystify Big Data
– Five E’s – entertain, engage, educate, etc.
– Focus on Solutions (vs technology)
– Focus on Specific Verticals
• (e.g., Healthcare, Risk, eCom/eMarketing,
Manufacturing, Logistics, Telecom…)
– Best Practices Case Study Reviews
– Networking & Shared Learning
– Sponsored by the American Institute of
Big Data Professionals (AIBDP.org)
– Sponsored by Big Data consulting firm,
Data-Magnum
[Figure: “The Confusing World of Big Data” landscape map. Layers: Applications, Tools, Data Management; columns: Structured and Unstructured. Data sources shown: enterprise apps, internet apps, social media, web content, mobile devices, camera / DVR, sensors / RFID, logfiles. Based on source: Perella Weinberg Partners]
Source:
Source: CapGemini: http://www.capgemini.com/sites/default/files/technology-blog/files/2012/09/big-data-vendors.jpg
Big Data Landscape
http://www.bigdatalandscape.com/
Source: http://www.forbes.com/special-report/2013/industry-atlas.html
Big Data BI & Analytics/Visualization Landscape
[Figure: vendor landscape in two groups, Business Intelligence and Analytics / Visualization]
Predictive Analytics Leaders
Source: http://wikibon.org/wiki/v/Big_Data:_Hadoop,_Business_Analytics_and_Beyond
AH.. Simplicity… This looks pretty straight-forward… I can handle this..
Our Landscape Collection as published on Startup50.com
Simplified (so far)
• Data Input – Sources, Databases, & Integration Tools
• Platform / Infrastructure – Data Preparation, MapReduce, filing, governance…
• Data Presentation & Analysis – BI, Data Discovery, Visualization
• Predictive Analytics & Machine Learning
• Vertical & Horizontal Products (Specialized Applications)
It can be made more complicated…
o Hadoop
o NoSQL
o NewSQL
o Structured Databases
o NGDW (next generation data warehouse)
o Cloud Services
o Technical Services
o Professional Services
o Distributors
o Deployment services
o Deployment stack/appliances
o Development services
o Application stacks
o Database stacks
o Managed Monitoring
o Storage
o Security
Example Optimized Marketing

Big Data Connection presents: Big Data: Cause of Confusion


Editor's notes

  1. Source: Sqrrl. To simplify the NoSQL world, let's take a look at the top 3 databases in terms of current popularity and how they compare to Apache Accumulo, which is at the core of our product, Sqrrl Enterprise.

     MongoDB: It is a wonderfully easy-to-use document store that many select as a flexible replacement for a SQL database, as it (like all NoSQL databases) does not require pre-defined schemas. However, MongoDB has difficulty scaling to very large datasets (e.g., 100+ TB) and does not natively work with your Hadoop cluster. It also does not possess fine-grained security controls.

     Cassandra: This is an excellent choice if your data is too big for MongoDB and you require multi-datacenter replication. Although Cassandra was not originally designed to run natively on your Hadoop cluster, it now has integrations with MapReduce, Pig, and Hive. It does not possess fine-grained security controls.

     HBase: HBase natively integrates with Hadoop, and it can handle very large datasets. However, it does not have fine-grained security controls.

     Accumulo: Accumulo has an architecture most similar to HBase, which allows it also to natively plug into your Hadoop cluster. It is far more scalable than MongoDB, and with reported cluster sizes in the multiple thousands within the Intelligence Community it is also significantly more scalable than HBase and Cassandra. Accumulo is the only NoSQL database with cell-level security capabilities. Accumulo also has other features that could lead one to choose it over HBase or Cassandra for reasons other than security or scalability. For example, Accumulo has a powerful server-side programming mechanism called Iterators, which provides the capability to do a variety of real-time aggregations and analytics.

     These high-level differences between MongoDB, Cassandra, HBase, and Accumulo are summarized in Sqrrl's decision tree diagram. Of course, there are a wide variety of more detailed technical differences that will be explored in greater detail in a later post. This decision tree can be summarized with a few simple statements:
     • If you need a quick, simple solution and have “small” Big Data (e.g., a few dozen terabytes), MongoDB may be the answer.
     • If you need cell-level security or multi-petabyte scalability, Accumulo is the right answer.
     • If you have data that is too big for MongoDB and don’t need cell-level security or massive scalability, we would recommend testing HBase, Cassandra, and Accumulo for your specific workloads. Each has its own nuanced advantages and disadvantages.
     • If you don’t need real-time analytics, you are probably on the wrong decision tree and can stick with the Hadoop Distributed File System and batch analytics.

     It is worth noting that the NoSQL databases above are all open source databases. Sqrrl Enterprise builds upon Accumulo and adds a number of additional features to Accumulo including streaming ingest, JSON, encryption, identity management integrations, full-text search, SQL queries, graph search, and statistics. We believe that these features set Sqrrl Enterprise apart from other Big Data platforms.
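The decision-tree statements in note 1 can be sketched as a small function. This is only an illustration of the quoted rules, not Sqrrl's own tooling: the `Workload` type, the `recommend_store` helper, and the two terabyte cut-offs are assumptions made here, since the note says only "a few dozen terabytes" and "multi-petabyte" without giving exact numbers.

```python
from dataclasses import dataclass

# Illustrative cut-offs only (assumptions): the note gives no exact figures.
SMALL_BIG_DATA_TB = 50      # stands in for "a few dozen terabytes"
MULTI_PETABYTE_TB = 2000    # stands in for "multi-petabyte"


@dataclass
class Workload:
    needs_realtime_analytics: bool
    needs_cell_level_security: bool
    data_size_tb: float


def recommend_store(w: Workload) -> str:
    """Hypothetical helper that walks the four statements from note 1."""
    # No real-time requirement: the note says you are on the wrong tree.
    if not w.needs_realtime_analytics:
        return "HDFS with batch analytics"
    # Cell-level security or multi-petabyte scale points to Accumulo.
    if w.needs_cell_level_security or w.data_size_tb >= MULTI_PETABYTE_TB:
        return "Accumulo"
    # "Small" Big Data with no special security needs: MongoDB may do.
    if w.data_size_tb <= SMALL_BIG_DATA_TB:
        return "MongoDB"
    # Everything in between: test the candidates on your own workload.
    return "Benchmark HBase, Cassandra, and Accumulo"
```

Security is checked before data size because, per the note, MongoDB lacks fine-grained security controls even when the dataset is small enough for it.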
  2. http://www.capgemini.com/blog/capping-it-off/2012/09/big-data-vendors-and-technologies-the-list

     Big Data Vendors and Technologies

     Data Acquisition stream – technological providers: Ab Initio, HP, IBM (Datastage, Streams, Data mirror), Informatica (PowerCenter, PowerExchange, CEP), Kalido, Microsoft, Numenta, Oracle, SAP, SAS, Splunk, Syncsort, Talend, Tibco.

     Data Providers: ComScore, Datasift, Experian, Factual, GfK, Gnip, IMS, Inrix, Kaggle, Knoema, LexisNexis, Microsoft (with their Windows Azure Marketplace data market), Nielsen, Reuters, Salesforce Radian6, Symphony IRI; social network websites like Facebook, Google+, LinkedIn, Tumblr, Twitter or Viadeo; all the Open Data providers, like governments, regions, etc.

     Marshalling domain – Very Large Data Warehousing and BI Appliances: Actian; Paraccel, EMC² (Greenplum), HP (Vertica), IBM (Netezza), Kognitio, Microsoft (SQL 2012 and PDW), Oracle (Exadata), SAP (HANA and Sybase IQ), SAS, Teradata.

     NoSQL Domain – main technologies and vendors: Amazon (as cloud provider or with their own NoSQL solution), Cassandra, Cloudera (CDH, Hadoop distribution), CouchDB, EMC², Google, Hadoop (of course), Hortonworks (Hadoop distribution), HP, IBM, KX, MapR (Hadoop distribution), MarkLogic, Microsoft (Hadoop on Windows and Azure), MongoDB, Neo4J, Oracle, Palantir, Snaplogic, Sparsity, Splunk, Teradata (Aster Data), ZL Technologies.

     Content Management Space: Adobe, Alfresco, EMC² (Documentum), IBM (FileNet), HP (Autonomy), Microsoft, OpenText, Oracle.

     Analytics phase – predictive technologies (such as data mining) and vendors: Adobe, EMC², GoodData, Hadoop MapReduce, HP, IBM (SPSS), Karmasphere, Kxen, Microsoft, Mzinga, Oracle, R, Salesforce, SAS, SAP (R on HANA) and Teradata (Aprimo).

     Data Virtualization (and data federation) is currently led by Composite, Denodo, HP (IDOL), IBM, Informatica, Microsoft, Oracle (Exalytics), SAP and Teiid (JBoss community).

     BI Tools Vendors: Actuate, Dassault Systèmes (Exalead), Domo, Esri, GoodData, Google, HP (Autonomy), IBM (Cognos suite), Information Builders, LogiXML (LogiAnalytics), Microsoft (SQL 2012), Microstrategy, NeutrinoBI, Oracle (OBI Foundation), Panopticon, Panorama, Pentaho, Qlikview, Roambi, SAP (BI4 suite), SAS, SpagoBI, Tableau, TIBCO Spotfire.

     Action Phase – Data Acquisition providers plus the ERP, CRM and BPM actors: Adobe, Eloqua, EMC², IBM, iGrafx, Microsoft, OpenText, Oracle, Pega, Progress Software, SAP, Salesforce, Software AG, Teradata (Aprimo), Tibco.

     Data Governance area – Master Data Management (MDM), metadata and data quality tools: Adaptive, HP, IBM, Informatica, Kalido, Microsoft, Oracle, Orchestra Networks, SAP, SAS, Talend, Tibco.

     Note that the Complex Event Processing (CEP) tools are part of the Acquisition (streaming data acquisition), Marshalling (e.g., in-memory storage as data is used or compared immediately) and Analytics (e.g., monitoring functions to detect abnormal activity) streams.

     Note that the BI tools are part of the Analytics (computing Key Performance Indicators) and Action (e.g., creating alerts in a push mode, by mail for instance) streams.
  3. Citrusleaf = Aerospike.
     Couchbase – roots are in NorthScale – Membase … CouchDB; two focus audiences – Enterprise & funnel.
  4. Analytics Infrastructure = MPP – distributed, open-source, Apache-licensed distribution of Apache Hadoop … open-source, Massively Parallel Processing (MPP) query engine.
     Infrastructure as a Service = Cloud IaaS.
     Operational Infrastructure = structure of data – e.g., JSON; ad-hoc queries; unstructured data; behavioral; redundancy.
     Not listed – Hardware / Storage – NetApp, EMC, HP.
  5. Per Forbes (per Wikibon), Big Data is an $18 billion industry heading to $50 billion in five years. The companies in the inner circle (e.g., MapR, Cloudera, Splunk, Couchbase, etc.) are pure plays within Big Data. One theory is that these inner-circle players will probably get gobbled up by the big boys on the outside, who are just starting to play in the Big Data space (like SAP, Microsoft, Oracle, IBM…). In the meantime, the relative sizes of the circles reflect the relative sizes of the companies in terms of revenue. The percentages reflect the % of their current business that is ‘big data’.
  6. 5/18/13 w/ Paul Hofmann:
     • Palantir – just text; just Homeland Security
     • Oracle Endeca – added
     • HP Autonomy – added
     • Attivio (partner with TIBCO) – added
     • Saffron – Semantec and .. (Risk predictive) – added
     • 0xData – changed logo
     • MuSigma – consultant only
     • Recorded Future – timeline; Opera – text-only?; no predictive analytics?
     • Kxen – nice company
     • SAS – dead? Not scalable
     • Skytree – a platform / toolbox; you need to have your own data quant to create your own analytics
     • Sociocast – Saffron partner
     • Digital Reasoning – strong with Dept of Defense too
  7. NoSQL databases currently available include:
     • HBase (Apache)
     • Cassandra (DataStax)
     • MarkLogic (MarkLogic)
     • Aerospike (Citrusleaf)
     • MongoDB (10gen)
     • Accumulo (Apache)
     • Riak (Basho)
     • CouchDB (CouchBase)
     • DynamoDB (Amazon)
     • Sqrrl (?)
     • VoltDB (?)

     http://thinkbiganalytics.com/leading_big_data_technologies/nosql/

     NoSQL
     NoSQL is an umbrella term for a broad class of database management systems that relax some of the traditional design constraints of relational database management systems (RDBMS) in order to meet goals of more cost-effective scalability, flexible tradeoffs of availability vs. consistency (as described by the CAP theorem), and flexibility for data structures that don’t fit well into the relational model, such as key-value data and large graphs. NoSQL databases typically offer neither ACID transactions nor full SQL dialects.

     The NoSQL ecosystem is very large. Among the better-known databases are HBase, Cassandra, Aerospike, DynamoDB, MongoDB, Riak, Redis, Accumulo, Datomic, and Couchbase. Of these, HBase and Accumulo are more closely tied to Hadoop than the others, as both use HDFS, by default, for persistent storage and ZooKeeper for service federation.

     NoSQL databases expose different information models, including key-value records, JSON or XML documents as records, or graph-oriented data. They expose corresponding programmer APIs and sometimes custom query languages that may or may not be SQL-based. However, a recent trend in this industry is the re-introduction of restricted SQL dialects to support the large user community accustomed to SQL, along with improved support for transactions.

     As an example of a scenario where a NoSQL database is a good fit, an event log for a web site might be captured in a key-value store, where fast appends and key-based retrievals are required, but neither updates nor joins.

     HBase
     HBase is a distributed, column-oriented database, where each cell is versioned (a configurable number of previous values is retained). HBase provides Bigtable-like capabilities on top of Hadoop. SQL queries (but not updates) are supported using Hive, but with high latency. Eventually, Impala will also support Hive queries with lower latency. Like many NoSQL databases, HBase does not support complex transactions, SQL, or ACID transactions. However, HBase offers high read and write performance and is used in several large applications, such as Facebook’s Messaging Platform. By default, HBase uses HDFS for durable storage, but it layers on top of this storage fast record-level queries and updates, which “raw” HDFS doesn’t support. Hence, HBase is useful when fast, record-level queries and updates are required, but storage in HDFS is desired for use with Pig, Hive, or other MapReduce-based tools.

     Cassandra
     Cassandra is the most popular NoSQL database for very large data sets. It is a key-value, clustered database that uses column-oriented storage, sharding by key ranges, and redundant storage for scalability in both data sizes and read/write performance, as well as resiliency against “hot” nodes and node failures. Cassandra has configurable consistency vs. availability (CAP theorem) tradeoffs, such as a tunable quorum model for writes.

     MongoDB
     MongoDB is a document-oriented NoSQL database where each record is a JSON document. It has a rich, JavaScript-based query language that exploits the implicit structure of JSON. MongoDB supports sharding for improved scalability and resilience. It is most popular for small to large data sets and less commonly used for very large data sets.

     DynamoDB
     DynamoDB is Amazon’s highly scalable and available, key-value, NoSQL database. DynamoDB was one of the earliest NoSQL databases and papers written about it influenced the design of many other NoSQL databases, such as Cassandra.

     Couchbase
     Couchbase is a key-value NoSQL database that is well-suited for mobile applications where a copy of a data set is resident on many devices, where changes can be performed on any copy, and copies are synchronized when connectivity is available. Think of how an email client works with local copies of your email history and corresponding email servers.

     Redis
     Redis is a key-value store with specific support for fundamental data structures as values, including strings, hash maps, lists, sets, and sorted sets, whereas most key-value stores have limited understanding of a value’s meaning, except to represent the value as column cells, in many cases. For this reason, Redis is sometimes called a data structure server. Redis keeps all data in memory, which improves performance, but limits the data set sizes it can manage. Durability is optional, by periodic flushing to disk or writing updates to an append log. Master-slave replication is also supported.

     Datomic
     Datomic is a newer entrant in the NoSQL landscape with a unique data model that remembers the state of the database at all points in the past, making historical reconstruction of events and state trivial. Many standard database operations are supported, including joins and ACID transactions. Deployments are distributed, elastic, and highly available.

     Riak
     Riak is a fault-tolerant, distributed, key-value NoSQL database designed for large-scale deployments in cloud or hosted environments. A Riak database is masterless, with no single points of failure. It is resilient against the failure of multiple nodes, and nodes can be added or removed easily. Riak is also optimized for read- and write-intensive applications.
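The "tunable quorum model" that note 7 mentions for Cassandra can be made concrete with the usual replica-overlap rule for Dynamo-style stores: if a write is acknowledged by W of N replicas and a read consults R of them, the read is guaranteed to touch at least one replica holding the latest acknowledged write whenever R + W > N. The sketch below is a minimal illustration of that arithmetic under simplifying assumptions (no failed or partially applied writes); the function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def read_overlaps_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must intersect every write set (R + W > N).

    Hypothetical helper for illustration; it models only the counting
    argument, not failure handling or conflict resolution.
    """
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("ack counts must be between 1 and the replica count")
    return read_acks + write_acks > replicas
```

With three replicas, quorum reads and quorum writes (2 + 2 > 3) overlap, while single-replica reads and writes (1 + 1 ≤ 3) trade that guarantee for lower latency and higher availability, which is the consistency vs. availability dial the note describes.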