26/04/2012




                         Investigative Analytics
                         - What's In The Data Scientists’ Toolkit



                    Mike Ferguson
                    Managing Director
                    Intelligent Business Strategies
                    Data Science London
                    April 2012




                    About Mike Ferguson

                    Mike Ferguson is Managing Director of Intelligent
                    Business Strategies Limited. As an analyst and
                    consultant he specializes in business
                    intelligence, data management and enterprise
                    business integration. With over 30 years of IT
                    experience, Mike has consulted for dozens of
                    companies, spoken at events all over the world
                    and written numerous articles. He is an expert on
                    the B-EYE-Network. Formerly he was a principal
                    and co-founder of Codd and Date Europe
                    Limited (the inventors of the Relational Model), a
                    Chief Architect at Teradata on the Teradata
                    DBMS, and European Managing Director of
                    DataBase Associates.

                    www.intelligentbusiness.biz
                    mferguson@intelligentbusiness.biz
                    Twitter: @mikeferguson1
                    Tel/Fax (+44)1625 520700




Copyright © Intelligent Business Strategies 2012 - All Rights Reserved




                    Topics

                        Big Data Workloads
                        Data science tools for near real-time analytics
                        Data science tools for investigative analytics of multi-structured
                        data
                        Data science tools for investigative analytics of structured data
                        Trends in a fast moving Big Data marketplace
                        Governance of data science projects








                    The Application Processing Spectrum - Big Data Is Pushing
                    Storage Options Towards Optimized Systems




                    Source: BI-Research
                                           Copyright © BI-Research, 2011








                      Big Data Processing – The Number of Data Stores Optimized
                      for Operational or Analytical Workloads Is Growing
                  • ACID support is missing in some NoSQL DBMSs
                  • Can you live with losing a transaction?
                       • OK for sensor data, for example

                  [Figure: OLTP RDBMS, NoSQL DBMS and analytical RDBMS data stores]




                      Data Science Tools
                      – Different Analytical Workloads Need Different Tools
                          Some tools work across multiple platforms


                  [Figure: analytical tools over data management tools for each type of
                  workload, with data sources ranging from streaming data (machine generated,
                  markets data, sensors) through CRM, ERP and SCM systems to RDF/OWL]








                      Data Science Tools
                      – Near Real-Time Analytics On Data In Motion


                     [Figure: stream analytics / CEP applied to streaming data from
                     machine-generated sources, markets data and sensors]

                     Workload analytical characteristics: Near real-time automated analytics
                       on text or semi-structured data
                     Data characteristics: Highly volatile data-in-motion, large volumes
                     Product examples: IBM InfoSphere Streams, Informatica RulePoint
                     Trends: CEP vendors are moving to analyse text as well as structured data;
                       some CEP vendors may get acquired
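Stream analytics / CEP engines evaluate rules continuously over a moving window of events rather than over stored data. As a minimal sketch, assuming a hypothetical rule (alert when the mean sensor reading over the last 10 seconds exceeds a threshold) and illustrative class names that are not part of IBM InfoSphere Streams or Informatica RulePoint, the core sliding-window logic might look like this:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    timestamp: float  # seconds since some epoch
    value: float


class SlidingWindowDetector:
    """Hypothetical CEP-style rule: fire when the mean value of the events
    seen in the last `window_seconds` exceeds `threshold`."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.window = deque()  # events currently inside the time window

    def on_event(self, reading):
        # Data in motion: each event is evaluated as it arrives, not stored first
        self.window.append(reading)
        # Evict events that have fallen out of the time window
        while reading.timestamp - self.window[0].timestamp > self.window_seconds:
            self.window.popleft()
        mean = sum(r.value for r in self.window) / len(self.window)
        return mean > self.threshold


detector = SlidingWindowDetector(window_seconds=10, threshold=50)
events = [Reading("s1", 0, 40), Reading("s1", 5, 70), Reading("s1", 20, 30)]
alerts = [detector.on_event(e) for e in events]  # one True/False per event
```

The sketch assumes events arrive in timestamp order; production CEP engines also have to deal with late and out-of-order events.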




                      Trends – Streaming Event Data Can Also Be Stored
                      In Hadoop or DW Appliance



                  [Figure: streaming data from machine-generated sources, markets data and
                  sensors handled by data management tools and analysed with analytical tools]








                    Data Science Tools - Investigative Analytics on Multi-
                    Structured Data In Hadoop (various distributions)

                     Workload analytical characteristics: Investigative analysis
                     Data characteristics: Up to very large volumes of multi-structured data (Variety)
                     Data management tools: e.g. Informatica HParser, Pentaho ETL, Pervasive,
                       Talend ETL Studio
                     Analysis:
                       • Batch analytics: custom MapReduce apps with Mahout and R
                       • BI tools (MapReduce): Karmasphere, Datameer, IBM Cognos Content Analytics
                       • BI tools (search based): Connexica, Quid
                       • BI tools (Hive interface): JasperSoft, MicroStrategy, Tableau, …




                    Data Deluge - Data Is Arriving Faster Than We Can
                    Consume It – How Good Is Your Filter?

                     [Figure: a data filter between the incoming data deluge and enterprise systems]








                    Data Management Tools Are Being Extended To Embrace
                    And Exploit Massively Parallel Hadoop Clusters
                     Approaches:
                     • Custom code
                     • Data management tool suites, e.g. IBM InfoSphere DataStage and Smart
                       Consolidation (uses InfoSphere Blueprint Director), Informatica,
                       Pervasive, Pentaho, Talend

                     Tasks data management tools perform on Hadoop:
                     • Extract data from Hadoop
                     • Invoke custom analytics on Hadoop
                     • Transform & cleanse data in Hadoop (MapReduce)
                     • Parse & prepare data in Hadoop (MapReduce)
                     • Discover data in Hadoop
                     • Load data into Hadoop

                     Trends: Expect MUCH more from data management tool vendors, including
                     generation of MapReduce code to clean and transform data
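The transform-and-cleanse step that these tools generate MapReduce code for follows the map, shuffle, reduce pattern. The sketch below simulates that pattern in a single Python process (it does not use the Hadoop API); the record fields, the rule that drops records without an email, and the merge rule are hypothetical examples of cleansing logic.

```python
from collections import defaultdict
from itertools import chain

MISSING = ("", "UNKNOWN")  # values treated as "no data" when merging


def map_clean(record):
    """Map phase: standardise one raw record and key it by normalised email.
    Hypothetical rule: records without an email are dropped."""
    email = record.get("email", "").strip().lower()
    if not email:
        return []
    cleaned = {
        "email": email,
        "name": " ".join(record.get("name", "").split()).title(),
        "country": record.get("country", "").strip().upper() or "UNKNOWN",
    }
    return [(email, cleaned)]


def reduce_dedupe(key, values):
    """Reduce phase: merge all records sharing a key into one, filling
    missing fields from later duplicates."""
    merged = dict(values[0])
    for value in values[1:]:
        for field, field_value in value.items():
            if merged.get(field) in MISSING and field_value not in MISSING:
                merged[field] = field_value
    return merged


def run_job(records):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_clean(r) for r in records):
        groups[key].append(value)
    return [reduce_dedupe(k, groups[k]) for k in sorted(groups)]


raw = [
    {"name": "  ada   lovelace ", "email": "ADA@example.com ", "country": ""},
    {"name": "Ada Lovelace", "email": "ada@example.com", "country": "uk"},
    {"name": "no email", "email": ""},
]
clean = run_job(raw)
```

Keying the map output on the normalised email means all duplicates of one customer meet in the same reducer, which is what allows deduplication to run in parallel across a cluster.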




                    Processing Text Is A Key Part of Hadoop Based
                    Analysis
                    What Is Text Analytics? Deriving data from unstructured content




                        Popular data sources include
                         • Social media, email, news articles, on-line forums
                        Requires pre-processing prior to analysis
                         • Parsing, correction, phrase extraction, semantic grouping
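The four pre-processing steps listed above can be chained as a small pipeline. This is a deliberately simplified sketch: the correction dictionary, stopword list and concept map are hypothetical stand-ins for the much richer dictionaries, taxonomies and models that text analytics tools use.

```python
import re
from collections import Counter

# Hypothetical lookup tables (illustrative only)
CORRECTIONS = {"gr8": "great", "custmer": "customer", "servce": "service"}
CONCEPTS = {"customer service": "SUPPORT", "delivery": "LOGISTICS", "refund": "BILLING"}
STOPWORDS = {"the", "a", "was", "is", "and", "my", "but"}


def parse(text):
    """Parsing: lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def correct(tokens):
    """Correction: replace known misspellings and abbreviations."""
    return [CORRECTIONS.get(t, t) for t in tokens]


def extract_phrases(tokens):
    """Phrase extraction: single words plus two-word phrases,
    formed after stopwords are removed."""
    words = [t for t in tokens if t not in STOPWORDS]
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def group(phrases):
    """Semantic grouping: count phrases that map to a known concept."""
    return Counter(CONCEPTS[p] for p in phrases if p in CONCEPTS)


post = "The custmer servce was gr8 but delivery was late"
result = group(extract_phrases(correct(parse(post))))
```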








                    Tools Are Appearing That Make It Easier To Parse Data
                    In Hadoop For Analysis
                        Product Example: Informatica HParser




                        Source: Informatica
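Parsing here means turning semi-structured input into structured, typed records that downstream MapReduce jobs or BI tools can analyse. The sketch below is not the HParser API; it is a plain regular-expression parser, assuming web server logs in the NCSA Common Log Format, that shows the kind of transformation such tools automate.

```python
import re

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Convert one log line into a dict with typed fields, or None if the
    line does not match (a real job might route it to a reject file)."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_line(line)
```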




                    Big Data Integration
                    - Talend Open Studio for Big Data




                                  Enhancing a big data job with Data Quality

                                  Several data quality components are included in
                                  the open source version

                 Source: Talend








                     Accelerating Custom Data Integration and Preparation

                        Pervasive DataRush for Hadoop:
                         • Call DataRush from MapReduce
                         • MapReduce runs faster
                         • Less code to write
                        Of interest to MapReduce developers

                        Syncsort DMExpress Hadoop Edition:
                         • Move data in and out of HDFS
                         • Create jobs using the DMExpress GUI and run them within Hadoop
                         • Shift transformations to the DMExpress engine
                         • Invokes high-performance compression

                        [Figure: Hadoop acceleration. Mappers and reducers calling DataRush, and
                        DMX engines, working on data in the Hadoop Distributed File System]




                     Leveraging Hadoop For Data Integration On Massive Volumes
                     Of Data To Bring Additional Insights Into A DW
                  e.g. Deriving insight from huge volumes (hundreds of terabytes up to
                  petabytes) of social web content on sites like Twitter, Facebook, Digg,
                  MySpace, TripAdvisor, LinkedIn… for sentiment analytics

                  [Figure: cloud data is extracted into HDFS; MapReduce apps (e.g. Pig,
                  IBM Jaql) transform it; data integration (DI) loads the relevant insight
                  into the DW alongside data from operational systems]
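The flow above (extract raw social content, reduce it in Hadoop, load only the relevant insight into the DW) can be illustrated with a lexicon-based sentiment job. This is a single-process Python sketch of the map and reduce functions, not Pig or Jaql code; the word lists, the `brand` and `text` fields and the scoring rule are hypothetical.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon; real systems use far larger lexicons or trained models
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "awful", "broken", "slow"}


def mapper(post):
    """Emit (brand, score): +1 per positive word, -1 per negative word."""
    words = post["text"].lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    yield post["brand"], score


def reducer(brand, scores):
    """Aggregate per brand: number of posts and average sentiment."""
    scores = list(scores)
    return brand, {"posts": len(scores), "avg_sentiment": sum(scores) / len(scores)}


def run(posts):
    """Simulate map -> shuffle (group by brand) -> reduce."""
    shuffled = defaultdict(list)
    for post in posts:
        for key, value in mapper(post):
            shuffled[key].append(value)
    return dict(reducer(k, v) for k, v in shuffled.items())


posts = [
    {"brand": "acme", "text": "I love the new phone, great battery"},
    {"brand": "acme", "text": "Screen arrived broken and support is slow"},
    {"brand": "globex", "text": "Excellent service"},
]
summary = run(posts)
```

Only the small per-brand summary, not the raw posts, would then be loaded into the DW.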








                    Product Example – Pentaho Enterprise Data Services
                    Suite Support For Hadoop




                 Source: Pentaho




                    In-Hadoop Analytics – Example Technologies



                      • Hadoop MapReduce programs with custom analytics
                      • Hadoop MapReduce programs with Apache Mahout
                         • Several analytical algorithms for use in batch analysis
                      • Pervasive DataRush for Hadoop Analytics Engine
                      • Radoop (UI on RapidMiner)
                      • Revolution Analytics RevoScaleR
                      • …








                     New Big Data Analytics Technologies Are Emerging
                     On Hadoop – E.g. Radoop

                                                      Radoop interfaces RapidMiner (open source) with
                                                      Hadoop and integrates with Hive and Mahout
                                                      providing a UI for Hadoop based analytics




                                                      Source: “Radoop – It’s Like Yahoo Pipes for Hadoop”
                                                      http://siliconangle.com/blog/2011/08/11/radoop-its-like-yahoo-pipes-for-hadoop/








                     Revolution Analytics RevoScaleR for Distributed
                     Computing Clusters
                        Scaling R for Big Data Analytics
                        • Portions of the data source are made available to each compute node
                        • RevoScaleR on the master node assigns a task to each compute node
                        • Each compute node independently processes its data and returns its
                          intermediate results to the master node
                        • The master node aggregates all of the intermediate results from each
                          compute node and produces the final result

                        [Figure: four data partitions, each processed by a compute node running
                        RevoScaleR, reporting to a master node running RevoScaleR]

                 Source: Revolution Analytics
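The four steps above are a partition-and-aggregate pattern. As a sketch in Python (not RevoScaleR's R API), the example below computes a mean and variance: each hypothetical compute node returns only small intermediate results (count, sum, sum of squares) for its partition, and the master node combines them.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Intermediate result returned by one compute node."""
    n: int
    total: float
    total_sq: float


def compute_node(partition):
    """Runs on each compute node: summarise only its local data partition."""
    return Partial(len(partition), sum(partition), sum(x * x for x in partition))


def master(partials):
    """Runs on the master node: aggregate intermediate results into the final result."""
    n = sum(p.n for p in partials)
    total = sum(p.total for p in partials)
    total_sq = sum(p.total_sq for p in partials)
    mean = total / n
    variance = total_sq / n - mean ** 2  # population variance
    return mean, variance


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
partitions = [data[0:3], data[3:6], data[6:]]  # portions made available to each node
partials = [compute_node(p) for p in partitions]
mean, variance = master(partials)
```

The sum-of-squares formula keeps the example short but can lose precision on large values; a production implementation would combine partial results with a numerically stabler update.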







                    Analysing Hadoop Data – Multiple Options
                                        Batch analytics:
                                        • Custom MapReduce applications
                                        • Analytical tools generating MapReduce
                                            • Karmasphere, Datameer
                                            • IBM Cognos Content Analytics
                                        Search based tools (built on Lucene)
                                            • Connexica, Quid
                                        BI tools using HiveQL
                                            • JasperSoft, MicroStrategy, Tableau

                                        Example data: log files, social networks, clickstream

                                        Source: Datameer




                    Big Data Analysis - Exploratory Analysis of Multi-Structured
                    Data In Hadoop via Search e.g. IBM BigIndex (part of IBM BigInsights)
                   [Figure: content from file servers, web sites, email, CMS, image servers,
                   collaboration tools and web feeds is loaded into Hadoop, where massively
                   parallel MapReduce builds a partitioned search index; BI tools,
                   applications and mashups query the index partitions]

                   Useful for analysing un-modelled semi-structured content that is not
                   well understood
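Building a partitioned index with MapReduce can be sketched as: map each document to the partition that owns it and emit its terms, then reduce each partition into its own inverted index; a query is sent to every partition and the hits are merged. The example assumes document-based partitioning (one common approach for Lucene-style indexes) and uses hypothetical names; it is not the BigIndex API.

```python
import re
import zlib
from collections import defaultdict

NUM_PARTITIONS = 2  # hypothetical; a real cluster would use many more


def partition_for(doc_id):
    """Stable routing of a document to an index partition."""
    return zlib.crc32(doc_id.encode()) % NUM_PARTITIONS


def map_doc(doc_id, text):
    """Map: tokenise one document and emit (partition, (term, doc_id))."""
    p = partition_for(doc_id)
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield p, (term, doc_id)


def reduce_partition(pairs):
    """Reduce: build one partition's inverted index (term -> sorted doc ids)."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {t: sorted(ids) for t, ids in postings.items()}


def build_index(docs):
    shuffled = defaultdict(list)
    for doc_id, text in docs.items():
        for p, pair in map_doc(doc_id, text):
            shuffled[p].append(pair)
    return [reduce_partition(shuffled[p]) for p in range(NUM_PARTITIONS)]


def search(index, term):
    """Scatter the query to every partition and gather the merged hits."""
    term = term.lower()
    return sorted(d for part in index for d in part.get(term, []))


docs = {
    "email-1": "Customer complaint about late delivery",
    "web-7": "Delivery times improved this quarter",
    "cms-3": "Quarterly results summary",
}
index = build_index(docs)
hits = search(index, "delivery")
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomised per process, and every mapper must route a given document to the same partition.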








                    Search Based Analytical Tools For Big Data
                    - E.g. Connexica (runs on top of Lucene indexes)



                    [Figure: Connexica Venn diagrams and Connexica dashboard]




                    Data Warehouse Appliances – Analytical Workloads
                    on Structured Data using ADBMSs and BI Tools


                    Analytical tools: IBM Cognos, IBI WebFOCUS, MicroStrategy, Oracle BIEE,
                      SAP BusinessObjects, SAS, Pentaho, Jaspersoft, QlikView, Tableau
                    Data warehouse appliance: MPP analytical DBMS, in-database analytics,
                      columnar and row storage
                    Data management tools: IBM InfoSphere DataStage, Informatica PowerCenter,
                      Microsoft SSIS, Oracle Data Integrator, Pervasive, Pentaho ETL, Talend ETL
                    Data sources: CRM, ERP, SCM

                    Workload analysis characteristics: Historical reporting and analysis,
                      investigative analytics
                    Data characteristics: Medium and large volumes, structured data








                     In-Database Analytics – E.g. SAS Has Completely Re-Written
                     Analytics to Exploit Parallelism
                          E.g. SAS High Performance Analytics and Teradata
                          Runs ‘alongside’ the ADBMS as peers in the same MPP nodes
                           • In-memory passing of data between DBMS and analytic models
                             within every node without data movement
                           • Highly parallel, in-memory execution of analytics delivered across a
                             distributed computing environment
                              – Linear regression and variable selection with classical and
                                modern methods
                              – Nonlinear regression and maximum likelihood
                              – Correlation analysis
                              – Logistic regression
                              – Neural nets
                              – Linear mixed models
                              – Optimization
                          GA Q4 2011

                          [Figure: In-Database vs. Alongside-DBMS]
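Running analytics alongside the DBMS on every node works when each node can compute a small partial result over its own rows without moving them. As a sketch of that idea for the simplest case, ordinary least squares for y = a + b·x, each hypothetical node returns five running sums, and the combined sums give b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = (Σy − bΣx) / n. This is illustrative Python, not the SAS High Performance Analytics interface.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for simple linear regression."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other):
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def node_stats(rows):
    """Runs next to the data on one MPP node: a single pass, no rows leave the node."""
    s = Stats()
    for x, y in rows:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.sxy += x * y
    return s


def fit(total):
    """Ordinary least squares for y = a + b*x from the combined statistics."""
    denom = total.n * total.sxx - total.sx ** 2
    b = (total.n * total.sxy - total.sx * total.sy) / denom
    a = (total.sy - b * total.sx) / total.n
    return a, b


nodes = [[(0, 1), (1, 3)], [(2, 5), (3, 7)], [(4, 9)]]  # rows already spread across 3 nodes
combined = sum((node_stats(rows) for rows in nodes), Stats())
a, b = fit(combined)
```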




                      Trends in Data Science Tooling
                      – Tools Are Broadening Their Reach



                  [Figure: analytical tools and data management tools now spanning multiple
                  platforms, from streaming data (machine generated, markets data, sensors)
                  through CRM, ERP and SCM systems to RDF/OWL]








                    Microsoft Big Data Solution – SQL Server 2012 Hive
                    ODBC Driver & Hive Add-in For Excel and PowerPivot




                   Source: Microsoft




                    Front End Tools Interfacing With Hadoop And
                    Analytical RDBMS

                    [Figure: BI tools platform & data visualisation tools (SAP BO, IBM Cognos,
                    Oracle BIEE, MicroStrategy, JasperSoft, Pentaho, MS Excel) use SQL against
                    an MPP RDBMS with a polymorphic table function; MapReduce BI tools
                    (e.g. Karmasphere, Datameer, IBM Cognos Content Analytics), search based
                    BI tools using indexes (e.g. Connexica, Quid) and custom MapReduce
                    applications work with Hadoop]








                    Tools To Govern Data Science Projects
                    – Data Sources, Sandboxes, People, Results
                    [Figure: governance applied to each data source (files, RDBMS, web logs,
                    clickstream, social graph data, unstructured / semi-structured content),
                    to the graph DBMS, MPP analytical RDBMS and DW, and to the sandbox]




                    Governance: Big Data Projects Need To Be Managed
                    – E.g. EMC Greenplum Chorus

                    Workspaces, sandboxes, people and data sources can all be governed

                    Source: EMC Greenplum








                    Architectures
                    – Integrating Big Data Analytics Into The Enterprise
                    [Diagram: users (business analysts, developers) working through a BI tools platform & data visualisation tools (SQL), MapReduce BI tools, search-based BI tools (indexes) and custom MR apps; stream processing of event streams (real-time, actions); an MPP RDBMS with polymorphic table function(s) and a graph DBMS; Information Management and Services over event streams, social graph data, OLTP data, cloud data, files, XML/JSON web services, RDBMS, cubes and unstructured / semi-structured content (clickstream, web logs, web content, office docs).]
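The architecture slide places polymorphic table function(s) inside the MPP RDBMS, next to clickstream and web-log sources. A polymorphic table function is a table function invoked in a query's FROM clause whose output columns are not fixed in advance but derived from its input table and arguments. The Python sketch below imitates that contract under stated assumptions: a describe step (`sessionize_schema`) computes the output schema, and a run step (`sessionize`) processes one partition at a time. The function names, the sessionization logic, the use of seconds for `ts` and the 60-second timeout are hypothetical illustrations chosen because clickstream appears on the slide; they are not any product's API.

```python
# Hypothetical sketch of a polymorphic table function (PTF); not any vendor's API.
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]  # one row: column name -> value


def sessionize_schema(input_columns: List[str]) -> List[str]:
    """Describe phase: the output schema is derived from whatever input columns arrive."""
    return input_columns + ["session_id"]


def sessionize(rows: Iterable[Row], partition_by: str, order_by: str,
               timeout: float) -> Iterator[Row]:
    """Run phase: within each partition, start a new session whenever the gap
    between consecutive events exceeds `timeout`."""
    # Sorting on (partition key, order key) stands in for PARTITION BY ... ORDER BY ...;
    # an MPP engine could hand each partition to a different node.
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    for _, partition in groupby(ordered, key=itemgetter(partition_by)):
        session_id = 0
        previous = None
        for row in partition:
            current = row[order_by]
            if previous is not None and current - previous > timeout:
                session_id += 1
            previous = current
            # Pass every input column through and append the new one.
            yield {**row, "session_id": session_id}


clicks = [
    {"user_id": "u1", "ts": 0, "url": "/home"},
    {"user_id": "u1", "ts": 30, "url": "/search"},
    {"user_id": "u1", "ts": 500, "url": "/home"},
    {"user_id": "u2", "ts": 10, "url": "/cart"},
]
print(sessionize_schema(["user_id", "ts", "url"]))  # ['user_id', 'ts', 'url', 'session_id']
for out in sessionize(clicks, partition_by="user_id", order_by="ts", timeout=60):
    print(out)
```

With these sample clicks, user `u1` gets session ids 0, 0, 1 (the 470-second gap exceeds the timeout) and `u2` gets 0. Because each partition is processed independently, a function of this shape can run in parallel across MPP nodes, which is why the slide places it inside the database rather than in a separate application tier.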




                                               Thank You!
                                                                www.intelligentbusiness.biz
                                                             mferguson@intelligentbusiness.biz

                                                                   Twitter: @mikeferguson1

                                                                   Tel/Fax (+44)1625 520700




 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Último (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Investigative Analytics Tools For Multi-Structured Big Data

  • 1. Investigative Analytics - What's In The Data Scientists’ Toolkit
    Mike Ferguson, Managing Director, Intelligent Business Strategies
    Data Science London, April 2012

    About Mike Ferguson
    Mike Ferguson is Managing Director of Intelligent Business Strategies Limited. As an analyst and consultant he specializes in business intelligence, data management and enterprise business integration. With over 30 years of IT experience, Mike has consulted for dozens of companies, spoken at events all over the world and written numerous articles. He is an expert on the B-EYE-Network. Formerly he was a principal and co-founder of Codd and Date Europe Limited (the inventors of the Relational Model), a Chief Architect at Teradata on the Teradata DBMS and European Managing Director of DataBase Associates.
    www.intelligentbusiness.biz
    mferguson@intelligentbusiness.biz
    Twitter: @mikeferguson1
    Tel/Fax (+44)1625 520700
  • 2. Topics
      • Big Data workloads
      • Data science tools for near real-time analytics
      • Data science tools for investigative analytics of multi-structured data
      • Data science tools for investigative analytics of structured data
      • Trends in a fast-moving Big Data marketplace
      • Governance of data science projects

    The Application Processing Spectrum - Big Data Is Pushing Storage Options Towards Optimized Systems
    [Diagram. Source: BI-Research. Copyright © BI-Research, 2011]
  • 3. Big Data Processing – The Number of Data Stores Optimized for Operational or Analytical Workloads Is Growing
    [Diagram: OLTP RDBMS, NoSQL DBMS and analytical RDBMS data stores]
      • ACID support is missing in some NoSQL DBMSs
      • Can you live with losing a transaction?
      • That is acceptable for sensor data, for example

    Data Science Tools – Different Analytical Workloads Need Different Tools
    Some tools work across multiple platforms.
    [Diagram: analytical tools and data management tools over each platform, fed by streaming data; CRM, ERP and SCM systems; machine-generated data, market data and sensors; and RDF/OWL data]
  • 4. Data Science Tools – Near Real-Time Analytics On Data In Motion
    Stream analytics / CEP
      • Workload characteristics: near real-time automated analytics on text or semi-structured data
      • Data characteristics: highly volatile data in motion, streaming data, large volumes
      • Product examples: IBM InfoSphere Streams, Informatica RulePoint
      • Trends: CEP vendors are moving to analyse text as well as structured data; some CEP vendors may get acquired
    [Diagram: stream analytics / CEP over machine-generated data, market data and sensors]

    Trends – Streaming Event Data Can Also Be Stored In Hadoop or DW Appliance
    [Diagram: analytical tools and data management tools over streaming data from machine-generated sources, market data and sensors]
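To make "near real-time automated analytics on data in motion" concrete, here is a minimal sketch of a CEP-style sliding-window rule in plain Python. It assumes events arrive in timestamp order and uses a hypothetical `Reading` record, threshold and window size; it is not the rule language of IBM InfoSphere Streams or Informatica RulePoint, which have their own APIs.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    # Hypothetical sensor event used only for this illustration
    sensor_id: str
    timestamp: float  # seconds since the stream started
    value: float


def detect_spikes(events: Iterable[Reading], window_s: float = 10.0,
                  threshold: float = 100.0, min_count: int = 3) -> Iterator[tuple[str, float]]:
    """Yield (sensor_id, timestamp) when a sensor has reported at least
    min_count readings above threshold within the last window_s seconds."""
    windows: dict[str, deque] = {}
    for ev in events:
        if ev.value <= threshold:
            continue  # only readings above the threshold count towards the pattern
        window = windows.setdefault(ev.sensor_id, deque())
        window.append(ev.timestamp)
        # Evict readings that have fallen out of the time window
        while ev.timestamp - window[0] > window_s:
            window.popleft()
        if len(window) >= min_count:
            yield ev.sensor_id, ev.timestamp
            window.clear()  # reset so one burst raises one alert


stream = [Reading("s1", 0, 120), Reading("s1", 2, 90), Reading("s1", 4, 130),
          Reading("s1", 7, 150), Reading("s2", 8, 200)]
alerts = list(detect_spikes(stream))
```

Here `alerts` is `[("s1", 7)]`: sensor s1 produced three high readings within ten seconds. Keeping only a small per-sensor window in memory, rather than storing and re-querying the stream, is what lets a rule like this react as events arrive; the same events could still be landed in Hadoop or a DW appliance for later investigation.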
  • 5. Data Science Tools - Investigative Analytics on Multi-Structured Data In Hadoop (various distributions)
      • Workload characteristics: investigative analysis
      • Data characteristics: up to very large volumes of multi-structured data (Variety)
      • Data management tools: e.g. Informatica HParser, Pentaho ETL, Pervasive, Talend ETL Studio
      • Analysis:
          Batch analytics: custom MapReduce apps with Mahout and R
          BI tools (MapReduce): Karmasphere, Datameer, IBM Cognos Content Analytics
          BI tools (search based): Connexica, Quid
          BI tools (Hive interface): JasperSoft, MicroStrategy, Tableau…

    Data Deluge - Data Is Arriving Faster Than We Can Consume It – How Good Is Your Filter?
    [Diagram: incoming data passing through a filter before reaching enterprise systems]
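The custom MapReduce apps mentioned above all follow the same map, shuffle and reduce pattern. The sketch below simulates that pattern in a single Python process using word counting, the conventional introductory example; the function names are illustrative, and this is not the Hadoop Java API or Hadoop Streaming.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable


def map_phase(line: str) -> list[tuple[str, int]]:
    # Mapper: emit (word, 1) for every token in one input record
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every value emitted for the same key
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: aggregate all values for one key
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data big insight", "Data science"])
# counts == {"big": 2, "data": 2, "insight": 1, "science": 1}
```

On a cluster, the mappers and reducers run on many nodes and the framework performs the shuffle across the network; the logic of each phase stays this simple.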
  • 6. Data Management Tools Are Being Extended To Embrace And Exploit Massively Parallel Hadoop Clusters
    Approaches:
      • Custom code
      • Data management tool suites: e.g. IBM InfoSphere DataStage and Smart Consolidation (uses InfoSphere Blueprint Director), Informatica, Pervasive, Pentaho, Talend
    [Diagram: data management tools used to
      • Load data into Hadoop
      • Discover data in Hadoop
      • Parse and prepare data in Hadoop (MapReduce)
      • Transform and cleanse data in Hadoop (MapReduce)
      • Invoke custom analytics on Hadoop
      • Extract data from Hadoop]
    Trends: expect MUCH more from data management tool vendors, including generation of MapReduce code to clean and transform data.

    Processing Text Is A Key Part of Hadoop-Based Analysis
    What is text analytics? Deriving data from unstructured content.
      • Popular data sources include social media, email, news articles and online forums
      • Requires pre-processing prior to analysis: parsing, correction, phrase extraction, semantic grouping
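The text pre-processing steps listed above (parsing, correction, phrase extraction, semantic grouping) can be sketched in a few lines of Python. The correction and synonym tables are hypothetical stand-ins for curated dictionaries, and counting recurring word pairs is just one simple way to find candidate phrases; commercial text analytics tools use far richer linguistics.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical lookup tables; a real pipeline would use curated dictionaries
CORRECTIONS = {"gr8": "great", "srvice": "service"}
SYNONYMS = {"awful": "negative", "terrible": "negative",
            "great": "positive", "excellent": "positive"}


def tokenize(text: str) -> list[str]:
    # Parsing: lowercase, then keep runs of letters, digits and apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())


def correct(tokens: list[str]) -> list[str]:
    # Correction: replace known misspellings and slang
    return [CORRECTIONS.get(t, t) for t in tokens]


def extract_phrases(docs: list[list[str]], min_count: int = 2) -> list[str]:
    # Phrase extraction: keep adjacent word pairs that recur across the corpus
    bigrams = Counter(f"{a} {b}" for toks in docs for a, b in zip(toks, toks[1:]))
    return sorted(p for p, n in bigrams.items() if n >= min_count)


def group(tokens: list[str]) -> list[str]:
    # Semantic grouping: map words onto coarser concept labels where known
    return [SYNONYMS.get(t, t) for t in tokens]


posts = ["Customer srvice was gr8!", "customer service was terrible", "Great customer service"]
docs = [correct(tokenize(p)) for p in posts]
phrases = extract_phrases(docs)
```

With these three posts, `phrases` is `["customer service", "service was"]`, and grouping the second post turns "terrible" into the concept label "negative".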
  • 7. Tools Are Appearing That Make Data In Hadoop Easier To Parse And Therefore Easier To Analyse
    Product example: Informatica HParser (Source: Informatica)

    Big Data Integration - Talend Open Studio for Big Data
    Enhancing a big data job with data quality: several data quality components are included in the open source version. (Source: Talend)
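Parsing tools such as HParser turn semi-structured records into named fields that downstream jobs can analyse. As a hedged illustration of the idea (not HParser's own definition language), this Python sketch parses web server log lines in the Apache Common Log Format, a format assumed here for the example, and sets malformed lines aside instead of failing.

```python
from __future__ import annotations

import re
from typing import Optional

# Apache Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    m = CLF.match(line)
    if m is None:
        return None  # malformed record: route it to a reject file rather than fail the job
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" means no body sent
    return rec


line = '10.0.0.1 - - [26/Apr/2012:10:00:00 +0100] "GET /index.html HTTP/1.1" 200 5120'
record = parse_line(line)
```

`record` holds typed fields such as `"path": "/index.html"` and `"status": 200`, which is the shape a clickstream analysis would start from.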
  • 8. Accelerating Custom Data Integration and Preparation
    Pervasive DataRush for Hadoop:
      • Call DataRush from MapReduce
      • MapReduce runs faster
      • Less code to write
      • Of interest to MapReduce developers
    Syncsort DMExpress Hadoop Edition:
      • Move data in and out of HDFS
      • Create jobs using the DMExpress GUI and run them within Hadoop
      • Shift transformations to the DMExpress engine
      • Invokes high-performance compression
    [Diagram: Hadoop acceleration. Four mappers over the Hadoop Distributed File System, each shown with DataRush and DMX (DMExpress), feeding two reducers shown with DataRush]

    Leveraging Hadoop For Data Integration On Massive Volumes Of Data To Bring Additional Insights Into A DW
    e.g. Deriving insight from huge volumes of social web content on sites like Twitter, Facebook, Digg, MySpace, TripAdvisor, LinkedIn… for sentiment analytics
    [Diagram: hundreds of terabytes up to petabytes of cloud data plus data from operational systems held in HDFS; MapReduce apps (e.g. Pig, IBM Jaql) extract and transform the relevant insight, which data integration (DI) loads into the DW]
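The Hadoop-to-DW flow above (extract and transform relevant insight from social content, then load it into the DW) can be sketched as a toy map/reduce sentiment job. The word lexicon, the post structure and the per-topic average are all assumptions made for illustration; real sentiment analytics would use a proper model and run as Pig, Jaql or MapReduce jobs on the cluster.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical sentiment lexicon; production systems use far richer models
LEXICON = {"love": 1, "great": 1, "good": 1, "hate": -1, "bad": -1, "slow": -1}


def mapper(post: dict) -> list[tuple[str, int]]:
    # Emit (topic, score) per post; score = sum of lexicon hits after stripping punctuation
    words = post["text"].lower().split()
    score = sum(LEXICON.get(w.strip(".,!?"), 0) for w in words)
    return [(post["topic"], score)]


def reducer(topic: str, scores: list[int]) -> tuple[str, float]:
    # Average sentiment per topic: a compact "insight" row to load into the DW
    return topic, sum(scores) / len(scores)


def run(posts: list[dict]) -> dict[str, float]:
    groups: dict[str, list[int]] = defaultdict(list)
    for post in posts:
        for key, value in mapper(post):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())


posts = [
    {"topic": "hotel_a", "text": "Love the room, great view!"},
    {"topic": "hotel_a", "text": "Slow check-in."},
    {"topic": "hotel_b", "text": "Bad food, bad service"},
]
summary = run(posts)
# summary == {"hotel_a": 0.5, "hotel_b": -2.0}
```

The point of the design is that only the small per-topic summary moves into the warehouse, while the raw social content stays in HDFS.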
  • 9. Product Example – Pentaho Enterprise Data Services Suite Support For Hadoop
    (Source: Pentaho)

    In-Hadoop Analytics – Example Technologies
      • Hadoop MapReduce programs with custom analytics
      • Hadoop MapReduce programs with Mahout: several analytical algorithms for use in batch analysis
      • Pervasive DataRush for Hadoop analytics engine
      • Radoop (UI on RapidMiner)
      • Revolution Analytics RevoScaleR
      • …
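Mahout's batch algorithms include clustering methods such as k-means. This plain Python sketch shows the k-means iteration itself; the assignment step and the centroid update correspond roughly to the map and reduce sides of a distributed implementation. It is not Mahout's API, and the sample points and starting centroids are made up.

```python
from __future__ import annotations

import math

Point = tuple[float, float]


def kmeans(points: list[Point], centroids: list[Point], iterations: int = 10) -> list[Point]:
    for _ in range(iterations):
        # Assignment ("map") step: attach each point to its nearest centroid
        clusters: list[list[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update ("reduce") step: each centroid moves to the mean of its points
        new = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new
    return centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The two groups of points converge to centroids near (1.17, 1.17) and (8.5, 8.5) after two passes.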
  • 10. New Big Data Analytics Technologies Are Emerging On Hadoop – E.g. Radoop
    Radoop interfaces RapidMiner (open source) with Hadoop and integrates with Hive and Mahout, providing a UI for Hadoop-based analytics.
    Source: “Radoop – It’s Like Yahoo Pipes for Hadoop”, http://siliconangle.com/blog/2011/08/11/radoop-its-like-yahoo-pipes-for-hadoop/

    Revolution Analytics RevoScaleR for Distributed Computing Clusters
    Scaling R for Big Data Analytics
      • Portions of the data source are made available to each compute node
      • RevoScaleR on the master node assigns a task to each compute node
      • Each compute node independently processes its data and returns its intermediate results to the master node
      • The master node aggregates all of the intermediate results from each compute node and produces the final result
    [Diagram: a master node running RevoScaleR coordinating four compute nodes, each running RevoScaleR on its own data partition]
    (Source: Revolution Analytics)
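The RevoScaleR pattern described above (each node summarises its own partition, the master combines the intermediate results) works for any statistic that can be built from additive partial results. The sketch below shows it for the mean and sample variance in plain Python; the `Partial` record and the function names are hypothetical, and it does not use RevoScaleR or R.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Partial:
    # Intermediate result a compute node sends back to the master
    n: int
    total: float
    total_sq: float


def compute_node(partition: list[float]) -> Partial:
    # Each node summarises only its own partition of the data
    return Partial(len(partition), sum(partition), sum(x * x for x in partition))


def master(partials: list[Partial]) -> tuple[float, float]:
    # The master aggregates the partial results into the final statistics
    n = sum(p.n for p in partials)
    total = sum(p.total for p in partials)
    total_sq = sum(p.total_sq for p in partials)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return mean, variance


partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
mean, var = master([compute_node(p) for p in partitions])
# mean == 5.0, var == 7.5 (same as computing over all nine values at once)
```

The textbook sum-of-squares formula used here can lose precision when values are large relative to their spread; a production version would more likely have each node return its count, mean and sum of squared deviations and combine those instead.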
  • 11. Analysing Hadoop Data – Multiple Options
      • Batch analytics: custom MapReduce applications
      • Analytical tools generating MapReduce: Karmasphere, Datameer, IBM Cognos Content Analytics
      • Search-based tools (built on Lucene): Connexica, Quid
      • BI tools using HiveQL: JasperSoft, MicroStrategy, Tableau
    [Diagram: these tools working over data such as log files, social networks and clickstream. Source: Datameer]

    Big Data Analysis - Exploratory Analysis of Multi-Structured Data In Hadoop via Search, e.g. IBM BigIndex (part of IBM BigInsights)
    Use massively parallel MapReduce servers to build a partitioned search index.
    [Diagram: files, web sites, email, CMS, images, collaboration tools and feeds are loaded and indexed into index partitions, served by an index server to BI tools, applications and mashups]
    Useful for analysing un-modelled, semi-structured Web content that is not well understood.
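Building a partitioned search index in parallel, as in the BigIndex example above, amounts to each task indexing only its own documents and each query being sent to every partition. This hedged Python sketch shows that idea with an in-memory inverted index per partition; it is not the Lucene or BigIndex API, and the documents are invented.

```python
from __future__ import annotations

import re
from collections import defaultdict


def build_partition(docs: dict[str, str]) -> dict[str, set[str]]:
    # One parallel task indexes only the documents in its own partition
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def search(partitions: list[dict[str, set[str]]], query: str) -> set[str]:
    # Scatter the query to every partition, then merge the hits (all terms must match)
    terms = re.findall(r"\w+", query.lower())
    hits: set[str] = set()
    for index in partitions:
        if terms:
            hits |= set.intersection(*(index.get(t, set()) for t in terms))
    return hits


partitions = [
    build_partition({"d1": "Hadoop log analysis", "d2": "email archive"}),
    build_partition({"d3": "clickstream log data", "d4": "Hadoop cluster log"}),
]
# search(partitions, "hadoop log") == {"d1", "d4"}
```

Because each partition holds complete postings for its own documents, a multi-term query can be answered inside each partition and the results simply unioned, which is what makes this layout easy to scale out.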
  • 12. Search Based Analytical Tools For Big Data - E.g. Connexica (runs on top of Lucene indexes)
    [Screenshots: Connexica Venn diagrams; Connexica dashboard]

    Data Warehouse Appliances – Analytical Workloads on Structured Data using ADBMSs and BI Tools
      • Analytical tools: IBM Cognos, IBI WebFOCUS, MicroStrategy, Oracle BIEE, SAP BusinessObjects, SAS, Pentaho, Jaspersoft, QlikView, Tableau
      • MPP analytical DBMS, in-database analytics, columnar and row storage
      • Data management tools: IBM InfoSphere DataStage, Informatica PowerCenter, Microsoft SSIS, Oracle Data Integrator, Pervasive, Pentaho ETL, Talend ETL
      • Data sources: CRM, ERP, SCM
      • Workload characteristics: historical reporting and analysis, investigative analytics
      • Data characteristics: medium and large volumes, structured data
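The appliance slide mentions both columnar and row storage. The toy Python example below contrasts the two layouts for an analytic query (total amount for one region); the table and values are hypothetical. In a real columnar DBMS the benefit comes from reading only the needed columns from disk and compressing each column well, which Python lists only hint at.

```python
from __future__ import annotations

# Row layout: each record's fields are stored together (suits OLTP-style lookups)
rows = [
    {"order_id": 1, "region": "UK", "amount": 120.0},
    {"order_id": 2, "region": "FR", "amount": 80.0},
    {"order_id": 3, "region": "UK", "amount": 45.5},
]


def to_columns(records: list[dict]) -> dict[str, list]:
    # Column layout: all values of one attribute are stored together
    return {key: [r[key] for r in records] for key in records[0]}


def total_by_region_rows(records: list[dict], region: str) -> float:
    # Row store: the scan walks whole records even though only two fields are used
    return sum(r["amount"] for r in records if r["region"] == region)


def total_by_region_columns(cols: dict[str, list], region: str) -> float:
    # Column store: only the "region" and "amount" columns are scanned
    return sum(a for g, a in zip(cols["region"], cols["amount"]) if g == region)


columns = to_columns(rows)
uk_total = total_by_region_columns(columns, "UK")  # 165.5
```

Both functions return the same answer; the layouts differ only in how much data an aggregate query has to touch, which is why analytical DBMSs often favour columns while OLTP systems favour rows.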
  • 13. In-Database Analytics – E.g. SAS Has Completely Rewritten Analytics to Exploit Parallelism
    E.g. SAS High Performance Analytics and Teradata: runs ‘alongside’ the ADBMS as peers in the same MPP nodes
      • In-memory passing of data between DBMS and analytic models within every node, without data movement
      • Highly parallel, in-memory execution of analytics delivered across a distributed computing environment:
          – Linear regression and variable selection with classical and modern methods
          – Nonlinear regression and maximum likelihood
          – Correlation analysis
          – Logistic regression
          – Neural nets
          – Linear mixed models
          – Optimization
    [Diagram: in-database vs. alongside-DBMS analytics]
    GA Q4 2011

    Trends in Data Science Tooling – Tools Are Broadening Their Reach
    [Diagram: analytical tools and data management tools spanning streaming data; CRM, ERP and SCM systems; machine-generated data, market data and sensors; and RDF/OWL data]
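One reason analytics such as regression and correlation parallelise well across MPP nodes is that they can be computed from a handful of sums gathered on each node. This Python sketch fits a simple (one-predictor) linear regression and computes the correlation coefficient that way; it illustrates the principle only and makes no claim about how SAS High Performance Analytics implements it.

```python
from __future__ import annotations

import math


def node_sums(xs: list[float], ys: list[float]) -> tuple[float, ...]:
    # Computed next to the data on each node; only six numbers leave the node
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(y * y for y in ys),
            sum(x * y for x, y in zip(xs, ys)))


def combine(parts: list[tuple[float, ...]]) -> tuple[float, float, float]:
    # Add the per-node sums, then derive slope, intercept and Pearson r
    n, sx, sy, sxx, syy, sxy = (sum(col) for col in zip(*parts))
    sxx_c = sxx - sx * sx / n  # centred sum of squares of x
    syy_c = syy - sy * sy / n  # centred sum of squares of y
    sxy_c = sxy - sx * sy / n  # centred cross-product
    slope = sxy_c / sxx_c
    intercept = (sy - slope * sx) / n
    r = sxy_c / math.sqrt(sxx_c * syy_c)
    return slope, intercept, r


xs = [1, 2, 3, 4, 5, 6]
ys = [2 * x + 1 for x in xs]
parts = [node_sums(xs[:3], ys[:3]), node_sums(xs[3:], ys[3:])]  # two "nodes"
slope, intercept, r = combine(parts)
# slope == 2.0, intercept == 1.0, r == 1.0 for this perfectly linear data
```

Only the small tuples cross the interconnect, so the cost of the final step does not grow with the number of rows; multi-predictor regression generalises the same idea by summing X'X and X'y per node.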
  • 14. Microsoft Big Data Solution – SQL Server 2012 Hive ODBC Driver & Hive Add-in For Excel and PowerPivot
    (Source: Microsoft)

    Front End Tools Interfacing With Hadoop And Analytical RDBMS
    [Diagram: BI tool platforms and MapReduce tools (e.g. Karmasphere, Datameer, IBM Cognos Content Analytics), search-based BI tools (e.g. Connexica, Quid) and custom data visualisation tools working against MapReduce applications and indexes; BI tools (e.g. SAP BO, IBM Cognos, Oracle BIEE, MicroStrategy, JasperSoft, Pentaho, MS Excel) issuing SQL to an MPP RDBMS with a polymorphic table function]
  • 15. Tools To Govern Data Science Projects – Data Sources, Sandboxes, People, Results
    [Diagram: governance applied to sandboxes, the MPP analytical RDBMS, graph DBMS and DW, and to data sources such as social graph data, unstructured / semi-structured content, clickstream, files, RDBMSs and web logs]

    Governance: Big Data Projects Need To Be Managed – E.g. EMC Greenplum Chorus
    Workspaces, sandboxes, people and data sources can all be governed.
    (Source: EMC Greenplum)
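Governance tools such as Chorus track who can use which workspaces and data sources. The Python sketch below models that idea with a hypothetical `Workspace` class (an owner who grants membership, an approved set of data sources and an audit log); it is an assumption-laden illustration, not the Greenplum Chorus data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical governed sandbox: not the Greenplum Chorus data model or API."""
    name: str
    owner: str
    data_sources: set[str] = field(default_factory=set)
    members: set[str] = field(default_factory=set)
    audit_log: list[str] = field(default_factory=list)

    def grant(self, by: str, user: str) -> None:
        # Only the owner may add people to the workspace
        if by != self.owner:
            raise PermissionError(f"{by} cannot add members to {self.name}")
        self.members.add(user)
        self.audit_log.append(f"{by} granted access to {user}")

    def can_read(self, user: str, source: str) -> bool:
        # Access needs both membership and an approved data source; every check is logged
        allowed = (user == self.owner or user in self.members) and source in self.data_sources
        self.audit_log.append(f"{user} read {source}: {'allowed' if allowed else 'denied'}")
        return allowed


ws = Workspace("churn-sandbox", owner="alice", data_sources={"clickstream", "crm"})
ws.grant("alice", "bob")
```

After the grant, `ws.can_read("bob", "clickstream")` is allowed while reads of unapproved sources or by non-members are denied, and every decision lands in `audit_log` so the project's use of data can be reviewed later.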
  • 16. Architectures – Integrating Big Data Analytics Into The Enterprise
    [Diagram, components: users, business analysts and developers; BI tool platforms and MapReduce tools, search-based BI tools, data visualisation tools and BI tools; SQL; actions; indexes; custom MapReduce apps; real-time stream processing; MPP RDBMS with polymorphic table function(s); graph DBMS; event streams, social graph data, OLTP data and unstructured / semi-structured content; an Information Management and Services layer over XML, JSON, clickstream, cloud data, files, web services, RDBMSs, cubes, web logs, office docs and web content]

    Thank You!
    www.intelligentbusiness.biz
    mferguson@intelligentbusiness.biz
    Twitter: @mikeferguson1
    Tel/Fax (+44)1625 520700