Integrating Hadoop within an Enterprise
Analytic Ecosystem
Priyank Patel | Product Management
June 13 2012
Topics


•  Unified Big Data Architecture Overview

•  Aster SQL-H™ : The business user’s bridge to Hadoop Data




2   Confidential and proprietary. Copyright © 2012 Teradata Corporation.
Big Data: From Transactions to Interactions

[Figure: data sources spanning ERP, CRM, WEB and BIG DATA; axis label: "Increasing data variety and complexity"]

Labels shown in the figure: Purchase detail, Purchase record, Payment record, Offer history, Segmentation, Offer details, Customer Touches, Support Contacts, Web logs, A/B testing, Dynamic Pricing, Affiliate Networks, Search marketing, Behavioral Targeting, Dynamic Funnels, User Generated Content, User Click Stream, Mobile Web, Sentiment, Social Network, External Demographics, Business Data Feeds, HD Video, Speech to Text, Product/Service Logs, SMS/MMS



Unified Big Data Architecture
Bridging Classic & Big Data Worlds

Classic BI Method: Structured & Repeatable Analysis
•  Business determines what questions to ask
•  IT structures the data to answer those questions
•  SQL performance and structure
•  “Capture only what’s needed”

Big Data Analytics: Multi-structured & Iterative Analysis
•  IT delivers a platform for storing, refining, and analyzing all data sources
•  Business explores data for questions worth answering
•  MapReduce Processing Flexibility
•  “Capture in case it’s needed”
Need for a Unified Big Data Architecture for New Insights
Enabling All Users for Any Data Type from Data Capture to Analysis




Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.

•  Discover and Explore
•  Reporting and Execution in the Enterprise
•  Capture, Store and Refine

Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP



Unified Big Data Architecture for the Enterprise



Users: Engineers, Data Scientists, Quants, Business Analysts
Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.

•  Discovery Platform
•  Integrated Data Warehouse
•  Capture, Store, Refine

Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP




What’s Technically Different in Big Data Analytics
Variety of data types and analytics require different schemas
•  Data that uses a stable schema (structured)
    -  Data from packaged business processes with well-defined & known attributes
       (e.g., ERP data, Inventory Records, Supply Chain records, …)


•  Data that has an evolving schema (semi-structured)
    -  Data generated by machine processes; known but changing set of attributes
       (e.g., Web logs, CDRs, Sensor logs, JSON, Social profiles, Twitter feeds, …)


•  Data that has a format, but no schema (unstructured)
    -  Data captured by machines with well-defined format, but no semantics
       (e.g., images, videos, web pages, PDF documents, …)
    -  Semantics can be extracted from raw data by interpreting the format and
       pulling out required data
       (e.g., shapes from video, face recognition in images, logo detection, …)
    -  Sometimes formatted data is accompanied by metadata that has its own schema
       (stable or evolving); that metadata needs to be classified and treated
       separately
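
To make the "evolving schema" case concrete, here is a minimal Python sketch, assuming the records arrive as JSON lines whose attribute set grows over time. The function names (`observed_schema`, `to_rows`) and the sample events are hypothetical and only illustrate the idea; they are not part of any product described in this deck.

```python
import json

def observed_schema(lines):
    """Return every attribute name seen across the records, in first-seen order."""
    seen = {}
    for line in lines:
        for key in json.loads(line):
            seen.setdefault(key, None)
    return list(seen)

def to_rows(lines, columns):
    """Project each record onto a fixed column list, filling missing attributes with None."""
    return [tuple(json.loads(line).get(col) for col in columns) for line in lines]

# Hypothetical click events: later records introduce attributes the first one lacks.
events = [
    '{"user": "a", "url": "/home"}',
    '{"user": "b", "url": "/cart", "referrer": "ad-17"}',
    '{"user": "a", "device": "mobile"}',
]
columns = observed_schema(events)
rows = to_rows(events, columns)
```

The schema is discovered from the data rather than declared up front, which is the practical difference from the stable-schema case.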

Diversity of Data Processing and Analytics
Unified Big Data Architecture Must Handle Each Workload Optimally

•  Low cost storage and retention
    -  Retention of raw data in a manner that provides a low TCO per terabyte of storage
    -  Access to deep storage is still required, but not at the same speeds as in a front-line system
•  Loading and refining
    -  Load: bring data into the system from the source system
    -  Pre-processing / prep / cleansing / constraint validation: prepare data for
       downstream processing, e.g., fetch dimension data, record new incoming batch, archive
       old window batch, etc.
    -  Transformations: convert one structure of data into another. This may
       require going from 3NF to a star/snowflake schema within relational, or going
       from text to relational, or going from relational to graph, i.e., structural
       transformations
•  Reporting
    -  Querying what happened, where it happened, how much happened, and who did it
•  Analytics (user-driven, interactive, ad-hoc)
    -  Relationship modeling that can be done via declarative SQL (e.g., scoring, basic stats)
    -  Relationship modeling done via procedural MR (e.g., model building, time series)
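
As an illustration of the loading and refining steps (constraint validation plus a text-to-relational transformation), the following Python sketch parses web log lines into rows and sets aside lines that fail validation. It assumes a log layout similar to the common Apache access-log format; the pattern, the `parse_line` and `load` helpers, and the sample lines are assumptions made for this example.

```python
import re

# Assumed layout: host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Turn one log line into a (host, ts, method, path, status, size) row, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # line fails validation
    size = match["size"]
    return (
        match["host"],
        match["ts"],
        match["method"],
        match["path"],
        int(match["status"]),
        0 if size == "-" else int(size),
    )

def load(lines):
    """Split input into relational rows and rejected raw lines."""
    rows, rejected = [], []
    for line in lines:
        row = parse_line(line)
        if row is None:
            rejected.append(line)
        else:
            rows.append(row)
    return rows, rejected

raw = [
    '203.0.113.9 - - [13/Jun/2012:10:15:32 +0000] "GET /offers HTTP/1.1" 200 512',
    'not a log line',
]
rows, rejected = load(raw)
```

Keeping the rejected lines instead of dropping them matches the "capture in case it's needed" approach: the raw text stays available for a later, better parser.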



When to Use Which?
 The best approach by workload and data type
 Processing as a Function of Schema Requirements by Data Type

                    Low Cost            Loading and Refining
                    Storage &           Data Pre-Processing,                                                        Analytics
                    Retention           Prep, Cleansing         Transformations                         Reporting   (User-driven, interactive)
Stable Schema       Teradata / Hadoop   Teradata                Teradata                                Teradata    Teradata (SQL analytics)
Evolving Schema     Hadoop              Aster / Hadoop          Aster (joining with structured data)    Aster       Aster (SQL + MapReduce Analytics)
Format, No Schema   Hadoop              Hadoop                  Hadoop                                              Aster (MapReduce Analytics)

Workloads shown for each data type:
•  Stable Schema: Financial analysis, ad-hoc/OLAP; Enterprise-wide BI and Reporting; Spatial/Temporal; Active Execution
•  Evolving Schema: Interactive data discovery; Web clickstream, social feeds; Set-top box analysis; CDRs, Sensor logs, JSON
•  Format, No Schema: Image processing; Audio/video storage and refining; Storage and batch transformations
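
The matrix on this slide can be read as a lookup from (data type, workload) to platform. The Python sketch below encodes it that way, assuming the cell assignments as they appear on the slide. The names `PLATFORM_MATRIX` and `recommend` and the key spellings are hypothetical, and the empty tuple stands for the one cell the slide leaves blank.

```python
# Platforms per data type and workload, as read from the slide's matrix.
PLATFORM_MATRIX = {
    "stable": {
        "storage": ("Teradata", "Hadoop"),
        "preprocessing": ("Teradata",),
        "transformations": ("Teradata",),
        "reporting": ("Teradata",),
        "analytics": ("Teradata",),
    },
    "evolving": {
        "storage": ("Hadoop",),
        "preprocessing": ("Aster", "Hadoop"),
        "transformations": ("Aster",),
        "reporting": ("Aster",),
        "analytics": ("Aster",),
    },
    "no_schema": {
        "storage": ("Hadoop",),
        "preprocessing": ("Hadoop",),
        "transformations": ("Hadoop",),
        "reporting": (),  # blank on the slide
        "analytics": ("Aster",),
    },
}

def recommend(schema_type, workload):
    """Return the platforms the matrix lists for a data type and workload."""
    try:
        return PLATFORM_MATRIX[schema_type][workload]
    except KeyError:
        raise ValueError(
            f"unknown combination: {schema_type!r}, {workload!r}"
        ) from None
```

For example, `recommend("evolving", "preprocessing")` returns both Aster and Hadoop, reflecting the shared cell in the matrix.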

Aster Digital Marketing Client


[Figure: data flow. Inputs: Custom Data by Client, Media Data (Aggregated), Raw Web Logs, Ad Server Logs. Platforms: Hadoop (on AWS) (Storage, aggregations, cleansing) and Teradata Aster (Analytic Tools). Connecting labels: Cookie-level data, Archival]

•  Segmentation: Custom SQL-MR algorithms to match and create centralized identifiers
•  Sessionize by client
•  nPath identifies segment path analysis (behavior after ads)
•  Benefits:
    -  Marketing analysts more productive with Aster
    -  Lower cost - storage and batch refining done on Amazon Elastic MapReduce
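
To show what "sessionize by client" means in practice, here is a minimal Python sketch that groups clicks by cookie and starts a new session after a period of inactivity. It is plain Python, not the Aster Sessionize function; the `sessionize` helper, the `(cookie, timestamp)` tuple layout, and the 30-minute timeout are assumptions chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter

def sessionize(clicks, timeout_seconds=1800):
    """Attach a per-cookie session number to each (cookie, timestamp) click.

    A new session starts when the gap since the previous click from the
    same cookie exceeds timeout_seconds.
    """
    out = []
    ordered = sorted(clicks, key=itemgetter(0, 1))
    for cookie, group in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        previous = None
        for _, ts in group:
            if previous is not None and ts - previous > timeout_seconds:
                session_id += 1
            out.append((cookie, ts, session_id))
            previous = ts
    return out

# Hypothetical clicks: timestamps are seconds since an arbitrary start.
clicks = [("c1", 0), ("c2", 50), ("c1", 600), ("c1", 4000), ("c2", 100)]
sessions = sessionize(clicks)
```

Sorting by cookie and then timestamp mirrors the partition-and-order step that a SQL-MapReduce function would typically be given before it scans each client's rows.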



More Accurate Customer Churn Prevention

•  Hadoop captures, stores and transforms images and call records
•  Aster does path and sentiment analysis with multi-structured data

[Figure: churn-prevention data flow. Labels: Multi-Structured Raw Data (Call Center Voice Records, Check Images); Hadoop (Capture, Retention & Transformation Layer); Call Data; Check Data; Social & Web data; Aster Discovery Platform; Traditional Data Flow (Data Sources, ETL Tools, Teradata Integrated DW); Dimensional Data; Analytic Results; Analysis + Marketing Automation (Customer Retention Campaign)]




Bridging the Business Analyst Gap for Hadoop Data
Aster SQL-H™
                 A Business User’s Bridge to Analyze Hadoop Data



Aster SQL-H gives analysts and data scientists a better way
to analyze data stored cheaply in Hadoop
      •  Allow standard ANSI SQL access to Hadoop data

      •  Leverage existing BI tool investments

      •  Enable 50+ prebuilt SQL-MapReduce Apps and IDE

      •  Improve self-sufficiency for analysts going against Hadoop

The Big Data Architecture Today Has Gaps

Users: Engineers, Data Scientists, Quants, Business Analysts
Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.

•  Gap 1: Analysts
•  Gap 2: File system lacks optimizers, data locality, indexes

[Figure: architecture layers. Labels: MapReduce (Processing); Data Storage and Refining; Database and Analytic Processing Layer (Discovery Platform, Active Data Warehouse); data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP]




Analyst’s Goal: Get Insights from Data in Hadoop


Users: Engineers, Data Scientists, Quants, Business Analysts

•  Custom Code and Development: MR, Pig, Hive (IT is the optimizer)
•  Aster MapReduce Portfolio: SQL & SQL-MapReduce on the Teradata Aster Discovery Platform
•  Teradata Analytics Portfolio: SQL on the Teradata IDW




Analytics on Hadoop Data with Aster SQL-H


Users: Engineers, Data Scientists, Quants, Business Analysts

•  SQL-H
•  Aster MapReduce Portfolio: SQL & SQL-MapReduce on the Teradata Aster Discovery Platform
•  Teradata Analytics Portfolio: SQL on the Teradata IDW




Aster SQL-H Integration with Hadoop Catalog
A Business User’s Bridge to Analyzing Data in Hadoop

•  Industry’s First Database Integration with Hadoop’s HCatalog
•  Abstraction layer to easily and efficiently read structured & multi-structured data stored in HDFS
•  Uses Hadoop Catalog (HCatalog) to perform data abstraction functions (e.g. automatically understands tables, data partitions)
•  HDFS data presented to users as Aster tables
•  Fully accessible within the Aster SQL and SQL-MapReduce processing engines, plus ODBC/JDBC & BI tools

[Figure: Aster SQL-H alongside a Hadoop stack. Labels: Hadoop MapReduce, Hive, Pig, HCatalog, HDFS]
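
The value of a catalog that "understands tables, data partitions" can be sketched with a toy in-memory catalog in Python. This is not the HCatalog API or the SQL-H implementation; `TableEntry`, `CATALOG`, `paths_to_scan`, the table name, and the paths are all hypothetical. The sketch only illustrates how partition metadata lets a reader skip files that a filter rules out.

```python
from dataclasses import dataclass, field

@dataclass
class TableEntry:
    """Toy catalog record: column names plus partition value -> storage path."""
    columns: list
    partition_key: str
    partitions: dict = field(default_factory=dict)

# Hypothetical catalog with one table partitioned by date.
CATALOG = {
    "weblogs": TableEntry(
        columns=["user_id", "url", "dt"],
        partition_key="dt",
        partitions={
            "2012-06-11": "/data/weblogs/dt=2012-06-11",
            "2012-06-12": "/data/weblogs/dt=2012-06-12",
            "2012-06-13": "/data/weblogs/dt=2012-06-13",
        },
    ),
}

def paths_to_scan(table, wanted):
    """Return only the partition paths whose partition value passes the filter."""
    entry = CATALOG[table]
    return [
        path
        for value, path in sorted(entry.partitions.items())
        if wanted(value)
    ]

# A filter on the partition key prunes the first day's files entirely.
paths = paths_to_scan("weblogs", lambda dt: dt >= "2012-06-12")
```

Because the metadata lives in one shared catalog, the reader does not need its own hand-maintained copy of table and partition definitions.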

Benefits of Aster SQL-H™
Deep metadata layer integration between Aster and Hadoop

Business Analysts (Powerful analytics & Performance)
•  50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio)
  -  Analytics on Hadoop data no longer requires expensive talent and training
•  Simplified, SQL-based interface with Hadoop data structures (HCatalog)
  -  No longer limited by Hive’s QL
•  Interoperability with existing ecosystem & skillset
  -  BI tools (MSTR, Tableau, Cognos), ETL tools, SQL analysts, existing apps


Architects and Administrators (Maintainability)
•  Leverage existing DBA skill-sets without additional overhead
•  Simplify administration and monitoring
  -  Competitors require manual creation and maintenance of metadata
  -  Less work and fewer errors
  -  Can do filtering with Aster; select data from HCatalog, leverage partitioning


Aster MapReduce Portfolio: the App Store of Big Data
Some of the 50+ out-of-the-box analytical apps



•  Path Analysis: Discover patterns in rows of sequential data
•  Text Analysis: Derive patterns and extract features in textual data
•  Statistical Analysis: High-performance processing of common statistical calculations
•  Segmentation: Discover natural groupings of data points
•  Marketing Analytics: Analyze customer interactions to optimize marketing decisions
•  Data Transformation: Transform data for more advanced analysis
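
As a rough illustration of path analysis ("discover patterns in rows of sequential data"), the Python sketch below counts contiguous page sequences per user. It is a simplified stand-in, not the nPath function from the Aster MapReduce Portfolio; the `count_paths` helper, the `(user, timestamp, page)` layout, and the sample events are assumptions.

```python
from collections import Counter, defaultdict

def count_paths(events, length=2):
    """Count contiguous page sequences of the given length across all users."""
    by_user = defaultdict(list)
    # Order each user's events by time before extracting sequences.
    for user, _ts, page in sorted(events, key=lambda e: (e[0], e[1])):
        by_user[user].append(page)
    counts = Counter()
    for pages in by_user.values():
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts

# Hypothetical events: what users do after seeing an ad.
events = [
    ("u1", 1, "ad"), ("u1", 2, "landing"), ("u1", 3, "cart"),
    ("u2", 1, "ad"), ("u2", 2, "landing"), ("u2", 3, "exit"),
]
paths = count_paths(events)
```

Calling `paths.most_common(1)` on this sample surfaces the shared `("ad", "landing")` step, the kind of "behavior after ads" pattern the digital marketing example refers to.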



Summary


•  Mainstream organizations need a unified big data architecture
     -  Best-of-breed with Hadoop, Aster, Teradata
     -  Brings “Data Science” to business analysts
     -  50+ business-ready MapReduce analytics and apps
     -  Enabled by SQL-MapReduce framework and new SQL-H


•  Learn more - asterdata.com/mapreduce
•  Download - developer.teradata.com/aster

•  Breakout Session: Thursday, 4:30 pm
        How does SQL-H work?
              Sushil Thomas, Teradata Aster



20     Confidential and proprietary. Copyright © 2012 Teradata Corporation.
Unified big data architecture

More Related Content

What's hot

Vitess VReplication: Standing on the Shoulders of a MySQL Giant
Vitess VReplication: Standing on the Shoulders of a MySQL GiantVitess VReplication: Standing on the Shoulders of a MySQL Giant
Vitess VReplication: Standing on the Shoulders of a MySQL GiantMatt Lord
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Edureka!
 
Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Mark Kromer
 
AWS EMR Cost optimization
AWS EMR Cost optimizationAWS EMR Cost optimization
AWS EMR Cost optimizationSANG WON PARK
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAlex Tumanoff
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringHadi Fadlallah
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slidesDat Tran
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino ProjectMartin Traverso
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsThomas Sykes
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudMark Kromer
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data FlowMark Kromer
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Mark Kromer
 

What's hot (20)

Vitess VReplication: Standing on the Shoulders of a MySQL Giant
Vitess VReplication: Standing on the Shoulders of a MySQL GiantVitess VReplication: Standing on the Shoulders of a MySQL Giant
Vitess VReplication: Standing on the Shoulders of a MySQL Giant
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
 
Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)
 
AWS EMR Cost optimization
AWS EMR Cost optimizationAWS EMR Cost optimization
AWS EMR Cost optimization
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slides
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino Project
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data Flow
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005
 

Similar to Unified big data architecture

The Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureThe Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureInside Analysis
 
Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Cana Ko
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London SeminarHortonworks
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsHortonworks
 
Microsoft SQL Server 2012 Master Data Services
Microsoft SQL Server 2012 Master Data ServicesMicrosoft SQL Server 2012 Master Data Services
Microsoft SQL Server 2012 Master Data ServicesMark Ginnebaugh
 
Unified big data architecture

  • 1. Integrating Hadoop within an Enterprise Analytic Ecosystem
      Priyank Patel | Product Management | June 13, 2012
  • 2. Topics
      • Unified Big Data Architecture Overview
      • Aster SQL-H™: The business user’s bridge to Hadoop data
      Confidential and proprietary. Copyright © 2012 Teradata Corporation.
  • 3. Big Data: From Transactions to Interactions
      [Diagram with tiers labeled ERP, CRM, WEB, and BIG DATA. Data sources shown: Purchase detail, Purchase record, Payment record, Segmentation, Offer details, Offer history, Customer Touches, Support Contacts, Web logs, A/B testing, Dynamic Pricing, Affiliate Networks, Search marketing, Behavioral Targeting, Dynamic Funnels, User Generated Content, Mobile Web, User Click Stream, Sentiment, Social Network, External Demographics, Business Data Feeds, HD Video, Speech to Text, Product/Service Logs, SMS/MMS.]
      Increasing data variety and complexity
  • 4. Unified Big Data Architecture: Bridging Classic & Big Data Worlds
      • Classic BI Method: Structured & Repeatable Analysis
          - Business determines what questions to ask
          - IT structures the data to answer those questions
          - SQL performance and structure
          - “Capture only what’s needed”
      • Big Data Analytics: Multi-structured & Iterative Analysis
          - IT delivers a platform for storing, refining, and analyzing all data sources
          - Business explores data for questions worth answering
          - MapReduce processing flexibility
          - “Capture in case it’s needed”
  • 5. Need for a Unified Big Data Architecture for New Insights
      Enabling All Users for Any Data Type from Data Capture to Analysis
      • Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
      • Layers: Discover and Explore; Reporting and Execution in the Enterprise; Capture, Store and Refine
      • Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP
  • 6. Unified Big Data Architecture for the Enterprise
      • Users: Engineers, Data Scientists, Quants, Business Analysts
      • Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
      • Platforms: Discovery Platform; Integrated Data Warehouse; Capture, Store, Refine
      • Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP
  • 7. What’s Technically Different in Big Data Analytics
      Variety of data types and analytics require different schemas
      • Data that uses a stable schema (structured)
          - Data from packaged business processes with well-defined and known attributes (e.g., ERP data, inventory records, supply chain records)
      • Data that has an evolving schema (semi-structured)
          - Data generated by machine processes; known but changing set of attributes (e.g., web logs, CDRs, sensor logs, JSON, social profiles, Twitter feeds)
      • Data that has a format, but no schema (unstructured)
          - Data captured by machines with a well-defined format, but no semantics (e.g., images, videos, web pages, PDF documents)
          - Semantics can be extracted from raw data by interpreting the format and pulling out the required data (e.g., shapes from video, face recognition in images, logo detection)
          - Sometimes format data is accompanied by metadata with a stable or evolving schema; that metadata needs to be classified and treated separately
  • 8. Diversity of Data Processing and Analytics
      Unified Big Data Architecture Must Handle Each Workload Optimally
      • Low cost storage and retention
          - Retention of raw data in a manner that keeps TCO per terabyte of storage low
          - Access to deep storage is still required, but not at the same speeds as in a front-line system
      • Loading and refining
          - Load: bring data into the system from the source system
          - Pre-processing / prep / cleansing / constraint validation: prepare data for downstream processing (e.g., fetch dimension data, record new incoming batch, archive old window batch)
          - Transformations: convert one structure of data into another (i.e., structural transformations). This may mean going from 3NF to a star/snowflake schema within relational, from text to relational, or from relational to graph
      • Reporting
          - Querying what happened, where it happened, how much happened, and who did it
      • Analytics (user-driven, interactive, ad hoc)
          - Relationship modeling that can be done via declarative SQL (e.g., scoring, basic stats)
          - Relationship modeling done via procedural MR (e.g., model building, time series)
  • 9. When to Use Which? The best approach by workload and data type
      Processing as a Function of Schema Requirements by Data Type
      Workload columns: Low Cost Storage & Retention | Loading and Refining: Data Pre-Processing, Prep, Cleansing | Loading and Refining: Transformations | Reporting | Analytics (User-driven, interactive)
      • Stable Schema (financial analysis, ad hoc/OLAP; enterprise-wide BI and reporting; spatial/temporal; active execution): Teradata / Hadoop | Teradata | Teradata | Teradata | Teradata (SQL analytics)
      • Evolving Schema (interactive data discovery; web clickstream, social feeds; set-top box analysis; CDRs, sensor logs, JSON): Hadoop | Aster / Hadoop | Aster | Aster (joining with structured data) | Aster (SQL + MapReduce Analytics)
      • Format, No Schema (image processing; audio/video storage and refining; storage and batch transformations): Hadoop | Hadoop | Hadoop | - | Aster (MapReduce Analytics)
  • 10. Aster Digital Marketing Client
      • Segmentation: custom SQL-MR algorithms to match and create centralized identifiers
      • Sessionize by client
      • nPath identifies segment path analysis (behavior after ads)
      • Benefits:
          - Marketing analysts more productive with Aster
          - Lower cost: storage and batch refining done on Amazon Elastic MapReduce
      [Diagram labels: Custom Analytic Tools; Data by Client; Media Data (Aggregated); Teradata Aster; Cookie-level data; Raw Web Logs; Ad Server Logs; Archival; Hadoop (on AWS) (Storage, aggregations, cleansing)]
  • 11. More Accurate Customer Churn Prevention
      • Hadoop captures, stores and transforms images and call records
      • Aster does path and sentiment analysis with multi-structured data
      [Diagram labels: Social & Web data; Call Center Voice Records; Check Images; Multi-Structured Raw Data; Call Data; Check Data; Hadoop (Capture, Retention & Transformation Layer); Aster Discovery Platform; Analysis + Marketing Automation (Customer Retention Campaign); Analytic Results; Dimensional Data; Traditional Data Flow; Data Sources; ETL Tools; Teradata Integrated DW]
  • 12. Bridging the Business Analyst Gap for Hadoop Data
  • 13. Aster SQL-H™: A Business User’s Bridge to Analyze Hadoop Data
      Aster SQL-H gives analysts and data scientists a better way to analyze data stored cheaply in Hadoop
      • Allow standard ANSI SQL to Hadoop data
      • Leverage existing BI tool investments
      • Enable 50+ prebuilt SQL-MapReduce Apps and IDE
      • Improve self-sufficiency for analysts going against Hadoop
  • 14. The Big Data Architecture Today Has Gaps
      • Gap 1: Analysts
      • Gap 2: File system lacks optimizers, data locality, indexes
      [Diagram labels: Engineers, Data Scientists, Quants, Business Analysts; Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.; MapReduce (Processing); Discovery Platform; Active Data Warehouse; Database and Analytic Processing Layer; Data Storage and Refining; Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP]
  • 15. Analyst’s Goal: Get Insights from Data in Hadoop
      Users: Engineers, Data Scientists, Quants, Business Analysts
      • Custom Code and Development: MR, Pig, Hive; IT is the optimizer
      • Teradata Aster Discovery Platform: Aster MapReduce Portfolio; SQL & SQL-MapReduce
      • Teradata IDW: Teradata Analytics Portfolio; SQL
  • 16. Analytics on Hadoop Data with Aster SQL-H
      Users: Engineers, Data Scientists, Quants, Business Analysts
      [Diagram labels: SQL-H; Aster MapReduce Portfolio (shown twice); Teradata Analytics Portfolio; SQL & MapReduce; SQL & SQL-MapReduce; SQL; Teradata Aster Discovery Platform; Teradata IDW]
  • 17. Aster SQL-H Integration with Hadoop Catalog
      A Business User’s Bridge to Analyzing Data in Hadoop
      • Industry’s first database integration with Hadoop’s HCatalog
      • Abstraction layer to easily and efficiently read structured & multi-structured data stored in HDFS
      • Uses Hadoop Catalog (HCatalog) to perform data abstraction functions (e.g., automatically understands tables, data partitions)
      • HDFS data presented to users as Aster tables
      • Fully accessible within the Aster SQL and SQL-MapReduce processing engines, plus ODBC/JDBC & BI tools
      [Diagram labels: Aster SQL-H; Hadoop: MapReduce, Hive, Pig, HCatalog, HDFS]
  • 18. Benefits of Aster SQL-H™: Deep metadata layer integration between Aster and Hadoop. Business Analysts (Powerful Analytics & Performance) •  50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio) -  Analytics on Hadoop data no longer requires expensive talent and training •  Simplified, SQL-based interface with Hadoop data structures (HCatalog) -  No longer limited by Hive’s QL •  Interoperability with existing ecosystem and skill sets -  BI tools (MSTR, Tableau, Cognos), ETL tools, SQL analysts, existing apps. Architects and Administrators (Maintainability) •  Leverage existing DBA skill sets without additional overhead •  Simplify administration and monitoring -  Competitors require manual creation and maintenance of metadata -  Less work and fewer errors -  Can filter with Aster: select data from HCatalog and leverage partitioning
  • 19. Aster MapReduce Portfolio: the App Store of Big Data. Some of the 50+ out-of-the-box analytical apps: •  Path Analysis: Discover patterns in rows of sequential data •  Text Analysis: Derive patterns and extract features in textual data •  Statistical Analysis: High-performance processing of common statistical calculations •  Segmentation: Discover natural groupings of data points •  Marketing Analytics: Analyze customer interactions to optimize marketing decisions •  Data Transformation: Transform data for more advanced analysis
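As a rough illustration of what "discover patterns in rows of sequential data" means for path analysis, the sketch below counts the page sequences users follow. It is ordinary Python rather than any function from the Aster MapReduce Portfolio; `count_paths` and the sample `events` are made up for the example.

```python
from collections import Counter, defaultdict


def count_paths(events, length=2):
    """Count page sequences of a given length across all users.

    `events` is an iterable of (user_id, timestamp, page) tuples.
    """
    by_user = defaultdict(list)
    for user_id, ts, page in events:
        by_user[user_id].append((ts, page))

    counts = Counter()
    for visits in by_user.values():
        # Order each user's visits by timestamp before looking for paths.
        pages = [page for _, page in sorted(visits)]
        # Slide a window of `length` pages over the ordered visits.
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts


# Hypothetical click events: (user_id, timestamp, page).
events = [
    ("u1", 1, "home"), ("u1", 2, "cart"), ("u1", 3, "checkout"),
    ("u2", 1, "home"), ("u2", 2, "cart"),
]
print(count_paths(events).most_common(1))  # [(('home', 'cart'), 2)]
```

The grouping step (per user) and the ordering step (by timestamp) are the two ingredients that make row data "sequential"; a SQL-based system would express them as partitioning and ordering clauses instead.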
  • 20. Summary •  Mainstream organizations need a unified big data architecture -  Best-of-breed with Hadoop, Aster, Teradata -  Brings “Data Science” to business analysts -  50+ business-ready MapReduce analytics and apps -  Enabled by the SQL-MapReduce framework and the new SQL-H •  Learn more: asterdata.com/mapreduce •  Download: developer.teradata.com/aster •  Breakout session: Thursday, 4:30 pm, “How does SQL-H work?”, Sushil Thomas, Teradata Aster