SlideShare una empresa de Scribd logo
1 de 15
First Steps Towards a Data Lake:
Insight from Southwest Power Pool
Srinivas Kolluru and
Russell Mason
10/26/2016
Background
• Data stored in historical data store for data analysis and reporting
• Near real-time data feeds
• ETL data feeds
• More data than initial expectations
• Almost 50% of data is less frequently used
3
Background
9/23/2016World of Watson 2016
Near-term requirements – Offload less frequently used data
• SQL query access to the data
• Minimal changes to the business queries
Long-term vision – Data Lake infrastructure
• ETL and real-time data feeds
• Transactional and analytics workloads
• Cost economical model
• Should be able to add compute and storage incrementally
• Long-term partnership
4
Proof-of-Concept (PoC) Evaluation Criteria
9/23/2016World of Watson 2016
• Evaluated three different vendor technologies
• On-site PoC with three months of data and with a sample of business queries
• Six week evaluation of each vendor technology
• Chose BigInsights product
• Minimal business query modifications
• Support for Netezza functions
• Federated query capabilities between Netezza and BigInsights
• Partnership
5
Proof-of-Concept (PoC) Evaluation
9/23/2016World of Watson 2016
Data Lake Vision and Phase 1
Implementation
7
Data Lake Vision
9/23/2016World of Watson 2016
Data Zone
ETL Data
DB Data
Streaming
API Data
BLOB Data Metadata SecurityQuality GovernanceMonitoring
Landing
Zone
Data Extraction
Data Refinement
Data Discovery
Data Analytics
Business
Users
IT
Users
• Phase 1 – Offload less frequently used data, provide SQL query, and
federated query capabilities.
• Phase 2 – Query performance improvements, transactional capabilities,
and security controls
• Phase 3 – ETL data, streaming data, and governance
• Phase 4 – Real-time and BI analytics
8
Data Lake Implementation
9/23/2016World of Watson 2016
9
Phase1 - Data Offload Process
9/23/2016World of Watson 2016
Landing Zone
ORCORC
Data Zone
Data export & CRC
check validation
ORC table import and
CRC check validation
Metadata Security Monitoring
Source System
:
• Custom export process instead of using Fluid query or Scoop for data
transfer
• Fluid query
• Additional burden on source system
• Scoop
• Performance challenges with JDBC approach
• Limitations with incremental data pulls with external table approach
• Limitations with using database views
• Difficulty with data validation checks
• Difficulty with partitioning data based on record’s last update date
• Operational controls to minimize the impact on the source system
10
Architecture and Design Decisions
9/23/2016World of Watson 2016
• Data Integrity
• Row level CRC checks with data export and import processes
• ORC format
• Monthly data partitioning
• Seen better compression with ORC format compared to Parquet (Haven’t compared
the query performance with large sets of data)
• Separate HDFS storage cluster
• Scale storage independent of compute
• Align with the IT strategy and processes
• Expose the same set of files through multiple file system protocols
11
Architecture and Design Decisions
9/23/2016World of Watson 2016
12
Phase 1 Deployment Model
9/23/2016World of Watson 2016
Node
#1
Node
#2
Node
#3
Node
#4
Node
#5
Hadoop
Postgres
Metadata
Database
IBM BigInsights Cluster
Node
#1
Node
#2
Node
#3
HDFS Storage Cluster
BigInsights Component Deployment (Tentative)
Node #1
Zookeeper Server, Hive Metastore, HiveServer2, Knox Gateway, WebHCat
Server, BigInsights Home Server, BigSheets Master, BigSQL worker,
Metrics Monitoring, and Node Manager.
Node #2
Zookeeper Server, History Server, Hive Metastore, HiveServer2, Metrics
Collector, Active Resource Manager, WebHCat Server, Big SQL Worker,
Metrics Collector, and Node Manager.
Node #3
Zookeeper Server, App Timeline Server, Active Hbase Master, Hive
Metastore, HiveServer2, Oozie Server, Standby Resource Manager,
WebHCat Server, Big SQL Worker, Metrics Monitor, and Node Manager.
Node #4
Big SQL Head, Data Server Manager, Hive Metastore, HiveServer2,
WebHCat Server, Metrics Monitor, and Node Manager.
Node #5
BigSQL Secondary Head, Zookeeper Server, Hive Metastore, HiveServer2,
WebHCat Server, Region Server, HBaseREST Server, Metrics Monitor, and
Node Manager.
13
Phase 1 Production Deployment Model
9/23/2016World of Watson 2016
Node
#1
Node
#2
Node
#3
Node
#4
Node
#5
IBM BigInsights Cluster
Hadoop
Postgres
Metadata
Database
Node
#1
Node
#2
Node
#3
HDFS Storage Cluster
Node
#1
Node
#2
Node
#3
Node
#4
Node
#5
IBM BigInsights Cluster
Hadoop
Postgres
Metadata
Database
Node
#1
Node
#2
Node
#3
HDFS Storage Cluster
Data Center 1 Data Center 2
• Active-Active deployment across the data centers
• Evaluating IBM Big Replicate product
• Training for support teams
• Day light savings challenges
14
Challenges
9/23/2016World of Watson 2016
Thank You

Más contenido relacionado

La actualidad más candente

Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
A brief history of data warehousing
A brief history of data warehousingA brief history of data warehousing
A brief history of data warehousingRob Winters
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseCloudera, Inc.
 
Big Data Testing Strategies
Big Data Testing StrategiesBig Data Testing Strategies
Big Data Testing StrategiesKnoldus Inc.
 
Transforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyTransforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyDatabricks
 
Pervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityPervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityCloudera, Inc.
 
O'Reilly ebook: Operationalizing the Data Lake
O'Reilly ebook: Operationalizing the Data LakeO'Reilly ebook: Operationalizing the Data Lake
O'Reilly ebook: Operationalizing the Data LakeVasu S
 
DAMA Chicago - Ensuring your data lake doesn’t become a data swamp
DAMA Chicago - Ensuring your data lake doesn’t become a data swampDAMA Chicago - Ensuring your data lake doesn’t become a data swamp
DAMA Chicago - Ensuring your data lake doesn’t become a data swampNVISIA
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lakepunedevscom
 
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...DATAVERSITY
 
Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresDATAVERSITY
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeCaserta
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeDataWorks Summit
 
The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThomas Kelly, PMP
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecturemark madsen
 

La actualidad más candente (20)

Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
 
A brief history of data warehousing
A brief history of data warehousingA brief history of data warehousing
A brief history of data warehousing
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data Warehouse
 
Big Data Testing Strategies
Big Data Testing StrategiesBig Data Testing Strategies
Big Data Testing Strategies
 
Transforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyTransforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform Strategy
 
Pervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityPervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricity
 
O'Reilly ebook: Operationalizing the Data Lake
O'Reilly ebook: Operationalizing the Data LakeO'Reilly ebook: Operationalizing the Data Lake
O'Reilly ebook: Operationalizing the Data Lake
 
DAMA Chicago - Ensuring your data lake doesn’t become a data swamp
DAMA Chicago - Ensuring your data lake doesn’t become a data swampDAMA Chicago - Ensuring your data lake doesn’t become a data swamp
DAMA Chicago - Ensuring your data lake doesn’t become a data swamp
 
Taming Big Data With Modern Software Architecture
Taming Big Data  With Modern Software ArchitectureTaming Big Data  With Modern Software Architecture
Taming Big Data With Modern Software Architecture
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lake
 
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
 
Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data Stores
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short Time
 
The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 

Destacado

Big Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data AccessBig Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data AccessSeeling Cheung
 
Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...Seeling Cheung
 
BigInsights For Telecom
BigInsights For TelecomBigInsights For Telecom
BigInsights For TelecomSeeling Cheung
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsSeeling Cheung
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...Seeling Cheung
 
Concept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsConcept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsSeeling Cheung
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guideCynthia Saracco
 
Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and HadoopCynthia Saracco
 
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Cynthia Saracco
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab Cynthia Saracco
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase Cynthia Saracco
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark Cynthia Saracco
 
Big Data: SQL on Hadoop from IBM
Big Data:  SQL on Hadoop from IBM Big Data:  SQL on Hadoop from IBM
Big Data: SQL on Hadoop from IBM Cynthia Saracco
 
AddReality company overview
AddReality company overviewAddReality company overview
AddReality company overviewAddReality
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemHellmar Becker
 
Getting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with BluemixGetting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with BluemixNicolas Morales
 
The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.Richard Vermillion
 
Real timefrauddetectiononbigdata
Real timefrauddetectiononbigdataReal timefrauddetectiononbigdata
Real timefrauddetectiononbigdataPranab Ghosh
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti
 

Destacado (20)

Big Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data AccessBig Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data Access
 
Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...
 
BigInsights For Telecom
BigInsights For TelecomBigInsights For Telecom
BigInsights For Telecom
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and Analytics
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...
 
Concept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsConcept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with Telematics
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guide
 
Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and Hadoop
 
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark
 
Big Data: SQL on Hadoop from IBM
Big Data:  SQL on Hadoop from IBM Big Data:  SQL on Hadoop from IBM
Big Data: SQL on Hadoop from IBM
 
AddReality company overview
AddReality company overviewAddReality company overview
AddReality company overview
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking Ecosystem
 
Getting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with BluemixGetting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with Bluemix
 
The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.
 
Real timefrauddetectiononbigdata
Real timefrauddetectiononbigdataReal timefrauddetectiononbigdata
Real timefrauddetectiononbigdata
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to Production
 

Similar a Southwest Power Pool big data case study

Offsite presentation original
Offsite presentation originalOffsite presentation original
Offsite presentation originalsally.de
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Efficient & effective data management for research projects : ILRI's Data Ma...
Efficient & effective  data management for research projects : ILRI's Data Ma...Efficient & effective  data management for research projects : ILRI's Data Ma...
Efficient & effective data management for research projects : ILRI's Data Ma...CIARD Movement
 
Informix - The Ideal Database for IoT
Informix - The Ideal Database for IoTInformix - The Ideal Database for IoT
Informix - The Ideal Database for IoTPradeep Natarajan
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...DataStax
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseDataStax
 
Harness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeHarness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeSaurabh K. Gupta
 
How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018ARDC
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3Simon Ambridge
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesDATAVERSITY
 
Tuning Java Driver for Apache Cassandra
Tuning Java Driver for Apache CassandraTuning Java Driver for Apache Cassandra
Tuning Java Driver for Apache CassandraNenad Bozic
 
Historic Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's HistorianHistoric Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's HistorianInductive Automation
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCAbhijit Kumar
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreHPCC Systems
 
BDE Technical Webinar 1 : Requirements elicitation
BDE Technical Webinar 1 : Requirements elicitationBDE Technical Webinar 1 : Requirements elicitation
BDE Technical Webinar 1 : Requirements elicitationBigData_Europe
 
How Do You Build Data Pipelines that Are Agile, Automated, and Accurate?
How Do You Build Data Pipelines that Are Agile, Automated, and Accurate?How Do You Build Data Pipelines that Are Agile, Automated, and Accurate?
How Do You Build Data Pipelines that Are Agile, Automated, and Accurate?Precisely
 
Digital Archiving and Process Centralization for Financial and Insurance Comp...
Digital Archiving and Process Centralization for Financial and Insurance Comp...Digital Archiving and Process Centralization for Financial and Insurance Comp...
Digital Archiving and Process Centralization for Financial and Insurance Comp...soft Xpansion GmbH & Co. KG
 

Similar a Southwest Power Pool big data case study (20)

Offsite presentation original
Offsite presentation originalOffsite presentation original
Offsite presentation original
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Efficient & effective data management for research projects : ILRI's Data Ma...
Efficient & effective  data management for research projects : ILRI's Data Ma...Efficient & effective  data management for research projects : ILRI's Data Ma...
Efficient & effective data management for research projects : ILRI's Data Ma...
 
Informix - The Ideal Database for IoT
Informix - The Ideal Database for IoTInformix - The Ideal Database for IoT
Informix - The Ideal Database for IoT
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
Kanthaka - High Volume CDR Analyzer
Kanthaka - High Volume CDR AnalyzerKanthaka - High Volume CDR Analyzer
Kanthaka - High Volume CDR Analyzer
 
Harness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeHarness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data Lake
 
How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 
Tuning Java Driver for Apache Cassandra
Tuning Java Driver for Apache CassandraTuning Java Driver for Apache Cassandra
Tuning Java Driver for Apache Cassandra
 
Historic Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's HistorianHistoric Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's Historian
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDC
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
BDE Technical Webinar 1 : Requirements elicitation
BDE Technical Webinar 1 : Requirements elicitationBDE Technical Webinar 1 : Requirements elicitation
BDE Technical Webinar 1 : Requirements elicitation
 
How Do You Build Data Pipelines that Are Agile, Automated, and Accurate?
How Do You Build Data Pipelines that Are Agile, Automated, and Accurate?How Do You Build Data Pipelines that Are Agile, Automated, and Accurate?
How Do You Build Data Pipelines that Are Agile, Automated, and Accurate?
 
Digital Archiving and Process Centralization for Financial and Insurance Comp...
Digital Archiving and Process Centralization for Financial and Insurance Comp...Digital Archiving and Process Centralization for Financial and Insurance Comp...
Digital Archiving and Process Centralization for Financial and Insurance Comp...
 

Último

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 

Último (20)

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 

Southwest Power Pool big data case study

  • 1. First Steps Towards a Data Lake: Insight from Southwest Power Pool Srinivas Kolluru and Russell Mason 10/26/2016
  • 3. • Data stored in historical data store for data analysis and reporting • Near real-time data feeds • ETL data feeds • More data than initial expectations • Almost 50% of data is less frequently used 3 Background 9/23/2016World of Watson 2016
  • 4. Near-term requirements – Offload less frequently used data • SQL query access to the data • Minimal changes to the business queries Long-term vision – Data Lake infrastructure • ETL and real-time data feeds • Transactional and analytics workloads • Cost economical model • Should be able to add compute and storage incrementally • Long-term partnership 4 Proof-of-Concept (PoC) Evaluation Criteria 9/23/2016World of Watson 2016
  • 5. • Evaluated three different vendor technologies • On-site PoC with three months of data and with a sample of business queries • Six week evaluation of each vendor technology • Chose BigInsights product • Minimal business query modifications • Support for Netezza functions • Federated query capabilities between Netezza and BigInsights • Partnership 5 Proof-of-Concept (PoC) Evaluation 9/23/2016World of Watson 2016
  • 6. Data Lake Vision and Phase 1 Implementation
  • 7. 7 Data Lake Vision 9/23/2016World of Watson 2016 Data Zone ETL Data DB Data Streaming API Data BLOB Data Metadata SecurityQuality GovernanceMonitoring Landing Zone Data Extraction Data Refinement Data Discovery Data Analytics Business Users IT Users
  • 8. • Phase 1 – Offload less frequently used data, provide SQL query, and federated query capabilities. • Phase 2 – Query performance improvements, transactional capabilities, and security controls • Phase 3 – ETL data, streaming data, and governance • Phase 4 – Real-time and BI analytics 8 Data Lake Implementation 9/23/2016World of Watson 2016
  • 9. 9 Phase1 - Data Offload Process 9/23/2016World of Watson 2016 Landing Zone ORCORC Data Zone Data export & CRC check validation ORC table import and CRC check validation Metadata Security Monitoring Source System :
  • 10. • Custom export process instead of using Fluid query or Scoop for data transfer • Fluid query • Additional burden on source system • Scoop • Performance challenges with JDBC approach • Limitations with incremental data pulls with external table approach • Limitations with using database views • Difficulty with data validation checks • Difficulty with partitioning data based on record’s last update date • Operational controls to minimize the impact on the source system 10 Architecture and Design Decisions 9/23/2016World of Watson 2016
  • 11. • Data Integrity • Row level CRC checks with data export and import processes • ORC format • Monthly data partitioning • Seen better compression with ORC format compared to Parquet (Haven’t compared the query performance with large sets of data) • Separate HDFS storage cluster • Scale storage independent of compute • Align with the IT strategy and processes • Expose the same set of files through multiple file system protocols 11 Architecture and Design Decisions 9/23/2016World of Watson 2016
  • 12. 12 Phase 1 Deployment Model 9/23/2016World of Watson 2016 Node #1 Node #2 Node #3 Node #4 Node #5 Hadoop Postgres Metadata Database IBM BigInsights Cluster Node #1 Node #2 Node #3 HDFS Storage Cluster BigInsights Component Deployment (Tentative) Node #1 Zookeeper Server, Hive Metastore, HiveServer2, Knox Gateway, WebHCat Server, BigInsights Home Server, BigSheets Master, BigSQL worker, Metrics Monitoring, and Node Manager. Node #2 Zookeeper Server, History Server, Hive Metastore, HiveServer2, Metrics Collector, Active Resource Manager, WebHCat Server, Big SQL Worker, Metrics Collector, and Node Manager. Node #3 Zookeeper Server, App Timeline Server, Active Hbase Master, Hive Metastore, HiveServer2, Oozie Server, Standby Resource Manager, WebHCat Server, Big SQL Worker, Metrics Monitor, and Node Manager. Node #4 Big SQL Head, Data Server Manager, Hive Metastore, HiveServer2, WebHCat Server, Metrics Monitor, and Node Manager. Node #5 BigSQL Secondary Head, Zookeeper Server, Hive Metastore, HiveServer2, WebHCat Server, Region Server, HBaseREST Server, Metrics Monitor, and Node Manager.
  • 13. 13 Phase 1 Production Deployment Model 9/23/2016World of Watson 2016 Node #1 Node #2 Node #3 Node #4 Node #5 IBM BigInsights Cluster Hadoop Postgres Metadata Database Node #1 Node #2 Node #3 HDFS Storage Cluster Node #1 Node #2 Node #3 Node #4 Node #5 IBM BigInsights Cluster Hadoop Postgres Metadata Database Node #1 Node #2 Node #3 HDFS Storage Cluster Data Center 1 Data Center 2
  • 14. • Active-Active deployment across the data centers • Evaluating IBM Big Replicate product • Training for support teams • Day light savings challenges 14 Challenges 9/23/2016World of Watson 2016