DataWorks Summit 2017 - Sydney Keynote
Madhu Kochar, Vice President, Analytics Product Development and Client Success, IBM
Data science holds the promise of transforming businesses and disrupting entire industries. However, many organizations struggle to deploy and scale key technologies such as machine learning and deep learning. IBM will share how it is making data science accessible to all by simplifying the use of a range of open source technologies and data sources, including high performing and open architectures geared for cognitive workloads.
2. Operationalizing Machine Learning and getting actionable insights has been a huge challenge
Organizations need to act fast
ACT NOW!
Operationalize Machine Learning
Data still lives in silos
IBM Db2
3. Business Objective:
Drive top-line growth and market share
Optimize Real-Time Marketing (RTM) and improve Return On Investment (ROI)
Let’s meet Amy, who works for Outdoor Equipment Inc.
Amy, Marketing Director
Company: Outdoor Equipment Inc. is a full-line sporting goods retailer
4. Amy wants to run a sales campaign aimed at targeted customers to increase the organization’s revenue
Sleeping Bags
Camping Chairs and Bedding
5. Amy needs to work with different teams who perform specific tasks to execute the campaign
Amy, Marketing Director
Chris, Data Engineer
Ryan, Data Scientist
Nick, Application Developer
Product details
Customer details
Sales campaign
Operationalize Machine Learning
6. With Big SQL, Amy’s team can self-serve their requirements, save time on execution, and enhance productivity
Self Service
IBM Big SQL
Federation
Application Integration
Spark Integration
Chris, Data Engineer
Ryan, Data Scientist
Nick, Application Developer
7. Big SQL Key Capabilities
Federation and Spark: Relational Databases, NoSQL, Object Stores
Performance: Leads performance metrics on high volumes of data and concurrent streams
Enterprise and Security: Role and Column level Security, Ranger Integration
SQL Compatibility
8. PROCESSING
DATA
STORAGE
ACCESS
Hortonworks
IBM Power Systems
IBM Elastic Storage Server
IBM Big SQL
3x Price-Performance Guaranteed
Get more performance with Power Systems
9. Find New Business Opportunities or Solve Business Problems using Big SQL
How do I get started?
Big SQL sandbox
Big SQL v5.0.1 NOW available on HDP v2.6.2
Try NOW!
10. Thank you!
Check out the Breakout sessions
1) Scaling Data Science on Big Data
Date: Wed, 9/20 @ 11:00 AM
Room: C2.3
2) Ingesting Data at Blazing Speed using Apache ORC
Date: Wed, 9/20 @ 4:20 PM
Room: C4.7
3) Open metadata and governance with Apache Atlas
Date: Wed, 9/20 @ 5:10 PM
Room: C4.6
4) Empowering YOU with Democratized Data Access, Data Science and Machine Learning
Date: Wed, 9/20 @ 6:00 PM
Room: C4.5
5) Breaching the 100TB mark with SQL over Hadoop
Date: Thurs, 9/21 @ 2:20 PM
Room: C2.3
6) Birds-of-a-Feather: Apache Spark, Apache Zeppelin and Data Science
Date: Thurs, 9/21 @ 6:00 PM
Room: C4.5
Visit IBM Booth for More Information!
11. Find more #DWS17 sessions and
slides at:
www.DataWorksSummit.com
Organizations understand the importance of machine learning and are exploring ways to implement it to improve their business. However, every line of business faces the challenge of finding the best way to operationalize machine learning. Data gravity creates silos in the organization, and it is a challenge to bring all this data together for analysis.
Even if the data can be brought together, applying an ML model to it requires a special set of skills and development effort. After operationalizing the machine learning model, businesses want to act on the discovered insights. These actions vary widely and demand their own integration and development effort.
Businesses can’t be agile and act swiftly on data unless these problems are tied together and addressed with a self-service tool.
Let’s meet Amy, who works for Outdoor Equipment Inc. Outdoor Equipment Inc. is a full-line sporting goods retailer. Amy is the Marketing Director of this organization. As an exec, her business objectives are to grow the business and her organization’s market share. She plans to achieve these goals through real-time marketing and improved ROI.
Based on competitive analysis, market trends, and customer behavior, Amy’s team has concluded that a prospective customer may convert into a paying customer if given the proper incentive to shop. This key finding motivated Amy to come up with a sales campaign that sends product promotions to targeted users based on their interest in products. Amy is a well-informed exec and understands the power of data science. She has decided to leverage it to get maximum ROI. She wants to put the right incentives in the hands of the right customers to convert them.
She has put together a plan to run a sales campaign for 3 months with a variety of products that are available in the store.
Amy has to work with different teams, each performing a specific set of tasks, in order to execute the marketing campaign.
Chris is the data engineer who unifies the data that lives on different data platforms such as Hadoop, Db2, and other RDBMSs. Chris pulls all the data together into a single platform so that it can be used to operationalize the machine learning model and get predictions.
Ryan is the data scientist who creates the ML model based on Amy’s requirements; the model recommends the product category a customer is likely to be interested in.
Once Chris has applied the ML model created by Ryan, they have a result set of customers and their interests.
Finally, Nick integrates the results with the Mailgun app to send emails with product promotions to the targeted customers.
This repeats every day as the product promotions are refreshed, and the promotions are especially extensive during seasonal sales.
With Big SQL, Amy’s team can become self-sufficient in operationalizing the assets on a regular basis once Ryan and Nick have created them.
Amy’s team can leverage Big SQL’s federation capabilities to connect to and query data stored in separate data sources, securely, as set up by Chris. Chris no longer needs to ingest the data and bring it into a single location. With federation and predicate pushdown, only the data that matters travels over the wire.
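To make predicate pushdown concrete, here is a minimal Python sketch. It is not Big SQL code: an in-memory SQLite database stands in for a remote source, and the `customers` table, its rows, and both helper functions are made up for illustration. It only shows the general idea that sending the filter to the source reduces the number of rows that have to be transferred.

```python
import sqlite3

# Stand-in for a remote data source (hypothetical table and rows).
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE customers (id INTEGER, region TEXT, interest TEXT)")
remote.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "NSW", "camping"),
        (2, "VIC", "fishing"),
        (3, "NSW", "camping"),
        (4, "QLD", "cycling"),
    ],
)


def fetch_without_pushdown(conn, interest):
    """Pull every row from the source, then filter on the caller's side."""
    rows = conn.execute("SELECT id, region, interest FROM customers").fetchall()
    return len(rows), [r for r in rows if r[2] == interest]


def fetch_with_pushdown(conn, interest):
    """Send the predicate to the source so only matching rows are transferred."""
    rows = conn.execute(
        "SELECT id, region, interest FROM customers WHERE interest = ?",
        (interest,),
    ).fetchall()
    return len(rows), rows


moved_all, result_all = fetch_without_pushdown(remote, "camping")
moved_pushed, result_pushed = fetch_with_pushdown(remote, "camping")

# Both approaches return the same customers; they differ in rows transferred.
print("rows transferred without pushdown:", moved_all)
print("rows transferred with pushdown:", moved_pushed)
```

Both calls return the same two camping customers; the difference is how many rows leave the source before the filter is applied.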
With Big SQL and Spark integration, Amy’s team can operationalize Spark ML models without knowing the details of how Spark works or which Spark APIs are involved.
Finally, Amy’s team can push out the discounted sales promotions that are refreshed every day to the customers by leveraging Big SQL’s capability to call applications developed by users.
Technical meaning: the application is wrapped as a UDF (user-defined function) and can be invoked by Big SQL.
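As a rough illustration of the "application wrapped as a UDF" idea, the Python sketch below registers an ordinary function with an in-memory SQLite database and then calls it from a SQL statement. SQLite stands in for the SQL engine here; this is not Big SQL's UDF syntax, and `send_promotion`, the `predictions` table, and the addresses are hypothetical. No email is sent; messages are only appended to a list.

```python
import sqlite3

outbox = []  # stands in for the email service; nothing is actually sent


def send_promotion(email, category):
    """Hypothetical application entry point: queue one promotional email."""
    outbox.append((email, f"Today's offers on {category}"))
    return 1  # 1 = one message queued


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call it like a built-in.
conn.create_function("send_promotion", 2, send_promotion)

# Hypothetical model output: one predicted product category per customer.
conn.execute("CREATE TABLE predictions (email TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?)",
    [
        ("a@example.com", "Sleeping Bags"),
        ("b@example.com", "Camping Chairs"),
    ],
)

# A single SQL statement invokes the application once per predicted customer.
queued = conn.execute(
    "SELECT SUM(send_promotion(email, category)) FROM predictions"
).fetchone()[0]

print("messages queued:", queued)
for recipient, subject in outbox:
    print(recipient, "->", subject)
```

The point of the design is that the analyst only writes the SQL statement; the application logic stays behind the function name.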
Let me show you in a demo how Eric, a marketing analyst who works for Amy, is able to operationalize this whole effort in just a couple of SQL statements.
By using Big SQL, Amy’s team is more self-reliant in executing the marketing campaign because it ties all these separate tasks together in a single tool. Amy still works with Ryan and Nick, but only when she needs changes to the assets.
After that exciting demo, I would like to summarize how Big SQL can help make your teams more productive and improve your business.
Big SQL understands different SQL dialects, so you can leverage your existing Oracle and Netezza skills to build applications on Big SQL, or bring enterprise workloads to the Hadoop platform and run them as is, without any change.
Big SQL can access remote databases and push queries down to these federated data sources. Big SQL integrates with Spark bi-directionally, in memory, to exchange data between Big SQL data sets and Spark DataFrames. This lets Big SQL call any Spark application and operationalize Spark ML models with enterprise data.
Big SQL exhibits high performance on complex SQL queries even when data scales up to 100TB. It comes with a workload manager that lets the enterprise tune resource allocation and workloads. Big SQL also has a proven track record of supporting many concurrent users without degrading performance.
Big SQL comes enterprise-ready with built-in security features and also integrates with Apache Ranger for centralized management of your Hadoop environment.
Details:
SQL COMPATIBILITY
SQL compatible with Netezza, Oracle, Db2, etc.
Applications work as is, without any changes
FEDERATION AND SPARK
Federates to more than 10 data sources: RDBMS, NoSQL and/or Object Stores
Integrates bi-directionally with Spark, like no other
Operationalizes ML models
PERFORMANCE
Exhibits high performance even when data scales up to 100TB with complex SQL queries
Handles many concurrent users without degrading performance
ENTERPRISE & SECURITY
Secures data using SQL with roles
Integrates with Ranger for centralized management
We have some very exciting sessions lined up for you at this conference. Please attend them to learn more. If you have questions about the demo or need more information, please visit us at the IBM booth in the expo hall.