SlideShare una empresa de Scribd logo
1 de 22
Descargar para leer sin conexión
Rajesh Bordawekar
IBM T. J. Watson Research Center
bordaw@us.ibm.com
Cognitive Database: An Apache
Spark-Based AI-Enabled
Relational Database System
#AI5SAIS
Outline
• Word Embedding Overview
• Cognitive Database Design
• Cognitive Intelligence (CI) Queries
• Spark Implementation Details
• Case Study: Image and Text Database
• Summary
2#AI5SAIS
#AI5SAIS
Word Embedding Overview
• Unsupervised neural network based NLP approach to
capture meanings of words based neighborhood context
– Meaning is captured as collective contributions from words in the
neighborhood
• Generates semantic representation of words as low-
dimensional vectors (200-300 dimensions)
• Semantic similarity measured using distance metric
(e.g., cosine distance) between vectors
3#AI5SAIS
#AI5SAIS
Cognitive Database Key Ideas
• Uses dual view of relational data: tables and
meaningful text, with all relational entities mapped to
text, without loss of information
• Uses word-embedding approach to extract latent
information from database tables
• Classical Word embedding model extended to capture
constraints of the relational model (e.g., primary keys)
• Enables relational databases to capture and exploit
semantic contextual similarities
4#AI5SAIS
#AI5SAIS
Structured Data sources
Relational Tables
Word Embedding
Model
External
Unstructured
and
Structured
Data sources
Cognitive Intelligence Queries
in Structured Query Systems
Structured Results
Relational Tables
Pre-trained
Model
Model built from data
source being queried
Using Embedding Models
#AI5SAIS
#AI5SAIS
Cognitive Database Features
• Enables SQL-based information retrieval based on
semantic context, rather than, data values
• Unlike analytics databases, does not view database tables
as feature and model repositories
• Latent features exposed to users via standard SQL based
Cognitive Intelligence (CI) queries
• Users can invoke standard SQL queries using typed
relational variables over a semantic model built over
untyped strings
6#AI5SAIS
#AI5SAIS
7#AI5SAIS
#AI5SAIS
custID Date Merchant State Category Items Amount
custA 9/16 Whole Foods
NJ
Fresh Produce Bananas, Apples 200
custB
custC
custD
10/16
10/16
9/16
Target
Trader Joes
Walmart
Stationery
Fresh Produce
Stationery
Bananas, Oranges
Crayons, Pens, Notebooks
Crayons, Folders
60
80
25
NY
CT
NY
“custD 9/16 Walmart NY Stationery ‘Crayons, Folders’ 25”
Text representation of a table row
Words in the neighborhood contribute to the overall meaning of “custID”
custA
custC
custB
custD
For this relational view,
custA is similar to custC
custB is similar to custD
Words similar in meaning
closer in vector space
Customer Analytics Workload
Meaning vector
for every token
Cognitive Intelligence Queries
• Semantic Similarity/Dissimilarities
• Semantic Clustering
• Cognitive OLAP queries
• Inductive Reasoning queries
• Semantic Relational Operations
8#AI5SAIS
#AI5SAIS
Can work with externally trained models and over
multiple data types.
CI Query Example
9#AI5SAIS
#AI5SAIS
val result_df = spark.sql(s”””
SELECT VENDOR_NAME,
proximityCust_NameUDF(VENDOR_NAME, ‘$v’)
AS proximityValue FROM Index_view
HAVING proximityValue > 0.5
ORDER BY proximityValue DESC
”””)
CI similarity Query: Find similar entities to a given entity
(VENDOR_NAME) based on transaction characteristic similarities
Cognitive UDF
• Operates on relational
variables. Can be sets or
sequences
• For each input variable,
fetches vectors from the
embedding model
• Computes semantic
similarity between vectors
using nearest neighbor
approaches
Cognitive Database Applications
• Analysis over multi-modal data (Retail, Health,
Insurance)
• Entity similarity queries (Customer Analytics, IT
Ticket Management, Time-series)
• Cognitive OLAP (Finance, Insurance…)
• Entity Resolution (Master Data Management)
• Analysis of time-series data (IoT, Health)
10#AI5SAIS
#AI5SAIS
Cognitive Databases Stages
VectorDomain
Learned Vectors
Pre-computed
External Learned
Vectors
UDFs
External Text
Sources
Tokenized
Relations
Relational
Tables
Relational
System Tables
CI Queries Relations
Cognitive ETL Vector Storage Query Execution
RelationalTextDomain
#AI5SAIS
#AI5SAIS
Training from source database
Relational
Tables
Data
Cleaning
Training Text File
Word Embedding
Training
Word Embedding
Model
Create unique tokens
(Python)
k-means clustering
(Numpy/Scipy)
Create unique tokens
(Python)
text numerical values images
Get image tags
(Watson VRS)
Create image features
Window size
Vector Dimensions
Hyperparameter Tuning*
#AI5SAIS
#AI5SAIS
Why Spark?
2.2.0 Dataframes-based Representation
Spark SQL based Cognitive Intelligence Queries
(IBM Z zOS/zLinux, IBM P Linux,AIX, x86)
Relational Databases CSV Files …. JSON
Database Community Data Science Community
Spark SQL+UDFs
(Scala/Python)
PySpark/Pandas APIs
via Jupyter
…..
GPU
Acceleration
Flexibility over
multiple input
data formats
Portability
across multiple
platforms/OS
Support for
Standardized
SQL Queries
Usability
across multiple
user domains
Opportunities for
Acceleration
#AI5SAIS
#AI5SAIS
14#AI5SAIS
#AI5SAIS
Cognitive Database: Spark Execution Flow
SELECT X.custID, X.custName,
proximityAvg(X.InvestType,Y.InvestType)
FROM cust X, cust Y
WHERE Y.custID=‘471’ AND
proximityAvg(X.InvestType,Y.InvestType)
LIMIT 5
SQL Query
Similarity
Computation
Output TableTrained ModelInput Table
Spark SQL UDF
Nearest Neighbor
Spark
DF
Spark DF
Specialized
Word Embedding
Spark
SQL
Source
Data
Spark
DF
Invoking Cognitive Database in Jupyter
#AI5SAIS
#AI5SAIS
16#AI5SAIS
#AI5SAIS
Picture ID National Park Country Path of JPEG Image
PK_01 Corbett India ./Img_Folder/Img_01.JPEG
PK_05 Kruger South Africa ./Img_Folder/Img_05.JPEG
PK_09 Sunderbans India ./Img_Folder/Img_09.JPEG
PK_11 Serengeti Tanzania ./Img_Folder/Img_11.JPEG
Picture
Id
Image Id National Park Country Animal Name Class Dietary Habit color
PK_01 Img_01.JPEG Corbett India Elephant Mammal Herbivores Gray
PK_05 Img_05.JPEG Kruger South Africa Rhinoceros Mammal Herbivores Gray
PK_09 Img_09.JPEG Sunderbans India Crocodile Reptile Carnivorous Gray
PK_11 Img_11.JPEG Serengeti Tanzania Lion Mammal Carnivorous Yellow
Case Study: Application Database with links to images
Internal Training database with features extracted from linked images
The above merged data is used as an input to train the word embedding model that generates embeddings
of each unique token based on the neighborhood. Each row of the database is viewed as a sentence.
17#AI5SAIS
#AI5SAIS
CI Semantic Clustering Query: Find all images whose similarity to user chosen images of [lion,
vulture, shark] using the attributeSimAvg UDF with similarity score greater than 0.75
SELECT X.imagename, X.classA, X.classB, X.classC, X.classD,
FROM ImageDataTable X
WHERE (X.imagename <> ’n01314663_7147.jpeg’) AND (X.imagename <> ’n01323781_13094.jpeg’) AND
(X.imagename <> ’n01314663_8531.jpeg’) AND
(attributeSimAvgUDF(’n01314663_7147.jpeg’, ’n01323781_13094.jpeg’, ’n01314663_8531.jpeg’, X.imagename) > 0.75)
18#AI5SAIS
#AI5SAIS
X.Imagename X.classB X.classC X.classD
n01604330_12473 bird_of_prey,
mammal
new_world_vulture,
carnivore
andean_condor, condor, sloth_bear
n01316422_1684 mammal,
bird_of_prey
carnivore, eagle glutton_wolverine, piste_ski_run,
downhill_skiing, ern, ski_slope
n01324431_7056 bird_of_prey,
mammal
new_world_vulture,
carnivore
andean_condor, tayra
n01604330_12473 n01316422_1684
Output
n01324431_7056
19#AI5SAIS
#AI5SAIS
CI Analogy Query: Find all images whose classD satisfies the analogy query [reptile:
monitor_lizard :: aquatic_vertebrate : ?] using analogyQuery UDF having similarity score
greater than 0.5.
SELECT X.imagename, X.classA, X.classB, X.classC, X.classD
FROM ImageDataTable X
WHERE (analogyQuery(’reptile’,’monitor_lizard’,’aquatic_vertebrate’,X.classD,1) > 0.5)
X.Imagename X.classB X.classC X.classD
n02512053_1493 aquatic_vertebrate spiny_finned_fish permit, archerfish
n02512053_3292 aquatic_vertebrate spiny_finned_fish archerfish, mojarra
n02512053_602 aquatic_vertebrate spiny_finned_fish lookdown, permit
20#AI5SAIS
#AI5SAIS
CI Query using external knowledge base: Find all images of animals whose classD similarity score
to the Concept of ‘‘Hypercarnivore" of Wikipedia using proximityAvgForExtKB UDF is greater than
0.5. Exclude images that are already tagged as carnivore, herbivore, omnivore or scavenger.
SELECT X.imagename,X.classA,X.classB,X.classC, X.classD
FROM ImageDataTable X
WHERE
(proximityAvgAdvForExtKB(’CONCEPT_Hypercarnivore’,
X.classD) > 0.5)
ORDER BY SimScore DESC
Summary
• Novel relational database system that uses word
embedding approach to enable semantic queries in SQL
• Spark-based implementation that loads data from a
variety of sources and invokes Cognitive Intelligence
queries using Spark SQL
• Demonstration of the cognitive database capabilities
using a multi-modal (text+image) dataset
• Illustration of seamlessly integrating AI capabilities into
relational database ecosystem
21#AI5SAIS
#AI5SAIS
References
• Bordawekar and Shmueli, Enabling Cognitive Intelligence Queries in
Relational Databases using Low-dimensional Word Embeddings,
arXiv:1603.07185, March 2016
• Bordawekar, Bandopadhyay, and Shmueli, Cognitive Database: A Step
Towards Endowing Relational Databases with Artificial Intelligence
Capabilities, arXiv:1712:07199, December 2017
22#AI5SAIS
#AI5SAIS

Más contenido relacionado

La actualidad más candente

Azure Data services
Azure Data servicesAzure Data services
Azure Data servicesRajesh Kolla
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenMS Cloud Summit
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Con LA
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Lace Lofranco
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseGrant Fritchey
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure Antonios Chatzipavlis
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeRick van den Bosch
 
Azure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginnersAzure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginnersMichaela Murray
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
Zero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBZero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBAdnan Hashmi
 
Introduction to azure cosmos db
Introduction to azure cosmos dbIntroduction to azure cosmos db
Introduction to azure cosmos dbRatan Parai
 
Microsoft Azure Data Warehouse Overview
Microsoft Azure Data Warehouse OverviewMicrosoft Azure Data Warehouse Overview
Microsoft Azure Data Warehouse OverviewJustin Munsters
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
A developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksA developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksMicrosoft Tech Community
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...Amazon Web Services
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAlex Tumanoff
 
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...Microsoft Tech Community
 

La actualidad más candente (20)

Azure SQL Data Warehouse
Azure SQL Data Warehouse Azure SQL Data Warehouse
Azure SQL Data Warehouse
 
An intro to Azure Data Lake
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data Lake
 
Azure Data services
Azure Data servicesAzure Data services
Azure Data services
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data Lake
 
Azure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginnersAzure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginners
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
Zero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBZero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DB
 
Introduction to azure cosmos db
Introduction to azure cosmos dbIntroduction to azure cosmos db
Introduction to azure cosmos db
 
Microsoft Azure Data Warehouse Overview
Microsoft Azure Data Warehouse OverviewMicrosoft Azure Data Warehouse Overview
Microsoft Azure Data Warehouse Overview
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
A developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksA developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure Databricks
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
 

Similar a Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System with Rajesh Bordawekar

Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.GeeksLab Odessa
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API'sNatalino Busa
 
Machine Learning with Microsoft Azure
Machine Learning with Microsoft AzureMachine Learning with Microsoft Azure
Machine Learning with Microsoft AzureDmitry Petukhov
 
TAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with RTAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with RFayan TAO
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
AZMS PRESENTATION.pptx
AZMS PRESENTATION.pptxAZMS PRESENTATION.pptx
AZMS PRESENTATION.pptxSonuShaw16
 
7 Databases in 70 minutes
7 Databases in 70 minutes7 Databases in 70 minutes
7 Databases in 70 minutesKaren Lopez
 
Overview of running R in the Oracle Database
Overview of running R in the Oracle DatabaseOverview of running R in the Oracle Database
Overview of running R in the Oracle DatabaseBrendan Tierney
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0Keshav Murthy
 
Signal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide RowsSignal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide RowsDataStax Academy
 
Conquer Big Data with Oracle 18c, In-Memory External Tables and Analytic Func...
Conquer Big Data with Oracle 18c, In-Memory External Tables and Analytic Func...Conquer Big Data with Oracle 18c, In-Memory External Tables and Analytic Func...
Conquer Big Data with Oracle 18c, In-Memory External Tables and Analytic Func...Jim Czuprynski
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADtab0ris_1
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
 
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScyllaDB
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0Russell Jurney
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedOmid Vahdaty
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQLMichael Rys
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersMichael Rys
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Beat Signer
 

Similar a Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System with Rajesh Bordawekar (20)

Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API's
 
Machine Learning with Microsoft Azure
Machine Learning with Microsoft AzureMachine Learning with Microsoft Azure
Machine Learning with Microsoft Azure
 
TAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with RTAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with R
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
AZMS PRESENTATION.pptx
AZMS PRESENTATION.pptxAZMS PRESENTATION.pptx
AZMS PRESENTATION.pptx
 
7 Databases in 70 minutes
7 Databases in 70 minutes7 Databases in 70 minutes
7 Databases in 70 minutes
 
Presentation
PresentationPresentation
Presentation
 
Overview of running R in the Oracle Database
Overview of running R in the Oracle DatabaseOverview of running R in the Oracle Database
Overview of running R in the Oracle Database
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0
 
Signal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide RowsSignal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide Rows
 
Conquer Big Data with Oracle 18c, In-Memory External Tables and Analytic Func...
Conquer Big Data with Oracle 18c, In-Memory External Tables and Analytic Func...Conquer Big Data with Oracle 18c, In-Memory External Tables and Analytic Func...
Conquer Big Data with Oracle 18c, In-Memory External Tables and Analytic Func...
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data Demystified
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Más de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 

Último (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 

Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System with Rajesh Bordawekar

  • 1. Rajesh Bordawekar IBM T. J. Watson Research Center bordaw@us.ibm.com Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System #AI5SAIS
  • 2. Outline • Word Embedding Overview • Cognitive Database Design • Cognitive Intelligence (CI) Queries • Spark Implementation Details • Case Study: Image and Text Database • Summary 2#AI5SAIS #AI5SAIS
  • 3. Word Embedding Overview • Unsupervised neural network based NLP approach to capture meanings of words based neighborhood context – Meaning is captured as collective contributions from words in the neighborhood • Generates semantic representation of words as low- dimensional vectors (200-300 dimensions) • Semantic similarity measured using distance metric (e.g., cosine distance) between vectors 3#AI5SAIS #AI5SAIS
  • 4. Cognitive Database Key Ideas • Uses dual view of relational data: tables and meaningful text, with all relational entities mapped to text, without loss of information • Uses word-embedding approach to extract latent information from database tables • Classical Word embedding model extended to capture constraints of the relational model (e.g., primary keys) • Enables relational databases to capture and exploit semantic contextual similarities 4#AI5SAIS #AI5SAIS
  • 5. Structured Data sources Relational Tables Word Embedding Model External Unstructured and Structured Data sources Cognitive Intelligence Queries in Structured Query Systems Structured Results Relational Tables Pre-trained Model Model built from data source being queried Using Embedding Models #AI5SAIS #AI5SAIS
  • 6. Cognitive Database Features • Enables SQL-based information retrieval based on semantic context, rather than, data values • Unlike analytics databases, does not view database tables as feature and model repositories • Latent features exposed to users via standard SQL based Cognitive Intelligence (CI) queries • Users can invoke standard SQL queries using typed relational variables over a semantic model built over untyped strings 6#AI5SAIS #AI5SAIS
  • 7. 7#AI5SAIS #AI5SAIS custID Date Merchant State Category Items Amount custA 9/16 Whole Foods NJ Fresh Produce Bananas, Apples 200 custB custC custD 10/16 10/16 9/16 Target Trader Joes Walmart Stationery Fresh Produce Stationery Bananas, Oranges Crayons, Pens, Notebooks Crayons, Folders 60 80 25 NY CT NY “custD 9/16 Walmart NY Stationery ‘Crayons, Folders’ 25” Text representation of a table row Words in the neighborhood contribute to the overall meaning of “custID” custA custC custB custD For this relational view, custA is similar to custC custB is similar to custD Words similar in meaning closer in vector space Customer Analytics Workload Meaning vector for every token
  • 8. Cognitive Intelligence Queries • Semantic Similarity/Dissimilarities • Semantic Clustering • Cognitive OLAP queries • Inductive Reasoning queries • Semantic Relational Operations 8#AI5SAIS #AI5SAIS Can work with externally trained models and over multiple data types.
  • 9. CI Query Example 9#AI5SAIS #AI5SAIS val result_df = spark.sql(s””” SELECT VENDOR_NAME, proximityCust_NameUDF(VENDOR_NAME, ‘$v’) AS proximityValue FROM Index_view HAVING proximityValue > 0.5 ORDER BY proximityValue DESC ”””) CI similarity Query: Find similar entities to a given entity (VENDOR_NAME) based on transaction characteristic similarities Cognitive UDF • Operates on relational variables. Can be sets or sequences • For each input variable, fetches vectors from the embedding model • Computes semantic similarity between vectors using nearest neighbor approaches
  • 10. Cognitive Database Applications • Analysis over multi-modal data (Retail, Health, Insurance) • Entity similarity queries (Customer Analytics, IT Ticket Management, Time-series) • Cognitive OLAP (Finance, Insurance…) • Entity Resolution (Master Data Management) • Analysis of time-series data (IoT, Health) 10#AI5SAIS #AI5SAIS
  • 11. Cognitive Databases Stages VectorDomain Learned Vectors Pre-computed External Learned Vectors UDFs External Text Sources Tokenized Relations Relational Tables Relational System Tables CI Queries Relations Cognitive ETL Vector Storage Query Execution RelationalTextDomain #AI5SAIS #AI5SAIS
  • 12. Training from source database Relational Tables Data Cleaning Training Text File Word Embedding Training Word Embedding Model Create unique tokens (Python) k-means clustering (Numpy/Scipy) Create unique tokens (Python) text numerical values images Get image tags (Watson VRS) Create image features Window size Vector Dimensions Hyperparameter Tuning* #AI5SAIS #AI5SAIS
  • 13. Why Spark? 2.2.0 Dataframes-based Representation Spark SQL based Cognitive Intelligence Queries (IBM Z zOS/zLinux, IBM P Linux,AIX, x86) Relational Databases CSV Files …. JSON Database Community Data Science Community Spark SQL+UDFs (Scala/Python) PySpark/Pandas APIs via Jupyter ….. GPU Acceleration Flexibility over multiple input data formats Portability across multiple platforms/OS Support for Standardized SQL Queries Usability across multiple user domains Opportunities for Acceleration #AI5SAIS #AI5SAIS
  • 14. 14#AI5SAIS #AI5SAIS Cognitive Database: Spark Execution Flow SELECT X.custID, X.custName, proximityAvg(X.InvestType,Y.InvestType) FROM cust X, cust Y WHERE Y.custID=‘471’ AND proximityAvg(X.InvestType,Y.InvestType) LIMIT 5 SQL Query Similarity Computation Output TableTrained ModelInput Table Spark SQL UDF Nearest Neighbor Spark DF Spark DF Specialized Word Embedding Spark SQL Source Data Spark DF
  • 15. Invoking Cognitive Database in Jupyter #AI5SAIS #AI5SAIS
  • 16. 16#AI5SAIS #AI5SAIS Picture ID National Park Country Path of JPEG Image PK_01 Corbett India ./Img_Folder/Img_01.JPEG PK_05 Kruger South Africa ./Img_Folder/Img_05.JPEG PK_09 Sunderbans India ./Img_Folder/Img_09.JPEG PK_11 Serengeti Tanzania ./Img_Folder/Img_11.JPEG Picture Id Image Id National Park Country Animal Name Class Dietary Habit color PK_01 Img_01.JPEG Corbett India Elephant Mammal Herbivores Gray PK_05 Img_05.JPEG Kruger South Africa Rhinoceros Mammal Herbivores Gray PK_09 Img_09.JPEG Sunderbans India Crocodile Reptile Carnivorous Gray PK_11 Img_11.JPEG Serengeti Tanzania Lion Mammal Carnivorous Yellow Case Study: Application Database with links to images Internal Training database with features extracted from linked images The above merged data is used as an input to train the word embedding model that generates embeddings of each unique token based on the neighborhood. Each row of the database is viewed as a sentence.
  • 17. 17#AI5SAIS #AI5SAIS CI Semantic Clustering Query: Find all images whose similarity to user chosen images of [lion, vulture, shark] using the attributeSimAvg UDF with similarity score greater than 0.75 SELECT X.imagename, X.classA, X.classB, X.classC, X.classD, FROM ImageDataTable X WHERE (X.imagename <> ’n01314663_7147.jpeg’) AND (X.imagename <> ’n01323781_13094.jpeg’) AND (X.imagename <> ’n01314663_8531.jpeg’) AND (attributeSimAvgUDF(’n01314663_7147.jpeg’, ’n01323781_13094.jpeg’, ’n01314663_8531.jpeg’, X.imagename) > 0.75)
  • 18. 18#AI5SAIS #AI5SAIS X.Imagename X.classB X.classC X.classD n01604330_12473 bird_of_prey, mammal new_world_vulture, carnivore andean_condor, condor, sloth_bear n01316422_1684 mammal, bird_of_prey carnivore, eagle glutton_wolverine, piste_ski_run, downhill_skiing, ern, ski_slope n01324431_7056 bird_of_prey, mammal new_world_vulture, carnivore andean_condor, tayra n01604330_12473 n01316422_1684 Output n01324431_7056
  • 19. 19#AI5SAIS #AI5SAIS CI Analogy Query: Find all images whose classD satisfies the analogy query [reptile: monitor_lizard :: aquatic_vertebrate : ?] using analogyQuery UDF having similarity score greater than 0.5. SELECT X.imagename, X.classA, X.classB, X.classC, X.classD FROM ImageDataTable X WHERE (analogyQuery(’reptile’,’monitor_lizard’,’aquatic_vertebrate’,X.classD,1) > 0.5) X.Imagename X.classB X.classC X.classD n02512053_1493 aquatic_vertebrate spiny_finned_fish permit, archerfish n02512053_3292 aquatic_vertebrate spiny_finned_fish archerfish, mojarra n02512053_602 aquatic_vertebrate spiny_finned_fish lookdown, permit
  • 20. 20#AI5SAIS #AI5SAIS CI Query using external knowledge base: Find all images of animals whose classD similarity score to the Concept of ‘‘Hypercarnivore" of Wikipedia using proximityAvgForExtKB UDF is greater than 0.5. Exclude images that are already tagged as carnivore, herbivore, omnivore or scavenger. SELECT X.imagename,X.classA,X.classB,X.classC, X.classD FROM ImageDataTable X WHERE (proximityAvgAdvForExtKB(’CONCEPT_Hypercarnivore’, X.classD) > 0.5) ORDER BY SimScore DESC
  • 21. Summary • Novel relational database system that uses word embedding approach to enable semantic queries in SQL • Spark-based implementation that loads data from a variety of sources and invokes Cognitive Intelligence queries using Spark SQL • Demonstration of the cognitive database capabilities using a multi-modal (text+image) dataset • Illustration of seamlessly integrating AI capabilities into relational database ecosystem 21#AI5SAIS #AI5SAIS
  • 22. References • Bordawekar and Shmueli, Enabling Cognitive Intelligence Queries in Relational Databases using Low-dimensional Word Embeddings, arXiv:1603.07185, March 2016 • Bordawekar, Bandopadhyay, and Shmueli, Cognitive Database: A Step Towards Endowing Relational Databases with Artificial Intelligence Capabilities, arXiv:1712:07199, December 2017 22#AI5SAIS #AI5SAIS