METADATA AND THE POWER OF PATTERN-FINDING
MAY 24, 2016 FOR DATAVERSITY
LEON GUZENDA
Chief Technology Marketing Officer
AGENDA
• Who We Are
• Open Source Big & Fast Data Analytics
• Our Core Technology & New Product
• Pattern Finding Examples
• Q & A
OBJECTIVITY, INC.
OBJECTIVITY INC. OVERVIEW
• Private company, headquartered in Silicon Valley since 1988
• Verticals:
• Government: Intelligence, defense, crime detection & prevention
• Financial Services
• Industrial Internet of Things (IIoT)
• Energy
• Healthcare
• Horizontals:
• Graph analytics
• Complex, distributed, scalable database applications
SAMPLE CUSTOMERS AND PARTNERS
[Logo slide, grouped as:]
• Capital Intensive Customers
• Government Customers
• Telco & Network Customers
• Technology Partners
• SI Partners
OPEN SOURCE BIG & FAST DATA ANALYTICS
OPEN SOURCE ANALYTICS...
[Diagram: open source analytics stack. Surviving labels: "[Fall 2016]", "R", "Proprietary Rules, Ontologies, Queries...", "Reports, Archives...", "Workflow Design GUI", "Proprietary".]
...OPEN SOURCE ANALYTICS
PROS:
• Large community
• Lots of algorithms
• Model works at scale
• Low startup costs
• Cost effective
CONS:
• Most algorithms are based on statistical correlation, clustering or filtering
• Graph algorithms mainly tackle theoretical problems
• Hadoop mostly targets files, not metadata
• Metadata tools focus on technical parameters, not semantic content
APACHE SPARK GRAPHX API
• Vertex, Edge and Triplet operations
• Graph modification operations
• RDD join operations
• Adjacent triplet operations
• Iterative graph-parallel operations
• PageRank, connected components, triangle counts etc.
Spark GraphFrames add Motifs (a simple subgraph definition)
BUT
Efficient pathfinding and complex navigation are inhibited because of a table/triplet approach.
OUR CORE TECHNOLOGY
OUR FOCUS
• Complex Objects at scale:
• Relationships are first class citizens
• Ultra-fast navigation and pathfinding
• Not restricted by available RAM
• Scalability, performance, reliability and flexibility:
• Distributed database and distributed processing
• Light, small database kernel - from embedded to cluster to cloud
DISTRIBUTED DATA - SINGLE LOGICAL VIEW
Put the data and processing where it's needed
• 1,000s of trillions of unique objects
• 1,000s of petabytes of storage
• Resolving an ID is fast, regardless of the number of objects
DISTRIBUTED PROCESSING
Put the data and processing where it's needed
[Diagram labels: ThingSpan, Cache, Client Processes]
THINGSPAN
THINGSPAN ENVIRONMENT
THINGSPAN FEATURES
• Uses the Apache Spark open source processing engine
• In partnership with Cloudera, Databricks, Hortonworks and MapR
• Powerful object and relationship modeling
• Can store data in HDFS and/or POSIX file systems
• Ultra-fast graph navigation, pathfinding and pattern finding
• REST server and API for loading data and performing graph analytics
• Spark DataFrame support to leverage MLlib, GraphX, SQL etc.
DISTRIBUTED PROCESSING & DATABASE
Distributed from top to bottom
[Diagram label: Hadoop Distributed File System]
OPEN SOURCE ANALYTICS STACK
[Diagram: open source analytics stack. Surviving labels: "[Fall 2016]", "R", "Proprietary Rules, Ontologies, Queries...", "Reports, Archives...", "Workflow Design GUI", "Proprietary".]
THINGSPAN ENHANCED ANALYTICS STACK
[Later this year]
THINGSPAN COMPONENTS
PATTERN FINDING
PATTERN FINDING TECHNIQUES
• Conventional Business Intelligence Analytics: Uses filters and statistical correlation to find relationships between parameters.
• Graph Pattern Finding Analytics: Uses a combination of outlier, navigational and pathfinding queries.
• Find outliers with SQL or MLlib.
• A navigational query can specify Vertex and Edge types to be included/excluded and can invoke methods during the traversal, e.g. to compute transit time to a node.
• A pathfinding query can find the shortest path or all paths between two or more Vertices.
• The order of the query types depends upon the problem.
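A navigational query of the kind described above can be sketched in plain Python. This is not ThingSpan's API: the toy graph, the `navigate` function and its parameters are all hypothetical, chosen only to show type filtering plus a computation (transit time) carried along the traversal.

```python
from collections import deque

# Hypothetical toy graph: vertex id -> vertex type.
VERTEX_TYPE = {"SF": "City", "Reno": "City", "Depot9": "Depot", "SLC": "City"}

# Hypothetical directed edges: (source, edge type, target, transit hours).
EDGES = [
    ("SF", "Road", "Reno", 4.0),
    ("SF", "Rail", "SLC", 15.0),
    ("Reno", "Road", "SLC", 8.0),
    ("Reno", "Road", "Depot9", 1.0),
]


def navigate(start, allowed_edge_types, excluded_vertex_types=frozenset()):
    """Breadth-first traversal that filters by vertex/edge type.

    Returns {vertex: transit hours along the first path the traversal finds}.
    """
    hours = {start: 0.0}
    queue = deque([start])
    while queue:
        here = queue.popleft()
        for src, edge_type, dst, hop_hours in EDGES:
            if src != here or edge_type not in allowed_edge_types:
                continue
            if VERTEX_TYPE[dst] in excluded_vertex_types or dst in hours:
                continue
            # The "method invoked during traversal": accumulate transit time.
            hours[dst] = hours[here] + hop_hours
            queue.append(dst)
    return hours


# Follow only Road edges and never enter a Depot vertex.
reachable = navigate("SF", {"Road"}, excluded_vertex_types={"Depot"})
```

Breadth-first order records the time along the first path found, not necessarily the fastest one; choosing the best path is the job of the pathfinding query type.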
PATH-FINDING QUERY
• Problem: Find the least expensive route between San Francisco and New York for a 60 ton, very wide load that must arrive by Saturday, minimizing mode transitions (road/rail/water etc.)
• Implied: We can avoid Rail connections.
[Diagram: CITY vertices joined by LINK edges; each LINK has Mode, Duration and Cost.]
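The routing problem above can be sketched as a constrained path search. Everything below is an assumption made for illustration: the link table, the durations and costs, the 72-hour stand-in for "by Saturday", and the choice to rank by cost first and mode transitions second. It enumerates simple paths, which is fine for a toy graph but is not how a database would do it at scale.

```python
# Hypothetical links: (from city, to city, mode, duration in hours, cost in dollars).
LINKS = [
    ("San Francisco", "Denver", "road", 30, 5000),
    ("San Francisco", "Denver", "rail", 40, 3000),
    ("Denver", "Chicago", "road", 20, 4000),
    ("Chicago", "New York", "road", 16, 3500),
    ("Chicago", "New York", "water", 60, 2000),
    ("Denver", "New York", "road", 40, 9000),
]


def all_paths(src, dst, banned_modes, path=()):
    """Yield every simple path from src to dst as a tuple of links."""
    if src == dst:
        yield path
        return
    visited = {src} | {link[0] for link in path}
    for link in LINKS:
        start, end, mode, _, _ = link
        if start == src and end not in visited and mode not in banned_modes:
            yield from all_paths(end, dst, banned_modes, path + (link,))


def best_route(src, dst, max_hours, banned_modes):
    """Cheapest path within the deadline; ties broken by fewest mode changes."""
    def score(path):
        cost = sum(link[4] for link in path)
        changes = sum(1 for a, b in zip(path, path[1:]) if a[2] != b[2])
        return (cost, changes)

    feasible = [
        path for path in all_paths(src, dst, banned_modes)
        if sum(link[3] for link in path) <= max_hours
    ]
    return min(feasible, key=score) if feasible else None


# Rail is excluded, matching the "Implied" bullet; 72 hours is an assumed deadline.
route = best_route("San Francisco", "New York", max_hours=72, banned_modes={"rail"})
```

With this data the cheaper water leg is rejected because it misses the deadline, so the all-road route through Denver and Chicago is chosen.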
PATTERN FINDING EXAMPLES
• Financial: Money Laundering Detection
• Intelligence Analysis: Threat Detection
• AdTech: Recommendation Engine Support
• Industrial Internet of Things (IIoT): Network Congestion Analysis
FINANCIAL: MONEY LAUNDERING DETECTION
1. Load Person, Account and Transaction data into ThingSpan
[Diagram: people P1, P2, P3; accounts Acc 1, Acc 2, Acc 20 to Acc 24, Acc 31 to Acc 33, Acc 35; transactions marked $; house icons.]
FINANCIAL: APPLY SPARK GRAPHX
2. Identify people with more than 5 accounts (centrality)
[Diagram: the same people, accounts and transactions.]
FINANCIAL: APPLY A NAVIGATIONAL QUERY
3. Look at all of that person's transactions to see if they terminate in just 1 or 2 offshore accounts
4. INVESTIGATE
[Diagram: the same people, accounts and transactions.]
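The money laundering steps above can be sketched as follows. The ownership table loosely follows the diagram labels, but which person owns which account, every transfer, and the offshore accounts `Off 1` and `Off 2` are assumptions made up for this illustration; the function names are hypothetical too.

```python
from collections import defaultdict

# Step 1: hypothetical ownership data, loosely modelled on the slide's labels.
OWNS = {
    "P1": ["Acc 1", "Acc 2"],
    "P2": ["Acc 20", "Acc 21", "Acc 22", "Acc 23", "Acc 24", "Acc 35"],
    "P3": ["Acc 31", "Acc 32", "Acc 33"],
}
# Hypothetical transfers: (from account, to account).
TRANSFERS = [
    ("Acc 20", "Acc 21"), ("Acc 21", "Off 1"), ("Acc 22", "Off 1"),
    ("Acc 23", "Off 2"), ("Acc 24", "Off 2"), ("Acc 35", "Off 1"),
    ("Acc 1", "Acc 31"),
]
OFFSHORE = {"Off 1", "Off 2"}  # assumed labels, not on the slide


def terminal_accounts(start_accounts):
    """Follow transfers from the start accounts; return accounts with no outgoing transfer."""
    outgoing = defaultdict(list)
    for src, dst in TRANSFERS:
        outgoing[src].append(dst)
    seen, stack, terminals = set(), list(start_accounts), set()
    while stack:
        account = stack.pop()
        if account in seen:
            continue
        seen.add(account)
        next_accounts = outgoing.get(account, [])
        if next_accounts:
            stack.extend(next_accounts)
        else:
            terminals.add(account)
    return terminals


def suspects(more_than=5, max_sinks=2):
    """Steps 2 and 3: many accounts, all money ending in one or two offshore accounts."""
    flagged = {}
    for person, accounts in OWNS.items():
        if len(accounts) <= more_than:
            continue
        sinks = terminal_accounts(accounts)
        if sinks <= OFFSHORE and 1 <= len(sinks) <= max_sinks:
            flagged[person] = sinks
    return flagged


to_investigate = suspects()  # step 4 is left to a human investigator
```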
HUMINT: THREAT DETECTION
1. Load People, Calls, Places and Sightings into the Graph
[Diagram: people P1 to P3 and P5 to P18; call detail records CDR1 to CDR17; places PlaceX, PlaceY, PlaceZ; sightings Seen1 to Seen4.]
HUMINT: APPLY SPARK GRAPHX
2. Use Spark GraphX to find "islands" of callers/callees.
[Diagram: the same people, call records, places and sightings.]
HUMINT: APPLY A NAVIGATIONAL QUERY
3. Use a navigational query to see if any of those People have been seen near Places that need to be protected.
[Diagram: the same people, call records, places and sightings.]
HUMINT: PLAN ACTION
4. P14 and P15 have been seen near potential target PlaceX, so they plus P11, P7 and P8 should be put under surveillance.
[Diagram: the same people, call records, places and sightings.]
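The threat detection steps can be sketched with a connected-components pass followed by a lookup against sightings. The call and sighting pairs below are hypothetical, arranged only so that the outcome matches step 4; plain Python stands in for Spark GraphX and for the navigational query.

```python
from collections import defaultdict

# Hypothetical call records reduced to (caller, callee) pairs.
CALLS = [
    ("P1", "P2"), ("P2", "P3"),
    ("P7", "P8"), ("P8", "P11"), ("P11", "P14"), ("P14", "P15"),
]
# Hypothetical sightings: (person, place).
SIGHTINGS = [("P14", "PlaceX"), ("P15", "PlaceX"), ("P3", "PlaceY")]
PROTECTED = {"PlaceX"}


def islands(calls):
    """Step 2: connected components of the undirected caller/callee graph."""
    neighbours = defaultdict(set)
    for a, b in calls:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for person in neighbours:
        if person in seen:
            continue
        component, stack = set(), [person]
        while stack:
            current = stack.pop()
            if current not in component:
                component.add(current)
                stack.extend(neighbours[current] - component)
        seen |= component
        components.append(component)
    return components


def watch_list(calls, sightings, protected):
    """Steps 3 and 4: every member of an island with someone seen near a protected place."""
    seen_near = {person for person, place in sightings if place in protected}
    watched = set()
    for island in islands(calls):
        if island & seen_near:
            watched |= island
    return watched


surveillance = watch_list(CALLS, SIGHTINGS, PROTECTED)
```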
ADTECH: PRE-PLANNED ADS
1. Load Products, Orders, People and Social_Links into ThingSpan.
[Diagram: people Joe, Fred, Mary, Jane, Bill; products Pr1 to Pr6; orders Sale1 to Sale5; three Follows links.]
2. We want to place ads for Product Pr2.
ADTECH: WHO FOLLOWS BUYERS OF THE PRODUCT?
3. Use ThingSpan to find bloggers who bought Pr2 and who also have followers.
Result: Fred bought Pr2. Mary follows Fred's blogs. Jane & Bill follow Mary's.
ADTECH: DISPLAY THE AD
4. Next time you spot Mary, Jane or Bill, display a personalized Ad for Pr2.
[Ad graphic: "Buy 1!"]
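The AdTech query can be sketched as a walk up the Follows links starting from the buyers of a product. Fred's purchase of Pr2 and the three Follows links come from the slide's result line; the other sales and the `ad_targets` function are hypothetical.

```python
# Orders as (buyer, product). Only Fred/Pr2 is stated on the slide; the rest are assumed.
SALES = [("Joe", "Pr1"), ("Fred", "Pr2"), ("Mary", "Pr4"), ("Jane", "Pr5")]
# Social links as (follower, followed), taken from the slide's result line.
FOLLOWS = [("Mary", "Fred"), ("Jane", "Mary"), ("Bill", "Mary")]


def ad_targets(product):
    """People who follow a buyer of the product, directly or through other followers."""
    buyers = {person for person, item in SALES if item == product}
    targets, frontier = set(), set(buyers)
    while frontier:
        # Everyone following someone in the current frontier, minus people already handled.
        frontier = {
            follower for follower, followed in FOLLOWS if followed in frontier
        } - targets - buyers
        targets |= frontier
    return targets


pr2_audience = ad_targets("Pr2")
```

Buyers are excluded from the result on the assumption that they already own the product.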
IIOT: TELCO NETWORK CONGESTION
1. Load Location, Equipment, Link (+Load) into the graph
[Diagram: locations L1 to L4; cities San Jose, Salt Lake City, Chicago, New York; equipment E1 to E3, E20 to E22, E30 to E33, E40; Link 1 to Link 9 with loads of 20%, 20%, 95%, 65%, 20%, 50%, 30%, 25% and one link Off.]
IIOT: APPLY SPARK SQL
2. Use Spark SQL to find links that are over 90% loaded.
[Diagram: the same network and loads.]
IIOT: APPLY A THINGSPAN NAVIGATIONAL QUERY
3. Use a graph query to find the leaf nodes (branch ends)... Then Investigate...
[Diagram: the same network and loads.]
IIOT: DIAGNOSE
4. Aha! E2 and E3 in San Jose are streaming 8K UHDTV video movies from MovieFlix in New York, overloading Link 6.
[Diagram: the same network and loads.]
IIOT: FIX
5. Solved - by switching on Link 5.
[Diagram: the 95% load now reads 50%, and a 45% load appears next to Link 5.]
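The congestion steps can be sketched as a threshold filter followed by a lookup of the equipment whose traffic crosses the hot links. Only Link 6 at 95% and Link 5 being off follow from the slide text; the other per-link loads, the flow routes and both function names are assumptions, and plain Python stands in for Spark SQL and the graph query.

```python
# Load per link in percent; None means the link is switched off.
# Only Link 6 (95%) and Link 5 (off) follow from the slides; the rest is assumed.
LINK_LOAD = {
    "Link 1": 20, "Link 2": 20, "Link 3": 65, "Link 4": 20, "Link 5": None,
    "Link 6": 95, "Link 7": 50, "Link 8": 30, "Link 9": 25,
}
# Hypothetical traffic flows: (leaf equipment, links the flow traverses).
FLOWS = [
    ("E2", ["Link 1", "Link 6", "Link 9"]),
    ("E3", ["Link 1", "Link 6", "Link 9"]),
    ("E20", ["Link 7"]),
    ("E31", ["Link 8", "Link 9"]),
]


def overloaded(loads, threshold=90):
    """Step 2: links whose load exceeds the threshold (switched-off links are skipped)."""
    return sorted(
        link for link, percent in loads.items()
        if percent is not None and percent > threshold
    )


def culprits(flows, hot_links):
    """Step 3: leaf equipment whose traffic crosses any overloaded link."""
    hot = set(hot_links)
    return sorted({equipment for equipment, route in flows if hot & set(route)})


hot_links = overloaded(LINK_LOAD)
suspect_equipment = culprits(FLOWS, hot_links)
```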
SUMMARY
• Open Source Big & Fast Data analytics tools are great at what they're designed for.
• ThingSpan adds a Metadata Store and scalable graph analytics.
• Ultra-fast navigation and pathfinding queries.
• It can interoperate with streaming systems and Big Data platforms.
• ThingSpan is extensible to other open source systems.
QUESTIONS?
Info@objectivity.com
408-992-7100

Metadata and the Power of Pattern-Finding

  • 1. 1 METADA T A AND T HE PO WER OF PAT T ERN- F I NDI NG M A Y 2 4 , 2 0 1 6 F O R D A T A V E R S I T Y LEON GUZENDA Chief Technology Marketing Officer
  • 2. 2 A G E N D A • Who We Are • Open Source Big & Fast Data Analytics • Our Core Technology & New Product • Pattern Finding Examples • Q & A
  • 3. O B J E C T I V I T Y , I N C .
  • 4. 4 O B J E C T I V I T Y I N C . O V E R V I E W • Private company, headquartered in Silicon Valley since 1988 • Verticals: • Government: Intelligence, defense, crime detection & prevention • Financial Services • Industrial Internet of Things (IIoT) • Energy • Healthcare • Horizontals: • Graph analytics • Complex, distributed, scalable database applications
  • 5. SAMPLE CUSTOMERS AND PARTNERSCapital Intensive Customers Government Customers Telco & Network Customers Technology Partners SI Partners 5
  • 6. O P E N S O U R C E B I G & F A S T D A T A A N A L Y T I C S
  • 7. OPEN SOURCE ANALYTICS... [Fall 2016] ,R Proprietary Rules, Ontologies, Queries... Reports, Archives... Workflow Design GUI Proprietary
  • 8. ...OPEN SOURCE ANALYTICS PROS: • Large community • Lots of algorithms • Model works at scale • Low startup costs • Cost effective CONS: • Most algorithms are based on statistical correlation, clustering or filtering • Graph algorithms mainly tackle theoretical problems • Hadoop mostly targets files, not metadata. • Metadata tools focus on technical parameters, not semantic content.
  • 9. • Vertex, Edge and Triplet operations • Graph modification operations • RDD join operations • Adjacent triplet operations • Iterative graph-parallel operations • Page rank, connected, triangle counts etc. APACHE SPARK GRAPHX API
  • 10. • Vertex, Edge and Triplet operations • Graph modification operations • RDD join operations • Adjacent triplet operations • Iterative graph-parallel operations • Page rank, connected, triangle counts etc. Spark GraphFrames add Motifs (a simple subgraph definition) APACHE SPARK GRAPHX API
  • 11. • Vertex, Edge and Triplet operations • Graph modification operations • RDD join operations • Adjacent triplet operations • Iterative graph-parallel operations • Page rank, connected, triangle counts etc. Spark GraphFrames add Motifs (a simple subgraph definition) BUT Efficient pathfinding and complex navigation are inhibited because of a table/triplet approach. APACHE SPARK GRAPHX API
  • 12. O U R C O R E T E C H N O L O G Y
  • 13. 13 O U R F O C U S • Complex Objects at scale: • Relationships are first class citizens • Ultra-fast navigation and pathfinding • Not restricted by available RAM • Scalability, performance, reliability and flexibility: • Distributed database and distributed processing • Light, small database kernel - from embedded to cluster to cloud
  • 14. 14 • 1,000’s of trillions of unique objects • 1,000’s of petabytes of storage • Resolving an ID fast and regardless of the number of objects D I S T R I B U T E D D A T A - S I N G L E L O G I C A L V I E W Put the data and processing where it’s needed
  • 15. 15 Put the data and processing where it’s needed D I S T R I B U T E D P R O C E S S I N G ThingSpan Cache Client Processes
  • 16. T H I N G S P A N
  • 17. T H I N G S P A N E N V I R O N M E N T
  • 18. • Uses Apache Spark open source processing engine • In partnership with Cloudera, Databricks, HortonWorks and MapR • Powerful object and relationship modeling • Can store data in HDFS and/or POSIX • Ultra-fast graph navigation, pathfinding and pattern finding • REST Server and API for loading data and performing graph analytics • Spark DataFrame support to leverage MLlib, GraphX, SQL etc. T H I N G S P A N F E A T U R E S
  • 19. DISTRIBUTED PROCESSING & DATABASE • Distributed from top to bottom • [Diagram label: Hadoop Distributed File System]
  • 20. OPEN SOURCE ANALYTICS STACK • [Diagram labels: [Fall 2016]; R; Proprietary Rules, Ontologies, Queries...; Reports, Archives...; Workflow Design GUI; Proprietary]
  • 21. THINGSPAN ENHANCED ANALYTICS STACK [Later this year]
  • 22. THINGSPAN COMPONENTS
  • 23. PATTERN FINDING
  • 24. PATTERN FINDING TECHNIQUES • Conventional Business Intelligence analytics: uses filters and statistical correlation to find relationships between parameters. • Graph pattern-finding analytics: uses a combination of outlier, navigational and pathfinding queries. Outliers are found with SQL or MLlib. A navigational query can specify the Vertex and Edge types to include or exclude, and can invoke methods during the traversal, e.g. to compute transit time to a node. A pathfinding query can find the shortest path, or all paths, between two or more Vertices. • The order in which the query types are applied depends on the problem.
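The navigational query described above can be sketched in plain Python. This is an illustration under stated assumptions, not the ThingSpan API: the graph, the edge type names, the transit hours and the function `navigate` are all hypothetical. It shows the two ideas from the slide: excluding edge types during traversal, and computing a value (transit time) as each node is reached.

```python
from collections import deque

# Hypothetical typed graph: vertex -> list of (edge_type, neighbor, transit_hours).
GRAPH = {
    "Depot": [("ROAD", "Hub", 4), ("RAIL", "Port", 9)],
    "Hub": [("ROAD", "Store", 2), ("AIR", "Port", 1)],
    "Port": [("WATER", "Island", 30)],
    "Store": [],
    "Island": [],
}


def navigate(graph, start, exclude_edge_types=frozenset(), max_depth=3):
    """Breadth-first traversal that skips excluded edge types.

    Returns the transit time to each reached vertex along the first path
    found (fewest hops), which is not necessarily the fastest path.
    """
    transit = {start: 0}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for edge_type, neighbor, hours in graph[vertex]:
            if edge_type in exclude_edge_types or neighbor in transit:
                continue
            # The "method invoked during traversal": accumulate transit time.
            transit[neighbor] = transit[vertex] + hours
            queue.append((neighbor, depth + 1))
    return transit


print(navigate(GRAPH, "Depot", exclude_edge_types={"RAIL"}))
```

Excluding `RAIL` changes which path reaches `Port`, and therefore the transit time reported for everything beyond it.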
  • 25. PATH-FINDING QUERY • Problem: Find the least expensive route between San Francisco and New York for a 60-ton, very wide load that must arrive by Saturday, while minimizing mode transitions (road/rail/water, etc.). • Implied: We can avoid Rail connections. • [Diagram: CITY, LINK (Mode, Duration, Cost)]
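The routing problem above combines a cost objective with a deadline, a mode restriction and a preference for few mode transitions. The following is a brute-force sketch that enumerates simple routes on a tiny, invented network; every link, duration and cost is hypothetical, links are treated as one-way, and the 72-hour deadline stands in for "by Saturday". A real system would use a proper constrained shortest-path search rather than enumeration.

```python
# Hypothetical one-way links: (from_city, to_city, mode, duration_hours, cost_usd).
LINKS = [
    ("San Francisco", "Salt Lake City", "road", 14, 3000),
    ("San Francisco", "Chicago", "rail", 50, 2500),
    ("Salt Lake City", "Chicago", "road", 22, 4000),
    ("Chicago", "New York", "road", 13, 2500),
    ("Chicago", "New York", "water", 120, 1200),
    ("Salt Lake City", "New York", "road", 34, 7000),
]


def routes(links, src, dst, banned_modes, max_hours):
    """Yield (cost, transitions, hours, legs) for each simple route that
    avoids the banned modes and arrives within max_hours."""

    def walk(city, visited, legs, hours, cost):
        if city == dst:
            modes = [leg[2] for leg in legs]
            transitions = sum(a != b for a, b in zip(modes, modes[1:]))
            yield cost, transitions, hours, legs
            return
        for link in links:
            a, b, mode, h, c = link
            if a != city or b in visited or mode in banned_modes:
                continue
            if hours + h > max_hours:
                continue  # would miss the deadline
            yield from walk(b, visited | {b}, legs + [link], hours + h, cost + c)

    yield from walk(src, {src}, [], 0, 0)


def cheapest_route(links, src, dst, banned_modes=frozenset(), max_hours=float("inf")):
    """Pick the lowest cost; break ties by fewest mode transitions, then hours."""
    candidates = list(routes(links, src, dst, banned_modes, max_hours))
    if not candidates:
        return None
    return min(candidates, key=lambda r: r[:3])


best = cheapest_route(LINKS, "San Francisco", "New York", {"rail"}, max_hours=72)
print(best)
```

Ordering the result tuple as (cost, transitions, hours) is a design choice: it treats cost as the primary objective and mode transitions as a tie-breaker, which is only one possible reading of the problem statement.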
  • 26. PATTERN FINDING EXAMPLES • Financial: Money Laundering Detection • Intelligence Analysis: Threat Detection • AdTech: Recommendation Engine Support • Industrial Internet of Things (IIoT): Network Congestion Analysis
  • 27. FINANCIAL: MONEY LAUNDERING DETECTION • 1. Load Person, Account and Transaction data into ThingSpan. • [Diagram: people P1, P2, P3; accounts Acc 1, Acc 2, Acc 20 to Acc 24, Acc 31 to Acc 33, Acc 35]
  • 28. FINANCIAL: APPLY SPARK GRAPHX • 2. Identify people with more than 5 accounts (centrality). • [Diagram: people P1, P2, P3; accounts Acc 1, Acc 2, Acc 20 to Acc 24, Acc 31 to Acc 33, Acc 35]
  • 29. FINANCIAL: APPLY A NAVIGATIONAL QUERY • 3. Look at all of that person's transactions to see if they terminate in just 1 or 2 offshore accounts. • 4. INVESTIGATE. • [Diagram: people P1, P2, P3; accounts Acc 1, Acc 2, Acc 20 to Acc 24, Acc 31 to Acc 33, Acc 35]
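Steps 2 and 3 of the money laundering example can be sketched in plain Python. This is not ThingSpan or GraphX code; the people, accounts, transfers and the set of offshore accounts are invented for illustration and do not reproduce the slide's diagram. The sketch first selects people who own more than five accounts, then follows their transfers until the money stops, and flags the person if it ends in at most two accounts that are all offshore.

```python
from collections import deque

# Hypothetical data: ownership, account-to-account transfers, offshore accounts.
OWNS = {"PA": ["A1", "A2"], "PB": ["B1", "B2", "B3", "B4", "B5", "B6"]}
TRANSFERS = {
    "A1": ["A2"],
    "B1": ["B4"], "B2": ["B4"], "B3": ["B5"],
    "B4": ["X1"], "B5": ["X1"], "B6": ["X2"],
}
OFFSHORE = {"X1", "X2"}


def many_account_holders(owns, threshold=5):
    """Step 2: people who own more than `threshold` accounts."""
    return [person for person, accounts in owns.items() if len(accounts) > threshold]


def terminal_accounts(transfers, start_accounts):
    """Step 3: follow transfers and return the accounts with no outgoing transfer."""
    seen = set(start_accounts)
    queue = deque(start_accounts)
    terminals = set()
    while queue:
        account = queue.popleft()
        targets = transfers.get(account, [])
        if not targets:
            terminals.add(account)
        for target in targets:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return terminals


def suspects(owns, transfers, offshore, max_terminals=2):
    """Flag people whose money ends in at most `max_terminals` offshore accounts."""
    flagged = {}
    for person in many_account_holders(owns):
        ends = terminal_accounts(transfers, owns[person])
        if ends and len(ends) <= max_terminals and ends <= offshore:
            flagged[person] = sorted(ends)
    return flagged


print(suspects(OWNS, TRANSFERS, OFFSHORE))
```

The output is a list of candidates for step 4 (investigate), not a verdict; the thresholds are parameters an analyst would tune.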
  • 30. HUMINT: THREAT DETECTION • 1. Load People, Calls, Places and Sightings into the graph. • [Diagram: people P1 to P3 and P5 to P18; call records CDR1 to CDR17; places PlaceX, PlaceY, PlaceZ; sightings Seen1 to Seen4]
  • 31. HUMINT: APPLY SPARK GRAPHX • 2. Use Spark GraphX to find "islands" of callers/callees. • [Diagram: people P1 to P3 and P5 to P18, linked by call records CDR1 to CDR17]
  • 32. HUMINT: APPLY A NAVIGATIONAL QUERY • 3. Use a navigational query to see whether any of those People have been seen near Places that need to be protected. • [Diagram: people P1 to P3 and P5 to P18; call records CDR1 to CDR17; places PlaceX, PlaceY, PlaceZ; sightings Seen1 to Seen4]
  • 33. HUMINT: PLAN ACTION • 4. P14 and P15 have been seen near potential target PlaceX, so they, plus P11, P7 and P8, should be put under surveillance. • [Diagram: people P1 to P3 and P5 to P18; call records CDR1 to CDR17; places PlaceX, PlaceY, PlaceZ; sightings Seen1 to Seen4]
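The threat detection steps can be sketched as connected components followed by a lookup against sightings. This is plain Python, not GraphX. The person and place labels are borrowed from the slides, but the call links and sightings below are assumptions chosen so that the sketch reproduces the slide's stated conclusion; the actual links in the diagram are not recoverable from the transcript. `PlaceQ` is a made-up unprotected place.

```python
from collections import defaultdict

# Assumed call links (caller, callee) and sightings (person, place).
CALLS = [
    ("P1", "P2"), ("P2", "P3"),
    ("P7", "P8"), ("P8", "P11"), ("P11", "P14"), ("P14", "P15"),
]
SIGHTINGS = [("P14", "PlaceX"), ("P15", "PlaceX"), ("P2", "PlaceQ")]
PROTECTED = {"PlaceX"}


def islands(calls):
    """Step 2: group people into connected components of the call graph."""
    adjacency = defaultdict(set)
    for a, b in calls:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for person in sorted(adjacency):
        if person in seen:
            continue
        group, stack = set(), [person]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(group)
    return groups


def watch_list(calls, sightings, protected):
    """Steps 3 and 4: flag every island with a member seen near a protected place."""
    seen_near = {person for person, place in sightings if place in protected}
    flagged = set()
    for group in islands(calls):
        if group & seen_near:
            flagged |= group
    return flagged


print(sorted(watch_list(CALLS, SIGHTINGS, PROTECTED)))
```

Flagging the whole island rather than only the people who were sighted mirrors the slide's step 4, where P11, P7 and P8 are included because of their call links.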
  • 34. ADTECH: PRE-PLANNED ADS • 1. Load Products, Orders, People and Social_Links into ThingSpan. • [Diagram: people Joe, Fred, Mary, Jane, Bill; products Pr1 to Pr6; sales Sale1 to Sale5; three Follows links]
  • 35. ADTECH: PRE-PLANNED ADS • 2. We want to place ads for Product Pr2. • [Diagram: people Joe, Fred, Mary, Jane, Bill; products Pr1 to Pr6; sales Sale1 to Sale5; three Follows links]
  • 36. ADTECH: WHO FOLLOWS BUYERS OF THE PRODUCT? • 3. Use ThingSpan to find bloggers who bought Pr2 and who also have followers. • Result: Fred bought Pr2. Mary follows Fred's blogs. Jane and Bill follow Mary's. • [Diagram: people Joe, Fred, Mary, Jane, Bill; products Pr1 to Pr6; sales Sale1 to Sale5; three Follows links]
  • 37. ADTECH: DISPLAY THE AD • 4. Next time you spot Mary, Jane or Bill, display a personalized ad for Pr2 ("Buy 1!"). • Result: Fred bought Pr2. Mary follows Fred's blogs. Jane and Bill follow Mary's. • [Diagram: people Joe, Fred, Mary, Jane, Bill; products Pr1 to Pr6; sales Sale1 to Sale5; three Follows links]
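The AdTech query can be sketched as a short traversal over "follows" links, starting from the buyers of a product. This is plain Python rather than a ThingSpan query. Fred's purchase and the three follow links come from the slide's stated result; Joe's purchase of Pr1, the two-hop limit and the function name `ad_targets` are assumptions.

```python
from collections import deque

# (buyer, product). Fred's purchase is from the slide; Joe's is assumed.
PURCHASES = [("Fred", "Pr2"), ("Joe", "Pr1")]
# (follower, followed), as stated in the slide's result.
FOLLOWS = [("Mary", "Fred"), ("Jane", "Mary"), ("Bill", "Mary")]


def ad_targets(product, purchases, follows, max_hops=2):
    """Return people within `max_hops` follow links of someone who bought `product`."""
    followers_of = {}
    for follower, followed in follows:
        followers_of.setdefault(followed, set()).add(follower)

    buyers = {person for person, bought in purchases if bought == product}
    targets = set()
    queue = deque((buyer, 0) for buyer in buyers)
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue
        for follower in followers_of.get(person, ()):
            if follower not in targets and follower not in buyers:
                targets.add(follower)
                queue.append((follower, hops + 1))
    return sorted(targets)


print(ad_targets("Pr2", PURCHASES, FOLLOWS))
```

Buyers themselves are excluded from the targets because they already own the product; whether that is the right business rule is a separate decision.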
  • 38. IIOT: TELCO NETWORK CONGESTION • 1. Load Location, Equipment, Link (+Load) into the graph. • [Diagram: locations L1 to L4; cities San Jose, Salt Lake City, Chicago, New York; equipment E1 to E3, E20 to E22, E30 to E33, E40; Link 1 to Link 9 with loads 20%, 20%, 95%, 65%, 20%, 50%, 30%, 25% and one link Off]
  • 39. IIOT: APPLY SPARK SQL • 2. Use Spark SQL to find links that are over 90% loaded. • [Diagram: same network and link loads]
  • 40. IIOT: APPLY A THINGSPAN NAVIGATIONAL QUERY • 3. Use a graph query to find the leaf nodes (branch ends)... then investigate... • [Diagram: same network and link loads]
  • 41. IIOT: DIAGNOSE • 4. Aha! E2 and E3 in San Jose are streaming 8K UHDTV movies from MovieFlix in New York, overloading Link 6. • [Diagram: same network and link loads]
  • 42. IIOT: FIX • 5. Solved by switching on Link 5. • [Diagram: same network; link loads now 20%, 20%, 50%, 65%, 20%, 50%, 30%, 25%, 45%, with no link Off]
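Steps 2 and 3 of the congestion example reduce to a threshold filter over link loads and a search for leaf nodes in the topology. The sketch below is plain Python, not Spark SQL or a ThingSpan query. The 95% load on Link 6 and the switched-off Link 5 follow the slides; every other load, the topology edges and the function names are assumptions, since the transcript does not preserve which load belongs to which link.

```python
# Load in percent per link; None means the link is switched off.
# Link 6 (95%) and Link 5 (off) follow the slides; the rest is assumed.
LINK_LOAD = {"Link 1": 20, "Link 2": 20, "Link 5": None, "Link 6": 95, "Link 7": 65}

# Assumed undirected topology: locations L1 to L4 with attached equipment.
TOPOLOGY = [
    ("L1", "L2"), ("L2", "L3"), ("L3", "L4"),
    ("L1", "E1"), ("L1", "E2"), ("L1", "E3"), ("L4", "E40"),
]


def overloaded(link_load, threshold=90):
    """Step 2: links whose load exceeds the threshold (links that are off are skipped)."""
    return sorted(
        name for name, load in link_load.items()
        if load is not None and load > threshold
    )


def leaf_nodes(edges):
    """Step 3: nodes with exactly one connection, i.e. the branch ends."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sorted(node for node, count in degree.items() if count == 1)


print(overloaded(LINK_LOAD))
print(leaf_nodes(TOPOLOGY))
```

The leaf nodes are the equipment to inspect next; in the slides, that inspection is what identifies E2 and E3 as the source of the traffic on Link 6.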
  • 43. SUMMARY • Open source big and fast data analytics tools are great at what they're designed for. • ThingSpan adds a metadata store and scalable graph analytics, with ultra-fast navigation and pathfinding queries. • It can interoperate with streaming systems and Big Data platforms. • ThingSpan is extensible to other open source systems.