
Your Roadmap for An Enterprise Graph Strategy


Michael Moore, Ph.D., Executive Director, Knowledge Graphs + AI, EY National Advisory

Published in: Technology

  1. Roadmap for Enterprise Graph Strategy
     Michael Moore, Ph.D., Executive Director, Enterprise Knowledge Graphs + AI, EY Performance Improvement Advisory
     michael.moore4@ey.com
     July 30, 2019
  2. The Database Landscape is Changing
     Categories: Traditional Databases & Data Warehousing; NoSQL Databases; Data Services & Data Processing
     Technologies shown: SQL RDBMS, Column, Document, Key Value, Graph, Search, Serverless, Streams, In-Memory, Batch MR, Blockchain
  3. Scale Out → Scale Up
     Continued increases in capacity and dropping compute costs are challenging scale-out commodity server assumptions, particularly for database workloads.
     Figure labels: 2017, 2018, 2019 (12TB RAM), 2019 (24TB RAM)
  4. Rankings Change in Popularity (db-engines.com)
     Highlighted series: Graph DBs
     *Proprietary method based on general interest, mentions, relevance in social networks, frequency of technical discussions, etc.
  5. What is a Graph Database?
     A database specifically designed for creating, storing, and querying graphs.
     Example: “We send email to people, so they will visit our website and buy our product”
     MATCH (e:Email)-[:SENT_TO]->(p:Person {fullName: 'Steve Newman'})-[:VISITED]->(w:Website)<-[:SOLD_ON]-(pr:Product)<-[:PURCHASED]-(p) RETURN *
     Figure labels: Semantic Representation, Graph Representation, Physical Representation
     ► Graphs store relationships precomputed, so connected-data queries can run much faster than the equivalent SQL joins
     ► Graphs are fast and easy to understand, develop and use
     ► Graphs integrate well with applications and data sources, and are great for real-time digital workloads
     ► Graphs surface, unify and mobilize data held in silos and data lakes
  6. This is a Graph.
  7. This is a Graph.
  8. This is a Graph.
  9. This is a Graph.
  10. This is a Graph.
  11. Graph Use Cases
      ► Customer 360°
      ► Recommendation Engines
      ► Marketing Attribution
      ► Enterprise Search
      ► Fraud Detection
      ► Master Data Management
      ► Supply Chain
      ► Geolocation & Routing
      ► Access & Asset Control
      ► Social Networks
      ► IT & Network Management
  12. Graphs Accelerate Enterprise Data Mobilization
      Real-Time, Evolving Graph View Across the Business
      Data Ingestion, Cleansing, Reduction & Pipelining
      Inputs: UNSTRUCTURED, LEGACY, SNAPSHOTS, CONFORMED & CURATED, STREAMS
      Outputs: Real-time BI & Scorecards; Mobile & Web Applications; Data Science
      Capabilities: access control, metadata, recos, monitoring; KPIs, targets, reporting, drill down/across; attribution, similarity, fraud, pathing, cliques
      Stakeholders:
      • Marketing ROI & Digital Experience (CMO)
      • Data Governance & Data Quality (CDO)
      • Operations & Risk Management (CFO)
      • Account Coverage & Customer LTV (CRO)
      • Product Marketing & Recommendations (CPO)
  13. Roadmap for Enterprise Graph Strategy
      Small Team:
      • Graph Architect
      • Data Engineer
      • Full-stack Developer
      • Data Scientist
      • Report Developer
      Problem / Scope: What will the graph solve?
      Stages:
      • Graphy Problem: business need, data sources
      • Localhost POC: data modeling, API, example queries
      • Cloud Pilot: data snapshot, reference architecture, API suite
      • Production Build: hardening, scheduled & stream ETL, live UX
      Workstreams: Stakeholder Input, Graph Design, Data Work, APIs / Data Services, Integration / Refinement, Scale / Harden / Run
      Key Conversations:
      • Validate: What questions can now be answered?
      • Connect: Does the data support the graph model and semantics?
      • Mobilize: What data does the new experience need?
      • Use Cases: What is the feedback from the business on how well the graph solves the use case?
      • Deploy: What monitoring, testing and processes need to be put in place to achieve a robust SLA?
  14. Talk to the business, pick a graphy problem
      What is a “Graphy” problem?
      • Requires many entities (e.g. many SQL tables, 360° views)
      • Involves recursion (e.g. SQL self joins)
      • Has complex, potentially colliding hierarchies (e.g. SQL one-to-many, many-to-many)
      • Is based on the informatics of the relationships themselves (e.g. collaborative filtering on shared relationship counts, shortest-path segment summations for wayfinding, cost/time minimization for supply chain, money flows for finance)
      • Requires mapping, direct or indirect, across data sources (e.g. data lake unification)
      • Demands fast query results (e.g. digital applications, search)
      Most importantly, go talk to the business:
      • What analytics would you like to have, or customer experiences would you like to light up, that you can’t because of our current data limitations?
      • What’s the most critical data that you’d like to see connected?
      • What would be an example demo that you’d find compelling (report/analysis/experience)?
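As a rough illustration of the "shared relationship counts" idea mentioned in slide 14, here is a minimal Python sketch of collaborative filtering over a tiny in-memory purchase graph. The people, products and the `recommend` helper are all hypothetical, invented only for this example; a graph database would express the same idea as a traversal.

```python
from collections import Counter

# Hypothetical PURCHASED relationships: person -> set of products.
PURCHASED = {
    "ana": {"boots", "tent", "stove"},
    "ben": {"boots", "tent", "lamp"},
    "cy": {"boots", "kayak"},
    "dee": {"novel"},
}


def recommend(person, purchased):
    """Rank products the person has not bought, weighting each one by the
    number of purchases the person shares with the peer who bought it."""
    mine = purchased[person]
    scores = Counter()
    for peer, theirs in purchased.items():
        if peer == person:
            continue
        shared = len(mine & theirs)  # the shared relationship count
        if shared == 0:
            continue  # no common purchases, so no signal from this peer
        for product in theirs - mine:
            scores[product] += shared
    return scores.most_common()


print(recommend("ana", PURCHASED))
```

Peers with more purchases in common contribute more weight, which is the kind of relationship-derived signal that is awkward to compute with many SQL joins.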
  15. Get comfortable with Neo4j (you don’t need to become an expert)
      • Get hands-on and be fearless! Neo4j is the easiest graph database to learn.
      • Install Neo4j and the APOC procedures, then set the following in Manage/Settings:
          #Apoc Plugin Configurations
          apoc.import.file.enabled=true
          apoc.export.file.enabled=true
          dbms.security.procedures.unrestricted=*.*
      • Go through the Cypher lessons, and learn basic graph modeling and how to load CSV, either natively or with APOC:
          LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS row
          CALL apoc.load.csv(url,{}) YIELD map
      • Any reasonably sized laptop should be able to handle a graph with several million nodes and relationships. You will quickly see some of the significant benefits of connected data.
      • For extra credit, you can go to github/neo4j-examples and download starter applications for your favorite languages.
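To make the LOAD CSV step in slide 15 more concrete, the following Python sketch previews a CSV the way `LOAD CSV WITH HEADERS` exposes it: one map per line, keyed by header, with every value arriving as a string. The file contents, the column names and the `Movie` properties in the Cypher text are assumptions made for illustration; the deck only names `movies.csv`.

```python
import csv
import io

# Hypothetical contents of movies.csv; the columns are assumptions.
MOVIES_CSV = """movieId,title,released
1,The Matrix,1999
2,Cloud Atlas,2012
"""

# A possible Cypher statement for the same file. Values arrive as
# strings, so numeric columns are converted explicitly.
LOAD_MOVIES = """
LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS row
MERGE (m:Movie {movieId: toInteger(row.movieId)})
SET m.title = row.title, m.released = toInteger(row.released)
"""


def preview_rows(text):
    """Return each CSV line as a dict keyed by header, mirroring what
    `row` holds inside LOAD CSV WITH HEADERS."""
    return list(csv.DictReader(io.StringIO(text)))


rows = preview_rows(MOVIES_CSV)
print(rows[0]["title"], type(rows[0]["released"]).__name__)
```

Previewing the rows first makes it easier to spot columns that need type conversion before the statement is run in Neo4j.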
  16. Design and build your POC Graph
      • Start small and simple: limit yourself to 3-4 data sources and shallow extracts. Snapshot the top SQL queries for a pool of linked transactions.
      • Use common-sense, business-friendly naming for your node labels and relationship types. You’ll iterate this model using input from the business, and the model should be clear and readable.
      • Don’t be afraid of recursion: (Employee)-[:REPORTS_TO]->(Employee), who is the boss?
      • Don’t get too hung up on whether something should be a node label, property, or relationship. Just keep in mind that node labels define set members, and that it’s faster to search along relationships (traversal) than properties (full graph scan).
      • You can use call db.schema() to see the graph schema, and we often use http://apcjones.com/arrows/# to build illustrative schemas for conversations with business stakeholders.
      • Test your graph design by writing some example queries, and do this with your business stakeholder: Does this look right to you? Is this how you would whiteboard this process? Am I missing any key entities or relationships?
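The recursion point in slide 16 can be sketched in a few lines of Python. The employee names, the `name` property and the Cypher text are assumptions for illustration only; the Cypher string shows one way a variable-length traversal might answer "who is the boss?", while the Python function walks the same chain over an in-memory mapping.

```python
# Hypothetical REPORTS_TO relationships: employee -> manager.
REPORTS_TO = {"ana": "ben", "ben": "cy", "dee": "cy"}

# One possible Cypher formulation: follow REPORTS_TO zero or more hops
# and keep the employee who reports to nobody.
BOSS_QUERY = """
MATCH (e:Employee {name: $name})-[:REPORTS_TO*0..]->(boss:Employee)
WHERE NOT (boss)-[:REPORTS_TO]->()
RETURN boss.name
"""


def top_boss(name, reports_to):
    """Follow REPORTS_TO links upward until someone has no manager."""
    seen = {name}
    while name in reports_to:
        name = reports_to[name]
        if name in seen:
            # Guard against bad data such as A -> B -> A.
            raise ValueError("cycle in REPORTS_TO")
        seen.add(name)
    return name


print(top_boss("ana", REPORTS_TO))
```

In SQL the same question needs a self join per level or a recursive CTE; in a graph it is a single traversal of arbitrary depth.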
  17. Example Knowledge Graph Schema for Spend and Supply Chain Analytics
      Data fabric composed of nodes and relationships that connect and mobilize data, using consistent semantics
      Supplier 360° Spend Graph:
      • Accurately captures the sourcing complexity of products and services
      • Enables more insightful indirect spend analytics for products and services
      • Reconciles line-item detail to top parent company, across intermediate entities
      • Extensible for audit, fraud detection, tracking & traceability
      • Integrates with data lake, reporting platforms and transactional applications
      Schema areas: Product Supply Chain, Service Providers, Procurement, Top Parent, Line Item Detail, Tracking and Traceability, Invoicing
  18. Example Customer 360° Graph Schema
      Customer 360° Graph:
      • Accurately captures full range of customer touchpoints across enterprise surface area
      • Enables more insightful indirect spend analytics for products and services
      • Reconciles product usage, marketing interactions and digital identity
      • Integrates with execution layer for AI-driven UX
      Schema areas: Account, Transactions, Segments, Product, Interactions
  19. Example B2B MDM Graph Schema
      Master Data Management Graph Schema:
      • Accurately captures data lineage for core identity components
      • Provides “Golden Record” from multi-source probabilistic authority scores
      • Relates contacts, customers, orders and products without loss of fidelity
      • Enables detailed whitespace analysis and next best sales action
      • Integrates with data lake and CRM applications
      Schema areas: Product, Core Data Elements, Customer & Contact, Orders
  20. Example Polyglot Discovery Graph Schema
      Data Discovery Graph Schema:
      • Connects structured, semi-structured and unstructured data across polyglot storage
      • Accurately handles complex data and document hierarchies
      • Enables full text search in graph or in document store, directly and via NLP
      • Provides source document access through blob URLs
      • Integrates with data lake, reporting platforms and transactional applications
      Schema areas: Searchable Pointers to Unstructured Blobs; Text & Metrics from Semi-Structured Data; Structured Data and Derived Entities
  21. Design and build your POC Graph
      What the POC should show:
      • Breakthrough queries
      • Graph algorithms
      • Data unification & mobilization
      • Use-case specific (Customer 360, Supply Chain, Fraud, Reco)
      How to build it:
      • Make a localhost graph->app stack so you understand how parameterized Cypher & Bolt drivers work
      • Use any of the neo4j-examples to jumpstart
      • If you don’t want to spend time creating a REST API, check out GraphQL and the GRAND stack (https://github.com/grand-stack/grand-stack-starter)
      • Focus on the business value of the new graph-enabled analytics: “We can now know this to make better decisions” and “We can now do this for our customers”
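Slide 21 recommends learning how parameterized Cypher and Bolt drivers work. The sketch below shows the general shape under stated assumptions: the query text, the `url` property and the `visited_sites` helper are hypothetical, and `FakeSession` is a stand-in written for this example so the code runs without a Neo4j server.

```python
# Parameterized Cypher: the value travels separately as $fullName
# instead of being pasted into the query string.
QUERY = (
    "MATCH (p:Person {fullName: $fullName})-[:VISITED]->(w:Website) "
    "RETURN w.url AS url"
)


def visited_sites(session, full_name):
    """Run QUERY on any object exposing run(query, parameters) and
    collect the `url` field of each returned record."""
    result = session.run(QUERY, {"fullName": full_name})
    return [record["url"] for record in result]


class FakeSession:
    """Hypothetical stand-in for a Bolt session, used only so this
    sketch is runnable without a database."""

    def __init__(self):
        self.calls = []

    def run(self, query, parameters):
        self.calls.append((query, parameters))
        return [{"url": "https://example.com"}]


session = FakeSession()
print(visited_sites(session, "Steve Newman"))
```

With the official Python driver, a real session would typically come from `GraphDatabase.driver("bolt://localhost:7687", auth=(user, password)).session()`, where the URI and credentials depend on your installation. Keeping values in parameters avoids string concatenation and lets the server reuse the query plan.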
  22. Neo4j - Power BI Integration with GraphQL
      1. Client issues GraphQL query
      2. GraphQL API sends Cypher query to Neo4j
      3. Response data sent to Client
      4. Data updated in PBI report
      Figure labels: Graph Database; Neo4j GraphQL API; GraphQL schema, registered in Neo4j; M query cURL wrapper; PBI report
  23. Neo4j - React Integration with GraphQL (GRAND Stack)
  24. Pick and build your demo application for your snapshot graph
      • Pick a cloud or on-prem
      • Use Marketplace images if possible
      • Start with a single-instance VM for Neo4j (RAM ~50% of SQL size)
      • Attach external drives so you can scale the server
      • Determine your stack architecture
      • Understand your data processing requirements
      • Install Python, which is very good for performing batch operations (pip install neo4j-driver)
      • Leverage Neo4j’s high-speed loader
      • Determine what cleansing needs to occur
      • If you need help, reach out to an SI partner or Neo4j services
  25. Pick and build your demo application for your snapshot graph
      Michael’s I-Frame model for Graph ROI → Accelerate Graph-driven User Experiences
      • MVP data domains
      • Graph database, app-informed
      • Simplest data service
      • MVP app experience
      • Add new experiences, same data
      • Add new data domains
      Application technologies: Node.js, .NET, Python, React, Swift, Tableau, etc.
      Data services: REST, Bolt
  26. Example Graph Architecture (Azure VPC)
      Figure labels:
      • Sources: CRM, ERP, Adobe Experience Cloud (Audience Manager, Campaign, Target, Experience Manager, Analytics, Marketo Engage)
      • Azure Cloud Storage: Blobs, Files, Queues, Tables
      • Components: Scheduled ETL, Azure Data Factory, Stream ETL (Azure Event Hub), Data Reduction (Azure Spark), Cloud Data Lake, In-Memory Document Store, In-Memory Knowledge Graph, Data Models (Azure Analysis Services), Data Catalog (Azure Data Catalog), AI Sandbox (Azure ML Studio), Reporting (Tableau, PBI), Data Services APIs (REST)
      • Flow: Ingest Batch, Store, Ingest Real-time, Search, Consolidate, Connect & Unify, Mobilize
      • Layers: Semantic Layer, Analytics Layer, Execution
      • Callouts: Automated Reports and Dashboards; Consistent Metrics; Data Discovery; Retention Models; Deep Learning; In-Memory Sessionization; Data Aggregation; Syndicated Data and Analytics; Knowledge Graph; Customer/Contact 360° View; Marketing Attribution; Recommendations; Real-time Document Search; Elastic SQL Repository for Curated & Conformed Data; Data Staging; Elastic Repository for Raw and Unstructured Data; Real Time Updates; Customer Events; Automated Data Loading; Triggered Marketing; Consistent Experience
  27. Example Graph Architecture (AWS VPC)
      Figure labels:
      • Sources: CRM, ERP, Adobe Experience Cloud (Azure) (Audience Manager, Campaign, Target, Experience Manager, Analytics, Marketo Engage)
      • AWS Cloud Storage: S3 Blobs, Files, Queues, EBS Tables
      • Components: Scheduled ETL (AWS Data Pipeline, PDI Kettle), Stream ETL (AWS Kinesis), Data Reduction (AWS EMR), Cloud Data Lake, In-Memory Document Store, In-Memory Knowledge Graph, Machine Learning (AWS SageMaker), Data Catalog (AWS Glue), Data Discovery (AWS Athena), Reporting (Tableau, QuickSight), Data Services APIs (REST)
      • Flow: Ingest Batch, Store, Ingest Real-time, Search, Consolidate, Connect & Unify, Mobilize
      • Layers: Semantic Layer, Analytics Layer, Execution
      • Callouts: Automated Reports and Dashboards; Retention Models; Deep Learning; Data Discovery; Consistent Data Models; Sessionization; Data Aggregation; Knowledge Graph; Customer/Contact 360° View; Marketing Attribution; Recommendations; Real-time Document Search; Elastic SQL Repository for Curated & Conformed Data; Data Staging; Elastic Repository for Raw and Unstructured Data; Real Time Updates; Customer Events; Automated Data Loading; Triggered Marketing; Consistent Experience; Syndicated Data and Analytics
  28. Enterprise Knowledge Graph Development with Neo4j
      • Locate and validate data lake tables
      • Design test graph schema
      • Estimate graph size from nodes, relationships and properties
      • Configure Neo4j server to minimize SSD disk contention
      • Prepare Hive queries to generate graph-form tables (nodes, relationships)
      • Validate key uniqueness, string handling, character types, relationship mappings
      • Export graph-form tables to gzip CSV files
      • Iteratively test data loader scripts, file by file
      • On successful completion of hydration, apply constraints and indexes, refactor as needed
      Pipeline labels: Data Lake Tables, EXTRACT, Graph-form Tables, EXTRACT, CSV.gz Files, Load Script, HIGH SPEED LOADER, Data Store
      Loader output:
          IMPORT DONE in 1h 29m 16s 530ms.
          Imported:
            458356377 nodes
            2176603843 relationships
            9064981812 properties
          Peak memory usage: 9.46 GB
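Two of the steps in slide 28, validating key uniqueness and exporting graph-form tables to gzip CSV, can be sketched in Python as follows. The rows, file names and helper functions are hypothetical. The header fields (`:ID`, `:START_ID`, `:END_ID`, `:TYPE`) follow the conventions of Neo4j's bulk import tool as I understand them; the deck itself generates the tables with Hive rather than Python.

```python
import csv
import gzip
import os
import tempfile
from collections import Counter

# Hypothetical graph-form tables: one row per node, one per relationship.
NODES = [("c1", "Acme"), ("c2", "Globex")]
RELS = [("c1", "c2", "SUPPLIES")]


def duplicate_keys(rows):
    """Return node keys (first column) that occur more than once."""
    counts = Counter(row[0] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def write_gz_csv(path, header, rows):
    """Write a header line plus rows to a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)


def export(out_dir, nodes=NODES, rels=RELS):
    """Validate keys and relationship endpoints, then export both tables."""
    dupes = duplicate_keys(nodes)
    if dupes:
        raise ValueError(f"duplicate node keys: {dupes}")
    known = {row[0] for row in nodes}
    for start, end, _ in rels:
        if start not in known or end not in known:
            raise ValueError(f"relationship points at unknown node: {start}->{end}")
    nodes_path = os.path.join(out_dir, "companies.csv.gz")
    rels_path = os.path.join(out_dir, "supplies.csv.gz")
    write_gz_csv(nodes_path, ["companyId:ID", "name"], nodes)
    write_gz_csv(rels_path, [":START_ID", ":END_ID", ":TYPE"], rels)
    return nodes_path, rels_path


with tempfile.TemporaryDirectory() as tmp:
    for path in export(tmp):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            print(os.path.basename(path), fh.readline().strip())
```

Catching duplicate keys and dangling relationship endpoints before the load is cheaper than discovering them partway through a multi-hour import.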
  29. Polyglot Graph Data Processing
      Data Mobilization and Graph Unification: Full Lineage and Auditability
      Leveraging fit-for-purpose storage:
      • Graph storage for unified many-to-many access to cross-domain data
      • Document storage for searchable access to semi-structured data
      • Blob storage repository for large, raw and unstructured data
      Processing paths:
      • Unstructured (37,157 blobs, 5.5 TB): extract and load Azure Blob URIs
      • Semi-structured (20,573 JSONs, 5 GB): extract XML, convert to JSON, load JSON with Azure Blob URI
      • Structured: load CSV to graph (215K nodes & relationships, 1.5 GB)
      Extracted and loaded: Document Metadata, Named Entities, Map Relationships, Text Summaries
      Access: Graph Analytics & Queries; Couchbase Full Text Search; Pointers to Azure Blob URIs; Reports/Applications
  30. Go to Production
      • Follow your IT best practices
      • Security: assume you’ll be breached
      • Deploy full environment set: Prod cluster, Stg cluster, Test, Dev
      • DevOps: leverage Jenkins, Ansible
      • Wrap your solution in test automation
      • Do load testing against your APIs to look for additional optimization opportunities (Gatling)
      • Monitor your logs (Splunk, Dynatrace)
      • Monitor your common queries, refactor or reindex as needed, optimize for speed
      • Leverage the I-Frame Model to provide more value
  31. Roadmap for Enterprise Graph Strategy (recap: this slide repeats slide 13, covering the small team, the four stages, the workstreams and the key conversations)
  32. EY Cross-Sector Graph Experience: MDM, 360°, AML/Fraud, Recommenders
      Fortune 100 Tech Company
      • Use Case: Global B2B Account 360° view and marketing attribution
      • Approach: Neo4j graph with 500M nodes and 2.2B relationships, representing all known business accounts, contacts and marketing touches. Mastered data from 17 disparate transactional sources in Azure Data Lake. Supported in-graph analytics for marketing attribution and next best action recommendations across global geographies.
      • Duration: 16 weeks to working graph
      Fortune 100 Footwear Company
      • Use Case: Converged Brick & Mortar + Online Shopper 360° View
      • Approach: Neo4j graph with 2B nodes and relationships, representing sales transactions for 40M shoppers across 275 physical stores and the ecommerce platform. Algorithmic extraction and profiling from raw XML records in AWS Hadoop, MDM record concordance and in-graph analytics for product associations, store analytics and recommendation services.
      • Duration: 12 weeks to working graph, ongoing project through 2018
      Fortune 500 Cruise Line Company
      • Use Case: Shipboard and Shoreside Recommendation Engine
      • Approach: Neo4j graph deployable to shipboard VMware data centers, with streaming updates from large shoreside Neo4j graph integrating data from Azure Cerebro, Adobe Experience Manager and legacy transactional systems. In-graph analytics, services API, and recommendation engine for next best activity for passengers, surfaced via mobile app.
      • Duration: 12 weeks to working graph, ongoing project through 2018
      Fortune 100 Investment Firm
      • Use Case: Enhanced Anti-Money Laundering and Fraud Detection using Graph+AI
      • Approach: Neo4j graph of account 360° view representing activity of 2M accounts over 4 years. MDM and entity extraction for account and party identity elements from enterprise Oracle system. Network clustering, feature engineering and graph embedding in TensorFlow deep learning classifier for suspicious activity patterns across accounts and between parties.
      • Duration: 16 weeks to working graph
      Fortune 100 Tech Company
      • Use Case: B2B Local Marketing Events Recommendation Engine
      • Approach: Neo4j graph and personalized next best event recommendation engine for B2B field marketers. Reconciles physical and digital event attendees with corporate account structures for 10K accounts and 5M contacts. Entities mastered from transactional data in SQL Server and Azure Data Lake. Microservices APIs support data syndication to martech applications and Power BI reporting.
      • Duration: 10 weeks to working graph
  33. Better Questions
      • How can I get more business value and deeper insights from the data I already have?
      • How can I get a better understanding of my customers to create more relevant experiences?
      • How can I more effectively mobilize and syndicate the data I’m ingesting?
      • What is the next best action I can take?
      Thank You!
  34. Michael Moore, Ph.D., Executive Director
      Section labels on the slide: Profile; Select professional experience; Skills and tool knowledge
      ► Michael Moore is an Executive Director and Practice Lead for Graph + AI in EY’s Tech Consulting Emerging Technology (ET) Group
      ► Joined EY in 2017, based in the Seattle, WA office
      ► Ph.D. University of California, Berkeley
      ► B.S. & B.A. University of California, Santa Cruz
      ► Society Consulting, Graph Architect: Schema, ETL & systems design for a high-performance Neo4j graph database encompassing the totality of Microsoft’s B2B data on Azure VM. Graph database supports multi-touch marketing attribution analytics and multi-dimensional event-based audience segmentation & recommendations for direct marketing. Provided POC graph reporting and visualization interfaces. Neo4j Enterprise edition, Python, Node.js, nGraph, JavaScript.
      ► Microsoft Corporation, General Manager: Management of core BI infrastructure and measurement capabilities supporting Microsoft's global marketing budget cascade, campaign reporting, pipeline reporting, incentive reporting, ROMI reporting, social and web analytics on Microsoft.com for the Global Marketing Operations team. Management of complex projects across multiple subsidiaries, agencies and vendors. Strategic focus on foundational database, digital and social marketing capabilities including: marketing ROI, customer & channel partner engagement, marketing conversion, sales pipeline, dynamic personalization, data mining, predictive modeling, behavioral segmentation, privacy governance, web enablement, tracking & measurement, internal & external data quality, and instrumentation process control.
      ► Grey San Francisco, VP Analytics: Responsible for ongoing campaign reporting, ROI analysis, creative and placement optimizations for agency clients. Architected and deployed an enterprise OLAP reporting solution on Oracle RAC / MicroStrategy to improve quality and efficiency of analytics operations. Provided advanced analytical services to clients in retail, tech, banking and automotive, including consulting, regression modeling and data mining.
      ► Michael Moore, Ph.D. is an Executive Director in the Advisory Services practice of Ernst & Young LLP. He is the National practice lead for Enterprise Knowledge Graphs + AI in EY’s Data and Analytics (DnA) Group.
      ► Michael has industry and solution experience in customer experience, customer service, e-commerce, ad-serving, web and media analytics, consumer loyalty and churn, marketing optimization, enterprise and partner pipeline, and social media.
      ► He specializes in graph database architecture, graph-based advanced analytics, machine learning and recommender systems. Michael is a certified Neo4j Professional, and has active enterprise graph engagements in the financial services, tech, oil & gas, retail and hospitality sectors.
