11.
FROM confluentinc/cp-kafka-connect:7.1.1
RUN confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:2.0.2
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-salesforce:2.0.4
…
docker build . -t alexwoolford/kafka-connect:1.0.1
docker push alexwoolford/kafka-connect:1.0.1
https://woolford.io/2021-05-17-confluent-cloud-to-neo4j-aura/
12.
13.
14. // upsert the event node and link it into the contact's event linked list
MATCH (contact:Contact {email: event.email})
MERGE (unsubscribeEvent:EmailUnsubscribe:Event {timestamp: apoc.date.fromISO8601(event.unsubscribe_timestamp)})
MERGE (contact)-[:LAST]->(unsubscribeEvent)
// if the contact has no FIRST event yet, this event is it
// (FOREACH over a 0- or 1-element list is the conditional-write idiom)
WITH contact, unsubscribeEvent, CASE WHEN NOT ((contact)-[:FIRST]->()) THEN [1] ELSE [] END AS needsFirst
FOREACH (i IN needsFirst | MERGE (contact)-[:FIRST]->(unsubscribeEvent))
WITH contact, unsubscribeEvent
// re-point LAST: delete the old LAST relationship and chain the old tail to the new event
MATCH (unsubscribeEvent)<-[:LAST]-(contact)-[oldRel:LAST]->(oldLast)
DELETE oldRel
MERGE (oldLast)-[:NEXT]->(unsubscribeEvent)
#TODO: Kafka/Neo graphic
When to use Kafka and Neo4j together: transactional
#TODO: name, email
Timeliness examples:
Supply chain: if two companies use the same component and there’s a shortage, the company that discovers the shortage first has a huge advantage: it can buy up the available inventory. That lets it keep shipping finished goods AND prevents its competitor from doing so.
Clickstream: every click shows intent.
What is long-polling?
Plugin deprecation: RIP, CDC (for now)
Streams plugin:
proper eventing
CDC (Debezium-style before/after, schema info)
no need to deploy/monitor/manage external Connect cluster
includes procedures, e.g. publish/consume directly to Kafka from function call (streams.publish and streams.consume)
being deprecated
edge-case where data loss is possible (asynchronous producer)
not available on Aura
no schema registry support
Connect plugin:
no data loss edge-case
Connect source/sink from over 100 different technologies
state is stored in Kafka, so it works on Aura
long polling depends on a timestamp or incrementing-integer property to detect new data
no CDC (today)
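A minimal sketch of what that long polling looks like for the Neo4j source connector (connector class and the $lastCheck parameter are from the 2.x connector docs; the topic name and query here are made up for illustration, so adjust to your version and model):

```json
{
  "connector.class": "streams.kafka.connect.source.Neo4jSourceConnector",
  "topic": "email-events",
  "neo4j.streaming.poll.interval.msecs": 5000,
  "neo4j.streaming.property": "timestamp",
  "neo4j.streaming.from": "LAST_COMMITTED",
  "neo4j.source.query": "MATCH (e:Event) WHERE e.timestamp > $lastCheck RETURN e.email AS email, e.timestamp AS timestamp"
}
```

The connector re-runs the query on each poll, substituting the last committed value of the streaming property for $lastCheck — which is exactly why the property needs to be a timestamp or an incrementing integer.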
Streams: everything inside Neo4j JVM; Neo4j can take a while to start
Connect: outer turquoise box == connect JVM; red box == connect tasks
See /Users/alexwoolford/PycharmProjects/scratch/kafka_outtage.py for a practical example of the data-loss edge case.
See Neo4j-Streams deprecation notice: https://neo4j.com/labs/kafka/4.1/consumer/
#TODO: add source/sink labels to Neo4j-Streams
Drama
More detail (e.g. licensing, etc…) available at: https://docs.google.com/spreadsheets/d/1h2DBG5kqzeihDXnPZVdP93QbtYt8yau8Ri86C4_6B9I/edit?usp=sharing
Each Connect instance runs inside an OS (typically a stripped-down version of Linux inside a Docker container).
The instance has plugins installed inside it. There are more than 200 possible plugin types to choose from.
In addition to plugins there are also single message transforms (SMTs). These are used to do [typically simple] stateless manipulations to the payload before it’s written to Kafka or sunk to some other system.
SMTs are optional. In the example on the slide, no SMT is used in the top job (x).
A connector job consists of one or more tasks. These are often spread over multiple instances for parallelization and fault tolerance.
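To make the pieces above concrete, here is a sketch of a sink job config with one SMT and two tasks (the connector class and ReplaceField transform are real; the topic and field names are made up for illustration, and older Connect versions spell the "exclude" property "blacklist"):

```json
{
  "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
  "topics": "email-unsubscribe",
  "tasks.max": "2",
  "transforms": "dropInternal",
  "transforms.dropInternal.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
  "transforms.dropInternal.exclude": "internal_id"
}
```

With tasks.max at 2, Connect can spread the two tasks over two workers for parallelism and fault tolerance; the SMT strips the internal_id field from every record before it reaches the sink.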
# TODO: add antennas
Are connectors running in distributed or standalone mode? They should be running in distributed mode.
Are the correct number of tasks configured for the required throughput? Don't exceed 20 tasks per worker in production.
Are Connect workers configured correctly? See https://docs.confluent.io/home/connect/self-managed/userguide.html#configuring-workers
Have you read the Monitoring Connect Operations Guide? See https://docs.confluent.io/platform/current/connect/monitoring.html
Have you configured a dead letter queue to handle bad records?
Are task statuses monitored via the REST API to ensure tasks haven’t failed?
In a CDC source use case, is a "native" CDC connector used instead of the JDBC source connector? The JDBC source connector puts added load on the source system. Unfortunately, this isn’t an option for the Connect Neo4j source.
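On the task-status point: a small sketch of how you might flag failed tasks from the JSON that Connect’s GET /connectors/&lt;name&gt;/status endpoint returns (the payload shape below mirrors that endpoint; the connector name and trace are invented):

```python
import json

def failed_tasks(status):
    """Given the parsed JSON from Connect's GET /connectors/<name>/status
    endpoint, return (task_id, first trace line) pairs for FAILED tasks."""
    return [
        (task["id"], task.get("trace", "").splitlines()[0] if task.get("trace") else "")
        for task in status.get("tasks", [])
        if task["state"] == "FAILED"
    ]

# example payload shaped like the real endpoint's response
status = json.loads("""
{
  "name": "neo4j-sink",
  "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
  "tasks": [
    {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    {"id": 1, "state": "FAILED", "worker_id": "10.0.0.2:8083",
     "trace": "org.apache.kafka.connect.errors.ConnectException: boom"}
  ]
}
""")
print(failed_tasks(status))  # → [(1, 'org.apache.kafka.connect.errors.ConnectException: boom')]
```

Wire this up to whatever alerting you already have; the point is that task failure is not surfaced anywhere unless you poll for it.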
Don’t use Zookeeper.
MERGE works best when there’s an index.
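For example, backing the unsubscribe query above with a uniqueness constraint on the MERGE’d property (Neo4j 4.4+ syntax; older versions use ON/ASSERT instead of FOR/REQUIRE):

```cypher
CREATE CONSTRAINT contact_email IF NOT EXISTS
FOR (c:Contact) REQUIRE c.email IS UNIQUE;
```

Without the backing index, every MERGE is a label scan; with it, the lookup half of MERGE is an index seek.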
Show how to access Connect logs
Talk about Neo4j locking w/ connect
# TODO: locking bullet
The same data might be stored multiple ways to provide different access patterns.
Show my clickstream and the two connectors.
Show how easy it is to plug enrichment logic into events in Kafka, and then use those events to enrich the graph.
Snatch 18:20
Show Streams visualization by pasting RTD topology into visualizer
[main] INFO io.woolford.rtd.stream.RtdStreamer - Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [rtd-bus-position])
      --> KSTREAM-TRANSFORM-0000000001
    Processor: KSTREAM-TRANSFORM-0000000001 (stores: [busPositionStore])
      --> KSTREAM-FILTER-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-FILTER-0000000002 (stores: [])
      --> KSTREAM-MAPVALUES-0000000003
      <-- KSTREAM-TRANSFORM-0000000001
    Processor: KSTREAM-MAPVALUES-0000000003 (stores: [])
      --> KSTREAM-SINK-0000000004
      <-- KSTREAM-FILTER-0000000002
    Sink: KSTREAM-SINK-0000000004 (topic: rtd-bus-position-enriched)
      <-- KSTREAM-MAPVALUES-0000000003
https://zz85.github.io/kafka-streams-viz/
Note that ‘mapValues’ cannot change the record key, so it never triggers a repartition; ‘map’ can change the key, so downstream stateful operations may force one. Implication: use mapValues if you can.
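A sketch of the difference in the DSL (hypothetical stream and types, just to show the shape of the two calls):

```java
// mapValues: key untouched, no repartition needed downstream
KStream<String, Position> enriched =
    positions.mapValues(pos -> enrich(pos));

// map: can change the key, so a downstream join/aggregate forces a repartition
KStream<String, Position> rekeyed =
    positions.map((k, pos) -> KeyValue.pair(pos.getRouteId(), pos));
```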
Docs: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#stateless-transformations
Stateful operations: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#stateful-transformations
Aggregating: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-aggregating
Joining: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-joins
Windowing: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-windowing
Custom: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-process
TODO: split into two slides so it separates Kafka Streams into its own slides and shows which to use where
Flavors of windowing: hopping, tumbling, session, sliding
See https://developer.confluent.io/learn-kafka/kafka-streams/windowing/
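The window-assignment arithmetic is easy to sketch. This mirrors how hopping windows are aligned to multiples of the advance interval (a tumbling window is just a hopping window where advance == size); it’s a toy model, not the Kafka Streams implementation:

```python
def hopping_window_starts(ts_ms, size_ms, advance_ms):
    """Return the start times of all hopping windows containing ts_ms.
    Windows are [start, start + size) and aligned to multiples of advance."""
    start = ts_ms - (ts_ms % advance_ms)  # latest aligned start <= ts
    starts = []
    while start > ts_ms - size_ms and start >= 0:
        starts.append(start)
        start -= advance_ms
    return sorted(starts)

def tumbling_window_start(ts_ms, size_ms):
    # a tumbling window is a hopping window with advance == size,
    # so each timestamp falls into exactly one window
    return hopping_window_starts(ts_ms, size_ms, size_ms)[0]

print(hopping_window_starts(7, 5, 2))  # → [4, 6]: overlapping windows [4,9) and [6,11)
print(tumbling_window_start(7, 5))     # → 5: the single window [5,10)
```

The overlap is the whole point of hopping windows: one event contributes to several aggregates, which is what gives you smooth moving averages.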
For sessionization, consider adding to https://github.com/alexwoolford/snowplow-kafka-streams
https://docs.confluent.io/platform/current/connect/transforms/overview.html
https://jsonpath.com/ <- handy to test JSON path.
https://github.com/alexwoolford/kafka-connect-transform-jolt
#TODO: add diagram showing where SMT gets executed. Also, show multiple transforms; mention ability to write your own.
Connect has a plugins folder (the plugin.path worker property); connector plugin jars go in there.
Go to Dockerhub and get the latest version of confluentinc/cp-kafka-connect
http deepthought.woolford.io:8083/connector-plugins
#TODO: “create a connect worker image…”
Caveats: Kafka only guarantees ordering within a partition, so if your events come from different topics, the connectors had better not fall behind, or events can be applied out of order.
If strict ordering is an absolute must-have, then we’d need to have all the events for any given customer in a single partition, and use APOC’s
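Why one partition per key preserves order: Kafka’s default partitioner hashes the record key, so every record with the same key lands on the same partition. Toy sketch only — the real default partitioner uses murmur2 over the key bytes, while crc32 here just stands in to show the same-key, same-partition property:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # real Kafka: murmur2(key_bytes) % num_partitions;
    # crc32 is used here only because it's in the stdlib and deterministic
    return zlib.crc32(key) % num_partitions

p = partition_for(b"customer-42", 6)
# every event for customer-42 hashes to the same partition,
# so the relative order of that customer's events is preserved
assert all(partition_for(b"customer-42", 6) == p for _ in range(100))
```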
https://dmccreary.medium.com/how-to-explain-index-free-adjacency-to-your-manager-1a8e68ec664a
Discussion of BTree when querying an index.
https://en.wikipedia.org/wiki/Strangler_fig
Martin Fowler: use strangler to avoid the risk of a massive re-write
Consider showing Snowplow recommender API
https://zz85.github.io/kafka-streams-viz/
This is particularly useful if you find yourself working on a Kafka Streams job that was written by someone else.
select ID_RESP_H, getGeoForIp(ID_RESP_H) from CONN emit changes;
Allows downstream consumers to restore state after a crash or system failure.
Great blog article that explains the detail: https://towardsdatascience.com/log-compacted-topics-in-apache-kafka-b1aa1e4665a7
Show “cleanup policy” in C3.
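The retention rule behind compaction is easy to sketch: only the latest value per key survives, and a null value (a tombstone) deletes the key. Toy model only — real compaction also keeps an uncompacted head of the log:

```python
def compact(log):
    """Toy model of a compacted topic: log is a list of (key, value) records,
    where value None is a tombstone. Returns the surviving latest value per key."""
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)   # tombstone: delete the key
        else:
            latest[key] = value     # later records shadow earlier ones
    return latest

log = [("k1", "v1"), ("k2", "v1"), ("k1", "v2"), ("k2", None)]
print(compact(log))  # → {'k1': 'v2'}
```

This is exactly why a compacted topic can act as a changelog for restoring state: replaying it from the beginning yields the current value of every live key.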
https://docs.confluent.io/current/schema-registry/avro.html#summary
Quick demo: https://github.com/alexwoolford/multiple-event-types-demo
#TODO: change to non-binary: MFX
SMT (single message transform):
simple stateless transformations
Streams DSL:
9/10 use cases
aggregations, joins, windowing, custom processors
Streams PAPI:
more flexible; harder to use
Possible to combine DSL and PAPI in the same streaming job.
https://docs.confluent.io/platform/current/streams/developer-guide/dsl-api.html
https://docs.confluent.io/platform/current/streams/developer-guide/processor-api.html
^^ show layers (SMT, DSL, PAPI) and where those components exist
https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
The Kafka API has become a standard, and can be consumed in many guises (not just Apache Kafka or Confluent).
#TODO: add a graph version of this, and show how Cypher is becoming a standard (e.g. Neptune, Memgraph)