Build Real-Time Applications with Databricks Streaming

Developer Marketing and Relations at MuleSoft at Databricks
Jun 15, 2021

  1. Real-Time Data Streaming with Databricks, Spark and Power BI
     Bennie Haelen, Principal Architect – Insight Digital Innovation
  2. Use Case Description
     • Large metropolitan fire department
     • Implemented an MDW architecture on Azure
     • Based upon the Insight repeatable MDW framework architecture
     [Architecture diagram. Labels: Oracle; CSV file; DROPZONE; RAW; RAW/Archive; STAGE; MDW; .parquet; Storage Acct ins-swdi-lens-adls; Ins-swdi-lens-adf; PL_DATA_ORA_2_ADLS_FULL; PL_MT_raw2stage; PL_MT_stage2mdw; PL_DATA_mdw2asql; PL_processAAS; Databricks; Workspace Folders; Hive Databases; Ins-swdi-lens-asql; Ins-swdi-lens-aas; Ins-swdi-lens-lapp; Ins-swdi-lens-email-lapp; Azure Automation; Key Vaults; Power BI]
  3. Use Case Extension
     • Need to add a real-time reporting channel
       • Up-to-date location & status of equipment
       • Location & status of firefighters and EMT personnel
       • List of active incidents within the city
     • Near real-time visualization
       • Automatically updating dashboard
       • Map with automatic updates of locations and incidents
       • Used by fire chiefs to make real-time move-up decisions
       • Pre-emptively move up equipment & resources
  4. Use Case Analysis
     • Central FD Database: ingests data from the various event sources
     • Change Data Capture (CDC): triggered with each transactional operation
     • Enterprise Service Bus (ESB): ingests & forwards events to consumers
     • Forwarding of events through the Azure cloud
       • ESB exposes a WebSockets interface
       • Azure Function reads events from the ESB through the WebSockets interface
       • Function forwards the events to the Azure cloud
       • Function is hosted in a Web Application
     • Solution
       • Create cloud ingest
       • Real-time stream processing
       • Performant ACID data store
       • Real-time visualization
  5. Architectural Requirements
     • Ingest the event stream
       • High ingestion rate (1000+ events per second)
       • Need a high-performance, fault-tolerant service
     • Stream events and perform domain-specific conversions
       • Need real-time streaming analytics
     • Store processed data in a high-performance data store
       • Keyed access to the data
       • Ability to perform UPSERT operations
     • Visualize the data in a real-time dashboard
       • Updates triggered by data changes in the underlying data store
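The "keyed access" and "UPSERT" requirements can be illustrated without any Spark at all. The following is a minimal sketch of upsert semantics over an in-memory dictionary; the `UnitStatus` fields (`unit_id`, `status`, `event_time`) are hypothetical and are not taken from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UnitStatus:
    unit_id: str       # hypothetical key column
    status: str        # hypothetical status value, e.g. "Available"
    event_time: float  # hypothetical event timestamp (seconds since epoch)


def upsert(store: dict[str, UnitStatus], events: list[UnitStatus]) -> dict[str, UnitStatus]:
    """Insert unseen keys and overwrite existing ones, keeping the newest event per key."""
    for event in events:
        current = store.get(event.unit_id)
        # Stale events (older than what is already stored) are ignored.
        if current is None or event.event_time >= current.event_time:
            store[event.unit_id] = event
    return store
```

Each unit keeps exactly one row, which is what lets a dashboard read "current status per unit" with a simple keyed lookup instead of scanning an event log.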
  6. Solution Architecture
     Ingestion Channel: Azure Event Hubs
     • Requirement (Ingest Event Stream): high ingestion rate (1000+ events per second); need a high-performance, fault-tolerant service
     • Azure Event Hubs: Microsoft real-time data ingestion engine; can ingest millions of events/second; Kafka compatibility
     Event Processing: Databricks with Spark Structured Streaming
     • Requirement (Process Stream): continuous processing; real-time ingestion; micro-batch processing
     • Databricks on Azure: Spark Structured Streaming; fault-tolerant stream processing engine; Kafka compatibility
     Real-Time Data Store: Databricks Delta Lake
     • Requirement (Real-Time Storage): keyed access to data; ability to perform UPSERTS; simple SQL-based access
     • Delta Lake: ACID transactions; high scalability
     Visualization: Power BI Service Dashboard
     • Requirement (Real-Time Visualization): simple integration; updates through data triggers; Direct Query into the data source
     • Microsoft Power BI: Direct Query against Delta Lake; real-time dashboarding facilities; updates triggered through data changes or push datasets
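Because the slide lists Kafka compatibility for both Event Hubs and Structured Streaming, one possible way to connect the two is Spark's Kafka source pointed at the Event Hubs Kafka endpoint. The sketch below only builds the reader options; the helper name and the namespace value are hypothetical, and the slides do not say which connector the demo actually used.

```python
def event_hubs_kafka_options(namespace: str, event_hub: str,
                             connection_string: str, shaded: bool = True) -> dict:
    """Build Spark Kafka-source options for an Event Hubs namespace (illustrative helper)."""
    # Databricks runtimes ship a shaded Kafka client, hence the "kafkashaded." prefix;
    # open-source Spark uses the plain class name.
    login_module = "org.apache.kafka.common.security.plain.PlainLoginModule"
    if shaded:
        login_module = "kafkashaded." + login_module
    jaas = (
        f"{login_module} required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        # Event Hubs exposes its Kafka endpoint on port 9093.
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": event_hub,  # the event hub plays the role of the Kafka topic
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "startingOffsets": "latest",
    }
```

In a notebook the options would be passed along the lines of `spark.readStream.format("kafka").options(**opts).load()`, with the connection string read from a secret store rather than hard-coded.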
  7. Demo Architecture
     • nb-create-unitStatusTable notebook: invokes the generic CreateDeltaTable with the appropriate parameters to create our UnitStatus table
     • nb-create-delta-table notebook: generic notebook which creates a Delta table
     • nb-eventhub-spark-streaming notebook: reads the events from Event Hubs and invokes the foreachBatch sink function implemented in the nb-unitstatus-event-processor notebook
     • nb-unitstatus-event-processor notebook: processes the events, performs the transformations, and finally updates our UnitStatusTable
     [Architecture diagram. Labels: C# .NET Console Application; Streaming-demo.eventsimulator; Units-eh Event Hub; Databricks Notebook nb-eventhub-spark-streaming; Databricks Notebook nb-unitstatus-event-processor; UPSERTS; Delta Table old_stream_fd.unit_status; Databricks Notebook nb-create-unit-status-table; Databricks Notebook nb-create-delta-table; Create Delta Table unit_status; Power BI Premium; Power BI Report]
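The foreachBatch-plus-UPSERT flow described above could look roughly like the following sketch. It is not the code from the demo notebooks: the column names (`unit_id`, `status`, `event_time`), the temp view name, and the function names are assumptions, and the target table is assumed to have exactly those three columns. Only the table name `old_stream_fd.unit_status` comes from the slide.

```python
TARGET_TABLE = "old_stream_fd.unit_status"  # table name as shown on the slide
UPDATES_VIEW = "unit_status_updates"        # hypothetical temp view name

# Keep only the newest event per unit inside the micro-batch, then MERGE it into
# the Delta table. Delta rejects a MERGE in which several source rows match the
# same target row, so the de-duplication step is required.
MERGE_SQL = f"""
MERGE INTO {TARGET_TABLE} AS t
USING (
  SELECT unit_id, status, event_time
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY unit_id ORDER BY event_time DESC) AS rn
    FROM {UPDATES_VIEW}
  ) AS ranked
  WHERE rn = 1
) AS s
ON t.unit_id = s.unit_id
WHEN MATCHED AND s.event_time >= t.event_time THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""


def upsert_unit_status(batch_df, batch_id):
    """foreachBatch sink: upsert one micro-batch into the Delta table."""
    batch_df.createOrReplaceTempView(UPDATES_VIEW)
    # Use the session attached to the batch DataFrame (PySpark 3.3+), because the
    # temp view is registered on that session.
    batch_df.sparkSession.sql(MERGE_SQL)


def start_unit_status_stream(events_df, checkpoint_path):
    """Attach the sink to an already parsed streaming DataFrame and start the query."""
    return (
        events_df.writeStream
        .foreachBatch(upsert_unit_status)
        .outputMode("update")
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

The checkpoint location lets the query resume after a failure. Because a micro-batch can be retried, the `event_time` guard in the MERGE keeps a replay from overwriting newer state.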
  8. Demo - Organization
     • Creation of Delta Lake Table
     • Implementation
     • Resources Walk Through
     • Spark Streaming Notebook
     • Stream Processor Function
     • Demo Run
     • Event Simulator
  9. Demo 1 – Infrastructure Walkthrough
  10. Demo 2 – Code Walkthrough
  11. Demo 3 – Sample Run
  12. Summary
     • The need for large-scale real-time stream processing becomes more evident every day
       • Provides organizations with the ability to respond quickly to a dynamic business climate
     • Spark Structured Streaming makes it easy to add a real-time channel
       • Simple extensions on top of Spark SQL
  13. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.