Briefing Room analyst comments - streaming analytics

Slides for Briefing Room webcast ( )
Organizations worldwide are learning hard lessons these days about the constraints of dated information systems. The time-tested process of Extract-Transform-Load (ETL) is fast losing its ability to cope with the volume, velocity and variety of Big Data coming down the pike. Forward-thinking companies are therefore prepping the battlefield by designing on-ramps to the future of streaming analytics. Register for this episode of The Briefing Room to hear Analyst Mark Madsen explain how a new era of data solutions is rising to the challenge of streaming data. He'll be briefed by Steve Wilkes, founder and CTO of the Striim platform. Steve will share how enterprises are turning to streaming data integration, in-memory transformations and continuous processing to achieve the goals of ETL in milliseconds – at a fraction of the cost and complexity of legacy systems. Several case studies will be shared.

Published in: Data & Analytics

Briefing Room analyst comments - streaming analytics

  1. Is ETL Now a 4-Letter Word? Preparing for Streaming Analytics. Analyst commentary, October 2015. Mark Madsen, Third Nature, @markmadsen. #!$@*ETL! %*ELT#&*!
  2. In a mostly-connected world, events occur in different time frames and follow different cycles of use. Time scale: Disconnected, Milliseconds, Minutes, Hours+. Source: Noumenal. Copyright Third Nature, Inc.
  3. Every event is persisted for some period of time before it is forgotten or forwarded.
  4. Local context and control, local decisions, local latency.
  5. Bigger context, likely correlated, more complex rules, external monitoring.
  6. Broad context, human intervention, diagnosis and analytical tasks that have to be coordinated.
  7. Data lives in multiple places, at multiple levels of detail, for differing durations. Unlikely to all be in one place. Nor should it be.
  8. We have a model for the persisted portion only.
     The DW can't handle real time ingest:
       ▪ One of the original DW design assumptions: solve for conflicting workloads by using a different database
       ▪ Workload management has limits
       ▪ Scalability problem for event streams
       ▪ Spiky flow patterns and dynamic scaling
     Static schema:
       ▪ Reaction time - shapes, holes, dropped packets
       ▪ What happens first, upstream change or data model change?
     Polling architectures do not work well for streaming:
       ▪ Introduces latency
       ▪ Polling creates performance and scaling problems
  9. Activities and functions based on the data flow cycle:
     Capture: sensors, machine data, logs, events, transactions, table changes
     Propagate: filter, transform, correlate, aggregate
     Analyze: classify, detect anomalies, detect patterns, correlate
     Elect: rules, algorithms, select, coordinate
     Effect: notify, publish, approve, execute
     Persist: database, NoSQL, files, Hadoop
  10. Streaming isn't either-or, it's part of core architecture.
      From flowing to persisted: sliding window of "now"; persisted but not yet loaded into a platform; queryable history; managed history.
      Where the data sits: ESB; cache/queue; database / platform.
      A DB or ETL can get you to within minutes (at large scale) but it won't be easy or cheap; mainly lives in the realm of history.
      Event streams, in-mem stores, CEP, streaming SQL can be used for these.
      Real time monitoring doesn't use only real time data: windows, restarts, detecting deviation, so the above boundaries are crossed.
  11. If you want to do realtime and still manage your data effectively then you need to think about data architecture.
      Diagram labels: Stream; Collect, Refine, Manage, Deliver; Flowing, Persisted, Managed history; Microservices; Metadata; Metadata & reuse?
      Flow, persisted, managed define different access, processing, storage and retrieval requirements.
  12. About Third Nature. Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure then you're at the right place. Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
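
The capture, propagate, analyze and effect stages on slide 9, together with the sliding window of "now" and the deviation detection mentioned on slide 10, can be sketched roughly as below. This is an illustration only, not code from the deck or from any product named in it: the Event fields, the function names, the 10-second window and the 3-sigma threshold are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Event(NamedTuple):
    ts: float      # event time in seconds (hypothetical field)
    source: str    # originating sensor or log (hypothetical field)
    value: float   # measured value (hypothetical field)


def capture(raw: Iterable[Tuple[float, str, float]]) -> Iterator[Event]:
    """Capture: wrap raw readings as events as they arrive."""
    for ts, source, value in raw:
        yield Event(ts, source, value)


def propagate(events: Iterable[Event], wanted_source: str) -> Iterator[Event]:
    """Propagate: filter the stream down to one source."""
    return (e for e in events if e.source == wanted_source)


def analyze(events: Iterable[Event], window_seconds: float,
            threshold: float) -> Iterator[Tuple[Event, float]]:
    """Analyze: flag events that deviate from the recent sliding window.

    Yields (event, window_mean) when the event lies more than
    `threshold` standard deviations from the mean of the window.
    """
    window: deque = deque()
    for e in events:
        # Evict events that have fallen out of the window of "now".
        while window and e.ts - window[0].ts > window_seconds:
            window.popleft()
        if len(window) >= 3:  # need a little history before judging
            values = [w.value for w in window]
            mu, sigma = mean(values), pstdev(values)
            if sigma > 0 and abs(e.value - mu) > threshold * sigma:
                yield e, mu
        window.append(e)


def effect(alerts: Iterable[Tuple[Event, float]]) -> List[str]:
    """Effect: turn detections into notification messages."""
    return [f"t={e.ts:g} {e.source}: {e.value:g} deviates from window mean {mu:.2f}"
            for e, mu in alerts]


# Made-up readings: (timestamp, source, value)
raw_events = [
    (0, "sensor-a", 10.0), (1, "sensor-b", 99.0), (2, "sensor-a", 11.0),
    (3, "sensor-a", 10.5), (4, "sensor-a", 10.2), (5, "sensor-a", 50.0),
    (6, "sensor-a", 10.4),
]

alerts = list(analyze(propagate(capture(raw_events), "sensor-a"),
                      window_seconds=10, threshold=3))
for message in effect(alerts):
    print(message)
```

Because each stage is a generator, events are processed one at a time as they flow through, and only the window itself is held in memory. The sketch leaves out the elect and persist stages, and it lets a flagged event enter the window, which a real monitor would probably want to handle more carefully.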