Deep Dive Time Series Anomaly Detection in Azure with dotnet
1. DATA SATURDAY #10
Sofia, Oct 09th
Deep Dive Time Series Anomaly Detection with different Azure
2. Marco Parenzan
• Senior Solution Architect @ beanTech
• 1nn0va Community Lead (Pordenone)
• Microsoft Azure MVP
o LinkedIn: https://www.linkedin.com/in/marcoparenzan/
o Slideshare: https://www.slideshare.net/marco.parenzan
o GitHub: https://github.com/marcoparenzan
3. This is the journey of…
• …a .NET developer…
• …or an IoT developer…
• …a one-man band (sometimes)…
• …facing typical data science world topics…
• …that wants to use .NET everywhere!
15. Threshold anomalies?
• Threshold alarms are not enough
o Anomalies cannot be just «over a threshold for
o Condenser or evaporator having difficulty starting
o Distinguish from opening a door (that is also an
o Or count the number of times there are peaks (too many times)
• You can consider each of these events as an anomaly that alters the temperature you measure in different parts of the fridge
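The "too many peaks" idea can be sketched in a few lines: count how many separate excursions above a limit happen inside a sliding window, and flag a window when that count exceeds a maximum, rather than alarming on a single reading. The function names, the limit, the window size and `max_peaks` are all hypothetical choices for illustration, not part of any Azure or ML.NET API.

```python
def count_peaks(values, limit):
    """Count upward crossings of `limit`: each excursion above it counts once."""
    peaks = 0
    above = False
    for v in values:
        if v > limit and not above:
            peaks += 1  # a new excursion above the limit starts here
        above = v > limit
    return peaks


def too_many_peaks(values, limit, window, max_peaks):
    """Return the start index of every sliding window with more than max_peaks peaks."""
    return [
        start
        for start in range(len(values) - window + 1)
        if count_peaks(values[start:start + window], limit) > max_peaks
    ]


# hypothetical fridge temperatures: three short excursions above 8 degrees
temps = [4, 4, 9, 4, 9, 4, 9, 4, 4, 4]
print(count_peaks(temps, limit=8))                                  # 3
print(too_many_peaks(temps, limit=8, window=5, max_peaks=2))        # [2]
```

A single reading of 9 would trip a plain threshold alarm; here only the window that contains three excursions is reported.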
16. Anomaly Detection
• Anomaly detection is the process of identifying unexpected items or events in data sets that differ from the norm.
• Anomaly detection is often applied to unlabeled data; this is known as unsupervised anomaly detection.
• An anomaly is not just a matter of time and scalar values: it can also be a visual anomaly!
17. Time Series
o A time series is a sequence of data points recorded in time order, often taken at successive, equally spaced points in time.
o Examples: stock prices, sales demand, website traffic, daily temperatures, quarterly sales
o Trend
• A general direction in which something is developing or changing. A trend can be upward (uptrend) or downward (downtrend). The increase or decrease does not have to be consistently in the same direction throughout a given period.
o Seasonality
• A predictable pattern that recurs or repeats over regular intervals. Seasonality is often observed within a year or less.
o Irregular fluctuation
• Variations that occur due to sudden causes and are unpredictable. For example, the rise in food prices due to war, floods, earthquakes, farmers' strikes, etc.
• Time series is different from regression analysis because of its
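The three components above can be separated with a classic additive decomposition (value = trend + seasonal + residual). The sketch below assumes an odd, known period, estimates the trend with a centered moving average over one period, and averages the detrended values at each position in the period to get the seasonal pattern. What remains is the irregular fluctuation. `decompose` is a hypothetical helper; libraries such as statsmodels (`seasonal_decompose`) offer a fuller version.

```python
import numpy as np


def decompose(values, period):
    """Additive decomposition: values = trend + seasonal + residual (odd period only)."""
    values = np.asarray(values, dtype=float)
    if period % 2 == 0:
        raise ValueError("this sketch only handles an odd period")
    half = period // 2
    trend = np.full(len(values), np.nan)  # undefined at the edges
    # a centered moving average over one full period cancels the seasonal pattern
    trend[half:len(values) - half] = np.convolve(
        values, np.ones(period) / period, mode="valid")
    detrended = values - trend
    # average the detrended values at each position inside the period (ignoring NaNs)
    pattern = np.array([np.nanmean(detrended[p::period]) for p in range(period)])
    pattern -= pattern.mean()                   # the seasonal component sums to zero
    seasonal = np.resize(pattern, len(values))  # repeat the pattern over the series
    residual = values - trend - seasonal        # irregular fluctuation
    return trend, seasonal, residual


# synthetic series: linear uptrend plus a 3-step seasonal pattern
t = np.arange(30)
values = 0.5 * t + np.tile([2.0, -1.0, -1.0], 10)
trend, seasonal, residual = decompose(values, period=3)
```

On this synthetic series the trend recovers 0.5·t, the seasonal part recovers the repeating pattern, and the residual is zero wherever the trend is defined.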
18. Anomaly Detection in Time Series
• In time series data, an anomaly or outlier is a data point that does not follow the common collective trend or the seasonal or cyclic pattern of the entire data, and is significantly distinct from the rest of the data. By significant, most data scientists mean statistical significance: in other words, the statistical properties of the data point are not in alignment with the rest of the series.
• Anomaly detection has two basic assumptions:
o Anomalies only occur very rarely in the data.
o Their features differ from the normal instances significantly.
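The two assumptions map directly onto the simplest statistical detector: a point is anomalous when it is far from the mean in units of standard deviation. Because anomalies are rare, they barely shift the mean and spread. The z-score rule and the cutoff `k = 3` are a common rule of thumb used here as an assumption, not the method used later in the talk. On a real time series it is best applied to the residual after removing trend and seasonality.

```python
import numpy as np


def zscore_outliers(values, k=3.0):
    """Indices of points more than k standard deviations away from the mean."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return []  # a perfectly flat series has no outliers
    z = (values - values.mean()) / std
    return [int(i) for i in np.flatnonzero(np.abs(z) > k)]


print(zscore_outliers([10.0] * 20 + [50.0]))  # [20]
```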
20. Helping non-data-scientist developers (all of us!)
• Unsupervised Machine Learning
• Automated Training Set for Anomaly Detection Algorithms
• The algorithms automatically generate a simulated training set based on your input data
• Auto(mated) ML finds the best parameter tuning for you
21. Spectral Residual CNN (SrCnn)
• Goal: monitor time series continuously and raise alerts on potential incidents in time
• The algorithm first computes the Fourier Transform of the original data. Then it computes
the spectral residual of the log amplitude of the transformed signal before applying the
Inverse Fourier Transform to map the sequence back from the frequency to the time domain.
This sequence is called the saliency map. The anomaly score is then computed as the relative
difference between the saliency map values and their moving averages. If the score is above
a threshold, the value at a specific timestep is flagged as an outlier.
• The SR algorithm has several parameters. To obtain a model with good performance, we suggest tuning windowSize and threshold first: these are the most important parameters for SR. Then search for an appropriate judgementWindowSize, which should be no larger than windowSize. For the remaining parameters, you can use the default values directly.
• Time-Series Anomaly Detection Service at Microsoft [https://arxiv.org/pdf/1906.03821.pdf]
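The spectral residual steps described above can be sketched with NumPy's FFT. The steps are FFT, log amplitude, subtraction of its local average (the spectral residual), and an inverse FFT with the original phase to obtain the saliency map. Each point is then scored by its relative difference from the mean saliency of the preceding points. This covers only the SR part: the CNN stage and the extrapolation of future points described in the paper are omitted. The parameters `q`, `z` and `threshold` belong to this sketch and are hypothetical. They are only loosely analogous to windowSize, judgementWindowSize and threshold in ML.NET.

```python
import numpy as np


def saliency_map(values, q=3):
    """Spectral residual saliency map of a 1-D series."""
    spectrum = np.fft.fft(values)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    log_amp = np.log(amplitude + 1e-8)  # small epsilon avoids log(0)
    # centered average filter of size q over the log amplitude
    avg_log_amp = np.convolve(log_amp, np.ones(q) / q, mode="same")
    residual = log_amp - avg_log_amp  # the spectral residual
    # back to the time domain, keeping the original phase
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))


def anomaly_scores(saliency, z=21):
    """Relative difference between each saliency value and the mean of the preceding z values."""
    scores = np.zeros(len(saliency))
    for i in range(1, len(saliency)):
        local_avg = saliency[max(0, i - z):i].mean()
        scores[i] = (saliency[i] - local_avg) / (local_avg + 1e-8)
    return scores


def detect(values, threshold=3.0, q=3, z=21):
    """Indices whose anomaly score exceeds the threshold."""
    scores = anomaly_scores(saliency_map(values, q), z)
    return [int(i) for i in np.flatnonzero(scores > threshold)]


values = np.ones(200)
values[120] += 10.0    # a single spike on a flat signal
print(detect(values))  # [120]
```

On the flat signal the saliency map is nearly constant except for a sharp peak at the spike, so only index 120 scores above the threshold.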
23. Data Science and AI for the .NET developer
• ML.NET is first and foremost a framework that you can use to
create your own custom ML models. This custom approach
contrasts with “pre-built AI,” where you use pre-designed general
AI services from the cloud (like many of the offerings from Azure
Cognitive Services). This can work great for many scenarios, but
it might not always fit your specific business needs due to the
nature of the machine learning problem or to the deployment
context (cloud vs. on-premises).
• ML.NET enables developers to use their existing .NET skills to
easily integrate machine learning into almost any .NET
application. This means that if C# (or F# or VB) is your
programming language of choice, you no longer have to learn a
new programming language, like Python or R, in order to
develop your own ML models and infuse custom machine
learning into your .NET apps.
25. Some tools required
• .NET 5 + WPF + ML.NET
• Mandatory: the platform where we run our experiments
• XPlot.Plotly (you will soon understand why I use this) https://fslab.org/XPlot/
• XPlot is a cross-platform data visualization package for the F# programming language. It provides a complete mapping for the configuration options of the underlying libraries, so you get a nice F# interface that gives you access to the full power of Plotly and Google Charts. The XPlot library can be used interactively from F# Interactive, but charts can equally easily be embedded in F# applications and in HTML reports.
• WebView2 https://docs.microsoft.com/en-us/microsoft-edge/webview2/gettingstarted/wpf
• The Microsoft Edge WebView2 control enables you to embed web technologies (HTML, CSS, and JavaScript) in your native apps. The WebView2 control uses Microsoft Edge (Chromium) as the rendering engine to display the web content in native apps. With WebView2, you may embed web code in different parts of your native app, or build all of the native app within a single WebView instance.
28. Batch vs. Notebooks
• Batch
o Work on slow data stored in a data lake
o Submit a complete app in one single deployment
o Receive the entire output
• Notebooks
o «Sketch» the code
o Write/delete/rewrite continuously
o Run cell by cell (but also all at once), interactively
• In a world of Mathematica
• Evolution and generalization of the seminal role of Mathematica
• In web standards way
o Web (HTTP+Markdown)
o Python adoption (ipynb)
• Front end written in JavaScript
• Python has an interop bridge... not native (if that ever matters): Python is a kernel for Jupyter
• Simple to start (that's why C# is pythonizing…)
• “Open Source”
• TensorFlow, Scikit-learn, Keras, Pandas, PyTorch
• Remember one thing:
o Often, behind a data science framework there is a native library, and Python binds to that library
o Spark is written in Scala (on the JVM), and there is a bridge for Python to Spark
o Jupyter's front end is written in JavaScript, and there is a bridge (kernel) for Python
31. Spark Unifies:
A unified, open source, parallel data processing framework for Big Data Analytics
Spark Core Engine
32. .NET Interactive and Jupyter and Visual Studio Code
• .NET Interactive provides C# and F# kernels for Jupyter
• .NET Interactive provides all the tools to create your own hosting application, independently of Jupyter
• In Visual Studio Code, you have two different notebooks (looking similar but
developed in parallel by different teams)
o .NET Interactive Notebook (by the .NET Interactive Team) that can run also Python
o Jupyter Notebook (by the Azure Data Studio Team – probably) that can run also C# and F#
• There is a little confusion about this
• .NET Interactive has a strong C#/F# kernel...
o ...but a less mature infrastructure (compared to Jupyter)
33. .NET for Apache Spark 1.1.1
• .NET bindings (C# and F#) to Spark
o Written on the Spark interop layer, designed to provide high-performance bindings to multiple languages
• Re-use knowledge, skills, code you
have as a .NET developer
o Compliant with .NET Standard
• You can use .NET for Apache
Spark anywhere you write .NET
• Original project: Mobius
41. Azure Cognitive Services
• Cognitive Services brings AI within reach of every developer—without
requiring machine-learning expertise. All it takes is an API call to embed the
ability to see, hear, speak, search, understand, and accelerate decision-
making into your apps. Enable developers of all skill levels to easily add AI
capabilities to their apps.
• Five areas:
• Web search
• Decision
o Anomaly Detector: identify potential problems early on
o Content Moderator: detect potentially offensive or unwanted content
o Metrics Advisor (preview): monitor metrics and diagnose issues
o Personalizer: create rich, personalized experiences for every user
42. Anomaly Detector
• Through an API, Anomaly Detector ingests time-series data of all types and
selects the best-fitting detection model for your data to ensure high accuracy.
Customize the service to detect any level of anomaly and deploy it where you
need it most -- from the cloud to the intelligent edge with containers. Azure
is the only major cloud provider that offers anomaly detection as an AI service.
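Calling the service from any language is a single HTTP POST. The sketch below builds a batch ("entire series") detection request using only the Python standard library. It assumes the Anomaly Detector v1.0 REST API: the `/anomalydetector/v1.0/timeseries/entire/detect` path, a JSON body with `granularity` and a `series` of `timestamp`/`value` points, the `Ocp-Apim-Subscription-Key` header, and an `isAnomaly` array in the response. The endpoint and key are placeholders, and the helper names are hypothetical. Check the current API reference before relying on this.

```python
import json
import urllib.request


def build_detect_request(endpoint, key, points, granularity="daily"):
    """Build a POST request asking Anomaly Detector to analyze a whole series."""
    body = {
        "granularity": granularity,
        "series": [{"timestamp": ts, "value": v} for ts, v in points],
    }
    url = endpoint.rstrip("/") + "/anomalydetector/v1.0/timeseries/entire/detect"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Ocp-Apim-Subscription-Key": key},
        method="POST",
    )


def anomalous_timestamps(points, response_json):
    """Pair the service's isAnomaly flags back with the input timestamps."""
    return [ts for (ts, _), flag in zip(points, response_json["isAnomaly"]) if flag]


# Sending it (not executed here; the service requires at least 12 points):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without network access.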
45. Fully managed big data analytics service
• Fully managed
o Focus on insights, not the infrastructure, for fast time to
o No infrastructure to manage; provision the service, choose the SKU for your workload, and create a database.
• Optimized for
o Get near-instant insights from fast-flowing data
o Scale linearly up to 200 MB per second per node with highly performant, low-latency
• Designed for
o Run ad-hoc queries using the intuitive query language
o Returns results from 1 billion records in under 1 second without modifying the data or
46. Multi-temperature data processing paths
• Hot: seconds freshness, days retention; in-memory aggregated data; pre-defined standing queries; split-second query performance (in-memory cube, stream analytics)
• Warm: minutes freshness, months retention; seconds-to-minutes query performance (column store)
• Cold: hours freshness, years retention; programmatic batch processing; minutes-to-hours query performance (distributed file, MapReduce)
The Spectral Residual outlier detector is based on the paper Time-Series Anomaly Detection Service at Microsoft and is suitable for unsupervised online anomaly detection in univariate time series data. For more details, please check out the paper.
What is ADX (Azure Data Explorer) exactly?
It is a fully managed big data analytics service, based on an analytical database.
Analytical databases are optimized to query and run advanced analytics on large volumes of data with extremely low response times.
Modern analytical databases are generally distributed, scalable, and fault-tolerant.
They are columnar databases that work with compressed formats and rely on an intelligent software infrastructure composed of a blend of in-memory and disk caching technologies.
Hot means instant results from a continuous data flow.
Warm path means an analytical approach: not immediate, but full of raw data to be modeled.
Cold doesn't mean unreachable.
So the question is: in a multi-temperature situation, is there any Azure service that can answer all three data paths?