This document contains the code and results for 8 queries written in Azure Stream Analytics to analyze streaming data from various IoT sensors. Query 1 counts the number of Audis passing through a toll station each minute. Query 2 calculates the total number of cars passing a speed camera by color every 90 seconds. Query 3 finds the oldest car passing a toll station by color every 20 seconds.
Azure Stream Analytics Report - Toll Booth Stream
AZURE STREAM ANALYTICS ASSIGNMENT
Data Management and Business Intelligence - Assignment 3
Academic Year: 2018-2019 (Full-Time)
Assignment Partners: Baratsas Sotiris (f2821803) | Spanos Nikos (f2821826)
CONTENTS:
1. Report.pdf - A report that summarizes the queries written and their results from the console
2. queries.sql - The code for all the queries used in this assignment
3. JSON files - One JSON file for each query, with the outputs from the Live Data section of Azure Stream Analytics. We started the Stream Analytics job for each query, ran it for 4-10 minutes, and then extracted the JSON file from the Blob Storage container we created.
Query 1
In a tumbling window of 1 minute, count the number of Audis that passed through a toll station.
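The tumbling-window count can also be checked offline, without an Azure job. The following is a minimal Python sketch, not part of the submitted assignment: the helper `tumbling_counts` and the field names `t` (event time in seconds) and `make` are assumptions made for illustration. It also assumes each window covers the interval (end - size, end] and is labelled by its end time.

```python
import math
from collections import Counter

def tumbling_counts(events, size_s, predicate):
    """Count matching events per tumbling window, keyed by window-end time (seconds)."""
    counts = Counter()
    for ev in events:
        if predicate(ev):
            # Assumption: an event at time t falls in the window (end - size, end].
            window_end = math.ceil(ev["t"] / size_s) * size_s
            counts[window_end] += 1
    return dict(sorted(counts.items()))

# Hypothetical sample events, already joined with the car make.
toll_events = [
    {"t": 5, "make": "Audi", "spotType": "Toll_Station"},
    {"t": 42, "make": "BMW", "spotType": "Toll_Station"},
    {"t": 59, "make": "Audi", "spotType": "Toll_Station"},
    {"t": 61, "make": "Audi", "spotType": "Speed_Limit_Camera"},
    {"t": 75, "make": "Audi", "spotType": "Toll_Station"},
]

audi_counts = tumbling_counts(
    toll_events,
    60,
    lambda e: e["make"] == "Audi" and e["spotType"] == "Toll_Station",
)
print(audi_counts)  # {60: 2, 120: 1}
```

Because tumbling windows do not overlap, every event is counted in exactly one window.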
CODE:
SELECT COUNT([input].[vehicleTypeID]) as Total_Audis
INTO [output]
FROM [input]
INNER JOIN [cars-dataset]
ON [input].[vehicleTypeID] = [cars-dataset].[vehicleTypeID]
WHERE [cars-dataset].[CAR_MAKE]='Audi'
-- restrict the count to toll stations, as the task statement requires
AND [input].[spotType]='Toll_Station'
GROUP BY TumblingWindow(minute,1)
OUTPUT:
=========================== END OF QUERY 1 ===========================
Query 2
In a hopping window of 3 minutes, for each color, calculate the total number of cars
that passed through a police speed limit camera. Repeat every 90 seconds.
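Unlike tumbling windows, hopping windows overlap: with a 3-minute duration and a 90-second hop, each event is counted in two consecutive windows. The Python sketch below illustrates this under stated assumptions: the helpers `hopping_window_ends` and `cars_per_color` are hypothetical names, times are in seconds, windows end at multiples of the hop, and each window covers (end - duration, end].

```python
import math
from collections import Counter

def hopping_window_ends(t, duration_s, hop_s):
    """Return the end times of every hopping window that contains time t."""
    end = math.ceil(t / hop_s) * hop_s  # first window boundary at or after t
    ends = []
    while end - duration_s < t:  # the window covers (end - duration, end]
        ends.append(end)
        end += hop_s
    return ends

def cars_per_color(events, duration_s=180, hop_s=90):
    """Count cars per (window end, color) for a list of (time, color) events."""
    counts = Counter()
    for t, color in events:
        for end in hopping_window_ends(t, duration_s, hop_s):
            counts[(end, color)] += 1
    return dict(counts)

# Hypothetical speed-camera events: (time in seconds, color name).
camera_events = [(10, "red"), (100, "red"), (100, "blue"), (200, "red")]
color_counts = cars_per_color(camera_events)
print(color_counts[(180, "red")])  # 2: the cars seen at t=10 and t=100
```

The red car seen at t=100 is counted again in the window ending at 270, which is why totals across hopping windows exceed the number of cars.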
CODE:
SELECT [colors].[color_name], COUNT([input].[vehicleTypeID]) as Total_Cars
INTO [output]
FROM [input]
INNER JOIN [colors]
ON [input].[colorID]=[colors].[color_code]
WHERE [input].[spotType]='Speed_Limit_Camera'
GROUP BY [colors].[color_name], HoppingWindow(Duration(minute,3), Hop(second,90))
OUTPUT:
=========================== END OF QUERY 2 ===========================
Query 3
In a tumbling window of 20 seconds, for each color, find the oldest car that passed
through a toll station.
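Finding the oldest car per color takes two steps: first compute the minimum model year per window and color, then join that minimum back to the individual events to recover which vehicle it was. The Python sketch below illustrates the two-step pattern offline. The function `oldest_per_color` and the field names `t`, `color`, `year` and `vehicleTypeID` are illustrative assumptions, and windows are assumed to cover (end - 20, end].

```python
import math

def oldest_per_color(events, size_s=20):
    """Return (window_end, color, year, vehicleTypeID) for the oldest car(s) per window and color."""
    def window_end(ev):
        return math.ceil(ev["t"] / size_s) * size_s

    # Step 1: minimum model year per (window, color).
    min_year = {}
    for ev in events:
        key = (window_end(ev), ev["color"])
        min_year[key] = min(ev["year"], min_year.get(key, ev["year"]))

    # Step 2: join back to the events to find the vehicle(s) with that year.
    rows = []
    for ev in events:
        key = (window_end(ev), ev["color"])
        if ev["year"] == min_year[key]:
            rows.append((key[0], ev["color"], ev["year"], ev["vehicleTypeID"]))
    return rows

# Hypothetical toll-station events, already joined with color and model year.
station_events = [
    {"t": 3, "color": "red", "year": 2005, "vehicleTypeID": "v1"},
    {"t": 8, "color": "red", "year": 1999, "vehicleTypeID": "v2"},
    {"t": 15, "color": "blue", "year": 2010, "vehicleTypeID": "v3"},
    {"t": 25, "color": "red", "year": 2012, "vehicleTypeID": "v4"},
]
oldest_rows = oldest_per_color(station_events)
print(oldest_rows)
# [(20, 'red', 1999, 'v2'), (20, 'blue', 2010, 'v3'), (40, 'red', 2012, 'v4')]
```

If two cars of the same color share the minimum year in a window, both rows are returned.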
CODE:
WITH table1 as
(SELECT [colors].[color_name] as color, [input].[colorID] as colorid,
System.Timestamp t, min([cars-dataset].[CAR_MODEL_YEAR]) as years
FROM [input] TIMESTAMP BY EventEnqueuedUtcTime
INNER JOIN [colors]
ON [input].[colorID] = [colors].[color_code]
INNER JOIN [cars-dataset]
ON [input].[vehicleTypeID] = [cars-dataset].[vehicleTypeID]
WHERE [input].[spotType] = 'Toll_Station'
GROUP BY [colors].[color_name],[input].[colorID], TumblingWindow(second,20))
SELECT table1.color, table1.years, [input].[vehicleTypeID], System.Timestamp t
INTO [output]
FROM [input] TIMESTAMP BY EventEnqueuedUtcTime
INNER JOIN [cars-dataset]
ON [input].[vehicleTypeID] = [cars-dataset].[vehicleTypeID]
INNER JOIN table1
ON datediff(second, [input], table1) between 0 and 20
AND [input].[colorID] = table1.colorid
AND [cars-dataset].[CAR_MODEL_YEAR] = table1.years
OUTPUT:
=========================== END OF QUERY 3 ===========================
Query 4
In a sliding window of 60 seconds, calculate the speed limit camera spots where the
most violations happened.
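One way to read this task is: at any moment, which camera has recorded the most violations in the preceding 60 seconds? The Python sketch below simulates that reading offline. It is a simplification, not the assignment's method: it evaluates the window only when a violation arrives, whereas a sliding window in Azure Stream Analytics also produces output when an event leaves the window. The function `busiest_checkpoint`, the `limits` dictionary and the sample values are assumptions.

```python
from collections import Counter

def busiest_checkpoint(events, limits, window_s=60):
    """For each violation time t, return (t, checkpoint, count) for the
    checkpoint with the most violations in the interval (t - window, t]."""
    violations = [
        (ev["t"], ev["checkpointID"])
        for ev in events
        if ev["speed"] > limits[ev["checkpointID"]]
    ]
    results = []
    for t, _ in violations:
        counts = Counter(cp for ts, cp in violations if t - window_s < ts <= t)
        # Ties are broken in favour of the lowest checkpoint ID.
        checkpoint, n = max(sorted(counts.items()), key=lambda kv: kv[1])
        results.append((t, checkpoint, n))
    return results

# Hypothetical speed limits per checkpoint and camera readings.
speed_limits = {1: 90, 2: 120}
readings = [
    {"t": 0, "checkpointID": 1, "speed": 100},
    {"t": 10, "checkpointID": 2, "speed": 110},  # under the limit, ignored
    {"t": 20, "checkpointID": 2, "speed": 130},
    {"t": 30, "checkpointID": 2, "speed": 140},
    {"t": 70, "checkpointID": 1, "speed": 95},
]
busiest = busiest_checkpoint(readings, speed_limits)
print(busiest)  # [(0, 1, 1), (20, 1, 1), (30, 2, 2), (70, 2, 2)]
```

At t=70 the violation from t=0 has already left the 60-second window, so checkpoint 2 still leads with two violations.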
CODE:
SELECT [input].[checkpointID] as Checkpoint,
COUNT([input].[checkpointID]) as Over_the_speed_limit
INTO [output]
FROM [input]
INNER JOIN [speed-camera-spots]
ON [input].[checkpointID] = [speed-camera-spots].[checkpointID]
WHERE [input].[spotType] = 'Speed_Limit_Camera' AND
CAST([input].[speed] as bigint) > [speed-camera-spots].[SPEED_LIMIT]
-- group by checkpoint so that violations can be compared between camera spots
GROUP BY [input].[checkpointID], SlidingWindow(second, 60)
OUTPUT:
=========================== END OF QUERY 4 ===========================
Query 5
In a sliding window of five minutes, for each color and car model, display the total
number of cars that break the speed limit.
CODE:
SELECT [cars-dataset].[CAR_MODEL], [colors].color_name, COUNT(*)
INTO [output]
FROM [input] TIMESTAMP BY EventEnqueuedUtcTime
INNER JOIN [speed-camera-spots]
ON [input].[checkpointID] = [speed-camera-spots].[checkpointID]
INNER JOIN [cars-dataset]
ON [input].[vehicleTypeID] = [cars-dataset].[vehicleTypeID]
INNER JOIN [colors]
ON [input].[colorID] = [colors].[color_code]
WHERE [input].[spotType] = 'Speed_Limit_Camera' AND
CAST([input].[speed] as bigint) > [speed-camera-spots].[SPEED_LIMIT]
GROUP BY SlidingWindow(minute, 5), [cars-dataset].[CAR_MODEL], [colors].color_name
OUTPUT:
=========================== END OF QUERY 5 ===========================
Query 6
You have been given a list of the license plates of police’s most wanted criminals. In a
sliding window of 1 minute, display a list of all the cars that you spotted at any
checkpoint.
CODE:
SELECT
COUNT(A.[licensePlate]) as Spotted,
A.[licensePlate] as Wanted_Car,
A.[spotType] as Spot_Type,
A.[checkpointID] as Check_point
INTO [output]
FROM [input] as A
INNER JOIN [wanted-cars] as B
ON A.[licensePlate] = B.[licensePlate]
GROUP BY SlidingWindow(minute, 1), A.[spotType], A.[licensePlate], A.[checkpointID]
OUTPUT:
Comment: Originally there was no wanted car in the data sample, so we added one manually to check that the query works.
=========================== END OF QUERY 6 ===========================
Query 7
In a sliding window of 1 minute, display a list of fake license plates. Check if the same
license plate has passed through any type of checkpoint twice in the same time
window.
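The duplicate check can be sketched offline as a look-back at the previous sighting of the same plate: if that sighting happened within one minute at a different checkpoint, the plate is suspicious. The Python below is a minimal illustration under stated assumptions. The function `suspicious_plates` and the field names `t`, `plate` and `checkpointID` are hypothetical, and times are in seconds.

```python
def suspicious_plates(events, window_s=60):
    """Return (plate, previous_checkpoint, current_checkpoint) whenever a plate
    reappears at a different checkpoint within window_s seconds."""
    last_seen = {}  # plate -> most recent event for that plate
    hits = []
    for ev in sorted(events, key=lambda e: e["t"]):
        prev = last_seen.get(ev["plate"])
        if (
            prev is not None
            and ev["t"] - prev["t"] <= window_s
            and prev["checkpointID"] != ev["checkpointID"]
        ):
            hits.append((ev["plate"], prev["checkpointID"], ev["checkpointID"]))
        last_seen[ev["plate"]] = ev
    return hits

# Hypothetical sightings; "ABC" appears at two checkpoints 30 seconds apart.
sightings = [
    {"t": 0, "plate": "ABC", "checkpointID": 1},
    {"t": 5, "plate": "XYZ", "checkpointID": 2},
    {"t": 30, "plate": "ABC", "checkpointID": 3},
    {"t": 200, "plate": "XYZ", "checkpointID": 4},
]
fake_plates = suspicious_plates(sightings)
print(fake_plates)  # [('ABC', 1, 3)]
```

The plate "XYZ" is not flagged because its two sightings are 195 seconds apart, which is outside the one-minute look-back.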
CODE:
SELECT
COUNT(ALL [licensePlate]) AS FakeLicense,
[licensePlate] AS LicensePlate,
[checkpointID] AS First_Checkpoint,
LAG([checkpointID]) OVER (PARTITION BY [licensePlate] LIMIT DURATION(minute, 1)) AS Second_Checkpoint
FROM
[input]
WHERE
[checkpointID] <> LAG([checkpointID]) OVER (PARTITION BY [licensePlate] LIMIT DURATION(minute, 1))
GROUP BY [licensePlate], [checkpointID], SlidingWindow(minute,1),
LAG([checkpointID]) OVER (PARTITION BY [licensePlate] LIMIT DURATION(minute, 1))
OUTPUT:
CODE (2nd Approach):
SELECT
CASE WHEN COUNT(*) = 1 THEN CONCAT([input].[licensePlate], ': 1 time' )
ELSE CONCAT([input].[licensePlate], ': ', CAST(COUNT(*) AS NVARCHAR(MAX)), ' times')
END AS CarsPassed
INTO [output]
FROM [input]
GROUP BY [input].[licensePlate],SlidingWindow(minute, 1)
HAVING COUNT([input].[licensePlate])>1
OUTPUT (2nd Approach):
Comment: Originally there was no duplicate license plate in the data sample, so we added one manually to check that the query works.
=========================== END OF QUERY 7 ===========================
Query 8
In a tumbling window of 2 minutes, calculate the percentage of BMW drivers that
break the speed limit.
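The percentage is the number of speeding BMWs divided by all BMWs seen in the same 2-minute window, rounded up. The Python sketch below works through that arithmetic offline. The function `speeding_percentage`, the field names and the sample speeds are assumptions, and all sample events are assumed to come from speed cameras. Windows are assumed to cover (end - 120, end].

```python
import math
from collections import Counter

def speeding_percentage(events, limits, make="BMW", size_s=120):
    """Return {window_end: percentage of `make` cars over the limit}, rounded up."""
    total, over = Counter(), Counter()
    for ev in events:
        if ev["make"] != make:
            continue
        window_end = math.ceil(ev["t"] / size_s) * size_s
        total[window_end] += 1
        if ev["speed"] > limits[ev["checkpointID"]]:
            over[window_end] += 1
    # Only windows with at least one car of this make appear, so total[w] > 0.
    return {w: math.ceil(over[w] * 100 / total[w]) for w in sorted(total)}

# Hypothetical camera readings and a single checkpoint with a limit of 90.
camera_limits = {1: 90}
camera_readings = [
    {"t": 10, "make": "BMW", "speed": 100, "checkpointID": 1},
    {"t": 50, "make": "BMW", "speed": 80, "checkpointID": 1},
    {"t": 110, "make": "BMW", "speed": 85, "checkpointID": 1},
    {"t": 115, "make": "Audi", "speed": 150, "checkpointID": 1},
    {"t": 130, "make": "BMW", "speed": 95, "checkpointID": 1},
    {"t": 200, "make": "BMW", "speed": 120, "checkpointID": 1},
]
bmw_percentages = speeding_percentage(camera_readings, camera_limits)
print(bmw_percentages)  # {120: 34, 240: 100}
```

In the first window one of three BMWs is speeding, so 33.3% rounds up to 34%. Unlike an inner join of two separate counts, this sketch also reports 0% for a window in which no BMW was speeding.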
CODE:
WITH Numerator AS
(SELECT COUNT([input].[licensePlate]) as up
FROM [input]
INNER JOIN [cars-dataset]
ON [input].[vehicleTypeID]=[cars-dataset].[vehicleTypeID]
INNER JOIN [speed-camera-spots]
ON [input].[checkpointID] = [speed-camera-spots].[checkpointID]
WHERE [cars-dataset].[CAR_MAKE] = 'BMW' AND
CAST([input].[speed] AS bigint)>[speed-camera-spots].[SPEED_LIMIT] AND
[input].[spotType]='Speed_Limit_Camera'
GROUP BY TumblingWindow(minute, 2)),
Denominator AS
(SELECT COUNT([input].[licensePlate]) as down
FROM [input]
INNER JOIN [cars-dataset]
ON [input].[vehicleTypeID]=[cars-dataset].[vehicleTypeID]
WHERE [cars-dataset].[CAR_MAKE] = 'BMW' AND
[input].[spotType]='Speed_Limit_Camera'
GROUP BY TumblingWindow(minute, 2))
SELECT CONCAT('Out of all the BMW drivers that were identified in the last 2 minutes, ',
CEILING((Numerator.up) * 100.0 / Denominator.down), '%', ' of the drivers broke the speed limit') as Percentage
INTO [output]
FROM Numerator
-- match only results from the same 2-minute window; a range of 0 to 2 would also
-- pair each numerator with the denominator of the following window
JOIN Denominator ON DATEDIFF(mi, Numerator, Denominator) BETWEEN 0 AND 0
OUTPUT:
=========================== END OF QUERY 8 ===========================