1. Learning to adapt to sensor changes and failures
Craig Knoblock
Yuan Shi
Minh Pham
University of Southern California
Information Sciences Institute
2. Introduction
• The Internet of Things will contain many sensors
• People will build applications that will rely on these sensors
• But with such a large number of sensors, there will be failures
• So one important challenge is seamlessly handling these failures
3. Outline
• Learning to Replace a Failed Sensor
• Learning to Replace a Compound Sensor
• Assessing Adaptation Quality and Detecting Failures
• Related Work, Discussion, and Future Work
4. Example: Reconstructing a Missing Sensor
[Figure: a temperature sensor producing a stream of readings with a location; fields: timestamp, latitude, longitude, temperature, pressure]
2015-04-25:15:07 33.292 118.541 35.2 26.2
2015-04-25:15:12 33.274 118.532 34.8 26.0
5. Example: Reconstructing a Missing Sensor
[Figure: the same temperature sensor and readings (fields: timestamp, latitude, longitude, temperature, pressure), now shown with a new sensor and a function f]
2015-04-25:15:07 33.292 118.541 35.2 26.2
2015-04-25:15:12 33.274 118.532 34.8 26.0
6. Sensor Reconstruction without Overlapping Data
• We replace XK with a new sensor Y
• Learn a reconstruction function trained on the working sensors, though there is no overlapping data between XK and Y
XK ← f(X1, X2, …, XK-1, Y)
XK: failed/target sensor; X1, …, XK-1: working sensors; Y: new sensor
[Figure: sensors X1, X2, X3 over time t, with the model f(X1, X2, Y)]
7. Notations of Individual Sensor Changes
[Figure: sensors S1, S2, S3, …, SK-1, SK over time t, time steps 1, 2, …, N; labels: change point, old sensor]
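8. Notations of Individual Sensor Changes
SK is replaced by P new sensors: SK+1, …, SK+P
[Figure: sensors over time t. Before the change point (time steps 1, 2, …, N): S1, S2, S3, …, SK-1 and the old sensor SK. After the change point (time steps N+1, N+2, …, N+M): S1, S2, S3, …, SK-1 and the new sensors SK+1, SK+2, …, SK+P]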
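9. Notations of Individual Sensor Changes
SK is replaced by P new sensors: SK+1, …, SK+P
Source domain: samples X1, X2, …, XN; target domain: samples Z1, Z2, …, ZM
[Figure: sensors over time t. Before the change point (time steps 1, 2, …, N): S1, S2, S3, …, SK-1 and the old sensor SK. After the change point (time steps N+1, N+2, …, N+M): S1, S2, S3, …, SK-1 and the new sensors SK+1, SK+2, …, SK+P]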
10. Sensor-level Adaptation to Individual Sensor Changes
SK is replaced by P new sensors SK+1, …, SK+P
Unexplored in previous work
Reconstruction function: f(S1, S2, …, SK-1, SK+1, …, SK+P) → SK
[Figure: sensors S1, S2, …, SK-1, SK, SK+1, …, SK+P over time t; source samples X1, X2, …, XN at time steps 1, 2, …, N and target samples Z1, Z2, …, ZM at time steps N+1, N+2, …, N+M]
11. Sensor-level Adaptation to Individual Sensor Changes
SK is replaced by P new sensors SK+1, …, SK+P
Unexplored in previous work
Reconstruction function: f(S1, S2, …, SK-1, SK+1, …, SK+P) → SK
Challenge: no overlap between SK and the new sensors!
[Figure: sensors S1, S2, …, SK-1, SK, SK+1, …, SK+P over time t; source samples X1, X2, …, XN at time steps 1, 2, …, N and target samples Z1, Z2, …, ZM at time steps N+1, N+2, …, N+M]
12. Sensor-level Adaptation to Individual Sensor Changes
SK is replaced by P new sensors SK+1, …, SK+P
Unexplored in previous work
Reconstruction function: f(S1, S2, …, SK-1, SK+1, …, SK+P) → SK
Challenge: no overlap between SK and the new sensors!
Intuition: S1, S2, …, SK-1 as the bridge
Assumption: S1, S2, …, SK-1 are correlated with SK, as well as with SK+1, …, SK+P
[Figure: sensors S1, S2, …, SK-1, SK, SK+1, …, SK+P over time t; source samples X1, X2, …, XN at time steps 1, 2, …, N and target samples Z1, Z2, …, ZM at time steps N+1, N+2, …, N+M]
14. Sensor-level Adaptation to Individual Sensor Changes
[Figure: source samples X1, X2, …, XN (time steps 1, 2, …, N) and target samples Z1, Z2, …, ZM (time steps N+1, N+2, …, N+M) over time t, with the function f]
15. Sensor-level Adaptation to Individual Sensor Changes
The two domains (source and target) are similarly distributed
The two sets of samples have similar distributions
[Figure: source samples X1, X2, …, XN and target samples Z1, Z2, …, ZM over time t, time steps 1, 2, …, N, N+1, N+2, …, N+M, with the function f]
16. Sensor-level Adaptation to Individual Sensor Changes
Two sets of samples have similar distributions
Two sets of samples mixed as much as possible
Minimize cross-domain k-nearest neighbor distances: each source sample's k neighbors in the target domain, and each target sample's k neighbors in the source domain
[Figure: source and target samples over time t, time steps 1, 2, …, N, N+1, N+2, …, N+M]
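One way to read "mixed as much as possible" is as a cross-domain nearest-neighbor statistic. The sketch below is an illustration only, not the objective used in the talk: it assumes Euclidean distance and simply averages, over all samples, the distance to the k nearest neighbors in the other domain. The function names and the toy points are hypothetical.

```python
import math

def knn_mean_distance(point, others, k):
    """Mean Euclidean distance from `point` to its k nearest points in `others`."""
    dists = sorted(math.dist(point, o) for o in others)
    return sum(dists[:k]) / min(k, len(dists))

def cross_domain_knn_distance(source, target, k=1):
    """Average cross-domain k-NN distance (assumed form, for illustration).

    Each source sample looks for neighbors among the target samples and
    vice versa. A small value means the two sample sets are well mixed.
    """
    s_to_t = [knn_mean_distance(x, target, k) for x in source]
    t_to_s = [knn_mean_distance(z, source, k) for z in target]
    return (sum(s_to_t) + sum(t_to_s)) / (len(s_to_t) + len(t_to_s))

# Toy 2-D samples (made up).
source = [(0.0, 0.0), (1.0, 0.0)]
mixed_target = [(0.0, 1.0), (1.0, 1.0)]        # close to the source samples
shifted_target = [(10.0, 0.0), (11.0, 0.0)]    # far from the source samples

mixed = cross_domain_knn_distance(source, mixed_target, k=1)
shifted = cross_domain_knn_distance(source, shifted_target, k=1)
```

A learned representation that lowers this statistic would make the source and target samples harder to tell apart, which is the intuition stated on the slide.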
17. Relationships in Weather Data
Five individual sensors from WeatherUnderground
ID  Type         Unit  Range
1   Temperature  °C    0.4 – 37.6
2   Dew point    °C    -9.4 – 18.4
3   Humidity     %     11 – 90
4   Wind speed   mph   0 – 38.6
5   Wind gust    mph   0 – 46.7
[Figure: correlation between individual sensors by month]
Two groups of patterns (Nov-Jan, Feb-Oct), suggesting nonlinear models for modeling the relationships among sensors
18. Empirical Study
• Each station has 5-10 individual sensors, producing a sample every 5-10 minutes
• Sensor change simulation: an individual sensor is replaced by the same sensor at a nearby station
• Source domain: Jan 2015-Aug 2015; target domain: Jan 2016-Aug 2016
• Adaptation errors: root mean square error between reconstructed signal and ground truth
19. Empirical Study
Ignoring new sensors: regression on the remaining old sensors
Missing value imputation: predict the new sensors’ readings on the source domain, then do regression
Our approach: Learning with previously Unseen Features (LUF) [Shi and Knoblock, ‘17]
Average improvement: 17.9%
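The "ignoring new sensors" baseline and the root mean square error used to score adaptations can be sketched as follows. This is a minimal illustration under stated assumptions: a single remaining old sensor S1, an ordinary least squares line as the regression model (the talk does not say which regressor was used), and made-up toy data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(predicted, truth):
    """Root mean square error between a reconstructed signal and ground truth."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth))

# Toy source-domain data (made up): S1 keeps working, SK is the sensor that later fails.
s1_source = [1.0, 2.0, 3.0, 4.0]
sk_source = [3.0, 5.0, 7.0, 9.0]      # here SK = 2 * S1 + 1

slope, intercept = fit_line(s1_source, sk_source)

# Target domain: SK is gone, so it is reconstructed from S1 alone.
s1_target = [5.0, 6.0]
sk_truth = [11.0, 14.0]               # held-out ground truth, used only for scoring
sk_reconstructed = [slope * x + intercept for x in s1_target]
error = rmse(sk_reconstructed, sk_truth)
```

The baseline cannot pick up anything the new sensors would reveal about SK in the target domain, which is the gap the other two strategies on this slide try to close.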
20. Empirical Study
[Figure: reconstructed wind speed and pressure signals, comparing our approach (LUF), which uses new sensors, with ignoring new sensors (regression on the remaining old sensors)]
22. Outline
• Learning to Replace a Failed Sensor
• Learning to Replace a Compound Sensor
• Assessing Adaptation Quality and Detecting Failures
• Related Work, Discussion, and Future Work
23. Example: Learning the device operation
[Figure: a temperature sensor producing a stream of readings with a location; fields: timestamp, latitude, longitude, temperature, pressure]
2015-04-25:15:07 33.292 118.541 35.2 26.2
2015-04-25:15:12 33.274 118.532 34.8 26.0
24. Example: Automatically Adapting to Changes
Weather Station (readings with a location; fields: timestamp, latitude, longitude, temperature, pressure)
2015-04-25:15:07 33.292 118.541 35.2 26
2015-04-25:15:12 33.274 118.532 34.8 27
New Weather Station
28-Apr-15 16:50:50 118 26 59 E 33 58 33 N 74 37.5
28-Apr-15 16:50:59 118 27 10 E 33 58 45 N 77 38.4
25. Example: Automatically Adapting to Changes
Weather Station (readings with a location; fields: timestamp, latitude, longitude, temperature, pressure)
2015-04-25:15:07 33.292 118.541 35.2 26
2015-04-25:15:12 33.274 118.532 34.8 27
New Weather Station
28-Apr-15 16:50:50 118 26 59 E 33 58 33 N 74 37.5
28-Apr-15 16:50:59 118 27 10 E 33 58 45 N 77 38.4
Learn a transformation program T
26. Challenge: How to Automatically Adapt to a New Sensor
• Problem
• The output of the new sensor differs from that of the original sensor, which the software was designed to process
• Solution
• Synthesize a data adapter that transforms the new data into a format usable by the software system
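A hand-written version of such a data adapter for the two weather-station formats shown earlier might look like the sketch below. In the talk the adapter is to be synthesized automatically; this code only illustrates the kind of program being sought. The assumed field layout of the new station, the choice to ignore the hemisphere letters, and the pass-through of the last two values (their units are not given) are all assumptions.

```python
from datetime import datetime

def dms_to_decimal(degrees, minutes, seconds):
    """Convert degrees/minutes/seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600

def adapt_reading(line):
    """Rewrite a new-station line into the old station's field order and formats.

    Assumed new layout:
      date time lon_deg lon_min lon_sec hemi lat_deg lat_min lat_sec hemi v1 v2
    Hemisphere letters are ignored because the old station reports unsigned
    coordinates; v1 and v2 are passed through unchanged (units unknown).
    """
    f = line.split()
    stamp = datetime.strptime(f[0] + " " + f[1], "%d-%b-%y %H:%M:%S")
    longitude = dms_to_decimal(float(f[2]), float(f[3]), float(f[4]))
    latitude = dms_to_decimal(float(f[6]), float(f[7]), float(f[8]))
    # Old order: timestamp, latitude, longitude, then the two measurements.
    return [stamp.strftime("%Y-%m-%d:%H:%M"),
            f"{latitude:.3f}", f"{longitude:.3f}", f[10], f[11]]

adapted = adapt_reading("28-Apr-15 16:50:50 118 26 59 E 33 58 33 N 74 37.5")
```

The adapted record has the old station's shape (timestamp, latitude, longitude, two measurements), so software written for the original format could consume it once the measurement units are also reconciled.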
27. Identifying the Semantic Types of the Sensor Data
• Use machine learning techniques to learn to recognize different types of data [Pham et al., 2016]
[Figure: values of Type A (118.519, 119.117) and values of an unknown type (34.6, 33.5) are turned into pairwise similarity features and passed to a Random Forest, which answers whether they are the same type (Yes/No); in this example "Unknown Type = A" is marked X]
28. Different similarity features of data
• Attribute name similarity (similarity in attribute names): "firstName" vs. "First Name"
• Value similarity (similarity in values): Name (Gary Cahill, Juan Mata, De Gea) vs. Player (Juan Quin, Tim Cahill, Metsul Ozeil)
• Value range similarity (similarity in ranges of values): "# games played" (1, 2, ..., 8) vs. "number of games" (2, 3, ..., 10)
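The three similarity features on this slide can be illustrated with simple stand-in measures. These are assumptions chosen for brevity (a normalized string ratio, token-level Jaccard overlap, and numeric range overlap), not the feature definitions of Pham et al.; the classifier that consumes the features is omitted.

```python
import re
from difflib import SequenceMatcher

def name_similarity(a, b):
    """String similarity of two attribute names after dropping case and punctuation."""
    norm = lambda s: re.sub(r"[^a-z0-9]", "", s.lower())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

def value_similarity(values_a, values_b):
    """Jaccard overlap of the word tokens that appear in two columns of text values."""
    tokens = lambda vals: {t for v in vals for t in v.lower().split()}
    a, b = tokens(values_a), tokens(values_b)
    return len(a & b) / len(a | b)

def range_similarity(nums_a, nums_b):
    """Length of the overlap of two numeric ranges divided by the span covering both."""
    low = max(min(nums_a), min(nums_b))
    high = min(max(nums_a), max(nums_b))
    span = max(max(nums_a), max(nums_b)) - min(min(nums_a), min(nums_b))
    return max(0.0, high - low) / span

# Feature vector for the examples on the slide.
features = [
    name_similarity("firstName", "First Name"),
    value_similarity(["Gary Cahill", "Juan Mata", "De Gea"],
                     ["Juan Quin", "Tim Cahill", "Metsul Ozeil"]),
    range_similarity([1, 2, 8], [2, 3, 10]),
]
```

Each pair of columns yields one such feature vector, and a binary classifier trained on labeled pairs can then decide whether the two columns share a semantic type.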
29. Different similarity features of data
• Histogram similarity: position (1, 4, 3, 2) vs. Player (GK, MF, DF, FW): similarity in histogram
• Distribution similarity: "# game played" (4, ..., 18, 23) vs. "# goal scored" (3, ..., 11, 22): no similarity in distribution
[Figure labels on the examples: similarity in value; no similarity in value]
30. Evaluation
MRR performance of our approach on weather data (trained on different domains)
Number of labeled sources  1      2      3
Train on soccer            89.75  95.08  97.73
Train on museum            89.75  95.08  97.73
Train on city              91.86  96.59  97.73
SemanticTyper              85.22  92.04  95.45
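For reference, mean reciprocal rank (MRR) averages 1/rank of the correct label over all test columns; the table appears to report it as a percentage. A minimal sketch, with hypothetical rankings for three weather columns:

```python
def mean_reciprocal_rank(ranked_predictions, true_labels):
    """Average of 1/rank of the correct label; a missing label contributes 0."""
    total = 0.0
    for ranking, truth in zip(ranked_predictions, true_labels):
        if truth in ranking:
            total += 1.0 / (ranking.index(truth) + 1)
    return total / len(true_labels)

# Hypothetical ranked semantic-type guesses, best guess first.
rankings = [
    ["temperature", "dew point", "humidity"],   # correct label ranked 1st
    ["wind gust", "wind speed", "humidity"],    # correct label ranked 2nd
    ["humidity", "temperature", "dew point"],   # correct label ranked 3rd
]
truth = ["temperature", "wind speed", "dew point"]
score = mean_reciprocal_rank(rankings, truth)   # (1 + 1/2 + 1/3) / 3
```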
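32. Transformation Learning (Future Work)
• Example: 3 => Mar
[Figure: pipeline of three steps: query Webtable database, transformation inference, transform data to correct format. Unpaired values in the two formats (? Jul, ? May, 7 ?, 3 ?) are matched against a table with rows (1, January, Jan), (2, February, Feb), ..., which yields the pairs 7 Jul, 5 May, 7 Jul, 3 Mar]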
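The idea of inferring a transformation from unpaired old-format and new-format values via a reference table can be sketched as below. This is an illustrative guess at the mechanism, not the authors' implementation: the `months` table is a hypothetical stand-in for whatever a web-table query would return, and the inference rule (pick the column pair that contains all old values and all new values) is an assumption.

```python
def infer_column_mapping(table, old_values, new_values):
    """Return (src, dst) column pairs such that column src contains every
    old-format value and column dst contains every new-format value.
    No paired examples are needed."""
    columns = [set(col) for col in zip(*table)]
    return [(src, dst)
            for src, col_s in enumerate(columns)
            for dst, col_d in enumerate(columns)
            if src != dst and set(old_values) <= col_s and set(new_values) <= col_d]

def transform(table, mapping, value):
    """Apply an inferred column mapping to a value; None if the value is not in the table."""
    src, dst = mapping
    return next((row[dst] for row in table if row[src] == value), None)

# Hypothetical stand-in for a table retrieved from a web-table database.
months = [("1", "January", "Jan"), ("2", "February", "Feb"), ("3", "March", "Mar"),
          ("5", "May", "May"), ("7", "July", "Jul")]

old_values = ["7", "3"]        # values seen in the old format
new_values = ["Jul", "May"]    # values seen in the new format, not paired with the old ones

mappings = infer_column_mapping(months, old_values, new_values)
result = transform(months, mappings[0], "3")
```

With fewer or less distinctive values, several column pairs can match (for example "May" alone appears in both name columns), so a real system would need a way to rank candidate mappings.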
33. Preliminary result for format changes
• Evaluation:
• 38 datasets including date/time, names, street addresses, telephone numbers, dimensions
• Only contains cases that can be solved with just replacement
• Measurements: accuracy, average edit distance (compared with ground truth)
• Some examples that work well:
Accuracy  Avg edit distance  Original avg edit distance  Improvement on edit distance
0.58      3.5                18.21                       81%
Format change: dd mm yyyy => dd.mm.yy
Accuracy 1, avg edit distance 0, original avg edit distance 4, improvement on edit distance 100%
Format change: [middle_name] last_name; first_name [(c)] => first_name [middle_name] last_name [(c)]
Accuracy 0.862, avg edit distance 0.91, original avg edit distance 5.8, improvement on edit distance 84%
Format change: height” [H] x weight” [W] x [depth” [D]] => weight
Accuracy 0.967, avg edit distance 0.11, original avg edit distance 17.79, improvement on edit distance 99%
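The measurements can be reproduced with a standard Levenshtein distance. The sketch assumes that "improvement on edit distance" means 1 - (avg edit distance after) / (original avg edit distance), which is consistent with the reported figures (for example 1 - 3.5/18.21 ≈ 81%); the sample strings are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # delete ca
                               current[j - 1] + 1,             # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute or match
        previous = current
    return previous[-1]

def evaluate(outputs, originals, truths):
    """Accuracy, avg edit distance after and before transformation, and improvement."""
    n = len(truths)
    accuracy = sum(o == t for o, t in zip(outputs, truths)) / n
    after = sum(edit_distance(o, t) for o, t in zip(outputs, truths)) / n
    before = sum(edit_distance(o, t) for o, t in zip(originals, truths)) / n
    return accuracy, after, before, 1 - after / before   # assumed improvement formula

# Made-up data for the "dd mm yyyy => dd.mm.yy" format change.
originals = ["25 04 2015", "28 04 2015"]   # values in the old format
truths = ["25.04.15", "28.04.15"]          # ground truth in the new format
outputs = ["25.04.15", "28.04.15"]         # what a perfect transformation would emit
accuracy, after, before, improvement = evaluate(outputs, originals, truths)
```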
34. Outline
• Learning to Replace a Failed Sensor
• Learning to Replace a Compound Sensor
• Assessing Adaptation Quality and Detecting Failures
• Related Work, Discussion, and Future Work
35. Adaptation Performance Estimation and Sensor Change Detection
How Good is an Adaptation?
• Provide upper-layer software with an estimate of the adaptation error
• Select the optimal adaptation strategy
Approach
• Simulate sensor failures
• Simulate failures of one or multiple sensors at random time points from historical data
• Compute the adaptation error for each adaptation strategy and store it in a library
• New sensor failure: match the most similar case from the library
Adaptation strategies:
f1(S1) → S3
f2(S2) → S3
f3(S1, S2) → S3
36. Adaptation Performance Estimation and Sensor Change Detection
Adaptation strategies:
f1(S1) → S3
f2(S2) → S3
f3(S1, S2) → S3
S3 = 2S1 + 3S2 – 0.5, error = 0.2
An error bound can be derived:
(2S1 + 3S2 – S3 – 0.5)^2 < 0.2^2
Can we use it to detect sensor changes?
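Checking readings against this error bound is a one-line test. The sketch below hard-codes the example relationship from the slide; the function name and the sample readings are hypothetical.

```python
def bound_violated(s1, s2, s3, error=0.2):
    """True if the readings break the bound (2*S1 + 3*S2 - S3 - 0.5)^2 < error^2."""
    residual = 2 * s1 + 3 * s2 - s3 - 0.5
    return residual ** 2 >= error ** 2

consistent = bound_violated(1.0, 1.0, 4.4)   # residual is about 0.1, inside the bound
suspicious = bound_violated(1.0, 1.0, 5.5)   # residual is -1.0, outside the bound
```

A violated bound says only that the learned relationship no longer holds; it does not by itself say which of the sensors involved has changed.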
37. Adaptation Performance Estimation and Sensor Change Detection
Sensor Change Detection: change or not?
Error bounds derived from adaptation strategies over sensor subsets:
S1, S2, S3
(S1, S2), (S1, S3), (S2, S3)
(S1, S2, S3)
Violated: at least one sensor was changed
Using logical inference: either S1 changed, or both S2 and S3 changed; the first explanation is simpler
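The logical inference step can be read as a search for the smallest set of sensors that explains every violated bound. The sketch below makes that reading concrete under one assumption: a bound can only be violated if at least one of the sensors it involves has changed. The function name and the example violations are hypothetical.

```python
from itertools import combinations

def simplest_explanations(sensors, violated_bounds):
    """Smallest sets of changed sensors that touch every violated bound.

    Assumes a bound is violated only if at least one of its sensors changed.
    Returns [set()] when nothing is violated.
    """
    for size in range(len(sensors) + 1):
        hits = [set(c) for c in combinations(sensors, size)
                if all(set(c) & set(bound) for bound in violated_bounds)]
        if hits:
            return hits
    return []

# Suppose the bounds over (S1, S2) and (S1, S3) are violated.
violated = [("S1", "S2"), ("S1", "S3")]
explanations = simplest_explanations(["S1", "S2", "S3"], violated)
```

Here "S2 and S3 both changed" would also explain the violations, but the search stops at the smaller explanation, matching the preference for the simpler hypothesis.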
38. Outline
• Learning to Replace a Failed Sensor
• Learning to Replace a Compound Sensor
• Assessing Adaptation Quality and Detecting Failures
• Related Work, Discussion, and Future Work
39. Related Work
• Detecting Sensor Failures and Changes
• Change point detection [Aminikhanghahi and Cook ‘16] [Pimentel et al., ‘14]
• Distribution-based [Kawahara and Sugiyama, ‘12] [Harchaoui et al., ‘09] [Yamanishi and Takeuchi, ‘02]
• Reconstruction-based [Crook et al., ‘02] [Singh and Markou, ‘04] [Ide and Tsuda, ‘07] [Chatzigiannakis et al., ‘06]
• Probabilistic [Adams and MacKay, ‘07] [Saatci et al., ‘10] [Dereszynski and Dietterich, ‘12] [Dietterich et al. ‘12]
• Distance-based [Angiulli and Pizzuti, ‘02] [Bay and Schwabacher, ‘03] [Chawla and Sun, ‘06] [Keogh et al., ‘01] [Budalakoti et al., ‘06] [Chen et al., ‘15]
• Reconstruction of Sensor Readings
• Most detection methods do not address how to automatically recover
• Some probabilistic methods [Dereszynski and Dietterich, ‘12] [Dietterich et al. ‘12] can be used to reconstruct a changed sensor, but cannot leverage new sensors
• FFX [McConaghy ‘11] is applied to extract sensor-specific transformations
Our approach explores multiple nonlinear relationships among sensors, and can potentially detect sensor changes with significantly higher accuracy
Our approach can adapt to new sensors, which is not possible with existing approaches
40. Related Work:
• String transformation: Most existing approaches require a one-to-one mapping in the training data to work.
• Singh, Rishabh, and Sumit Gulwani. "Transforming spreadsheet data types using examples." ACM SIGPLAN Notices. Vol. 51. No. 1. ACM, 2016.
• Wu, Bo, and Craig A. Knoblock. "An Iterative Approach to Synthesize Data Transformation Programs." IJCAI. 2015.
• Semi-automatic data cleaning: Most existing approaches require human interaction to provide training data and curate the generated results.
• Scaffidi, Christopher. Topes: Enabling end-user programmers to validate and reformat data. Diss. University of Nebraska-Lincoln, 2009.
• Raman, Vijayshankar, and Joseph M. Hellerstein. "Potter's wheel: An interactive data cleaning system." VLDB. Vol. 1. 2001.
41. Discussion
• Presented techniques for
• Reconstructing numeric sensor values
• Reconstructing a failed compound sensor from a new sensor
• Assessing the accuracy of a reconstructed sensor and identifying failures
• Many applications where these techniques could be applied
• Geoscientists collecting data about the earth
• Medical devices where information is missing
• Sensors on mobile phones where sensors may be too costly to run
• Etc…
42. Conference Talk on Thursday
• Learning with Previously Unseen Features
Yuan Shi & Craig A. Knoblock
• Thursday, August 24 16:30-18:00 (Yuan will talk at 17:15)
• ML-TAML3 – Transfer, Adaptation, Multi-Task Learning 3 (212)