Explaining Potential Risks in Traffic Scenes by Combining Logical Inference and Physical Simulation
1. Explaining Potential Risks in Traffic Scenes
by Combining Logical Inference and
Physical Simulation
Ryo Takahashi*1, Naoya Inoue*1,
Yasutaka Kuriya*2, Sosuke Kobayashi*1,
and Kentaro Inui*1
*1: Tohoku University
*2: DENSO CORPORATION
2. Predict potential risks in the traffic scene in the city
12/09/16 2
Research goal
① Woman raises
her hand
② Taxi suddenly stops
③ Ego-vehicle
collides with the taxi
3. Predict potential risks in the traffic scene in the city
12/09/16 3
Research goal
① Woman raises
her hand
② Taxi suddenly stops
③ Ego-vehicle
suddenly stops
④ Ego-vehicle collides with
green truck from behind
4. Predict potential risks in the traffic scene in the city
12/09/16 4
Research goal
① Woman raises
her hand ③ Ego-vehicle
suddenly stops
② Taxi suddenly stops
④ Ego-vehicle collides with
green truck from behind
③ Ego-vehicle
collides with the taxi
Recognize potential risks with causal chains
Refer to as “explanation”
5. 12/09/16 5
Definition of traffic risk recognition
Input
Information obtained from sensors
(e.g., vision cameras and LIDARs)
• Quantitative data
(shape, position, velocity)
• Qualitative data
(a scene description of a traffic scene)
Output
Explanations of the risks with the
likelihood score
(divide into two stages)
Score Entity-action
0.8 Taxi(T) stops
0.5 Pedestrian(P)
cross the road
Scene description:
Pedestrian(P), Taxi(T),
Me(Me), LeftFrontOf(T, P), …
(shape, position, velocity)
Score Entity-action Explanation
0.8 Taxi(T) stops Woman raises
her hand, Taxi
suddenly stops,
then the ego-
vehicle collides.
0.5 Pedestrian(P)
cross the road
...• Pair of a colliding entity and its action
• Whole explanation
6. 12/09/16 6
Key idea
Combine logical inference and physical simulationRecognizing Potential Traffic Risks through Logic-based Deep Scene Understanding
cases in real-life applications.
The main idea of our approach is as follows. We keep the prediction rules as general as
possible, and try to find the best explanation for the condition of general prediction rules
based on multiple pieces of evidence at different levels of abstraction. One can view the pro-
cess of finding the best explanations as the projection of observed information onto a latent
space of features useful for risk prediction. Introducing a latent feature space has been proved
effective for a wide range of AI tasks [4, 8].
In our task, the latent information we particularly consider useful includes the beliefs,
desires, and intentions (BDI) of each traffic object appearing in a given traffic scene. If the
BDI information of objects is available, it can then be used to predict their next actions, which
leads to a better prediction of potential risks. BDI information is not observable, but may
sometimes be inferable from the observable behavior of the object and the observable infor-
Abductive!
Reasoner
Image!
Description
Knowledge Bases
旬
者
陰 一 =
歩 行
一 = 画
』
4 0 k m/ h で進行して
います。どのような
ことに注意して運転
しますか。
□①学撰が獣飛び曲してくるかもしれないので、速度を落としていつでも止まれるようにする。
そ く ど お と
□②孝養が割こ柔び曲してくるかもしれないので、鰹な齢をあけるため対向車線にはみ出して
たいこうしやせん
そうこう
走行する。
n③字撰は畠軍に美符いて飛び曲してこないと思うので、このままの速度で走行する。
お も そ く ど そ う こ う
17
1「in'y
F
が
B O k m / h で進行して
います。どのような
ことに注意して運転
しますか。
□①ポー ル は艶の若櫛に毒がっていきそうなので、 左端によってこのままの雛で雛する。
ひだり芯し
お こ ど も と だ と そ く ど
□②ボールを追いかけて子供が飛び出してくるかもしれないので、いつでも止まれるように速度を
曇 と す 。 ” い … “お こ ど 巷 と だ つ う か
□③ボールを追いかけて子供が飛び出してこないよう、警音器を鳴らしてこのまま通過する。
ロ①孝養も麦樫も自重に鍋いて飛び息してこないと窯うので、このまま走行する。
そうこう
□②学鴬が乗ぴ曲さないように女性が声をかけてくれているので、す速く通過する。
じ よ せ い こ え ば や つ う か
、③学撰は崇に謡いていないかもしれないので、 大きく右によけて通過する。
お お み ざ つ う か
極 駒
『
ー
■
、1
、
﹃
ず
r臼
﹃
︷
﹃
y3 0 k m/ h で進行して
います。どのような
ことに注意して運転
しますか。
L
デー息請「 同黙
糞で 藷
r
画
弓 一
F1寓一 | 陸
問題③
問題2
問題.ロー ‐ ヴ ー ー ー ー
X follows Y.!
X does not recognize me.!
The scene has!
a potential risk.!
explains!
via (c)!
explains!
via (a)
unification
There is a person X.
X will rush out.!
X is a child.!
There is a ball Y.!
Z does not recognize me.!
There is a person Z.
explains!
via (b)
unification (hypothesize X=Z)
BDI-level!
representation
Ground-level!
representation
(a) Causality (b) Ontological!
knowledge
(c) Risk pattern
Best Explanation!
(Implicit information &!
potential risks)
Z is behind the wall.!
Z is a child.
explains!
via (b)!
(A)
(B)
(C)(D)
(E)
(F)
Image!
Description! wall(Wall), soccer-ball(Ball), car(Me), car(RedCar), left-front-of(Now, Wall, Me), ...
Fig 2. Overall framework. Arrows with alphabets represent information flow.
Logical Inference
(apply symbolic rules to a scene description and
infer new facts)
[Armand+ 14, Inoue+ 15, Mohammad+ 15, Zhao+ 15, etc.]
PROS: Easy to take causal chain into account
PROS: Can combine human writing knowledge
CONS: The prediction is not based on a precise
physical prediction
[Althoff+ 09]
[Inoue+ 15]
00 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 10, NO. 2, JUNE
Fig. 2. Stochastic reachable sets of traffic participants.
Physical Simulation
(simulate a scene and detect future collisions)
[Broadhurst+ 05, Althoff+ 09, etc.]
PROS: Precise prediction grounded in physical
quantities
CONS: Difficult to take causal chain into account
7. 12/09/16 7
Key idea
Combine logical inference and physical simulationRecognizing Potential Traffic Risks through Logic-based Deep Scene Understanding
cases in real-life applications.
The main idea of our approach is as follows. We keep the prediction rules as general as
possible, and try to find the best explanation for the condition of general prediction rules
based on multiple pieces of evidence at different levels of abstraction. One can view the pro-
cess of finding the best explanations as the projection of observed information onto a latent
space of features useful for risk prediction. Introducing a latent feature space has been proved
effective for a wide range of AI tasks [4, 8].
In our task, the latent information we particularly consider useful includes the beliefs,
desires, and intentions (BDI) of each traffic object appearing in a given traffic scene. If the
BDI information of objects is available, it can then be used to predict their next actions, which
leads to a better prediction of potential risks. BDI information is not observable, but may
sometimes be inferable from the observable behavior of the object and the observable infor-
Abductive!
Reasoner
Image!
Description
Knowledge Bases
旬
者
陰 一 =
歩 行
一 = 画
』
4 0 k m/ h で進行して
います。どのような
ことに注意して運転
しますか。
□①学撰が獣飛び曲してくるかもしれないので、速度を落としていつでも止まれるようにする。
そ く ど お と
□②孝養が割こ柔び曲してくるかもしれないので、鰹な齢をあけるため対向車線にはみ出して
たいこうしやせん
そうこう
走行する。
n③字撰は畠軍に美符いて飛び曲してこないと思うので、このままの速度で走行する。
お も そ く ど そ う こ う
17
1「in'y
F
が
B O k m / h で進行して
います。どのような
ことに注意して運転
しますか。
□①ポー ル は艶の若櫛に毒がっていきそうなので、 左端によってこのままの雛で雛する。
ひだり芯し
お こ ど も と だ と そ く ど
□②ボールを追いかけて子供が飛び出してくるかもしれないので、いつでも止まれるように速度を
曇 と す 。 ” い … “お こ ど 巷 と だ つ う か
□③ボールを追いかけて子供が飛び出してこないよう、警音器を鳴らしてこのまま通過する。
ロ①孝養も麦樫も自重に鍋いて飛び息してこないと窯うので、このまま走行する。
そうこう
□②学鴬が乗ぴ曲さないように女性が声をかけてくれているので、す速く通過する。
じ よ せ い こ え ば や つ う か
、③学撰は崇に謡いていないかもしれないので、 大きく右によけて通過する。
お お み ざ つ う か
極 駒
『
ー
■
、1
、
﹃
ず
r臼
﹃
︷
﹃
y3 0 k m/ h で進行して
います。どのような
ことに注意して運転
しますか。
L
デー息請「 同黙
糞で 藷
r
画
弓 一
F1寓一 | 陸
問題③
問題2
問題.ロー ‐ ヴ ー ー ー ー
X follows Y.!
X does not recognize me.!
The scene has!
a potential risk.!
explains!
via (c)!
explains!
via (a)
unification
There is a person X.
X will rush out.!
X is a child.!
There is a ball Y.!
Z does not recognize me.!
There is a person Z.
explains!
via (b)
unification (hypothesize X=Z)
BDI-level!
representation
Ground-level!
representation
(a) Causality (b) Ontological!
knowledge
(c) Risk pattern
Best Explanation!
(Implicit information &!
potential risks)
Z is behind the wall.!
Z is a child.
explains!
via (b)!
(A)
(B)
(C)(D)
(E)
(F)
Image!
Description! wall(Wall), soccer-ball(Ball), car(Me), car(RedCar), left-front-of(Now, Wall, Me), ...
Fig 2. Overall framework. Arrows with alphabets represent information flow.
Logical Inference
(apply symbolic rules to a scene description and
infer new facts)
[Armand+ 14, Inoue+ 15, Mohammad+ 15, Zhao+ 15, etc.]
PROS: Easy to take causal chain into account
PROS: Can combine human writing knowledge
CONS: The prediction is not based on a precise
physical prediction
[Althoff+ 09]
[Inoue+ 15]
00 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 10, NO. 2, JUNE
Fig. 2. Stochastic reachable sets of traffic participants.
Physical Simulation
(simulate a scene and detect future collisions)
[Broadhurst+ 05, Althoff+ 09, etc.]
PROS: Precise prediction grounded in physical
quantities
CONS: Difficult to take causal chain into account
Complementary
• How to combine the two
components?
• No previous work addresses
this issue
8. 12/09/16 8
Pipeline re-ranking model
Scene
Description
Knowledge Base
Logical
Inference
Physical
Simulation
Quantitative
Data
Given
Given
Explanations
of the Risks
Output
Abductive Reasoner
[Inoue+ 15]
• Generate candidate risks using
qualitative information
• “What entity will take how risky
action in this scene?”
Action-based Physical
Simulation
• Re-score the candidates of risks
using quantitative information
• “Is the risk candidate really
risky?”
9. 12/09/16 9
Evaluation | Dataset and task setting
Risk recognition system
Risky entity-action
ranking
+metadata Original dataset
• Drive Recorder Database by the Tokyo
University of Agriculture and Technology
• Over 100,000 video recordings of near-miss
accidents
• Scale of knowledge base:
• “such situation it might act like this”
knowledge: 13 types
• Is-a knowledge: 211 types
2-d bird view map
• Two seconds before the actual risk
• Manually created 379 scenes
• training : test : validation = 3 : 1 : 1
10. 12/09/16 10
Evaluation | Models 1/3
Compare three models
Scene
Description
Knowledge Base
Quantitative
Data
Given
Given
Explanations
of the Risks
Output
ML-based
Ranker
Baseline (SVM):
• Directly models between scene
description and a risky
entity-action pair
• Trained by using ranking-SVM
[Joachims 02]
Proposed1 (Inference):
• Uses only qualitative information
Proposed2 (Inf+PhySim):
• The full model
11. 12/09/16 11
Evaluation | Models 1/3
Compare three models
Baseline (SVM):
• Directly models between scene
description and a risky
entity-action pair
• Trained by using ranking-SVM
[Joachims 02]
Proposed1 (Inference):
• Uses only qualitative information
Proposed2 (Inf+PhySim):
• The full model
Scene
Description
Knowledge Base
Quantitative
Data
Given
Given
Explanations
of the Risks
Logical
Inference
Output
12. 12/09/16 12
Evaluation | Models 1/3
Compare three models
Baseline (SVM):
• Directly models between scene
description and a risky
entity-action pair
• Trained by using ranking-SVM
[Joachims 02]
Proposed1 (Inference):
• Uses only qualitative information
Proposed2 (Inf+PhySim):
• The full model
Scene
Description
Knowledge Base
Quantitative
Data
Given
Given
Explanations
of the Risks
Logical
Inference
Physical
Simulation
Output
13. 12/09/16 13
Qualitative evaluation
BASELINE 52.8 (38/72) 80.6 (58/72) 90.3 (65/72) 55.6 (40/
INFERENCE 51.4 (37/72) 80.6 (58/72) 90.3 (65/72) 58.3 (42/
INF+PHYSIM 51.4 (37/72) 81.9 (59/72) 90.3 (65/72) 58.3 (42/
Fig. 4: Example trajectories our physics simulator output. Scene ID
is 1397 on the database. The boxes represent vehicles, and the lines
that drawn from vehicles represent trajectories.
machine learning-based ranker or classifier, it is relatively harder to
predict this kind of richer explanations.
Finally, we compare INFERENCE with INF+PHYSIM. The results
indicate that physics simulation did not improve the results of
qualitative inference. In future work, we will conduct a deeper
analysis on the results of physics simulation and refine the entire
framework to improve the results.
[5] Mahmud Ab
Rossitza Setc
scenes using
Based and In
Singapore, 7
[6] Lihua Zhao,
Kakinami, a
uncontrolled
Vehicles Sym
2015, pages
[7] Naoya Inoue
Recognizing
Understandin
[8] Kiken-yosoku
[9] A. Broadhur
reasoning. In
pages 319–3
[10] Matthias Al
probabilistic
Intelligent Tr
[11] Amol Amba
Nicolescu.
EURASIP J.
[12] Andreas Ess
A mobile v
IEEE Compu
Actual risk: Car will change lanes to avoid ParkingCar
① There is a parked
car in front of Car
② There is a lane on
the right side of Car
③ Car changes
lane to the right
④ Me collides with
Car in 3.6 seconds
Model Predicted risky
entity-action
Explanation Quantitative
prediction
Baseline Car will stop X X
Proposed1 Car will change lanes ✔ X
Proposed2 Car will change lanes ✔ ✔
Model Predicted risky
entity-action
Explanation
Baseline Car will stop X
Proposed1 Car will change lanes ✔
Proposed2 Car will change lanes ✔
Model Predicted risky
entity-action
Baseline Car will stop
Proposed1 Car will change lanes
Proposed2 Car will change lanes
14. 12/09/16 14
Quantitative evaluation
• Adding abductive reasoning
and physical simulation has
not yet affected prediction
accuracy positively
• Future work: improve the
accuracy
Baseline (SVM):
• Directly models between scene
description and a risky
entity-action pair
• Trained by using ranking-SVM
[Joachims 02]
Proposed1 (Inference):
• Uses only qualitative information
Proposed2 (Inf+PhySim):
• The full model
cture and traffic objects (position and direction of
t). The physical simulator uses this information to
he physical data. The logical representation of each
ene is automatically generated from the map
to Section IV-A1. In this experiment, we assume
acy of sensor technologies to be perfect, in order to
exploring the methodology of risk prediction.
luate our results by using a task-ranking framework.
Acc@ k (Accuracy at k ) as a metric, which is the
f problems for which the correct prediction is made
nk k .
et
a database of near-miss accidents published by the
niversity of Agriculture and Technology. The
Fig. 4. Example trajectories produced as output by our physics sim
Scene ID is 1397 on the database. The boxes represent vehicles, and th
drawn from vehicles represent trajectories.
directly models a mapping between observed inform
and a risky entity-action tuple. As mentioned in Se
IV-A2, the weight vector wc
is trained by using a ran
SVM [20]; therefore, the result of this baseline indicate
basic performance of a statistical machine-learning-b
TABLE III: ACCURACY OF RISK PREDICTION MODELS
Validation Test
l Acc@1 Acc@3 Acc@5 Acc@1 Acc@3 Acc@5
LINE 52.8 (38/72) 80.6 (58/72) 90.3 (65/72) 55.6 (40/72) 77.8 (56/72) 93.1 (67/72)
RENCE 51.4 (37/72) 80.6 (58/72) 90.3 (65/72) 58.3 (42/72) 77.8 (56/72) 91.7 (66/72)
PHYSIM 51.4 (37/72) 81.9 (59/72) 90.3 (65/72) 58.3 (42/72) 77.8 (56/72) 91.7 (66/72)
road structure and traffic objects (position and direction of
movement). The physical simulator uses this information to
simulate the physical data. The logical representation of each
traffic scene is automatically generated from the map
according to Section IV-A1. In this experiment, we assume
the accuracy of sensor technologies to be perfect, in order to
focus on exploring the methodology of risk prediction.
We evaluate our results by using a task-ranking framework.
We use Acc@ k (Accuracy at k ) as a metric, which is the
fraction of problems for which the correct prediction is made
within rank k .
B. Dataset
We use a database of near-miss accidents published by the
Tokyo University of Agriculture and Technology. The
TABLE III: ACCURACY O
Validation
Model Acc@1 Acc@3 Acc@5
BASELINE 52.8 (38/72) 80.6 (58/72) 90.3 (65/
INFERENCE 51.4 (37/72) 80.6 (58/72) 90.3 (65/
INF+PHYSIM 51.4 (37/72) 81.9 (59/72) 90.3 (65/
15. • Combined logical inference and physical
simulation to explain risky situation
– Very first step
• Built a benchmark dataset consisting of real-life
traffic scenes
• Future work:
– Extend the knowledge base
– Extend the scale of evaluation
– Automatize describing scene
12/09/16 15
Summary
16. • [Armand+ 14] A. Armand, D. Filliat, and J. I. Guzman, “Ontology based context
awareness for driving assistance systems,” In 2014 IEEE Intelligent Vehicles Symp.
Proc., Dearborn, MI, USA, June 8-11, 2014, pp. 227–233, 2014.
• [Inoue+ 15] N. Inoue, Y. Kuriya, S. Kobayashi, and K. Inui, “Recognizing Potential
Traffic Risks through Logic-based Deep Scene Understanding,” In Proc. 22nd ITS
World Congress, 2015.
• [Mohammad+ 15] M. A. Mohammad, I. Kaloskampis, Y. Hicks, and R. Setchi,
“Ontology-based framework for risk assessment in road scenes using videos,” In
19th Int. Conf. in Knowledge Based and Intelligent Information and Engineering
Systems, KES 2015, Singapore, 7-9 September 2015, pp. 1532–1541, 2015.
• [Zhao+ 15] L. Zhao, R. Ichise, T. Yoshikawa, T. Naito, T. Kakinami, and Y. Sasaki,
“Ontology-based decision making on uncontrolled intersections and narrow
roads,” In 2015 IEEE Intelligent Vehicles Symp., IV 2015, Seoul, South Korea, June
28 - July 1, 2015, pp. 83–88, 2015.
• [Broadhurst+ 05] A. Broadhurst, S. Baker, and T. Kanade, “Monte Carlo road safety
reasoning,” In Proc. IEEE Intelligent Vehicles Symp., 2005, pp. 319–324, June 2005.
• [Althoff+ 09] M. Althoff, O. Stursberg, and M. Buss, “Model-based probabilistic
collision detection in autonomous driving,” IEEE Trans. Intell. Transport. Syst., vol.
10, no. 2, pp. 299–310, 2009.
• [Joachims 02] T. Joachims, “Optimizing search engines using clickthrough data,”
In Proc. Eighth ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining,
July 23-26, 2002, Edmonton, Alberta, Canada, pp. 133–142, 2002.
12/09/16 16
References