Imitation Learning for Vision-based Lane Keeping Assistance
ITSC’17 Workshop on Deep Learning for Autonomous Driving, October 16-19, 2017, Yokohama, Japan
Christopher Innocenti∗, Henrik Lindén∗, Ghazaleh Panahandeh∗, Lennart Svensson†, Nasser Mohammadiha∗†
∗Zenuity AB, †Chalmers University of Technology, Gothenburg, Sweden
Outline
1. Introduction
2. Proposed Methods
3. Experimental Results
4. Summary
Introduction
Problem Statement and Motivation
1. How to predict lateral control signals from camera images?
   • Modular system approach
      • Explicit control of data flow through sub-modules
   • Holistic system approach
      • Intermediate levels of processing abstracted away
      • Does not require explicit feature engineering
      • Does not require object/semantics annotated data for learning
2. How best to evaluate control signals and performance?
   • Vehicle tests
      • Realistic
   • Closed loop simulation
      • Fast, inexpensive, safe
Background and Contributions
• End to end learning for self-driving cars [Bojarski et al., 2016b]
   • PilotNet, 9-layer CNN with ∼250k trainable parameters
   • Images captured from 3 front cameras for training
   • Lane following with 98% level of autonomy
• Our contributions
   • Experimentally show that data augmentation might not be necessary for learning LKA functionality (using a model based on PilotNet)
   • Propose two metrics for numeric evaluation, based on safety (positioning) and comfort (trajectory smoothness)
Proposed Methods
Approach
• Model a driving policy π_θ as a convolutional neural network
• Find $\theta^{*} = \arg\min_{\theta} \mathbb{E}_{(s,a) \sim d_{\pi^{*}}} \left[ l(a, \pi_{\theta}(s)) \right]$ by supervised learning
• Learn a mapping from states to actions using state–action pairs sampled from regular highway driving, without data augmentation
[Figure: training loop. The state s is fed to the policy π_θ, which outputs the action π_θ(s); the loss l(a, π_θ(s)) against the expert action a drives the policy adjustment.]
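The objective above is ordinary supervised regression on expert state–action pairs. The following is a minimal sketch, assuming a squared-error loss and, purely for illustration, a linear policy standing in for the CNN. The names `policy`, `bc_loss` and `bc_gradient_step` are hypothetical, the data are synthetic, and the slides do not specify the loss l or the optimiser.

```python
import numpy as np

def policy(theta, states):
    """Hypothetical linear policy pi_theta(s) = s @ theta (stand-in for the CNN)."""
    return states @ theta

def bc_loss(theta, states, expert_actions):
    """Empirical estimate of E[l(a, pi_theta(s))], with l ASSUMED to be squared error."""
    return np.mean((expert_actions - policy(theta, states)) ** 2)

def bc_gradient_step(theta, states, expert_actions, lr=0.1):
    """One gradient-descent step on the empirical loss (the policy adjustment)."""
    residual = policy(theta, states) - expert_actions
    grad = 2.0 * states.T @ residual / len(states)
    return theta - lr * grad

rng = np.random.default_rng(0)
states = rng.normal(size=(256, 4))             # synthetic stand-in for image features
true_theta = np.array([0.5, -1.0, 0.25, 2.0])  # synthetic "expert" parameters
expert_actions = states @ true_theta           # expert actions a for each state s

theta = np.zeros(4)
initial_loss = bc_loss(theta, states, expert_actions)
for _ in range(200):
    theta = bc_gradient_step(theta, states, expert_actions)
final_loss = bc_loss(theta, states, expert_actions)
```

Because the pairs are sampled only from the expert's own state distribution, the loss says nothing about states the learnt policy reaches after drifting, which is why closed loop evaluation is treated separately.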
Data
• 640×480 images from Volvo Cars test expeditions
• Dataset selection of approximately 2.5M images sampled at 20 Hz
Actions
• Actions (and states) pruned to be more "uniformly" distributed
• Resulting dataset of approximately 1.4M state–action pairs (27h)
• Steering wheel angle → curvature
[Figure: two histograms over steering wheel angle, SWA [rad], each spanning −1 to 1]
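One common way to make a peaked action distribution more uniform is to cap the number of samples per histogram bin. The sketch below assumes that approach; the slides do not describe the actual pruning rule, so the function `prune_to_uniform`, its parameters, and the synthetic steering data are all illustrative.

```python
import numpy as np

def prune_to_uniform(values, n_bins=20, max_per_bin=None, seed=0):
    """Return sorted indices of the samples kept after capping each histogram bin."""
    values = np.asarray(values)
    rng = np.random.default_rng(seed)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Digitizing against the interior edges yields bin ids in 0..n_bins-1.
    bin_ids = np.digitize(values, edges[1:-1])
    counts = np.bincount(bin_ids, minlength=n_bins)
    if max_per_bin is None:
        # ASSUMED rule: cap at the median count of the non-empty bins.
        max_per_bin = int(np.median(counts[counts > 0]))
    keep = []
    for b in range(n_bins):
        idx = np.flatnonzero(bin_ids == b)
        if len(idx) > max_per_bin:
            # Randomly downsample over-represented bins (e.g. straight driving).
            idx = rng.choice(idx, size=max_per_bin, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))

rng = np.random.default_rng(1)
# Synthetic steering wheel angles [rad], strongly peaked around zero.
swa = np.clip(rng.normal(0.0, 0.15, size=10_000), -1.0, 1.0)
kept = prune_to_uniform(swa, n_bins=20, max_per_bin=100)
```

Capping rather than reweighting shrinks the dataset, which matches the reduction from roughly 2.5M to 1.4M samples in spirit, though the exact numbers depend on the rule actually used.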
CNN Architecture
• Based on PilotNet [Bojarski et al., 2016b]
• 264k trainable parameters
• Trained for 9 epochs over the dataset
• Performance?
[Figure: network architecture, input to output: IMAGE → PRE. → 68 × 182 × 1 → C1 → 32 × 89 × 24 → C2 → 14 × 43 × 36 → C3 → 5 × 20 × 48 → C4 → 3 × 18 × 64 → C5 → 1 × 16 × 76 → FLAT. → 1216 → FC1 → 100 → FC2 → 50 → FC3 → 10 → FC4 → 1 (output 1/r)]
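The feature-map sizes in the architecture figure (68 × 182 × 1 at the input, 1 × 16 × 76 before flattening) can be checked with a little arithmetic. The sketch below assumes the PilotNet kernel layout, that is three 5 × 5 convolutions with stride 2 followed by two 3 × 3 convolutions with stride 1, all unpadded, and a preprocessing stage with no trainable parameters. These kernel sizes and strides are not stated on the slide; they are an assumption that happens to reproduce both the listed shapes and the quoted parameter count.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1

# (name, kernel, stride, out_channels): kernel and stride are ASSUMED from PilotNet.
CONV_LAYERS = [
    ("C1", 5, 2, 24),
    ("C2", 5, 2, 36),
    ("C3", 5, 2, 48),
    ("C4", 3, 1, 64),
    ("C5", 3, 1, 76),
]
FC_SIZES = [100, 50, 10, 1]  # output widths of FC1..FC4, read from the figure

def trace_network(height=68, width=182, channels=1):
    """Return per-layer output shapes and the total trainable parameter count."""
    shapes, params = [], 0
    h, w, c = height, width, channels
    for name, k, s, out_c in CONV_LAYERS:
        params += k * k * c * out_c + out_c  # weights + biases
        h, w, c = conv_out(h, k, s), conv_out(w, k, s), out_c
        shapes.append((name, (h, w, c)))
    n_in = h * w * c                         # flattened feature vector
    shapes.append(("FLAT.", (n_in,)))
    for i, n_out in enumerate(FC_SIZES, start=1):
        params += n_in * n_out + n_out       # weights + biases
        shapes.append((f"FC{i}", (n_out,)))
        n_in = n_out
    return shapes, params

shapes, total_params = trace_network()
for name, shape in shapes:
    print(name, shape)
print("trainable parameters:", total_params)  # 264343, i.e. about 264k
```

Nearly half of the parameters (121,700) sit in FC1, the layer that maps the 1216 flattened features to 100 units.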
Positioning Penalty
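• Penalty width w and shape factor β are road and situation dependent

$$
e_p(d; w, \beta) =
\begin{cases}
1, & d < 0 \\
(\beta w)^{d/w} - \beta d, & 0 \le d \le w \\
0, & d > w
\end{cases}
$$

[Figure: positioning penalty (0 to 1) versus distance to lane marking d (0 to w), plotted for β = 1, β = 0.1 and β = 0.01]
[Figure: sketch of a vehicle in its lane, labelled with w_L, w_L0, w_l, w_r, d_l, d_r and w_v]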
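The positioning penalty e_p(d; w, β) can be evaluated directly. The sketch below assumes the middle branch is (βw)^(d/w) − βd for 0 ≤ d ≤ w, which equals 1 at d = 0 and 0 at d = w and so joins the two constant branches continuously. The function name `positioning_penalty` is illustrative.

```python
import numpy as np

def positioning_penalty(d, w, beta):
    """Penalty for distance d to a lane marking, with penalty width w and shape factor beta.

    Returns 1 when the marking is crossed (d < 0), 0 beyond the penalty width (d > w),
    and the ASSUMED expression (beta*w)**(d/w) - beta*d in between.
    """
    d = np.asarray(d, dtype=float)
    d_in = np.clip(d, 0.0, w)  # keeps the power well-defined outside [0, w]
    inside = (beta * w) ** (d_in / w) - beta * d_in
    return np.where(d < 0, 1.0, np.where(d > w, 0.0, inside))

# Parameters used in the performance example of the slides: beta = 0.01, w = 0.4 m.
distances = np.array([-0.1, 0.0, 0.2, 0.4, 0.5])
penalties = positioning_penalty(distances, w=0.4, beta=0.01)
```

With βw = 1 the penalty falls linearly from 1 to 0 across the penalty width, and it drops off faster near the marking as βw gets smaller.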
Discomfort Penalty
• Based on penalty from a vehicle motion model [Sörstedt et al., 2011]
• Comfort level g ≈ 1.8 m/s² (or 1.8 m/s³) [Felipe and Navin, 1998, Xu et al., 2015]
• Penalty e_d(y; g) is piecewise in the lateral acceleration or jerk y: one expression below the comfort level g, and 5 + … at or above it
[Figure: level of discomfort (0 to 10) versus lateral acceleration [m/s²] or jerk [m/s³] (0 to 3), divided into a comfortable and an uncomfortable region]
Experimental Results
Reality Gap
Example: Visual backpropagation [Bojarski et al., 2016a]
• Pixel regions that contribute most to the control decision/prediction
[Figure: visual backpropagation results for three image sources: Volvo (real), CarMaker (synthetic), Unity (synthetic)]
Discomfort
Example: Lateral acceleration and jerk
[Figure: two panels showing lateral acceleration [m/s²] and jerk [m/s³] of π_θ and π∗ versus road distance [m], from 2,900 m to 3,400 m]
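Lateral acceleration and jerk signals like those plotted here can be derived from a curvature trace. The sketch below assumes the steady-state relation a_y = v²κ between speed v, path curvature κ and lateral acceleration, and takes jerk as the finite-difference time derivative of a_y. The slides do not state how the signals were obtained, so the function names, the sampling interval and the synthetic data are illustrative.

```python
import numpy as np

def lateral_acceleration(speed, curvature):
    """ASSUMED steady-state relation a_y = v^2 * kappa, in m/s^2."""
    return np.asarray(speed, dtype=float) ** 2 * np.asarray(curvature, dtype=float)

def lateral_jerk(acceleration, dt):
    """Time derivative of lateral acceleration by finite differences, in m/s^3."""
    return np.gradient(np.asarray(acceleration, dtype=float), dt)

dt = 0.05                        # 20 Hz sampling, matching the dataset rate
t = np.arange(0.0, 5.0, dt)
speed = np.full_like(t, 25.0)    # synthetic constant speed [m/s]
curvature = 2e-4 * t             # synthetic curve entry, curvature in 1/m

a_lat = lateral_acceleration(speed, curvature)
jerk = lateral_jerk(a_lat, dt)
```

Because differentiation amplifies sample-to-sample noise, a policy whose curvature output fluctuates slightly can look similar to the expert in acceleration while differing much more in jerk.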
Performance
Example: 34 km road geometry, β = 0.01, w_l = w_r = 0.4 m, g = 1.8
• Positioning: badly positioned only ∼2% of the time
• Acceleration: ∼9% more uncomfortable, but still comfortable
• Jerk: ∼178% more uncomfortable (ratio 2.783), but still comfortable

Penalty metric           π_θ       π∗        π_θ/π∗
Avg e_p(d_l; w_l, β)     0.006     0         –
Avg e_p(d_r; w_r, β)     0.013     0         –
Avg e_d(y_acc; g)        0.152     0.140     1.086
Max e_d(y_acc; g)        13.283    11.890    1.117
Avg e_d(y_jerk; g)       0.064     0.023     2.783
Max e_d(y_jerk; g)       19.816    11.230    1.765
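The ratio column can be recomputed from the two policy columns, and a ratio r corresponds to a relative increase of (r − 1) × 100%. A short check using the discomfort values from the table (the dictionary keys are illustrative labels):

```python
# Discomfort penalties copied from the table: (learnt policy, expert policy).
metrics = {
    "avg_acc": (0.152, 0.140),
    "max_acc": (13.283, 11.890),
    "avg_jerk": (0.064, 0.023),
    "max_jerk": (19.816, 11.230),
}

def ratio_and_increase(learned, expert):
    """Return the ratio learned/expert and the relative increase in percent."""
    ratio = learned / expert
    return ratio, 100.0 * (ratio - 1.0)

results = {name: ratio_and_increase(*pair) for name, pair in metrics.items()}
for name, (ratio, increase) in results.items():
    print(f"{name}: ratio {ratio:.3f}, {increase:.0f}% higher")
```

The positioning rows are left out because the expert's average penalty is 0, so the ratio is undefined there.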
Summary
Conclusions
• The learnt policy seems to provide robust behaviour in simulated environments without data augmentation.
• Instantaneous decisions produce noisy behaviour; filtering based on previous decisions improves the driving behaviour.
• More research is needed on safety and verification aspects, by understanding the internal representations of networks.
More videos at: http://goo.gl/MKKnuF
References
Bojarski, M., Choromanska, A., Choromanski, K., Firner, B., Jackel, L., Muller, U., and Zieba, K. (2016a). VisualBackProp: visualizing CNNs for autonomous driving. arXiv preprint arXiv:1611.05418.
Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L. D., Monfort, M., Muller, U., Zhang, J., et al. (2016b). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.
Felipe, E. and Navin, F. (1998). Canadian researchers test driver response to horizontal curves. Road Management & Engineering Journal TranSafety, Inc, 1.
Sörstedt, J., Svensson, L., Sandblom, F., and Hammarstrand, L. (2011). A new vehicle motion model for improved predictions and situation assessment. IEEE Transactions on Intelligent Transportation Systems, 12(4):1209–1219.
Xu, J., Yang, K., Shao, Y., and Lu, G. (2015). An experimental study on lateral acceleration of cars in different environments in Sichuan, southwest China. Discrete Dynamics in Nature and Society, 2015.