The document discusses using randomized recurrent neural networks and signature-based methods for machine learning in finance. It proposes splitting the input-output map of a dynamical system into a "reservoir" part and a linear "readout" part. The signature of the input signal provides a natural candidate for the reservoir, as it is point-separating and linear functions on the signature can approximate continuous functionals via the universal approximation theorem. The goal of the talk is to prove how dynamical systems can be approximated using randomized recurrent networks, with precise convergence rates, and to view randomized deep networks through this lens.
Machine Learning in Finance via Randomization
Josef Teichmann
(based on joint work with Lukas Gonon, Christa Cuchiero, Lyudmila Grigoryeva, and Juan-Pablo Ortega)
ETHZ
Universität Klagenfurt
Josef Teichmann (ETHZ) Randomized signature May 2022 1 / 30
Introduction: Motivation
Data-driven models

Key question: how to gain a quantitative understanding of dynamic decision making in Finance or Economics?

Classical approach: specify a model pool with well understood characteristics depending on as few parameters as possible (Occam's razor), calibrate the models to data and solve optimization problems thereon.

Machine Learning approach: highly overparametrized function families are used instead of few-parameter families for constructing model pools, strategies for optimization problems, etc. Apply learning procedures to data (real or artificially generated) to obtain parameter configurations which perform well (e.g. Deep Hedging).

Machine Learning relies on different kinds of universal approximation theorems for, e.g.,
- feed-forward networks, recurrent neural networks, LSTMs, neural (stochastic) differential equations, signature-based models, etc.,
- and on training models on data.
Introduction: Motivation
Training ...

... is an enigmatic procedure:

⇒ it often means applying an algorithm and not necessarily solving a well-defined problem.

Architectures, training hyperparameters, training complexity, and training data have to be carefully adjusted with a lot of domain knowledge to make things finally work. Good habits of classical modeling are left behind (no convergence rates, overparametrization, no sophisticated training algorithms, large amounts of sometimes quite different data, etc.).

Results are in several cases fascinating and far-reaching.

Robustness is almost too good to be true (Deep Hedging, Deep Optimal Stopping).
Introduction: Motivation
Randomization and Provability

Key question: how to gain a quantitative understanding of why artificial traders work so well?

First step: trained randomized networks as a rough role model for what might be the result of early-stopped training, but with a perspective of provability. See, e.g., works of Jakob Heiss and Hanna Wutte in my working group.

Second step: randomized networks can appear as an almost optimal choice (Reservoir Computing).

Third step: reduction of training complexity is a feature (sustainable machine learning).
Introduction: Motivation
Randomized networks: an older perspective? Ideas from Reservoir Computing

... to ease training procedures

... going back to Herbert Jäger, with many contributions from Claudio Gallicchio, Lyudmila Grigoryeva, Juan-Pablo Ortega, et al.

- Typically an input signal is fed into a fixed (random) dynamical system, called the reservoir, which usually maps the input to higher dimensions.
- Then a simple (often linear) readout mechanism is trained to read the state of the reservoir and map it to the desired output.
- The main benefit is that training is performed only at the readout stage, while the reservoir is fixed and untrained.
- Reservoirs can in some cases be realized physically, and learning the readout layer is often a simple (regularized) regression.
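The reservoir-computing recipe just described can be sketched in a few lines of Python: a fixed random recurrent map plays the reservoir, and only a ridge-regression readout is trained. All dimensions, scalings, and the toy target below are illustrative choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random reservoir: x_{t+1} = tanh(A x_t + C u_t); A, C are never trained.
k, d_in = 100, 1
A = rng.normal(size=(k, k)) / np.sqrt(k)   # random recurrent connectivity
C = rng.normal(size=(k, d_in))             # random input weights

def reservoir_states(u):
    """Drive the fixed random system with the input signal u (T x d_in)."""
    x = np.zeros(k)
    states = []
    for u_t in u:
        x = np.tanh(A @ x + C @ u_t)
        states.append(x)
    return np.array(states)

# Toy task: recover a running average of the input from the reservoir state.
T = 500
u = rng.normal(size=(T, d_in))
y = np.convolve(u[:, 0], np.ones(10) / 10, mode="same")  # target functional

X = reservoir_states(u)

# Trained part: a single regularized linear readout (ridge regression).
lam = 1e-3
W = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
mse = np.mean((X @ W - y) ** 2)
print("training MSE:", mse)
```

Only the last two lines involve training; everything upstream is a fixed random feature map, which is exactly the computational saving the reservoir paradigm aims at.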
Introduction: Motivation
Main paradigm and a motivating example

Main paradigm of reservoir computing: split the input-output map, e.g. from initial values and the driving signal to the observed dynamics, into
- a generic part, the reservoir, which is not trained (or only in small parts);
- a readout part, which is accurately trained and often linear.

To illustrate this methodology, let us consider as motivating example the following task from finance:

Goal: learn, from observations of time-series data of a stock price, its dependence on the driving noise, e.g. Brownian motion.

As it is easy to simulate Brownian motion, the learned relationship from the driving Brownian motion to the price data allows one to easily simulate stock prices, e.g. for risk management.

⇒ Market generators
Introduction: Motivation
Reservoir computing for market generation

Assume that N market factors (e.g. prices, volatilities, etc.) are described by

dY_t = \sum_{i=0}^{d} V^i(Y_t)\, dB^i(t), \qquad Y_0 = y \in \mathbb{R}^N,

with V^i : \mathbb{R}^N \to \mathbb{R}^N and d independent Brownian motions B^i, i = 1, \dots, d. From now on the 0-th component shall always be time, i.e. dB^0(t) = dt.

We want to learn the map

F : (input noise B) → (solution trajectory Y),

without knowing V. This is a generically complicated map.

Idea: split the map into two parts:
- a universal (fixed) reservoir X, with no dependence on the specific dynamics of Y;
- a linear readout W that needs to be trained such that Y ≈ WX.

[Diagram: F maps B directly to Y; the split factors F through the reservoir state X, with Y ≈ WX.]
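To make the learning problem concrete, training pairs (B, Y) can be generated with an Euler scheme for the equation above; the vector fields V^i below are placeholders chosen only for illustration; the learner is assumed not to know them.

```python
import numpy as np

rng = np.random.default_rng(1)

N, d = 2, 2              # state dimension and number of Brownian drivers
T = 1000
dt = 1.0 / T

# Illustrative (unknown-to-the-learner) vector fields: V^0 is the drift
# belonging to the time component, V^1 and V^2 drive the noise.
def V(i, y):
    if i == 0:
        return -0.1 * y
    A = [np.array([[0.2, 0.0], [0.0, 0.1]]),
         np.array([[0.0, 0.1], [0.1, 0.0]])][i - 1]
    return A @ y

# Brownian increments dB^i(t); dB^0 = dt is handled separately.
dB = rng.normal(scale=np.sqrt(dt), size=(T, d))

y = np.array([1.0, 1.0])
path = [y.copy()]
for step in range(T):
    incr = V(0, y) * dt
    for i in range(1, d + 1):
        incr = incr + V(i, y) * dB[step, i - 1]
    y = y + incr
    path.append(y.copy())
path = np.array(path)     # (dB, path) is one training pair (B, Y)
```

The pair (dB, path) is exactly the observed input-output data from which F is to be learned, without ever looking inside `V`.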
Introduction: Motivation
Learning from market observations

[Figure: the learned factorization of the input-output map into a reservoir and a linear readout W.]
Introduction: Motivation
Randomized recurrent neural networks

Can we understand why randomized recurrent neural networks are excellent feature extractors (or reservoirs)?

There is a natural candidate, namely the infinite-dimensional signature of the driving signal, which serves as a (universal) linear regression basis for continuous path functionals.

Signature goes back to K. Chen ('57) and plays a prominent role in rough path theory (T. Lyons ('98), P. Friz and N. Victoir ('10), P. Friz and M. Hairer ('14)).

In the last few years there have been many papers (e.g. Levin, Lyons, and Ni (2016)) showing how to apply rough path theory and signature methods to machine learning and time series analysis.
Introduction: Motivation
Learning the dynamics via signature

Signature is point-separating:
- The signature of a d-dimensional (geometric rough) path, including in particular Brownian motion and smooth curves, uniquely determines the path up to tree-like equivalences. These tree-like equivalences can be avoided by adding time.

Linear functions on the signature form an algebra that contains 1:
- Every polynomial on signature may be realized as a linear function via the so-called shuffle product.

⇒ By the Stone-Weierstrass theorem, continuous (with respect to a variation distance of the path) path functionals on compact sets can be uniformly approximated by linear functions of the time-extended signature.
⇒ Universal approximation theorem (UAT).
⇒ This yields a natural split in the spirit of reservoir computing into
- the signature of the input signal as the generic reservoir;
- a linear (readout) map.
Introduction: Motivation
Goal of the talk

- Prove, in the case of randomized recurrent neural networks, how dynamical systems can be approximated, with precise convergence rates.
- Take randomized recurrent networks as a role model for randomized deep networks.
Randomized signature and reservoir computing: Setting
Mathematical Setting

We consider here the simplest setting, with smooth input signals (controls) and output signals taking values in \mathbb{R}^N, but it works with more general drivers (e.g. semimartingales) and by replacing \mathbb{R}^N with so-called convenient vector spaces.

Consider a controlled ordinary differential equation

dY_t = \sum_{i=0}^{d} V_i(Y_t)\, du^i(t), \qquad Y_0 = y \in \mathbb{R}^N, \tag{CODE}

for some smooth vector fields V_i : \mathbb{R}^N \to \mathbb{R}^N, i = 0, \dots, d, and d smooth control curves u^i. Notice again that du^0(t) = dt.

We observe the controls u (input) and Y (output), but do not have access to the vector fields V_i.

The goal is to learn the dynamics and to simulate from it conditional on (new) controls u, i.e. we aim to learn the map

input control u \mapsto solution trajectory Y.
Randomized signature and reservoir computing: Preliminaries on signature
Signature in a nutshell – notations

The signature takes values in the free algebra generated by d indeterminates e_1, \dots, e_d, given by

T((\mathbb{R}^d)) := \Big\{ a = \sum_{k=0}^{\infty} \sum_{i_1, \dots, i_k = 1}^{d} a_{i_1 \dots i_k}\, e_{i_1} \cdots e_{i_k} \Big\}.

Sums and products are defined in the natural way.

We consider the complete locally convex topology making all projections a \mapsto a_{i_1 \dots i_k} continuous on T((\mathbb{R}^d)), hence a convenient vector space.
Randomized signature and reservoir computing: Preliminaries on signature
Signature in a nutshell – definitions

The signature of u is the unique solution of the following CODE in T((\mathbb{R}^d)):

d\,\mathrm{Sig}_{s,t} = \sum_{i=0}^{d} \mathrm{Sig}_{s,t}\, e_i\, du^i(t), \qquad \mathrm{Sig}_{s,s} = 1,

and is apparently given by

\mathrm{Sig}_{s,t} = \sum_{k=0}^{\infty} \sum_{i_1, \dots, i_k = 0}^{d} \int_{s \le t_1 \le \dots \le t_k \le t} du^{i_1}(t_1) \cdots du^{i_k}(t_k)\; e_{i_1} \cdots e_{i_k}.
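For a piecewise-linear path the first signature levels can be computed directly from this definition as iterated integrals. The sketch below truncates at level 2 and checks a shuffle identity (the symmetrized level-2 tensor equals the outer product of the level-1 increments); the example path is an arbitrary choice.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path (T x d)."""
    du = np.diff(path, axis=0)            # increments of the path
    d = path.shape[1]
    S1 = np.zeros(d)
    S2 = np.zeros((d, d))
    for step in du:
        # Exact iterated integrals over one linear segment (Chen's relation):
        # new S2_{ij} picks up S1_i * step_j plus the within-segment term.
        S2 += np.outer(S1, step) + 0.5 * np.outer(step, step)
        S1 += step
    return S1, S2

# Smooth 2-d example: time-augmented path (t, sin t) on [0, 1].
t = np.linspace(0.0, 1.0, 1001)
path = np.stack([t, np.sin(t)], axis=1)
S1, S2 = signature_level2(path)

# Level 1 is just the total increment; the symmetric part of level 2
# satisfies S2 + S2^T = S1 (outer) S1, a shuffle identity we can verify:
print(np.allclose(S2 + S2.T, np.outer(S1, S1)))   # → True
```

The identity holds exactly even at the discrete level, which is a small but reassuring consistency check on the iterated-integral bookkeeping.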
Randomized signature and reservoir computing: A "splitting result" in the spirit of reservoir computing
Signature and its connection to reservoir computing

The following "splitting theorem" is the precise link to reservoir computing. We suppose here that (CODE) admits a unique global solution given by a smooth evolution operator Evol such that Y_t = Evol_t(y).

Theorem
Let Evol be a smooth evolution operator on \mathbb{R}^N such that (Evol_t(y))_t satisfies (CODE). Then for any smooth function g : \mathbb{R}^N \to \mathbb{R} and for every M \ge 0 there is a time-homogeneous linear map W, depending on (V_1, \dots, V_d, g, M, y), from T^M(\mathbb{R}^d) to \mathbb{R} such that

g(\mathrm{Evol}_t(y)) = W\big(\pi_M(\mathrm{Sig}_t)\big) + O(t^{M+1}),

where \pi_M : T((\mathbb{R}^d)) \to T^M(\mathbb{R}^d) is the canonical projection.

Remark
For the proof see e.g. Lyons (1998). It can however be proved in much more generality, e.g. on convenient vector spaces.
Randomized signature and reservoir computing: A "splitting result" in the spirit of reservoir computing
Is signature a good reservoir?

- This split is not yet fully in the spirit of reservoir computing: unlike a physical system, where evaluations are ultrafast, computing signature up to a high order can take a while, in particular if d is large.
- Moreover, regression on signature is the path-space analog of polynomial approximation, which can have several disadvantages.
- Remedy: information compression by Johnson-Lindenstrauss projection.
Randomly projected universal signature dynamics: The Johnson-Lindenstrauss Lemma
The Johnson-Lindenstrauss (JL) lemma

We state here the classical version of the Johnson-Lindenstrauss lemma.

Lemma
For every 0 < \varepsilon < 1 and every set Q consisting of N points in some \mathbb{R}^n, there is a linear map f : \mathbb{R}^n \to \mathbb{R}^k with k \ge \frac{24 \log N}{3\varepsilon^2 - 2\varepsilon^3} such that

(1 - \varepsilon)\, \| v_1 - v_2 \|^2 \le \| f(v_1) - f(v_2) \|^2 \le (1 + \varepsilon)\, \| v_1 - v_2 \|^2

for all v_1, v_2 \in Q, i.e. the geometry of Q is almost preserved by the projection.

- The map f is called a (JL) map, and it can be drawn randomly from a set of linear projection maps.
- Indeed, take a k × n matrix A with iid standard normal entries. Then \frac{1}{\sqrt{k}} A satisfies the desired requirements with high probability.
- We apply this remarkable result to obtain "versions of signature" in lower-dimensional spaces.
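The random construction in the lemma is easy to test empirically; the concrete sizes and the tolerance below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

n, N = 2000, 50          # ambient dimension and number of points
eps = 0.3
# Classical JL dimension bound k >= 24 log N / (3 eps^2 - 2 eps^3).
k = int(np.ceil(24 * np.log(N) / (3 * eps**2 - 2 * eps**3)))

Q = rng.normal(size=(N, n))                 # an arbitrary point cloud
A = rng.normal(size=(k, n))                 # iid standard normal entries
FQ = (Q @ A.T) / np.sqrt(k)                 # images under the JL map (1/sqrt(k)) A

# Check the distortion of all pairwise squared distances.
worst = 0.0
for a in range(N):
    for b in range(a + 1, N):
        orig = np.sum((Q[a] - Q[b]) ** 2)
        proj = np.sum((FQ[a] - FQ[b]) ** 2)
        worst = max(worst, abs(proj / orig - 1.0))
print("max distortion:", worst)   # with high probability below eps
```

Note how weak the dependence on the ambient dimension is: k depends only on N and ε, which is exactly what makes the projection useful for compressing high-order signature spaces.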
Randomly projected universal signature dynamics: Randomized signature
Towards randomized signature

We look for (JL) maps on T^M(\mathbb{R}^d) which preserve its geometry, encoded in some set of (relevant) directions Q. In order to make this program work, we need the following definition:

Definition
Let Q be any (finite or infinite) set of elements of norm one in T^M(\mathbb{R}^d) with Q = -Q. For v \in T^M(\mathbb{R}^d) we define

\| v \|_Q := \inf \Big\{ \sum_j |\lambda_j| \;:\; \sum_j \lambda_j v_j = v \text{ and } v_j \in Q \Big\}.

We use the convention \inf \emptyset = +\infty, since the function is only finite on span(Q).

- The function \| \cdot \|_Q behaves precisely like a norm on the span of Q.
- Additionally \| v \|_{Q_1} \ge \| v \|_{Q_2} for Q_1 \subset Q_2.
Randomly projected universal signature dynamics: Randomized signature
Towards randomized signature – a first estimate

Proposition
Fix M \ge 1 and \varepsilon > 0. Moreover, let Q be any N-point set of vectors of norm one in T^M(\mathbb{R}^d). Then there is a linear map f : T^M(\mathbb{R}^d) \to \mathbb{R}^k (with k being the above JL constant for N), such that

|\langle v_1, v_2 - (f^* \circ f)(v_2) \rangle| \le \varepsilon\, \| v_1 \|_Q \| v_2 \|_Q

for all v_1, v_2 \in span(Q), where f^* : \mathbb{R}^k \to T^M(\mathbb{R}^d) denotes the adjoint map of f with respect to the standard inner product on \mathbb{R}^k.

- By means of this special JL map associated to a point set Q we can now "project signature" without losing too much information.
- We can then solve the projected equation and obtain – up to some time – a solution which is \varepsilon-close to signature.
- By a slight abuse of notation we write \mathrm{Sig}_t for the truncated version \pi_M(\mathrm{Sig}_t) in T^M(\mathbb{R}^d).
Randomly projected universal signature dynamics: Randomized signature
Randomized signature is as expressive as signature

Theorem (Cuchiero, Gonon, Grigoryeva, Ortega, Teichmann)
Let u be a smooth control and f a JL map from T^M(\mathbb{R}^d) to \mathbb{R}^k, where k is determined via some fixed \varepsilon and a fixed set Q. We denote by r-Sig the smooth evolution of the following controlled differential equation on \mathbb{R}^k:

dX_t = \sum_{i=1}^{d} \Big( \tfrac{1}{\sqrt{n}} f\big(f^*(X_t) e_i\big) + \big(1 - \tfrac{1}{\sqrt{n}}\big) f\big(\mathrm{Sig}_t e_i\big) \Big)\, du^i(t), \qquad X_0 \in \mathbb{R}^k,

where n = \dim(T^M(\mathbb{R}^d)). Then for each w \in T^M(\mathbb{R}^d)

|\langle w, \mathrm{Sig}_t - f^*(\text{r-Sig}_t(X_0)) \rangle| \le |\langle \mathrm{Evol}_t^* w, 1 - f^*(X_0) \rangle| + C \varepsilon \sum_{i=1}^{d} \int_0^t \| \mathrm{Evol}_r^* w \|_Q\, \| \mathrm{Sig}_r e_i \|_Q \, dr,

where Evol denotes here the evolution operator corresponding to dZ_t = \sum_{i=1}^{d} \tfrac{1}{\sqrt{n}} (f^* \circ f)(Z_t e_i)\, du^i(t) and C = \sup_{s \le r \le t,\, i} |\dot u^i(r)|.
Randomly projected universal signature dynamics: Randomized signature
Proof

As signature satisfies d\,\mathrm{Sig}_t = \sum_{i=1}^{d} \mathrm{Sig}_t e_i\, du^i(t), \mathrm{Sig}_0 = 1, we have for the difference

\mathrm{Sig}_t - f^*(X_t) = 1 - f^*(X_0) + \sum_{i=1}^{d} \int_0^t \Big( \mathrm{Sig}_r e_i - \tfrac{1}{\sqrt{n}} f^*\big(f(f^*(X_r) e_i)\big) \Big)\, du^i(r) - \int_0^t \Big(1 - \tfrac{1}{\sqrt{n}}\Big) \sum_{i=1}^{d} f^*\big(f(\mathrm{Sig}_r e_i)\big)\, du^i(r)

= 1 - f^*(X_0) + \sum_{i=1}^{d} \int_0^t \tfrac{1}{\sqrt{n}} (f^* \circ f)\big( \mathrm{Sig}_r e_i - f^*(X_r) e_i \big)\, du^i(r) + \sum_{i=1}^{d} \int_0^t \Big( \mathrm{Sig}_r e_i - (f^* \circ f)(\mathrm{Sig}_r e_i) \Big)\, du^i(r).
Randomly projected universal signature dynamics: Randomized signature
Proof and some remarks

This can be solved by variation of constants. Hence for every w \in T^M(\mathbb{R}^d) we get

|\langle w, \mathrm{Sig}_t - f^*(X_t) \rangle| \le |\langle \mathrm{Evol}_t^* w, 1 - f^*(X_0) \rangle| + C \varepsilon \sum_{i=1}^{d} \int_0^t \| \mathrm{Evol}_r^* w \|_Q\, \| \mathrm{Sig}_r e_i \|_Q \, dr,

where the last estimate follows from the above proposition.

Remarks:
- An appropriate choice for Q can be the standard basis of T^M(\mathbb{R}^d) together with its negative, so that it contains 2n points. In this case k \ge \frac{24 \log(2n)}{3\varepsilon^2 - 2\varepsilon^3}.
- In order to guarantee that the above bound is indeed small and k significantly smaller than n, we need to control the Q-norms independently of n.
Randomly projected universal signature dynamics: Randomized signature
Remarks on the estimate

- For appropriate choices of Q, e.g. containing the standard basis of T^M(\mathbb{R}^d), \| \mathrm{Sig}_r e_i \|_Q can be bounded independently of n.
- Hence \langle w, \mathrm{Sig}_t - f^*(\text{r-Sig}_t(X_0)) \rangle becomes small whenever X_0 is chosen such that f^*(X_0) \approx 1 and when \| \mathrm{Evol}_t^* w \|_Q can be bounded independently of n. The latter holds true if w is sparsely populated.
- Choosing w to be sparse is possible due to the following result:

Let w lie in the closed convex hull of a set K \subset \mathbb{R}^n such that K is bounded by some R > 0. Then for every m \ge 1 and c > R^2 - \|w\|^2 there is some w_m, being a convex combination of m points of K, such that

\| w - w_m \|^2 \le \frac{c}{m}.

(This goes back to Bernard Maurey.)
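Maurey's argument is probabilistic: sampling m points of K independently according to the weights of the convex combination and averaging gives expected squared error of order 1/m. The toy set K and all sizes below are arbitrary choices for an empirical check.

```python
import numpy as np

rng = np.random.default_rng(3)

n, num_pts = 50, 200
K = rng.normal(size=(num_pts, n))
K /= np.linalg.norm(K, axis=1, keepdims=True)   # K inside the unit ball, R = 1

# A point w in the convex hull of K, with random convex weights p.
p = rng.dirichlet(np.ones(num_pts))
w = p @ K

def maurey(m):
    """Average of m points of K sampled with weights p: a sparse convex combo."""
    idx = rng.choice(num_pts, size=m, p=p)
    return K[idx].mean(axis=0)

for m in (10, 100, 1000):
    errs = [np.sum((w - maurey(m)) ** 2) for _ in range(200)]
    print(m, np.mean(errs))    # squared error decays like c/m
```

Each `maurey(m)` output is a convex combination of at most m points of K, yet its squared distance to w shrinks at the 1/m rate the lemma promises.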
Randomly projected universal signature dynamics: Randomized signature
Remarks on the estimate – estimating \| \mathrm{Evol}_t^* w \|_Q

- Consider for simplicity d = 1, so that \dim(T^M(\mathbb{R})) = M \equiv n.
- Let f = \frac{1}{\sqrt{k}} A, where A \in \mathbb{R}^{k \times n} has normally distributed entries.
- Then for appropriate Q as above,

\| \mathrm{Evol}_t^* w \|_Q \approx \big\| \exp\big( t \tfrac{1}{\sqrt{n}} A^{\top} A \big) \big\|_{1,1}\, \| w \|_1,

where for B \in \mathbb{R}^{n \times n} we set \| B \|_{1,1} = \max_j \sum_{i=1}^{n} |b_{ij}|.
- Due to the scaling \frac{1}{\sqrt{n}} and the central limit theorem, \| \exp(t \tfrac{1}{\sqrt{n}} A^{\top} A) \|_{1,1} \le c for some constant c (independent of n).
- For sparse w, \| w \|_1 is independent of n. Note that for general w we have \| w \|_1 \le \sqrt{n}\, \| w \|_2.
- Hence, for sparse w, \| \mathrm{Evol}_t^* w \|_Q \le \tilde{C} with some constant \tilde{C} independent of n.
⇒ If n is large, r-Sig compresses the information of Sig very well.
Randomly projected universal signature dynamics: Randomized signature
r-Sig as a random dynamical system

We can actually calculate approximately the vector fields which determine the dynamics of r-Sig by generic random elements.

Theorem (Cuchiero, Gonon, Grigoryeva, Ortega, Teichmann)
For M \to \infty (and thus n \to \infty), the entries of the matrix representation of the linear maps

y \mapsto \tfrac{1}{\sqrt{n}} f\big(f^*(y) e_i\big),

for i = 1, \dots, d, are asymptotically normally distributed with independent entries. The time-dependent bias terms

\big(1 - \tfrac{1}{\sqrt{n}}\big) f\big(\mathrm{Sig}_t e_i\big)

are as well asymptotically normally distributed with independent entries.
Randomly projected universal signature dynamics: Randomized signature
Randomized signature as reservoir

Practical implementation of randomized signature:
- Given a set of hyper-parameters \theta \in \Theta and a dimension k, choose randomly (often just by independently sampling from a normal distribution) matrices M_1, \dots, M_d \in \mathbb{R}^{k \times k} as well as (bias) vectors b_1, \dots, b_d.
- Then one can tune the hyper-parameters and the dimension k such that

dX_t = \sum_{i=1}^{d} (M_i X_t + b_i)\, du^i(t), \qquad X_0 = x,

approximates the CODE Y locally in time, via a linear readout W, up to arbitrary precision.
- The process X will serve as reservoir. Note that again it does not depend on the specific dynamics of Y which should be learned.
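The practical recipe above translates directly into code: fixed random matrices M_i and biases b_i define the reservoir equation, solved here with an Euler scheme, and only the readout is fitted. The dimensions, the stabilizing 0.2 scaling, and the toy target trajectory are illustrative choices, not prescriptions from the talk.

```python
import numpy as np

rng = np.random.default_rng(4)

d, k, T = 2, 60, 1000
dt = 1.0 / T

# Fixed random reservoir parameters (never trained); the 1/sqrt(k) and 0.2
# scalings are illustrative choices keeping the Euler scheme stable.
M = 0.2 * rng.normal(size=(d, k, k)) / np.sqrt(k)
b = 0.2 * rng.normal(size=(d, k))

# Smooth controls u^i; u^1 = t plays the role of the time component.
t = np.linspace(0.0, 1.0, T + 1)
u = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
du = np.diff(u, axis=0)

# Euler scheme for dX_t = sum_i (M_i X_t + b_i) du^i(t), X_0 = 0.
X = np.zeros(k)
states = [X.copy()]
for step in du:
    X = X + sum((M[i] @ X + b[i]) * step[i] for i in range(d))
    states.append(X.copy())
states = np.array(states)              # randomized-signature reservoir path

# Trained part: linear readout (with intercept) fitted by ridge regression
# to a toy output trajectory Y.
Y = np.cos(3 * t) + 0.5 * t**2
design = np.column_stack([np.ones(T + 1), states])
lam = 1e-6
W = np.linalg.solve(design.T @ design + lam * np.eye(k + 1), design.T @ Y)
mse = np.mean((design @ W - Y) ** 2)
print("readout MSE:", mse)
```

The reservoir path `states` never sees `Y`; swapping in a different target only changes the cheap ridge step, which is the sustainability argument made earlier.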
Applications for market generation
Deep Simulation – Summary

Split the map in two parts:
- a universal reservoir X, with no dependence on the specific dynamics;
- a linear readout W that needs to be trained such that Y ≈ WX.

What should we take as reservoir?
- The signature process of Brownian motion works, but is computationally expensive:

\sum_{k,\, i_1, \dots, i_k} \int_{0 \le t_1 \le \dots \le t_k \le t} \circ dB^{i_1}(t_1) \cdots \circ dB^{i_k}(t_k)\; e_{i_1} \cdots e_{i_k}

- Random projections (implicit inclusion of high-order signature terms) give randomized signature:

dX_t = \sum_{i=1}^{d} (M_i X_t + b_i) \circ dB^i_t, \qquad X_0 \in \mathbb{R}^k, \quad M_i, b_i \text{ randomly chosen.}

[Diagram: F maps the Brownian input B to Y; the reservoir X and the readout W factor this map.]
Applications for market generation
Example – the SABR model

Let us consider as example the SABR stochastic volatility model. The process Y consists of two components (Y^1, Y^2): Y^1 corresponds to the price process and Y^2 to the stochastic volatility process:

dY^1_t = Y^1_t Y^2_t \big( \rho\, dB^1_t + \sqrt{1 - \rho^2}\, dB^2_t \big),
dY^2_t = \alpha Y^2_t\, dB^2_t,

where B^1 and B^2 are two independent Brownian motions, \alpha \ne 0 and \rho \in [-1, 1].

Given a trajectory of 1000 time points, we now learn the map

(B^1_{t \in [0,1000]}, B^2_{t \in [0,1000]}) \mapsto (Y^1_{t \in [0,1000]}, Y^2_{t \in [0,1000]}).
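The SABR dynamics above can be simulated with a plain Euler-Maruyama scheme to produce the training trajectory; the parameter values and initial conditions below are illustrative, not those used in the talk's experiments.

```python
import numpy as np

rng = np.random.default_rng(5)

T, dt = 1000, 1.0            # 1000 time points, unit time step
alpha, rho = 0.02, -0.3      # illustrative SABR parameters
sq = np.sqrt(1.0 - rho**2)

# The two independent driving Brownian motions, as increments.
dB1 = rng.normal(scale=np.sqrt(dt), size=T)
dB2 = rng.normal(scale=np.sqrt(dt), size=T)

y1, y2 = 1.0, 0.02           # initial price and volatility
Y1, Y2 = [y1], [y2]
for s in range(T):
    # dY1 = Y1 Y2 (rho dB1 + sqrt(1 - rho^2) dB2),  dY2 = alpha Y2 dB2
    y1 = y1 + y1 * y2 * (rho * dB1[s] + sq * dB2[s])
    y2 = y2 + alpha * y2 * dB2[s]
    Y1.append(y1)
    Y2.append(y2)
```

The pair of input increments (dB1, dB2) and outputs (Y1, Y2) is exactly the data from which the map above would be learned, e.g. by feeding the Brownian increments into a randomized-signature reservoir with a ridge readout.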
Applications for market generation
Training and prediction results

Is it possible to predict the future evolution of the market environment given new input Brownian motions?

- Training is done on the first 1000 time points.
- Prediction works for 3000 time points further. The first graph is Y^1 and the second Y^2, each time the predicted trajectory (blue) versus the true one (green).
- In practice, the past market Brownian motions have to be extracted for learning; prediction is done by generating new ones.
Conclusion

- We show that the time evolution of controlled differential equations can be arbitrarily well approximated by regressions on a certain randomly chosen dynamical system of moderately high dimension, called randomized signature (randomized RNNs).
- This is motivated by paradigms of reservoir computing and by widely applied signature methods from rough path theory.
- We apply the method to market generation/simulation in finance, randomized Longstaff-Schwartz, and provable machine learning.