Ali | Ronak | Jovo | CEP | Hayden | Jayanta
Risk of a hypothesis $h$ under squared loss: $R_{P_{X,Y}}(h) = \mathbb{E}_{P_{X,Y}}\left[(h(X) - Y)^2\right]$
Expected risk of the estimated hypothesis: $\mathbb{E}_{P_{X,Y,S}}\left[R(\hat{h})\right]$
The learner takes in the data set $S$ and outputs an estimated hypothesis: $f(S) = \hat{h}$.
The performance then is really some function of $\mathbb{E}_{P_{X,Y,S}}\left[R(f(S))\right]$.
$$\min_{f \in \mathcal{F}} \; \mathbb{E}_{P_{X,Y,S}}\left[R(f(S))\right]$$
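As a concrete illustration of these quantities, here is a minimal Python sketch that estimates $\mathbb{E}_{P_{X,Y,S}}\left[R(f(S))\right]$ by Monte Carlo under a toy distribution: repeatedly draw a training set $S$, apply a simple least-squares learner $f$, and average the squared-error risk of $f(S)$ on fresh test data. The distribution, the learner, and all function names are assumptions for illustration, not from the slides.

```python
import numpy as np

def sample_data(n, rng):
    """Draw n (X, Y) pairs from a toy P_{X,Y}: Y = 2X + noise (hypothetical)."""
    X = rng.uniform(-1, 1, size=n)
    Y = 2.0 * X + rng.normal(scale=0.5, size=n)
    return X, Y

def fit_learner(X, Y):
    """The learner f: map a data set S to an estimated hypothesis h_hat.
    Here, a least-squares fit of a line through the origin."""
    slope = np.dot(X, Y) / np.dot(X, X)
    return lambda x: slope * x

def risk(h, n_test, rng):
    """Monte Carlo estimate of R(h) = E[(h(X) - Y)^2]."""
    X, Y = sample_data(n_test, rng)
    return np.mean((h(X) - Y) ** 2)

def expected_risk(n_train, n_reps=200, n_test=10_000, seed=0):
    """Estimate E_S[R(f(S))] by averaging the risk over many draws of S."""
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(n_reps):
        X, Y = sample_data(n_train, rng)
        h_hat = fit_learner(X, Y)          # f(S) = h_hat
        risks.append(risk(h_hat, n_test, rng))
    return np.mean(risks)

print(expected_risk(n_train=20))  # roughly 0.25 (the noise variance) plus an estimation-error term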
Learning efficiency: $LE_{f_t}(S_0, S) = \dfrac{\mathcal{E}_{f_t}(S_0)}{\mathcal{E}_{f_t}(S)}$, the error achieved with data $S_0$ relative to the error achieved with data $S$; a value above 1 means the learner does better with $S$ than with $S_0$.
With continual learning, we want to incorporate the time-dependent, streaming nature of the problem in our performance metrics. Given a task $t$, let $S_{<t}$ be the set of data points up to and including the last data point from task $t$.
We define forward transfer to be $LE_{f_t}(S_t, S_{<t}) = \dfrac{\mathcal{E}_{f_t}(S_t)}{\mathcal{E}_{f_t}(S_{<t})}$,
and we define backward transfer to be $LE_{f_t}(S_{<t}, S) = \dfrac{\mathcal{E}_{f_t}(S_{<t})}{\mathcal{E}_{f_t}(S)}$,
where $S$ is all of the data and $S_t$ is the task-$t$ data.
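A minimal numeric sketch of these ratios, assuming the task-$t$ errors under the three data regimes have already been estimated (the error values below are made up for illustration):

```python
def learning_efficiency(err_less_data, err_more_data):
    """LE = error with the smaller data set / error with the larger data set.
    LE > 1 means the extra data helped on task t."""
    return err_less_data / err_more_data

# Hypothetical task-t errors of the same learner under three data regimes.
err_task_only = 0.30   # E_{f_t}(S_t): only task-t data
err_up_to_t   = 0.20   # E_{f_t}(S_<t): all data up to the last task-t point
err_all_data  = 0.25   # E_{f_t}(S): the full stream, including later tasks

forward_transfer  = learning_efficiency(err_task_only, err_up_to_t)  # 1.5 > 1: earlier tasks helped task t
backward_transfer = learning_efficiency(err_up_to_t, err_all_data)   # 0.8 < 1: later tasks hurt task t (forgetting)
print(forward_transfer, backward_transfer)
```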
Assume we have some model of distributions $\mathcal{P}$ and a source distribution $P$. Loosely, we say $\mathcal{P}$ is weakly OOD learnable with target data of size $n$ if, given enough source data, we can perform better than the baseline performance (i.e., using just the target data) with arbitrarily high probability.
Assume we have some model of distributions $\mathcal{P}$ and a source distribution $P$. We say $\mathcal{P}$ is strongly OOD learnable with target data of size $n$ if, given enough source data, we can perform arbitrarily well (i.e., arbitrarily close to the Bayes risk) with arbitrarily high probability.
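To make the distinction concrete, here is a toy simulation sketch; all distributions, parameters, and names are assumptions for illustration, not the slides' construction. The target sample size $n$ is held fixed while source data is added. Pooling source and target data beats the target-only baseline, which is the flavor of weak OOD learnability; because the source is slightly mismatched, the pooled risk plateaus above the Bayes risk, so this particular learner does not illustrate strong learnability.

```python
import numpy as np

rng = np.random.default_rng(1)
BAYES_RISK = 0.25                      # noise variance 0.5**2 in the toy model below
TARGET_SLOPE, SOURCE_SLOPE = 2.0, 1.9  # source is near, but not equal to, the target (the "OOD" part)

def draw(n, slope):
    """Toy P_{X,Y}: Y = slope * X + Gaussian noise."""
    X = rng.uniform(-1, 1, size=n)
    return X, slope * X + rng.normal(scale=0.5, size=n)

def target_risk(X, Y, n_test=20_000):
    """Fit a least-squares slope through the origin, then estimate risk on the target task."""
    h = np.dot(X, Y) / np.dot(X, X)
    X_test, Y_test = draw(n_test, TARGET_SLOPE)
    return np.mean((h * X_test - Y_test) ** 2)

def avg_risks(n_target=5, n_source=1_000, reps=200):
    """Average target risk: target-only training vs. pooled (target + source) training."""
    base, pooled = [], []
    for _ in range(reps):
        X_t, Y_t = draw(n_target, TARGET_SLOPE)
        X_s, Y_s = draw(n_source, SOURCE_SLOPE)
        base.append(target_risk(X_t, Y_t))
        pooled.append(target_risk(np.r_[X_t, X_s], np.r_[Y_t, Y_s]))
    return np.mean(base), np.mean(pooled)

# Pooling beats the target-only baseline (the weak-learnability flavor in this toy), but with the
# target sample size fixed it plateaus above BAYES_RISK because of the source mismatch, so this
# particular learner does not get arbitrarily close to the Bayes risk.
print(avg_risks())
```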