### Lifelong Learning: Theory and Practice and Coresets

PI: Joshua T. Vogelstein, [JHU](https://www.jhu.edu/)
Co-PI: Vova Braverman, [JHU](https://www.jhu.edu/)
Jayanta Dey, Will LeVine, Hayden Helm, Ali Geisa, Ronak Mehta, Carey E. Priebe

![:scale 30%](images/neurodata_blue.png)

---

Biological agents progressively build representations to transfer both forward & backward

![:scale 90%](images/learning_schema_darpa.png)

1. We learn representations independently for each task, and ensemble the representations from both past and future tasks
2. We illustrate that it achieves SOTA forward transfer, unique backward transfer, and uniquely monotonically increasing transfer on CIFAR 10x10
3. It can be applied to any sequential task-aware classification task

---

### Potential Collaborations

### [http://proglearn.neurodata.io/](http://proglearn.neurodata.io/)

![:scale 90%](images/proglearn_webpage.png)

---

### Key Innovations

1. Modernized statistical decision theory to explicitly incorporate a .ye[learning task]
2. Introduced .ye[learning efficiency]
3. Formalized and unified .ye[learning metrics]
4. Identified .ye[partition & vote] equivalence of Deep Networks and Decision Forests
5. Developed .ye[Progressive Learning Forests and Networks]
6. Illustrated the value of .ye[coresets] for lifelong learning (Vova)

---

### Learning Task Definition

| Component | Notation | Examples |
| :--- | :--- | :--- |
| Query Space | $\mathcal{Q}$ | is this a cat? |
| Action Space | $\mathcal{A}$ | A, B, ←, →, ↑, ↓ |
| Measurement Space | $\mathcal{Z}$ | 8-bit images, 256 x 256 |
| Statistical Model | $\mathcal{P}$ | Gaussian |
| Hypotheses | $\mathcal{H}$ | linear classifiers |
| Risk | $R$ | expected loss |
| Algorithm Space | $\mathcal{F}$ | Random Forests |
| True & Unknown Distribution | $P$ | $\mu=0$, $\sigma=1$ |

---

### What is learning?

.ye[$f$] learns from .ye[data] $\mathbf{Z}_n$ with respect to .ye[task] $t$ when its .ye[performance] at $t$ improves due to $\mathbf{Z}_n$.

- Define .ye[generalization error] $\mathcal{E}_n(f) := \mathbb{E}_P[R(f(\mathbf{Z}_n))]$
- $\mathbf{Z}_0$ corresponds to no data.
- Define .ye[learning efficiency]:

$$LE_n(f) := \frac{\mathbb{E}_P[R(f(\mathbf{Z}_0))]}{\mathbb{E}_P[R(f(\mathbf{Z}_n))]} = \frac{\mathcal{E}_0(f)}{\mathcal{E}_n(f)}$$

$f$ learns from $\mathbf{Z}_n$ with respect to task $t$ when $LE_n(f) > 1$.
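---

#### Learning efficiency, numerically (sketch)

A minimal illustration of the $LE_n(f)$ ratio above, plugging Monte-Carlo risk estimates into the definition. The task, data, and learner below are hypothetical placeholders (plain scikit-learn, not proglearn), just to show the arithmetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical task: two Gaussian classes in 2-D.
X = rng.normal(size=(2000, 2)) + np.repeat([[0, 0], [1, 1]], 1000, axis=0)
y = np.repeat([0, 1], 1000)
X_test = rng.normal(size=(2000, 2)) + np.repeat([[0, 0], [1, 1]], 1000, axis=0)
y_test = np.repeat([0, 1], 1000)

# E_0: with no data, assume chance-level risk under 0-1 loss for this balanced binary task.
risk_no_data = 0.5

# E_n: estimated risk after training f on Z_n.
f = RandomForestClassifier(random_state=0).fit(X, y)
risk_with_data = np.mean(f.predict(X_test) != y_test)

learning_efficiency = risk_no_data / risk_with_data
print(f"LE_n(f) = {learning_efficiency:.2f}")  # > 1 means f learned from Z_n
```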
---

### What is forward learning?

- Let $n\_t$ be the last occurrence of task $t$ in $\mathbf{Z}\_n$
- Let $\mathbf{Z}\_n^{< t} = \lbrace Z\_1, Z\_2, \ldots, Z\_{n\_t} \rbrace$

.ye[Forward] learning efficiency is the improvement on task $t$ resulting from all data .ye[preceding] task $t$:

$$FL^t\_{\mathbf{n}}(f) := \frac{\mathbb{E}[R^t(f(\mathbf{Z}^{t}\_n))]}{\mathbb{E}[R^t(f(\mathbf{Z}^{< t}\_n))]} =\frac{\mathcal{E}\_{Z\_n^t}(f)}{\mathcal{E}\_{Z\_n^{< t}}(f)}.$$

$f$ .ye[forward learns] if $FL^t\_{\mathbf{n}}(f) > 1$.
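A purely illustrative calculation (made-up numbers, not experimental results): if $f$ trained only on task $t$'s data attains risk $0.30$ on task $t$, while $f$ trained on all data up to and including task $t$ attains risk $0.25$, then $FL^t\_{\mathbf{n}}(f) = 0.30 / 0.25 = 1.2 > 1$, so the earlier tasks helped and $f$ forward learns.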
---

### What is backward learning?

.ye[Backward] learning efficiency is the improvement on task $t$ resulting from all data .ye[after] task $t$:

$$ BL^t\_{\mathbf{n}}(f) := \frac{\mathbb{E}[R^t(f(\mathbf{Z}^{< t}\_n))]}{\mathbb{E}[R^t(f(\mathbf{Z}\_n))]} =\frac{\mathcal{E}\_{Z\_n^{< t}}(f)}{\mathcal{E}\_{Z\_n}(f)}. $$

$f$ .ye[backward learns] if $BL^t\_{\mathbf{n}}(f) > 1$.
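Continuing the illustrative numbers from the previous slide: if data from tasks arriving after task $t$ further reduces $f$'s risk on task $t$ from $0.25$ to $0.22$, then $BL^t\_{\mathbf{n}}(f) = 0.25 / 0.22 \approx 1.14 > 1$, so $f$ backward learns; algorithms that forget catastrophically instead have $BL^t\_{\mathbf{n}}(f) < 1$.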
---

### Unification of Learning Metrics

Each of the previous definitions is a special case of $LE^t(\mathbf{Z}\_A, \mathbf{Z}\_B; f)$, for specific choices of $\mathbf{Z}\_A$ and $\mathbf{Z}\_B$:

- Learning: $\mathbf{Z}\_A=\mathbf{Z}\_0$ and $\mathbf{Z}\_B=\mathbf{Z}\_n$.
- Transfer learning: $\mathbf{Z}\_A=\mathbf{Z}\_n^t$ and $\mathbf{Z}\_B=\mathbf{Z}\_n$.
- Multitask learning: for each $t$, $\mathbf{Z}\_A=\mathbf{Z}\_n^t$ and $\mathbf{Z}\_B=\mathbf{Z}\_n$.
- Forward learning: $\mathbf{Z}\_A=\mathbf{Z}\_n^t$ and $\mathbf{Z}\_B=\mathbf{Z}\_n^{< t}$.
- Backward learning: $\mathbf{Z}\_A=\mathbf{Z}\_n^{< t}$ and $\mathbf{Z}\_B=\mathbf{Z}\_n$.

Conjecture: All learning metrics we care about are functions of learning efficiency for a specific $\mathbf{Z}\_A$ and $\mathbf{Z}\_B$.

---

### Deep Nets and Decision Forests

![:scale 100%](images/PolytopesFig.svg)

- Both learn convex polytope partitions of feature space, with affine activation functions
- We can easily swap between the two, both empirically and theoretically

---

#### Progressive Learning Forests and Networks

![:scale 70%](images/learning-schemas-simple.svg)

1. Representers can be forests, networks, etc.
2. Separate representers are learned for each task
3. Voters leverage both past and future representers

(A minimal usage sketch follows the limitations slide.)

---

### Key Strengths

![:scale 65%](images/L2N-CIFAR-benchmarks.svg)

- SOTA forward transfer
- Unique backward transfer using 500 training samples (SOTA)
- Monotonically increasing backward transfer (SOTA)

---

### Key Limitations

1. Only works for task-aware settings (not task-unaware)
2. Only works for classification (not regression)
3. Only handles batched data (not streaming, not RL)
4. Theorems not yet with dotted i's and crossed t's
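---

#### Using the open-source implementation (sketch)

The forests above are packaged in `proglearn`, linked earlier and on the final slide. Below is a minimal usage sketch on two synthetic placeholder tasks, assuming the `LifelongClassificationForest` interface with `add_task` / `predict`; consult [proglearn.neurodata.io](http://proglearn.neurodata.io/) for the exact, current signatures.

```python
import numpy as np
from proglearn.forest import LifelongClassificationForest

rng = np.random.default_rng(0)

# Two hypothetical tasks: shifted 2-D Gaussian classification problems.
def make_task(shift):
    X = np.vstack([rng.normal(size=(200, 2)), rng.normal(loc=shift, size=(200, 2))])
    y = np.repeat([0, 1], 200)
    return X, y

X0, y0 = make_task(1.0)   # task 0
X1, y1 = make_task(2.0)   # task 1

learner = LifelongClassificationForest()
learner.add_task(X0, y0, task_id=0)   # learn a representer for task 0
learner.add_task(X1, y1, task_id=1)   # add task 1; voters can reuse both representers

# Task-aware prediction: the task id is supplied at test time.
X0_test, y0_test = make_task(1.0)
accuracy_task0 = np.mean(learner.predict(X0_test, task_id=0) == y0_test)
print(f"task 0 accuracy after seeing task 1: {accuracy_task0:.2f}")
```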
---

![:scale 100%](images/vova_slide1.jpg)

---

![:scale 100%](images/vova_slide2.png)

---

### Acknowledgements

yummy
lion
baby girl
family
earth
milkyway
##### JHU
Carey Priebe
Meghana Madhya
Ronak Mehta
Jayanta Dey
Will LeVine
Hayden Helm
Richard Gou
Ali Geisa
##### Microsoft Research
Chris White
Weiwei Yang
Jonathan Larson
Bryan Tower
##### DARPA L2M

All code open source and reproducible from [proglearn.neurodata.io/](http://proglearn.neurodata.io/)

{[BME](https://www.bme.jhu.edu/), [CIS](http://cis.jhu.edu/), [ICM](https://icm.jhu.edu/), [KNDI](http://kavlijhu.org/)}@[JHU](https://www.jhu.edu/) | [neurodata](https://neurodata.io) | [jovo@jhu.edu](mailto:jovo@jhu.edu) | [@neuro_data](https://twitter.com/neuro_data)

---
background-image: url(images/l_and_v.jpeg)

.footnote[Questions?]