### Lifelong Learning: Theory and Practice

PI: Joshua T. Vogelstein, [JHU](https://www.jhu.edu/)
Jayanta Dey, Ali Geisa, Hayden Helm, Ronak Mehta, Will LeVine, Carey E. Priebe
Co-PI: Vova Braverman, [JHU](https://www.jhu.edu/)
Haoran Li, Aditya Krishnan, Jingfeng Wu
SGs: SRI, Argonne, HRL
![:scale 35%](images/neurodata_blue.png)
---

### Summary

.ye[Research Question:] Why is LL difficult, and how can we design algorithms/datasets to solve it?

.ye[Approach:]
- Introduced an out-of-distribution (OOD) learning framework for theoretical analysis of lifelong learning
- Introduced ensembling representations

.ye[Accomplishments:]
- Proved various OOD weak learner theorems
- Achieved consistent positive forward and backward transfer (synergistic learning) in practice

.ye[Key Take-Away:] LL is fundamentally harder than classical ML, and ensembling representations enables synergistic learning

---

### Result 1: OOD Learning Theory

We uncouple the evaluation distribution from the training data distributions

![:scale 100%](images/learning-schematics.png)

---

### Putting LL within the OOD Framework

![:scale 100%](images/learning-table.png)

---

### Defining/Quantifying Learning & Forgetting

![:scale 100%](images/learning-efficiency.png)

Using non-task data to improve performance beyond what could be achieved with task data alone:
- Learning: $\mathbf{S}^A=\mathbf{S}\_0$ and $\mathbf{S}^B=\mathbf{S}\_n$.
- Transfer learning: $\mathbf{S}^A=\mathbf{S}^1$ and $\mathbf{S}^B=\mathbf{S}\_n$.
- Multitask learning: for each $t$, $\mathbf{S}^A=\mathbf{S}^t$ and $\mathbf{S}^B=\mathbf{S}\_n$.
- Forward learning: $\mathbf{S}^A=\mathbf{S}^t$ and $\mathbf{S}^B=\mathbf{S}^{< t}$.
- Backward learning: $\mathbf{S}^A=\mathbf{S}^{< t}$ and $\mathbf{S}^B=\mathbf{S}\_n$.

---

### Result 2: Proving novel properties of OOD learning

Classical theory:
- Weak learning: can do better than chance on some task with sufficient data
- Strong learning: can get arbitrarily close to optimal on some task with sufficient data
- Weak Learner Theorem: if a problem is weakly learnable, it is also strongly learnable

OOD learning theory:
- The training distribution is uncoupled from the evaluation distribution

---

### More data is inadequate for LL

Theorem 1: With *only* out-of-distribution data, there exist problems that are weakly, but not strongly, learnable.

- This implies that OOD learning differs *in kind* from in-distribution learning.
- Lifelong learning is a special case of OOD learning.
- Getting .ye[more] data is *not* guaranteed to improve performance arbitrarily in LL; we need .ye[better] data.

---

### Learning efficiency is a fundamental notion of learning

Theorem 2: Weak OOD learnability implies transfer learnability (i.e., learning efficiency > 1). That is, if one can weakly learn, one can also transfer learn, but not necessarily vice versa.

- This implies that transfer learnability is a fundamental property of learning problems.
- In other words, the inability to transfer is equivalent to the inability to learn at all.
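
As a hedged sketch of the quantity behind these statements (notation as on the Defining/Quantifying slide; the precise definition in the OOD paper may differ in detail), learning efficiency compares the expected risk of a learner $f$ trained on $\mathbf{S}^A$ with the same learner trained on $\mathbf{S}^B$:

$$\mathrm{LE}(f) = \frac{\mathbb{E}[R(f(\mathbf{S}^A))]}{\mathbb{E}[R(f(\mathbf{S}^B))]}$$

so $\mathrm{LE}(f) > 1$ means training on $\mathbf{S}^B$ achieves lower expected risk than training on $\mathbf{S}^A$, which is the "learning efficiency > 1" criterion in Theorem 2.
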
---

### Result 3: Ensembling representations achieves synergistic learning

![:scale 100%](images/learning_schema_new.png)

---

### Omnidirectional Algorithms Show Forward Transfer

CIFAR 10x10

![:scale 100%](images/cifar_exp_fte.png)

---

### Omnidirectional Algorithms Uniquely Show Backward Transfer for Each Task

![:scale 100%](images/cifar_exp_bte.png)

---

### Future Directions / Transitions

- Omnidirectional algorithm code continues to improve: [http://proglearn.neurodata.io/](http://proglearn.neurodata.io/)
- Streaming forests for the streaming lifelong learning setup: [https://sdtf.neurodata.io](https://sdtf.neurodata.io)

![:scale 80%](images/streaming_forest.png)

---

### Kernel Density Networks/Forests generate well-calibrated posteriors

- [https://github.com/neurodata/kdg](https://github.com/neurodata/kdg)
- KDG on Gaussian XOR simulation data

![:scale 100%](images/kdn_kdf.png)
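
A minimal sketch of the Gaussian XOR setup referenced above, assuming the common construction (four spherical Gaussians at the corners of $[-1, 1]^2$, labeled by the XOR of the corner signs). The variance, sample sizes, and the plain random forest used as a stand-in posterior estimator are illustrative assumptions, not the kdg implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sample_gaussian_xor(n, sigma=0.25, seed=None):
    """Four spherical Gaussians at the corners of [-1, 1]^2,
    labeled by the XOR of the corner signs."""
    rng = np.random.default_rng(seed)
    corners = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
    idx = rng.integers(0, 4, size=n)
    X = corners[idx] + sigma * rng.standard_normal((n, 2))
    y = (corners[idx, 0] * corners[idx, 1] < 0).astype(int)  # XOR label
    return X, y

X_train, y_train = sample_gaussian_xor(1000, seed=0)
X_test, y_test = sample_gaussian_xor(1000, seed=1)

# Stand-in baseline: a vanilla random forest; its predict_proba outputs are
# the kind of posteriors whose calibration KDN/KDF aim to improve.
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
posteriors = rf.predict_proba(X_test)
print("stand-in forest accuracy:", rf.score(X_test, y_test))
```
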
---

### Deep Networks are the worst model of the mind
---

### Acknowledgements
yummy
lion
baby girl
family
earth
milkyway
##### JHU
Carey Priebe
Jesse Patsolic
Meghana Madhya
Hayden Helm
Richard Guo
Ronak Mehta
Jayanta Dey
Will LeVine
##### Microsoft Research
Chris White
Weiwei Yang
Jonathan Larson
Bryan Tower
##### DARPA L2M

{[BME](https://www.bme.jhu.edu/), [CIS](http://cis.jhu.edu/), [ICM](https://icm.jhu.edu/), [KNDI](http://kavlijhu.org/)}@[JHU](https://www.jhu.edu/) | [neurodata](https://neurodata.io)
[jovo@jhu.edu](mailto:j1c@jhu.edu) | [@neuro_data](https://twitter.com/neuro_data)

---

background-image: url(images/l_and_v.jpeg)

.footnote[Questions?]

---

class: middle

# .center[Appendix]

---

.small[

### Publications

1. A. Geisa et al. [Towards a theory of out-of-distribution learning](https://arxiv.org/abs/2109.14501), arXiv, 2021.
1. J. T. Vogelstein et al. [Omnidirectional Transfer for Quasilinear Lifelong Learning](https://arxiv.org/abs/2004.12908), arXiv, 2021.
1. H. Xu et al. [Streaming Decision Trees and Forests](https://arxiv.org/abs/2110.08483), arXiv, 2021.
1. C. E. Priebe et al. [Modern Machine Learning: Partition and Vote](https://doi.org/10.1101/2020.04.29.068460), 2020.
1. R. Guo et al. [Estimating Information-Theoretic Quantities with Uncertainty Forests](https://arxiv.org/abs/1907.00325), arXiv, 2019.
1. R. Perry et al. [Manifold Forests: Closing the Gap on Neural Networks](https://openreview.net/forum?id=B1xewR4KvH), arXiv, 2019.
1. C. Shen and J. T. Vogelstein. [Decision Forests Induce Characteristic Kernels](https://arxiv.org/abs/1812.00029), arXiv, 2019.
1. M. Madhya et al. [Geodesic Learning via Unsupervised Decision Forests](https://arxiv.org/abs/1907.02844), arXiv, 2019.
1. M. Madhya et al. [PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployment](https://arxiv.org/abs/2011.05383), arXiv, 2020.

### Conferences

1. J. T. Vogelstein et al. A biological implementation of lifelong learning in the pursuit of artificial general intelligence. NAISys, 2020.
2. B. Pedigo et al. A quantitative comparison of a complete connectome to artificial intelligence architectures. NAISys, 2020.

]

---

### Biological learning is on top

![:scale 100%](images/learning-table.png)

---

### Omnidirectional Algorithms can Transfer Between XOR and XNOR

![:scale 100%](images/xor_xnor_exp.png)

---

### Spoken Digit dataset

.pull-left[
- *Spoken Digit* contains recordings from 6 different speakers.
- Each speaker recorded each digit 50 times (3,000 recordings in total).
- For each recording, a spectrogram was extracted using Hanning windows of 16 ms duration with a 4 ms overlap (see the preprocessing sketch after the results slide).
- The spectrograms were resized down to 28×28.
]

.pull-right[
]

---

### Omnidirectional Algorithms on Spoken Digit Task

![:scale 105%](images/spoken_digit.png)
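
A minimal sketch of the spectrogram preprocessing described on the Spoken Digit slide, assuming SciPy and scikit-image; the file path, sample-rate handling, and log scaling are illustrative assumptions and may differ from the pipeline actually used:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
from skimage.transform import resize

def digit_spectrogram(wav_path, out_shape=(28, 28)):
    """Log-spectrogram with 16 ms Hanning windows and 4 ms overlap, resized to 28x28."""
    fs, x = wavfile.read(wav_path)      # Free Spoken Digit recordings are mono wavs (8 kHz)
    x = x.astype(float)
    nperseg = int(0.016 * fs)           # 16 ms window
    noverlap = int(0.004 * fs)          # 4 ms overlap
    _, _, spec = spectrogram(x, fs=fs, window="hann", nperseg=nperseg, noverlap=noverlap)
    spec = np.log(spec + 1e-10)         # log power for dynamic range (assumed, not stated on the slide)
    return resize(spec, out_shape)      # downsample to 28x28 as described on the slide

# Hypothetical file path following the dataset's naming convention:
# img = digit_spectrogram("recordings/0_jackson_0.wav")
```
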