### Taxonomy of Learning Paradigms
---

### New Measures for Learning Efficacy

**Learning Efficiency**

- $\mathcal{E}_f(\mathbf{S})$ is the error of the hypothesis output by $f$ when trained on dataset $\mathbf{S}$.

$$ LE_f(\mathbf{S}^A, \mathbf{S}^B) = \frac{\mathcal{E}_f(\mathbf{S}^A)}{\mathcal{E}_f(\mathbf{S}^B)} $$

**Weak OOD Learning**

- $\mathbf{S}_{m,n}$ is the amalgamated dataset: $m$ out-of-task data points and $n$ target-task data points.
- $f(\mathbf{S}_{m,n}) = \hat{h}_{m,n}$ is the hypothesis obtained from the amalgamated dataset.
- $f(\mathbf{S}_n) = \hat{h}_n$ is the hypothesis obtained from the target data alone.

$$ P_{\mathbf{S}_{m,n}}\left[ R_{X,Y}(\hat{h}_{m,n}) < R_{X,Y}(\hat{h}_n) - \varepsilon \right] \geq 1 - \delta $$

---

### New Measures for Learning Efficacy

We weakly OOD learn if the above holds for all $\delta > 0$, all $m \geq M$ and $n \geq N$, and all distributions $P_{\mathbf{S}_n, X, Y}$, for some $M, N$, some $\varepsilon > 0$, and some algorithm $f$.

**Strong OOD Learning**

- $R^*$ is the Bayes optimal risk.

$$ P_{\mathbf{S}_{m,n}}\left[ R_{X,Y}(\hat{h}_{m,n}) < R^* + \varepsilon \right] \geq 1 - \delta $$

We strongly OOD learn if the above holds for all $\varepsilon, \delta > 0$, all $m \geq M$ and $n \geq N$, and all distributions $P_{\mathbf{S}_n, X, Y}$, for some $M, N$ and some algorithm $f$.

---

### Theoretical Results

- Weak OOD learning does not imply strong OOD learning.
- Weak OOD learning implies transfer learning (learning efficiency $> 1$: the out-of-task data improves performance on the target task).
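---

### Illustration: Learning Efficiency (sketch)

A minimal sketch of the learning-efficiency ratio on a toy two-Gaussian task. Everything here is an illustrative assumption rather than part of the definitions above: the task generator `make_task`, the mean shift of $0.3$ used to mimic out-of-task data, the choice of logistic regression for $f$, and the use of a large held-out sample to stand in for $\mathcal{E}_f(\mathbf{S})$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_task(n, shift):
    """Toy two-class Gaussian task; `shift` perturbs the class means to
    mimic an out-of-task distribution (an illustrative assumption)."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=shift + y[:, None] * 1.5, scale=1.0, size=(n, 2))
    return X, y

# Large held-out target-task sample standing in for the true error E_f(S).
X_test, y_test = make_task(10_000, shift=0.0)

def error(f, S):
    """E_f(S): error of the hypothesis f outputs when trained on S."""
    X, y = S
    return 1.0 - f.fit(X, y).score(X_test, y_test)

n, m = 50, 500
Xn, yn = make_task(n, shift=0.0)      # S_n: target-task data only
Xm, ym = make_task(m, shift=0.3)      # m out-of-task data points
S_n = (Xn, yn)
S_mn = (np.vstack([Xm, Xn]), np.concatenate([ym, yn]))  # amalgamated S_{m,n}

# LE_f(S_n, S_{m,n}) = E_f(S_n) / E_f(S_{m,n}); a value > 1 means the
# amalgamated dataset yields lower target-task error than S_n alone.
f = LogisticRegression()
LE = error(f, S_n) / error(f, S_mn)
print(f"learning efficiency: {LE:.2f}")
```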
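---

### Illustration: Checking the OOD Definitions (sketch)

The weak and strong OOD criteria are probability statements over draws of $\mathbf{S}_{m,n}$, so for a single toy distribution pair we can estimate them by Monte Carlo. This is a sketch under the same assumptions as the previous slide; it checks the two events for one fixed distribution, not the "for all distributions" quantifier in the definitions. For this toy model (identity-covariance Gaussians, equal priors), the Bayes risk is $R^* = \Phi(-d/2)$ with $d$ the distance between the class means.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
MU = 1.5  # per-coordinate separation between the two class means

def make_task(n, shift):
    """Toy two-class Gaussian task, as on the previous slide."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=shift + y[:, None] * MU, scale=1.0, size=(n, 2))
    return X, y

# Large target-task sample standing in for the true risk R_{X,Y}.
X_eval, y_eval = make_task(20_000, shift=0.0)

def risk(h):
    return 1.0 - h.score(X_eval, y_eval)

# Bayes risk R* = Phi(-d/2), d = ||mu_1 - mu_0|| = MU * sqrt(2) here.
R_star = norm.cdf(-MU * np.sqrt(2) / 2)

n, m, eps, trials = 50, 500, 0.01, 200
weak_hits = strong_hits = 0
for _ in range(trials):
    Xn, yn = make_task(n, shift=0.0)   # fresh draw of S_n
    Xm, ym = make_task(m, shift=0.3)   # fresh out-of-task points
    h_n = LogisticRegression().fit(Xn, yn)                 # f(S_n)
    h_mn = LogisticRegression().fit(np.vstack([Xm, Xn]),
                                    np.concatenate([ym, yn]))  # f(S_{m,n})
    weak_hits += risk(h_mn) < risk(h_n) - eps    # weak OOD event
    strong_hits += risk(h_mn) < R_star + eps     # strong OOD event

print(f"P[weak event]   ~ {weak_hits / trials:.2f}")
print(f"P[strong event] ~ {strong_hits / trials:.2f}  (R* = {R_star:.3f})")
```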