.footnote[[jovo@jhu.edu](mailto:jovo@jhu.edu) | [@neuro_data](https://twitter.com/neuro_data)]
---
#### Motivation
--
- "Understand" the relationship between physical (brain) and mental properties
--
- Question 1: are the two related at all?
--
- Question 2: how are they related?
--
- Note: many efforts dedicated to Q1, fewer to Q2.
- We will formally address Q1, then turn to Q2.
---
class: left
#### One often desires to test for independence
| \\(X\\) | \\(Y\\) |
| :---: | :---: |
| clouds | grass wetness |
| brain connectivity | creativity |
| brain shape | health |
| CLARITY | condition |
| gene expression | cancer |
--
| anything | anything else |
---
#### Statistics Background
- $X$ is a random variable (some measurement).
--
- $F_X$ is the distribution of $X$.
--
- This means $F_X(a) = P(X \leq a)$.
- This is denoted:
$$X \sim F_X$$
---
#### Statistics Background
- For two random variables $X$ and $Y$, $F_{XY}$ is called the joint distribution.
--
- This means that $F_{XY} (a, b) = P(X \leq a\ \mathrm{and}\ Y \leq b)$,
or,
$$(X, Y) \sim F_{XY}$$
---
#### Independence
$X$ and $Y$ are independent if neither contains information about the other.
In other words,
$$F_{XY}(a, b) = P(X \leq a\ \mathrm{and}\ Y \leq b) = P(X \leq a) \times P(Y \leq b) = F_X(a) F_Y(b)$$
for all $a$ and $b$; more compactly,
$$ F_{XY} = F_X F_Y$$
Note: These ideas and notation generalize for multivariate $X$ and $Y$.
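As a numerical illustration of the factorization (my own sketch, not part of the MGC software): with independent draws the empirical joint CDF approximately equals the product of the empirical marginals, and with dependent draws it does not.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_cdfs(x, y, a, b):
    """Empirical F_XY(a, b) and the product F_X(a) * F_Y(b)."""
    joint = np.mean((x <= a) & (y <= b))
    product = np.mean(x <= a) * np.mean(y <= b)
    return joint, product

x = rng.normal(size=20000)
y = rng.normal(size=20000)
joint, product = empirical_cdfs(x, y, 0.5, 0.5)          # independent: close
joint_dep, product_dep = empirical_cdfs(x, x, 0.5, 0.5)  # Y = X: far apart
```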
---
#### Informal Definition of Hypothesis Testing
- **Null Hypothesis**: The conventional belief about a phenomenon of interest, written $H_0$.
- **Alternative Hypothesis**: An alternate belief about the same phenomenon, written $H_A$.
- **p-value**: The probability (under the null) of measurements more extreme than what was observed.
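The p-value above is commonly approximated by a permutation test; a minimal sketch (illustrative only, with absolute Pearson correlation standing in for the MGC statistic):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_pvalue(x, y, stat, n_perm=500):
    """Approximate P(statistic >= observed) under the null by permuting y,
    which breaks any dependence on x while preserving both marginals."""
    observed = stat(x, y)
    null = np.array([stat(x, rng.permutation(y)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)  # +1 keeps p > 0

abs_corr = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
x = rng.normal(size=100)
p_dep = permutation_pvalue(x, 2 * x + rng.normal(size=100), abs_corr)
p_indep = permutation_pvalue(x, rng.normal(size=100), abs_corr)
```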
---
#### Formal Definition of Independence Testing
$$(X\_i,Y\_i) \sim F\_{XY} = F\_{X|Y} F\_Y, \quad i \in \{1,\ldots,n\}$$
$$H\_0: F\_{XY} = F\_X F\_Y $$
$$H\_A: F\_{XY} \neq F\_X F\_Y $$
---
class: top, left
## Outline
- intuition
- simulations
- real data
- extensions
- theory
- discussion
---
class: middle
# .center[intuition]
---
class: top, left
#### Intuitive Desiderata of Testing Procedure
1. Performant under *any* joint distribution
- low- and high-dimensional
- Euclidean and structured data (eg, sequences, images, networks, shapes)
- linear and nonlinear relationships
2. Reveals the "geometry" of dependence
3. Is computationally efficient
4. Provides a tractable algorithm that addresses the two motivating questions:
- Question 1: are the two related at all?
- Question 2: how are they related?
---
class: center
### correlation coefficient
$$r\_{XY}^2 = \frac{(\sum\_{i=1}^n ( x\_i - \bar{x}) (y\_i - \bar{y}))^2}{\sum\_{i=1}^n (x\_i - \bar{x})^2 \sum\_{i=1}^n (y\_i- \bar{y})^2} $$
---
class: center
### **mantel** correlation coefficient
$$r\_{XY}^2 = \frac{(\sum\_{i=1}^n ( x\_i - \bar{x}) (y\_i - \bar{y}))^2}{\sum\_{i=1}^n (x\_i - \bar{x})^2 \sum\_{i=1}^n (y\_i- \bar{y})^2} $$
$$d\_{XY}^2 = \frac{(\sum\_{i,\color{red}{j}=1}^n ( x\_i - \color{red}{x\_j}) (y\_i - \color{red}{y\_j}))^2}{\sum\_{i,\color{red}{j}=1}^n (x\_i - \color{red}{x\_j})^2 \sum\_{i,\color{red}{j}=1}^n (y\_i- \color{red}{y\_j})^2} $$
---
class: center
### **generalized** correlation coefficient
$$r\_{XY}^2 = \frac{(\sum\_{i=1}^n ( x\_i - \bar{x}) (y\_i - \bar{y}))^2}{\sum\_{i=1}^n (x\_i - \bar{x})^2 \sum\_{i=1}^n (y\_i- \bar{y})^2} $$
$$d\_{XY}^2 = \frac{(\sum\_{i,\color{red}{j}=1}^n ( x\_i - \color{red}{x\_j}) (y\_i - \color{red}{y\_j}))^2}{\sum\_{i,\color{red}{j}=1}^n (x\_i - \color{red}{x\_j})^2 \sum\_{i,\color{red}{j}=1}^n (y\_i- \color{red}{y\_j})^2} $$
$$c\_{XY}^2 = \frac{(\sum\_{i,j=1}^n \color{red}{\sigma\_x}(x\_i,x\_j) \color{red}{\sigma\_y}(y\_i,y\_j))^2}{\sum\_{i,j=1}^n \color{red}{\sigma\_x}(x\_i,x\_j)^2 \sum\_{i,j=1}^n \color{red}{\sigma\_y}(y\_i,y\_j)^2} $$
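The generalized coefficient $c_{XY}^2$ is direct to compute from any pairwise similarity functions $\sigma_x, \sigma_y$; a sketch (illustrative, not the reference implementation):

```python
import numpy as np

def generalized_corr(x, y, sigma_x, sigma_y):
    """c_XY^2 from the slide: plug any pairwise similarity sigma
    into the correlation formula."""
    sx = np.array([[sigma_x(a, b) for b in x] for a in x])
    sy = np.array([[sigma_y(a, b) for b in y] for a in y])
    return np.sum(sx * sy) ** 2 / (np.sum(sx ** 2) * np.sum(sy ** 2))

x = np.arange(10.0)
diff = lambda a, b: a - b   # sigma(a, b) = a - b recovers the Mantel d_XY^2
c_linear = generalized_corr(x, 3 * x + 1, diff, diff)   # perfect linear: 1
```

By the Cauchy-Schwarz inequality the statistic always lies in $[0, 1]$.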
---
### local distance correlation
--
--
dcorr(X,Y)=0.15, p-val < 0.001
MGC(X,Y)=0.15, p-val < 0.001
--
--
--
dcorr(X,Y)=0.01, p-val 0.3
MGC(X,Y)= .r[0.13], p-val < .r[0.001]
---
### multiscale distance correlation
- compute local dcorr **at all scales**
- find scale with **max** smoothed test statistic
- permutation test to determine p-value
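The three steps above can be sketched as follows (a simplified illustration; the published MGC uses a particular centering convention, smoothing of the statistic map, and other refinements):

```python
import numpy as np

def local_dcorr_scan(x, y):
    """Scan all neighborhood scales (k, l): correlate the centered distance
    matrices using only entries within each point's k (resp. l) nearest
    neighbors, and return the maximum local statistic."""
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    center = lambda d: d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    cx, cy = center(dx), center(dy)
    rx = np.argsort(np.argsort(dx, axis=1), axis=1)  # neighbor ranks per row
    ry = np.argsort(np.argsort(dy, axis=1), axis=1)
    best = -np.inf
    for k in range(1, n):
        for l in range(1, n):
            mask = (rx < k) & (ry < l)
            a, b = cx[mask], cy[mask]
            den = np.sqrt(np.sum(a * a) * np.sum(b * b))
            if den > 0:
                best = max(best, float(np.sum(a * b) / den))
    return best

rng = np.random.default_rng(0)
x = rng.normal(size=30)
stat_dep = local_dcorr_scan(x, x ** 2)   # strongly nonlinear dependence
```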
---
class: top, center
### Multiscale Generalized Correlation (MGC)
--
--
---
class: top, left
#### Intuitive Desiderata of Testing Procedure
1. Performant under *any* joint distribution
- low- and high-dimensional
- Euclidean and structured data (eg, sequences, images, networks, shapes)
- linear and nonlinear relationships
2. Reveals the "geometry" of dependence
3. Is computationally efficient
4. Provides a tractable algorithm that addresses the two motivating questions:
- Question 1: are the two related at all?
- Question 2: how are they related?
---
class: middle, center
# .center[simulations]
---
class: top, left
### 20 Different Functions (1D version)
---
class: top, left
### Definitions
- **power** is the probability of rejecting the null when the alternative is true
- $\beta_n(t)$ = power of test statistic $t$ given $n$ samples
- **relative power**: the power of one approach minus the power of another
- $\beta_n(mgc)- \beta_n(t)$
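Power can be estimated by Monte Carlo: simulate many data sets and count rejections. A sketch (illustrative, not the benchmark code; a simple correlation-based permutation test stands in for MGC):

```python
import numpy as np

rng = np.random.default_rng(1)

def perm_pvalue(x, y, n_perm=100):
    """Permutation p-value for |Pearson correlation| (stand-in statistic)."""
    obs = abs(np.corrcoef(x, y)[0, 1])
    null = [abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(n_perm)]
    return (1 + sum(t >= obs for t in null)) / (1 + n_perm)

def estimate_power(make_xy, n_trials=100, alpha=0.05):
    """beta_n(t): fraction of simulated data sets on which the test rejects."""
    return float(np.mean([perm_pvalue(*make_xy()) < alpha
                          for _ in range(n_trials)]))

n = 50
def linear_pair():   # dependent: y is a noisy linear function of x
    x = rng.normal(size=n)
    return x, x + rng.normal(size=n)

def null_pair():     # independent: power should stay near alpha
    return rng.normal(size=n), rng.normal(size=n)

power_linear = estimate_power(linear_pair)
power_null = estimate_power(null_pair)
```

Relative power is then just the difference of two such estimates.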
---
class: top, left
### 1D Relative Power (MGC nearly dominates)
---
class: top, left
### HD Relative Power (MGC nearly dominates)
---
class: top, left
### MGC Requires Half as Many Samples to Achieve the Same Power
---
class: top, left
#### MGC Reveals Geometry of Dependence
---
class: middle, center
# .center[real data]
---
class: top, left
### Real Data Desiderata
1. when we believe dependence, MGC obtains a small p-value
2. MGC provides insight into the geometry of real dependence
3. when there is no dependence, MGC correctly controls FPR
---
class: top, left
### MGC Discovers Relationships between Brain & Mental Properties
---
class: top, left
### MGC Discovers Pancreatic Cancer Biomarkers
---
### MGC Does Not Make Too Many False Discoveries
---
class: middle
# .center[extensions]
---
#### Categorical Features
- All testing depends on measuring similarity/distance between observations
- Let $Y \in$ {normal, condition 1, condition 2}
- $d(y_i,y_j) = 0$ if $y_i = y_j$
- $d(y_i,y_j) = 1$ if $y_i \neq y_j$
- Then everything just works :)
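A sketch of that 0/1 distance (illustrative; any independence test built on distance matrices can consume the result):

```python
import numpy as np

def discrete_distance_matrix(labels):
    """d(y_i, y_j) = 0 if y_i == y_j, else 1, for categorical labels."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)

d = discrete_distance_matrix(
    ["normal", "condition 1", "condition 1", "condition 2"])
```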
#### Mixed Data
- Define similarity/distance appropriately for each feature, then combine.
- In theory, the normalization procedure will work appropriately, though not yet tested.
#### Missing Data & Outliers
- *Missing*: does not yet work, though see the RerF talk later today for ideas
- *Outliers*: handled natively; probably best to include them, otherwise the statistics get (more) complicated. Not yet carefully tested.
---
### Computational Considerations
- $n$ is the # of samples
- $T$ is the # of threads
| Method | Complexity |
| :---: | :---: |
| Dcorr | $n^2 / T$ |
| HHG | $n^2 \log n / T$ |
| MGC | $n^2 \log n / T$ |
- Python, MATLAB and R code at [http://neurodata.io/mgc](http://neurodata.io/mgc)
- Note: none of the implementations are parallelized
---
### Fast MGC
---
### K-Sample Tests are Independence Tests
Recall this definition of independence testing
$$(X\_i,Y\_i) \sim F\_{XY} = F\_{X|Y} F\_Y, \quad i \in \{1,\ldots,n\}$$
$$H\_0: F\_{XY} = F\_X F\_Y $$
$$H\_A: F\_{XY} \neq F\_X F\_Y $$
--
Here is the formal definition of 2-sample testing
$$U\_i \sim F_U, \, i \in \{1,\ldots,n\} \quad V\_i \sim F\_V, \, i \in \{1,\ldots,m\}$$
$$H\_0: F\_{U} = F\_V $$
$$H\_A: F\_{U} \neq F\_V $$
--
Let $X = [U\_1, \ldots, U\_n \mid V\_1, \ldots, V\_m]$ and $Y=[0, \ldots, 0 \mid 1, \ldots, 1]$.
Now, testing whether $X$ and $Y$ are independent is equivalent to testing whether $F_U$ and $F_V$ are the same.
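The construction is one line of bookkeeping; a sketch (illustrative):

```python
import numpy as np

def two_sample_to_independence(u, v):
    """Stack the two samples into X and attach 0/1 group labels Y, so a
    two-sample test of F_U vs F_V becomes an independence test of X and Y."""
    x = np.concatenate([u, v])
    y = np.concatenate([np.zeros(len(u)), np.ones(len(v))])
    return x, y

x, y = two_sample_to_independence(np.array([1.0, 2.0]),
                                  np.array([5.0, 6.0, 7.0]))
```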
---
class: top, left
### 20 Different 2-Sample Functions (2D version)
---
class: top, left
### Relative Power vs Sample Size for 2D Rotations
---
class: top, left
### Relative Power vs Angle for 2D Rotations
---
class: top, left
### Relative Power vs Dimension for Rotations
---
### Graph Equality Testing
For example, are the left and right hemispheres of the larval Drosophila Mushroom Body sampled from the same distribution?
$G=(V_1,E_1) \sim F_X, \quad H = (V_2, E_2) \sim F_Y$
$H\_0: F\_{X} = F\_Y , \quad H\_A: F\_{X} \neq F\_Y $
---
### Graph Equality Testing
For example, are the left and right hemispheres of the larval Drosophila Mushroom Body sampled from the same distribution?
$G=(V_1,E_1) \sim F_X, \quad H = (V_2, E_2) \sim F_Y$
$H\_0: F\_{X} = F\_Y , \quad H\_A: F\_{X} \neq F\_Y $
---
### Graph Independence Testing
For example, in the Hermaphrodite *C. elegans*, is the chemical synapse connectome independent of the gap junction connectome?
$G=(V,E_1, Z_v) \sim F_{X|Z}, \quad H = (V, E_2, Z_v) \sim F_{Y|Z}$
$H\_0: F\_{XY|Z} = F\_{X|Z} F\_{Y|Z} , \quad H\_A: F\_{XY|Z} \neq F\_{X|Z} F\_{Y|Z} $
---
### Graph Independence Testing
For example, in the Hermaphrodite *C. elegans*, is the chemical synapse connectome independent of the gap junction connectome?
$G=(V,E_1, Z_v) \sim F_{X|Z}, \quad H = (V, E_2, Z_v) \sim F_{Y|Z}$
$H\_0: F\_{XY|Z} = F\_{X|Z} F\_{Y|Z} , \quad H\_A: F\_{XY|Z} \neq F\_{X|Z} F\_{Y|Z} $
---
### Graph Topology vs Vertex Attribute Testing
For example, in *C. elegans*, each node/neuron has an attribute $Z_v$ (e.g., cell type), and we wish to test whether connectivity and the attribute are independent.
$G=(V,E, Z\_v) \sim F\_{EZ}$
$H\_0: F\_{EZ} = F\_{E} F\_{Z} , \quad H\_A: F\_{EZ} \neq F\_{E} F\_{Z} $
---
### Time Series Testing
For example, is a given ROI independent of another ROI?
Consider a strictly stationary time series $\{(X\_t,Y\_t)\}_{t=0}^{n}$.
Choose some $M$, the "maximum lag" hyperparameter. We wish to test the following hypothesis.
$$ H\_0: F\_{X\_t,Y\_{t-j}} = F\_{X\_t} F\_{Y\_{t-j}} \text{ for each } j \in \{0, 1, ..., M\}, \ \forall t$$
$$ H\_A: F\_{X\_t,Y\_{t-j}} \neq F\_{X\_t} F\_{Y\_{t-j}} \text{ for some } j \in \{0, 1, ..., M\}, \ \forall t$$
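A sketch of the lag alignment implied by the hypotheses (illustrative; `lagged_pairs` is a made-up helper name):

```python
import numpy as np

def lagged_pairs(x, y, j):
    """Align the pairs (X_t, Y_{t-j}) tested at lag j: drop the first j
    entries of x and the last j entries of y."""
    n = len(x)
    return x[j:], y[: n - j]

x = np.arange(6.0)
a, b = lagged_pairs(x, x, 2)   # pairs (X_2, Y_0), (X_3, Y_1), ...
```

Running an independence test on each lag $j \in \{0, \ldots, M\}$ and combining (e.g., taking the maximum statistic) gives a test of the composite hypothesis.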
---
### Application: Functional Connectomics
---
class: middle, center
# .center[theory]
---
class: top, left
### Definitions
- $t$ is sample MGC
- $T$ is population MGC
- $t^\prime$ is sample Dcorr
- $t^*$ is sample oracle MGC
- $\beta_n(t)$ = power of test statistic $t$ given $n$ samples
---
### Theoretical Desiderata
| name | principle |
| --- | --- |
| boundedness | $0 \leq t,T \leq 1 $|
| symmetric | $T(X,Y) = T(Y,X)$ |
| 1-linear. | $ T= 1 \Leftrightarrow y = A x + b$ |
| 0-indep. | $T=0 \Leftrightarrow H(X \lvert Y) = H(X)$ |
| ortho. invar. | $T(X,Y) = T(a_1 + b_1 C_1 X, a_2 + b_2 C_2 Y)$ |
--
| univ. consist. | $\beta\_n(t) \to 1, \quad \forall \, F\_{XY}$ |
| dominance | $\beta\_n(t) \geq \beta\_n(t^\prime), \, \forall n,\, t^\prime \in \mathcal{T}, \forall F\_{XY}$ |
| convergence | $t \to T$ as $n \to \infty$ |
---
class: top, left
#### MGC is a Reasonable Dependence Statistic
Thm: Oracle MGC has the following properties:
- 0 ≤ MGC ≤ 1
- MGC = 0 only under independence
- MGC is symmetric
- MGC = 1 only under linear relationship
- MGC is invariant to rotation, translation, and scale of X and/or Y
---
class: left
### MGC has power 1 for all \\(F_{XY}\\)
Lemma: \\( \beta_n (t) \to 1 \\) as \\( n \to \infty \\) whenever \\( E [X], E [Y] < \infty. \\)
---
class: left
### Linear: Local = Global
Lemma: If $X$ is linearly dependent on $Y$, then it always holds that
$$ \beta_n(t^*) = \beta_n(t^\prime) $$
for any $n$.
---
class: left
### (Certain) Nonlinear: Local > Global
Lemma: There exist certain nonlinear relationships between $X$ and $Y$, and scales $k$ and $l$, such that
$$\beta_n( t^{k,l} ) > \beta_n(t^\prime). $$
---
class: left
### Oracle MGC > Dcorr
Thm: Oracle MGC statistically dominates dcorr, that is,
$$\beta_n( t^* ) \geq \beta_n(t^\prime). $$
---
class: left
### Sample MGC Converges
Thm: For any $F_{XY}$ with finite 1st moment,
$\beta_n(t) \to 1$ as $n \to \infty$
if and only if distance functions $\sigma_X$ and $\sigma_Y$ are of strong negative type.
---
class: top, left
### Theoretical Desiderata
| name | principle |
| --- | --- |
| boundedness | ✅|
| symmetric | ✅|
| 1-linear. | ✅ |
| 0-indep. | ✅ |
| ortho. invar. | L$_p$ |
| univ. consist. | ✅ |
| dominance | ✅ |
| convergence | ✅ |
---
### Overall Summary
- Oracle MGC theoretically dominates, even in finite samples
- MGC empirically nearly dominates on extensive simulations
- Visual quantitative characterization of arbitrary relationships
- MGC reveals geometry of dependence in real data
- MGC mitigates "post-selection inference" problems
---
### References
- MGC for scientists [[1]](https://elifesciences.org/articles/41690)
- Foundational theory for MGC [[2]](https://www.tandfonline.com/doi/full/10.1080/01621459.2018.1543125)
- MGC for independence between graph topology & attributes [[3]](https://arxiv.org/abs/1703.10136)
- MGC for signal subgraph detection [[4]](https://arxiv.org/abs/1801.07683)
- MGC for clustering (ish) [[5]](https://arxiv.org/abs/1710.09859)
- Energy and kernel tests are equivalent [[6]](https://arxiv.org/abs/1806.05514)
- MGC + RF [[7]](https://arxiv.org/abs/1812.00029)
- mgcpy [[8]](https://arxiv.org/abs/1907.02088)
- graph independence testing [[9]](https://arxiv.org/abs/1906.03661)
- MGCX [10]
### Forthcoming (drafts available upon request)
- fast mgc [11]
- graph two sample testing [12]
- multivariate nonpar k-sample tests [13]
---
### Next Steps
- Get you going with it ([download here](https://neurodata.io/mgc))
- Apply it to your data
---
### Acknowledgements
[jovo@jhu.edu](mailto:jovo@jhu.edu) | [neurodata.io](https://neurodata.io) | [@neuro_data](https://twitter.com/neuro_data)
Carey Priebe
Eric Bridgeford
Drishti Mannan
Jesse Patsolic
Benjamin Falk
Kwame Kutten
Eric Perlman
Alex Loftus
Brian Caffo
Minh Tang
Avanti Athreya
Vince Lyzinski
Daniel Sussman
Youngser Park
Cencheng Shen
Shangsi Wang
Tyler Tomita
James Brown
Disa Mhembere
Ben Pedigo
Jaewon Chung
Greg Kiar
Satish
Jesus
Ronak
bear
Brandon
Richard Guo
Sambit Panda
Jeremias Sulam
♥, 🦁, 👪, 🌎, 🌌
---