name:opening

## Statistical Foundations for Connectomics

Joshua T. Vogelstein
.foot[[jovo@jhu.edu](mailto:jovo@jhu.edu) | [@neuro_data](https://twitter.com/neuro_data)]

---

### Why do we rely on statistics for scientific evidence?

some (old dead) dudes/women:

0. had an interesting biological question (eg, lady tasting tea)
1. invented some wrong but useful models (eg, Gaussian, Poisson)
2. derived some estimators (eg, minimum mean squared error)
3. proved some theorems (eg, t-test works when data are Gaussian)
4. conducted extensive simulated and real data experiments substantiating theorems

---

.center[
]

---

### while you guys were heroically

1. designing microscopes
2. getting better data
3. building computational infrastructure
4. developing machine vision

we've been doing statistics for networks, and hopefully have enough stuff that some of it might be useful for you

---

### What are interesting biological questions?

Sydney Brenner: the two questions are “.r[genetic specification of nervous systems] and ... .purple[nervous systems work to produce behavior].”

--

Formalizing the above, given that we model the brain as a network

- .r[ Pr[ Connectome | Genome ]]
- .purple[ Pr [ Behavior | Connectome]]

---

### Following Brenner from a stats perspective motivates:

0. have an interesting biological question (eg, bilateral homology)
1. invent some wrong but useful models (eg, DC-SBM)
2. derive some estimators (eg, minimum mean squared error)
3. prove some theorems (eg, two-sample tests work)
4. conduct extensive simulated and real data experiments

---

### Larval Drosophila Mushroom Body
---

### Erdős–Rényi
- all edges are independent and identically distributed

---

### Stochastic Block Model (SBM)
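
A minimal sketch of sampling from an SBM, assuming an undirected graph without self-loops. The function and argument names (`sample_sbm`, `block_sizes`, `block_probs`) are hypothetical, chosen for illustration only, and the sketch uses only the Python standard library.

```python
import random


def sample_sbm(block_sizes, block_probs, seed=0):
    """Sample an undirected SBM adjacency matrix without self-loops.

    block_sizes: number of nodes in each block (hypothetical argument name).
    block_probs: symmetric K x K matrix of between-block edge probabilities.
    """
    rng = random.Random(seed)
    # labels[i] is the block that node i belongs to.
    labels = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the two blocks involved.
            p = block_probs[labels[i]][labels[j]]
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj, labels


# Two blocks of five nodes: dense within a block, sparse between blocks.
adj, labels = sample_sbm([5, 5], [[0.9, 0.1], [0.1, 0.9]])
```

With a single block, this reduces to the Erdős–Rényi model from the previous slide.
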
- all edges are independent and identically distributed **within a block**

---

### Degree-Corrected SBM
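
A rough sketch of one common Bernoulli parameterization of the DC-SBM, in which the block-level edge probability is scaled by a per-node factor. The names `theta`, `dcsbm_edge_prob`, and `sample_dcsbm` are hypothetical, and capping the probability at 1 is an assumption made here to keep the sketch valid for any inputs.

```python
import random


def dcsbm_edge_prob(i, j, labels, block_probs, theta):
    """Edge probability theta_i * theta_j * B[z_i][z_j], capped at 1 (assumption)."""
    p = theta[i] * theta[j] * block_probs[labels[i]][labels[j]]
    return min(p, 1.0)


def sample_dcsbm(labels, block_probs, theta, seed=0):
    """Sample an undirected DC-SBM adjacency matrix without self-loops."""
    rng = random.Random(seed)
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < dcsbm_edge_prob(i, j, labels, block_probs, theta):
                adj[i][j] = adj[j][i] = 1
    return adj


labels = [0, 0, 0, 1, 1, 1]
block_probs = [[0.8, 0.2], [0.2, 0.8]]
# Per-node scaling: node 0 connects more readily than node 2 in the same block.
theta = [1.0, 0.5, 0.25, 1.0, 0.5, 0.25]
adj = sample_dcsbm(labels, block_probs, theta)
```

Setting every entry of `theta` to 1 recovers the plain SBM.
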
- all edges are independent and identically distributed **within a block**, and
- each node has a *promiscuity* parameter

---

### Random Dot Product Graph
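
A minimal sketch of sampling from an RDPG, assuming an undirected graph without self-loops and latent positions whose pairwise dot products all lie in [0, 1]. The names `dot` and `sample_rdpg` are hypothetical.

```python
import random


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def sample_rdpg(latent, seed=0):
    """Sample an undirected RDPG adjacency matrix without self-loops.

    latent: one d-dimensional position per node.
    """
    rng = random.Random(seed)
    n = len(latent)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability is the dot product of the two latent positions.
            p = dot(latent[i], latent[j])
            if not 0.0 <= p <= 1.0:
                raise ValueError("dot product is not a valid probability")
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


# Nodes 0 and 1 share a position; node 2 is orthogonal to both.
latent = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adj = sample_rdpg(latent)
```

In this toy example the dot products are exactly 0 or 1, so nodes 0 and 1 are always connected and node 2 is always isolated.
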
- each node has a latent position in some d-dimensional space
- the probability of an edge between a pair of nodes is equal to the dot product of their latent positions

---

### Inhomogeneous Erdős–Rényi
- all edges are independent and **not identically distributed**
- a perfect (over-)fit

---

### Model Complexities
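
As a rough illustration of how the models on the preceding slides differ in complexity, the sketch below counts free parameters for an undirected graph without self-loops. The counting convention is an assumption made here (block labels and latent coordinates are each counted as parameters; other conventions give different numbers), and `parameter_counts` is a hypothetical helper.

```python
def parameter_counts(n, k, d):
    """Approximate parameter counts for n nodes, k blocks, d latent dimensions."""
    node_pairs = n * (n - 1) // 2   # distinct unordered node pairs
    block_pairs = k * (k + 1) // 2  # distinct unordered block pairs
    return {
        "ER": 1,                         # one shared edge probability
        "SBM": block_pairs + n,          # block probabilities + node labels
        "DC-SBM": block_pairs + 2 * n,   # SBM + one promiscuity value per node
        "RDPG": n * d,                   # one d-dimensional position per node
        "IER": node_pairs,               # one probability per node pair
    }


counts = parameter_counts(n=100, k=2, d=2)
for model, count in counts.items():
    print(f"{model:>6}: {count}")
```

Under this convention, Erdős–Rényi is the simplest model and inhomogeneous Erdős–Rényi the most complex, with the block and latent-position models in between.
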
---

### Conclusions so far

- We have models, estimators, theorems, numerical experiments
- Questions we can ask next:
  - are left and right different? if so how? where?
  - are left and right independent?
  - how does training change? where does it change?

### References

1. Vogelstein JT, et al. [*Connectal Coding: Discovering the Structures Linking Cognitive Phenotypes to Individual Histories.*](https://github.com/neurodata/neurodata.io/raw/deploy/source/docs/Connectal_Coding.pdf) Current Opinion in Neurobiology, April 2019.
2. Chung J, et al. [*GraSPy: Graph Statistics in Python.*](https://arxiv.org/abs/1904.05329) arXiv, 2019.

.center[https://neurodata.io/graspy/]

---

class: middle, inverse

### .center[your slide here!]

---

### Acknowledgements
Carey Priebe
Randal Burns
Michael Miller
Daniel Tward
Eric Bridgeford
Vikram Chandrashekhar
Drishti Mannan
Jesse Patsolic
Benjamin Falk
Kwame Kutten
Eric Perlman
Alex Loftus
Brian Caffo
Minh Tang
Avanti Athreya
Vince Lyzinski
Daniel Sussman
Youngser Park
Cencheng Shen
Shangsi Wang
Tyler Tomita
James Brown
Disa Mhembere
Ben Pedigo
Jaewon Chung
Greg Kiar
Jeremias Sulam
♥, 🦁, 👪, 🌎, 🌌
---

class:center

---

### Drosophila Brain Networks

---

### Latent Structure Model

---

### Geodesic Learning Drosophila Brain

---

### Drosophila Optic Medulla
.foot[Takemura et al., 2013]