name:opening **An Introduction to Graph Statistics**
Joshua Vogelstein | {[BME](https://www.bme.jhu.edu/),[CIS](http://cis.jhu.edu/), [KNDI](http://kavlijhu.org/)}@[JHU](https://www.jhu.edu/)
.foot[[jovo@jhu.edu](mailto:j1c@jhu.edu) | [@neuro_data](https://twitter.com/neuro_data)]

---
class: center, middle

## .center[https://neurodata.io/graspy/]

---

## Outline

- Background
- Statistical Models of Connectomes
- Statistical Models of Populations of Connectomes
- Applications
- Discussion

---
class: middle

## .center[.k[Background]]

---

### What is a Connectome?

- .r[Network] of a brain, at a spatiotemporal precision & extent
- .r[Nodes] are distinct biophysical entities
- .r[Edges] indicate the presence of a connection/communication between nodes
- .r[Attributes] of the network, nodes, or edges are possible

--

- Example nodes: cells, cellular compartments, cellular ensembles
- Example edges: synapses, gap junctions, fiber bundles

---

## Many adjacency matrices, one graph

Drosophila larva connectome, Eichler et al. 2017
---

### Definitions

.r[Neural (activity) coding]: inferring the relationships between neural *activity* and past, present, or future events, states, or traits

--

.r[Connectal coding]: inferring the relationships between neural *connectivity* and past, present, or future events, states, or traits

--

.s[events]: genetic, developmental, experiential (stimuli/behavior)

.s[traits]: IQ, sex, personality, learning disabled

.s[states]: happy, resting, manic

---

### Why Use Statistical Models?

- Connectome estimates are noisy
- Connectomes, and their relationships to events, states, or traits, can be complicated
- We wish to
  - quantify uncertainty
  - incorporate domain knowledge to the extent possible
  - summarize natural phenomena in a "simple" way
  - understand limitations of analyses

### Implications

- All inferences about population .r[depend on model]

---

### Connectome Analysis Styles

- Bag of edges
- Bag of features
- Bag of parameters

---

### Bag of Edges

- Treat each edge as independent
- Implicit model: independent edge model

---

### Bag of Features

- Choose $m$ features and compute them per graph
- Characterize connectome with this set of "parameters"
- Implicit model: exponential random graph model

---

### Bag of Parameters

- Build a **statistical parametric model** of brain network
- Can encode some domain knowledge explicitly
- Can model edges, nodes, communities
- Implicit model: latent structure model

---

### Limitations of approaches

---

### Limitations of Bag of Edges

- Completely ignores graph structure of data
- Too simple for many questions

---

#### Sometimes the signal is in the node

---

#### Edge-wise stats show no significance

---

#### Node-wise test finds the signal
---

### Limitations of Bag of Features

- how do I choose which features (hint: arbitrary)?
- how many features are possible given a graph with $n$ nodes (hint: many)?
- do these features characterize the brain (hint: no)?
- can we make causal claims using these features (hint: no)?
- are these features independent (hint: no)?
- least well understood of the approaches, but very common

---

### Same Stats, Different Graphs

- num vertices = 12
- num edges = 21
- number of triangles = 10
- global clustering coefficient = 0.5

.foot[[Chen et al.](https://link.springer.com/chapter/10.1007/978-3-030-04414-5_33)]

---

### Distribution of Features, n=10

---

### Condition on "close" to base graph
.footnote[(edges=31, threshold=3, n=200k)]

---

### Limitations of Bag of Parameters

- Conceptually less intuitive

---
class: middle

## .center[.k[Statistical Models of Connectomes]]

---

### Erdos-Renyi (ER)

- akin to assuming a neuron's spike rate is Poisson with a fixed rate
- all edges independent
- all edges sampled from identical distribution
- only 1 parameter: probability of an edge
- $\mathbb{P}[A_{i,j}] = p$

Notes

- Simplest random graph model; lacks descriptive power
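
---

### Erdos-Renyi (ER): A Sampling Sketch

The ER bullets can be made concrete with a few lines of NumPy. This is a minimal sketch, not the GraSPy implementation: it assumes an undirected graph with no self-loops, and `sample_er` and `estimate_p` are hypothetical helper names. The value `p = 0.166` is used only as an illustrative edge probability.

```python
import numpy as np

def sample_er(n, p, seed=None):
    """Sample an undirected, loopless ER(n, p) adjacency matrix."""
    rng = np.random.default_rng(seed)
    # One independent Bernoulli(p) draw per unordered pair (i < j).
    upper = np.triu(rng.random((n, n)) < p, k=1)
    # Mirror the upper triangle: symmetric matrix with a zero diagonal.
    return (upper | upper.T).astype(int)

def estimate_p(A):
    """Estimate p as (observed edges) / (possible edges)."""
    n = A.shape[0]
    return A[np.triu_indices(n, k=1)].mean()

A = sample_er(n=200, p=0.166, seed=0)
p_hat = estimate_p(A)
```

Because every edge shares the same probability, the single parameter is estimated by the observed edge density.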
---

### Drosophila Connectome ER

- p = 0.166

---

### Degree Corrected Erdos-Renyi (DCER)

- edges are independent
- edges are sampled from .r[different] distributions
- .r[n+1 parameters]: a degree correction $\theta_i$ for each node, plus $p$
- $\mathbb{P}[A_{i,j}] = \theta_i\theta_jp$

Notes

- n+1 parameters is much larger than 1
- still ignores structure

---

### Drosophila Connectome DCER
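
---

### DCER: A Sampling Sketch

A minimal sketch of the DCER edge probabilities $\theta_i\theta_jp$, assuming an undirected graph with no self-loops. The helper names, the uniform range for the degree corrections, and the clipping step are all illustrative assumptions, not part of the model definition or of GraSPy.

```python
import numpy as np

def dcer_probabilities(theta, p):
    """Edge-probability matrix with entries theta_i * theta_j * p."""
    P = p * np.outer(theta, theta)
    # Assumption: clip so that every entry is a valid probability.
    return np.clip(P, 0.0, 1.0)

def sample_independent_edges(P, seed=None):
    """Sample an undirected, loopless graph with independent edges from P."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(1)
theta = rng.uniform(0.5, 1.5, size=300)  # hypothetical degree corrections
P = dcer_probabilities(theta, p=0.1)
A = sample_independent_edges(P, seed=2)
degrees = A.sum(axis=1)
```

Nodes with larger $\theta_i$ tend to have higher degree, which a single shared $p$ cannot express.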
---

### Stochastic Block Model (SBM)

- akin to assuming neurons are in different states, which determine their Poisson rates
- edges are .r[conditionally] independent given the class assignments
- each node has a class assignment
- $\mathbb{P}[A_{i,j}]$ = $B$(class i, class j)

Notes

- simplest >2 parameter model

---

### Drosophila Connectome SBM
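
---

### SBM: A Sampling Sketch

A minimal sketch of sampling from an SBM and re-estimating the block matrix $B$ when the class assignments are known. It assumes an undirected graph with no self-loops; the two-class $B$ and the helper names are hypothetical choices for illustration.

```python
import numpy as np

def sample_sbm(labels, B, seed=None):
    """Sample an undirected, loopless SBM given class labels and block matrix B."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    # P[i, j] = B[class of i, class of j]
    P = B[labels][:, labels]
    n = len(labels)
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)

def estimate_block_probabilities(A, labels, k):
    """Estimate B as the edge density within each pair of classes."""
    labels = np.asarray(labels)
    B_hat = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            block = A[np.ix_(labels == a, labels == b)]
            if a == b:
                m = block.shape[0]
                # Within a class, exclude the diagonal (no self-loops).
                B_hat[a, b] = block.sum() / (m * (m - 1))
            else:
                B_hat[a, b] = block.mean()
    return B_hat

B = np.array([[0.5, 0.1],
              [0.1, 0.3]])           # hypothetical block probabilities
labels = np.repeat([0, 1], 100)      # 100 nodes per class
A = sample_sbm(labels, B, seed=0)
B_hat = estimate_block_probabilities(A, labels, k=2)
```

In practice the class assignments are usually unknown and must be estimated as well; this sketch treats them as given.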
---

### Degree-corrected Stochastic Block Model (DCSBM)

- edges are .r[conditionally] independent
- each node has a class assignment
- $\mathbb{P}[A_{i,j}]$ = $\theta_i\theta_jB$(class i, class j)

Notes

- simplest >2 parameter model

---

### Drosophila Connectome DCSBM
---

### Random Dot Product Graphs (RDPG)

- akin to latent state models in population coding
- edges are conditionally independent
- each node has a .r[latent position in $d$ dimensions]
- $\mathbb{P}[A_{i,j}]$ = f(latent position i, latent position j)
- for example, $\mathbb{P}[A_{i,j}]$ is the dot product of latent positions

Notes

- generalizes previous models

---

### Latent positions allow for more general relationships
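
---

### RDPG: A Sampling Sketch

A minimal sketch of the dot-product case, assuming an undirected graph with no self-loops. The two latent positions below are hypothetical; they are chosen so that nodes sharing a position behave like one class of a two-block SBM, which illustrates one way the RDPG generalizes the block model.

```python
import numpy as np

def rdpg_probabilities(X):
    """Edge probabilities as dot products of latent positions (rows of X)."""
    P = X @ X.T
    if P.min() < 0 or P.max() > 1:
        raise ValueError("latent positions must give dot products in [0, 1]")
    return P

def sample_rdpg(X, seed=None):
    """Sample an undirected, loopless graph with P[A_ij] = <x_i, x_j>."""
    rng = np.random.default_rng(seed)
    P = rdpg_probabilities(X)
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)

# Hypothetical latent positions in d = 2 dimensions, one per class.
positions = np.array([[0.7, 0.1],
                      [0.2, 0.5]])
labels = np.repeat([0, 1], 100)
X = positions[labels]            # each node inherits its class position
P = rdpg_probabilities(X)
B = positions @ positions.T      # implied SBM block matrix
A = sample_rdpg(X, seed=0)
```

Here `B` works out to approximately `[[0.5, 0.19], [0.19, 0.29]]`. Block matrices obtained this way are positive semidefinite by construction; giving every node its own position allows relationships that no block structure captures.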
---

### Drosophila Connectome RDPGs

---

### Model considerations/extensions

- Directed extensions exist
- Loopy extensions exist
- Weighted extensions exist (mostly)

---
class: middle

## .center[.k[Statistical Models of] .r[Populations of] .k[ Connectomes]]

---

### Joint Random Dot Product Graphs

- All nodes have a latent position in $d$-dimensional space
- Each graph has a latent position matrix

---

### Common Subspace Independent Edge Graph (COSIE)

- Common latent position matrix shared across all graphs
- Each individual graph is a transformation of the common matrix
---
class: middle

## .center[.k[Application of Population Models]]

---

### Mouse Connectomes From Same Genotype are Similar

---

### Structural Connectomes are Heritable

- MRDPG model

---

### COSIE Model Can Recover Bilateral Separation

- HNU1 Dataset

---

### COSIE Model Can Recover Clusters of Same Test-Retest Scans

- HNU1 Dataset

---

### COSIE Model Can Perfectly Classify Subjects
---
class: middle

## .center[.k[Discussion]]

---

## Summary and Next Steps

- Connectomes are the mechanistic link: .center[.r[genotype --> phenotype]]
- Extend ideas from coding theory to support these analyses
- Connectomes, genetic and phenotypic data are available

---

### References

- Connectal Coding [[1]](https://doi.org/10.1016/j.conb.2019.04.005)
- Description of GraSPy [[2]](https://arxiv.org/abs/1904.05329)
- Statistics on RDPG [[3]](https://dl.acm.org/citation.cfm?id=3242083)
- Two-sample hypothesis testing for RDPG [[4]](https://arxiv.org/abs/1403.7249)
- Two-sample hypothesis testing for two random graphs [[5]](https://projecteuclid.org/euclid.bj/1489737619)
- COSIE model and estimation [[6]](https://arxiv.org/abs/1906.10026)
- Omnibus Embedding for JRDPG estimation [[7]](https://ieeexplore.ieee.org/document/8215766)
- Mouse Connectome Heritability [[8]](https://www.biorxiv.org/content/10.1101/701755v1)
- Connectome smoothing [[9]](https://ieeexplore.ieee.org/document/8570772)

---

### Acknowledgements
Joshua Vogelstein
Jaewon Chung
Ben Pedigo
Eric Bridgeford
Hayden Helm
Jesus Arroyo
Ronak Mehta
Carey Priebe
Randal Burns
Michael Miller
Daniel Tward
Vikram Chandrashekhar
Drishti Mannan
Jesse Patsolic
Benjamin Falk
Alex Loftus
Brian Caffo
Minh Tang
Avanti Athreya
Vince Lyzinski
Daniel Sussman
Youngser Park
Cencheng Shen
Shangsi Wang
Ronan Perry
Vivek Gopalakrishnan
Tommy Athey
Heather Patsolic
Bijan Varjavand
♥, 🦁, 👪, 🌎, 🌌
---
class: middle

## .center[.k[Additional Information]]

---

## Adjacency Spectral Embedding

- Method for estimating the parameters (latent positions) of the RDPG model for a single graph
- $\hat{X} = UD^{1/2}$ where $U, D, V = SVD(A)$, truncated to the top $d$ singular values and vectors

---

## Omnibus Embedding

- Method for estimating parameters for Joint RDPG model
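
---

## ASE and Omnibus Embedding: A Sketch

A minimal NumPy sketch of the two estimators, not the GraSPy implementation. `ase` follows $\hat{X} = UD^{1/2}$ with the SVD truncated to $d$ dimensions. `omnibus_matrix` assumes the commonly described construction in which diagonal blocks hold each graph and off-diagonal blocks hold pairwise averages; embedding that matrix gives one set of latent positions per graph in a shared space. Helper names and the example latent positions are hypothetical.

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding: top-d part of U D^{1/2} from SVD(A)."""
    U, D, _ = np.linalg.svd(A)
    return U[:, :d] * np.sqrt(D[:d])

def omnibus_matrix(graphs):
    """Block matrix with A_i on the diagonal and (A_i + A_j) / 2 elsewhere."""
    m, n = len(graphs), graphs[0].shape[0]
    M = np.zeros((m * n, m * n))
    for i, Ai in enumerate(graphs):
        for j, Aj in enumerate(graphs):
            # For i == j this is (A_i + A_i) / 2 = A_i.
            M[i * n:(i + 1) * n, j * n:(j + 1) * n] = (Ai + Aj) / 2
    return M

def omnibus_embed(graphs, d):
    """Embed all graphs jointly; return one (n x d) estimate per graph."""
    n = graphs[0].shape[0]
    Z = ase(omnibus_matrix(graphs), d)
    return [Z[i * n:(i + 1) * n] for i in range(len(graphs))]

def sample_graph(P, rng):
    """Undirected, loopless graph with independent edges from P."""
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
positions = np.array([[0.7, 0.1], [0.2, 0.5]])   # hypothetical latent positions
X = positions[np.repeat([0, 1], 100)]
P = X @ X.T
graphs = [sample_graph(P, rng) for _ in range(2)]
X_hats = omnibus_embed(graphs, d=2)
```

Because all graphs are embedded through one matrix, the per-graph estimates can be compared directly without a separate alignment step.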
---

## Multiple Adjacency Spectral Embedding (MASE)

- Method for estimating parameters for COSIE model
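
---

## MASE: A Sketch

A minimal sketch of one way to estimate the COSIE parameters, assuming the model form $P^{(i)} = V R^{(i)} V^\top$ with a shared $n \times d$ matrix $V$ and a small $d \times d$ matrix $R^{(i)}$ per graph. It uses the unscaled variant of the per-graph embedding (singular vectors only), which is an assumption here, and it is not the GraSPy implementation. The example $V$ and $R^{(i)}$ are hypothetical.

```python
import numpy as np

def mase(graphs, d):
    """Estimate a common subspace V and one score matrix R_i per graph."""
    # Step 1: embed each graph separately (top-d left singular vectors).
    embeddings = [np.linalg.svd(A)[0][:, :d] for A in graphs]
    # Step 2: concatenate the embeddings and take their top-d left
    # singular vectors as the common subspace estimate.
    U = np.hstack(embeddings)
    V_hat = np.linalg.svd(U, full_matrices=False)[0][:, :d]
    # Step 3: project each graph onto the common subspace.
    R_hats = [V_hat.T @ A @ V_hat for A in graphs]
    return V_hat, R_hats

def sample_graph(P, rng):
    """Undirected, loopless graph with independent edges from P."""
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)

# Hypothetical COSIE parameters: two graphs sharing a two-block subspace.
n = 200
signs = np.repeat([1.0, -1.0], n // 2)
V = np.column_stack([np.ones(n), signs]) / np.sqrt(n)   # orthonormal columns
R1 = n * np.array([[0.3, 0.05], [0.05, 0.1]])    # blocks 0.5 / 0.3, cross 0.2
R2 = n * np.array([[0.3, -0.05], [-0.05, 0.1]])  # blocks 0.3 / 0.5, cross 0.2
P1, P2 = V @ R1 @ V.T, V @ R2 @ V.T

rng = np.random.default_rng(0)
graphs = [sample_graph(P1, rng), sample_graph(P2, rng)]
V_hat, R_hats = mase(graphs, d=2)
P_hats = [V_hat @ R @ V_hat.T for R in R_hats]
```

Each graph is summarized by a $d \times d$ matrix, so graphs can be compared or classified using only a handful of numbers.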
---

### Genotype, Phenotype, Connectotype

- .r[Phenotype]: a description of an individual's properties with regard to a phenomenon of interest
- .r[Genotype]: a set of genes and their variants that are associated with that phenotype
- .r[Connectotype]: a set of nodes, edges, and their properties that are associated with that phenotype

--

.center[Genotype --> Connectotype --> Phenotype]

**Connectotypes are the implementation-level mechanisms linking genotypes to phenotypes**

---