name:opening

# NeuroData (Science)

Joshua T. Vogelstein

.foot[[jovo@jhu.edu](mailto:jovo@jhu.edu) | [@neuro_data](https://twitter.com/neuro_data)]

---

### What is Neural Data Science?

A field that develops and applies statistical models, algorithms, and (database / machine learning) systems to

- manage
- visualize
- wrangle
- summarize
- generalize
- control

neural data.

---

### Why is it hard?

1. volume: terabytes
2. variety: multi-modal (images, atlases, annotations, networks)
3. veracity: noisy

Subject matter expertise is required to mitigate these challenges, including

- computer science
- statistical machine learning
- brain science

---

### NeuroData Approach

1. build computational tools to manage & visualize
2. build statistical tools to wrangle, summarize, and control
3. apply them to answer hard & important questions!

---

### Manage

Data management systems enable users to create, read, edit, and delete "records"

- store multiple modalities (e.g., images & annotations)
- scale to terabytes
- atlases
- annotations
- networks

---

#### Example: NeuroData Cloud
- 200+ teravoxels
- 100+ public & private datasets
- 30+ collaborators
- All 3D+ data & annotations (no ephys, etc.)
- Largest public open neuroscience data repository in the world!

.footnote[https://neurodata.io/ndcloud/]

---

### Visualize

Data visualization systems generate maps, charts, and tables to highlight/illustrate insightful perspectives on the data.

- pan
- zoom
- overlay multiple channels
- overlay annotations
- manually annotate

---

#### Example: NeuroGlancer
.footnote[https://github.com/neurodata/neuroglancer]

---

### Wrangle

Data wrangling consists of any operation one applies to the data that maintains its representation, such as outlier detection, missing value imputation, and deconvolution.

- bias field correction
- motion correction
- nonlinear multi-modal registration

---

#### Example: COBALT
- Large deformation diffeomorphic metric mapping (LDDMM)
- Fully automatic (no landmarks)
- Modalities: iDisco, CLARITY, MRI, histology, etc.
- Species: human, rat, mouse, zebrafish...

.footnote[https://neurodata.io/ndreg/]

---

### Summarize

Data summaries include point estimates, confidence intervals, clusters, principal components analysis, etc.

- network estimation from MRI
- spike detection from calcium imaging
- synapse detection
- cell body detection from CLARITY images

---

#### Example: COBALT
.footnote[https://neurodata.io/ndreg]

---

### Generalize

Infer properties of a "population" (typically very high-dimensional)

- hypothesis testing
- modeling
- simulation

---
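#### Example: permutation test (sketch)

A minimal sketch of the "hypothesis testing" bullet on the Generalize slide, assuming two small groups of scalar measurements. It is a generic permutation test for a difference in means, not MGC or any NeuroData tool; the function name `permutation_test` and the toy values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffling the pooled sample simulates the null distribution.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy measurements for two groups (illustrative values only).
group_a = [1.1, 0.9, 1.3, 1.2, 1.0, 1.4, 0.8, 1.2]
group_b = [2.0, 2.3, 1.9, 2.1, 2.4, 1.8, 2.2, 2.0]
p_value = permutation_test(group_a, group_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the two groups differ in mean; the same shuffle-and-recompute recipe applies to other test statistics.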
---
.footnote[https://neurodata.io/mgc]

---

### Predict

Predict something about a future sample given a data corpus

- classify: X is of type A
- regress: given X, Y is expected to be 6
- forecast: X is expected to be 6 tomorrow

---

#### Example: RerF
- generalization of random forests
- significantly improves over the best machine learning algorithms on >100 benchmark problems

.footnote[https://neurodata.io/rerf/]

---

class: middle

## .center[Applications in Connectomics]

---

### Human Connectome Heritability
.footnote[https://neurodata.io/graspy]

---

### Modeling Drosophila Mushroom Body
.footnote[https://neurodata.io/graspy]

---

class: inverse

### Batch Effects in Connectomics
- 20 datasets, >3000 scans

.footnote[https://neurodata.io/ndmg]

---

class: inverse

### Batch Effects in Connectomics
- 20 datasets, >3000 scans
- sex effect is smaller than site effect

.footnote[https://neurodata.io/ndmg]

---

### Papers

1. Vogelstein et al. *Nature Methods* (2018) [[manage & viz]](https://rdcu.be/banSS)
2. Kutten et al. *MICCAI* (2018) [[wrangle]](https://link.springer.com/chapter/10.1007%2F978-3-319-66182-7_32)
3. Vogelstein et al. *eLife* (2019) [[generalize]](https://elifesciences.org/articles/41690)
4. Tomita et al. *arXiv* (2015) [[predict]](https://arxiv.org/abs/1506.03410)
5. Athreya et al. *JMLR* (2018) [[graphs]](http://jmlr.org/papers/v18/17-448.html)
6. Kiar et al. *bioRxiv* (2018) [[MRI]](https://www.biorxiv.org/content/10.1101/188706v6)

.pull-left[
##### Tools

- [ndcloud](https://neurodata.io/ndcloud)
- [COBALT](https://neurodata.io/ndreg)
- [MGC](https://neurodata.io/mgc)
- [RerF](https://neurodata.io/rerf)
- [GrasPy](https://neurodata.io/graspy)
- [ndmg](https://neurodata.io/ndmg)
]

.pull-right[
##### Collaborations

- JHU Neuro, CS, AMS
- JHSPH Biostats
- Stanford (dlab)
- Child Mind Institute
- Allen Institute
- Janelia Research Campus
- Applied Physics Lab
- Google
]

---

### Acknowledgements

Carey Priebe
Randal Burns
Michael Miller
Daniel Tward
Eric Bridgeford
Vikram Chandrashekhar
Drishti Mannan
Jesse Patsolic
Benjamin Falk
Kwame Kutten
Eric Perlman
Alex Loftus
Brian Caffo
Minh Tang
Avanti Athreya
Vince Lyzinski
Daniel Sussman
Youngser Park
Cencheng Shen
Shangsi Wang
Tyler Tomita
James Brown
Disa Mhembere
Ben Pedigo
Jaewon Chung
Greg Kiar
Jeremias Sulam
♥, 🦁, 👪, 🌎, 🌌
---

class:center