class: left, name:opening

### A Community-Developed Open-Source Computational Ecosystem for Big Neuro Data

Joshua T. Vogelstein
.foot[[jovo@jhu.edu](mailto:jovo at jhu dot edu) |
]

---
class: center

### What is NeuroData?

### Goal: Build tools that work on .r[your] data
---

### Why is it hard?

- raw data are large collections of noisy 2D images
- too big to load into ImageJ, MATLAB, or even Python
- maintaining a local cluster is a pain
- sample sizes are small (e.g., 1)
- real data are nonlinear

---

### What is our solution?

- stop trying to build all the software ourselves
- glue modular things together
- get industry to build some of the hardest parts
  - data management (APL), visualization (Google)
- get data out of institutional resources
  - AWS
- focus on answering specific neurobiology questions with data where n>0

---

### What is our software stack?
.footnote[https://neurodata.io/help/]

---

### What is an example use case?

- 20 CLARITY brains, from two different conditions
- we want to know: are the two classes different?
- and if so, how?

.center[
]

---

### What is the workflow?

1. TeraStitcher
2. NDPush
3. bossDB
4. NeuroGlancer
5. NDPull
6. NDReg
7. MGC
8. LOL
9. RerF
10. graphstats

---

### TeraStitcher

- align & stitch collections of 2D images into 3D volumes
- uses linear registration
.footnote[https://abria.github.io/TeraStitcher/]

---

### NDPush & NDPull data to/from cloud

- images: uint8 and uint16
- annotations: uint32
- file formats: png, tiff, jpg, numpy, etc.
- z-slice: flexible naming conventions
- multi-channel
- pip installable*

.footnote[https://github.com/neurodata/ndpush and https://github.com/neurodata/ndpull]

---

### bossDB to store/manage data in cloud

- based on Open Connectome Project code from 2011
- ported to AWS for scalability
- stores small cuboids
- organized along a space-filling curve
- random access to arbitrary cutouts
- downsamples
- experimental metadata storage
- authentication
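
A minimal sketch of how cuboids on a space-filling curve support cutouts. It assumes a Morton (Z-order) curve and a 512 x 512 x 16 cuboid size; the slide names neither, and `morton_encode` and `cuboids_for_cutout` are hypothetical helpers, not the bossDB API.

```python
def morton_encode(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into a single Z-order index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def cuboids_for_cutout(start, stop, cuboid=(512, 512, 16)):
    """Return sorted Morton indices of the cuboids that overlap [start, stop).

    start, stop: (x, y, z) voxel coordinates, stop exclusive.
    cuboid: assumed cuboid shape in voxels (an illustration, not a spec).
    """
    ranges = [
        range(lo // size, (hi - 1) // size + 1)
        for lo, hi, size in zip(start, stop, cuboid)
    ]
    return sorted(
        morton_encode(i, j, k)
        for i in ranges[0]
        for j in ranges[1]
        for k in ranges[2]
    )


# A cutout spanning two cuboids along x touches two adjacent curve indices.
indices = cuboids_for_cutout((0, 0, 0), (1024, 512, 16))
```

Cuboids that are close in space tend to get nearby indices, so an arbitrary cutout maps to a small number of key lookups.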
.footnote[https://github.com/jhuapl-boss/boss]

---

### NeuroGlancer to Visualize

.pull-left[
- Google's NeuroGlancer
- 3D pan, zoom, & rotate
- Multi-channel overlays
- Select individual ROIs
- 30 minutes to load once
]

.pull-right[
]

.footnote[https://github.com/google/neuroglancer]

---

### NDReg to Register CLARITY to ARA

- only ~1 hr per brain
- fully automatic (no landmarks)
- works on iDisco and other species too
.footnote[https://github.com/neurodata/ndreg]

---
class: middle

# [demo](https://tinyurl.com/yco8h787)

---
class: middle

### pre-processing over, on to statistics
---

### Multiscale Graph Correlation

- are CLARITY control brains different from conditioned brains?
- if so, how are they different?
- MGC implements universally consistent hypothesis testing
- and can reveal the geometry of the relationship
- key: compare "correlations" between all .r[local] pairs of points
- must define a metric on CLARITY brains (e.g., distortion)

.footnote[https://github.com/neurodata/mgc]

---

### MGC Empirically Dominates
---

### MGC Reveals Relationship Geometry
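
A simplified sketch of the "local pairs" idea: a distance correlation restricted to each point's k nearest neighbors in X and l nearest neighbors in Y. This is an illustration under assumptions, not the reference MGC implementation; it handles 1-D samples only and omits the scale selection and permutation test. All function names are hypothetical.

```python
import math


def pairwise_dist(v):
    """Absolute-difference distance matrix for 1-D samples."""
    return [[abs(a - b) for b in v] for a in v]


def double_center(d):
    """Subtract row and column means, add back the grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[d[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]


def neighbor_ranks(d):
    """ranks[i][j] = position of j when sorting points by distance from i."""
    n = len(d)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted(range(n), key=lambda j: d[i][j])
        for r, j in enumerate(order):
            ranks[i][j] = r
    return ranks


def local_corr(x, y, k, l):
    """Correlate centered distances, keeping only pairs within scale (k, l)."""
    dx, dy = pairwise_dist(x), pairwise_dist(y)
    a, b = double_center(dx), double_center(dy)
    rx, ry = neighbor_ranks(dx), neighbor_ranks(dy)
    n = len(x)
    cov = var_a = var_b = 0.0
    for i in range(n):
        for j in range(n):
            ak = a[i][j] if rx[i][j] <= k else 0.0
            bl = b[i][j] if ry[i][j] <= l else 0.0
            cov += ak * bl
            var_a += ak * ak
            var_b += bl * bl
    return cov / math.sqrt(var_a * var_b)


x = [0.0, 1.0, 2.5, 4.0, 7.0, 11.0]
y = [2.0 * v for v in x]  # a purely linear relationship
global_stat = local_corr(x, y, len(x) - 1, len(x) - 1)
local_stat = local_corr(x, y, 2, 2)
```

Evaluating `local_corr` over a grid of (k, l) gives a map of local correlations; which scales score highest is the kind of information this slide refers to as geometry.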
---

### LOL for Dimensionality Reduction

Given pairs of high-dimensional X and corresponding class labels Y, find the best low-dimensional representation of X

--

#### Main Idea

.r[LOL]: Use the means and PCA of .r[each class separately]

.footnote[https://github.com/neurodata/lol]

---

### LOL Empirically Dominates
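
For reference, a minimal two-class sketch of the LOL main idea (class means plus PCA after centering each class separately). It is a hedged illustration, not the package's implementation: it keeps one principal direction, finds it by power iteration, and skips orthogonalization. `lol_directions` and `project` are hypothetical names.

```python
import math


def class_mean(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def lol_directions(X, y, iters=100):
    """Return [mean-difference direction, top within-class PC] for labels 0/1."""
    means = {c: class_mean([x for x, lab in zip(X, y) if lab == c]) for c in (0, 1)}
    delta = unit([a - b for a, b in zip(means[0], means[1])])
    # Center every sample by the mean of its own class.
    centered = [[a - m for a, m in zip(x, means[lab])] for x, lab in zip(X, y)]
    # Power iteration for the leading eigenvector of centered^T centered.
    v = unit([1.0] * len(X[0]))
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        v = unit([sum(s * row[j] for s, row in zip(scores, centered))
                  for j in range(len(v))])
    return [delta, v]


def project(x, directions):
    """Coordinates of x along each direction."""
    return [sum(a * b for a, b in zip(x, d)) for d in directions]


# Toy data: classes differ along axis 0, most within-class spread is on axis 1.
X = [[0, -2, 0], [0, 2, 0], [0, -1, 0.1], [0, 1, -0.1],
     [3, -2, 0], [3, 2, 0], [3, -1, 0.1], [3, 1, -0.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
directions = lol_directions(X, y)
low_dim = [project(x, directions) for x in X]
```

The first coordinate separates the classes; the second keeps the dominant within-class variation that plain PCA on the pooled data could confuse with the class difference.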
---

### Random Projection Forests

Given pairs of high-dimensional X and corresponding class labels Y, find the best discriminant boundary

--

#### Main Idea

.r[RerF]: Find many good .r[sparse] random projections of data

.footnote[https://github.com/neurodata/RerF]

---

### Random Projection Forests Empirically Dominate

.pull-left[
- random forests (RF) were best
- recent extensions improve
- we are even better
]

.pull-right[
]

---

### Connectome Coding via Graph Stats

Characterizing the relationship between the *past environment* and the *present neural connectivity*

.footnote[https://github.com/neurodata/graphstats]

---
class: middle

# Discussion

---
class: top, left

### Acknowledgements
Carey Priebe
Randal Burns
Michael Miller
Brian Caffo
Michael Milham
Daniel Tward
Minh Tang
Avanti Athreya
Vince Lyzinski
Daniel Sussman
Yichen Qin
Youngser Park
Cencheng Shen
Shangsi Wang
Greg Kiar
Eric Bridgeford
Vikram Chandrashekhar
Tyler Tomita
James Brown
Disa Mhembere
Drishti Mannan
Jesse Patsolic
Benjamin Falk
Kwame Kutten
Eric Perlman
---

### Questions? Now hiring!

.pull-left[

| task | link |
| --- | --- |
| testing | [MGC](https://github.com/neurodata/mgc) |
| dim red | [LOL](https://github.com/neurodata/LOL) |
| classify | [RerF](https://github.com/neurodata/R-RerF/) |
| graph stats | [graphstats](https://github.com/neurodata/graphstats/) |
| registration | [ndreg](https://github.com/neurodata/ndreg) |
| email | [jovo@jhu.edu](mailto:jovo@jhu.edu) |
| web | [neurodata.io](http://neurodata.io/) |
| startup | [gigantum.com](http://gigantum.com/) |

♥, 🦁, 👪, 🌎, 🌌
]