name:opening

### NeuroData:

A community-developed open-source computational ecosystem for big neuro data

Joshua T. Vogelstein

.foot[[jovo@jhu.edu](mailto:jovo@jhu.edu) | [@neuro_data](https://twitter.com/neuro_data)]

---

### Origin: Hosting public data (2011-)
.footnote[
]

---

### ... but, it's hard.

Don’t work alone!

- We set out to revolutionize data management
- And we have had mixed results...
- ...but excited to have seeded a second generation

---

### New Role: A community-developed...

Shift from "build your own" to "see what's available; supplement where necessary."

- Collaborative: across organizations (JHU, Allen, Janelia, Google)
- Cloud computing ecosystem
- End-to-end: data management from microscope to publication
- Spanning paradigms and modalities (EM, AT, MRI, Light Sheet)
- Visualize data at all stages
- Reproducible, extensible analysis ([Jupyter](http://jupyter.org/), [Gigantum](https://gigantum.com/))

---

### What is our software stack?
.footnote[https://neurodata.io/help/overview/]

---

### Spatial Database: BossDB

- Ported to AWS for scalability
- Random access to arbitrary cutouts
- Downsamples
- Experimental metadata storage (e.g. JSON)
- Authentication
- Spatial queries on annotations
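
To make "arbitrary cutouts" and "downsamples" concrete, here is a minimal sketch using an in-memory NumPy array as a stand-in for the database. It is not the Boss API: the function names `cutout` and `downsample_xy`, the half-open index ranges, and mean-pooling in x and y only are all assumptions made for illustration.

```python
import numpy as np


def cutout(volume, z_range, y_range, x_range):
    """Return the sub-volume for half-open (start, stop) index ranges."""
    (z0, z1), (y0, y1), (x0, x1) = z_range, y_range, x_range
    return volume[z0:z1, y0:y1, x0:x1]


def downsample_xy(volume, factor=2):
    """Mean-pool by `factor` in y and x, leaving z untouched (assumed scheme)."""
    z, y, x = volume.shape
    # Trim edges so both axes divide evenly by the pooling factor.
    y_trim, x_trim = y - y % factor, x - x % factor
    blocks = volume[:, :y_trim, :x_trim].reshape(
        z, y_trim // factor, factor, x_trim // factor, factor
    )
    return blocks.mean(axis=(2, 4)).astype(volume.dtype)


# Toy (z, y, x) image volume; a real dataset would live in cloud storage.
volume = np.arange(4 * 8 * 8, dtype=np.uint8).reshape(4, 8, 8)
sub = cutout(volume, (1, 3), (2, 6), (0, 4))
low = downsample_xy(volume)
print(sub.shape, low.shape)
```

Precomputing a pyramid of such downsampled copies is one common way to keep zoomed-out views cheap; whether the hosted data uses exactly this scheme is not stated here.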
.footnote[https://github.com/neurodata/boss]

---

### NDeX: data to/from cloud

- images: uint8 and uint16
- annotations: uint64
- file formats: png, tiff, jpg, numpy, etc.
- z-slice: flexible naming conventions
- multi-channel
- pip installable

.footnote[https://github.com/neurodata/ndex]

---

### NeuroGlancer for Visualization

.pull-left[
NeuroData Modifications include:

- Boss support
- Multi-color support
- json backend
- Ontology info
]

.pull-right[
]

.footnote[https://github.com/neurodata/neuroglancer]

---

### NDReg for Light sheet registration
- LDDMM
- Fully automatic (no landmarks)
- Modalities: iDISCO, CLARITY, MRI, histology, etc.
- Species: human, rat, mouse, zebrafish...

.footnote[https://github.com/neurodata/ndreg]
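
The core idea behind LDDMM, as used by NDReg, is that the deformation is obtained by integrating a smooth velocity field over time, which keeps the mapping invertible. The sketch below illustrates only that idea in 1D with forward-Euler integration. It is not NDReg's API: `integrate_flow` and the sinusoidal `velocity` are hypothetical names and choices made for illustration.

```python
import numpy as np


def integrate_flow(points, velocity, steps=100):
    """Forward-Euler integration of d(phi)/dt = v(phi, t) from t=0 to t=1."""
    phi = np.asarray(points, dtype=float).copy()
    dt = 1.0 / steps
    for k in range(steps):
        phi += dt * velocity(phi, k * dt)
    return phi


def velocity(x, t):
    """Hypothetical smooth velocity field that vanishes at both ends of [0, 1]."""
    return 0.5 * np.sin(np.pi * x)


grid = np.linspace(0.0, 1.0, 11)
warped = integrate_flow(grid, velocity)

# A smooth flow should preserve the ordering of points (no folding).
print(np.all(np.diff(warped) > 0))
```

A full registration would additionally optimize the velocity field so that the warped image matches the target; that optimization is omitted here.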
---

### Current Data Hosting Strategy

- 200+ teravoxels
- 100+ public & private datasets
- 30+ collaborators
- All 3D+ data (no ephys, etc.)
- Costs us cloud pricing
- Investigating distributed model
- Open/excited to collaborate

.footnote[https://neurodata.io/data]

---

### Statistical Machine Learning

- High-Dimensional (non-Euclidean) and Low-Sample Size Data
- [LOL](https://github.com/neurodata/lol), [LumberJack](https://github.com/neurodata/lumberjack), [MGC](https://github.com/neurodata/mgc), [GrasPy](https://github.com/neurodata/graspy), [knor](https://github.com/neurodata/knorR)...

.center[
]

.footnote[https://neurodata.io/tools/]

---

### Work In Progress

- [FlyWheel](https://flywheel.io/) for data management
- [CloudVolume](https://github.com/seung-lab/cloud-volume) (from Seung Lab)
- Convolutional Dictionary Learning (with [J Sulam](https://sites.google.com/view/jsulam))
- Neuroglancer support for time-series data
- Something with you?

--

### References

1. Vogelstein et al. *Nature Methods* (2018) [[1]](https://rdcu.be/banSS)
2. Kutten et al. *MICCAI* (2018) [[2]](https://link.springer.com/chapter/10.1007%2F978-3-319-66182-7_32)

---

### Acknowledgements
Carey Priebe
Randal Burns
Michael Miller
Daniel Tward
Eric Bridgeford
Vikram Chandrashekhar
Drishti Mannan
Jesse Patsolic
Benjamin Falk
Kwame Kutten
Eric Perlman
Alex Loftus
Brian Caffo
Minh Tang
Avanti Athreya
Vince Lyzinski
Daniel Sussman
Youngser Park
Cencheng Shen
Shangsi Wang
Tyler Tomita
James Brown
Disa Mhembere
Ben Pedigo
Jaewon Chung
Greg Kiar
Jeremias Sulam
♥, 🦁, 👪, 🌎, 🌌
---
class:center