class: center, middle
name:opening

## Learning a Data-Driven Nosology:
## Progress, Challenges & Opportunities

---

## Steps

1. Acquire data
2. Build and run pipelines
3. Develop and apply clustering methods

---

### Progress: Datasets Acquired

---

### Progress: Pipeline Built!

---
class: center, middle

## Clustering Methods: In Progress

--

1) Don't find spurious clusters

2) Do find legitimate clusters

---

### Challenge #1: Spurious Clusters
---
class:center, middle

### Opportunity #1: Spurious Clusters
Standard tools find spurious clusters
Fancy tools require too much data to be practical
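
The first point can be illustrated with a minimal sketch. It assumes plain k-means (Lloyd's algorithm) as a stand-in for the "standard tools", since the slides do not name a specific method, and it uses simulated 1-D data drawn from a single Gaussian. The helper names `kmeans_1d` and `sse` are hypothetical.

```python
import random


def kmeans_1d(xs, k=2, iters=50, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)  # initialise at k data points
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, labels


def sse(xs, centers, labels):
    """Within-cluster sum of squared distances to the assigned center."""
    return sum((x - centers[lab]) ** 2 for x, lab in zip(xs, labels))


# Simulated data: one homogeneous population, no subgroups by construction.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]

centers, labels = kmeans_1d(data, k=2)
mean = sum(data) / len(data)
sse_one = sum((x - mean) ** 2 for x in data)   # fit with a single group
sse_two = sse(data, centers, labels)           # fit with two "clusters"
sizes = [labels.count(0), labels.count(1)]

print("cluster sizes:", sizes)
print("SSE ratio (k=2 / k=1):", round(sse_two / sse_one, 2))
```

k-means always returns the requested number of groups, and the within-cluster error drops sharply even though the data contain no real subgroups. A lower error alone is therefore not evidence of legitimate clusters.
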
.w[**Gap**: Clustering tools with appropriate statistical guarantees]

---

### Challenge #2: Legitimate Clusters
---
class: center, middle

### Opportunity #2: Legitimate Clusters
Thus far, harmonizing processing has not removed the batch effect
Can further harmonizing processing mitigate batch effect?
If not, can harmonizing acquisition mitigate batch effect?
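
One hedged way to check whether a batch effect survives a given pipeline is a permutation test on a derived feature across datasets. The sketch below is an assumption-laden illustration, not the method used in this work: the data are simulated, the batch effect is modeled as a simple additive offset, and the names `permutation_pvalue`, `site_a`, `site_b`, and `site_c` are hypothetical.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffle dataset labels and recompute the mean difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


rng = random.Random(42)
# Simulated per-subject feature from three sites (hypothetical data).
site_a = [rng.gauss(0.0, 1.0) for _ in range(100)]
site_b = [rng.gauss(1.0, 1.0) for _ in range(100)]  # additive batch offset
site_c = [rng.gauss(0.0, 1.0) for _ in range(100)]  # same distribution as A

p_batch = permutation_pvalue(site_a, site_b)  # sites that differ by an offset
p_null = permutation_pvalue(site_a, site_c)   # sites with no simulated offset

print("A vs B p-value:", p_batch)
print("A vs C p-value:", p_null)
```

A small p-value after harmonization suggests the datasets are still distinguishable on that feature. Real batch effects may be multivariate or nonlinear, so a mean-difference test like this is only a first check.
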
.w[**Gap**: Pipelines that harmonize across datasets, and data to evaluate pipelines]

---

### Future Outlook

- Microarrays never mitigated batch effects sufficiently
- Neither microarrays nor MRI "count"
- fMRI has another degree of freedom: stimulus

--

### Much More To Do!
jovo@jhu.edu
---

#### Challenge #3: Overlapping Clusters
---
class: center, middle

### Opportunity #3: Overlapping Clusters
Any tree with hard thresholds will place some arbitrarily close individuals arbitrarily far apart
Trees that assign individuals near a boundary to multiple groups could mitigate this issue
.w[**Gap**: Tree learning tools with appropriate statistical guarantees]
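
The threshold problem can be seen with a single split. The sketch below is illustrative only: the threshold, the margin, and the function names `hard_leaf` and `soft_leaves` are assumptions, and the margin-based overlap is just one simple way to let boundary cases belong to several groups.

```python
def hard_leaf(x, threshold=0.5):
    """Standard tree split: every individual lands in exactly one leaf."""
    return "left" if x < threshold else "right"


def soft_leaves(x, threshold=0.5, margin=0.05):
    """Overlapping split: return every leaf whose region, widened by
    `margin` on the threshold side, contains x."""
    leaves = set()
    if x < threshold + margin:
        leaves.add("left")
    if x >= threshold - margin:
        leaves.add("right")
    return leaves


# Two nearly identical individuals (a, b) and one distant individual (c).
a, b, c = 0.499, 0.501, 0.99

for name, x in [("a", a), ("b", b), ("c", c)]:
    print(name, hard_leaf(x), sorted(soft_leaves(x)))
```

With the hard split, `a` and `b` are separated while `b` and `c` share a leaf, even though `a` and `b` are far more similar. With the overlapping split, `a` and `b` share both leaves and `c` stays in one, but choosing the margin with statistical guarantees remains the open problem named in the gap above.
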