About Me

Disa Mhembere

I am a PhD candidate in Computer Science at the Johns Hopkins University, conducting research in the Institute for Data Intensive Engineering and Science. My primary advisor is Randal Burns, PhD. I obtained an MSE in Computer Science (2015) and an MSE in Engineering Management (2013), both from Hopkins.

Awards

In 2014 I was awarded both the Hopkins Computer Science Graduate (Paul V. Renoff) Fellowship and the UPE Special Recognition award.
In 2017 I received the UPE Academic Achievement award and the Best Presentation award at the High-Performance Parallel and Distributed Computing (HPDC) conference.

Interests

I love working on highly scalable tools (generally in C++ and Python) for data science and machine learning applications.

I enjoy backend web-service building and strongly believe in the web-service paradigm for user interaction with tools.

I also enjoy studying the effect of advancements in computing on business and their co-evolution as computing becomes ubiquitous in all industries.

Selected Experience

Kyndi Inc

Developing a C++ and Python framework for the Cognitive Memory explainable AI graph engine and database.

2017 - Present
Software Engineer II: Artificial Intelligence Cognitive Memory group

IBM Research

Developed MPI-based parallel/distributed machine learning routines that incorporate matrix sketching, including novel clustering techniques.

May 2016
Research Intern: Data Analytics Group

Johns Hopkins University Research

2012 - Present
Research Assistant: Storage Systems and Parallel Computing

Johns Hopkins University Teaching

2012 - Present
Lecturer, Teaching Assistant

Massachusetts Institute of Technology (MIT)

Design and analysis of a Medium Access Control protocol for Ultra-Wide Band (UWB) embedded units, first in MATLAB and then in C.

Summer 2009
Developer Intern: Laboratory for Information and Decision Systems

Selected Skills

Parallel & Distributed Computing


C++


Python


Web Dev.


Linux/Shell


Management

My Projects

NeuroData GraphDB

Development and maintenance of back-end and front-end web services that leverage Django. The project sits at the intersection of big data and computational neuroscience: we deliver connectome building and exploratory graph analysis tools.

FlashGraph Algorithms

Development of vertex-centric, semi-external memory, parallel machine learning algorithms on graphs. FlashGraph, even in semi-external memory mode, is extremely fast and scales to billion-node graphs on a single commodity server!

knor: Parallel, Semi-External and Distributed Memory k-means

Development of a parallelized, NUMA-aware k-means. We use methods such as informed thread binding, NUMA-aware task scheduling and other techniques that reduce memory access latency. We achieve a 10-100x speedup when compared with commercial products such as Spark's MLlib, Turi (formerly Dato, GraphLab) and H2O.

Brain-Lab CI

BrainLab CI (BLCI) is a continuous integration environment for collaborative, community experiments with data-quality controls and full provenance. Users can run code within a pipeline that may have code or data dependencies. All side effects of the new code are propagated to both code and data artifacts within a repo.


Teaching

I teach a Seminar in Emerging Technologies for the MSEM program. We discuss, evaluate and develop concrete business plans for emerging and disruptive technologies. We also invite several CEOs and owners of businesses to speak.


I teach an Excel and Python bootcamp for the MSEM program. Students learn (i) essential (advanced) skills in MS Excel and (ii) fundamental programming skills in Python, so that they can reason about how software is developed.


I have been a teaching assistant for Parallel Programming several times. We tackle topics and projects using OpenMP, Java Threads, Hadoop/MapReduce, Spark, Message Passing Interface (MPI) and GPU programming via CUDA.

Tennis

I received a full scholarship to compete at the NCAA D1 level for Morgan State University as an undergraduate.


I am certified as a USTA P1 (highest level) high performance tennis coach. I have trained players who have gone on to compete in NCAA D1 tennis at UPenn, Ohio State University, and Loyola University.

Publications

  1. FlashR: parallelize and scale R for machine learning using SSDs. Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey Priebe, and Randal Burns. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2018 (Best Paper Nomination).
  2. knor: A NUMA-optimized In-memory, Distributed and Semi-external-memory k-means Library. Disa Mhembere, Da Zheng, Carey Priebe, Joshua T. Vogelstein, and Randal Burns. High-Performance Parallel and Distributed Computing (HPDC) Proceedings 2017 (Best Presentation Award).
  3. FlashMatrix: Parallel, Scalable Data Analysis with Generalized Matrix Operations using Commodity SSDs. Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey E. Priebe, and Randal Burns. Under Review: 2017.
  4. Semi-External Memory Sparse Matrix Multiplication on Billion-node Graphs in a Multicore Architecture. Da Zheng, Disa Mhembere, Vince Lyzinski, Joshua Vogelstein, Carey E. Priebe, and Randal Burns. IEEE Transactions on Parallel and Distributed Systems 2017.
  5. FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs. Da Zheng, Disa Mhembere, Alexander Szalay, Randal Burns. FAST 2015.
  6. Computing Scalable Multivariate Glocal Invariants of Large (Brain-) Graphs. Disa Mhembere, William Gray Roncal, Daniel Sussman, Carey E. Priebe, Rex Jung, Sephira Ryman, R. Jacob Vogelstein, Joshua T. Vogelstein, Randal Burns. Global Conference on Signal and Information Processing IEEE, 2013.
  7. MIGRAINE: MRI Graph Reliability Analysis and Inference for Connectomics. William Gray Roncal, Zachary H Koterba, Disa Mhembere, Joshua T. Vogelstein, Randal Burns, R. Jacob Vogelstein. Global Conference on Signal and Information Processing IEEE, 2013.

Conference Abstracts

  1. Spectral Clustering For Billion-Node Graphs. Da Zheng, Disa Mhembere, Youngser Park, Joshua T. Vogelstein, Carey E. Priebe, Randal Burns. SIAM Workshop on Network Science 2016.
  2. MR Graph with Rich attribUTEs DataBase (Mr. GruteDB). Gregory Kiar, William Gray Roncal, Disa Mhembere, Eric Bridgeford, Shangsi Wang, Carey E. Priebe, Randal Burns, Joshua T. Vogelstein. Proceedings of the Organization for Human Brain Mapping conference (OHBM) 2016.
  3. Multivariate Invariants from Massive Brain-Graphs. Disa Mhembere, Sephira Ryman, Daniel Sussman, Rex Jung, Joshua Vogelstein, R. Jacob Vogelstein, Carey Priebe, Randal Burns. Proceedings of the Organization for Human Brain Mapping conference (OHBM) 2013.
  4. Massive Diffusion MRI Graph Structure Preserves Spatial Information. Daniel L Sussman, Disa Mhembere, Sephira Ryman, Rex Jung, R Jacob Vogelstein, Randal Burns, Joshua Vogelstein, and Carey E Priebe. Proceedings of the Organization for Human Brain Mapping conference (OHBM) 2013.
  5. Robust Vertex Clustering of Massive Brain-Graphs via Lq-Likelihood. Yichen Qin, Disa Mhembere, Sephira Ryman, Rex Jung, R. Jacob Vogelstein, Randal Burns, Joshua T. Vogelstein, Carey Priebe. Proceedings of the Organization for Human Brain Mapping conference (OHBM) 2013.
  6. Feature Clustering from a Brain Graph for Voxel-to-Region Classification. N. Sismanis, D. L. Sussman, J. T. Vogelstein, W. Gray, R. J. Vogelstein, E. Perlman, Disa Mhembere, S. Ryman, R. Jung, R. Burns, C. E. Priebe, N. Pitsianis and X. Sun. Proceedings of the 2013 Greek Society of Biomedical Technology (ELEVIT) conference.
  7. Automatic cardiac & respiratory cycle detection of self-gated cardiac cine MRI navigator projections. Disa Mhembere, Liheng Guo, J. Andrew Derbyshire, Elliot R. McVeigh, Daniel A. Herzka. Proceedings of 2010 Meeting of the Biomedical Engineering Society conference (BMES).