Benjamin LeRoy

Email: benjaminpeterleroy [at] gmail [dot] com




About


I am currently a Data Scientist at Nike in Global Sourcing and Manufacturing. I received my Ph.D. from the Statistics and Data Science Department at Carnegie Mellon University in December 2021 under the supervision of Professor Chad Schafer. My thesis focused on simulator-enabled conformal prediction. As a graduate student, my research interests included quantifying uncertainty in applications of machine learning, leveraging optimization to solve novel statistical problems, and capturing complex trends with smart data visualization.

I’ve developed or helped develop multiple R and Python packages, and I try to encourage best coding practices wherever I can.

Research


Conformal Prediction for Simulation Models
We propose an approach to conformal prediction regions for settings where one has a simulator and observes exchangeable pairs (X, Y). We use split conformal and nested conformal inference, along with tools from set estimation, to provide prediction regions even for complex outcome spaces equipped with only a distance measure and some notion of “small ball” structure.

Conformal Prediction for Simulation Models,
Benjamin LeRoy and Chad Schafer. ICML Workshop on “Distribution-free Uncertainty Quantification” July 2021. Paper: local version (pre-publication)
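
As a rough illustration of the split conformal step mentioned above, the following minimal Python sketch builds a prediction interval from held-out calibration residuals. It is not the simulator-based method from the paper: the absolute-residual score, the toy predictor, the made-up data, and all function names are assumptions chosen purely for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the region must be unbounded, so return infinity.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(predict, calibration, x_new, alpha):
    """Interval for Y at x_new, calibrated on held-out (x, y) pairs.

    `predict` must have been fit on data disjoint from `calibration`.
    """
    # Conformity score (an assumption here): absolute residual.
    scores = [abs(y - predict(x)) for x, y in calibration]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q


# Hypothetical toy predictor, assumed to be fit on a separate training split.
def predict(x):
    return 2.0 * x


calibration = [(1.0, 2.5), (2.0, 3.0), (3.0, 6.5)]  # made-up held-out pairs
interval = split_conformal_interval(predict, calibration, x_new=4.0, alpha=0.5)
```

When the calibration pairs and the new pair are exchangeable, an interval built this way contains the new Y with probability at least 1 - alpha, regardless of how good the predictor is.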





Practical Local Conformal Inference
This is ongoing work on defining local partitions of the X space for use with local conformal inference, with the aim of getting as close as possible to conditional conformal inference. We utilize recent work on model diagnostics to partition the X space, which allows the approach to be applied even with poor conditional density estimate (CDE) fits, and it is also applicable to high-dimensional X spaces.

MD-split+: Practical Local Conformal Inference in High Dimensions,
Benjamin LeRoy* and David Zhao* (*equal contribution). ICML Workshop on “Distribution-free Uncertainty Quantification” July 2021. ArXiv: 2107.03280
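
To give a sense of what local conformal inference over a partition looks like, here is a minimal Python sketch that computes a separate conformal cutoff within each cell of a partition of the X space. It is not MD-split+: the hand-written two-cell partition `group_of` stands in for the diagnostic-driven partition described above, and the absolute-residual score, toy predictor, and made-up data are likewise assumptions for illustration.

```python
import math
from collections import defaultdict


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]


def local_conformal_cutoffs(predict, group_of, calibration, alpha):
    """Compute one conformal cutoff per partition cell of the X space."""
    scores_by_group = defaultdict(list)
    for x, y in calibration:
        scores_by_group[group_of(x)].append(abs(y - predict(x)))
    return {g: conformal_quantile(s, alpha) for g, s in scores_by_group.items()}


def local_interval(predict, group_of, cutoffs, x_new):
    """Interval for Y at x_new using the cutoff of the cell containing x_new."""
    # Cells with no calibration points get an unbounded interval.
    q = cutoffs.get(group_of(x_new), math.inf)
    center = predict(x_new)
    return center - q, center + q


# Hypothetical predictor and partition, chosen only for illustration.
def predict(x):
    return 0.0


def group_of(x):
    return "low" if x < 0 else "high"


# Made-up held-out pairs: small noise for x < 0, larger noise for x >= 0.
calibration = [(-3.0, 0.1), (-2.0, -0.2), (-1.0, 0.3),
               (1.0, 2.0), (2.0, -1.0), (3.0, 3.0)]
cutoffs = local_conformal_cutoffs(predict, group_of, calibration, alpha=0.5)
narrow = local_interval(predict, group_of, cutoffs, x_new=-1.5)
wide = local_interval(predict, group_of, cutoffs, x_new=2.5)
```

Because each cell is calibrated separately, the interval adapts to the noise level in that cell: the cell with noisier calibration data receives the wider interval.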





Tropical Cyclone Prediction Bands
Using track data for a little under 1000 storms from the National Oceanic and Atmospheric Administration (NOAA), we develop a fully data-driven statistical process for creating prediction bands around paths. In a parametric bootstrap framework, we first simulate potential curves from a noisy extension to a linear model and then leverage statistical depth and geometric structures to create different versions of prediction bands. This work is joint with Niccolò Dalmasso and Robin Dunn.

A Flexible Pipeline for Prediction of Tropical Cyclone Paths,
Niccolò Dalmasso*, Robin Dunn*, Benjamin LeRoy*, Chad Schafer (* equal contribution). ICML Workshop (Research Track) “Climate Change: How can AI Help?” June 2019. ArXiv: 1906.08832


View the work on GitHub, as well as the R package: TCpredictionbands.

Additional Research
A novel record linkage interface that incorporates group structure to rapidly collect richer labels, Kayla Frisoli, Benjamin LeRoy, Rebecca Nugent. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Paper.

Immune cellular homeostasis in early life is determined by genetic variants of cellular production and turnover, Tania Dubovik, Elina Starosvetsky, Benjamin LeRoy, Rachelly Normand, Yasmin Admon, Ayelet Alpert, Yishai Ofran, Max G'Sell, Shai S. Shen-Orr, bioRxiv: 256073.

Software

cowpatch (Python Package)
cowpatch brings plot aggregation, as seen in R packages such as cowplot, gridExtra, and patchwork, to Python, specifically for the ggplot implementation in plotnine. The package internally leverages SVG objects to provide a flexible but powerful framework to accomplish its goals.

This package is developed in collaboration with Mallory Wang, a Statistics Ph.D. student at the University of Michigan.

View the package website at benjaminleroy.github.io/cowpatch, as well as the Python package on GitHub.

EpiCompare (R Package)
The goal of EpiCompare is to provide the epidemiology community with easy-to-use tools that encourage comparing and assessing epidemics and epidemiology models in a "Time-Free" manner. The package gives the user the ability to compare epidemics and epidemiology model types (across both the "Agent"/"Aggregate" paradigm and the specific models). All tools attempt to adhere to tidyverse/ggplot2 style to enhance ease of use.

This package is developed in collaboration with Shannon Gallagher, Ph.D., at NIH's National Institute of Allergy and Infectious Diseases.

View the package website at skgallagher.github.io/EpiCompare, or the R package on GitHub.

Talks/Poster Presentations

Teaching


Instructor
  • Summer 2019: 36-350, Statistical Computing, Class Documents & Syllabus
    • an undergraduate course on core programming concepts using R: data structures, functions, iteration, debugging, and abstraction, through to writing code that assists in statistical analysis (visualization, modeling, version control, etc.).
    • my main contributions: (1) introduced coding style and best coding practices throughout the course, (2) presented tidyverse-style coding (ggplot2, dplyr, tidyr, ...), (3) introduced high-level computing concepts like object-oriented programming and how to make packages in R, (4) provided high-level overviews of the "split-apply-combine" paradigm, parallel computing, and deep learning in R.
  • Summer 2017: 36-315, Statistical Graphics and Visualization, Syllabus
    • an undergraduate class on best visualization practices (primarily in ggplot2) and visualization theory.
    • my main contributions: (1) developed the course to be taught for the first time over the summer, including revamping the course's lectures and assignments, and (2) introduced more visual theory to the course.
Advising and Mentoring
  • Fall 2018: Data Science Initiative (DSI) Fellow
  • Summer 2018: Summer Undergraduate Research Experience (SURE) Graduate Advisor
    • Advised a team of undergraduates analyzing trends in human trafficking, github
Teaching Assistance & Course Development
  • Fall 2018: 46-926/927 MSCF's Statistics and Machine Learning I/II, Syllabus 926 / Syllabus 927
    • Head TA; assisted in moving the class from R to Python.
  • Fall 2017: 36-315, Statistical Graphics and Visualization, Syllabus
    • Head TA; assisted in assignment and test development.
Teaching Assistant
  • Fall 2020: 36-617, Applied Linear Models, Syllabus
  • Spring 2020: 36-402, Advanced Methods for Data Analysis, Syllabus
  • Fall 2019: 46-668, Special Topics: Text Analysis, Syllabus
  • Spring 2019: 36-618, Experimental Design and Time Series, Syllabus
  • Fall 2018: 36-705, Intermediate Statistics, Syllabus
  • Spring 2016: 36-315, Statistical Graphics and Visualization, Syllabus
  • Fall 2016: 36-401, Modern Statistics

Code-Centric + TA Resources:

During my time at CMU, I served on the computing committee for the Statistics and Data Science Department. This committee started developing computing resources for statistics students, specifically for statistics Ph.D. students at CMU. This wiki represents some of that work (I built much of its foundation). Before this work, I developed other resources (some similar to those above), as well as automation tools for TAs, which can be found here.

Memberships


  • ASA: American Statistical Association
  • IEEE: Institute of Electrical and Electronics Engineers
    • IEEE-CIS: IEEE's Computational Intelligence Society

CV


CV: Download (updated January 2022)