This page collects earlier projects from my bachelor’s and master’s studies. My current research agenda is described on the Research page.
Master’s-level projects
Bugni & Horowitz (2021): Permutation Tests for the Equality of Distributions of Functional Data
Summary
This thesis studies the Bugni-Horowitz (2021) permutation framework for testing whether two samples of functional observations are generated by the same distribution. It builds the Hilbert-space, functional-data, Cramér-von Mises, and permutation-test background needed for the procedure, then implements the original method, simulation studies, and an Adelaide electricity-demand application in R. The application compares cleaned, detrended, and deseasoned workday and Saturday demand curves; both the mean-focused and Cramér-von Mises-style components reject equality, making the application more a proof of concept than a borderline empirical case. The thesis’s own extension develops a persistence-focused alternative aimed at differences in dependence structure between curves, pointing toward harder settings where distributional differences are not visible from mean shifts alone.
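The permutation logic behind this kind of test can be sketched in a few lines. The sketch below is not the Bugni-Horowitz statistic and not the thesis code (which is in R); it assumes curves observed on a common, evenly spaced grid and uses a simplified mean-focused statistic, the average squared difference between the two sample mean curves. The function names and the default of 499 permutations are illustrative choices.

```python
import random


def mean_curve(curves):
    """Pointwise average of a list of curves sampled on the same grid."""
    n = len(curves)
    return [sum(c[j] for c in curves) / n for j in range(len(curves[0]))]


def mean_diff_stat(x, y):
    """Average squared gap between the two sample mean curves (simplified statistic)."""
    mx, my = mean_curve(x), mean_curve(y)
    return sum((a - b) ** 2 for a, b in zip(mx, my)) / len(mx)


def permutation_pvalue(x, y, n_perm=499, seed=0):
    """Permutation p-value: reshuffle group labels and recompute the statistic."""
    rng = random.Random(seed)
    observed = mean_diff_stat(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of curves to the two groups
        if mean_diff_stat(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # The +1 terms count the observed split as one of the permutations.
    return (1 + exceed) / (1 + n_perm)
```

A statistic of this form only reacts to mean shifts, which is why a second, Cramér-von Mises-style component is needed to pick up other distributional differences.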
Basis Choice for Scalar-on-Function Regression with Applications to Near-Infrared Spectroscopy
Summary
This group project studies how basis and truncation choices affect scalar-on-function regression, where a scalar outcome is predicted from a functional covariate. It compares direct basis-expansion regression with functional principal component regression (FPCR) across Fourier, B-spline, and monomial bases, using cross-validated mean squared prediction error to choose smoothing and truncation levels. The simulations show the practical bias-variance and numerical-stability tradeoffs: Fourier performs well in the controlled designs, monomial bases run into collinearity and boundary problems, and FPCR is less sensitive to the raw number of basis functions once the number of components is fixed. The gasoline application predicts octane values from 60 near-infrared spectroscopy curves; the best reported specification is FPCR with four functional principal components built from seven Fourier basis functions, closely followed by direct B-spline basis regression.
Outlier Detection in Sensor Data using Functional Depth Measures
Cooperation: Daimler AG
Summary
This project builds an unsupervised workflow for detecting abnormal production-process sensor curves, motivated by Daimler bolt-tightening torque data. It treats each recording as a functional observation and adapts depth-based outlier detection from Febrero, Galeano, and González-Manteiga (2008), rather than reducing the curve to a small set of tabular features. The repository includes a package-style R implementation, OutDetectR, for preparing irregular curves, aligning observations, approximating grids, and running the detection and update procedures. A notebook walks through the method and synthetic examples, while a local Shiny app visualizes candidate outliers and the underlying curves for interactive inspection.
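To illustrate what "depth" means for curves, here is a minimal sketch of an integrated, rank-based depth in the spirit of the Fraiman-Muniz depth, one of several depth notions used in this literature. It is not taken from OutDetectR; it assumes curves already aligned on a common grid, and it omits the cutoff-selection and iterative update steps of the full procedure. Function names are hypothetical.

```python
def fraiman_muniz_depth(curves):
    """Integrated univariate depth: at each grid point, depth is 1 - |1/2 - F_n(value)|."""
    n, m = len(curves), len(curves[0])
    depths = [0.0] * n
    for j in range(m):
        column = [c[j] for c in curves]
        for i in range(n):
            # Empirical CDF of curve i's value among all curves at grid point j.
            ecdf = sum(v <= column[i] for v in column) / n
            depths[i] += (1.0 - abs(0.5 - ecdf)) / m
    return depths


def least_deep(curves, k=1):
    """Indices of the k curves with the lowest depth (candidate outliers only)."""
    depths = fraiman_muniz_depth(curves)
    return sorted(range(len(curves)), key=lambda i: depths[i])[:k]
```

Because the depth is rank-based, some curve is always the least deep, so a ranking alone does not decide whether a curve is an outlier; a data-driven cutoff is still needed for that decision.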
Comparison of Variable Importance Feature Selection Methods in Continuous-Response Random Forest
Summary
This project examines how random-forest variable-importance measures behave when predictors differ in scale, category structure, noise, or relationship to the response. It contrasts CART-style and conditional-inference forests and compares permutation importances, null-importance procedures, and p-value-based corrections inspired by the variable-selection literature. The simulations use controlled data-generating processes to show when standard importance rankings can prefer noisy or structurally advantaged variables rather than truly informative features. The empirical application runs the same feature-selection workflow on diabetes data, turning the methodological comparison into a concrete example of how importance measures can change the substantive variables selected.
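Permutation importance itself is model-agnostic and can be sketched without a forest. The version below is a simplified illustration, not the project's code: it assumes an already fitted `predict` function, measures importance as the average increase in mean squared error after shuffling one predictor, and evaluates on the same data rather than on out-of-bag samples as a random forest would. All names are illustrative.

```python
import random


def mse(predict, rows, y):
    """Mean squared prediction error of `predict` on the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def permutation_importance(predict, rows, y, n_repeats=20, seed=0):
    """Average MSE increase when each predictor column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, y)
    importances = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between predictor j and the response
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(predict, shuffled, y) - baseline
        importances.append(increase / n_repeats)
    return importances
```

A predictor that the model never uses gets an importance of exactly zero here; the biases studied in the project arise because a fitted forest can split on uninformative predictors.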
Bachelor’s-level projects
Malicious Intent and Multiple Testing in Regression Discontinuity Estimation
Summary
This thesis studies how undisclosed specification search can distort inference in sharp regression-discontinuity designs. It builds Monte Carlo data-generating processes with no true discontinuity and then constructs malicious-intent estimators that search across polynomial orders and selectively report specifications that maximize significance, effect size, or a preferred sign. The simulations quantify how this behavior inflates false-positive rates and reported effect sizes and produces unreasonably small p-values, even before adding more realistic complications. Extensions with heteroskedastic errors and outliers show that the same incentives remain problematic outside the simplest homoskedastic setting, linking the exercise to broader concerns about publication bias and p-value-driven reporting.
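The mechanism can be reproduced in a small simulation. The sketch below is not the thesis code: it assumes a uniform running variable on [-1, 1] with cutoff at zero, standard normal noise, global polynomial fits of order 1 to 4, and a normal-approximation critical value of 1.96. It compares an analyst who always reports the linear specification with one who reports whichever order gives the largest absolute t-statistic. Function names and defaults are illustrative.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def jump_t_stat(x, y, order):
    """t-statistic of the cutoff dummy in y ~ 1 + D + x + ... + x^order, D = 1{x >= 0}."""
    rows = [[1.0, 1.0 if xi >= 0 else 0.0] + [xi ** p for p in range(1, order + 1)]
            for xi in x]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    sigma2 = rss / (len(y) - k)
    # Homoskedastic variance of the jump coefficient: sigma^2 * [(X'X)^{-1}]_{11}.
    e1 = [1.0 if i == 1 else 0.0 for i in range(k)]
    var_jump = sigma2 * solve(xtx, e1)[1]
    return beta[1] / math.sqrt(var_jump)


def rejection_rates(n_sim=200, n=100, orders=(1, 2, 3, 4), crit=1.96, seed=1):
    """Share of simulations rejecting 'no jump' for an honest and a searching analyst."""
    rng = random.Random(seed)
    honest = hacked = 0
    for _ in range(n_sim):
        x = [rng.uniform(-1, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # no true discontinuity
        t = [abs(jump_t_stat(x, y, p)) for p in orders]
        honest += t[0] > crit    # always report the first (linear) specification
        hacked += max(t) > crit  # report the most significant specification
    return honest / n_sim, hacked / n_sim
```

By construction the searching analyst rejects at least as often as the honest one in every simulated sample, so the second rate can never fall below the first.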
The Effects of Component Permutation on Estimated Impulse Responses in a Three-Dimensional SVAR(1) Model
Summary
This seminar paper studies how variable ordering affects orthogonalized impulse-response analysis in a three-dimensional SVAR(1) model. It first shows analytically why Cholesky identification only has structural content when a recursive ordering is theoretically justified, then uses Monte Carlo simulations to compare impulse responses across component permutations. In the recursively identified case, incorrect orderings can move responses outside the bootstrap confidence bands of the correctly ordered model; in the non-recursive case, Cholesky-based responses can even reverse signs relative to the true structural responses. The paper closes by pointing to theory-driven identifying restrictions, generalized impulse responses, and local projections as alternatives when the data alone cannot justify a recursive ordering.
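The dependence on ordering is already visible in the impact (horizon-zero) responses, which are given by the Cholesky factor of the residual covariance matrix. The sketch below is an illustration rather than the paper's code: the helper names are hypothetical, and `order` lists variable indices from first to last in the assumed recursive ordering. It computes the impact matrix under a chosen ordering and reports it in the original variable labels.

```python
import math


def cholesky(s):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(s)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(acc) if i == j else acc / l[j][j]
    return l


def impact_matrix(sigma, order):
    """Impact responses under a recursive ordering, indexed by original variable labels."""
    n = len(order)
    # Reorder the covariance matrix, factor it, then map the factor back.
    permuted = [[sigma[order[i]][order[j]] for j in range(n)] for i in range(n)]
    l = cholesky(permuted)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[order[i]][order[j]] = l[i][j]
    return out


sigma = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.3],
         [0.2, 0.3, 1.0]]
first = impact_matrix(sigma, (0, 1, 2))   # variable 0 ordered first
second = impact_matrix(sigma, (1, 0, 2))  # variable 1 ordered first
```

Both matrices reproduce the same covariance matrix, yet in `first` variable 0 does not react on impact to the shock of variable 1, while in `second` it does. The data cannot distinguish between the two, which is why the ordering needs a theoretical justification.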