Projects created during my Master’s Program


Bugni and Horowitz (2021) Permutation Tests for the Equality of Distributions of Functional Data

– Master’s Thesis
– Supervisor: Prof. Dr. Dominik Liebl

My Master’s Thesis presents a test for the equality of distributions of functional data, developed by Federico Bugni and Joel Horowitz in a 2021 paper, and applies it to electricity data from Australia. The thesis provides the theoretical background needed to understand the construction of the test statistic: it introduces the necessary concepts from functional data analysis, explores Cramér-von Mises tests in a scalar setting, and lays out the theoretical foundations of permutation testing. The main original contribution of my thesis is a variation of the test in Bugni and Horowitz (2021) designed to detect specific violations of the null hypothesis relating to the persistence of the data-generating processes.
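The scalar setting mentioned above can be illustrated with a short sketch. It is not the functional test statistic from Bugni and Horowitz (2021), only a two-sample Cramér-von Mises-type permutation test for scalar data. The function names, the number of permutations, and the seed are illustrative assumptions.

```python
import numpy as np


def cvm_statistic(x, y):
    """Cramér-von Mises-type distance between two scalar samples."""
    pooled = np.concatenate([x, y])
    # Empirical CDFs of both samples, evaluated at every pooled observation.
    f_x = np.searchsorted(np.sort(x), pooled, side="right") / len(x)
    f_y = np.searchsorted(np.sort(y), pooled, side="right") / len(y)
    return np.mean((f_x - f_y) ** 2)


def permutation_test(x, y, n_perm=999, seed=0):
    """Return the observed statistic and a permutation p-value.

    Under the null of equal distributions the group labels are exchangeable,
    so the statistic is recomputed on randomly relabeled pooled data.
    n_perm and seed are illustrative defaults, not values from the thesis.
    """
    rng = np.random.default_rng(seed)
    observed = cvm_statistic(x, y)
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        stat = cvm_statistic(shuffled[: len(x)], shuffled[len(x):])
        exceed += int(stat >= observed)
    # The "+1" counts the observed statistic as one of the permutations.
    return observed, (1 + exceed) / (1 + n_perm)
```

Calling `permutation_test(x, y)` on two one-dimensional NumPy arrays returns the statistic and its p-value. A small p-value suggests that the two samples do not share a common distribution.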


Basis Choice for Scalar-on-Function Regression with Applications to Near-Infrared Spectroscopy

– Final project for the Research Module in Econometrics & Statistics
– Group Project with Jonathan Willnow and Jonghun Baek
– Supervisor: Prof. Dr. Dominik Liebl

This paper explores the use of functional bases to represent and analyze functional data sets, which are collections of functions. We discuss the theory behind representing a function in terms of a basis, show how this representation can be used for approximation and smoothing, and consider the problem of choosing an appropriate number of basis functions. We also introduce the Karhunen-Loève expansion, which decomposes a function into its principal components, and show how it can be used to construct an empirical eigenbasis. We then apply these concepts to scalar-on-function regression, a statistical technique for predicting a scalar response variable from functional predictors, and analyze a dataset generated by near-infrared spectroscopy of gasoline samples.
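A minimal sketch of these steps, assuming curves observed without gaps on a common grid over [0, 1]. The Fourier basis, the SVD-based eigenbasis, and all function names are illustrative choices and are not taken from the project code.

```python
import numpy as np


def fourier_basis(t, n_basis):
    """Evaluate a constant plus sine/cosine pairs on the grid t in [0, 1]."""
    columns = [np.ones_like(t)]
    k = 1
    while len(columns) < n_basis:
        columns.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(columns) < n_basis:
            columns.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(columns)


def smooth_curves(curves, basis):
    """Least-squares projection of each discretized curve (row) onto the basis."""
    coefs, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)
    return (basis @ coefs).T


def empirical_eigenbasis(curves, n_components):
    """Leading principal component functions of the centered curves via SVD."""
    centered = curves - curves.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]


def scalar_on_function_fit(curves, response, n_components):
    """Regress the scalar response on the leading principal component scores."""
    components = empirical_eigenbasis(curves, n_components)
    scores = (curves - curves.mean(axis=0)) @ components.T
    design = np.column_stack([np.ones(len(response)), scores])
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    return design @ beta  # fitted values
```

In this sketch the number of basis functions and the number of components are the tuning parameters. Choosing them, for example by cross-validation, corresponds to the basis choice problem described above.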


Outlier Detection in Sensor Data using Functional Depth Measures

– Final project for the courses Microeconometrics and Scientific Computing
– Supervisor: Prof. Dr. Philipp Eisenhauer
– Cooperation with Daimler AG

This project was made possible by a cooperation between the University of Bonn and Daimler AG. It develops a procedure to identify abnormal observations in sensor data collected during production processes. In this specific example, each data point is a function mapping angle to torque in the process of tightening a bolt. More generally, the problem can be described as outlier classification in functional data and thus as an unsupervised learning problem. The algorithm implemented in the Jupyter Notebook Project_main.ipynb uses the concept of functional depth measures. It was taken from a 2008 paper by Febrero, Galeano, and González-Manteiga and adapted for the purposes of this project.
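The general idea of depth-based outlier detection can be sketched as follows. This is not the algorithm from Project_main.ipynb. The choice of the Fraiman-Muniz depth, the fixed quantile cutoff, and the function names are all assumptions made for illustration.

```python
import numpy as np


def fraiman_muniz_depth(curves):
    """Integrated univariate depth of each curve.

    curves has one row per observation and one column per grid point
    (for example, torque values on a common angle grid).
    """
    n = curves.shape[0]
    # Rank of each curve at every grid point gives the pointwise empirical CDF.
    ranks = curves.argsort(axis=0).argsort(axis=0) + 1
    ecdf = ranks / n
    # Pointwise depth is largest for values near the pointwise median.
    pointwise_depth = 1.0 - np.abs(0.5 - ecdf)
    return pointwise_depth.mean(axis=1)


def flag_outliers(curves, quantile=0.05):
    """Flag the curves whose depth is at or below a depth quantile.

    The fixed quantile is a simplification assumed here; it is not the
    cutoff rule used in the project.
    """
    depth = fraiman_muniz_depth(curves)
    cutoff = np.quantile(depth, quantile)
    return depth <= cutoff, depth
```

Curves that lie far from the bulk of the sample over most of the grid receive a low depth and are flagged. Because no labels are required, the procedure fits the unsupervised setting described above.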


Comparison of Variable Importance Feature Selection Methods in Continuous Response Random Forest

– Final project for the course Computational Statistics
– Supervisor: Marina Khismatullina, Ph.D.

This project deals with different variable importance measures employed in the context of continuous response random forests, biases associated with them, and possible remedies proposed in the literature. The main part of the project is a simulation that compares multiple variable importance measures in light of the biases explored in a theoretical section. An application to a real-world data set from diabetes research provides context for the usage of these methods when applied to an actual feature selection problem.


Projects created during my Bachelor’s Program


Malicious Intent and Multiple Testing in Regression Discontinuity Estimation

– Bachelor’s Thesis
– Supervisor: Prof. Dr. Joachim Freyberger

My Bachelor’s Thesis discusses potential problems with using regression discontinuity design in research if authors purposefully misuse the approach. To do so, I construct three different “Malicious Intent Estimators” intended to mimic behavior such as p-hacking, and I explore how using them can affect the results of a study. The thesis also highlights the importance of critical thinking when it comes to publication bias and the incentives created by the current state of publishing.


The Effects of Component Permutation on Estimated Impulse Responses in a Three-dimensional SVAR Model

– Final Project for the course Applied Time Series Analysis with R
– Supervisor: Prof. Dr. Robinson Kruse-Becher

This paper explores how the order of components in a structural vector autoregressive time series model affects the results of impulse response analysis. I present a theoretical analysis and simulations showing that changing the order of components can lead to incorrect results, and I discuss some basic ways to deal with this problem, including the use of additional information from outside the data.
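The following sketch shows how the ordering can matter. It assumes a VAR(1) with recursive (Cholesky) identification, which the summary above does not specify. The function names are likewise illustrative.

```python
import numpy as np


def cholesky_irf(coef, sigma, horizons):
    """Impulse responses of a VAR(1) under recursive (Cholesky) identification.

    coef is the VAR(1) coefficient matrix and sigma the covariance matrix of
    the reduced-form errors. Both are assumed known in this sketch.
    """
    impact = np.linalg.cholesky(sigma)  # lower-triangular impact matrix
    responses = []
    power = np.eye(coef.shape[0])
    for _ in range(horizons + 1):
        responses.append(power @ impact)
        power = coef @ power
    return np.array(responses)


def irf_under_ordering(coef, sigma, order, horizons):
    """Reorder the variables, identify recursively, and map the result back
    to the original variable labels so that orderings can be compared."""
    perm = np.eye(len(order))[list(order)]  # row i selects variable order[i]
    irf = cholesky_irf(perm @ coef @ perm.T, perm @ sigma @ perm.T, horizons)
    return perm.T @ irf @ perm
```

For a three-dimensional system, comparing `irf_under_ordering(coef, sigma, (0, 1, 2), h)` with `irf_under_ordering(coef, sigma, (2, 1, 0), h)` gives different responses whenever the error covariance matrix is not diagonal. The data cannot distinguish between the orderings, because both impact matrices reproduce the same covariance matrix.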