Written Prelim Passed!

Last week I passed my written prelim exam after revisions, so I’m posting my review paper, “Methods for Handling Unknown Transition Dynamics in Restless Multi-Armed Bandits”. Hopefully others will find it useful if they need to learn a little about RMABs.

For a brief simulation study in my prelim, I examined two things: the effect of heterogeneous arms on algorithms that assume homogeneous arms, and the effect of the RMAB’s budget size. I focused on binary-state RMAB algorithms that are (more or less) straightforward extensions of Q-learning. As expected, algorithms suffer when heterogeneity among RMAB arms goes unaccounted for. More surprisingly, however, the Q-learning algorithms I studied performed markedly differently at different budget scales. By this I mean that, using the same estimation algorithm, an RMAB with a budget of M=20 active arms out of N=100 total arms may perform significantly better under a budget of M=100 and N=500 arms, even though M/N = 0.2 in both cases. One easy explanation is that the larger number of arms could slow convergence to optimal performance. Additionally, if arms are assumed to be heterogeneous (as I did), we can’t share data across arms while estimating optimal Whittle indices, which may exacerbate the problem.
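To make the setup concrete, here is a minimal sketch of the kind of experiment described above. It is not the exact algorithms from the paper: the transition probabilities, reward (1 per arm in the good state), learning rate, and the use of the Q-value advantage Q(s, 1) − Q(s, 0) as a crude stand-in for a Whittle index are all illustrative assumptions. Each arm keeps its own Q-table, so no data is shared across arms, and at each step the top M of N arms by advantage are activated.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(N, M, T=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Per-arm Q-learning on a binary-state RMAB with budget M of N arms.

    Hypothetical setup: each arm has its own transition probabilities
    (heterogeneous arms), so Q-tables are learned independently per arm.
    """
    # P[arm, state, action] = probability of transitioning to state 1
    P = rng.uniform(0.2, 0.8, size=(N, 2, 2))
    Q = np.zeros((N, 2, 2))            # Q[arm, state, action]
    s = rng.integers(0, 2, size=N)     # current state of each arm
    idx = np.arange(N)
    total_reward = 0.0
    for _ in range(T):
        # Rank arms by the advantage of acting (a crude Whittle-index proxy).
        adv = Q[idx, s, 1] - Q[idx, s, 0]
        if rng.random() < eps:
            active = rng.choice(N, size=M, replace=False)  # explore
        else:
            active = np.argsort(adv)[-M:]                  # exploit top-M
        a = np.zeros(N, dtype=int)
        a[active] = 1
        # Reward: 1 for every arm currently in the good state (state 1).
        r = s.astype(float)
        total_reward += r.sum()
        # Each arm transitions independently given its state and action.
        s_next = (rng.random(N) < P[idx, s, a]).astype(int)
        # Standard Q-learning update, applied per arm (no data sharing).
        td = r + gamma * Q[idx, s_next].max(axis=1) - Q[idx, s, a]
        Q[idx, s, a] += alpha * td
        s = s_next
    return total_reward / (T * N)      # average per-arm reward
```

Comparing, say, `simulate(100, 20)` against `simulate(500, 100)` keeps the budget ratio fixed at M/N = 0.2 while changing the scale, which is the comparison the paragraph above describes.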

Conor Artman