Learning Series: SIGNs, Part 1(a): Reviewing Memory-less-ness

SIGNs: (S)tochastic Processes, (I)nfinitesimal (G)enerators, & Bayesian (N)etworks


EDIT: I dropped off of this series last year! As an update, I’ll probably pick this back up for fun, but I dropped the series because the group I was working with at IBM shifted focus from streaming event data to restless bandits.

In my last post, I talked about some of the practical motivations for choosing Dynamic Bayesian Networks to model real-world processes. This time, we’ll start to talk about how we actually do this from a formal, mathematical perspective. Our end goal will be to build up to Event-Driven Continuous Time Bayesian Networks (ECTBNs).

Fair warning: at this point, if you’re not a statistics (or related) grad student, this may get reaaaally dry for you. Go here if you want to just hit the high-level, satisfying stuff.

Now, like last time, let’s say you want to model a real-world process so that later, once we collect data, we can use those data to make inferences that help us answer important practical questions about the causal structure of that process. As mentioned, we’re going to sometimes refer to the CityLink application from the ECTBN paper.

There are a few ways we could develop our model, but broadly speaking, people take one of two approaches: bottom-up or top-down. If you can intuitively spot how the math you know relates to the final model you’d like to use for inference, then a bottom-up approach is reasonable. More often, if you’re not so sure, the more common approach is top-down: start from the high-level, abstract conceptual model, boil out which mathematical tools are relevant, and then select the ones that work best for your questions.

Because I already know where we want to end up and because I’ll be using these posts as an excuse to (re-)learn some math, we’ll be moving from the bottom up, and we’ll take pauses to clearly understand the math before continuing on to the next stage of model-building. Once we’ve fleshed out the model and the justifications for it, we’ll get to the estimation and inference problems that relate to our original practical problem.

In this post and the next few, we’ll start by reviewing and working through relevant definitions and concepts. (When I don’t go into great depth, I’ll usually be assuming you’re already familiar with the material.)

Memory-less-ness, the Markov Property, the Semigroup Property, & Exponential Time

At a high level, we’ll eventually see that the semigroup property, as it relates to infinitesimal generators, abstracts the Markov property of stochastic processes and generalizes the memoryless property. In turn, this property induces an exponentially distributed time structure. Later, we’ll see that for (E)CTBNs this greatly simplifies the derivation of the likelihood function, which in turn makes our learning / estimation problem easier.
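To make the memoryless property concrete before the review below, here is the standard statement for a continuous waiting time T (this is a textbook fact, not anything specific to the ECTBN paper):

```latex
% Memoryless property of a waiting time T:
% having already waited s units of time tells you nothing
% about how much longer you will wait.
P(T > s + t \mid T > s) = P(T > t) \quad \text{for all } s, t \ge 0.

% Among continuous distributions, only the exponential,
% P(T > t) = e^{-\lambda t}, \lambda > 0, satisfies this for every s and t,
% since e^{-\lambda (s + t)} / e^{-\lambda s} = e^{-\lambda t}.
```

This uniqueness is exactly why the semigroup / Markov structure forces exponentially distributed holding times.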

For now, the rest of this post is a review and proof of relevant results related to the memoryless property, linked below. (I haven’t yet figured out how to post LaTeX directly on a Squarespace site, or how to display a PDF directly, but I may adjust that later.)
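If you want to see memory-less-ness without any algebra, a quick simulation works too. The sketch below (assuming NumPy; the rate and cutoff values are arbitrary choices for illustration) draws exponential waiting times and checks that, conditional on surviving past time s, the *remaining* waiting time has the same distribution as the original:

```python
import numpy as np

# Empirical check of the memoryless property: if T ~ Exp(rate),
# then the residual lifetime (T - s | T > s) is also ~ Exp(rate).
rng = np.random.default_rng(0)
rate, s = 2.0, 0.5  # arbitrary rate and survival cutoff for illustration
samples = rng.exponential(scale=1.0 / rate, size=1_000_000)

# Keep only the draws that "survived" past time s, then subtract s
# to get the remaining waiting time from that point on.
residual = samples[samples > s] - s

# Both sample means should be close to 1/rate = 0.5.
print(np.mean(samples))   # ≈ 0.5
print(np.mean(residual))  # ≈ 0.5
```

The same conditioning with, say, a uniform or normal waiting time would shift the residual mean, which is a handy sanity check that the property really is special to the exponential.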

Conor Artman