# Senior IW

Just some links to material I need to keep track of:

### first meeting

the notes on variational inference:

and the papers:

### Meeting on 2/28/2012

In preparation for this meeting I read the notes on variational inference and the draft of stochastic variational inference. Variational inference is a technique used to fit a model, i.e. when you have some data and a probabilistic model with some parameters, it helps you find values for those parameters.

The main points:

• you want to find the posterior distribution, i.e. the distribution of the parameters given the data.
• the form of the posterior is, for interesting problems, computationally intractable
• you use a variational approach: make a guess that includes some variational parameters, then you tweak those parameters until you get an optimal solution
• what’s optimal?
• when it’s closest to the distribution you’re looking for, i.e. the posterior
• KL divergence is used to measure the closeness of two distributions
• can’t minimize KL divergence directly, so you end up optimizing a function that equals the negative KL divergence up to an additive constant
• there are different ways to do this optimization:
• Coordinate ascent
• optimize one variational parameter at a time, holding the others fixed. Each update needs a pass over the whole data set; what if the data set is infinite, i.e. streaming?
• Stochastic optimization with natural gradient
• gradient ascent uses Euclidean distance
• can use a “Riemannian metric”, and take a randomized approach… allows us to update only using latest point!
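
For reference, here is the standard identity behind the KL bullets above. The symbols are my own assumption rather than the notes' notation: $x$ for the data, $z$ for the parameters or hidden variables, and $q$ for the variational guess.

$\log p(x) = \mathcal{L}(q) + \mathrm{KL}(q(z) \,\|\, p(z \mid x))$

$\mathcal{L}(q) = E_q[\log p(x, z)] - E_q[\log q(z)]$

Since $\log p(x)$ does not depend on $q$, maximizing $\mathcal{L}(q)$ is the same as minimizing the KL divergence to the posterior, and it never requires evaluating the posterior itself.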

There is a lot of detail in what was presented above, but I think I see the big picture.

The main benefit of today’s meeting was getting more concrete on the model I’ll need to work with: Factorial model, and possibly Factorial HMM.

So for my presentation next week I’ll need to understand the assumptions behind factorial models. An implementation of a factorial model in this context would be solid; an outstanding job would be a full factorial HMM treatment. Beyond the scope of a single semester is a model with an adjustable number of additive distributions (e.g. appliances).

### 3/23/2012

• Ok, now the plan is to follow the Kolter paper and apply it to wattvision data. Need to:
• Write up EM to put in Background section.
• Thesis will start by introducing Hidden Markov Models, and dig into Factorial Hidden Markov model

### Update

Ok, so my IW has become mainly a close reading of the Kolter Factorial HMM paper. I’ll feed it wattvision data and see how things work out. Part of what I have to do is create “empirical HMM” models of appliances running in a household. Here’s how Kolter and co came up with some empirical HMMs from raw data.

First, smooth it using total variation regularization. In a few words:

1. we want a transformation of the time points, how do we judge what a good transformation is?
2. we minimize the residual sum of squares (RSS), i.e. we want our transformation to be close to our original signal
3. we add a penalty proportional to the Total Variation, defined as the sum of the absolute differences between successive time points:
$V(Y) = \sum_{1 < i \le |Y|} |y_i - y_{i-1}|$
4. We end up minimizing:
$RSS(Y, X) + \lambda V(Y)$
• Notice the similarity to ridge and especially lasso regression, since the penalty is an L1 norm. Kolter mentions that this is equivalent to placing a Laplace prior on the data (… and then presumably performing MAP estimation…)
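
To check my understanding of the smoothing step, here is a minimal Python sketch that minimizes $RSS(Y, X) + \lambda V(Y)$ for a 1-D signal. It uses projected gradient ascent on the dual problem, which is my own choice of solver and not necessarily what Kolter and co used. The function names (`tv_denoise`, `total_variation`, `objective`) and the defaults for `n_iter` and `step` are hypothetical.

```python
import numpy as np


def total_variation(y):
    """V(Y): sum of absolute differences between successive points."""
    return np.abs(np.diff(y)).sum()


def objective(y, x, lam):
    """RSS(Y, X) + lam * V(Y), the quantity being minimized."""
    return np.sum((y - x) ** 2) + lam * total_variation(y)


def _diff_transpose(u):
    """Apply D^T, where D is the forward-difference operator of np.diff."""
    padded = np.concatenate(([0.0], u, [0.0]))
    return -np.diff(padded)


def tv_denoise(x, lam, n_iter=2000, step=0.25):
    """Smooth x by minimizing RSS(Y, X) + lam * V(Y).

    Hypothetical helper: projected gradient on the dual problem,
    an assumed solver rather than the one used in the paper.
    """
    x = np.asarray(x, dtype=float)
    u = np.zeros(x.size - 1)   # one dual variable per successive difference
    bound = lam / 2.0          # RSS carries no 1/2 factor, so the box is lam / 2
    for _ in range(n_iter):
        y = x - _diff_transpose(u)   # smoothed signal implied by the current dual
        # Gradient step on the dual, then project back onto the box constraint.
        u = np.clip(u + step * np.diff(y), -bound, bound)
    return x - _diff_transpose(u)


if __name__ == "__main__":
    # Toy "appliance switching on" signal with a little jitter (made-up numbers).
    x = np.array([0.1, -0.1, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2])
    y = tv_denoise(x, lam=1.0)
    print("V(X) =", total_variation(x), " V(Y) =", total_variation(y))
    print("smoothed:", np.round(y, 3))
```

With $\lambda = 0$ this returns the signal unchanged, and with a very large $\lambda$ it flattens the signal to its mean. In between, it keeps the big step while flattening the jitter, which is what we want before fitting discrete appliance states.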