# Senior IW

Just some links to material I need to keep track of:

### first meeting

the notes on variational inference:

and the papers:

- Online Variational Inference for the Hierarchical Dirichlet Process
- Stochastic Variational Inference

### Meeting on 2/28/2012

In preparation for this meeting I read the notes on variational inference, and the draft of stochastic variational inference. Variational inference is a technique used to fit a model, i.e. when you have some data and a distribution model with some parameters it helps you find values for those paramenters.

The main points:

- you want to find the posterior distribution, i.e. the distribution of the parameters given the data.
- the form of the posterior is, for interesting problems, computationally intractable
- you use a variational approach: make a guess that includes some variational parameters, then you tweak those parameters until you get an optimal solution
- what’s optimal?
- when it’s closest to the distribution you’re looking for, i.e. the posterior
- KL divergence is used to measure the closeness of two distributions
- can’t minimize KL divergence directly, end up optimizing a function that is within a constant factor
- there are different ways to do this optimization:
- Coordinate ascent
- simply use the gradient to optimize. Need to iterate over data set, what if data set is infinite? i.e. streaming?

- Stochastic optimization with natural gradient
- gradient ascent uses eucledian distance
- can use a “Riemannian metric”, and take a randomized approach… allows us to update only using latest point!

- Coordinate ascent

There is a lot of detail in what was presented above, but I think I see the big picture.

The main benefit of today’s meeting was getting more concrete on the model I’ll need to work with: Factorial model, and possibly Factorial HMM.

So for my presentation next week I’ll need to understand the assumptions behind factorial models. An implementation for a factorial model in this context would be solid, an outstanding job would be a full factorial HMM treatment. Beyond the scope of a single semester is having a model that can have an adjustable number of additive distributions (e.g. appliances)

to read:

factorial hmm’s

additive factorial hmms

### Meeting on 3/6/2012

### 3/23/2012

- Ok, now plan is to follow the Kolter paper and apply to wattvision data. Need to:
- Write up EM to put in Background section.
- Thesis will start by introducing Hidden Markov Models, and dig into Factorial Hidden Markov model

### Draft due on 3/30/2012

### Update

Ok, so my iw has become mainly a close reading of the Kolter Factorial HMM paper, I’ll feed it wattvision data and see how things work out. Part of what I have to do is to create “empirical HMM” models of appliances running in a household. Here’s how Kolter and co came up some empirical HMMs from raw data.

First, smooth it: using total variation regularization . In a few words:

- we want a transformation of the time points, how do we judge what a good transformation is?
- we minimize the RSS, i.e. we want our transformation to be close to our original signal
- we penalize proportional to the Total Variation, defined as the absolute value of the difference between successive time points:

\[

V(Y) = \sum_{1 < i < |Y|} |y_i – y_{i-1}|

\] - We end up minimizing:

- Notice the similarity to ridge and/or lasso regression. Kolter mentions that this is equivalent to placing a Laplacian prior on the data (… and then presumably performing MAP estimation…)