Senior IW
Just some links to material I need to keep track of:
first meeting
the notes on variational inference:
and the papers:
- Online Variational Inference for the Hierarchical Dirichlet Process
- Stochastic Variational Inference
Meeting on 2/28/2012
In preparation for this meeting I read the notes on variational inference and the draft of stochastic variational inference. Variational inference is a technique used to fit a model: when you have some data and a distribution model with some parameters, it helps you find values for those parameters.
The main points:
- you want to find the posterior distribution, i.e. the distribution of the parameters given the data.
- the form of the posterior is, for interesting problems, computationally intractable
- you use a variational approach: make a guess that includes some variational parameters, then you tweak those parameters until you get an optimal solution
- what’s optimal?
- when it’s closest to the distribution you’re looking for, i.e. the posterior
- KL divergence is used to measure the closeness of two distributions
- can’t minimize the KL divergence directly; instead we maximize a lower bound (the ELBO) that differs from the negative KL by a constant (the log evidence)
- there are different ways to do this optimization:
- Coordinate ascent
- simply use the gradient to optimize. Need to iterate over data set, what if data set is infinite? i.e. streaming?
- Stochastic optimization with natural gradient
- gradient ascent uses Euclidean distance
- can use a “Riemannian metric” instead, and take a randomized approach… this allows us to update using only the latest point!
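To make the coordinate-ascent idea above concrete for myself, here is a toy sketch (the 2x2 table and all the numbers are my own invention, not from the notes): approximate a joint p(x1, x2) by a factorized q1(x1) q2(x2), updating each factor in turn so that q_i(x) is proportional to exp(E over the other factor of log p).

```python
import math

# Toy mean-field coordinate ascent: approximate a 2x2 joint p(x1, x2)
# by q1(x1) * q2(x2). The joint table below is made up for illustration.
p = [[0.30, 0.10],   # p[x1][x2]
     [0.15, 0.45]]

def normalize(v):
    z = sum(v)
    return [x / z for x in v]

q1, q2 = [0.5, 0.5], [0.5, 0.5]
for _ in range(50):
    # update q1 holding q2 fixed: q1(a) ∝ exp(E_{q2}[log p(a, x2)])
    q1 = normalize([math.exp(sum(q2[b] * math.log(p[a][b]) for b in range(2)))
                    for a in range(2)])
    # update q2 holding q1 fixed: q2(b) ∝ exp(E_{q1}[log p(x1, b)])
    q2 = normalize([math.exp(sum(q1[a] * math.log(p[a][b]) for a in range(2)))
                    for b in range(2)])
```

Each update is a closed-form coordinate step; iterating them monotonically improves the ELBO for this factorized family.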
There is a lot of detail in what was presented above, but I think I see the big picture.
The main benefit of today’s meeting was getting more concrete on the model I’ll need to work with: Factorial model, and possibly Factorial HMM.
So for my presentation next week I’ll need to understand the assumptions behind factorial models. An implementation of a factorial model in this context would be solid; an outstanding job would be a full factorial HMM treatment. A model with an adjustable number of additive distributions (e.g. appliances) is beyond the scope of a single semester.
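The additive assumption behind these factorial models can be sketched with a toy generator (the appliance powers and transition probabilities below are invented for illustration, not from any paper): each appliance is an independent two-state Markov chain, and the observed signal is the sum of the powers of whichever appliances are on.

```python
import random

random.seed(0)

# Made-up appliances: each is an independent 2-state (off/on) chain.
appliances = [
    {"power": 150.0,  "p_on": 0.10, "p_off": 0.20},  # fridge-like
    {"power": 1200.0, "p_on": 0.02, "p_off": 0.30},  # kettle-like
]

def simulate(appliances, steps):
    """Generate an aggregate signal from independent per-appliance chains."""
    states = [0] * len(appliances)  # 0 = off, 1 = on
    totals = []
    for _ in range(steps):
        for i, a in enumerate(appliances):
            flip = a["p_on"] if states[i] == 0 else a["p_off"]
            if random.random() < flip:
                states[i] = 1 - states[i]
        # observation = sum of powers of appliances that are on
        totals.append(sum(a["power"] for a, s in zip(appliances, states) if s))
    return totals
```

Disaggregation is then the inverse problem: recover the hidden per-appliance state sequences from the summed signal alone.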
to read:
factorial HMMs
additive factorial HMMs
Meeting on 3/6/2012
3/23/2012
- Ok, now plan is to follow the Kolter paper and apply to wattvision data. Need to:
- Write up EM to put in Background section.
- Thesis will start by introducing Hidden Markov Models, and dig into Factorial Hidden Markov model
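For the EM write-up, a minimal sketch of EM on a two-component 1D Gaussian mixture (the toy data and crude initialization are my own, just to have a running example for the Background section):

```python
import math

def em_gmm_1d(data, iters=50):
    """EM for a two-component 1D Gaussian mixture (illustrative sketch)."""
    # Crude initialization: pick two well-separated order statistics.
    s = sorted(data)
    mu = [s[len(s) // 4], s[3 * len(s) // 4]]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities r[i][k] = P(component k | x_i)
        r = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 for k in range(2)]
            z = sum(p)
            r.append([pk / z for pk in p])
        # M-step: re-estimate weights, means, variances from responsibilities
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            pi[k] = nk / len(data)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
            var[k] = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return pi, mu, var
```

The same E-step/M-step skeleton carries over to HMMs, where the E-step becomes forward-backward over the hidden state sequence.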
Draft due on 3/30/2012
Update
Ok, so my IW has become mainly a close reading of the Kolter factorial HMM paper; I’ll feed it wattvision data and see how things work out. Part of what I have to do is create “empirical HMM” models of appliances running in a household. Here’s how Kolter and co. came up with empirical HMMs from raw data.
First, smooth it using total variation regularization. In a few words:
- we want a transformation of the time points, how do we judge what a good transformation is?
- we minimize the RSS, i.e. we want our transformation to be close to our original signal
- we penalize proportionally to the total variation, defined as the sum of the absolute differences between successive time points:
\[
V(Y) = \sum_{i=2}^{|Y|} |y_i - y_{i-1}|
\]
- We end up minimizing the residual plus the scaled penalty:
\[
\min_{X} \; \frac{1}{2} \sum_{i=1}^{|Y|} (x_i - y_i)^2 + \lambda V(X)
\]
- Notice the similarity to ridge and/or lasso regression. Kolter mentions that this is equivalent to placing a Laplacian prior on the data (… and then presumably performing MAP estimation…)
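A rough sketch of this smoothing step, for my own reference: minimize squared error plus a total-variation penalty by gradient descent on a smoothed absolute value (the step size, lambda, epsilon, and test signal here are my own choices, not Kolter’s; the paper’s actual solver may differ).

```python
import math

def tv_denoise(y, lam=1.0, eps=1e-2, lr=0.05, iters=2000):
    """Approximately minimize 0.5*sum (x_i - y_i)^2 + lam * V(x),
    with |t| smoothed as sqrt(t^2 + eps) so plain gradient descent works."""
    x = list(y)
    n = len(y)
    for _ in range(iters):
        # gradient of the data-fidelity (RSS) term
        grad = [x[i] - y[i] for i in range(n)]
        # gradient of the smoothed total-variation penalty
        for i in range(1, n):
            d = x[i] - x[i - 1]
            g = d / math.sqrt(d * d + eps)  # derivative of smoothed |d|
            grad[i] += lam * g
            grad[i - 1] -= lam * g
        x = [x[i] - lr * grad[i] for i in range(n)]
    return x

def total_variation(x):
    return sum(abs(x[i] - x[i - 1]) for i in range(1, len(x)))
```

On a noisy step signal this flattens the small wiggles while keeping the big jump, which is exactly what we want before extracting per-appliance state levels.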