Victor: Factored Models

Problem Definition

Factored models arise when approximating low-rank matrices in multi-dimensional scaling, principal component analysis, and multi-task classification. In this class of problems, we simultaneously partition both the model and the data.

Data set

For this problem, we use the MovieLens data set, which contains ratings and tags applied to movies by users of the online movie recommender service MovieLens.

Victor Model and Code

The following code shows the model specification and model instantiation for the Factored Model problem applied to the MovieLens data set.

-- This deletes the model specification

-- This creates the model specification
  model_type=(python) as (w),
  data_item_type=(int,int,float8) as (row,col,rating),

-- This instantiates the model
  EXAMPLES movielens1m(row,col,rating)
  MODEL SPEC low_rank_nopara
  INIT_FUNCTION examples.factor_simple.factor_simple.initialize_model
  STOP WHEN examples.factor_simple.factor_simple.stopping_condition

1. Model Specification

This specification creates a Python-type model named "low_rank_nopara", which is stored in the database as a byte array. Each data item consists of three values (row, column, and rating), stored as an integer, an integer, and a float, respectively. We specify the loss function and state that the per-item scores are aggregated by the SUM aggregator. Finally, we define the gradient step for the model.
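
In symbols, the specification can be read roughly as follows. This is a sketch that assumes the model holds a row-factor matrix L and a column-factor matrix R, as the Python code on this page suggests; the symbols \Omega, r_{ij}, and \alpha are notation introduced here and do not come from Victor itself.

```latex
% Notation introduced here: \Omega is the set of observed (row, col, rating)
% tuples, r_{ij} is an observed rating, and \alpha is the step size.
% Squared-error loss per tuple, aggregated with SUM:
\[
  \min_{L,\,R} \; \sum_{(i,j) \in \Omega}
    \bigl( \langle L_i, R_j \rangle - r_{ij} \bigr)^2
\]
% One gradient step on a single tuple; both updates use the old values of
% L_i and R_j, and the constant factor 2 is absorbed into \alpha:
\[
  L_i \leftarrow L_i - \alpha \, \bigl( \langle L_i, R_j \rangle - r_{ij} \bigr) R_j,
  \qquad
  R_j \leftarrow R_j - \alpha \, \bigl( \langle L_i, R_j \rangle - r_{ij} \bigr) L_i
\]
```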

In the code section below, you can see the loss and gradient functions that the user provides. Note that this code takes only a few lines of Python when written with the utilities that Victor provides.

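def se_loss(m, row, col, rating):
   # Squared error of the predicted rating for one (row, col, rating) tuple
   i   = row - 1 # convert 1-based (MATLAB-style) index to 0-based
   j   = col - 1
   # Prediction error: dot product of L[i] and R[j] minus the rating
   # (the dot product is written out in plain Python here)
   v   = sum(a*b for a, b in zip(m.L[i], m.R[j])) - rating
   return v*v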

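def grad(m, row, col, rating):
   L   = m.L
   R   = m.R
   i   = row - 1 # convert 1-based (MATLAB-style) index to 0-based
   j   = col - 1
   # Prediction error (the dot product is written out in plain Python here)
   err = sum(a*b for a, b in zip(L[i], R[j])) - rating
   e   = -(m.stepsize * err)
   # Update a copy of L[i] so that R[j] is updated with the old L[i]
   tempLi = list(L[i])
   low_rank_helper.scale_and_add(tempLi, R[j],e)
   low_rank_helper.scale_and_add(R[j],   L[i],e)
   L[i] = tempLi
   m.set_L(i, L[i])
   m.set_R(j, R[j])
   return m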
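
To try the loss and gradient step outside the database, the following is a minimal, self-contained sketch. It does not use Victor: `TinyLowRankModel`, `dot`, `scale_and_add`, and `total_loss` are hypothetical stand-ins written for this illustration, the four ratings are made-up toy values, and the step size and epoch count are arbitrary choices. The behavior assumed for `scale_and_add` (in-place `x += c * y`) is inferred from its name.

```python
import random

class TinyLowRankModel(object):
    """Hypothetical stand-in for the model object: two lists of factor vectors."""
    def __init__(self, rank, n_rows, n_cols, stepsize, seed=0):
        rng = random.Random(seed)
        self.L = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
        self.R = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
        self.stepsize = stepsize

def dot(x, y):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def scale_and_add(x, y, c):
    """In place: x <- x + c * y (assumed meaning of the helper's name)."""
    for k in range(len(x)):
        x[k] += c * y[k]

def se_loss(m, row, col, rating):
    """Squared error for one tuple; row and col are 1-based."""
    v = dot(m.L[row - 1], m.R[col - 1]) - rating
    return v * v

def grad(m, row, col, rating):
    """One stochastic gradient step on a single tuple."""
    i, j = row - 1, col - 1
    err = dot(m.L[i], m.R[j]) - rating
    e = -(m.stepsize * err)
    new_Li = list(m.L[i])              # copy, so R[j] sees the old L[i]
    scale_and_add(new_Li, m.R[j], e)
    scale_and_add(m.R[j], m.L[i], e)
    m.L[i] = new_Li
    return m

def total_loss(m, examples):
    """SUM aggregation of the per-tuple losses."""
    return sum(se_loss(m, r, c, v) for (r, c, v) in examples)

# Toy data: (row, col, rating) tuples, 1-based indices.
examples = [(1, 1, 5.0), (1, 2, 3.0), (2, 1, 4.0), (3, 2, 1.0)]
model = TinyLowRankModel(rank=2, n_rows=3, n_cols=2, stepsize=0.02)

loss_before = total_loss(model, examples)
for epoch in range(500):
    for (r, c, v) in examples:
        grad(model, r, c, v)
loss_after = total_loss(model, examples)

print("loss before: %.4f, loss after: %.4f" % (loss_before, loss_after))
```

The copy of `L[i]` matters: without it, the update to `R[j]` would use the already-updated row factor rather than the value the error was computed from.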

2. Model Instantiation

To instantiate the model, we name the function that initializes it and the function that decides when to stop refining it. Again, both are written in a few lines of Python, as seen below:

def initialize_model():
   nRows = 6040
   nCols = 3952 # TODO QUERY FOR THESE
   return LowRankModel(20, nRows, nCols, 1.5)

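def stopping_condition(s, loss):
   # s is a dictionary that persists between calls; loss is not used here
   if 'state' not in s:
      s['state'] = 0
   s['state'] += 1
   # Stop once the condition has been checked more than 5 times
   return s['state'] > 5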
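
The effect of this stopping condition can be checked with a small driver loop. The sketch below is not Victor's actual control loop: `run_epochs` is a hypothetical helper, and it assumes that the condition is called once after each pass over the data with a dictionary that persists between calls.

```python
def stopping_condition(s, loss):
    # Count the calls in the persistent dictionary; loss is ignored.
    if 'state' not in s:
        s['state'] = 0
    s['state'] += 1
    return s['state'] > 5

def run_epochs(one_epoch, stop):
    """Hypothetical driver: run one_epoch() until stop(state, loss) is true."""
    state = {}
    epochs = 0
    while True:
        loss = one_epoch()
        epochs += 1
        if stop(state, loss):
            return epochs

# A dummy epoch that always reports a loss of 0.0.
epochs_run = run_epochs(lambda: 0.0, stopping_condition)
print(epochs_run)
```

Under these assumptions the loop runs a fixed number of passes regardless of the loss; a condition that compares successive loss values would be one way to stop on convergence instead.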

3. Model Application

Coming soon.

Running the Example

Coming soon.