Victor: Conditional Random Fields

Problem Definition

Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data (definition taken from an external reference).
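In the linear-chain case, which fits labeling the tokens of a document, a CRF scores a labeling y of an input x as a sum of per-position (emission) scores and label-to-label (transition) scores, and defines p(y | x) = exp(score(x, y)) / Z(x), where Z(x) sums exp(score(x, y')) over every possible labeling y'. The minimal sketch below computes this by brute-force enumeration. It assumes a linear-chain structure with hand-supplied scores, and crf_conditional is a hypothetical name, not part of Victor.

```python
import math
from itertools import product


def crf_conditional(emissions, transitions, labels):
    """Return p(labels | x) for a linear-chain CRF by brute-force enumeration.

    emissions[t][y]: score of label y at position t (derived from the input x)
    transitions[a][b]: score of label a being followed by label b
    """
    n, k = len(emissions), len(transitions)

    def score(ys):
        # Sum of emission scores plus transition scores between neighbours.
        s = sum(emissions[t][y] for t, y in enumerate(ys))
        s += sum(transitions[a][b] for a, b in zip(ys, ys[1:]))
        return s

    # Z(x): sum over all k**n labelings (only feasible for tiny examples).
    log_z = math.log(sum(math.exp(score(ys)) for ys in product(range(k), repeat=n)))
    return math.exp(score(tuple(labels)) - log_z)


emissions = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
transitions = [[0.5, -0.5], [-0.5, 0.5]]
print(crf_conditional(emissions, transitions, [0, 1, 1]))
```

Enumeration grows as k**n, which is why practical CRF code computes Z(x) with dynamic programming instead.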

Dataset

For this problem we use a set of featurized documents that have already been loaded into the database table documents_loaded (column featurized_docs).
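Each featurized document is stored in the database as a byte array holding a Python object. The exact encoding is not described here. As a minimal sketch, assuming a pickle-based encoding and a hypothetical document layout of one (feature ids, label) pair per token, a document can be turned into bytes for a binary column and restored like this:

```python
import pickle

# Hypothetical featurized document: one (feature ids, label) pair per token.
doc = [([3, 17], 0), ([5], 1), ([17, 42], 1)]

blob = pickle.dumps(doc)       # bytes that could be stored in a binary column
restored = pickle.loads(blob)  # the same Python object, read back
```

Only unpickle data you trust: pickle.loads can execute arbitrary code embedded in a crafted byte string.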

Victor Model and Code

The following code shows the model specification and the model instantiation for applying conditional random fields to a set of documents. The trained model is used to label the documents in this set.

-- This deletes the model specification
DELETE MODEL SPECIFICATION crf;

-- This creates the model specification
CREATE MODEL SPECIFICATION crf (
   model_type=(python,python) as (w,template),
   data_item_type=(python) as (featurized_docs),
   objective=examples.CRFs.crf_psql.compute_objective_item, 
   objective_agg=SUM, 
   grad_step=examples.CRFs.crf_psql.gradient_step
);

-- This instantiates the model
CREATE MODEL INSTANCE crf_single_doc
   EXAMPLES documents_loaded(featurized_docs)
   MODEL SPEC crf
   INIT_FUNCTION examples.CRFs.crf_psql.initialize_model
   STOP WHEN examples.CRFs.crf_psql.stopping_condition
;

1. Model Specification

This specification creates a model named "crf" whose type is a (python, python) pair, (w, template): the CRF model and the feature template used to featurize each document. The data items are featurized documents, stored in the database as byte arrays holding Python objects. We specify the objective function and declare that the per-document objective values are aggregated with the SUM aggregator. Finally, we specify the gradient step for the model.
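To make the roles of these clauses concrete, here is a rough sketch of the training semantics as a sequential pure-Python simulation. It is not how Victor executes inside the database, and every name in it (train and the toy_* functions) is hypothetical. The loop initializes the model pair, applies grad_step to every data item, combines the per-item objective values with a sum as objective_agg=SUM requests, and asks a stopping condition whether to continue. A toy one-parameter least-squares model, which minimizes rather than maximizes, stands in for the CRF so the example runs on its own.

```python
def train(examples, initialize_model, grad_step, objective_item, stopping_condition):
    """Sequential sketch: one epoch = one grad_step per item, then a SUM of objectives."""
    model = initialize_model()
    history = []  # aggregated objective after each epoch
    while True:
        for item in examples:
            model = grad_step(model, item)
        total = sum(objective_item(model, item) for item in examples)  # objective_agg=SUM
        history.append(total)
        if stopping_condition(history):
            return model, history


# Toy stand-ins with the same shapes as the CRF functions: the model is a
# (weight, step size) pair and each data item is a 1-tuple.
def toy_init():
    return (0.0, 0.1)


def toy_grad_step(model, item):
    w, step = model
    (x,) = item
    return (w - step * 2.0 * (w - x), step)  # gradient descent on (w - x)^2


def toy_objective_item(model, item):
    w, _ = model
    (x,) = item
    return (w - x) ** 2


def toy_stop(history, max_epochs=50, tol=1e-9):
    """Stop after max_epochs or once the summed objective stops changing."""
    if len(history) >= max_epochs:
        return True
    return len(history) > 1 and abs(history[-1] - history[-2]) < tol


model, history = train([(1.0,), (2.0,), (3.0,)], toy_init, toy_grad_step,
                       toy_objective_item, toy_stop)
```

With a fixed step size, per-item updates settle near, not exactly at, the data mean 2.0 (about 2.15 here), which is why the stopping test watches the change in the summed objective instead of waiting for an exact minimum.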

In the code below you can see the objective and gradient functions that the user provides. Each is only a few lines of Python, built on the utilities that Victor provides.

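# The gradient step function.
# We are maximizing the log-likelihood, so the update rule is
# w^{k+1} = w^{k} + \alpha \nabla L(w^{k}, x)
# where L is the log-likelihood of document x and \alpha is the step
# size (the stepsize given to simple_crf.build_model).

def gradient_step(model, item):
    # The model is the (m, t) pair and each data item is a 1-tuple.
    # Unpack in the body: tuple parameters in a def signature are
    # Python 2 only.
    m, t = model
    (labeled_doc,) = item
    # Apply the template t to turn the stored document into CRF features.
    d = parse_template.template_single_document(labeled_doc, t)
    m = simple_crf.take_gradient_step(m, d)
    return (m, t)


# The objective value is the negative log-likelihood of one document,
# log Z(x) - score(x, y), assuming z is the log normalization constant and
# v is the unnormalized log score of the gold labeling. Maximizing L is the
# same as minimizing the SUM of these values over all documents.

def compute_objective_item(model, item):
    m, t = model
    (labeled_doc,) = item
    d = parse_template.template_single_document(labeled_doc, t)
    z = simple_crf.compute_normalization(m, d)
    v = simple_crf.compute_weight_of_labeled(m, d)
    return z - v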
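The helpers simple_crf.compute_normalization, simple_crf.compute_weight_of_labeled and simple_crf.take_gradient_step are Victor utilities whose internals are not shown here. As a hedged sketch of what such helpers typically compute for a linear-chain CRF (an assumption; every name below is hypothetical, not Victor's API), the code computes log Z(x) with the forward algorithm, the score of the gold labeling, their difference (the per-document negative log-likelihood), and one gradient-ascent step on the log-likelihood, where the gradient is observed feature counts minus expected counts from forward-backward marginals. The model is assumed to be a plain dict with keys K (number of labels), emit, trans and step, and a document a non-empty list of (feature ids, label) pairs.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def emission(model, feats, y):
    """Sum of emission weights for label y over the token's feature ids."""
    return sum(model["emit"].get((y, f), 0.0) for f in feats)


def sequence_score(model, doc):
    """Unnormalized log score of the labeling stored in doc."""
    score, prev = 0.0, None
    for feats, y in doc:
        score += emission(model, feats, y)
        if prev is not None:
            score += model["trans"][prev][y]
        prev = y
    return score


def forward(model, doc):
    """alpha[t][y]: log of summed exp-scores of all prefixes ending in y at t."""
    k = model["K"]
    alpha = [[emission(model, doc[0][0], y) for y in range(k)]]
    for feats, _ in doc[1:]:
        prev = alpha[-1]
        alpha.append([emission(model, feats, y)
                      + logsumexp([prev[a] + model["trans"][a][y] for a in range(k)])
                      for y in range(k)])
    return alpha


def backward(model, doc):
    """beta[t][y]: log of summed exp-scores of all suffixes after t, given y at t."""
    k, n = model["K"], len(doc)
    beta = [[0.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = doc[t + 1][0]
        for a in range(k):
            beta[t][a] = logsumexp([model["trans"][a][b] + emission(model, nxt, b)
                                    + beta[t + 1][b] for b in range(k)])
    return beta


def log_partition(model, doc):
    """log Z(x), the quantity compute_normalization is assumed to return."""
    return logsumexp(forward(model, doc)[-1])


def negative_log_likelihood(model, doc):
    """log Z(x) - score(x, y), mirroring compute_objective_item's z - v."""
    return log_partition(model, doc) - sequence_score(model, doc)


def gradient_ascent_step(model, doc):
    """w <- w + step * grad log p(y | x), i.e. observed minus expected counts."""
    k = model["K"]
    alpha, beta = forward(model, doc), backward(model, doc)
    log_z = logsumexp(alpha[-1])
    g_emit, g_trans = {}, [[0.0] * k for _ in range(k)]
    prev = None
    for t, (feats, y_gold) in enumerate(doc):
        for y in range(k):
            # Observed indicator for the gold label minus marginal P(y_t = y | x).
            p = math.exp(alpha[t][y] + beta[t][y] - log_z)
            delta = (1.0 if y == y_gold else 0.0) - p
            for f in feats:
                g_emit[(y, f)] = g_emit.get((y, f), 0.0) + delta
        if prev is not None:
            g_trans[prev][y_gold] += 1.0
            for a in range(k):
                for b in range(k):
                    # Pairwise marginal P(y_{t-1} = a, y_t = b | x).
                    p = math.exp(alpha[t - 1][a] + model["trans"][a][b]
                                 + emission(model, feats, b) + beta[t][b] - log_z)
                    g_trans[a][b] -= p
        prev = y_gold
    for key, g in g_emit.items():
        model["emit"][key] = model["emit"].get(key, 0.0) + model["step"] * g
    for a in range(k):
        for b in range(k):
            model["trans"][a][b] += model["step"] * g_trans[a][b]
    return model
```

For a small enough step size, each call lowers negative_log_likelihood on that document, matching the "maximize L, report the negated value" convention of the two user functions above.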

2. Model Instantiation

To instantiate the model, we give the name of the function that initializes it and the name of the function that decides when to stop refining it. Again, these functions are a few lines of Python code, as seen below:

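def internal_initialize_model(template_file_name, stepsize):
   # Parse the feature template used to featurize each document.
   t = parse_template.parse_template(template_file_name)
   # Build the initial CRF model with the given gradient step size.
   m = simple_crf.build_model([],[],stepsize)
   return (m, t)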

def initialize_model():
   return internal_initialize_model('< 100

3. Model Application

Coming soon.

Running the Example

Open VICTOR_SQL/examples/CRFs/crf_psql.py and make sure that it uses the right path to your VICTOR_SQL folder.

Then run the example with the following commands:

$ cd VICTOR_SQL/examples/CRFs
$ make
$ ../../bin/victor_front.py crf_spec