Logistic Regression

Problem and Data

We demonstrate how to run Logistic Regression on the DBLife dataset. The task description is the same as in this example, and the dataset can be downloaded from Bismarck Download. Running a Support Vector Machine is very similar. The schema of the dblife table is as follows:

 Column |        Type        |                      Modifiers                       
--------+--------------------+------------------------------------------------------
 did    | integer            | not null default nextval('dblife_did_seq'::regclass)
 k      | integer[]          | 
 v      | double precision[] | 
 label  | integer            | 
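The k and v arrays together encode each example as a sparse feature vector: k holds the feature indices and v the matching values. The following Python sketch shows this encoding on made-up data; the sample indices/values and the 1-based index convention (as in PostgreSQL arrays) are illustrative assumptions, not taken from the dblife data itself.

```python
# Sketch: how a (k, v) pair encodes a sparse feature vector.
# The sample row below and the 1-based indexing are assumptions.

def densify(k, v, ndims):
    """Expand sparse (indices, values) arrays into a dense list of length ndims."""
    dense = [0.0] * ndims
    for idx, val in zip(k, v):
        dense[idx - 1] = val  # assuming 1-based indices, as in PostgreSQL arrays
    return dense

row_k = [3, 7, 10]        # hypothetical feature indices
row_v = [1.0, 0.5, 2.0]   # corresponding feature values
dense = densify(row_k, row_v, 10)
```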

Python-Based Front-End

The spec file for this task is given below (also available as sparse-logit-spec.py in the bin folder):

verbose = False
model = 'sparse_logit'
model_id = 22
data_table = 'dblife'
feature_cols = 'k, v'
label_col = 'label'
ndims = 41270
stepsize = 0.5
decay = 0.9
is_shmem = True

The stepsize and decay values above were chosen for this dataset via a grid search that minimized the loss. To invoke training, run the following command:

python bin/bismarck_front.py bin/sparse-logit-spec.py
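A common convention for a decay parameter like this is to multiply the stepsize by decay after every pass over the data; the exact schedule Bismarck uses is not spelled out here, so the geometric schedule below is an assumption for illustration.

```python
# Sketch of an assumed geometric stepsize schedule: stepsize * decay**t at epoch t.

def stepsize_schedule(stepsize, decay, epochs):
    """Return the effective stepsize for each epoch under geometric decay."""
    return [stepsize * decay ** t for t in range(epochs)]

# With the values from the spec file and 20 epochs:
sched = stepsize_schedule(0.5, 0.9, 20)
```

With stepsize = 0.5 and decay = 0.9, the stepsize shrinks from 0.5 toward roughly 0.07 by the twentieth epoch, so later passes make smaller adjustments to the model.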

SQL-Based Front-End

A SQL query for training the LR model is as follows:

SELECT sparse_logit('dblife', 22, 41270, 20, 1, 0.5, 0.9, 't', 't');

The same values are used here, along with iteration = 20 and mu = 1. The column names are implicitly assumed to match the schema given above. An alternate SQL query that relies on implicit default values for many of the parameters (refer to Using Bismarck) is as follows:

SELECT sparse_logit('dblife', 22, 41270);
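The two query forms above differ only in how many arguments are passed to the sparse_logit UDF. The helper below, a hypothetical convenience function not part of Bismarck, sketches how the full and short call strings relate; the argument order follows the two queries shown above.

```python
# Sketch: building the sparse_logit call string (helper is hypothetical).

def sparse_logit_call(data_table, model_id, ndims, iteration=None, mu=None,
                      stepsize=None, decay=None):
    """Return the SELECT statement for sparse_logit, in short or full form."""
    args = [repr(data_table), str(model_id), str(ndims)]
    if iteration is not None:
        # Full form: iteration, mu, stepsize, decay, plus the two 't' flags
        # from the example query above.
        args += [str(iteration), str(mu), str(stepsize), str(decay), "'t'", "'t'"]
    return "SELECT sparse_logit(%s);" % ", ".join(args)

full_query = sparse_logit_call('dblife', 22, 41270, 20, 1, 0.5, 0.9)
short_query = sparse_logit_call('dblife', 22, 41270)
```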

Model Application

The trained model can be applied for prediction using the sparse_logit_pred function; sparse_logit_init loads the model for use and sparse_logit_clear releases it afterwards:

SELECT sparse_logit_init(22);
CREATE TABLE dblife_pred AS SELECT did, sparse_logit_pred(22, k, v) FROM dblife;
SELECT sparse_logit_clear(22);
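For intuition, a logistic regression prediction on a sparse example is the sigmoid of the dot product between the weight vector and the (k, v) features. The sketch below illustrates that computation in plain Python; the weight values and the 1-based indexing are assumptions for illustration, not Bismarck's internal implementation.

```python
# Sketch of the logistic prediction underlying sparse_logit_pred:
# sigmoid of the sparse dot product (weights and indexing are assumptions).
import math

def sparse_dot(w, k, v):
    """Dot product of dense weights w with sparse (k, v), assuming 1-based indices."""
    return sum(w[i - 1] * x for i, x in zip(k, v))

def logit_predict(w, k, v):
    """Predicted probability of the positive class for one sparse example."""
    return 1.0 / (1.0 + math.exp(-sparse_dot(w, k, v)))

w = [2.0, 0.0, -1.0, 0.0, 0.0]   # hypothetical trained weights
p = logit_predict(w, [1, 3], [1.0, 1.0])  # features 1 and 3 active
```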