Installing Bismarck

This page describes how to set up Bismarck and run a simple example. It assumes basic familiarity with Linux.

This documentation is written for Red Hat or Ubuntu Linux running the bash shell. You may need to make minor modifications to the commands for your environment. You do NOT need root access to your machine to set up and run Bismarck.

Source code (.tar.gz or .zip) is available on the download page.

1. Dependencies

You need to install the following dependency packages in order to run Bismarck. The source code and examples in the Bismarck release are compatible with the versions in parentheses.

For example, to install PostgreSQL, unpack the tarball and let its base directory be $PGDIR. To install it without root access, run the following from $PGDIR:

./configure --prefix=$PGDIR/pgsql
make install

2. Set up the Database

Set up and start a PostgreSQL or Greenplum database in the usual manner. The following steps illustrate this for PostgreSQL (the bin/ paths are relative to $PGHOME):

export PGHOME=$PGDIR/pgsql
cd $PGHOME
mkdir SQL_DATA
bin/initdb -D SQL_DATA
bin/pg_ctl -D SQL_DATA -l logfile start

On success, you should see the message "server starting". If there are any problems in setting up the database, refer to the PostgreSQL Manual. Now create a new user and a new database. The createuser command prompts for the role name and superuser status; on PostgreSQL 9.2 and later, run createuser --interactive to get these prompts:

export PATH=$PATH:$PGHOME/bin/
createuser
 Enter name of role to add: bismarckvm
 Shall the new role be a superuser? (y/n) y
createdb test_bismarck
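
Before continuing, it can help to confirm that the PostgreSQL tools used on this page are reachable through PATH. The following is a minimal sketch; the function name require_tools is hypothetical and is not part of Bismarck or PostgreSQL.

```shell
# Report which of the named commands cannot be found on PATH.
# Returns 0 if all are found, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, require_tools initdb pg_ctl createuser createdb psql prints one line per missing tool and returns a non-zero status if any is missing.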

3. Environment Variables

Adjust the following environment variables in the file bismarck.path in the base folder to match your system settings:

  • Add the paths to the psql binary (and python binary if needed) to PATH
  • Adjust the PostgreSQL variables PGHOME, PGUSER, PGPORT and PGDATABASE (or the corresponding variables for Greenplum)
  • Add the path to the PostgreSQL library to LD_LIBRARY_PATH

Apply the changes to your current shell by sourcing the file:

source bismarck.path
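
As an illustration of the settings listed above, bismarck.path might contain lines like the following. This is a sketch, not the file shipped with Bismarck: the fallback location for PGDIR is hypothetical, the role and database names are the ones used in section 2, and 5432 is PostgreSQL's default port.

```shell
# Hypothetical example values; adjust every line for your system.
PGDIR="${PGDIR:-$HOME/postgresql}"   # assumed unpack location if PGDIR is unset
export PGHOME="$PGDIR/pgsql"         # install prefix used in section 1
export PGUSER=bismarckvm             # role created in section 2
export PGPORT=5432                   # PostgreSQL default port
export PGDATABASE=test_bismarck      # database created in section 2
export PATH="$PATH:$PGHOME/bin"      # psql and the other client binaries
# Prepend the PostgreSQL library directory, keeping any existing entries.
export LD_LIBRARY_PATH="$PGHOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

With PGUSER, PGPORT and PGDATABASE exported, psql and createdb pick up the connection settings without extra command-line flags.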

4. Installation

Go to the base folder to compile and install Bismarck for PostgreSQL:

make pg 2>> install.err
make install-pg 2>> install.err

or for Greenplum:

make gp 2>> install.err
make install-gp 2>> install.err

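If install.err does not contain any 'ERROR' messages, the installation completed successfully. Congratulations!

If the installation did not succeed, you can safely rerun it after fixing the issue that caused the failure. Because 2>> appends to install.err, delete the file before rerunning so that messages from the earlier attempt are not mistaken for new errors.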
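
The check for 'ERROR' messages can be scripted. The following is a minimal sketch; check_install_log is a hypothetical helper name, and it assumes that any line containing the string ERROR indicates a failed step.

```shell
# Scan an installation log for lines containing ERROR.
# Returns 0 if none are found, 1 if any are found, 2 if the log is missing.
check_install_log() {
    log="${1:-install.err}"
    if [ ! -f "$log" ]; then
        echo "log file not found: $log" >&2
        return 2
    fi
    # grep -n prints each matching line with its line number.
    if grep -n 'ERROR' "$log"; then
        return 1
    fi
    echo "no ERROR messages in $log"
}
```

Run check_install_log from the base folder after the make commands; with no argument it reads install.err.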

5. Load Test Data

Sample data files are available in the separate bismarck_data folder. The .sql files also define the schema for the tables. Run these files in the usual way:

psql -f dblife.sql
psql -f forest.sql
psql -f mlens1m.sql
psql -f conll.sql
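
The four loads can also be run as a loop that stops at the first failure. This is a sketch under two assumptions: it is run from the folder containing the .sql files, and the PSQL variable is a hypothetical override (defaulting to psql) for pointing at a specific binary. ON_ERROR_STOP is a standard psql variable that makes psql exit with a non-zero status when a statement fails.

```shell
# Load the sample data files in order, stopping at the first failure.
load_samples() {
    for name in dblife forest mlens1m conll; do
        file="$name.sql"
        if [ ! -f "$file" ]; then
            echo "missing data file: $file" >&2
            return 1
        fi
        # The target database is taken from PGDATABASE, as with plain psql -f.
        "${PSQL:-psql}" -v ON_ERROR_STOP=1 -f "$file" || return 1
    done
}
```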

6. Run Bismarck

Detailed usage information is given on the Using Bismarck page, but we present simple invocations here to test your Bismarck installation.

Issue the following SQL queries to run the tasks (invoking SVM is similar to LR). Check that the output loss value is decreasing and falls within the respective range after the first several epochs:

SELECT dense_logit('forest', 1, 54);		--Range: (2.9e5, 4e5)
SELECT sparse_logit('dblife', 22, 41270);	--Range: (2.7e3, 4e3)
SELECT factor('mlens1m', 333, 6040, 3952, 10);	--Range: (0.7, 1)
SELECT crf('conll', 4444, 7448606, 22, 19, 1); 	--Range: (0.0, 5e4)
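
The two checks described above (loss within range, loss decreasing) can be sketched as small shell helpers. The format in which Bismarck prints its loss values is not specified on this page, so these hypothetical functions take plain numbers that you copy from the output; in_range and is_decreasing are illustrative names only.

```shell
# Succeed if $1 lies strictly between $2 and $3.
# awk handles floating point and scientific notation such as 2.9e5.
in_range() {
    awk -v x="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !((x + 0 > lo + 0) && (x + 0 < hi + 0)) }'
}

# Succeed if the numbers on stdin (one per line) are strictly decreasing.
is_decreasing() {
    awk 'NR > 1 && ($1 + 0 >= prev) { bad = 1 }
         { prev = $1 + 0 }
         END { exit bad + 0 }'
}
```

For example, in_range 3.1e5 2.9e5 4e5 succeeds for a dense_logit loss of 3.1e5, and a list of per-epoch losses can be piped into is_decreasing.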

The same checks can be done using the Python-based front end, using the sample spec files provided in the bin folder:

cd $PGHOME/bin