Installing Columbus

This page describes how to set up Columbus and run a simple example. It assumes that you have basic familiarity with the Linux operating system.

This documentation is written for Red Hat or Ubuntu Linux running the bash shell. You might need to make minor modifications to the commands for your environment. You do NOT need root access on your working machine to set up and run Columbus.

Source code (.tar.gz or .zip) is available on the download page.

1. Dependencies

You need to install the following dependency packages in order to run Columbus. The source code and examples in the Columbus release are compatible with the versions in parentheses.

2. Set up the Database

  • Once Oracle is installed, use the following command to start up the database from the sqlplus command line.


[Screenshot: Oracle startup screen]

If the Oracle instance is successfully started, it will display a success message as above.
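
For scripted setups, the startup step can be sketched in shell as below. This is a minimal sketch, not the exact command from the original screenshot: it assumes the account running it may connect with operating-system authentication (`/ as sysdba`), and the function name `start_oracle` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: start the Oracle instance from a shell script.
# Assumption: the current OS account may connect with "/ as sysdba".

start_oracle() {
    # Fail early if sqlplus is not on PATH.
    if ! command -v sqlplus >/dev/null 2>&1; then
        echo "sqlplus not found on PATH" >&2
        return 1
    fi
    # -S suppresses the sqlplus banner; STARTUP starts the instance
    # and opens the database.
    sqlplus -S / as sysdba <<'SQL'
STARTUP
EXIT
SQL
}

# Usage: start_oracle
```

The function only defines the step; call `start_oracle` once the Oracle environment variables are in place.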

  • Create a user, for example columbus, with the password set to columbus. Grant all privileges to the created user.
  • From the root of the source code, change to the scripts directory and execute the following command from sqlplus:

[Screenshot: sqlconfig command]

This SQL file creates the types and objects required by the Columbus system. If the execution was successful, the output should look like the following:




[Screenshot: SQL config execution output]
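
The user-creation step earlier in this section can be sketched as follows. This is a hedged illustration: the user name and password `columbus` come from the example in the text, while the helper `write_create_user_sql` and the file name `create_user.sql` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: generate the SQL that creates the Columbus database user.
# The name and password "columbus" follow the example in the text;
# the file name create_user.sql is hypothetical.

write_create_user_sql() {
    # $1 = user name, $2 = password, $3 = output file
    cat > "$3" <<SQL
CREATE USER $1 IDENTIFIED BY $2;
GRANT ALL PRIVILEGES TO $1;
EXIT
SQL
}

write_create_user_sql columbus columbus create_user.sql

# Run it from a privileged account, for example (assumes OS authentication):
#   sqlplus -S / as sysdba @create_user.sql
```

Generating the file first makes it easy to review the statements before running them against the database.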

  • From the scripts directory, load the user-defined functions into the database using the loadscripts.sh file. Note that the file has to be modified to use the user name and password created in the previous steps. After the scripts load successfully, you should see the following output:

[Screenshot: load script output]

  • Now create the tables and initial values required by the Columbus system by executing createColumbusTables.sql in the scripts directory from the sqlplus command prompt. The output of the script should look like the following:



[Screenshot: createColumbusTables output]

If all the steps complete without any fatal errors, the database is set up to run the Columbus system. Congratulations.

If any of the above steps did not succeed, you can safely re-run it after resolving the issue that caused the interruption.

3. Environment

  • The following environment variables are specific to Oracle and must be set correctly for Columbus to connect to the database instance. Their values depend on the path and SID of your Oracle installation. An example set of environment variables is given below.
export ORACLE_HOSTNAME=localhost.localdomain
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/11.2.0/dbhome_1
export ORACLE_SID=AL32UTF8
export NLS_LANG=.AL32UTF8
export ORACLE_UNQNAME=AL32UTF8
export PATH=$PATH:$ORACLE_HOME/bin
  • Set the environment variable COLUMBUS_HOME to point to the base of the source code directory. For example:
 export COLUMBUS_HOME=/home/oracle/columbus
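
Before moving on, it can help to confirm that the variables are actually exported in the current shell. The following is a minimal sketch of such a check; the helper name `missing_vars` is hypothetical and not part of Columbus.

```shell
#!/usr/bin/env bash
# Sketch: report which of the variables named in this section are not exported.

missing_vars() {
    result=""
    for name in "$@"; do
        # printenv prints nothing for a variable that is not exported.
        if [ -z "$(printenv "$name")" ]; then
            result="$result $name"
        fi
    done
    # Print the collected names without the leading space.
    echo "${result# }"
}

missing=$(missing_vars ORACLE_BASE ORACLE_HOME ORACLE_SID NLS_LANG COLUMBUS_HOME)
if [ -n "$missing" ]; then
    echo "Not set: $missing"
else
    echo "All variables are set."
fi
```

The check treats an empty value the same as an unset one, since neither is usable as a path or SID.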


5. Load Test Data

Sample data files are located in the data folder. The schema for the data files is created by executing the corresponding create files in the scripts directory. Run these files from the sqlplus command prompt in the usual way.

Once the schema is created, load the data by running the control file with the sqlldr command from the scripts directory. Note that sqlldr must be invoked with the user name and password created earlier. An example is illustrated below.


[Screenshot: loading data]
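
As a rough illustration of the load step, the sketch below assembles a sqlldr command line. The credentials `columbus`/`columbus` follow the earlier example; the control file name `example.ctl` and the helper `build_sqlldr_cmd` are hypothetical placeholders, so substitute the control files shipped in the scripts directory.

```shell
#!/usr/bin/env bash
# Sketch: build a sqlldr command line for one control file.
# Assumptions: the credentials columbus/columbus from the earlier example
# and a hypothetical control file name, example.ctl.

build_sqlldr_cmd() {
    # $1 = user name, $2 = password, $3 = control file
    base="${3%.ctl}"
    echo "sqlldr userid=$1/$2 control=$3 log=$base.log"
}

cmd=$(build_sqlldr_cmd columbus columbus example.ctl)
echo "$cmd"

# Run the command only when sqlldr is installed and the control file exists.
if command -v sqlldr >/dev/null 2>&1 && [ -f example.ctl ]; then
    $cmd
fi
```

Writing the log next to the control file keeps the load report for each data file easy to find.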

Once the data is loaded, the system is ready to use.

6. Run Columbus

Detailed usage information for each operation is given on the Using Columbus page; here we present a simple invocation to test the Columbus installation.

Before executing, we have to initialize some parameters for the Columbus system. In the DataScientist directory, the config.R file contains the basic configuration information. The variable config is a list that holds the values of the configuration parameters, which are explained below.

  • repository: directory where temporary R data files are stored
  • user: user name to access the Oracle database
  • pass: password to access the Oracle database
  • read.intercept: intercept value for the read cost
  • write.intercept: intercept value for the write cost
  • read.cost: cost to read a value from a column in a tuple
  • write.cost: cost to write a value to a column in a tuple
From the R command prompt, change the working directory to DataScientist and execute the following:
source('testprogram.R')

The execution should complete without any errors. The test program executes Columbus operations in an interactive manner, followed by batch execution.
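
If you prefer to launch the test from the shell instead of an R prompt, a sketch like the one below may work. It assumes that the DataScientist directory sits directly under $COLUMBUS_HOME and that testprogram.R does not require keyboard input, neither of which the text confirms; the function name `run_columbus_test` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run the test program without opening an interactive R prompt.
# Assumption: DataScientist sits directly under $COLUMBUS_HOME.

run_columbus_test() {
    if [ -z "${COLUMBUS_HOME:-}" ]; then
        echo "COLUMBUS_HOME is not set" >&2
        return 1
    fi
    dir="$COLUMBUS_HOME/DataScientist"
    if [ ! -f "$dir/testprogram.R" ]; then
        echo "testprogram.R not found in $dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$dir" && Rscript -e "source('testprogram.R')" )
}

# Usage: run_columbus_test
```

If the program turns out to need input from the user, run it from the R prompt as described above.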

After successful execution, the following message should appear on the screen:

[Screenshot: loading data]