Project Victor

Victor is a middleware system that uses an RDBMS to solve a large class of statistical data analysis problems. Victor's main technical claim is that many supervised machine learning algorithms can be processed using only the features available in any commercial or open-source database. The insight is that one particular class of algorithms, incremental gradient algorithms, has data access properties that allow it to be run by any existing RDBMS.
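
To make the data access pattern concrete, here is a minimal sketch of an incremental gradient method that reads one tuple at a time from a table. It is not Victor's code or API: the table name `points`, the function `igd_linear_regression`, the least-squares model, and the step size are all assumptions chosen for illustration, and SQLite stands in for the RDBMS only because it ships with Python.

```python
import sqlite3


def igd_linear_regression(conn, epochs=500, step=0.1):
    """Fit y ~ w*x + b by incremental gradient descent (hypothetical helper)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Each epoch is one sequential scan of the table. The only state
        # carried from row to row is the small model (w, b).
        for x, y in conn.execute("SELECT x, y FROM points"):
            err = (w * x + b) - y   # residual on this single tuple
            w -= step * err * x     # gradient step for the slope
            b -= step * err         # gradient step for the intercept
    return w, b


# Toy data (assumed for the example): noise-free points on y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(11)],
)

w, b = igd_linear_regression(conn)
print(w, b)
```

The loop needs nothing from the database beyond a sequential scan, and each update touches a single row plus a constant-size model. That is the property the paragraph above refers to: the per-row update has the same shape as a user-defined aggregate, which is a feature most database systems already provide.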


Components

The Victor project contains five sub-projects. The code for each is available on its corresponding Web page.

  • Columbus provides a declarative framework of operations for feature selection over in-RDBMS data.
  • Victor-SQL integrates incremental schemes with an RDBMS via a (hopefully) easy-to-use Python interface.
  • Bismarck unifies the underlying architecture of in-RDBMS analytics using incremental gradient schemes.
  • HOGWILD! introduces a new way of parallelizing incremental gradient algorithms. Its approach is simple: get rid of locking entirely! We prove that as long as the data are sparse, HOGWILD! achieves linear speedups.
  • Jellyfish is a parallel stochastic gradient algorithm for nonconvex relaxations of large-scale matrix completion problems. It reaches the same error (RMSE) two orders of magnitude faster than any other algorithm that we know of!