Tutorial
Spartan is still a work in progress. You may encounter issues while running this tutorial; if this happens, please file an issue with the bug tracker.
This tutorial walks you through fetching the Spartan code, setting it up on your machine, and writing and running a simple application (linear regression).
The source distribution of Spartan requires Cython to be installed. If you do not have it already, you can install it via pip:
pip install [--user] cython
or, on Debian systems, via:
apt-get install cython
The newest version of Spartan is available via the GitHub repo; clone it to your machine using:
git clone https://github.com/rjpower/spartan.git
To install Spartan and its dependencies, use setup.py:
cd spartan
python setup.py develop --user
We're now ready to start using Spartan. To use Spartan in an application, just import it and call initialize:
import spartan as sp
sp.initialize()
By default, Spartan runs in a multi-threaded mode. This is convenient for testing, but because of the Python GIL, it won't run any faster than a single process. If you want your application to run faster, you'll have to start Spartan in cluster mode.
Spartan has built-in support for running on a cluster of machines via ssh. To run Spartan in cluster mode, we just change our call to initialize (alternatively, we can specify options via command-line flags, as shown below):
sp.initialize(['--cluster=1', '--hosts=localhost:4,foxtrot:8,bobcat:8'])
This tells Spartan to run 4 worker processes on the local machine, and 8 processes on each of foxtrot and bobcat. This assumes you have passwordless ssh access to the machines (i.e. you are using public key authentication and ssh-agent).
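If we prefer, the same options can instead be given on the command line when launching our script. A sketch (my_app.py is just a placeholder name here, and this assumes Spartan picks up command-line flags when initialize() is called without arguments, as the note above suggests):
python my_app.py --cluster=1 --hosts=localhost:4,foxtrot:8,bobcat:8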
The full list of flags understood by Spartan can be found by running:
python -c 'import spartan; spartan.initialize(["--help"])'
Specifying flags on the command line or via initialize is a pain. Instead of doing this every time, we can put any flags we want to use into a spartan.ini file in our home directory: $HOME/.config/spartan/spartan.ini. Flags will automatically be pulled in from this file (command-line options override the ini settings).
# spartan.ini
[flags]
hosts=a:8,b:8,c:8
cluster=1
An important note before we start: Spartan looks like NumPy, but it uses lazy evaluation, capturing operations into an expression graph before running them. This results in a few differences from normal NumPy code. For example, suppose we run the following code (as a script, or interactively via IPython):
# test_simple.py
import spartan as sp
sp.initialize()
x = sp.rand(10000, 10000)
y = sp.rand(10000, 10000)
z = x + y
We find that it returns much faster than we'd expect. What's happening is that the operation is being deferred, which we can see if we print z:
print z
MapExpr {
children = DictExpr {
vals = {'k3': MapExpr {
children = DictExpr {
vals = {'k1': NdArrayExpr {
_shape = (10000, 10000),
sparse = False,
dtype = <type 'float'>,
tile_hint = None,
reduce_fn = None,
expr_id = 3,
...
If we want to make sure a Spartan expression is evaluated, we can force it:
z.force()
After we do this, our console will stall for a bit while computing the result. We can inspect z using the normal slicing operators:
zslice = z[0:10, 0:10]
If we print zslice, we see that it's another expression node. We can see the actual result by calling glom():
print zslice.glom()
[[ 0.40431615 0.78758898 0.64372971 0.83738517 0.35252063 0.61085179
0.50201212 0.77996823 1.01946723 1.54100078]
[ 0.8255713 0.9784094 0.5944809 0.9151916 1.62231947 0.6985127
1.05003632 1.10276565 0.50976401 1.79484165]
[ 1.54347696 0.91283842 1.21791409 1.56077292 0.81929879 1.21397101
0.7277431 1.19146302 1.08149324 1.30490862]
[ 0.82468134 0.63385957 1.38083906 1.4475998 1.55722686 1.59542322
0.71032193 1.22207764 1.39695799 0.56424774]
[ 1.92879978 1.07464252 0.54652076 0.60779678 1.4911869 0.7863396
0.77091178 0.41473159 1.78402857 1.46132885]
[ 1.3920112 0.71718343 0.04712277 1.78117627 0.53857002 0.85893516
0.57882432 0.85399033 1.28200041 1.4449996 ]
[ 1.04510724 0.99072941 0.65680299 1.10509358 1.17346329 0.87073785
1.1710321 0.55426738 1.36207195 0.29851448]
[ 1.68384304 0.39496023 1.61920443 0.06775426 1.45594822 1.28999251
1.09191703 0.20535368 0.43640492 0.52627781]
[ 0.62870181 1.15012164 0.62304233 0.90594462 1.05958128 0.64907288
0.93111492 1.3595818 0.84221813 1.60843973]
[ 1.64512868 1.20342383 1.66162832 1.27969195 1.21537476 0.52412064
1.00017709 1.32339968 0.64233495 1.34834738]]
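Since glom() appears to hand the evaluated data back as an ordinary NumPy array (as the printed output above suggests), we can feed the result directly into regular NumPy code. A small sketch:
local = zslice.glom()   # local should be a plain 10x10 NumPy array
print local.shape
print local.mean()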
We're now ready to write a real application using Spartan; in this case, we're going to implement linear regression on a made-up dataset.
import spartan as sp
sp.initialize()
N_DIM = 10
N_EXAMPLES = 1000 * 1000
EPSILON = 1e-6
x = 100 * sp.ones((N_EXAMPLES, N_DIM)) + sp.rand(N_EXAMPLES, N_DIM)
y = sp.ones((N_EXAMPLES, 1))
# put weights on one server
w = sp.rand(N_DIM, 1)
for i in range(50):
    yp = sp.dot(x, w)                                 # predictions
    diff = x * (yp - y)                               # per-example gradient contributions
    grad = sp.sum(diff, axis=0).reshape((N_DIM, 1))   # gradient of the squared error
    w = w - (grad / N_EXAMPLES * EPSILON)             # gradient descent step
    print grad.sum().glom()                           # force evaluation and print progress
We can run this with the log level set to WARN to cut down on the amount of log output:
python lreg.py --log_level=WARN
For a simple problem like this one, with such a small amount of data, Spartan can end up being slower than NumPy. As the dataset size increases, we expect Spartan's relative performance to improve.
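For reference, here is a minimal single-machine NumPy version of the same loop that can serve as a rough timing baseline (a sketch only, kept deliberately close to the Spartan version above for comparison):
import numpy as np

N_DIM = 10
N_EXAMPLES = 1000 * 1000
EPSILON = 1e-6

x = 100 * np.ones((N_EXAMPLES, N_DIM)) + np.random.rand(N_EXAMPLES, N_DIM)
y = np.ones((N_EXAMPLES, 1))
w = np.random.rand(N_DIM, 1)

for i in range(50):
    yp = np.dot(x, w)                                          # predictions
    grad = np.sum(x * (yp - y), axis=0).reshape((N_DIM, 1))    # gradient of the squared error
    w = w - (grad / N_EXAMPLES * EPSILON)                      # gradient descent step
    print grad.sum()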