PyMW: Summer-y of Code

Today is the “suggested pencils down” date for Summer of Code and I’m very happy to say that my proposal is complete and I feel the summer was a great success!

For those of you who don’t know, PyMW is a Master-Worker computing framework in Python. It wraps several other Master-Worker frameworks such as MPI, Condor, BOINC or even just using multi-core processors, and exposes them as a simple and elegant API.

The way it works is, you create tasks and submit them to a Master and the master uses an interface/wrapper (BOINC, Condor, MPI, etc) to process those tasks. The master distributes the tasks out to compute nodes (or processor cores), called Workers, and the results get sent back to the Master. This allows you to debug using the multi-core interface (a single machine) and then, by changing one switch on the command line, you can run the same code on thousands of machines using BOINC or any other supported interface.

My proposal was to improve BOINC integration with PyMW by 1) eliminating the startup script and the need to compile C code; 2) adding pure-Python support for BOINC Assimilators and Validators; and 3) by adding a new checkpointing mechanism for long running jobs (optional).

I completed (2) very quickly by virtue of some old Python code I found in BOINC, but (1) took a lot of sweat and tears — the existing BOINC interface was functional, but in need of some serious work. It was running under Mac and Linux, however the Windows client was crashing on every task. To get it working under Windows I ended up creating a C++ launcher application to avoid using batch scripts.

In addition to my proposal, I also added a few other tasks. When working with BOINC, the existing interface assumed that Python was already installed and on the system PATH environment variable. This is not a very safe assumption and so I also crated a portable Python interpreter integrated with the BOINC API. PyMW will now install this interpreter as your BOINC application so that clients no longer need to have Python installed to run PyMW-BOINC compute jobs. Along the way I also created a new logo and graphic design for the PyMW web site and setup WordPress, check it out.

Sadly goal (3) was not completed, however, it was originally proposed as an optional part of the project. I actively chose to pursue the BOINC-Python interpreter over (3) because I felt it was more important to the BOINC interface. In the end, checkpointing can always be done manually, but sending the Python interpreter is a considerably harder task.

I am still wrapping up a few odds and ends, but for the most part, I feel very happy with the state of the BOINC interface. If you are looking for a distributed or parallel processing framework for a future project, please consider PyMW.

If you have any questions or comments, I would love to hear them!

Tagged Tags: , , on August 10, 2009 at 12:45 pm

PyBOINC Work Continues

I am still working on PyBOINC (the embedded Python interpreter with support for BOINC). The actual integration (exposing the BOINC API to Python) was easy, it’s the cross-platform build that’s most difficult.

The problem is that when running an application on BOINC, you have no guarantee of what libraries will be available, so you must either distribute the libraries you need or compile them statically. I chose the later, which also caused lots of issues with the Python standard library. It’s mostly working now, with the exception of the sqlite module.

I moved the repo over to my bit bucket account since Nicolas is busy with other projects. The latest code can be found here.

Tagged Tags: , , , on July 31, 2009 at 4:11 pm

Integrating Python & BOINC

I started working on an embedded Python interpreter for BOINC with Nicolás Alvarez. The interpreter will be the main executable for Python based workunits and provides interop with the BOINC client API. It allows, for example, reporting percentage complete per workunit, which isn’t possible currently with PyMW alone.

The project will likely be incorporated into the BOINC trunk, but the current code is available on bitbucket as PyBOINC if you are interested in seeing how it works.

The Python developers made embedding the interpreter incredibly easy from C/C++ and the BOINC API is readily available from C/C++ as well, so there really isn’t much code. However, packaging the Python standard library and compiling it so it runs on multiple platforms are still going to be challenging.

Tagged Tags: , , , on July 12, 2009 at 10:29 pm

Pacman Server Running!

I got the Pacman server running today! It’s processing matches in a round-robin style, not the double elimination tournament yet, but it’s running :)

This has again exposed the BOINC file immutability issue and I can see now that there is no hacking my way around it — I am going to have to change the interface so that all executables and WorkUnits are uniquely named.

Tagged Tags: , , , on July 6, 2009 at 10:14 pm

PyMW Distributed Pacman Server

Pacman CTF
To create a real application for PyMW, I think I am going to create a distributed Pacman server.

Last semester I took an artificial intelligence class that used Pacman as a teaching tool. At the end of the semester, there was a tournament where each team could pit their Pacman AI client against each other in a game of Pacman-style capture the flag. We submitted our clients to a server and then waited 24 hours or so for the results to appear. If your client crashed, you had to wait another 24 hours to see your standings.

My idea is this:

  • Create a PyMW application that runs Pacman tournaments
  • Each job will be 3 matches between two clients
  • An animated GIF will be created for one of the 3 matches (one that agrees with the outcome)
  • The BOINC interface will be used so students can contribute compute time
  • The output of the PyMW application will be records in a MySQL database
  • Create a website for statistics

To test the tournament server, I am going to get the AI client code from last semester’s teams and then run it on the BOINC Alpha group. This should provide a solid test of the PyMW BOINC interface and my tournament server.

Tagged Tags: , , , , on June 29, 2009 at 12:17 pm

PyMW: Week one

Today is the official end of my 7th day of working on the PyMW interface for BOINC for Google Summer of Code.

It took me three days to get the my first PyMW app to run (monte_pi.py), which ran on 4 virtual nodes (4 tasks). More than four tasks was causing problems, it turned out that there was a bug in the PyMW BOINC interface. Now that that’s fixed, I ran with 200 nodes yesterday and 800 today.

This morning, I tried running with 2 physical nodes: my laptop and my Ubuntu VM, which failed at the end of computation. Somehow the canonical results are not being recognized which causes the BOINC interface to get lost in limbo and hang forever.

This week, I created a pure-Python assimilator for PyMW, which works pretty well, but is perhaps causing the error above.

I also rewrote a big swath of the BOINC interface to stop it from using a new thread for each task during task reclamation (getting data back from BOINC). Since it was using one thread per task, it was reaching the maximum number of scheduler threads. This in turn caused the execution thread to hang until some of the tasks completed. Now it reclaims tasks in a single thread and is able to queue all tasks in a single shot, greatly improving the throughput from PyMW -> BOINC.

Overall, it’s been really fun so far. The first few days were trying, since there was little documentation of how to get PyMW to play nice with BOINC. But seeing the first application run was great :)

Tagged Tags: , , , , on June 5, 2009 at 12:01 pm

Google Summer of Code!

PyMW: Python + BOINC

My proposal has been accepted by the Python Software Foundation for Google Summer of Code 2009!

I will be working on the Python Master-Worker computing project (PyMW), a Python API that provides access to various distributed and parallel computing frameworks. In particular, I will be working on the Berkeley Open Infrastructure for Network Computing (BOINC) integration. BOINC is best known as the underlying system that enabled SETI@Home and I’m really excited to peek under the hood!

My goal is to make it easier for PyMW users to create BOINC applications by simplifying the setup process as well as adding new support for some important BOINC features. This will require some changes to PyMW as well as some changes to the BOINC server. If you are interested in seeing all the details, a public copy of my proposal is posted at the GSoC website.

Tagged Tags: , , , , , on April 23, 2009 at 12:33 am

AI, Pacman & Python…

Collision Detection Sketch

I’m taking an Artificial Intelligence class this semester, and the class uses Pacman (in Python) as a pedagogical tool for learning AI. What a great idea! The projects typically start off by asking you to play a game of Pacman, and then dive into the content.

Here is one of the projects, if you are interested.

Also, if you want to use this system at your school, I think they are actively seeking other instructors to try it. Contact Dan Klein if you are interested in more information.

Tagged Tags: , , on February 13, 2009 at 12:18 pm

Six Reasons to Love Python

Reading language documentation is usually dry and offers few *good* surprises. While “Python in a Nutshell” was not exactly exciting, the language offers some features that you might gloss over if you just jumped in and started hacking. Here is a brief list of six little things that I found most interesting.

Note: If you are new to python, and want to try out some of the functions I discuss here, check out the web-based interactive python shell at shell.appspot.com (no download required).

6. Else clauses in Loops and Try…Catch

Every language has an else clause for if statements, and python is no exception, but not many languages have an else clause in loop structures. For example:


for i in range(10):
if i == 11: break
else:
print "11 was not found"

The else clause executes when the loop terminates naturally, without calling break (this also works with while loops). In the loop above, “11 was not found” is printed because i only ranges from 0 to 9 and break will never be executed. Else has similar functionality for try…catch:


try:
value + ''
except TypeError:
print "value is not a string"
else:
print "value is a string"

Like in loops, the else executes when the try clause terminates normally. In both cases the else clause is optional.

5. Sequences

In Python, a sequence is a container of ordered items. Basic sequences include lists, dictionaries (name-value pairs), tuples and strings. Sequences are very important in Python, and can be thought of as an evolution of arrays. Each sequence has it’s own declaration syntax built into the language:


aList = [1,2,3,4,5]
aTuple = (1,2,3,4,5)
aString = "12345"
aDict = {'a':1,
'b':2,
'c':3,
'd':4,
'e':5,}

Negative indexing allows you to access a sequence from any direction, for example:

print aList[1] # prints 2
print aList[-1] # prints 5

Slicing lets you select subsets from a sequence:

print aList[1:3] # prints 2,3
print aList[:3] # prints 1,2,3
print aList[0:5:2] # prints odd values 1,3,5
print aList[1:5:2] # prints even values 2,4

The syntax is sequence[start:end:stride]. Start is exclusive, and end is inclusive. So [1:5] means 2,3,4,5. Omitting start causes the slice to begin at the first item in the sequence, omitting end defaults to the last item in the sequence and omitting step defaults step to 1. By using this syntax, you can select a nice variety of numeric patterns out of a sequence without a loop.

4. Assignment

There aren’t many surprises left in variable assignment, however Python does have some nice improvements, even if not entirely original. Multiple targets:


X = Y = 1

Both X and Y are assigned the value 1, this works just like C++. It’s a nice and simple feature that I always missed in .NET.

Unpacking assignment:

# assume that coord is the tuple (0, 1)
x,y = coord

Now x = 0 and y = 1. This can be used with any sequence, not just tuples. Another type of unpacking assignment allows you to juggle two variables without using a temp variable:


x,y = y,x

# the alternative would be:
temp = x
x = y
y = temp

3. Iterators

Iterators are nothing new, but Python’s obsessive dedication to them is refreshing. In most languages, when you learn about arrays, you typically learn some form of the following code:


for i as integer = 0 to array.length
foo &= array(i)
next

or

for(int i = 0; i <= array.length; i++)
{
foo += array[i]
}

Which leads to lots of off-by-one errors (did you see the bug when you read the code above?). The "pythonic" way to loop through a sequence is a paradigm shift from the old standard: instead of using a index to access an array, iterators are preferred whenever possible.


for item in array:
foo += item

Of course, iterators are available in other languages, and while this shift seems simple, you feel it when writing code. The fact that Python wants you to use iterators *by default* is an important distinction. In fact, there is no indexed loop in the language. This type of loop is accomplished by iterating over the range (or xrange) function:


for i in range(len(array)):
foo += array[i]

But notice here, that range defaults to exclude the final item (Range(5) returns [0,1,2,3,4]), so it is still harder to make off-by-one errors even when using an index.

2. List Comprehensions

List comprehensions (LCs) are available in many languages, including .NET in the form of LINQ. The builtin support for sequences in Python make LCs even more powerful. The syntax for an LC is


[ expression for target in iterable lc-clauses]

The square brackets here are part of the syntax. Here are a few examples:


aList = [1,2,3,4,5,6,7,8,9]

# basic syntax example, not doing much
# returns a list of all items in aList
[x for x in aList]

# returns a list of all items incremented by 1
[x+1 for x in aList]

# this uses an LC if clause
# returns a list of items > 8
[x for x in aList if x > 8]

# strings are also sequences
# ''.join() converts a list into a string
# returns "str", letters greater than n in aphabet
''.join([ch for ch in "astring" if ch > 'n'])

# slicing a string with a step of 2
# returns "SECRET"
''.join([ch.upper() for ch in "asbeccdreeftg"[1::2] ])

This gets even more interesting when using lists of objects. Of course, this functionality could be represented as a loop with an if statement and a temp variable, but this syntax is very nice and compact.

1. Significant Whitespace

Whitespace indentation defines variable scope. For example:

for x in seq:
# loop body is indented
print x

# no longer in loop body because
# indentation is at a different level

The reason I like the idea of significant whitespace is because it:

1) Forces developers to use standard indentation, making the code more readable.

2) Eliminates the need for "ok, I'm done" signals like "End Function" or "}". These are extraneous keystrokes that don't help the developer, they help the compiler and lower the overall efficiency of the language.

Significant whitespace also has a down side: mixing tabs and spaces confuses the interpreter and will bring you to the brink of insanity. But using a good tool (like Eclipse) eliminates this issue.

Conclusion

Python is not a revolutionary language. In fact, most of its features have been borrowed from other languages. The thing that sets python apart is the way these features are applied and the dedication by the Python developers to make common tasks in the language as clear and concise as possible.

Tagged Tags: on May 27, 2008 at 2:08 pm

Matrix Lib Update

I have updated the Matrix Library to make it more useful. Mainly, I’ve removed the 2×2 limitations from calculating determinants and finding inverses. I also added several new functions:

isSize: Checks the size of a matrix by rows and columns
newMatrix: Creates a new matrix of the specified size
transpose: Transposes the given matrix (rows -> cols)
augment: Augments A with matrix B -> AB
multRow: Multiplies the given row by constant k, k != 0
multRowAdd: Multiplies the source row by constant k, k != 0
then adds that row to the dest row
swapRow: Swaps row i with row j
ref: Uses Gauss elimination to reduce to row-echelon
format
rref: Uses Gauss-Jordan elimination to reduce to
reduced-row-echelon format
swapCol: Swaps column i for column j
delCol: Deletes the specified column
delRow: Deletes the specified row
minor: Returns the row minor, Mij

The most important additions were the elementary row operations, multRow, swapRow and multRowAdd – these were used to implement the row-echelon reduction methods. det(), ref() and rref() are the most *interesting* additions, although, I think the algorithms are not very efficient.

Tagged Tags: , on January 31, 2008 at 11:57 am