Python

Python-related posts
Posted by R. Tyler Ballance

Dealing with statics in Python is something that has bitten me enough times that I have become quite pedantic about them when I see them. I'm sure you're thinking, "But Dr. Tyler, Python is a dynamic language!" It is indeed, but that does not mean there aren't static variables.

The funny thing about static variables in Python, in my opinion, is that once you understand a bit about scoping and what you're dealing with, they make far more sense. Let's take this static class variable for example:

  >>> class Foo(object):
  ...     my_list = []
  ...
  >>> f = Foo()
  >>> b = Foo()

You're trying to be clever, defining your class variables with their default values outside of your __init__ function. That's understandable, unless you ever intend on mutating that variable.

  >>> f.my_list.append('O HAI')
  >>> print b.my_list
  ['O HAI']

Still feeling clever? If that's what you wanted, I bet you do, but if you wanted each instance to have its own internal list you've inadvertently introduced a bug: any time something mutates my_list, it changes for every single instance of Foo. This occurs because my_list is tied to the class object Foo and not to the instances of Foo (f or b). In effect, f.__class__.my_list and b.__class__.my_list are the same object; in fact, the __class__ objects of both instances are the same as well.

  >>> id(f.__class__)
  7680112
  >>> id(b.__class__)
  7680112
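If per-instance state is what you actually wanted, the usual remedy is to create the list inside __init__. Here is a minimal sketch of the difference; the attribute names shared and own are made up for illustration and are not from the session above:

```python
class Foo(object):
    shared = []           # class attribute: one list object for every instance

    def __init__(self):
        self.own = []     # instance attribute: a fresh list for each instance


f = Foo()
b = Foo()

f.shared.append('O HAI')  # mutates the single list stored on the class
f.own.append('O HAI')     # mutates only f's private list

print(b.shared)           # b sees the mutation made through f
print(b.own)              # b's own list is untouched
```

Note that rebinding (f.shared = [...]) rather than mutating would create a new instance attribute on f and leave the class attribute alone, which is part of why this bug can hide for so long.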


When using default/optional parameters for methods you can also run afoul of statics in Python, for example:

  >>> def somefunc(data=[]):
  ...     data.append(1)
  ...     print ('data', data)
  ...
  >>> somefunc()
  ('data', [1])
  >>> somefunc()
  ('data', [1, 1])
  >>> somefunc()
  ('data', [1, 1, 1])

This comes down to a scoping issue as well: functions and methods in Python are first-class objects. In this case, the default list for data is created once, when the function is defined, and stored in the somefunc.func_defaults tuple; every call that omits data then mutates that same list. Bad programmer!
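The common way around this is to use None as the default and build the list inside the function body. A minimal sketch follows; the function names buggy and fixed are made up for illustration, and __defaults__ is the Python 3 spelling of func_defaults:

```python
def buggy(data=[]):
    # The default list is created once, at definition time, and reused.
    data.append(1)
    return data


def fixed(data=None):
    # None is immutable, so a new list is built on every call that omits data.
    if data is None:
        data = []
    data.append(1)
    return data


buggy()
buggy()
print(buggy.__defaults__)   # the stored default list has grown with each call

fixed()
fixed()
print(fixed.__defaults__)   # the stored default is still just None
```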

It all seems simple enough, but I still consistently see these mistakes in plenty of different Python projects (both pony-affiliated, and not). When these bugs strike they're difficult to spot, frustrating to deal with ("who the hell is changing my variable!") and most importantly, easily prevented with a little understanding of how Python scoping works.

PYRAGE!

Posted by R. Tyler Ballance

My "roots" in the open source community come from the BSD side of the open source spectrum, my first major introduction being involvement with FreeBSD and OpenBSD. It is not surprising that my licensing preferences fall on the BSD (2 or 3 clause) or MIT licenses, the MIT license reading as follows:

Copyright (c) [year] [copyright holders]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

I bring the subject up because I wanted to address a brief "kerfuffle" that occurred recently on the Eventlet mailing list with the maintainer of gevent, a fork/rewrite of Eventlet. Both projects are MIT licensed which gives anybody that would like to fork the source code of either project a great deal of leeway to hack about with the code, commercialize it, etc.

Posted by R. Tyler Ballance

In my spurious free time I maintain a few Python modules (py-yajl, Cheetah, PyECC) and am semi-involved in a couple others (Django, Eventlet), only one of which properly supports Python 3. For the uninitiated, Python 3 is a backwards-incompatible progression of the Python language and the CPython implementation thereof. It has presented significant challenges for the Python community, in that supporting Python 2.x, which is in wide deployment, and Python 3.x simultaneously is difficult.

As it stands now my primary development environment is Python 2.6 on Linux/amd64, which means I get to take advantage of some of the nice things that were added to Python 3 and then back-ported to Python 2.6/2.7. Regular readers know about my undying love for Hudson, a Java-based continuous integration server, which I use to test and build all of the Python projects that I work on. While working this weekend I noticed that one of my C-based projects (py-yajl) was failing to link properly on Python 2.4 and 2.5. While it might be easy to cut off support for Python 2.4, which was first released over four years ago, there are still a number of heavy users of 2.4 (such as Slide); in fact it's still the default /usr/bin/python on Red Hat Enterprise Linux 5. What makes this C-based module special is that, thanks to Travis, it runs properly on Python 3.1 as well. Since the Python C-API has been fairly stable through the 2 series into Python 3, maintaining a C-based module that supports multiple versions of Python is fairly straightforward.

In this case, it's as easy as some simple pre-processor definitions:

  #if PY_MAJOR_VERSION >= 3
  #define IS_PYTHON3
  #endif

Which I can use further down the line to modify the handling of some of the minor internal changes for Python 3:

  #ifdef IS_PYTHON3
      result = _internal_decode((_YajlDecoder *)decoder, PyBytes_AsString(bufferstring),
                PyBytes_Size(bufferstring));
      Py_XDECREF(bufferstring);
  #else
      result = _internal_decode((_YajlDecoder *)decoder, PyString_AsString(buffer),
                PyString_Size(buffer));
  #endif

Not particularly pretty but it gets the job done, supporting all major versions of Python.

Python on Python

Writing modules in C is fun and can give you pretty good performance, but it is not something you would want to do with a large package like Django (for example). Python is the language we all know and love to work with, a much more pleasant language to work with than C. If you build packages in pure Python, those packages have a much better chance of running on top of IronPython or Jython, and the entire Python ecosystem is better for it.

A few weeks ago when I started to look deeper into the possibility of Cheetah support for Python 3, I found a process riddled with faults. First, a disclaimer: Cheetah is almost ten years old; it's one of the oldest Python projects I can think of that's still chugging along. This translates into some very old-looking code; most people who are new to the language aren't familiar with some of the ways the language has changed in the past five years, let alone ten.

The current means of supporting Python 3 with pure Python packages is as follows:

  1. Refactor the code enough such that 2to3 can process it
  2. Run 2to3 over the codebase, with the -w option to literally write the changes to the files
  3. Test your code on Python 3 (if it fails, go back to step 1)
  4. Create a source tarball, post to PyPI, continue developing in Python 2.xx

I'm hoping you spotted the same problem with this model that I did: due to the reliance on 2to3, you are now trapped into always developing against Python 2. This model will never succeed in moving people to Python 3, regardless of what amazing improvements it contains (such as the Unladen Swallow work), because you cannot develop on a day-to-day basis with Python 3; it's a magic conversion tool away.

Unlike with a C module for Python, I cannot #ifdef certain segments of code in and out, which forces me to constantly use 2to3 or fork my code and maintain two separate branches of my project, duplicating the work for every change. With Python 2 sticking around on the scene for years to come (I don't believe 2.7 will be the last release) I cannot imagine either of these workflows making sense long term.

At a fundamental level, supporting Python 3 does not make sense for anybody developing modules, particularly open source ones. Despite Python 3 being "the future", it is currently impossible to develop using Python 3 while maintaining support for Python 2, which all of us have to do. With enterprise operating systems like Red Hat or SuSE only now starting to get on board with Python 2.5 and Python 2.6, you can be certain that we're more than five years away from seeing Python 3 installed by default on any production machines.
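To illustrate why a runtime check is no substitute for #ifdef, here is a small sketch. The helper compiles and the two sample strings are made up for illustration. A Python module is parsed in full before any of it runs, so Python 2-only syntax breaks the file under Python 3 even when it sits behind a branch that never executes:

```python
def compiles(source):
    """Return True if this interpreter can compile the given source text."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# A Python 2 print statement, hidden behind a branch that never runs.
guarded_py2_code = "if False:\n    print 'hello'\n"

# The same idea written so that both Python 2 and Python 3 can parse it.
portable_code = "if False:\n    print('hello')\n"

print(compiles(guarded_py2_code))
print(compiles(portable_code))
```

Run under Python 3, the first call should report False and the second True; under Python 2 both compile. The C preprocessor removes the unwanted branch before the compiler ever sees it, which is exactly what pure Python lacks here.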

Posted by R. Tyler Ballance

Earlier this week I was checking out Pygame, pondering what I could possibly build with it that could keep me motivated enough to finish it. Motivation would likely be the primary problem for me with any amount of game programming; I'm not a gamer. I don't harbor a dislike of games, they're just not something I typically spend time playing (I do like to play "haggard late night open-source hacker" though, that's a fun one). Friday night I stumbled across an idea: ET likes to play (casual) games, so perhaps we could write a game together. Ask any engineer at EA or Ubisoft, there's nothing more romantic than working on a game.

Talking over the idea with ET on the ride home from the office, we talked about creating a typing-oriented game and started to brainstorm. The tricky aspect of a typing-oriented game is you have to walk the fine line of "educational gaming", that is to say, the game's goal is not to teach the player how to type. That sucks. In some games the means of progressing is solving a puzzle, in others it is killing noobs; in this game we wanted the player to progress through levels/situations with their typing ability (ET finds this fun, we do not have this in common).

Over pizza we discussed more about how the levels would work, and I decided that I wanted to use stories/articles instead of random words for the "content" of the game. We settled on a couple of fundamental concepts: the player would earn coins by correctly completing words as they scrolled from right to left (similar to a ticker tape), and they would lose coins if they made a mistake or could not keep up. After a player completed a level (i.e. a "story") they would find themselves in a "store" of sorts, where they could purchase "tools" for future levels with their coins.

Posted by R. Tyler Ballance

These days, the majority of my day job revolves around working with Apture's Django-based code which, depending on the situation, can be a blessing or a curse. In some of my recent work to help improve our ability to scale effectively, I started swapping out Apache for Spawning web servers, which can more efficiently handle large numbers of concurrent requests. One of the mechanisms by which Spawning accomplishes this task is eventlet's tpool (thread pool) module, in addition to some other clever tricks. With Apache, we used pre-forked workers to do the work. We still use forked child processes with Spawning, but threading was also thrown into the mix, and that's when "shit got real" (so to speak).

We started seeing sporadic, difficult to reproduce errors. Not a lot, a trickle of exception emails throughout the day. Digging deeper into some of the exceptions, careful stepping through Apture code, into Django code and back again, I started to realize I had thread-safety problems. Shock! Panic! Despair! Lunch! Disappointment! Shock! I felt all these things and more. I've long lamented the number of globals used in Django's code base but this is the icing on the cake.
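To make the problem with globals concrete, here is a minimal sketch. It is not Django's code, every name in it is made up for illustration, and it assumes Python 3 for threading.Barrier. Two threads each store "their" user in a module-level dict and in a threading.local; the barrier forces both writes to happen before either read:

```python
import threading

shared_state = {}                # module-level global, visible to every thread
per_thread = threading.local()   # each thread gets its own attribute namespace


def handle_request(user, barrier, seen_shared, seen_local):
    shared_state["user"] = user  # overwrites whatever another thread stored
    per_thread.user = user       # private to the current thread
    barrier.wait()               # every thread has written before anyone reads
    seen_shared[user] = shared_state["user"]
    seen_local[user] = per_thread.user


def run(users):
    barrier = threading.Barrier(len(users))
    seen_shared, seen_local = {}, {}
    threads = [
        threading.Thread(target=handle_request,
                         args=(user, barrier, seen_shared, seen_local))
        for user in users
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen_shared, seen_local


seen_shared, seen_local = run(["alice", "bob"])
print(seen_shared)  # both threads read back the same, last-written user
print(seen_local)   # each thread reads back its own user
```

In a real server there is no barrier, so the overwrite only bites when requests happen to interleave, which matches the "trickle" of hard-to-reproduce exceptions described above.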

Apparently Django's threading problems are sufficiently documented in a few places. Using a slightly older version of the Django framework certainly doesn't help, but it doesn't appear that recent releases (1.1.1) can guarantee thread-safety anyways. I think it's safe to assume the majority of Django framework users are not using threaded web servers in any capacity, else this would have become a far larger issue (and hopefully have been fixed) by now. From NoReverseMatch exceptions, to curious middleware problems, to thread-safety issues in the WSGI support layer, Django has potholes lying all along the road to multithreadedness.

Beware.

Posted by R. Tyler Ballance

Lately I've fallen in love with a couple of fairly simple but powerful technologies: haproxy and WSGI (web server gateway interface). While the latter is more of a specification (PEP 333), the concepts it puts forth have made my life significantly easier. Together, the two make for a powerful combination for serving web applications of all kinds and colors.
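For readers who have not seen it, the core of WSGI is small: an application is just a callable that takes the request environment and a start_response callback, and returns an iterable of body chunks. A minimal sketch, assuming the standard library's wsgiref server for local testing (the host, port and response text are made up for illustration):

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A tiny WSGI application: same plain-text reply for every request."""
    body = b"Hello from WSGI\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(host="127.0.0.1", port=8000):
    # Blocks forever; call this by hand to try the application locally.
    make_server(host, port, application).serve_forever()
```

Because any WSGI server can host this callable, several such processes can sit on different ports behind a load balancer.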

HAProxy is a robust, reliable piece of load balancing software that's very easy to get started with. For the uninitiated, load balancing is a common means of distributing the load of a number of inbound requests across a pool of processes, machines, clusters and so on. Whenever you hit any web site of non-trivial size, your HTTP requests are invariably transparently proxied through a load balancer to a pool of web machines.

I started looking into haproxy when I began to move Urlenco.de away from my franken-setup of Lighttpd/FastCGI/Mono/ASP.NET to a pure Python stack. After poking around some articles about haproxy I discovered it can be used for virtual hosts as well as simple load balancing. Using haproxy's ACL feature (see Section 7 in the configuration.txt), you can redirect requests to one backend or another.