Midnight Oil

Wednesday, March 9, 2011

Computing function call binding

A lot of times, in Python, you write decorators that handle any type/number of arguments by using *a, **kw. Inside you usually call the wrapped function/method by passing the arguments as you got them by using *a, **kw again. You can do tracing, caching, retries and similar stuff very easily this way.
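As a minimal sketch of that pattern, here is a retry decorator that forwards whatever it receives. The names `retry` and `flaky` are made up for illustration, it assumes `times` is at least 1, and it is written for Python 3:

```python
import functools

def retry(times):
    """Retry the wrapped callable up to `times` times on any exception."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*a, **kw):
            last_error = None
            for _ in range(times):
                try:
                    # forward the arguments exactly as they were received
                    return f(*a, **kw)
                except Exception as exc:
                    last_error = exc
            raise last_error
        return wrapper
    return decorator

calls = []

@retry(times=3)
def flaky(x, scale=1):
    # hypothetical function that fails on its first two calls
    calls.append(x)
    if len(calls) < 3:
        raise ValueError("not yet")
    return x * scale

result = flaky(2, scale=5)
```

The wrapper never needs to know what `x` or `scale` mean, which is exactly what makes this style convenient and also what makes it blind to individual parameters.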

However, sometimes you want to write something generic like that but also know which values were passed in to the wrapped method.

For example, I have some sort of hook framework for testing my code, where a user can register a hook that will throw an exception upon entering or exiting some method, or make it wait until it's released before calling the method (this one is good for testing and reproducing race conditions).

This is all good, but suppose I want to register a hook so it will only apply if the method is called with a certain value for parameter X. Or I want to be able to intercept and change the value that's passed into parameter X. The problem is that I need a way to understand what value will be assigned to parameter X and that value can come from positional arguments (*a), keyword arguments (**kw), or a default value.

The code below helps with this - it gets a callable, and values passed as *a, **kw, and returns a dictionary of values for each of the method's arguments.
For example, if the function was

  def foo(a,b,c=3):

and I called it with

  foo(1,b=2)

then the binding computation will return

  { 'a':1, 'b':2, 'c':3 }

Here's the code:


import inspect
# note: `assertions` is a helper module that is not shown here

def compute_call_binding(f,a,kw):
   binding = {} # varname -> value
   spec = inspect.getargspec(f)
  
   # if it's a bound method, then getargspec returns self, even though client shouldn't supply it
   # so we need to identify that case and pretend 'self' was given explicitly in our positional arguments
   if getattr(f,'im_self',None) is not None: # XXX - better way to identify bound methods?
       a = [f.im_self] + list(a)

   # assign positional args
   for varname,val in zip(spec.args,a):
       binding[varname] = val
      
   # find any extra positional values
   extra_a = a[len(spec.args):]
      
   # handle keyword args and find any extra ones
   extra_kw = {} # will collect name->value for assignments to variables that weren't in the spec.args
   for varname,val in kw.iteritems():
       assertions.fail_if(varname in binding, "Got duplicate value for argument", varname=varname)
       if varname in spec.args:
           binding[varname] = val
       else:
           extra_kw[varname] = val

   # assign varargs and keyword variables if they appear in the argspec
   if spec.varargs is None:
       assertions.fail_if(extra_a, "Got extra positional arguments", argspec=spec, args=a, kwargs=kw)
   else:
       binding[spec.varargs] = extra_a           
   if spec.keywords is None:
       assertions.fail_if(extra_kw, "Got extra named arguments", argspec=spec, args=a, kwargs=kw)
   else:
       binding[spec.keywords] = extra_kw
      
   # assign defaults to vars in spec.args that have them and weren't assigned till now
   if spec.defaults is not None:
       for varname,val in zip(spec.args[-len(spec.defaults):],spec.defaults):
           if varname not in binding:
               binding[varname] = val

   # check all variables in spec.args have values
   for varname in spec.args:
       assertions.fail_unless(varname in binding, "No value for variable", varname=varname, argspec=spec, args=a, kwargs=kw, method=f)
      
   return binding
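
For readers on Python 3, where `getargspec` and `im_self` are gone, the standard library can do most of this work. The sketch below is not the code above, it is a substitute built on `inspect.signature`; the name `compute_call_binding_py3` is made up for illustration. One difference to be aware of: for a bound method the signature already excludes `self`, so `self` does not appear in the result.

```python
import inspect

def compute_call_binding_py3(f, a, kw):
    """Return {parameter name: value} for the call f(*a, **kw)."""
    sig = inspect.signature(f)   # for bound methods, 'self' is already excluded
    bound = sig.bind(*a, **kw)   # raises TypeError on missing, duplicate or extra args
    bound.apply_defaults()       # fills in defaults, and empty *args / **kwargs
    return dict(bound.arguments)

def foo(a, b, c=3):
    pass

binding = compute_call_binding_py3(foo, (1,), {'b': 2})
print(binding)
```

Errors surface as `TypeError` from `bind`, the same exception an actual call would raise, instead of going through a separate assertions helper.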

Saturday, April 3, 2010

Git, here we come

I just uploaded a patch to the git plugin for trac that allows existing wiki links that point to SVN changesets to keep working after the conversion.

It's the first time in a long while that I've been able to contribute code to an open source project - so I'm very happy with myself. (I have been able to open some bugs for IronPython, but that's about it)

The patch itself is trivial, but it was fun learning to work with trac plugins and in the process learning more about setuptools and some basic linux skills (I really am clueless there).

Also got our TeamCity working with git, and it was almost painless.

Just a few more kinks to figure out and we'll be able to finally switch...

Btw, this book is a very good intro to git. This video is great too.

Also, my boss found this post by Joel Spolsky about distributed version control convincing.

Saturday, November 8, 2008

Checking Type Contract from IronPython

In our system we have an IronPython server, called by a C# client.
The way we expose python to the client is by declaring a C# interface, and implementing it in python.

The layer that implements that interface is actually a facade, so there's very little logic there, and I got away without writing tests for it for a long time... However, I just needed to modify it, so decided to do the right thing :-)

Most of the testing I did until now used fake objects, so the first step was to get a mock library. I downloaded Michael Foord's Mock library which looks really nice and does everything I need at this point.

One thing I wanted to test is that my class implements the C# interface correctly in terms of methods' signatures. Otherwise IronPython throws an exception when it tries to map the arguments to the python method, or when it tries to convert the return value. However, my tests are in python, and calls to the object from python don't go through the type checking/conversion parts.

To overcome this, I call the methods using .NET reflection, which mimics a C# call more accurately and goes through the type checking code. Here's the utility class that helps streamline this process:


import clr
from System.Reflection import TargetInvocationException

class CallThroughCSharpInterface(DynamicProxy):
  def __init__(self, obj):
      DynamicProxy.__init__(self)
      self._obj = obj
      self._cs_type = clr.GetClrType(type(obj))

  def _dispatch(self,name,*a):
      mth = self._cs_type.GetMethod(name)
      try:
          return mth.Invoke(self._obj,a)
      except TargetInvocationException, exc:
          raise exc.InnerException

Now I can do something like:


py_server = MyServer()
server = CallThroughCSharpInterface(py_server)
server.foo(x,y,z)

And the call to foo is now checked for type problems.

The code above reuses a DynamicProxy class I already had. Here it is for completeness:


class DynamicProxy(object):
  def __getattr__(self,name):
      return DynamicProxy.CalledNameHelper(self,name)
   
  class CalledNameHelper(object):
      def __init__(self,proxy,method_name):
          self.proxy = proxy
          self.method_name = method_name
      def __call__(self,*a,**kw):
          return self.proxy._dispatch(self.method_name,*a,**kw)

This only handles the basic cases - only method calls (no fields / properties) and the interface doesn't contain any overloaded methods.
That's all I needed, so I could keep it simple :-)
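
The same proxy shape is handy outside of .NET too. Here is a small sketch in plain Python (written for Python 3) where a subclass records each call before forwarding it. `RecordingProxy` and `Greeter` are made-up names, and `DynamicProxy` is repeated only so the snippet runs on its own:

```python
class DynamicProxy(object):
    """Turns any attribute access into a callable that calls _dispatch."""
    def __getattr__(self, name):
        # only called for attributes that are not found normally
        return DynamicProxy.CalledNameHelper(self, name)

    class CalledNameHelper(object):
        def __init__(self, proxy, method_name):
            self.proxy = proxy
            self.method_name = method_name

        def __call__(self, *a, **kw):
            return self.proxy._dispatch(self.method_name, *a, **kw)

class RecordingProxy(DynamicProxy):
    """Hypothetical subclass: logs each call, then forwards it to the target."""
    def __init__(self, target):
        self._target = target
        self.calls = []

    def _dispatch(self, name, *a, **kw):
        self.calls.append((name, a, kw))
        return getattr(self._target, name)(*a, **kw)

class Greeter(object):
    def greet(self, who, punctuation="!"):
        return "hello " + who + punctuation

proxy = RecordingProxy(Greeter())
greeting = proxy.greet("world", punctuation="?")
```

All the per-use logic lives in `_dispatch`, which is why swapping "record and forward" for "invoke through reflection" takes only a few lines.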

Thursday, September 25, 2008

surprises while porting to IPy 2

It's amazing how many assumptions your code can gather when you're not looking...

I'm now in the middle of porting my code from IronPython 1.1.1 to version 2.0 (beta 5). So far it's been more work than I expected and less work than it would have been without the excellent support from the IPy team and the community.

Since the hosting API has changed completely, I expected most of the work to be around getting the C# hosting code working again. I then expected some bugs in the new version since it's still beta, but not too many, since it's already beta number 5 (next is supposed to be RC1).

So that's basically what happened, except there was one more thing - I expected my code itself to be basically correct.

What surprised me was the last phase where my unit tests stopped failing because of hosting problems, bugs in IPy2 or incompatibilities between the versions, and started failing because of bugs in the original code that surfaced because the "environment" had changed in subtle ways.

Here's one example. Have a look at the following code:
print list(set([1,0,3,2,4]))

The output is consistent in both versions, but different:
IPy 1.1.1: [1, 0, 3, 2, 4] # same as the input sequence's order
IPy 2.0b5: [0, 1, 2, 3, 4] # ordered

now, sets and dictionaries don't promise anything about order, so I didn't assume anything. I'll rephrase - I didn't think I assumed anything...

One place that was affected by this is our control decision logic. That module looks at what the current state is, compares it to how it wants things to look and performs actions to bring it closer to the desired state. It contains many smaller "checks" that look at specific parts of the state and are in charge of triggering specific actions.
Many checks are independent of each other, so it doesn't matter which one runs first. I basically keep them in a dictionary and iterate over it to perform all checks. Sometimes checks or actions do depend on one another, so I need to add code to synchronize them (e.g. don't perform this action until that other action is finished).

well, you can guess what happened - the change to IPy 2.0 ran my checks in a different order and exposed some hidden "race conditions".

There are other examples, but this post is already longer than I intended. I'll just say that I learned the difference between __builtin__ and __builtins__.

My conclusions from this are:

If it's not tested it doesn't work. And when you change the environment under which your code runs you need to retest.

The unit tests paid off again, since they allowed me to find many problems in places I didn't expect.

I need to do more to flush out these hidden assumptions. One way is to add a randomized test that runs over longer periods and plays with some variables. I could easily randomize the order in which I go over the checks, inject random errors (the system is supposed to be self healing) or delays in some strategic points, etc. The not-so-easy part is verifying the system behaved correctly, and being able to reproduce problems once they surface.
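
A rough sketch of the "randomize the order of the checks" idea, in Python 3. The check functions and the `state` dictionary are made up for illustration and only stand in for the real checks. The point is the seeded generator, which makes a failing order replayable:

```python
import random

def run_checks_shuffled(checks, state, seed):
    """Run every check once, in an order determined only by `seed`."""
    rng = random.Random(seed)   # private generator: reproducible per seed
    names = sorted(checks)      # start from a stable order, then shuffle
    rng.shuffle(names)
    for name in names:
        checks[name](state)
    return names

# hypothetical checks that just record that they ran
def check_disk(state):
    state["log"].append("disk")

def check_network(state):
    state["log"].append("network")

def check_power(state):
    state["log"].append("power")

checks = {"disk": check_disk, "network": check_network, "power": check_power}

for seed in range(5):
    state = {"log": []}
    order = run_checks_shuffled(checks, state, seed)
    # on a failure, report the seed so the same order can be replayed
    assert state["log"] == order, "seed %d produced an inconsistent run" % seed
```

Logging the seed with every run covers the "being able to reproduce problems" part. Verifying that the system behaved correctly is still the hard part and is not addressed here.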

Would be really good if I had code coverage tools. Don't think there's anything available for IPy :-(

This happens every time I port non-trivial code. Need to stop being surprised :-)

Saturday, July 19, 2008

easier python evaluation from C#

I really like IronPython.

My part of our project is almost completely python, and it both uses and is used by C#.
Using C# from python is almost completely natural. On rare occasions you need to manually resolve a tricky method overload or something of the sort, but generally it's really a joy.

Allowing C# code to call the python code is not so much fun. Somewhere down the road it's supposed to be much better but for now my experience has been that you can do things, but in a lot of cases it's somewhat of a pain. I'm working with IPy 1.x, but from what I gather on the mailing list IPy 2.0 hosting might be more abstract at the expense of simplicity.

Anyway, I just came up with a simple extension method for PythonEngine which I think makes things a bit nicer in many common cases. For example, you can write:

object x = engine.Eval("[2*i for i in range({0})]",5) // returns [0,2,4,6,8]
object y = engine.Eval("{0}[-1]",x) // returns 8
engine.Eval("{0}.foo(name={1})",pyobj,s) // equivalent to pyobj.foo(name=s)

Here's the code (and here it is again in non-mangled form):

public static object Eval(this PythonEngine engine, string expression_format, params object[] args)
{
    object[] arg_names = new object[args.Length];
    Dictionary<string, object> locals = new Dictionary<string, object>();
    for (int i = 0; i < args.Length; ++i)
    {
        object arg = args[i];
        string arg_name = "_cs_arg_" + i; // names should be unique enough not to hide other python names
        locals[arg_name] = arg;
        arg_names[i] = arg_name;
    }

    string expression;
    if (args.Length > 0)
    {
        expression = string.Format(expression_format, arg_names);
    }
    else
    {
        expression = expression_format;
    }

    return engine.Evaluate(expression, engine.DefaultModule, locals);
}
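
The trick is independent of the hosting API: each argument gets a generated name, the names are substituted into the format string, and the objects themselves are handed to the evaluator under those names. Here is the same idea as a sketch in plain Python 3, using the built-in `eval` instead of PythonEngine. The function name `eval_with_args` is made up for illustration:

```python
def eval_with_args(expression_format, *args):
    """Evaluate a Python expression, substituting objects for {0}, {1}, ..."""
    namespace = {}
    arg_names = []
    for i, arg in enumerate(args):
        arg_name = "_cs_arg_%d" % i   # unlikely to clash with other names
        namespace[arg_name] = arg
        arg_names.append(arg_name)

    if args:
        expression = expression_format.format(*arg_names)
    else:
        # no formatting, so literal braces (e.g. dict literals) are left alone
        expression = expression_format

    # the objects are passed by reference, not through their repr()
    return eval(expression, namespace)

x = eval_with_args("[2*i for i in range({0})]", 5)
y = eval_with_args("{0}[-1]", x)
z = eval_with_args("{0}.upper() + {1}", "abc", "!")
```

Because only the generated names are pasted into the expression text, arbitrary objects can be passed in without worrying about how they would print.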

parting thoughts:
* I need to learn how to format code in these blogs
* I need to learn about Curt Hagenlocher's Coils.
* I need to learn about how hosting looks in IPy 2.0
* I really need to get some sleep :-(

Friday, June 27, 2008

client side soap timeout

I was getting timeouts working with VMware API. Most of their commands that can take a long time have an async interface, but some don't, and I was getting frequent timeouts for one of the new commands I started using.

Took me a while to find out the timeout was actually coming from my .net soap proxy and not from VMware at all.

Seems .net proxies have a client side timeout which is set to 1:40 minutes by default (100,000 msecs). Changing it is just a matter of setting the Timeout property on the proxy.

I still wish VMware would provide an async API for all their long commands, but now at least the problem shifts from avoiding failures to making things run concurrently.

just in case I'm not the last person to have to figure this out...

Monday, June 9, 2008

Making VMware API friendlier

I've had very little time in the last week, and am expecting it to stay that way for the next few weeks :-(. Still, wanted to get this one out.

For the last year I've been working with VMware's ESX and VirtualCenter servers. The core product is great. The API for using it is, well, not as great...

For example, here's what you need to do if you have a managed object reference (a handle) to a virtual machine and want to get its name:

def get_name(vm_moref):
    # Create a filter spec with object and property specs
    fs = VimApi.PropertyFilterSpec()

    # create a property spec (describes what we want retrieved)
    ps = VimApi.PropertySpec()
    ps.type = 'VirtualMachine'
    ps.all = False
    ps.pathSet = ('name',)
    fs.propSet = (ps,)
       
    # the search starting point
    os = VimApi.ObjectSpec()
    os.obj = vm_moref
    fs.objectSet = (os,)
               
    # run the query
    # (assumes you have service object and property collector moref)
    raw_res = service.RetrieveProperties(property_collector_moref,(fs,))
       
    # translate the result
    if raw_res is None:
        return None
    propSet = raw_res[0].propSet
    if propSet is None:
        return None           
    return propSet[0].val

yep, not so great :-(

In the .net parts of their 2.0 SDK they had a code generation tool written in XSLT that provided an object oriented wrapper over the basic API that was much more usable.
Unfortunately, with version 2.5 it seems to no longer be supported.

Luckily, it's pretty easy to write an equivalent wrapper in IronPython.
Here's an example usage:

from vimwrap import ServiceWrapper
svc = ServiceWrapper(url,user,pswd)
svc.login()

f = svc.searchIndex.FindByInventoryPath('.../my_folder')
print 'Folder: %s' % f
print 'Children:'
for child in f.childEntity:
    print '\t%s: %s' % (child.name,child)

sample output (names and places changed to protect the innocent):

Folder: Folder(group-v205)
Children:
    RonnieTest: Folder(group-v13059)
    Staging: Folder(group-v12431)
    Testing: Folder(group-v12432)
    MockWin2k3: VirtualMachine(vm-12697)
    Apache: VirtualMachine(vm-12603)
    ...

Here's another example - powering on a machine, and checking task progress:

vm = svc.searchIndex.FindByInventoryPath('.../test_machine')
task = vm.PowerOnVM_Task(None)
ti = task.info # call VMware to get updated task info
print 'powering on... state=%s, progress=%s' % (ti.state,ti.progress)

The vimwrap module is about 150 lines of code. I put it on the ironpython cookbook.
The code works with API versions 2.0 and 2.5 (ESX 3.0 and 3.5 respectively) - just make sure you use the right version of the VimService2005 assembly (part of the SDK).
Let me know if you find this useful.

note of caution:
This is great for exploration and basic tasks. However, once you need to go over larger configurations, you will need to use methods like RetrieveProperties directly to get only the data you need and get all of it in one call. I suggest writing a wrapper for that API too - it's still much more complex and boilerplate-heavy than it needs to be.
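
As a starting point, such a wrapper could look roughly like this. It is only a sketch that generalizes the get_name function from the beginning of this post to several properties of one object. The name `retrieve_properties` and its parameters are made up for illustration, and it assumes each returned property carries `name` and `val` fields. The API module is passed in as `vim_api` so that nothing is imported at module level:

```python
def retrieve_properties(vim_api, service, property_collector_moref,
                        moref, type_name, paths):
    """Fetch several properties of one managed object in a single call.

    Returns a dict mapping property path -> value, or None if nothing came back.
    """
    # what we want retrieved
    ps = vim_api.PropertySpec()
    ps.type = type_name
    ps.all = False
    ps.pathSet = tuple(paths)

    # the search starting point
    obj_spec = vim_api.ObjectSpec()
    obj_spec.obj = moref

    fs = vim_api.PropertyFilterSpec()
    fs.propSet = (ps,)
    fs.objectSet = (obj_spec,)

    # run the query and translate the result
    raw_res = service.RetrieveProperties(property_collector_moref, (fs,))
    if raw_res is None:
        return None
    prop_set = raw_res[0].propSet
    if prop_set is None:
        return None
    return dict((p.name, p.val) for p in prop_set)
```

With something like this, getting the name becomes retrieve_properties(VimApi, service, property_collector_moref, vm_moref, 'VirtualMachine', ['name'])['name'], and asking for more properties in the same round trip is just a longer list.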