Midnight Oil: 2008

Saturday, November 8, 2008

Checking Type Contract from IronPython

In our system we have an IronPython server, called by a C# client.
The way we expose python to the client is by declaring a C# interface, and implementing it in python.

The layer that implements that interface is actually a facade, so there's very little logic there, and I got away without writing tests for it for a long time... However, I just needed to modify it, so decided to do the right thing :-)

Most of the testing I did until now used fake objects, so the first step was to get a mock library. I downloaded Michael Foord's Mock library which looks really nice and does everything I need at this point.

One thing I wanted to test is that my class implements the C# interface correctly in terms of methods' signatures. Otherwise IronPython throws an exception when it tries to map the arguments to the python method, or when it tries to convert the return value. However, my tests are in python, and calls to the object from python don't go through the type checking/conversion parts.

To overcome this, I call the methods using .NET reflection, which mimics a C# call more accurately and goes through the type checking code. Here's the utility class that helps streamline this process:


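import clr
from System.Reflection import TargetInvocationException

class CallThroughCSharpInterface(DynamicProxy):
  def __init__(self, obj):
      DynamicProxy.__init__(self)
      self._obj = obj
      # the .NET type behind the python object, used for reflection
      self._cs_type = clr.GetClrType(type(obj))

  def _dispatch(self,name,*a):
      # look the method up by name and invoke it the way a C# caller would
      mth = self._cs_type.GetMethod(name)
      try:
          return mth.Invoke(self._obj,a)
      except TargetInvocationException, exc:
          # reflection wraps the real error, so re-raise the original one
          raise exc.InnerException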

Now I can do something like:


py_server = MyServer()
server = CallThroughCSharpInterface(py_server)
server.foo(x,y,z)

And the call to foo is now checked for type problems.

The code above reuses a DynamicProxy class I already had. Here it is for completeness:


class DynamicProxy(object):
  def __getattr__(self,name):
      return DynamicProxy.CalledNameHelper(self,name)

  class CalledNameHelper(object):
      def __init__(self,proxy,method_name):
          self.proxy = proxy
          self.method_name = method_name
      def __call__(self,*a,**kw):
          return self.proxy._dispatch(self.method_name,*a,**kw)

This only handles the basic cases - only method calls (no fields / properties) and the interface doesn't contain any overloaded methods.
That's all I needed, so I could keep it simple :-)
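
To show how the two pieces fit together, here is a minimal sketch of a different `_dispatch` implementation, one that records each call before forwarding it. `RecordingProxy` and `Greeter` are hypothetical names made up for this illustration, and DynamicProxy is repeated only so the block runs on its own in plain Python:

```python
class DynamicProxy(object):
    # Any attribute that is not found normally is treated as a method name.
    def __getattr__(self, name):
        return DynamicProxy.CalledNameHelper(self, name)

    class CalledNameHelper(object):
        def __init__(self, proxy, method_name):
            self.proxy = proxy
            self.method_name = method_name

        def __call__(self, *a, **kw):
            return self.proxy._dispatch(self.method_name, *a, **kw)


class RecordingProxy(DynamicProxy):
    """Hypothetical subclass: remembers every call, then forwards it."""
    def __init__(self, obj):
        DynamicProxy.__init__(self)
        self._obj = obj
        self.calls = []

    def _dispatch(self, name, *a, **kw):
        self.calls.append((name, a, kw))
        return getattr(self._obj, name)(*a, **kw)


class Greeter(object):
    """Hypothetical target object."""
    def greet(self, name, punct="!"):
        return "hi " + name + punct


proxy = RecordingProxy(Greeter())
result = proxy.greet("bob", punct="?")
```

Since `__getattr__` is only consulted when normal lookup fails, attributes set in `__init__` (like `_obj` and `calls`) never go through the proxy machinery.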

Thursday, September 25, 2008

surprises while porting to IPy 2

It's amazing how many assumptions your code can gather when you're not looking...

I'm now in the middle of porting my code from IronPython 1.1.1 to version 2.0 (beta 5). So far it's been more work than I expected and less work than it would have been without the excellent support from the IPy team and the community.

Since the hosting API has changed completely, I expected most of the work to be around getting the C# hosting code working again. I then expected some bugs in the new version since it's still beta, but not too many, since it's already beta number 5 (next is supposed to be RC1).

So that's basically what happened, except there was one more thing - I expected my code itself to be basically correct.

What surprised me was the last phase where my unit tests stopped failing because of hosting problems, bugs in IPy2 or incompatibilities between the versions, and started failing because of bugs in the original code that surfaced because the "environment" had changed in subtle ways.

Here's one example. Have a look at the following code:
print list(set([1,0,3,2,4]))

Each version gives the same output every time, but the two versions differ from each other:
IPy 1.1.1: [1, 0, 3, 2, 4] # same as the input sequence's order
IPy 2.0b5: [0, 1, 2, 3, 4] # ordered

Now, sets and dictionaries don't promise anything about order, so I didn't assume anything. I'll rephrase - I didn't think I assumed anything...

One place that was affected by this is our control decision logic. That module looks at what the current state is, compares it to how it wants things to look and performs actions to bring it closer to the desired state. It contains many smaller "checks" that look at specific parts of the state and are in charge of affecting specific actions.
Many checks are independent of each other, so it doesn't matter which one runs first. I basically keep them in a dictionary and iterate over it to perform all checks. Sometimes checks or actions do depend on one another, so I need to add code to synchronize them (e.g. don't perform this action until that other action is finished)

Well, you can guess what happened - the change to IPy 2.0 ran my checks in a different order and exposed some hidden "race conditions".

There are other examples, but this post is already longer than I intended. I'll just say that I learned the difference between __builtin__ and __builtins__.

My conclusions from this are:

If it's not tested it doesn't work. And when you change the environment under which your code runs you need to retest.

The unit tests paid off again, since they allowed me to find many problems in places I didn't expect.

I need to do more to flush out these hidden assumptions. One way is to add a randomized test that runs over longer periods and plays with some variables. I could easily randomize the order in which I go over the checks, inject random errors (the system is supposed to be self healing) or delays in some strategic points, etc. The not-so-easy part is verifying the system behaved correctly, and being able to reproduce problems once they surface.
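
A minimal sketch of the "randomize the order of the checks" idea, assuming the checks live in a dictionary of name -> callable as described above. `run_checks`, the `seed` parameter and the two sample checks are hypothetical names, not taken from the real system. The point is that the seed alone determines the order, so a failing run can be replayed:

```python
import random


def run_checks(checks, state, seed):
    """Run every check once, in an order decided only by `seed`."""
    rng = random.Random(seed)
    # Start from a fixed order so the dictionary's own ordering plays no part.
    names = sorted(checks)
    rng.shuffle(names)
    for name in names:
        checks[name](state)
    return names  # log this (or just the seed) to reproduce a failure


# Hypothetical checks that only note that they ran.
def check_disk(state):
    state.setdefault("ran", []).append("disk")


def check_network(state):
    state.setdefault("ran", []).append("network")


checks = {"disk": check_disk, "network": check_network}
state = {}
order = run_checks(checks, state, seed=1234)
```

Running the same test with many different seeds covers many orderings, and any seed that exposes a problem can be reused as a regression test.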

Would be really good if I had code coverage tools. Don't think there's anything available for IPy :-(

This happens every time I port non-trivial code. Need to stop being surprised :-)

Saturday, July 19, 2008

easier python evaluation from C#

I really like IronPython.

My part of our project is almost completely python, and it both uses and is used by C#.
Using C# from python is almost completely natural. On rare occasions you need to manually resolve a tricky method overload or something of the sort, but generally it's really a joy.

Allowing C# code to call the python code is not so much fun. Somewhere down the road it's supposed to be much better, but for now my experience has been that you can do things, but in a lot of cases it's somewhat of a pain. I'm working with IPy 1.x, but from what I gather on the mailing list, IPy 2.0 hosting might be more abstract at the expense of simplicity.

Anyway, I just came up with a simple extension method for PythonEngine which I think makes things a bit nicer in many common cases. For example, you can write:

object x = engine.Eval("[2*i for i in range({0})]",5) // returns [0,2,4,6,8]
object y = engine.Eval("{0}[-1]",x) // returns 8
engine.Eval("{0}.foo(name={1})",pyobj,s) // equivalent to pyobj.foo(name=s)

Here's the code (and here it is again in non-mangled form):

public static object Eval(this PythonEngine engine, string expression_format, params object[] args)
{
    object[] arg_names = new object[args.Length];
    Dictionary<string, object> locals = new Dictionary<string, object>();
    for (int i = 0; i < args.Length; ++i)
    {
        object arg = args[i];
        string arg_name = "_cs_arg_" + i; // names should be unique enough not to hide other python names
        locals[arg_name] = arg;
        arg_names[i] = arg_name;
    }

    string expression;
    if (args.Length > 0)
    {
        expression = string.Format(expression_format, arg_names);
    }
    else
    {
        expression = expression_format;
    }

    return engine.Evaluate(expression, engine.DefaultModule, locals);
}
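
The same trick can be sketched in plain Python, which may make the mechanics easier to see: each argument gets a generated variable name, the names are substituted into the format string, and the resulting expression is evaluated with those names bound to the real objects. `eval_format` and the `_py_arg_` prefix are hypothetical names for this illustration, and this uses Python's built-in `eval` rather than the IronPython hosting API:

```python
def eval_format(expression_format, *args):
    """Evaluate a Python expression whose {0}, {1}, ... refer to `args`."""
    namespace = {}
    arg_names = []
    for i, arg in enumerate(args):
        # Names should be unique enough not to hide other Python names.
        name = "_py_arg_%d" % i
        namespace[name] = arg
        arg_names.append(name)

    if args:
        expression = expression_format.format(*arg_names)
    else:
        expression = expression_format

    # The objects are passed by reference through the namespace,
    # never converted to text, so any object can be used as an argument.
    return eval(expression, namespace)


x = eval_format("[2*i for i in range({0})]", 5)
y = eval_format("{0}[-1]", x)
```

As with the C# version, literal braces in the expression (a dict literal, say) would have to be doubled, and the expression itself must come from a trusted source since it is evaluated as code.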

parting thoughts:
* I need to learn how to format code in these blogs
* I need to learn about Curt Hagenlocher's Coils.
* I need to learn about how hosting looks in IPy 2.0
* I really need to get some sleep :-(

Friday, June 27, 2008

client side soap timeout

I was getting timeouts working with VMware API. Most of their commands that can take a long time have an async interface, but some don't, and I was getting frequent timeouts for one of the new commands I started using.

Took me a while to find out the timeout was actually coming from my .net soap proxy and not from VMware at all.

Seems .net proxies have a client side timeout which is set to 1:40 minutes by default (100,000 msecs). Changing it is just a matter of setting the Timeout property on the proxy.
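
A minimal sketch of the fix, assuming the generated proxy exposes the standard `Timeout` property in milliseconds. `set_timeout_minutes` and `FakeProxy` are hypothetical names; the fake stands in for the real proxy only so that the snippet runs without .NET:

```python
DEFAULT_TIMEOUT_MS = 100000  # the default mentioned above (1:40 minutes)


def set_timeout_minutes(proxy, minutes):
    """Set the client-side timeout on a SOAP proxy, converting to msecs."""
    proxy.Timeout = int(minutes * 60 * 1000)
    return proxy.Timeout


class FakeProxy(object):
    """Hypothetical stand-in for the generated .NET proxy object."""
    Timeout = DEFAULT_TIMEOUT_MS


proxy = FakeProxy()
new_timeout = set_timeout_minutes(proxy, 10)  # allow ten minutes per call
```

From IronPython the same assignment would be made on the real service object right after creating it, before issuing the long-running command.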

I still wish VMware would provide an async API for all their long commands, but now at least the problem shifts to making things run concurrently instead of not failing.

just in case I'm not the last person to have to figure this out...

Monday, June 9, 2008

Making VMware API friendlier

I've had very little time in the last week, and am expecting it to stay that way for the next few weeks :-(. Still, wanted to get this one out.

For the last year I've been working with VMware's ESX and VirtualCenter servers. The core product is great. The API for using it is, well, not as great...

For example, here's what you need to do if you have a managed object reference (a handle) to a virtual machine and want to get its name:

def get_name(vm_moref):
    # Create a filter spec with object and property specs
    fs = VimApi.PropertyFilterSpec()

    # create a property spec (describes what we want retrieved)
    ps = VimApi.PropertySpec()
    ps.type = 'VirtualMachine'
    ps.all = False
    ps.pathSet = ('name',)
    fs.propSet = (ps,)
       
    # the search starting point
    # (named obj_spec rather than os, so it doesn't shadow the os module)
    obj_spec = VimApi.ObjectSpec()
    obj_spec.obj = vm_moref
    fs.objectSet = (obj_spec,)
               
    # run the query
    # (assumes you have service object and property collector moref)
    raw_res = service.RetrieveProperties(property_collector_moref,(fs,))
       
    # translate the result
    if raw_res is None:
        return None
    propSet = raw_res[0].propSet
    if propSet is None:
        return None           
    return propSet[0].val

yep, not so great :-(

In the .net parts of their 2.0 SDK they had a code generation tool written in XSLT that provided an object oriented wrapper over the basic API that was much more usable.
Unfortunately, with version 2.5 it seems to no longer be supported.

Luckily, it's pretty easy to write an equivalent wrapper in IronPython.
Here's an example usage:

from vimwrap import ServiceWrapper
svc = ServiceWrapper(url,user,pswd)
svc.login()

f = svc.searchIndex.FindByInventoryPath('.../my_folder')
print 'Folder: %s' % f
print 'Children:'
for child in f.childEntity:
    print '\t%s: %s' % (child.name,child)

sample output (names and places changed to protect the innocent):

Folder: Folder(group-v205)
Children:
    RonnieTest: Folder(group-v13059)
    Staging: Folder(group-v12431)
    Testing: Folder(group-v12432)
    MockWin2k3: VirtualMachine(vm-12697)
    Apache: VirtualMachine(vm-12603)
    ...

Here's another example - powering on a machine, and checking task progress:

vm = svc.searchIndex.FindByInventoryPath('.../test_machine')
task = vm.PowerOnVM_Task(None)
ti = task.info # call VMware to get updated task info
print 'powering on... state=%s, progress=%s' % (ti.state,ti.progress)

The vimwrap module is about 150 lines of code. I put it on the ironpython cookbook.
The code works with API versions 2.0 and 2.5 (ESX 3.0 and 3.5 respectively) - just make sure you use the right version of the VimService2005 assembly (part of the SDK).
Let me know if you find this useful.

note of caution:
This is great for exploration and basic tasks. However, once you need to go over larger configurations, you will need to use methods like RetrieveProperties directly to get only the data you need and get all of it in one call. I suggest writing a wrapper for that API too - it's still much more complex and boilerplate than it needs to be.
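
A minimal sketch of what such a wrapper over RetrieveProperties could look like, generalizing get_name above to fetch several properties of one object in a single call. `get_properties` and its parameter names are hypothetical. `vim_api` stands for the VimApi namespace and is passed in explicitly so the function can be tried against fakes, and the sketch assumes each returned property carries `name` and `val` fields:

```python
def get_properties(vim_api, service, collector_moref, moref, type_name, paths):
    """Fetch the given property paths of one managed object in one call.

    Returns a dict mapping property path -> value ({} if nothing came back).
    """
    # what we want retrieved
    prop_spec = vim_api.PropertySpec()
    prop_spec.type = type_name
    prop_spec.all = False
    prop_spec.pathSet = tuple(paths)

    # the search starting point
    obj_spec = vim_api.ObjectSpec()
    obj_spec.obj = moref

    filter_spec = vim_api.PropertyFilterSpec()
    filter_spec.propSet = (prop_spec,)
    filter_spec.objectSet = (obj_spec,)

    # run the query
    raw_res = service.RetrieveProperties(collector_moref, (filter_spec,))

    # translate the result
    if not raw_res:
        return {}
    prop_set = raw_res[0].propSet
    if prop_set is None:
        return {}
    return dict((prop.name, prop.val) for prop in prop_set)
```

With this in place, get_name reduces to looking up `'name'` in the returned dictionary, and asking for more properties costs no extra round trips.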

Saturday, May 31, 2008

Combinations

Ok. Now that we got started, here's a short post about one of my favorite snippets of python.
Haven't seen this anywhere. Let me know if you have...

A lot of times you've got several groups and want to go over all combinations of items (for example, for testing all combinations of a set of parameters).

Doing this for two groups is a one liner, thanks to list comprehensions:

>>> A = [True,False]
>>> B = [1,2,3]
>>> [(a,b) for a in A for b in B]
[(True, 1), (True, 2), (True, 3), (False, 1),
(False, 2), (False, 3)]

This is a great start, but I want something that works for any number of groups. So we just need to apply this one liner as a building block iteratively until we're left with a single group, which is exactly what the built in function reduce does! well, almost...

>>> def combinations(*seqs):
...     def comb2(A,B):
...         return [(a,b) for a in A for b in B]
...     return reduce(comb2,seqs)
...
>>> A = [True,False]
>>> B = [1,2,3]
>>> C = ['yes','no']
>>> combinations(A,B,C)
[((True, 1), 'yes'), ((True, 1), 'no'), ...]

The problem is that the result is nested. Instead of getting (True,1,'yes'), we get ((True,1), 'yes').
The solution is to change the building block so it treats the two arguments differently. The second argument will still be a regular sequence of items. The first argument will now be a sequence of combination groups built so far.
Our building block now becomes:

# for each group and item, append the item to the group
def comb2(A,b):
    return [a+[i] for a in A for i in b]

But now we need to handle start conditions, since we don't have any "group of groups" when we start. And this is the fun part - after a few iterations with the interactive shell, I ended up with this, which I think is quite cute:

>>> def combinations(*seqs):
...     def comb2(A,b):
...         return [a+[i] for a in A for i in b]
...     # prepend a sequence with a single empty group
...     return reduce(comb2,seqs,[[]])
...
>>> combinations(A,B,C)
[[True, 1, 'yes'], [True, 1, 'no'], ...]

And that's that.
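
For reference, here is the same function as a standalone sketch that also runs on Python 3, where `reduce` has moved into `functools`. The comparison with `itertools.product` (in the standard library since Python 2.6) is an addition for illustration: it produces the same combinations, as tuples rather than lists:

```python
from functools import reduce
from itertools import product


def combinations(*seqs):
    def comb2(groups, items):
        # for each group built so far and each item, append the item to the group
        return [group + [item] for group in groups for item in items]
    # start from a single empty group
    return reduce(comb2, seqs, [[]])


A = [True, False]
B = [1, 2, 3]
C = ['yes', 'no']
result = combinations(A, B, C)
same_as_product = result == [list(t) for t in product(A, B, C)]
```

The empty-group starting value also gives a sensible answer for the edge case of no sequences at all: a single empty combination.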

At the time I was coding mainly in C++. Doing this in C++ is going to be much more work and end up being much less elegant. But what really blew me away at the time was this:
Suppose I wanted to handle cases where the number of combinations was very big. In that case generating them all up front could take up too much memory, and I'd just want to generate them on the fly as I iterate over them. You can do this in python by replacing the comb2 list comprehension with a generator expression:

def comb2(A,b):
    return (a+[i] for a in A for i in b)

Now I can happily iterate over the returned generator and python manages the stack of nested generators, each with its own state in the iteration.
Try that in C++! (you can do it, but it's going to hurt, which in reality means that if it's not very important for you, you won't do it).
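
A minimal sketch of the lazy version in action, written so it also runs on Python 3. `lazy_combinations` is a hypothetical name, and `islice` is used only to show that the first few results arrive without generating the rest of the billion combinations:

```python
from functools import reduce
from itertools import islice


def lazy_combinations(*seqs):
    def comb2(groups, items):
        # generator expression: nothing is computed until someone iterates
        return (group + [item] for group in groups for item in items)
    return reduce(comb2, seqs, [[]])


big = range(1000)
first_three = list(islice(lazy_combinations(big, big, big), 3))
print(first_three)  # [[0, 0, 0], [0, 0, 1], [0, 0, 2]]
```

One thing to watch for: each input is walked over again for every group built so far, so the inputs need to be real sequences (lists, ranges) rather than one-shot iterators.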

I remember hearing GvR at PyCon 2006, when he talked about the evolution of python. One of the things he said was that it was a pretty lousy functional programming language. I haven't learned any good functional language yet (want to try haskell or F# sometime), and I trust he knows what he's talking about, but still, this beats the hell out of C++, Java or C# (although C# 3.0 now has some new features that could help. Would be interesting to see how easy it is to do this now).

On the same note, take a look at this, which is also neat, and comes in handy quite often.

and btw, if someone has a tip on how to format code sections for the blog, let me know :-(

Getting Started

About a year ago a coworker asked me whether I wanted to write a blog. I took a full 2 seconds to make sure I wasn't just reacting and just said - "no".

It was that clear. I am a private person by nature. Why would I want to talk to a bunch of people I don't even know? It made no sense. I mean, I could understand why some other people would want to, and even enjoyed reading some blog posts (most of which I got from friends by email), but me blogging was an idea that made no sense to me.

So what's this, then?
Well, to be honest, I'm not 100% sure myself :-)

One thing that happened is that for the last year I finally got to work in Python (IronPython to be exact). Before that I was mainly writing in C++ and in the last few years also dabbling with python for side projects. I think the effect of the community is much more important in the python ecosystem than it is in C++. I've got some thoughts on why this might be so, but that's not important for this post. Maybe it's not even true, but the fact is that I've started reading some blogs, subscribing to the ironpython mailing list, and in general I got a lot of help and ideas from other people's work.
And when I had some cool stuff I'd written, or some insight, I sometimes wanted to be able to share it.

Luckily I also started working with a friend that blogs. I really learned a lot from watching and talking to him. In this respect, I learned that not every blog has to have a capital B. I could pay something forward by sharing the little things I do. And sometimes someone will google them and it will help him. Cool!

He also sent me this post which I really liked. Well, he forgot to mention "I have no time", but apart from that it's spot on. Time still is a real limitation. Working at a startup and having two (cute as hell) little kids at home doesn't leave a lot of spare time. So blogging will have to compete for the time slot before I go to sleep (hence the blog's name). But that's easy - if I have time and something to say, I'll write. If not, I don't have to.

The last reason to write is because I don't feel comfortable with it. Being a perfectionist I'm worried that I might say something silly, or trivial, or maybe just that no one will care.
All these things will probably happen, but as a father I keep trying to teach my kids that making mistakes is ok. If you're not making mistakes you're obviously not trying things that are hard for you, so you're not learning as quickly as you could. Damn those genes! The least I can do is set an example by getting better at making mistakes :-)

So now we're done with static friction and this meta-post, I'll try and write a short programming one soon.
