October 2, 2009

Why you should bother!

I write this entry in response to Why Should I Bother?. The answer, in short, is that I find Python to be a great language for getting things done, and you shouldn't let stupid interviewers deter you from learning a language that allows you to get more done.

I think I'm a pretty tough interviewer, so I also describe the things I'd recommend that a Python coder knows before applying to a Java position, based on my own technical-interview tactics.

A spectrum of formality

As my friend pointed out to me long ago, many computer scientists don't care about effective programming methods, because they prefer theory. Understandably, we computer scientists qua programmers (AKA software engineers) find ourselves rent in twain.

As a computer science degree candidate, you are inevitably enamored with complex formalities, terminology, [*] and a robust knowledge of mathematical models that you'll use a few times per programming project (if you're lucky). Pragmatic programming during a course of study often takes a back seat to familiar computer science concepts and conformance to industry desires.

As a programmer, I enjoy Python because I find it minimizes boilerplate, maximizes my time thinking about the problem domain, and permits me to use whichever paradigm works best. I find that I write programs more quickly and spend less time working around language deficiencies. Importantly, the execution model fits in my brain.

Real Computer Scientists (TM) tend to love pure-functional programming languages because they fit into mathematical models nicely — founded on recursion, Curry-Howard isomorphism, and what have you — whereas Python is strongly imperative and, in its dynamism, lacks the same sort of formality.

Languages like Java sit somewhere in the middle. They're still strongly imperative (there are no higher-order functions in Java), but there are more formalities. As the most notable example, compile-time type checking eliminates the possibility of type errors, which gives some programmers a sense of safety. [†] Such languages still let scientists chew on some computer sciencey problems; for example, where values clash with the type system, like provably eliminating NullPointerExceptions, which is fun, but difficult!

As the cost of increased formality, this class of languages is more syntax-heavy and leans on design patterns to get some of the flexibility dynamic typing gives you up front.

It's debatable which category of languages is easiest to learn, but Java-like languages have footholds in the industry from historical C++ developer bases, Sun's successful marketing of Java off of C++, and the more recent successes of the C# .NET platform.

It makes sense that we're predominantly taught this category of languages in school: as a result, we can play the percentages and apply for most available developer jobs. Given that we have to learn it, you might as well do some throw-away programming in it now and again to keep yourself from forgetting everything; however, I'd recommend, as a programmer, that you save the fun projects for whichever language(s) that you find most intriguing.

I picture ease-and-rapidity of development-and-maintenance on a spectrum from low to high friction — other languages I've worked in fall somewhere on that spectrum as higher friction than Python. Though many computer scientists much smarter than I seem to conflate formality and safety, I'm fairly convinced I attain code completion and maintainability goals more readily with the imperative and flexible Python language. Plus, perhaps most importantly, I have fun.

My technical-interview protocol

The protocol I use to interview candidates is pretty simple. [‡]

Potential Java interview weaknesses

Interviewing a candidate whose background is primarily Python based for a generic Java developer position (as in Sayamindu's entry), I would immediately flag the following areas as potential weaknesses:

Primitive data types

A programmer can pretty much get away never knowing how a number works in Python, since you typically overflow to appropriately sized data types automatically.

The candidate needs to know what all the Java primitives are when the names are provided to them, and must be able to describe why you would choose to use one over another. Knowing pass-by-value versus pass-by-reference is a plus. In Python there is a somewhat similar distinction between mutable and immutable types — if they understand the subtleties of identifier binding, learning by-ref versus by-value will be a cinch. If they don't know either, I'll be worried.

Object oriented design

The candidate's Python background must not be entirely procedural, or they won't fare well in a Java environment (which forces object orientation). Additionally, this would indicate that they probably haven't done much design work: even if they're an object-orientation iconoclast, they have to know what they're rebelling against and why.

They need to know:

  • When polymorphism is appropriate.

  • What should be exposed from an encapsulation perspective.

  • What the benefits of interfaces are (in a statically typed, single-inheritance language).

Basically, if they don't know the fundamentals of object oriented design, I'll assume they've only ever written "scripts," by which I mean, "Small, unimportant code that glues the I/O of several real applications together." I don't use the term lightly.

Unit testing

If they've been writing real-world Python without a single unit test or doctest, they've been Doing it Wrong (TM).

unittest is purposefully modeled on xUnit. They may have to learn the new jUnit 4 decorator syntax when they start work, but they should be able to claim they've worked with a jUnit 3 -like API.

Abstract data structures

Python has tuples, lists and dictionaries — all polymorphic containers -- and they'll do everything that your programmer heart desires. [§] Some other languages don't have such nice abstractions.

It'd be awesome if they knew:

  • The difference between injective and bijective and how those terms are important to hash function design. If they can tell me this, I'll let them write my high-performance hash functions.

They must know (in ascending importance):

  • The difference between a HashMap and a TreeMap.

  • The difference between a vector and a linked list, or when one should preferred over the other. The names are unimportant — I'd clarify that a vector was a dynamically growing array.

  • The "difference" between a tree and a graph.

Turning attention to you, the reader: if you're lacking in data structures knowledge, I recommend you read a data structures book and actually implement the data structures. Then, take a few minutes to figure out where you'd actually use them in an application. They stick in your head fairly well once you've implemented them once.

Some interviewers will ask stupid questions like how to implement sorting algorithms. Again, just pick up a data structures book and implement them once, and you'll get the gist. Refresh yourself before the interview, because these are a silly favorite — very few people have to implement sorting functions anymore.

Design patterns

Design patterns serve several purposes:

  • They establish a common language for communicating proposed solutions to commonly found problems.

  • They prevent developers for inventing stupid solutions to a solved class of problems.

  • They contain a suite of workarounds for inflexibilities in statically typed languages.

I would want to assure myself that you had an appropriate knowledge of relevant design patterns. More important than the names: if I describe them to you, will you recognize them and their useful applications?

For example, have you ever used the observer pattern? Adapter pattern? Proxying? Facade? You almost certainly had to use all of those if you've done major design work in Python.

Background concepts

These are some things that I would feel extra good about if the candidate knew and could accurately describe how they relate to their Python analogs:

  • The importance of string builders (Python list joining idiom)

  • Basic idea of how I/O streams work (Python files under the hood)

  • Basic knowledge of typecasting (Python has implicit polymorphism)

Practical advice

Some (bad) interviewers just won't like you because you don't know their favorite language. If you're interviewing for a position that's likely to be Java oriented, find the easiest IDE out there and write an application in it for fun. Try porting a Python application you wrote and see how the concepts translate — that's often an eye-opener. Or katas!

If you find yourself unawares in an interview with these "language crusaders," there's nothing you can do but show that you have the capacity to learn their language in the few weeks vacation you have before you start. If it makes you feel better, keep a mental map from languages to number of jerks you've encountered — even normalizing by developer-base size the results can be surprising. ;-)

Footnotes

[*]

Frequently unncessary terminology, often trending towards hot enterprise jargon, since that's what nets the most jobs and grant money.

[†]

Dynamic typing proponents are quick to point out that this doesn't prevent flaws in reasoning, which are the more difficult class of errors, and that you'll end up writing tests for these anyway.

[‡]

Clearly candidates could exploit a vulnerability in my interview protocol: leave off things they know I'm likely to test that they know particularly well; however, I generally ask them to stop after I'm satisfied they know something. Plus, the less I know about their other weaknesses the more unsure I am about them, and thus the less likely I am to recommend them.

[§]

Though not necessarily in a performant way; i.e. note the existence of collections.deque and bisect. Knowing Python, I'd quiz the candidate to see if they knew of the performant datatypes.

Eliminating web service dependencies with a language-specific abstraction barrier

Hyperbolic analogy: Saying, "You shouldn't need to wrap the web service interface, because it already provides an API," is like saying, "You shouldn't need different programming languages, because they're all Turing complete."

Web services tend to deliver raw data payloads from a flat interface and thus lack the usability of native language APIs. Inevitably, when you program RPC-like interfaces for no language in particular, you incur incompatibilities for every particular language's best practices, idioms, and data models. [*] The issue of appropriately representing exceptions and/or error codes in RPC-like services is a notorious example of this.

There are additional specification mechanisms like WSDL [†] that allow us to make the payloads more object-like. Additional structure is indicated through the use of user-defined "complex types," but this only gets you part of the way to a usable API for any given language. In Python, it's a lot more sensible to perform an operation like in the following abstraction:

from internal_tracker.service import InternalTracker
bug_serivce = InternalTracker(username=getpass.getuser(),
    password=getpass.getpass())
bug = bug_service.get_bug(123456)
bug.actionable.add('Chris Leary') # may raise ReadOnlyException
comment = Comment(text='Adding self to actionable')
bug.arbs.add_comment(comment)
bug.save() # may raise instanceof ServiceWriteException

Than to use an external web service API solution directly (despite using the excellent Suds library):

# Boilerplate
client = suds.client.Client(wsdl=wsdl_uri)
security = suds.wsse.Security()
security.tokens.append(UsernameToken(getpass.getuser(),
    getpass.getpass()))
client.set_options(wsse=security)
internal_tracker_service = client.service

# Usage
service_bug = internal_tracker_service.GetBug(123456)
service_bug.actionable += ', Chris Leary'
# Do we check the response for all WebFault exceptions?
# (Do we check for and handle all the possible transport issues?)
internal_tracker_service.UpdateBug(service_bug)
Comment = internal_tracker_service.factory['Comment']
comment = Comment()
comment.BugId = service_bug.Id
comment.Text = 'Adding self to actionable'
# Again, what should we check?
internal_tracker_service.AddComment(comment)

Why is it good to have the layer of indirection?

Lemma 1: The former example actually reads like Python code. It raises problem-domain-relevant exceptions, uses keyword arguments appropriately, follows language naming conventions, and uses sensible language-specific data types that may be poorly represented in the web service. For example, actionable may be a big comma-delimited string according to the service, whereas it should clearly be modeled as a set of (unique) names, using Python's set data type. Another example is BigIntegers being poorly represented as strings in order to keep the API language-neutral.

Lemma 2: The layer represents an extremely maintainable abstraction barrier between the client and the backing service. Should a team using the abstraction decide it's prudent to switch to, say, Bugzilla, I would have no trouble writing a port for the backing service in which all client code would continue to work. Another example is a scenario in which we determine that the transport is unreliable for some reason, so decide all requests should be retried three times instead of one. [‡] How many places will I need to make changes? How many client code bases do I potentially need to keep track of?

Why is it risky to use the web service interface?

If the web service API represents the problem domain correctly with constructs that make sense for your language, it's fine to use directly. (As long as you're confident you won't have transport-layer issues.) If you're near-certain that the backing service will not change, and/or you're willing to risk all the client code that will depend on that API directly being instantaneously broken, it's fine. The trouble occurs when one of these is not the case.

Let's say that the backing service does change to Bugzilla. Chances are that hacking in adapter classes for the new service would be a horrible upgrade experience that entails:

  1. Repeated discovery of leaky abstractions,

  2. Greater propensity to bugs, [§] and

  3. More difficult maintenance going forward.

Client code that is tightly coupled to the service API would force a rewrite in order to avoid these issues.

Pragmatic Programming says to rely on reliable things, which is a rule that any reasonable person will agree with. [¶] The abstraction barrier is reliable in its loose coupling (direct modeling of the problem domain), whereas direct use of the web service API could force a reliance on quirky external service facts, perhaps deep into client code.

Is there room for compromise?

This is the point in the discussion where we think something along the lines of, "Well, I can just fix the quirky things with a bunch of shims between my code and the service itself." At that point, I contend, you're really just implementing a half-baked version of the language-specific API. It's better to make the abstractions appropriate for the target language and problem domain the first time around than by incrementally adding shims and hoping client code didn't use the underlying quirks before you got to them. Heck, if the web service is extremely well suited to your language, you'll end up proxying most of the time anyway, and the development will be relatively effortless. [#]

What about speed of deployment?

If we have language-specific APIs, won't there be additional delay waiting for it to update when additional capabilities are added to the backing service?

First of all, if the new capability is not within the problem domain of the library, it should be a separate API. This is the single responsibility principle applied to interfaces — you should be programming to an interface abstraction. Just because a backing service has a hodgepodge of responsibilities doesn't mean that our language-specific API should as well. In fact, it probably shouldn't. Let's assume it is in the problem domain.

If the functionality is sane and ready for use in the target language, it should be really simple for the library owner to extend the language-specific API. In fact, if you're using the proxy pattern, you may not have to do anything at all. Let's assume that the functionality is quirky and you're blocked waiting for the library owner to update with the language-specific shim, because it's non-trivial.

Now our solution tends to vary based on the language. Languages like Python have what's known as "gentlemen's privacy", based on the notion of a gentlemen's agreement. Privacy constraints are not enforced at compile-time and/or run-time, so you can just reach through the abstraction barrier if you believe you know what you're doing. Yes, you're making an informed decision to violate encapsulation. Cases like this are exactly when it comes in handy.

assert not hasattr(bug_service, 'super_new_method_we_need')
# HACK: Violate abstraction -- we need this new capability right now
# and Billy-Bo, the library owner, is swamped!
suds_client = bug_service._suds_client
result = suds_client.SuperNewMethodWeNeed()
target_result = de_quirkify(result)

As you can see, we end up implementing the method de_quirkify to de-quirk the quirky web service result into a more language-specific data model — it's bad form to make the code dependent on the web service's quirky output form. We then submit our code for this method to the library owner and suggest that they use it as a basis for their implementation, so that a) they can get it done faster, and b) we can seamlessly factor the hack out.

For privacy-enforcing languages, you would need to expose a public API for getting at the private service, then tell people not to use it unless they know what they're doing. As you can tell, you pretty much wind up with gentlemen's privacy on that interface, anyway.

Footnotes

[*]

In EE we call this kind of phenomenon an impedance mismatch, which results in power loss.

[†]

And many others, with few successful ones.

[‡]

Or maybe we want to switch from HTTP transport to SMTP. Yes, this is actually possible. :-)

[§]

In duplicating exact web service behaviors — you have to be backwards compatible with whichever behaviors the existing client code relies on.

[¶]

It's a syntactic tautology. The caveat is that reasonable people will almost certainly quibble over the classification of what's reliable.

[#]

At least in Python or other languages with a good degree of dynamism, it will.