Eliminating web service dependencies with a language-specific abstraction barrier
Hyperbolic analogy: Saying, "You shouldn't need to wrap the web service interface, because it already provides an API," is like saying, "You shouldn't need different programming languages, because they're all Turing complete."
Web services tend to deliver raw data payloads from a flat interface and thus lack the usability of native language APIs. Inevitably, when you program RPC-like interfaces for no language in particular, you incur incompatibilities with every particular language's best practices, idioms, and data models. [*] The issue of appropriately representing exceptions and/or error codes in RPC-like services is a notorious example of this.
There are additional specification mechanisms like WSDL [†] that allow us to make the payloads more object-like. Additional structure is indicated through the use of user-defined "complex types," but this only gets you part of the way to a usable API for any given language. In Python, it's a lot more sensible to perform an operation through an abstraction like the following:
import getpass

from internal_tracker.service import Comment, InternalTracker

bug_service = InternalTracker(username=getpass.getuser())
bug = bug_service.get_bug(123456)
bug.actionable.add('Chris Leary')  # may raise ReadOnlyException
comment = Comment(text='Adding self to actionable')
bug.comments.add(comment)
bug.save()  # may raise a ServiceWriteException
than to use the external web service API directly (even with the help of the excellent Suds library):
client = suds.client.Client(wsdl_uri)
security = suds.wsse.Security()
client.set_options(wsse=security)
internal_tracker_service = client.service
service_bug = internal_tracker_service.GetBug(123456)
service_bug.Actionable += ', Chris Leary'
# Do we check the response for all WebFault exceptions?
# (Do we check for and handle all the possible transport issues?)
comment = client.factory.create('Comment')
comment.BugId = service_bug.Id
comment.Text = 'Adding self to actionable'
internal_tracker_service.AddComment(comment)
# Again, what should we check?
Why is it good to have the layer of indirection?
Lemma 1: The former example actually reads like Python code. It raises problem-domain-relevant exceptions, uses keyword arguments appropriately, follows language naming conventions, and uses sensible language-specific data types that may be poorly represented in the web service. For example, actionable may be a big comma-delimited string according to the service, whereas it should clearly be modeled as a set of (unique) names, using Python's set data type. Another example is BigIntegers being poorly represented as strings in order to keep the API language-neutral.
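To make the actionable example concrete, the conversion the abstraction layer performs between the service's flat string and Python's set type might be sketched as follows (the helper names are hypothetical, not part of any real library):

```python
def parse_actionable(raw):
    """Turn the service's comma-delimited string into a set of unique names."""
    return {name.strip() for name in raw.split(',') if name.strip()}

def serialize_actionable(names):
    """Turn the set back into the flat form the service expects."""
    return ', '.join(sorted(names))
```

The abstraction layer calls these at its boundary, so client code only ever sees the set.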
Lemma 2: The layer represents an extremely maintainable abstraction barrier between the client and the backing service. Should a team using the abstraction decide it's prudent to switch to, say, Bugzilla, I would have no trouble writing a port for the backing service in which all client code would continue to work. Another example is a scenario in which we determine that the transport is unreliable for some reason, so decide all requests should be retried three times instead of one. [‡] How many places will I need to make changes? How many client code bases do I potentially need to keep track of?
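For instance, the retry policy from that scenario can live in exactly one place inside the abstraction layer. A minimal sketch, assuming a decorator-based design (with_retries and TransportError are illustrative names, not part of any real service library):

```python
import functools
import time

class TransportError(Exception):
    """Stand-in for whatever exception the transport layer actually raises."""

def with_retries(attempts=3, delay=1.0):
    """Decorator: retry a flaky service call a fixed number of times."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except TransportError:
                    if attempt == attempts - 1:
                        raise  # out of attempts; let the caller see it
                    time.sleep(delay)
        return wrapper
    return decorate
```

Every service-touching method in the abstraction gets decorated once; when the retry count changes from one to three, no client code base changes at all.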
Why is it risky to use the web service interface?
If the web service API represents the problem domain correctly with constructs that make sense for your language, it's fine to use directly. (As long as you're confident you won't have transport-layer issues.) If you're near-certain that the backing service will not change, and/or you're willing to risk all the client code that will depend on that API directly being instantaneously broken, it's fine. The trouble occurs when one of these is not the case.
Let's say that the backing service does change to Bugzilla. Chances are that hacking in adapter classes for the new service would be a horrible upgrade experience that entails:
- Repeated discovery of leaky abstractions,
- Greater propensity to bugs, [§] and
- More difficult maintenance going forward.
Client code that is tightly coupled to the service API would force a rewrite in order to avoid these issues.
Pragmatic Programming says to rely on reliable things, which is a rule that any reasonable person will agree with. [¶] The abstraction barrier is reliable in its loose coupling (direct modeling of the problem domain), whereas direct use of the web service API could force a reliance on quirky external service facts, perhaps deep into client code.
Is there room for compromise?
This is the point in the discussion where we think something along the lines of, "Well, I can just fix the quirky things with a bunch of shims between my code and the service itself." At that point, I contend, you're really just implementing a half-baked version of the language-specific API. It's better to make the abstractions appropriate for the target language and problem domain the first time around than by incrementally adding shims and hoping client code didn't use the underlying quirks before you got to them. Heck, if the web service is extremely well suited to your language, you'll end up proxying most of the time anyway, and the development will be relatively effortless. [#]
What about speed of deployment?
If we have language-specific APIs, won't there be additional delay waiting for them to be updated when new capabilities are added to the backing service?
First of all, if the new capability is not within the problem domain of the library, it should be a separate API. This is the single responsibility principle applied to interfaces — you should be programming to an interface abstraction. Just because a backing service has a hodgepodge of responsibilities doesn't mean that our language-specific API should as well. In fact, it probably shouldn't. Let's assume it is in the problem domain.
If the functionality is sane and ready for use in the target language, it should be really simple for the library owner to extend the language-specific API. In fact, if you're using the proxy pattern, you may not have to do anything at all. Let's assume that the functionality is quirky and you're blocked waiting for the library owner to update with the language-specific shim, because it's non-trivial.
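To make the proxy-pattern point concrete: with Python's `__getattr__`, methods that need no adaptation pass straight through to the backing service, so a sane new service method is usable with no library change at all. A sketch under assumed names (ServiceProxy and GetBug are illustrative):

```python
class ServiceProxy:
    """Delegate unknown attribute lookups straight to the backing service."""

    def __init__(self, backing_service):
        self._backing = backing_service

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, so the explicit
        # language-specific wrappers defined on this class take priority.
        return getattr(self._backing, name)

    def get_bug(self, bug_id):
        # An explicit, adapted wrapper for a method with quirks.
        return self._backing.GetBug(bug_id)
```

Quirky methods get explicit wrappers; everything else proxies through for free.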
Now our solution tends to vary based on the language. Languages like Python have what's known as "gentlemen's privacy", based on the notion of a gentlemen's agreement. Privacy constraints are not enforced at compile-time and/or run-time, so you can just reach through the abstraction barrier if you believe you know what you're doing. Yes, you're making an informed decision to violate encapsulation. Cases like this are exactly when it comes in handy.
assert not hasattr(bug_service, 'super_new_method_we_need')
# HACK: Violate abstraction -- we need this new capability right now
# and Billy-Bo, the library owner, is swamped!
suds_client = bug_service._suds_client
result = suds_client.SuperNewMethodWeNeed()
target_result = de_quirkify(result)
As you can see, we end up implementing the method de_quirkify to de-quirk the quirky web service result into a more language-specific data model — it's bad form to make the code dependent on the web service's quirky output form. We then submit our code for this method to the library owner and suggest that they use it as a basis for their implementation, so that a) they can get it done faster, and b) we can seamlessly factor the hack out.
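A sketch of what de_quirkify might look like, assuming the quirks are the ones mentioned earlier: big integers encoded as strings and the actionable set flattened into one comma-delimited string (the field names are illustrative):

```python
def de_quirkify(service_bug):
    """Map the quirky wire representation onto natural Python types."""
    return {
        # The service keeps big integers as strings for language neutrality.
        'id': int(service_bug['Id']),
        # The service flattens the set of names into one big string.
        'actionable': {name.strip()
                       for name in service_bug['Actionable'].split(',')},
    }
```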
For privacy-enforcing languages, you would need to expose a public API for getting at the private service, then tell people not to use it unless they know what they're doing. As you can tell, you pretty much wind up with gentlemen's privacy on that interface, anyway.
Fugly default avatars: a force for good?
One of the first things I observed in picking up a Twitter account (read: crack pipe of the Internet) was how fugly the default avatars are.
Daunting in its ugliness!
Needless to say, I was immediately repulsed by such an inaccurate representation of my facial features. Although my eyes do often reside in those exact proportions, I have a) more than one lip, b) pupils, and c) a tendency to wear a hat. Following this line of reasoning, I had two choices:
Desire of all the avatar ladies.
After the makeover, it was a tough call. A toss up, really, but I decided to go with the slightly more handsome fellow on the right.
After I replaced the default avatar, I stopped for a moment to reflect on the experience. As I stared into the light blue ASCII characters that pass for a face these days, I realized that I had been tricked in a fairly brilliant way...
Fugly default avatars give users initial encouragement to get involved. I didn't want to be represented by an entity with only one lip that clashed with the color scheme of the site, so I was tricked into putting some work into it. In contrast, the Gravatar default avatar is far too pretty — I could definitely live with an avatar that looks like this without feeling inclined to change it:
Belle of the ball
After I spent time working on my "Twitter identity", I felt more invested in the account. Paying out some "sweat equity" tends to create a psychological attachment. You don't want your work to be for naught; hence, the time and effort spent personalizing an account makes it more meaningful to you.
Additionally, with your (real) face plastered on every update, you can't help but feel a sense of responsibility in what you create. All of a sudden, I had a reason to care about the quality of my user-generated content on the site! (People who use less personal avatars will obviously be less affected by this phenomenon.) Of course, there may exist people who like having other users read their content, look at their avatar, and intrinsically register, "Ah, so that's what a douchebag looks like!" but I am not one of them.
At this point you might be saying, "You're running out of ideas, man! That point you just made is a specific instance of the more general idea that an increased sense of identity on the web breeds an increased sense of responsibility." I agree — that may be the case; however, it may also be a totally independent function of image. Case in point: I don't want people to associate my personal image with douchebaggery, regardless of whether or not they know exactly who I am.
Thoughts on Stack Overflow
This is a short article detailing my thoughts on the recently released
programming Q&A site, Stack Overflow.
Historically, I've had three resources for programming questions:

- IRC channels
- Usenet newsgroups
- Knowledgeable friends on my buddy list
Over the course of 80 days, I've found Stack Overflow to be a better resource
than all three of the above, even when combined.
The way I see it, Stack Overflow (hereby referred to as SO) is going strong for
two fundamental reasons:
- SO baited the right community with the appropriate timing
- SO uses tags
If you think about it, there's nothing about SO that ties it to programming
questions, aside from the constitution. (On SO, the constitution is the site
FAQ). All in all, SO is a Q&A framework. If it's just a Q&A framework, how
did SO manage to stay on topic and under control from its inception? They took
the right members in with the right timing.
The beta test population was roughly given by the following pseudo-formulae:

    Podcast listeners =
        (Jeff's readers UNION Joel's readers) - the relatively uninterested

    Beta testers =
        (Podcast listeners INTERSECT people that cared enough about the
        programming Q&A site to find an obscure signup form) - more attrition
Joel Spolsky and Jeff Atwood are both well-known in the blogosphere among
readers interested in improving their programming skills and doing software the
Right Way. Beginning with their reader base (already amenable to their cause)
there were two significant levels of filtration, as reflected in the above
pseudo-formulae, that ensured that SO started off with a group of people who a)
cared and b) had a significant body of knowledge with respect to programming
and good software practices. This is just the kind of constituency that you
want to impart some positive momentum on a fledgling Q&A site.
The private beta provided an adequate growth period so that, at release, there
were enough core members with a solid conception of the constitution that
they helped to create. Additionally, the core was able to uphold the tenets of
the constitution with power from the reputation that they built. (If there were
a third reason for the site's success, it would be how empowered the high-rep
members are to uphold the constitution.)
If SO had been released to the public in a Hollywood Launch, without the
beta momentum they had, I believe it would have failed. The framework is not
programming-specific — the community is.
The site happens to be particularly well designed for programming questions in
its tag-centric model. SO is a big pipe for programming questions with an
unlimited number of virtual channels, each of which is denoted by a tag. With
recently added capabilities to ignore or flag particular virtual channels, you
(subtractively) take only the content that you want from the big pipe and
prioritize the results. Exactly how nice this capability is will come to light
when comparing SO to the other programming question outlets.
The tag model is also particularly well suited to the structure of knowledge in
the programming domain, where the interests of individual constituents have a
strong tendency to straddle several subdomains. Anecdotally, this is especially
true for those who really care about their craft: the best programmers tend to
have a great deal of depth to their knowledge, which inevitably ends up
overlapping with other areas of interest. For example, most of today's great
programmers use version control systems, convey information effectively through
documentation, and recognize/employ design patterns. Many great programmers
also understand more than one programming paradigm and program in more than one
language. When you mix a number of these programmers together, you get some
really strong sauce. The tag system allows these members to cut out the noise
and exchange information in their subdomains of expertise.
Community again: noobs
Don't take my subliminal messaging the wrong way: the noobs help. As they say,
everybody starts out as a noob. It's clear that noobs pave the way for many
others to follow by asking their noobish questions — that's rarely disputed.
The really interesting thing is that noobs can provide a more brute force
approach to answering questions correctly.
So long as the noobs are semi-informed, they're probably on SO because they're
trying to learn about a topic of interest. Active learning processes are
accompanied by reading and revisiting things that the more seasoned veterans
haven't cared to think about in a long time. Noobs, with references fresh in
their mind, can offer up suggestions or quotations (which they may or may not
fully understand) while the rest of the members determine whether or not their
information is helpful via votes and comments. Even if the noob's proposed
answer is somehow incorrect, other members will learn exactly why. If other
members thought the noob's answer was feasible as well, they'll be informed and
corrected by seeing the dialog. This isn't something you get on an
experts-only-answer site: interpolation of the truth through the public correction of plausible-but-wrong answers.
There is, of course, the potential for Noobs of Mass Destruction (NMDs?) a la
the Eternal September. If noobs outweigh the properly knowledgeable
constituency so heavily that misconceptions are voted up far more rapidly than
proper solutions, the site will suffer from a misinformation-shock. This
misinformation may be corrected over time, but aside from Accepted Answers it's
difficult to jump a correct answer to the top over highly up-voted incorrect
answers. You need a critical mass of users that know what they're talking about
to tip the scales with their votes and their arguments.
Lucky for us members, this didn't happen at public release. Even more lucky for
the world of programmers, the success of the site and lack of an Eternal
September-like phenomenon on SO will lead to more informed programmers from
here forward, further reducing the chance for SO's quality to deteriorate.
Really, it was just the initial gamble of going public and, as I mentioned
before, SO got the timing right.
Community scaling through tagging
One of my favorite parts of all this is that tags allow the community to scale beautifully. If SO gains a thousand new C# programmers as members, does that hurt, say, the Python programmers? No: because of tags, more members can only mean a better site. "Stack Overflow is biased towards C#" is not a self-fulfilling prophecy. I'll explain why:
For argument's sake, let's say these are C# robots who only understand ways to
use C# syntax to do what you want (i.e. "You can use regions to ease generated
code injection. Beep."). If I'm a Python programmer who doesn't care about C#,
I'm ignoring the tag anyway and don't get inundated with noise from the robots.
Inevitably, our hypothetical is incorrect and the C# programmers will all have
knowledge which crosses into other subdomains. In the (slightly more
realistic) case that the members are human beings who know C# along with some
generic principles of programming and software design, they can only
assist me in my cross-domain problems.
More C# programmers can only help the Python programmers. For all X and Y, more
people interested in X can only help people interested in Y, so long as
everybody tags everything appropriately. Except Lisp.
Comparison to other outlets
How does SO stack up against the alternatives? The primary differentiation
comes in a few identifiable areas:
- SO is structured. The framework provides incentives for helping others, an intentionally limiting structure for threads of conversation, and visibility of unanswered questions/questions of interest.
- SO is asynchronous. Information is aggregated as members have time to ask or answer questions, with a clear record of which questions have been asked and what answers have been provided.
- SO is centralized. The site is a one-stop shop for all programming questions, broken into virtual channels via tags. SO consolidates programming question information across subdomains. User visibility spans subdomains. There is a centralized search capability that allows you to restrict by virtual channel.
- SO is persistent. All questions and answers are permanently stored and indexed for future reference.
Folks on IRC, Usenet, or your buddy list have no real incentive to help you
beyond the goodness of their hearts. I'm a starry-eyed idealist and I'm happy
that this has worked historically, but it's readily apparent that people love
playing for points. SO is one of those purely healthy forms of competition
where everybody seems to win; from what I've seen, RTFM and "Google is Your
Friend" trolls are consistently down-voted! The reputation system also appears
to increase the responsiveness of the site — everybody is looking for the
quick "Accepted Answer" grab if they can get it. I had figured that people
would try to game the system, but it seems like most people with reputation
have been sane, and the people with little reputation have their teeth pulled
appropriately. Kudos to the karma system.
What differentiates SO from a big bulletin board is the three-tier threads. You
have question (Original Poster), answer (many answers to one question), and
comments (many comments to one answer). @replies allow for infinite "virtual"
threading, but there's a clear indication of how the conversation is supposed
to take place through the structure of the site. My experience with this format
has led me to believe that it's ideal for removing noise from the answer tier
(via short comments), without letting the meta-conversation get too crazy.
Threading on Usenet allows you to explore related topics of conversation with
less friction, but it can be a big problem when you just want to know the
answer that the Original Poster (OP) finally accepted. You often see such
sub-conversations on Usenet get turned into new threads, while SO asks that you
form the new thread pre-emptively as a new question. I have no problem with
SO's approach, given the benefits of the three tiered conversation and the more
precise indexing capabilities that result from structured threads.
Visibility of questions and answers is a big problem on IRC: there's a distinct
fire-and-be-forgotten phenomenon in most channels, proportional to their noise
level. Additionally, there's usually a few super gurus in each channel that can
only handle one or two problems at a time, leading to,
[impatient-43/4] Can anybody answer my question^^!?!?
messages ad nauseam.
Usenet does better than IRC in terms of question visibility because it's an
asynchronous medium. IRC's synchronous format makes help a lot more
interactive, but at great cost. In addition to the fire-and-be-forgotten
phenomenon, you inevitably juggle O(n) synchronous channels
simultaneously, where n is the number of topics you're interested in.
Also, remember that chat is exactly that: you're going to get unwanted noise.
Other people's Q&As, off topic conversation, and sometimes spammers all
interfere with your ability to communicate a problem and get an answer in real
time. If you've ever tried reading an IRC log to determine the answer to your
question, you probably understand this principle — once you mix anonymized
handles in with a many-to-many conversation, you give up quickly.
The asynchronous model fits into everybody's day more nicely and scales much
better. I haven't yet seen a question on SO where I said to myself, "This Q&A
could have benefited greatly from an increased level of synchronous
interaction." (Yeah, that's really how I talk to myself. Wanna fight about it?)
As I mentioned, the big pipe is a beautiful thing. Some nice corollaries are:
- A single, unifying constitution that saves you from taking flak (from trolls or disgruntled people) and from having to remember various sets of rules.
- You avoid redundant work because it's easy to point somewhere else in the same big pipe.
- Subscribing to topics you care about and are knowledgeable in scales more easily across topics.
- You end up with more eyes per channel, because it's easy to subscribe to many virtual channels at once.
One could argue that IRC's Freenode is similar in the virtual channel respect,
but logging is certainly not centralized, and listening to many virtual
channels simultaneously quickly converges to impossible. Unlike SO's multi-tag
view, asking a question in one IRC channel is unlikely to get the attention of
people who reside in other channels.
Newsgroups are all-over-the-place decentralized. It's definitely a web 1.0
technology. There's a bunch of services that consolidate information for
newsgroups of interest (Google Groups, gmane), but due to the information being
replicated all over the web, the page rank for a given Q&A will tend to be
weaker as it's divided across the resources and components of the thread.
Newsgroups don't tend to play together as nicely as SO tags — it's easy to see how a question like, "What's monkeypatching?" could be asked on comp.lang.python, comp.lang.ruby, and so on, without either group ever being referred to the other.
On SO, if you tag things properly, information naturally crosses virtual
channels and is well indexed for search.
IRC channels tend to get inundated with the same questions over and over, so
they make an FAQ to persist a subset of the information that's routinely
provided in the channel. Taken to its rational extreme, you could persist all
the Q&A information in such a manner, in which case you'd have SO.
Some IRC channels get logged, but I rarely care where the logs are — there's
little hope of you finding the answer from the log (as previously discussed).
It's also unlikely that the page rank of any given log will be significant. In
my IRC experience, you keep your own chat logs if you really care to find the
conversations later on. In any case, this is much less elegant than SO's
centralized and indexed persistence capabilities.
As I mentioned before, newsgroups have persistence, but it's not well
centralized or indexed. Persistence is a moot point if you can't find what
you're looking for.
Since I'm out of a job as a karma system and NMD doomsayer, I've got to talk
about the potential for secondary Armageddon-like effects.
SO doesn't have a significant enough differentiation from refactormycode.
Its mission is well differentiated, but it seems like the permitted content
on SO is a superset of what can be found on refactormycode. I would consider
this kind of Q&A noisy, but it certainly follows the same general format. It's
possible the authors are cool with SO engulfing a lot of refactormycode
material, but in that case I hope we get some better large code block
support. If SO doesn't want it, it should be in the constitution.
I'm concerned about question staleness. Over time we'll see how venerable the
Q&As are, but my immediate concern is the plot of views over time: is the
drop off in number of views over time for a given question so significant that
the return rate cannot overcome initial misconceptions? If misconceptions are
introduced later, will users still be watching the thread? There's no "watch
this thread" capability in SO for push notification, so to some extent the
system expects you to check back at regular intervals to monitor activity on
threads. This may be an unrealistic assumption. To be fair, the constitution
explicitly states you may re-ask a question if you acknowledge that the other
exists, which may prevent this from being such a big deal.
I'm curious as to how the number of non-programming, technical questions has
trended over time. Potential problems in this area are alleviated by the
constitution and the fact that sufficiently reputable members can close threads,
but it's easy to see how there will be an inevitable flow of system
administrative questions due to how knowledgeable the constituency is. If the
site didn't have such good safeguards, it would easily swallow a whole lot of
other Q&A domains that are indirectly programming related.
Thoughts on blogging quantity vs quality
I'm very hesitant to post things to my real blog. [*] I often have complex ideas that I want to convey via blog entries, and the complexity mandates that I perform a certain level of research before making any real claim. As a result, I'm constantly facing a quantity vs. quality problem. My queue of things to post is ever-increasing and the research/writing process is agonizingly slow. [†]
Just saying "screw it" and posting whatever un-validated crap spills out of my head-holes seems promising, but irresponsible. The question that I really have to ask myself is whether or not the world would be a better place if I posted more frequently, given that it is at the cost of some accuracy and/or completeness.
I'm starting to think that the best solution is to relate a daily occurrence to a larger scheme of things. Trying to piece together some personal mysteries and detailing my thought processes may be both cathartic and productive — in the literal sense of productive.
A preliminary idea is to prefix the titles of these entries with "Thoughts on [topic]" and prefix more definitive and researched articles with a simple "On [topic]". Readers may take what I'm saying more at face value if I explicitly announce that my post relates to notions and not more concrete theories. [‡]
From Blogger to Wordpress
I've decided to move my blog from my Blogger cdleary.blogger.com account to a Wordpress install on my personal blog.cdleary.com domain. Once I took a gander at all the new features and capabilities of Wordpress, the choice wasn't very difficult.
Issues with Blogger
Posting mixed text and code over the course of my blogging history presented interesting problems. From what I could tell, each blog on Blogger has a global "interpret newline as <br />" setting, which prevented me from switching styles (to use <br /> explicitly) without editing all of my previous posts. When you mix this with the fact that I was using Vim's "generate highlighted syntax as HTML" feature in lieu of searching for a proper way to post source code in Blogger (which I found far too late in the game :), the GeSHi Syntax Highlighter plugin for Wordpress was looking mighty fine. I haven't pinpointed any exact reasons, but the line breaks and HTML equivalents feel a lot more natural in Wordpress than they did in Blogger.
The Blogger backlinks ("Links to this post") capability wasn't cutting it for me. Due either to the infrequency of my posting, the irrelevance of my posts, or my (heretofore) unwillingness to advertise my blog, I couldn't find backlinks via the Blogger service that I knew to exist. The ping system that Wordpress employs seems a lot more enabling for a low-profile blogger like myself. It's possible that my previous blog never received a ping and that Blogger actually has this feature as well; however, I knew of a few other blogs that linked to my Blogger blog that didn't show up (comments were disabled — maybe that was a problem?).
Easy Feed Migration, Evil URI Migration
Thanks to FeedBurner decoupling my feed URI from my blog URI, the feed migration process was easy as pie. It makes me think that everybody should use FeedBurner, if only for an extra level of indirection between the blog hosting and the RSS pollers.
I was evil, however, and totally dropped my old URIs. My Blogger blog wasn't very highly read or recognized, so I figured rather than go through some painful URI redirection process via <meta> tag manipulation in the blogger template, I'd just delete my old blog. Slightly evil, but significantly productive. I'll just cross my fingers and hope that the people who cared were subscribed to my RSS feed as well. :/
Trying out Comments
I did not enable comments on my Blogger account. I theoretically don't like blog comments — they provide inadequate space for a conversation and proper synthesis of ideas surrounding a conversation. I'm fairly convinced that commenting systems are flawed in low-traffic blogs like my own, and that blog-entry-to-blog-entry responses are much more maintainable, scalable, and helpful for bloggers without thousands of readers; however, I'm willing to give comments another short test period before turning them off.
Categories and Tags?
One of the most foreign things in the Wordpress installation is the category-tag-duality. It seems that these two things are distinct, as explained in the Wordpress Glossary:
Think of it like a Category, but smaller in scope. A post may have several tags, many of which relate to it only peripherally.
For the time being, I've only promoted a few of the most-used labels from tags to categories, which I figure I'll continue to do once the tags cross some arbitrary threshold of posts. It sounds kind of neat to have two tiers of categorization — you can go a little wild in the lower tier while keeping the upper tier simple and clean.