December 21, 2008

Thoughts on Wall-E

I just saw Wall-E, and I really enjoyed it. It made me think about a few things that I find interesting enough to blog about.

It's really easy to strongly empathize with a character that wants something very simple, attainable, and appropriate — especially when it will bring them a great amount of happiness. In Wall-E's case it was the hand-holding. "Why not!?" you say to yourself, as the writers bait you over and over again. This is a fundamental undertone in oh so many romantic comedies (like The Office and Scrubs). It's also used as a tactic in more drama-oriented partner-based content, though I can't think of any good examples off the top of my head.

You're roped in because of the sheer simplicity of the greatest possible outcome. It could happen at any moment, so you can't miss a beat and have to keep watching if there's more available. In the more shameless writing, the character gets what they want, only to have it ruined by some trivial little thing. Then the cycle can start all over again, since you're obviously appalled that a trivial little thing could ruin something which brings the character such great happiness.

The woman-like robot was the more capable and powerful one. I'm a fan of traditional role reversal.

Negative Utilitarianism

The humans from the movie bring a bunch of interesting points to light. They live in a world that offers the epitome of luxury and narcissism. It's clear in the movie that a byproduct of this lifestyle is a lack of human connection. All in all, the situation brings up an important point about what the "end goal" of human development might be.

Were the end goal a utility maximization, as "positive" utilitarians hold, the indefinite life of extreme luxury would be a feasible outcome. From the way I empathized with the characters in the movie, it's fairly clear to me that this is not reflective of human potential, and should actually be construed as a tragic loss.

The awesomeness of humanity may very well be in the constant struggle — the actualization of untapped potential. Though the life of extreme luxury could offer more happiness, there seems to be something inherently meaningless in the human race sitting on its butt being happy. There are other philosophical works that indirectly support this theory, such as Marx's theory of alienation of the working man and Kierkegaard's idea that one's life was typically devoted to some project which was intended to give the life meaning.

I think that this is an interesting counterpoint for positive utilitarianism as well as an argument for negative utilitarianism. Negative utilitarianism (vacuously) contends that, beyond alleviation of suffering, that which brings us the most utils is not necessarily the best way to go. This seems wholly consistent with the idea that, after we've gotten humanity to a point of civility, how to proceed is relatively unknown. Maximization of potential is not obviously consistent with maximization of utility, but somehow, in a common-sense kind of way, seems right.

One could argue that the positive utilitarian argument still holds, under the interpretation that human actualization in the long run leads to greater utility than local maximization (luxury in the now), but I dispute that we would have the capability to determine that at any given time were positive utilitarianism our guiding philosophy. Perhaps it would be formally correct, but I don't care much if we would fail to interpret it correctly. Negative utilitarianism seems a more appropriate philosophy for the now, certainly until we alleviate all the suffering — I'd be happy to figure it out from there. ;-)

(Note: I didn't make an attempt to be philosophically rigorous in the above discussion, so don't sweat the small stuff if you're going to be critical — I just intended to get an idea out there. If you'd like me to make an attempt to be rigorous and/or precise, just LMK and I'll write another entry.)

Thoughts on Stack Overflow

This is a short article detailing my thoughts on the recently released programming Q&A site, Stack Overflow.

Background

Historically, I've had three resources for programming questions:

Over the course of 80 days, I've found Stack Overflow to be a better resource than all three of the above, even when combined.

The way I see it, Stack Overflow (hereafter referred to as SO) is going strong for two fundamental reasons:

  1. SO baited the right community with the appropriate timing

  2. SO uses tags

Community

If you think about it, there's nothing about SO that ties it to programming questions, aside from the constitution. (On SO, the constitution is the site FAQ.) All in all, SO is a Q&A framework. If it's just a Q&A framework, how did SO manage to stay on topic and under control from its inception? They took the right members in with the right timing.

The beta test population was roughly given by the following:

Podcast listeners

(Jeff's readers UNION Joel's readers) - the relatively uninterested - attrition

Beta testers

(Podcast listeners INTERSECT people that cared enough about the programming Q&A site to find an obscure signup form) - more attrition

Joel Spolsky and Jeff Atwood are both well-known in the blogosphere among readers interested in improving their programming skills and doing software the Right Way. Beginning with their reader base (already amenable to their cause) there were two significant levels of filtration, as reflected in the above pseudo-formulae, that ensured that SO started off with a group of people who a) cared and b) had a significant body of knowledge with respect to programming and good software practices. This is just the kind of constituency that you want to impart some positive momentum on a fledgling Q&A site.
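The two pseudo-formulae above map directly onto Python's set operators. Here is a minimal sketch, assuming made-up reader names and made-up membership of each group; none of these names or numbers come from real SO data, they only illustrate the two levels of filtration.

```python
# Hypothetical reader names, invented purely for illustration.
jeff_readers = {"ann", "bob", "cat", "dan"}
joel_readers = {"cat", "dan", "eve", "fay"}
uninterested = {"bob"}   # assumed: readers who never cared about the podcast
attrition = {"fay"}      # assumed: readers who drifted away

# (Jeff's readers UNION Joel's readers) - the relatively uninterested - attrition
podcast_listeners = (jeff_readers | joel_readers) - uninterested - attrition

# Assumed: people who cared enough to find the obscure signup form.
found_signup_form = {"ann", "cat", "eve", "zed"}
more_attrition = {"eve"}

# (Podcast listeners INTERSECT people who found the form) - more attrition
beta_testers = (podcast_listeners & found_signup_form) - more_attrition

print(sorted(podcast_listeners))
print(sorted(beta_testers))
```

Each step can only shrink the group or leave it the same size, which is the point: the beta population is a filtered subset of an audience that was already sympathetic.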

The private beta provided an adequate growth period so that, at release, there were enough core members with a solid conception of the constitution that they helped to create. Additionally, the core was able to uphold the tenets of the constitution with power from the reputation that they built. (If there were a third reason for the site's success, it would be how empowered the high-rep members are to uphold the constitution.)

If SO had been released to the public in a Hollywood Launch, without the beta momentum they had, I believe it would have failed. The framework is not programming-specific — the community is.

Tagging

The site happens to be particularly well designed for programming questions in its tag-centric model. SO is a big pipe for programming questions with an unlimited number of virtual channels, each of which is denoted by a tag. With recently added capabilities to ignore or flag particular virtual channels, you (subtractively) take only the content that you want from the big pipe and prioritize the results. Exactly how nice this capability is will come to light when comparing SO to the other programming question outlets.

The tag model is also particularly well suited to the structure of knowledge in the programming domain, where the interests of individual constituents have a strong tendency to straddle several subdomains. Anecdotally, this is especially true for those who really care about their craft: the best programmers tend to have a great deal of depth to their knowledge, which inevitably ends up overlapping with other areas of interest. For example, most of today's great programmers use version control systems, convey information effectively through documentation, and recognize/employ design patterns. Many great programmers also understand more than one programming paradigm and program in more than one language. When you mix a number of these programmers together, you get some really strong sauce. The tag system allows these members to cut out the noise and exchange information in their subdomains of expertise.
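The subtractive "big pipe" idea can be sketched in a few lines. This is only an illustration of the concept, not how SO is actually implemented: the function name filter_questions, the dictionary layout, and the sample questions are all assumptions of mine.

```python
def filter_questions(questions, ignored_tags, interesting_tags):
    """Drop questions carrying an ignored tag, then list interesting ones first."""
    # Subtractive step: anything touching an ignored tag leaves the pipe.
    visible = [q for q in questions if not (q["tags"] & ignored_tags)]
    # Prioritizing step: sorted() is stable, so questions keep their
    # original relative order within each group.
    return sorted(visible, key=lambda q: not (q["tags"] & interesting_tags))


# Hypothetical questions, invented for this example.
questions = [
    {"title": "Regions in generated code", "tags": {"c#", "code-generation"}},
    {"title": "What is monkeypatching?", "tags": {"python", "ruby"}},
    {"title": "Branching strategies", "tags": {"version-control"}},
    {"title": "for-else semantics", "tags": {"python"}},
]

feed = filter_questions(questions, ignored_tags={"c#"}, interesting_tags={"python"})
titles = [q["title"] for q in feed]
print(titles)
```

Note that a question with several tags shows up for anyone interested in any one of them, which is how information crosses between the virtual channels.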

Community again: noobs

Don't take my subliminal messaging the wrong way: the noobs help. As they say, everybody starts out as a noob. It's clear that noobs pave the way for many others to follow by asking their noobish questions — that's rarely disputed. The really interesting thing is that noobs can provide a more brute force approach to answering questions correctly.

So long as the noobs are semi-informed, they're probably on SO because they're trying to learn about a topic of interest. Active learning processes are accompanied by reading and revisiting things that the more seasoned veterans haven't cared to think about in a long time. Noobs, with references fresh in their mind, can offer up suggestions or quotations (which they may or may not fully understand) while the rest of the members determine whether or not their information is helpful via votes and comments. Even if the noob's proposed answer is somehow incorrect, other members will learn exactly why. If other members thought the noob's answer was feasible as well, they'll be informed and corrected by seeing the dialog. This isn't something you get on an experts-only-answer site: interpolation of the truth through the correction of proposed answers.

There is, of course, the potential for Noobs of Mass Destruction (NMDs?) a la the Eternal September. If noobs outweigh the properly knowledgeable constituency so heavily that misconceptions are voted up far more rapidly than proper solutions, the site will suffer from a misinformation-shock. This misinformation may be corrected over time, but aside from Accepted Answers it's difficult to jump a correct answer to the top over highly up-voted incorrect answers. You need a critical mass of users that know what they're talking about to tip the scales with their votes and their arguments.

Lucky for us members, this didn't happen at public release. Even more lucky for the world of programmers, the success of the site and lack of an Eternal September-like phenomenon on SO will lead to more informed programmers from here forward, further reducing the chance for SO's quality to deteriorate. Really, it was just the initial gamble of going public and, as I mentioned before, SO got the timing right.

Community scaling through tagging

One of my favorite parts of all this is that tags allow the community to scale beautifully. If SO gains a thousand new C# programmers as members, does that hurt, say, the Python programmers? No: because of tags, more members can only mean a better site. "Stack Overflow is biased towards C#" is not a self-fulfilling prophecy. I'll explain why:

For argument's sake, let's say these are C# robots who only understand ways to use C# syntax to do what you want (i.e. "You can use regions to ease generated code injection. Beep."). If I'm a Python programmer who doesn't care about C#, I'm ignoring the tag anyway and don't get inundated with noise from the robots.

Inevitably, our hypothetical is incorrect and the C# programmers will all have knowledge which crosses into other subdomains. In the (slightly more realistic) case that the members are human beings who know C# along with some generic principles of programming and software design, they can only assist me in my cross-domain problems.

More C# programmers can only help the Python programmers. For all X and Y, more people interested in X can only help people interested in Y, so long as everybody tags everything appropriately. Except Lisp.

Comparison to other outlets

How does SO stack up against the alternatives? The primary differentiation comes in a few identifiable areas:

Structured

Folks on IRC, Usenet, or your buddy list have no real incentive to help you beyond the goodness of their hearts. I'm a starry-eyed idealist and I'm happy that this has worked historically, but it's readily apparent that people love playing for points. SO is one of those purely healthy forms of competition where everybody seems to win; from what I've seen, RTFM and "Google is Your Friend" trolls are consistently down-voted! The reputation system also appears to increase the responsiveness of the site — everybody is looking for the quick "Accepted Answer" grab if they can get it. I had figured that people would try to game the system, but it seems like most people with reputation have been sane, and the people with little reputation have their teeth pulled appropriately. Kudos to the karma system.

What differentiates SO from a big bulletin board is the three-tier threads. You have question (Original Poster), answer (many answers to one question), and comments (many comments to one answer). @replies allow for infinite "virtual" threading, but there's a clear indication of how the conversation is supposed to take place through the structure of the site. My experience with this format has led me to believe that it's ideal for removing noise from the answer tier (via short comments), without letting the meta-conversation get too crazy.

Threading on Usenet allows you to explore related topics of conversation with less friction, but it can be a big problem when you just want to know the answer that the Original Poster (OP) finally accepted. You often see such sub-conversations on Usenet get turned into new threads, while SO asks that you form the new thread pre-emptively as a new question. I have no problem with SO's approach, given the benefits of the three-tiered conversation and the more precise indexing capabilities that result from structured threads.

Visibility of questions and answers is a big problem on IRC: there's a distinct fire-and-be-forgotten phenomenon in most channels, proportional to their noise level. Additionally, there are usually a few super gurus in each channel who can only handle one or two problems at a time, leading to,

[impatient-43/4] Can anybody answer my question^^!?!?

messages ad nauseam.

Asynchronous

Usenet does better than IRC in terms of question visibility because it's an asynchronous medium. IRC's synchronous format makes help a lot more interactive, but at great cost. In addition to the fire-and-be-forgotten phenomenon, you inevitably juggle O(n) synchronous channels simultaneously, where n is the number of topics you're interested in.

Also, remember that chat is exactly that: you're going to get unwanted noise. Other people's Q&As, off topic conversation, and sometimes spammers all interfere with your ability to communicate a problem and get an answer in real time. If you've ever tried reading an IRC log to determine the answer to your question, you probably understand this principle — once you mix anonymized handles in with a many-to-many conversation, you give up quickly.

The asynchronous model fits into everybody's day more nicely and scales much better. I haven't yet seen a question on SO where I said to myself, "This Q&A could have benefited greatly from an increased level of synchronous interaction." (Yeah, that's really how I talk to myself. Wanna fight about it?)

Centralized

As I mentioned, the big pipe is a beautiful thing. Some nice corollaries are:

One could argue that IRC's Freenode is similar in the virtual channel respect, but logging is certainly not centralized, and listening to many virtual channels simultaneously quickly converges to impossible. Unlike SO's multi-tag view, asking a question in one IRC channel is unlikely to get the attention of people who reside in other channels.

Newsgroups are all-over-the-place decentralized. It's definitely a web 1.0 technology. There's a bunch of services that consolidate information for newsgroups of interest (Google Groups, gmane), but due to the information being replicated all over the web, the page rank for a given Q&A will tend to be weaker as it's divided across the resources and components of the thread. Newsgroups don't tend to play together as nicely as SO tags — it's easy to see how a question like, "What's monkeypatching?" could be asked on comp.lang.python, comp.lang.ruby, and so on, without ever being referred to each other.

On SO, if you tag things properly, information naturally crosses virtual channels and is well indexed for search.

Persistent

IRC channels tend to get inundated with the same questions over and over, so they make an FAQ to persist a subset of the information that's routinely provided in the channel. Taken to its rational extreme, you could persist all the Q&A information in such a manner, in which case you'd have SO.

Some IRC channels get logged, but I rarely care where the logs are — there's little hope of you finding the answer from the log (as previously discussed). It's also unlikely that the page rank of any given log will be significant. In my IRC experience, you keep your own chat logs if you really care to find the conversations later on. In any case, this is much less elegant than SO's centralized and indexed persistence capabilities.

As I mentioned before, newsgroups have persistence, but it's not well centralized or indexed. Persistence is a moot point if you can't find what you're looking for.

Critical Thinking

Since I'm out of a job as a karma system and NMD doomsayer, I've got to talk about the potential for secondary Armageddon-like effects.

SO doesn't have a significant enough differentiation from refactormycode. Its mission is well differentiated, but it seems like the permitted content on SO is a superset of what can be found on refactormycode. I would consider this kind of Q&A noisy, but it certainly follows the same general format. It's possible the authors are cool with SO engulfing a lot of refactormycode material, but in that case I hope we get some better large code block support. If SO doesn't want it, it should be in the constitution.

I'm concerned about question staleness. Over time we'll see how venerable the Q&As are, but my immediate concern is the plot of views over time: is the drop off in number of views over time for a given question so significant that the return rate cannot overcome initial misconceptions? If misconceptions are introduced later, will users still be watching the thread? There's no "watch this thread" capability in SO for push notification, so to some extent the system expects you to check back at regular intervals to monitor activity on threads. This may be an unrealistic assumption. To be fair, the constitution explicitly states you may re-ask a question if you acknowledge that the other exists, which may prevent this from being such a big deal.

I'm curious as to how the number of non-programming, technical questions has trended over time. Potential problems in this area are alleviated by the constitution and the fact that sufficiently reputable members can close threads, but it's easy to see how there will be an inevitable flow of system administrative questions due to how knowledgeable the constituency is. If the site didn't have such good safeguards, it would easily swallow a whole lot of other Q&A domains that are indirectly programming related.

Reclaimed bile juices

Dear property owners of Santa Clara,

As much as I appreciate your movement to nurture the sidewalks during their critical summer-growth phase, I do not appreciate being sprayed with reclaimed wastewater every day on the way home from work. Being a bicyclist, I will choose being hit with poo-water over being hit by cars, but I will not like it. I will also be forced to curse your name and everything you stand for. [For example, in the following story.]

Legend has it that Axmark was taking a walk around Sun's beautiful campus one evening, contemplating acceptance of Sun's fascist policies, as an impact sprinkler's arm fatefully slammed against its nozzle. Axmark was quickly doused in processed toilet liquids, and came to realize that he didn't have to take that kind of shit.

With terrible puns,

Chris' Raging Bile Duct

Idiomatic Python refactoring: for-else, "in" (contains) operator

I was perusing the App Engine SDK and I came across this snippet:

if self.choices:
  match = False
  for choice in self.choices:
    if choice == value:
      match = True
  if not match:
    raise BadValueError('Property %s is %r; must be one of %r' %
                        (self.name, value, self.choices))

Since I don't work with many other Python programmers, I always have trouble figuring out what interesting tidbits would be useful to post in, say, a blog entry. I don't have a good understanding of the popular knowledge level, but I figure that I can't go too wrong refactoring code written by Google engineers (who I naively assume are all as cool as Steve Yegge). [*]

The for-else statement

Let's forget about self for now [†] and refactor to use an obscure (but useful) Python feature, the for-else construct. for-else removes the necessity for the boolean-flag-state idiom from the original code, which is often used in lower level languages. [‡]

if choices:
    for choice in choices:
        if choice == value:
            break
    else:
        raise BadValueError

The for-else statement looks a little strange when you first encounter it, but I've come to love it. The else suite is executed only if the loop finishes without hitting a break. In this case, if we didn't break out of the for loop, then we never found a choice equal to value.

We also gain some efficiency over the original by using the break statement as soon as we find a match: there's no need to keep looking if you've already found a result! This can save you from iterating over all len(choices) items if you find it's a valid choice in the first iteration.
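To see both the else suite and the early exit in action, here is a runnable sketch of the refactoring. The BadValueError class below is a stand-in I define so the example runs on its own (the real one lives in the SDK), and the validate function and its comparisons counter are hypothetical additions for illustration.

```python
class BadValueError(Exception):
    """Stand-in for the SDK's exception so the sketch runs on its own."""


def validate(value, choices):
    """Raise BadValueError if value is not among choices.

    Returns the number of comparisons made, purely to show the early exit.
    """
    comparisons = 0
    if choices:
        for choice in choices:
            comparisons += 1
            if choice == value:
                break  # found a match: the else suite is skipped
        else:
            # Reached only when the loop finishes without hitting break.
            raise BadValueError("%r must be one of %r" % (value, choices))
    return comparisons


first = validate("a", ["a", "b", "c"])  # match on the first iteration
last = validate("c", ["a", "b", "c"])   # match on the last iteration
print(first, last)
```

An empty or None choices skips the check entirely, just as in the original snippet.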

in (contains) operator

Here is an even more readable and Python-like refactoring that uses the in operator: [§]

if choices and value not in choices:
    raise BadValueError

The in operator works on any iterable object and behaves like the code above: it looks for any item within choices such that item == value. If it finds a match early in the list, it won't keep looking. This is similar behavior to our early break statement from the first refactoring.

Just like the original code with the for loop, the in operator raises a TypeError if choices is not iterable. The in operator is effectively a drop-in replacement for the (more verbose) for loop when it comes to membership testing.
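Both claims are easy to check with a short sketch. As before, BadValueError is a stand-in class defined here so the example is self-contained, and validate is a hypothetical wrapper around the two-line refactoring. The iterator trick relies on the fact that a membership test on a plain iterator consumes items only up to the first match.

```python
class BadValueError(Exception):
    """Stand-in for the SDK's exception so the sketch runs on its own."""


def validate(value, choices):
    if choices and value not in choices:
        raise BadValueError("%r must be one of %r" % (value, choices))


validate("b", ["a", "b", "c"])  # passes silently

# Early exit: testing membership on an iterator stops at the first match,
# so the items after the match are still left to consume.
numbers = iter([1, 2, 3, 4])
found = 2 in numbers
remaining = list(numbers)
print(found, remaining)

# A non-iterable choices raises TypeError, as with the for loop.
try:
    validate(1, 5)
    raised_type_error = False
except TypeError:
    raised_type_error = True
print(raised_type_error)
```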

Footnotes

[*]

You should read his blog if you don't already.

[†]

For the language lawyers: we're forgetting about the fact that this code was intended to be executed in a bound instance method. ;)

[‡]

For example, C. For more information on programming languages and their "heights", see this Wikipedia entry.

[§]

Yeah, yeah... technically it's the not in operator.

Thoughts on desktop Linux incompatibilities with iPhone and Android

Linux users want music-player/phone integration. Linux users want to sync all of their data — contacts, emails, calendars, bookmarks, documents, ebooks, music, photos, videos — at the touch of a button. Linux users want 3G data rates. Linux users want a state of the art, coordinated mobile platform.

If FLOSS developers are so prone to scratching their own itches, why doesn't there exist such a thing?

Because large scale mobile device companies box us out.

The iPhone Platform

I believe that Linux users who purchase their iPhone with the intent of jailbreaking it to fake compatibility are doing the Linux community a great disservice. They are purchasing a device which is made with the intent of not working with their computer. There's no more mass storage device. There's no longer a known iTunesDB format. The iPhone goes so far to obscure our intended usage that the community-recommended method of gaining functionality was to use an arbitrary code execution exploit. This is what we're driven to do. Do you want to support this behavior with your $200-500?

From a technological standpoint, our historical success at reverse engineering is very cool. It demonstrates the community's technical prowess through our ability to overcome artificial barriers. Despite the coolness factor, however, we cannot and should not rely on our ability to kluge around obstacles in our path. Why? Because it doesn't allow us to make any definitive progress. It constantly puts us several steps behind the capabilities of a "properly" functioning device, both due to the difficulty of finding a solution and the misdirection of creative energy. One can't reasonably expect to build a working, Linux-compatible platform on top of a series of hacks that could potentially break with any minor release.

Even more insulting is the message that alternative solutions that work within the system are unwelcome. In my mind, the rallying cry of the Linux community should be "iPhone != iTunes". Ideally, the community could write an iTunes replacement application that played Ogg Vorbis and FLAC files. Let's enumerate some problems that this would solve for FLOSS developers and enthusiasts:

  1. We wouldn't have to reverse engineer the new iTunesDB format (or anything having to do with iTunes).

  2. We wouldn't have to reverse engineer the new iPhone USB protocol.

  3. We would be starting a platform with a solid base that we could build upon. We would no longer be at the mercy of a development shop that clearly doesn't care about our demographic.

  4. We could have it connect to a small socket server on our local machines and automatically sync music over WiFi.

  5. We could play Ogg Vorbis files, for God's sake!

We could write a whole suite of totally legitimate applications for the iPhone to perform compatible iPhone-native-application-like functionality, all within the artificial constraints of the iPhone! There's nothing stopping us — except for the distribution mechanism. If Apple is at all amenable to our cause, the rejection of competitive apps will have to stop. Again: we should not have to void our warranties to use our product in legitimate ways on our competitive computing platforms.

Sadly, even if iTunes-store enlightenment came to fruition, we'd still be screwed. Platform restrictions disallow several key abilities. Case in point, we could not background our iTunes-replacement music player while we browsed the web (or did anything else, for that matter). We find ourselves at the mercy of the exposed API and Human Interface restrictions. Although this is unfortunate, it's decidedly better than founding a platform on our ability to hack around the poor design decisions of others.

The Android Platform

I'm much less well informed about the Android platform and the upcoming HTC Dream mobile device. Nobody is well informed at this point — almost exactly one month from the expected release date — much to the chagrin of potential customers. There are early indications that Linux desktop compatibility will not be supported natively on this platform either. As a Linux user, I can only cross my fingers and hope that Android will be as open as Google makes it out to be, while keeping a close watch on the potentially hazardous centralized distribution model.

Food For Thought

Since this article is supposed to contain my "thoughts on" the subject, I feel I should also share this little tidbit that keeps rattling around in my head. I'm not drawing any conclusions, just providing the reader with another, incomplete step in my thought process.

Monopoly law exists, in part, to disallow certain practices that are thought to be detrimental to "consumer welfare". From Wikipedia (emphasis added):

Competition law does not make merely having a monopoly illegal, but rather abusing the power that a monopoly may confer, for instance through exclusionary practices.

Update: September 20, 2008

An application named MailWrangler was also barred from the Apple Store for vaguely duplicating the functionality of Mail.app. From Angelo DiNardi's article:

Normally to check multiple Gmail accounts in mobile Safari you would have to log in and out of all of the accounts, typing the username and password for each. Using just the Apple Mail application you aren’t able to see threaded views, your google contacts, archive (quickly), star, etc without going through the hassles that are present when using Gmail’s IMAP on the iPhone.

This is another case of barring an application that offers features for a smaller demographic. I personally can't see why Apple is so "afraid" — let third party apps spring up for specialized features, so long as they don't violate the device's terms of use. If you feel like incorporating those features into Mail.app somewhere down the road, the other applications will die out naturally.

I feel sincere sympathy for Angelo; however, on the desktop Linux side we're at an even greater disadvantage — for us, there isn't even similar functionality available on the iPhone platform. To just sync our music, we have to void our warranties. The only thing we can possibly do without voiding our warranties is write an app with similar functionality to the iTunes music player and acquire it through the Apple Store. Forbidding us from doing this makes legitimate desktop Linux use impossible — for what advantage?