'Best Practices' category

« Older entries
Newer entries »

Thoughts on blogging quantity vs quality

I'm very hesitant to post things to my real blog. [*] I often have complex ideas that I want to convey via blog entries, and the complexity mandates that I perform a certain level of research before making any real claim. As a result, I'm constantly facing a quantity vs. quality problem. My queue of things to post is ever-increasing and the research/writing process is agonizingly slow. [†]

Just saying "screw it" and posting whatever un-validated crap spills out of my head-holes seems promising, but irresponsible. The question that I really have to ask myself is whether or not the world would be a better place if I posted more frequently, given that it is at the cost of some accuracy and/or completeness.

I'm starting to think that the best solution is to relate a daily occurrence [2] to a larger scheme of things. Trying to piece together some personal mysteries and detailing my thought processes may be both cathartic and productive — in the literal sense of productive.

A preliminary idea is to prefix the titles of these entries with "Thoughts on [topic]" and prefix more definitive and researched articles with a simple "On [topic]". Readers may take what I'm saying more at face value if I explicitly announce that my post relates to notions and not more concrete theories. [‡]

Footnotes

[*]

As opposed to my Tumblr account, which I use like an unencumbered Twitter.

[†]

My great grammatical aspirations certainly don't speed up the process.

[‡]

Perhaps one that seems inconsequential! That would make for a bit of a fun challenge.

[§]

Theory: a well-substantiated explanation of some aspect of the natural world.

Using Python identifiers to helpfully indicate protocols

Background

The Stroop Effect indicates that misleading identifiers will be more prone to improper use and will be more subtle when introducing bugs. Because of this phenomenon, I try to make my identifiers' intended usage as clear as possible without over-specifying and/or repeating myself. Additionally, I prefer programming languages which allow for latent typing, which has interesting results: I end up encoding protocol indicators into identifiers. [*]

An Example

If you're (still) reading this, you're most likely a Python programmer. When you find that there exists an identifier chunks, what "kind of thing" do you most expect chunks to be bound to? Since this is a very hand-wavy question, I'll provide some options to clarify:

  1. A sequence (iterable) of chunk-like objects.

  2. A callable that returns chunks.

  3. A mapping with chunk-like values (presumably not chunk-like keys).

  4. A number (which represents a count of chunk-like objects somewhere in the problem domain).

If you've got a number picked out then you can know that I'm the bachelor behind door number one. Since I would identify a lone chunk by the identifier chunk, the identifier says to me, "I'm identifying some (a collection of) chunks." By iterating, I'm asking to hold the chunks one at a time. (Yuck. :)

Callables and Action Words

If you chose door number two and think that it's a callable, then your bachelor is this Django project API, which I am using in this particular (chunky) example. This practice not at all uncommon, however, and another good example of this present within the Python standard library is the BaseHTTPServer with its version_string and date_time_string methods. I might be missing something major; however, I'm going to claim that callables should be identified with action word (verb) prefixes.

To me, it seems well worth it to put these prefixes onto identifiers that are intended to be callables to make their use more readily apparent. To my associative capacities, action words and callables go together like punch and pie. Since it helps clarify usage while writing code, it seems bound to help clarify potential errors while reading code, as in the following contrast:

for chunk in uploaded_file.get_chunks:
    """Looks wrong and feels wrong writing it... action word but no
    invocation.
    """
for chunk in uploaded_file.chunks:
    """Looks fine and feels okay writing it, but uploaded_file.chunks is
    really a method.
    """

Mappings and Bylines

If you chose number three and think that it's a mapping, I'm surprised. There's nothing about the identifier to indicate that there is a key-value relation. Additionally, attempting to iterate over chunks, if it is a mapping with non-chunk keys, will end up iterating over the (non-chunk) keys, like so:

>>> chunks = {1: 'Spam', 2: 'Eggs'}
>>> i = iter(chunks)
>>> repr(i)
'<dictionary-keyiterator object at 0xb7da4ea0>'
>>> i.next()
1

chunks being a mapping makes the code incompatible with the people who interpret the identifier as an iterable of chunks (number two), since the iterator method (__iter__) for a mapping iterates over the keys rather than the chunks. This is the kind of mistake that I dislike the most: a potentially silent one! [†]

To solve this potential ambiguity in my code I use "bylines", as in the following:

>>> chunk_by_health_value = {1: 'Spam', 2: 'Eggs'}
>>> health_values = iter(chunk_by_health_value)

Seeing the fact that the identifier has a _by_healthiness postfix tells me that I'm dealing with a mapping rather than a simple sequence, and the code tends to read in a more straightforward manner: if it has a _by_* postfix, that's what the default iterator will iterate over. In a similar fashion, if you had a mapping of healthiness to sequences of chunks, I would name the identifier chunks_by_healthiness. [‡]

Identifying Numbers

If you chose number four, I see where you're coming from but don't think the same way. Every identifier whose purpose is to reference a numerical count I either prefix with num_ or postfix with _count. This leaves identifiers like chunks free for sequences that I can call len() on, and indicates that chunk_count has a number-like interface.

Compare/Contrast with Hungarian notation

Though my day-to-day usage I find that this approach doesn't really suffer from the Wikipedia-listed criticisms with Hungarian notation.

  • ... becomes confusing when used to represent several protocols ... If the identified object is implementing multiple protocols, chances are that it is already confusing and should be Handled With Care (TM). In these cases I fall back on the singular to make the client and/or maintainer look at the API.

  • ... may lead to inconsistency when the code is modified ... If you switch the (strongly related) protocol of an identified object, there seems to be a high chance that all of your old code won't (and shouldn't) continue to work. For example, see the iterating-over-a-mapping example above — if you switched an identified object from a sequence to a mapping you'd want to change the identifier to clarify what was going on anyway.

  • ... most of the time, knowing the use of an object implies knowing its protocol ... This isn't really true in Python, given that there's at least a few operations that apply well across different protocols: iter(), __getitem__/__setitem__ (indexing). (Note: This is mentioned in PEP 246.)

Unless I sorely misunderstood the distinction, you could classify this system as a broadly applicable Apps Hungarian, since protocols are really all about semantic information (being file-like indicates the purpose of being used like a file!). Really, this guideline just developed from a desire to use identifiers that conform to the some general notions that we have of language and what language describes; I don't tend to think of chunks as something that I can invoke. (Invoke the chunks!)

For objects that span multiple protocols or aren't "inherently" tied to any given protocol, I just use the singular.

Potential Inconsistencies

Strings can be seen as an inconsistency in this schema. Strings really fall into an ordered sequence protocol, but identifiers are in the singular; i.e. "message". One could argue that strings are really more like sequences of symbols in Python and that the identifiers would be more consistent if we used something like: "message_letters". These sort of identifiers just seem impractical, so I'm going to reply that you're really identifying a message object adapted to the string protocol, so it's still a message. Feel free to tear this argument apart. ;)

Footnotes

[*]

A protocol in Python is roughly a "well understood" interface. For example, the Python file protocol starts with an open() ability that returns a file-like handle: this handle has a read() method, a write() method, and usually a seek() method. Anything that acts in a file-like way by using these same "well understood" methods can usually be used in place of a file, so the term protocol is used due to the lack of a de jure interface specification. For a really cool PEP on protocols and adapters, read Alex Martelli and Clark Evans' PEP 246. (Note: This PEP wasn't accepted because function annotations are coming along in Python 3, but it's still a really cool idea. :)

[†]

Perl has lots of silently conforming behavior that drives me nuts.

[‡]

I still haven't figured out a methodological way to scale this appropriately in extreme cases; i.e. a map whose values are also maps becomes something like chunk_by_healthiness_by_importance, which gets ugly real fast.

Do permalinks change when blogger posts change title?

This is a test post to figure out whether or not permalinks change when you change the title of a post on blogger. The problem is that cool URIs don't change, but sometimes I want my titles to.

A permanent HTTP redirect would probably be ideal since the slug could be correctly indexed in the future, but I wouldn't mind settling for the new content being served from the old URI.

Let's see what happens...

Update: Permalinks are indeed permanent and the new content is served from the old URI on a title change.

Problems with Python __version__ parsing

As stated by Armin and commenters [*] the change from 0.9 to 0.10 is a convention in open source versioning, and the fault seems to lie more on the version-parsers than the version-suppliers. [†] Armin also notes that the appropriate solution is to use:

from pkg_resources import parse_version

Despite it not being the fault of the version supplier, we've recognized that this can be an issue and can certainly take precautions against letting client code interpret __version__ as a float. Right now there are two ways that I can think of doing this:

  1. Keep __version__ as a tuple. If you keep __version__ in tuple form you don't need to worry about client code forgetting to use the parse_version method.

  2. Use version numbers with more than one decimal. This prohibits the version from being parsed as a float because it's not the correct format — taking the current Linux kernel version as an example:

    >>> __version__ = '2.6.26'
    >>> float(__version__)
    Traceback (most recent call last):
    ...
    ValueError: invalid literal for float(): 2.6.26
    >>> tuple(int(i) for i in __version__.split('.'))
    (2, 6, 26)
    

    This ensures that the client code will think about a more appropriate way to parse the version number than using the float builtin; however, it doesn't prevent people from performing an inappropriate string comparison like the tuple does.

Footnotes

[*]

In … and 0.10 follows 0.9

[†]

Which seems to invalidate the implied conclusion of my title How not to do software version numbers, which I now realize was stupidly named.

twill is mini-language done right

In my college career I found a book whose genre I can best describe as "computer science philosophy." The Pragmatic Programmer taught me several key philosophical ideals behind real-world code design, which I found to be quite useful in my every-day programming life. [*]

One of the aforementioned "Pragmatic Programming" practices that I had the most difficulty wrapping my head around dealt with the creation of mini-languages close to the problem domain (AKA domain specific languages). To me, this seemed like an unnecessary amount of additional work, when one could just specify an interface that was a close approximation of a mini-language in a semantically rich language. After looking at twill, I realize that it isn't too much work, specifically because you can create a close approximation in a semantically rich language.

Twill has a commands module that exports functions with simple names; for example, "go", "find", "back", etc. This is the more simple approximation of a mini-language that I had anticipated as a time savings. However, Twill also has a shell module which deals with the online execution of these functions and their arguments within a shell-like environment. This creates, as interpreted by the shell, a mini-language!

With Twill's elegant and maintainable (extensible) coding, the command loop is approximately 250 lines. It's not too hard to create a simple file reader to interpret commands line by line from a script and feed them into the command loop one by one, at which point you have a scriptable mini-language!

There's nothing extraordinarily amazing about any single component of this process, but the fact that a mini-language can be created with so little effort kind of stunned me — one of those "lightbulb going off in your head" situations. :)

Letting my mind wander a bit, I doubt that it's always possible to wrap an existing API with simple, domain-specific, script-like commands. If creating a domain-specific mini-language is a tangible possibility, it seems like a design decision you want to know early on.

Footnotes

[*]

My CS degree program had a tendency to emphasize provably correct programs, which I don't find great need for in my personal day-to-day work beyond some simple design-by-contract guidelines. This is the topic for another entry, however... perhaps several entries!