December 3, 2009

Inedible vectors of spam: learning non-reified generics by example

I've been playing with the new Java features, only having done minor projects in it since 1.4, and there have been a lot of nice improvements! One thing that made me do a double take, however, was a run-in with non-reified types in Java generics. Luckily, one of my Java-head friends was online and beat me with a stick of enlightenment until I understood what was going on.

In the Java generic system, the collections are represented by two separate, yet equally important concepts — the compile-time generic parameters and the run-time casts who check the collection members. These are their stories.

Dun, dun!

An example: worth a thousand lame intros

The following code is a distilled representation of the situation I encountered:

import java.util.Arrays;
import java.util.List;

public class BadCast {

    static interface Edible {}

    static class Spam implements Edible {}

    List<Spam> canSomeSpam() {
        return Arrays.asList(new Spam(), new Spam(), new Spam());
    }

    /**
     * @note Return type *must* be List<Editable> (because we intend to
     *       implement an interface that requires it).
     */
    List<Edible> castSomeSpam() {
        return (List<Edible>) canSomeSpam();
    }

}

It produced the following error in my IDE:

Cannot cast from List<BadCast.Spam> to List<BadCast.Edible>

At which point I scratched my head and thought, "If all Spams are Edible, [*] why won't it let me cast List<Spam> to List<Edible>? This seems silly."

Potential for error

A slightly expanded example points out where that simple view goes wrong: [†]

import java.util.Arrays;
import java.util.List;
import java.util.Vector;

public class GenericFun implements Runnable {

    static interface Edible {}

    static class Spam implements Edible {

        void decompose() {}
    }

    List<Spam> canSomeSpam() {
        return Arrays.asList(new Spam(), new Spam(), new Spam());
    }

    /**
     * Loves to stick his apples into things.
     */
    static class JohnnyAppleseed {

        static class Apple implements Edible {}

        JohnnyAppleseed(List<Edible> edibles) {
            edibles.add(new Apple());
        }

    }

    @Override
    public void run() {
        List<Spam> spams = new Vector<Spam>(canSomeSpam());
        List<Edible> edibles = (List<Edible>) spams;
        new JohnnyAppleseed(edibles); // He puts his apple in our spams!
        for (Spam s : spams) {
            s.decompose(); // What does this do when it gets to the apple!?
        }
    }
}

We make a (mutable) collection of spams, but this time, unlike in the previous example, we keep a reference to that collection. Then, when we give it to JohnnyAppleseed, he sticks a damn Apple in there, invalidating the supposed type of spams! (If you still don't see it, note that the object referenced by spams is aliased to edibles.) Then, when we invoke the decompose method on the Apple that is confused with a Spam, what could possibly happen?!

The red pill: there is no runtime-generic-type-parameterization!

Though the above code won't compile, this kind of thing actually is possible, and it's where the implementation of generics starts to leak through the abstraction. To quote Neal Gafter:

Many people are unsatisfied with the restrictions caused by the way generics are implemented in Java. Specifically, they are unhappy that generic type parameters are not reified: they are not available at runtime. Generics are implemented using erasure, in which generic type parameters are simply removed at runtime. That doesn't render generics useless, because you get typechecking at compile-time based on the generic type parameters, and also because the compiler inserts casts in the code (so that you don't have to) based on the type parameters.

...

The implementation of generics using erasure also causes Java to have unchecked operations, which are operations that would normally check something at runtime but can't do so because not enough information is available. For example, a cast to the type List<String> is an unchecked cast, because the generated code checks that the object is a List but doesn't check whether it is the right kind of list.

At runtime, List<Edible> is no different from List. At compile-time, however, a List<Edible> cannot be cast to from List<Spam>, because it knows what evil things you could then do (like sticking Apples in there).

But if you did stick an Apple in there (like I told you that you can actually do, with evidence to follow shortly), you wouldn't know anything was wrong until you tried to use it like a Spam. This is a clear violation of the "error out early" policy that allows you to localize your debugging. [‡]

In what way does the program error out when you try to use the masquerading Apple like a Spam? Well, when you write:

for (Spam s : spams) {
    s.decompose(); // What does this do when it gets to the apple!?
}

The code the compiler actually generates is:

for (Object s : spams) {
    ((Spam)s).decompose();
}

At which point it's clear what will happen to the Apple instance — a ClassCastException, because it's not a Spam!

Exception in thread "main" java.lang.ClassCastException: GenericFun$JohnnyAppleseed$Apple cannot be cast to GenericFun$Spam
        at GenericFun.run(GenericFun.java:36)

Backpedaling

Okay, so in the first example we didn't keep a reference to the List around, making it acceptable (but bad style) to perform an unchecked cast:

Since, under the hood, the generic type parameters are erased, there's no runtime difference between List<Edible> and plain ol' List. If we just cast to List, it will give us a warning:

Type safety: The expression of type List needs unchecked conversion to conform to List<BadCast.Edible>

The real solution, though, is to just make an unnecessary "defensive copy" when you cross this function boundary; i.e.

List<Edible> castSomeSpam() {
    return new Vector<Edible>(canSomeSpam());
}

Footnotes

[*]

Obviously a point of contention among ham connoisseurs.

[†]

This doesn't compile, because we're imagining that the cast were possible. Compilers don't respond well when you ask them to imagine things:

$ javac 'Imagine you could cast List<Spam> to List<Edible>!'
javac: invalid flag: Imagine you could cast List<Spam> to List<Edible>!
Usage: javac <options> <source files>
use -help for a list of possible options
[‡]

Note that if you must do something like this, you can use a Collections.checkedList to get the early detection. Still, the client is going to be pissed that they tried to put their delicious Ham in there and got an unexpected ClassCastException — probably best to use Collections.unmodifiableList if the reference ownership isn't fully transferred.