Inside F#

Brian's thoughts on F# and .NET

An example of the interplay between language features and library design, part one

Posted by Brian on April 16, 2008

In today’s blog entry I want to trace an imagined lineage of a simple collection API method that demonstrates how language design, library design, and programming idioms co-evolve over time.  The API in question is a lookup function for a Dictionary, and I think careful study of this relatively mundane method can provide some fascinating insight into the interplay of programming language constructs, libraries, and idioms.

For today’s blog, I am going to use C# syntax, but merely for notational convenience, as my intention is to talk about things abstractly in a language-neutral way.

A very simple dictionary lookup

Let’s say we need a lookup table that maps string values to integer values.  (Perhaps the string represents a name, like "Brian", and the integer represents the number of blog entries posted this month.)  Suppose also we are in an introductory programming class (or perhaps using a very old language) and we don’t yet know about generics or exceptions.  We might design a "lookup" API on this "dictionary" class with an instance method that looks something like this:

    // returns -1 if key not in dictionary
    int GetValue(string key);

and then call it with client code like this:

    int r = dict.GetValue("Brian");
    if (r != –1)
        Console.WriteLine("Brian posted {0} blogs", r);

This API is simple and workable for this example.

Problems in the space of values

The previous GetValue() function utilized a special value (namely "-1") in the space of things that were being looked up in order to provide a way to report back whether the dictionary actually contained an entry for "Brian".  When the thing being stored is a non-negative integer (e.g. number of blog entries I posted), this is fine, but this scheme fails if we need to store values that span the entire value space of the data type.  So, for example, suppose once again that I want a structure that maps strings to integers, and the string key is a person’s name again, but this time the integer value is that person’s favorite 32-bit integer.  We can no longer use "-1" as a sentinel value to mean "not in the dictionary", because we would be unable to distinguish it from an entry of a person whose favorite number is indeed -1.  (Note: Sentinel values are also ruled out if we try to make the dictionary generic over the value type.)  So we might change the API:

    // returns true if the dictionary contains the key
    bool ContainsKey(string key);
    // (undefined behavior if key not in dictionary)
    int GetValue(string key); 

as well as the client code:

    if (dict.ContainsKey("Brian"))
        int r = dict.GetValue("Brian");
        Console.WriteLine("Brian’s favorite 32-bit integer is {0}", r);

We have solved the problem of not being allowed to use a particular sentinel value to mean "not there" (and thus this API makes it possible to author a "generic" dictionary class).  However in the process, we introduced a number of new problems.  First, the API has gotten more complicated – there are two methods the user needs to know about rather than just one.  Furthermore, the behavior of the APIs requires the client to use a particular idiom – the client code must call ContainsKey() before calling GetValue() to ensure the behavior is well-defined.  Things have probably become less performant, since we are now looking up the value in the data structure twice (once in the ContainsKey() call, and then again in the GetValue() call).  And finally, if this data structure can be simultaneously accessed by multiple execution threads, we would require the client to lock the data structure across the two calls, or else ContainsKey() might succeed, but then some other thread might remove that key-value pair before GetValue() runs.  What a mess!

Exceptions to the rescue?

One possible way out of the mess is to introduce a new language feature – exceptions.  If our programming language supports exceptions, we can redefine the API like so:

    // throws KeyNotFoundException if key is not in dictionary
    int GetValue(string key);

and the client code becomes:

        int r = dict.GetValue("Brian");
        Console.WriteLine("Brian’s favorite 32-bit integer is {0}", r);
    catch (KeyNotFoundException)

The "exceptions" language feature just helped solve all four problems we introduced in our previous attempt at the API.  We are back down to a single method, which requires no special client calling-order idioms, which needs only perform the lookup once, and which can do thread synchronization inside the method.  Quite an improvement!

Nevertheless, this API design is still poor, as exceptions are a very heavyweight mechanism.  In many languages, throwing an exception means your code’s performance will be poor (worse than an if-then-else, anyway).  Throwing an exception means you will disrupt the debugging experience (e.g. for those using ‘catch all first-chance exceptions’ to find real problems in the code).  And exceptions syntactically burden the caller (by requiring a try-catch).  So our API is still not great.

Out parameters to the rescue…

We can introduce another language feature to address these problems.  If our programming language supports "out" parameters, we can rewrite the API like this:

    // if key not present, returns false
    // else returns true and sets value
    bool TryGetValue(string key, out int value);

and our client code becomes:

    int r;
    if (dict.TryGetValue("Brian", out r))
        Console.WriteLine("Brian’s favorite 32-bit integer is {0}", r);

Once again, we have solved the problems introduced by our previous attempt.  This API is simple and performant.  Indeed, this is the API used by System.Collections.Generic.Dictionary, and is an instance of a more general design idiom known as the TryParse pattern, described here and here.  (If you haven’t read about "TryParse" before, follow the links and learn!)

The end of the story?

I could end the story here.  We’ve gone through four iterations of API design, constantly improving things by repairing deficiencies with the prior designs.  Along the way we saw how certain language features (like "generics", "exceptions" and "out parameters") interacted with the library API design and influenced the idioms used in the calling code.

But there is still more to say.  I’ll save it for the next blog entry, though.  (I’m such a tease!)


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: