Inside F#

Brian's thoughts on F# and .NET

Motivating F# equality and comparison constraints

Posted by Brian on November 8, 2009

(I will return to the RSS Dashboard series soon; I needed to do this aside now to be timely.)

Don has just posted a great blog entry describing the details of the equality and comparison constraints in the latest release of F#.  I would like to take a moment to describe the higher-level view to motivate the feature.  By reading this blog entry, the reader should learn

  • what structural equality and comparison are
  • what is the problem that F# equality/comparison constraints solve
  • why this problem cannot be solved without a specific language feature
  • when does a programmer need to care about this

If you want to know the details of the F# language feature and how to exercise fine control and specification of equality/comparison on your user-defined types, then you should go read Don’s post.  On the other hand, if you want an overview and more examples about the ‘why’, hopefully this blog entry will give you what you need.  Some readers might find this blog entry is a useful ‘prequel’ to Don’s.

So let’s get started!

What are structural equality and comparison?

Consider this short code example:

let il1 : int list = [ 1; 2; 3 ]
let il2 : int list = [ 2; 3; 1 ]
printfn "%A" (il1 = il2) 
printfn "%A" (il1 < il2) 

What would this print?  If you said “false” and “true”, then you already have a good sense of structural equality and comparison.  But let’s go through it in more detail.

Structural equality means that for F# structural types (e.g. tuples, arrays, lists, options, records, and unions), we can get a default behavior for ‘equals’ that compares each individual element of the structure for equality.  That is, one ‘int list’ is equal to another ‘int list’ if both lists have the same length and each of their individual elements are equal.  (Contrast this to the default behavior of the .Equals() method of a class type, which defaults to reference equality.)  Structural equality is commonly the semantic the author of a type would want anyway, and in F#, you can get semantic “for free” on any structural type.  In my experience, most people have a good intuition for what structural equality means and see its utility.  (Technical aside: “structural equality” applies to both the .Equals() and the .GetHashCode() methods of a type, as well as the F# (=) operator and “hash” function.)

Structural comparison is analogous to structural equality, except that it applies to the comparison operators (<, <=, >, >=, whose behavior involves calls to System.IComparable.CompareTo()).  For some people, the behavior here is less intuitive – what does it mean to compare two integer lists with the less-than operator?  To explain this, I appeal to a more familiar domain: dictionary order on strings.  If I ask you what order the words “bad”, “ace”, “ad”, and “a” appear in the dictionary, I bet you could tell me.  Thus you are probably not too surprised by the behavior of this F# code:

let words = [| "bad"; "ace"; "ad"; "a" |]
printfn "%A" (Array.sort words)
// [|"a"; "ace"; "ad"; "bad"|]

Dictionary order, sometimes called lexicographical order, is a useful semantic for structural types.  (Aside: you may recall that I relied on structural comparison in a previous blog entry to get an automatic ordering of poker HandValuations.)  With the previous code example in mind, I hope it is now less surprising how this F# code behaves:

let intLists = [| [2;1;4]; [1;3;5]; [1;4]; [1] |]
printfn "%A" (Array.sort intLists)
// [|[1]; [1; 3; 5]; [1; 4]; [2; 1; 4]|]

All I did was change from strings, which ‘structurally’ are just lists of characters, to lists of integers (in this example, substituting a=1, b=2, …). 

So for example [1;2]<[2;1] because we can compare the first element and immediately see a difference, whereas [1;2]<[1;3] because we compare the first element and they are equal, so we move on to the next element until we find a difference.  This same logic applies to other structural types, like tuples: (1,”bbb”)<(2,”aaa”) and (1,”aaa”)<(1,”bbb”); records: {field1=1;field2=”aaa”}<{field1=1;field2=”bbb”}; etc.

To sum up then, structural equality and comparison define the most natural semantic for operations like (=) and (<) on structural types.  In F#, you can get these semantic “for free” by default on structural types.  (You can also opt out of these defaults, or choose to define your own equality/comparison behavior; for details see Don’s post.)

Now that we know what “structural equality” and “structural comparison” mean, we can move on to…

What problem do F# equality/comparison constraints solve?

In the previous section, we learned about this behavior for lists of integers:

let il1 : int list = [ 1; 2; 3 ]
let il2 : int list = [ 2; 3; 1 ]
printfn "%A" (il1 = il2) // false
printfn "%A" (il1 < il2) // true

Now consider a couple more cases for lists of various types.  What is the ideal behavior of equality/comparison on (Windows Forms) Controls?

open System.Windows.Forms 
let cl1 : Control list = [ new TextBox(); new Button(); new CheckBox() ]
let cl2 : Control list = [ new Button(); new CheckBox(); new TextBox() ]
printfn "%A" (cl1 = cl2) 
printfn "%A" (cl1 < cl2) 

Or how about on F# function types?

let fl1 : (int -> int) list = [ id; (+) 1; (fun x -> 2*x) ]
let fl2 : (int -> int) list = [ (+) 1; (fun x -> 2*x); id ]
printfn "%A" (fl1 = fl2) 
printfn "%A" (fl1 < fl2) 

Let’s consider each case in turn.  For Controls, there is a useful equality semantic: reference equality.  As a scenario example, you might hook up the same “click” event handler to a bunch of buttons, and then inside the handler test “if (sender=button1) then…” Indeed, the behavior of .Equals() on these .Net Framework types is reference equality.  So an equality test on a list of Controls should return true if and only if each element of the first list .Equals() the corresponding element of the second list.

What about comparison of Controls with the (<) operator?  Here there is nothing reasonable to do.  (Is a button less than a textbox?  Is this button less than that one?)  None of these type implement IComparable.  Ideally we want this program not to compile; trying to compare a list of Controls with (<) is nonsense.

How about the function types?  Those few humans who have studied the F# spec in detail may recall that the result of calling System.Object.ReferenceEquals() on two F# function types is underspecified (see 6.10.22: Types with Under-specified Object and Type Identity).  Trying to compare two “function values” for (=) equality is bad news!  Similarly, there is no reasonable “comparison” semantic for (<).  So ideally, we would not like either of the function examples to compile.

Happily, F# equality/comparison constraints give us exactly the behavior we want, as suggested by the comments:

let il1 : int list = [ 1; 2; 3 ]
let il2 : int list = [ 2; 3; 1 ]
printfn "%A" (il1 = il2) // false
printfn "%A" (il1 < il2) // true

open System.Windows.Forms 
let cl1 : Control list = [ new TextBox(); new Button(); new CheckBox() ]
let cl2 : Control list = [ new Button(); new CheckBox(); new TextBox() ]
printfn "%A" (cl1 = cl2) // false
printfn "%A" (cl1 < cl2) // does not compile

let fl1 : (int -> int) list = [ id; (+) 1; (fun x -> 2*x) ]
let fl2 : (int -> int) list = [ (+) 1; (fun x -> 2*x); id ]
printfn "%A" (fl1 = fl2) // does not compile
printfn "%A" (fl1 < fl2) // does not compile

If the operation is reasonable, the code compiles and gives the expected behavior.  If the operation if unreasonable, the code fails to compile and you get a helpful diagnostic from the compiler.

To sum up this section, hopefully I’ve provided a convincing argument that the behaviors described above are the ideal defaults.  However this may leave you asking…

Does this really need a language feature?

As you may know, the CLI and languages like C# support a rich variety of ways to constraint class/method signatures so that the static type system can enforce that only sensible combinations of generics/object-hierarchies will typecheck.  Surely we could use those existing mechanisms to get the ideal behavior described in the previous section?

It turns out you cannot.  (Some readers may wish to take a few minutes to try this themselves – try to author your own ‘list’ type (and possibly (=) and (<) operators) that give the same compile-time and run-time behavior as in the previous code example.)  The static type system of the CLI or languages like C# are not powerful enough to express this behavior.

To see why, consider just the comparison (<) aspect for a moment, and the ‘int’ versus ‘Control’ list example.  I want to define a type List<T>.  When I instantiate it with a comparable type like ‘int’, I want the resulting type “List<int>” to be an IComparable.  But when I instantiate with a non-comparable type like ‘Control’ I want the resulting type “List<Control>” to not be an IComparable.  This amounts to needing a generic type to be able to conditionally implement an interface, based on the generic argument.  And you can’t do that.  The nearest things you might try are C# code like

class ListA<T> : IComparable ...

or

class ListB<T> : IComparable where T : IComparable ...

but neither does what we want.  ListA allows lists of Controls, but also allows those lists to be comparable.  ListB disallows comparing lists of Controls, but does so by throwing out the baby with the bathwater, you cannot even create a list of Controls with ListB!  What you really need is something like this imaginary syntax:

if T : IComparable 
    class List<T> : IComparable {...}
else
    class List<T> {...}

But this feature does not exist in C# or the CLI.

The design methodology of F# here is to have features in the F# type system that are erased when the code is compiled.  The same thing happens with units of measure; units are a feature that exist in the F# type system to enable a very useful kind of static type checking in F#, but this information is erased when compiled down to assemblies referenced by other .Net languages.  So for instance, when F# looks at a particular F# assembly, it may see a method that returns “double<kg>”, but when C# looks at the same method, it will just see the return type as “double”.  Similarly, F# equality & comparison constraints allow F# code to see what are effectively these ‘conditional interfaces’, whereas C# code would just see these interfaces unconditionally.  The result is that the F# type system will catch some kinds of semantic errors at compile-time (like trying to add kilograms to meters, or trying to compare two functions values for equality), whereas the corresponding C# cannot (resulting in various unintended runtime semantics).

Without this feature, F# would have to choose between either (1) disallowing equality and comparison on structural types like lists (meaning e.g. you could not compare two lists for equality), or (2) allowing equality and comparison on all structural types, even when it does not make sense (this is roughly where F# was prior to the release of Beta2).  (In fact there are more points in the design space, but this paragraph summarizes the central choice.)  These constraints enable equality/comparison only when it makes sense.

To sum up this section: in order to get the desired behavior described in the previous section, we really do need a language feature.  For details about the feature itself, see Don’s blog.

Now that you hopefully understand why the feature is necessary, you may say, ok, so…

When do I need to worry about any of this?

The good news is, amazingly, you almost never need to care about any of this.  Almost all F# code that compiled and worked before continues to compile and work now the same way.  The main difference is that the type system is providing an extra safety net to prevent you from doing meaningless things on certain types (such as trying to compare function values for equality).  For the most part you get extra safety for free.

The F# library has constraints mentioned on a few common types (like Set<T>, which requires “T : comparison”) and functions (like Seq.groupBy, which requires “Key : equality” (note: at time of this writing, there’s a doc bug on the groupBy web page that says “unknown constraint”)).  So knowing these constraints is occasionally useful when reading documentation, to know for example that Set<T> only works for types T that support (<), and that groupBy can only group on keys that can be tested with (=).

When you write code that uses equality/comparison constraints, it often gets inferred for you on your behalf.  Consider this somewhat contrived example that finds the first element in a list ‘l’ that equals a certain value ‘e’:

let rec tryFindFirst e l =
    match l with
    | [] -> None
    | x::xs when x = e -> Some(x)
    | x::xs -> tryFindFirst e xs

Since (=) is used on the generic element type in the body of the function, the equality constraint is automatically inferred.  Thus the behavior of this function for a few different calls is as seen here:

tryFindFirst 3 [1;2;3;4] // Some(3)
tryFindFirst 5 [1;2;3;4] // None
tryFindFirst id [id;fun x -> x+1] // does not compile

Notice that we get the ‘safety check’ for free.  Our function only works on lists of elements for which the element types can be compared with (=), and we didn’t have to explicitly specify any constraint, it was automatically inferred.

You will occasionally need to be explicit about equality/comparison constraints, if authoring algorithms with explicit type annotations that depend on these constraints, or authoring generic data types that depend on these constraints in non-obvious ways.  For examples, see Don’s blog.

To sum up: so long as you are aware of the existence of the “equality” and “comparison” constraints, and have some basic idea of what they mean (so as to grok them when seen in documentation or F# hover-tooltips in Visual Studio), you’re fine.  You don’t need to understand the nitty-gritty details of the language mechanisms unless you are going to author types with ‘custom’ equality/comparison semantics, and those cases are relatively rare.

Advertisements

2 Responses to “Motivating F# equality and comparison constraints”

  1. Daniel said

    Just for the record, the = and < motivation you gave (List<int> as an IComparable) is of course quite valid for F#, but it\’s hardly a theoretical limitation of statically typed languages or even languages on the CLR. Scala runs (for the most part) on the CLR and it supports this sort of thing.The key is typeclasses. Throwing terminology to the wind, we define int to be in the typeclass IComparable and we define Control to *not* be in that typeclass. We further define List<T> to be in the typeclass IComparable iff T is also in the typeclass IComparable. We can do this in a couple ways in Scala, but the best way is to use typeclasses explicitly in an implicit parameter:trait List[+A] { … def <(ls: List[A])(implicit f: A => Ordered[A]) = …}Use of the < operator on lists will *only* typecheck for lists with component types which are convertable to Ordered (the Scala equivalent of IComparable). This is another way of saying "in the typeclass Ordered". Thus, Scala provides all of the equality flexibility of F#, but without making equality a core language feature.

  2. Brian said

    Don goes on to mention type classes in his section on \’design alternatives\’, and a discussion on why F# chose this point in the design space for now, so do be sure to read his blog. You\’re right that a more general/elegant mechanism can potentially be had here, but it comes with some of its own complications as well.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: