Inside F#

Brian's thoughts on F# and .NET

Archive for October, 2009

Overview of type inference in F#

Posted by Brian on October 25, 2009

I was listening to the StackOverflow podcast this week, as I do every week, and in this week’s podcast, Joel Spolsky and Scott Hanselman talked briefly about type inference, e.g. extending C#’s “var” keyword to parameters, rather than just local variables (listen for yourself – about 6 minutes, starting at 1:22:30).  The discussion was spirited, but murky: Joel said he knew it is possible, because some languages do it (citing examples); Scott said that as a layman he couldn’t immediately see how type inference could work in certain situations (mostly involving overloading).  So it seems the time is ripe for me to try to shed a little light on the subject.

Today’s blog post describes type inference in the context of F#, but as you read you’ll see how much of the same reasoning can be applied to other programming languages.  For those readers who already know F#, this blog entry might help solidify your mental model of how the F# type inference process works, and for those who are new to type-inferred languages, this blog entry will hopefully provide you with a basic framework to reason about what type-inferred language can (and cannot easily) do.

Type inference basics

To start things off, consider some very simple F# code:

let x = 3
let b = false

The compiler infers that ‘x’ has type ‘int’ and that ‘b’ has type ‘bool’, based on the type of the literal values on the right-hand-sides.  In a language like C# 2.0 you’d have to explicitly declare each variable’s type, whereas nowadays in C# you can just use ‘var’ and the types are inferred like with the ‘let’ keyword of F#.  (If you’re using Visual Studio, you can put your mouse over the declarations and the hover-tooltips will show you the inferred type.)

Though this feature does not buy you much in simple examples like the previous one, it starts earning its keep in cases like

open System.Collections.Generic 
let d : Dictionary<int,string> = new Dictionary<int,string>()
for kvp : KeyValuePair<int,string> in d do
    printfn "%d: %s" kvp.Key kvp.Value

where I’ve explicitly declared the types of ‘d’ and ‘kvp’.  Putting a type declaration on ‘d’ is completely redundant, since the same long type name is mentioned again a few characters later on the right-hand side.  And Dictionary is one of the most commonly used collection types in the framework; .Net developers know they can iterate over each of the KeyValue pairs in a Dictionary.  So we can get rid of the noise and write the code as just

open System.Collections.Generic 
let d = new Dictionary<int,string>()
for kvp in d do
    printfn "%d: %s" kvp.Key kvp.Value 

where the types of ‘d’ and ‘kvp’ are inferred.

Type inference of parameters and return types

It is straightforward to extend type inference to parameters and return types in some cases, based on the body of the method or function.  For example, consider the function

let Not(b) =
    if b then
        false
    else
        true

In the body of the function, ‘b’ is used in an “if”, so ‘b’ must have type ‘bool’.  And the if-then-else expression is also returning boolean literals.  So the compiler can easily infer that Not is a function that takes a bool and returns a bool.

Types can be inferred not just from uses of literals and control constructs (as in the last example) or constructors (as with Dictionary in the prior example), but also from calls to other methods whose type signatures are known.  Consider this contrived function to determine if a string is longer than 50 characters:

let IsLongString(s) =
    if System.String.IsNullOrEmpty(s) then
        false
    else
        s.Length > 50

Here, the first line of the function body calls String.IsNullOrEmpty(s), and that method takes a single parameter of type ‘string’ and has no other overloads.  As a result, the compiler can infer that ‘s’ must be a string.  The whole body resolves to a boolean, so the compiler infers that IsLongString takes a string and returns a bool. 

The examples in this section demonstrate that for many functions, inferring the parameter and return types is straightforward by simple analysis of the body of the function.  (Note also that eventual call sites of these functions need not be taken into account; the types here are completely decided based on the function bodies, not on how these functions might eventually be called.)

Type inference and generics

The power of type inference really shines when generics are involved.  For example:

let PrintAll(s) =
    for x in s do
        printfn "%A" x

What is the type of PrintAll?  Well, the ‘for’ statement constrains the heretofore unknown ‘s’ to be a seq<’T> (an alias for IEnumerable<T>, for those readers less familiar with F#), and so ‘x’ will have the corresponding ‘T’ type.  The ‘printfn’ line adds no further constraints (‘”%A” is the ‘any’ format specifier, which works for all types, just calling .ToString() if needed).  The type inference logic cannot nail down ‘T’, but the code in the body will work for all types ‘T’, so we can simply infer this to be a generic function.  Thus PrintAll’s type is inferred as “seq<’T> –> unit”, that is, a generic function that takes any IEnumerable<T> and returns nothing.  The corresponding C# would be

static void PrintAll<T>(IEnumerable<T> s)
{
    foreach (var x in s)
    {
        Console.WriteLine(x);
    }
}

Note that you can specify the function signature explicitly in F# too, if you so choose:

let PrintAll<'T>(s : seq<'T>) : unit =
    for x in s do
        printfn "%A" x

but F# lets you leave off the generic parameter declaration “<’T>”, or the argument type “: seq<’T>”, or the return type “: unit”, or all three of these type annotations, since all of these can be inferred from the function body.

For a more advanced example with generic constraints, consider this F# function:

let PrintAndDispose(s) =
    for x in s do
        printfn "%A" x
        (x :> System.IDisposable).Dispose()

This is like the last example, except after we print each item in the sequence we upcast it to IDisposable and Dispose() it.  The inferred argument type is now a seq<’T> for all ‘T that inherit IDisposable.  The corresponding C# is:

static void PrintAndDispose<T>(IEnumerable<T> s) where T : IDisposable 
{
    foreach (var x in s)
    {
        Console.WriteLine(x);
        x.Dispose();
    }
}

The ‘where’ clause constrains the generic parameter to be something that inherits IDisposable.

The examples in this section demonstrate that type inference can not only infer simple argument and return types, but even generic types or generic types with constraints.  Type inference is powerful, but it is not without limitations, as we’ll see next.

Type inference when overloading is involved

To paraphrase myself,  “overloading is the bugaboo of type-inferred languages”.  Consider this simple contrived example:

let s = "abcdefg" 
let FindIndexOfChar(c) =
    s.IndexOf(c)

Unlike the previous examples where the F# compiler easily inferred the types of everything, this time it falls down.  The first three error diagnostics the compiler issues are:

...\Program.fs(44,5): error FS0041: 
A unique overload for method 'IndexOf' could not be determined based on type information prior to this program point.
The available overloads are shown below (or in the Error List window). A type annotation may be needed. ...\Program.fs(44,5): error FS0041: Possible overload: 'System.String.IndexOf(value: char) : int'. ...\Program.fs(44,5): error FS0041: Possible overload: 'System.String.IndexOf(value: string) : int'.

(In fact there are a bunch more error lines, because there are a bunch more overloads of System.String.IndexOf.)  This example gets to the crux of the Joel & Scott discussion from the podcast.  Overloading introduces an ambiguity – in this example ‘c’ could have either type ‘char’ or type ‘string’ based on the overloads, and the F# type inference system doesn’t know what to infer.

(Programming language wonks may now want to take a moment to read the section “For the wonks” at the bottom of this blog entry, where I discuss other ways to deal with overloading, such as Haskell “type classes” and F# inline functions with “hat” types.  That’s a needless digression for most readers, though.)

In F#, the ‘answer’ is simple; as the compiler says, “a type annotation may be needed”.  So we just add one:

let FindIndexOfChar(c : char) =
    s.IndexOf(c)

Type inference is not all-powerful in F#, there are cases where you must explicitly specify types.  Overloaded methods are a case where types are ambiguous in the absence of other constraints; when you call some.Method(…) and the method is overloaded, you need other prior type information to determine which overload is intended.  This applies only for an overloaded method; recall the example:

let IsLongString(s) =
    if System.String.IsNullOrEmpty(s) then
        false
    else
        s.Length > 50

IsNullOrEmpty is not overloaded, and thus we can infer that ‘s’ is a string based on this method call.

The example in this section demonstrates one case where a type cannot be inferred and an explicit type annotation was required.  Next let’s look at another such case.

Another stumbling block for type inference

Consider a different variation of the IsLongString function:

let IsLongString2(s) =
    if s = null then
        false
    else
        s.Length > 50

Now let’s reason through what we can infer about ‘s’.  In the first line of the body, all we know is that ‘s’ is a reference type rather than a value type (since we’re comparing it to null).  And then on the last line, we try to call the .Length member.  Well, a lot of .Net types have a .Length member, and there’s no way to infer which of those types we mean here.  This time the compiler error points to s.Length, reporting

Program.fs(54,9): error FS0072: Lookup on object of indeterminate type based on information prior to this program point. 
A type annotation may be needed prior to this program point to constrain the type of the object.
This may allow the lookup to be resolved.

In other words, if you want to do “o.Prop” or “o.Method(…)” on some object ‘o’, you must already know the type of ‘o’ in order to perform the member lookup.  In this case we don’t already have the type of ‘s’ pinned down, so we can’t call s.Length.  Once again, the fix is simple – add a type annotation!

let IsLongString2(s : string) =
    if s = null then
        false
    else
        s.Length > 50

And now we’re good.

This section and the previous one show the two most common cases where F# will require a type annotation because types could not be unambiguously inferred.  Most of the times when the compiler will require you to declare a type, it’s because of one of these two cases.

Summing up type inference in F#

As we’ve seen, F# can infer types in a variety of contexts including

  • types of local variables
  • types of arguments to functions/methods
  • return types
  • generic types
  • type constraints on generics

and it can infer this information in a variety of ways, including

  • knowing the types of literals like ‘false’ or ‘3’
  • knowing the types of newly constructed objects like “new Dictionary<int,string>()”
  • knowing the type signatures of called functions or non-overloaded methods
  • inferring types based on control constructs (e.g. inferring IEnumerable<T> in “for x in s do…” or inferring bool in “if e then…”)

Nevertheless there are some cases where types cannot be inferred; in all such cases you can simply add a type annotation to explicitly declare the type.  Indeed, even when types can be inferred, you can still choose to declare the type – it’s up to you.  When you are accustomed to working in statically-typed languages without inference, it can be a little daunting when you’re first starting out in a type-inferred language and there are no types declared anywhere.  As Scott said in the podcast, “it’s just syntactic sugar”.  So you can “sweeten to your taste”.  You can always fall back to writing types everywhere in F#, just as you would in C# 2.0.  But as you do more and more functional programming, I think you’ll find that type inference is really handy.  My favorite example to drive home that point is this F# code:

let MyFunc(f,s) =
    seq {
        for x in s do
            yield f(x)
    }

In C# 4.0, the same function is written as

static IEnumerable<U> MyFunc<T, U>(Func<T, U> f, IEnumerable<T> s)
{
    foreach (var x in s)
    {
        yield return f(x);
    }
}

The bodies of these functions are pretty comparable, but the C# declaration line is pretty big compared to the F#.  Of course, you might recognize these functions as variations of F#’s “Seq.map” and C#’s “Enumerable.Select”.  When you do a lot of functional programming, you end up using and authoring lots of transformative functions like these (that take functions as arguments and apply those functions in various ways to perform transformations of input sequences), and it is nice to not have to clutter these functions with lots of type annotations.

That’s pretty much it for today; those interested in a little more content/detail can keep reading, but the remaining sections are “optional”.  :)

Parenthetical aside

Though type inference is “just” syntax sugar, it really can matter; there are cases where you’d never write cool functional programming code in C# because you get completely swamped under by the type annotations.  As an example see here; one of the C# functions in that example is this monstrosity:

static Tree<KeyValuePair<A, bool>> DiffTree<A>(Tree<A> tree, Tree<A> tree2)
{
    return XFoldTree((A x, Func<Tree<A>, Tree<KeyValuePair<A, bool>>> l, Func<Tree<A>, Tree<KeyValuePair<A, bool>>> r,
Tree<A> t) => (Tree<A> t2) => Node(new KeyValuePair<A, bool>(t2.Data, object.ReferenceEquals(t, t2)), l(t2.Left), r(t2.Right)), x => y => null, tree)(tree2); }

and yes, every single one of those type annotations is required by C# 4.0.  Here’s the corresponding F# code that uses the same identifier names:

let DiffTree(tree,tree2) = 
    XFoldTree (fun x l r t t2 ->  
        let (Node(x2,l2,r2)) = t2 
        Node((x2,t===t2), l l2, r r2)) (fun _ _ -> Leaf) tree tree2

This code is taken from an advanced example, and without any context I do not expect the reader to be able to understand either code snippet.  The point is just to illustrate how “sugar can matter”; without syntax sugars like type inference, some advanced functional programming techniques are just hopeless because the code is a horrible mess of generic type names and angle brackets that make it practically impossible to see where the type declarations end and the logic/computation begins.

For the wonks

There are other strategies used by various languages to deal with ‘overloading’.  For example, in some ML dialects, there is no overloading allowed at all.  Problem solved!  Of course means that, for example, you need a different function to add two integers than to add two floating point numbers.  As a result, you’ll find two different addition operators in these ML dialects, where + (plus) is used to add integers and +. (plus dot) is used add floats.  The same is true of the other math operators.  Kinda annoying; especially in the case of common arithmetical operators, overloading is pretty handy.

Then you have languages like Haskell, that use a language feature called “type classes” to deal with cases like overloading math operators for all numeric types.  Operators like + are members of the type class named “Num”, and then types like integers and floats are instances of the Num class and each define their own specific implementations of those operators.  Then you can write generic code that works for all numeric types.  (You might liken the implementation to a blending of constrained generic types and OO-style runtime polymorphism.)

In F#, operators like + are defined as “inline” functions which use generic type variables of the form “^T” (which I call “hat types”) that can do certain limited kinds of ad-hoc overloading logic based on specific concrete types or member constraints.  You can Bing for “F# duck typing” if you are interested in learning more.  These advanced features help compensate for the lack of a way to express abstractions like “type classes” on .Net or in F#.

All that said, none of these techniques apply the .Net-style overloading example I showed earlier (with String.IndexOf).  This kind of overloading, where the same method name is used multiple times with arbitrary parameter lists that differ by type or number, is known as “ad-hoc overloading”, and I don’t think there are any languages that try to infer types from such overloaded calls.  Contrast this kind of overloading from the + case; in the case of +, there is a set of numeric types T that all support the form T+T->T (as well as some other operators like ‘*’) with similar semantics and type schemata.  Whereas String.IndexOf just has a bunch of overloads with varying numbers of parameters that do various similarish things (find the index of a string, or a char, starting at the beginning, or maybe at a given starting index, …) and they all happen to have the same method name.  There’s no intrinsic commonality among “char c” and “string s” and “int startIndex” and “StringComparison compType”, these types just make for handy argument sets of String.IndexOf.  Whereas there is an intrinsic commonality among “int x” and “float x” and “BigInteger x” when it comes to operators like + and * and functions like “abs”.  There are various language strategies to cope with type inference in the “intrinsic commonality” cases where many types share the same operations (Haskell type classes, F# inline hat types, …), but personally I know of no mechanisms to deal with inference for ad-hoc overloading of a particular method name on a single type, a la .Net overloaded methods.

Posted in Uncategorized | 7 Comments »

Another release of F# along with VS2010 Beta2

Posted by Brian on October 19, 2009

If you’ve been anxiously awaiting another release of F#, then today is your day!  Check out Don’s blog for the release announcement as well as the release notes.  As usual, you can either download the CTP to run atop your VS2008 (or VS Shell), or install the VS2010 Beta.

That’s the big news… to avoid a one-paragraph blog entry for today, I thought I’d take a moment to point out a few new small features that are not mentioned in the release notes.

Error messages have unique numbers

If you are very observant, you may have noticed that prior releases of the F# compiler used the error code “FS0191” for a wide variety of error messages (perhaps 1000 different circumstances!).  For example:

type Foo() =
    private member this.F() = ()    

would yield

...\Program.fs(2,5): error FS0191: Visibility declarations should come immediately prior to the identifier naming a construct

in the prior release.  But now you get

...\Program.fs(2,5): error FS0531: Accessibility modifiers should come immediately prior to the identifier naming a construct

Not a huge difference, but moving forward, if you are looking for more information about an error message, it should be much easier to Bing for e.g. “FS0531” and find the exact information you’re looking for (rather than sifting through lots of info about unrelated errors).

Some common error diagnostics have been improved

Using the prior release, this (contrived) code snippet

let f x = 42
let r : unit = f 0
f 0

would yield these two diagnostics:

...\Program.fs(2,16): error FS0001: This expression has type  int but is here used with type  unit
...\Program.fs(3,1): warning FS0020: This expression should have type 'unit', but has type 'int'.

I must confess, it always irked me that these two similar messages had the ‘expected’ versus ‘actual’ types presented in opposite orders.  With the new release, you get:

...\Program.fs(2,16): error FS0001: This expression was expected to have type     unit     but here has type     int
...\Program.fs(3,1): warning FS0020: This expression should have type 'unit', but has type 'int'. 
Use 'ignore' to discard the result of the expression, or 'let' to bind the result to a name.

Now these two diagnostics are parallel, and furthermore FS0020 now reminds you that you want to either pipe the result to ignore (e.g. if just calling a function for its side-effects) or else use the result.  There are a handful of minor diagnostic improvements like this (though there is still plenty of room for us to continue to improve).

VS2010 Beta2 has more item templates

In the prior release, ‘Add New Item’ on an F# project only had three default item templates:

 Beta1ItemTemplates

In the VS2010 Beta2 release (but not in the CTP) there are three more – templates for “xml file”, “text file”, and “app.config” are in the list by default.  (Sorry, I have no screenshot handy.)  Note that regardless of what templates you have, you can always ‘Add Existing Item’ on some file to add it to the project, and VS is smart about picking an appropriate editor for that file type.  For example, drop an HTML file in the directory of an F# project, ‘Add Existing Item’ of that file, double-click the solution explorer icon for the new file, and the file is opened in the HTML editor.

More to come!

I have more topics I intend to blog about regarding the latest release of F#, and hope to do so soon.  In the meantime, go download the latest version (installing only after uninstalling the previous one), and as always, send us feedback or bug reports!

Posted in Uncategorized | Leave a Comment »