Inside F#

Brian's thoughts on F# and .NET

C# 3.0, lambda, and the first post of a series about “monadic parser combinators”

Posted by Brian on December 2, 2007

Where to begin?  Today I’ll start briefly with a little background about me.  Then I want to describe the new lambda feature of C# 3.0 and why it is good. Finally I’ll introduce "monadic parser combinators" as a cool application of the new C# 3.0 features – I really enjoy parser combinator libraries, and so they are my current motivation for writing a series of blog entries about code-related stuff.

Some quick background on Brian

Since this is a new blog for me, I should write a tiny bit of relevant history about myself.  In my previous life, I was a programming languages Ph.D. student who focused on multi-paradigm programming.  Much of my work in grad school went in to researching and developing a library called FC++, a C++ template library that enabled functional programming in the style of modern functional programming languages like Haskell.  I think the work I did in school is indicative of a more general trend.

For a while now, it seems that pragmatic real-world programming has been dominated by object-oriented languages, whereas functional languages have been outside the mainstream, used mostly in academia.  But the trend is that more and more functional features are appearing in mainstream object-oriented languages.  I am glad for this trend, as the functional programming style often makes it possible to raise the abstraction bar to a higher level, realize new kinds of designs, and author more succinct, precise code.

Lambda and C# 3.0

Ok, let’s talk some code.  One of the most important features common to any functional programming language is lambda.  In a functional language, functions are first-class values, and anonymous functions can be consed up on the fly using lambda syntax.  Both the choice of syntax and the anonymity matter a lot.  Consider, for example, a few different ways to write a particular function.  First, in C#, without delegates:

public static int TenXPlusY(int x, int y) 
{ return 10 * x + y; }

Then in C# 2.0, using delegates:

delegate(int x, int y) { return 10*x+y; }

Next, in C# 3.0 using the new lambda feature:

(x, y) => 10*x+y

And finally, for comparison, the same function expressed in Haskell (a popular functional language):

\x y -> 10*x+y

Lambdas differ from the other options in two main ways: anonymity, and succinct syntax.

Anonymity is important, because in functional programming, one often codes in a style that involves using combinators and higher-order functions (functions that take other functions as parameters).  When coding in this style, you end up writing a lot of tiny one-off functions that don’t need or deserve a name.  Being forced to use named functions clutters up the code and puts needless physical distance between the function definition and its sole use.  To be sure, in functional languages you still author plenty of named functions.  The point is that some functions need names and reuse, whereas others do not, and so it is important for a language to support both styles.

Succinctness of syntax is the other important detail.  Some people may argue that syntax details of a language are just that – details – and those details don’t qualitatively affect the use of a language.  Those people are wrong.  :)  Syntax is an important consideration; the most commonly used features of a language must be succinct.  To illustrate the point at the extreme, imagine if instead of the code on the left you had to use the code on the right:

x+3               Plus(var x, literal integer 3)
obj.ToString()    select method ToString from obj and invoke with parameters: ()

The point of this contrived example: when common language constructs are verbose, simple everyday code becomes a complex deluge of noise.  This is what makes the new lambda capabilities of C# 3.0 so notable to me; there is a big difference between these two forms:

delegate(int x, int y) { return 10*x+y; }   // C# 2.0
(x, y) => 10*x+y                            // C# 3.0

The obvious difference is the succinctness, which I’ve already demonstrated the value of.  The other notable difference is that the new lambda does not require redundant type annotations.  Static typing is an invaluable asset, but statically-typed languages often go overboard when it comes to requiring the programmer to decorate every function, parameter, and variable with a type annotation.  When source code is riddled with too many type annotations, they become a noisy hindrance to readability.

Applications – introducing monadic parser combinators

Though it is fun to wear the hat of the hobbyist programmer and tool around with the new features in C# 3.0, at the end of the day I would like to be at least a little pragmatic and talk about useful applications of those features.  First I must pause briefly to note that programming language design itself requires a great deal of balance among pragmatics, aesthetics, and complexity; language design is tricky business, and I think the C# guys have done a good job incrementally delivering important features cleanly into the language.  Here’s a quote from a blog from someone on the C# team; I think it captures the spirit of what I’m trying to communicate about the pragmatics of language design:

We’re not adding lambdas because we’re a bunch of functional language wonks who love lambdas — I mean, we are that, but that’s not why we’re adding lambdas.  We’re adding lambdas because lambdas will help massively with query comprehensions, and we’re adding query comprehensions because our research shows that pro devs could really use a language-integrated approach to querying disparate datasets without losing the benefits of static typing.

Nice.  (Check out the various documents about LINQ to learn more about query comprehensions.)

Ok, so let’s talk about a cool application of the new C# features: they make it easier to author monadic parser combinators.  What in the world are monadic parser combinators? Let’s look at each individual word.  Monads are a concept from functional languages like Haskell – they are a specific kind of abstract data type with certain properties, which I won’t go into the details of now.  A Parser is a basically a function that takes as input a string (or more generally, a stream of tokens), and returns the result of parsing the input (which might be a parse tree, or any other evaluation of the input).  A Combinator is a function that combines other functions to create new ones.  And thus, a monadic parser combinator library enables a programmer to assemble complex parsers out of simpler ones, via a library of generally useful parser combinators, which manipulate parsers that are represented using a specific sort of abstract data type.  Clear as mud, right?

Let’s look at an example.  Consider a tiny expression parser, which can parse and evaluate numeric expressions like this one:

"(100+41+1-42)/10-5*2"      // evaluates to 0

That is, the language we want to parse knows about natural numbers, the four main math operators with the usual precedence, and parentheses.  A grammar for this little language would look something like this:

expression ::= term (addOp term)*
term ::= factor (expOp factor)*
factor ::= natural | group
group ::= '(' expression ')'

It turns out that with a suitable library, we can write C# code for this parser-evaluator that looks like this:

Expr   = Term.Chainl1(AddOp);
Term   = Factor.Chainl1(MulOp);
Factor = Nat.Or(Group);
Group  = Parser.Literal("(").Then_(
         Expr.Then(e =>
         Parser.Literal(")").Then_(
         Parser.Return(e))));

Note the similarity to the original language grammar.  We end up with an expression parser that works like this:

Expr.Parse("(100+41+1-42)/10-5*2")  // returns 0

Here, entities like "Expr", "Term", and "Nat" are parsers that take a string and evaluate it down to an integer.  The various functions like "Chainl1", "Or" and "Then" are the parser combinators, which build up complex parsers out of simpler ones.  I’ve left out the definitions of some of the parsers for this expression evaluator (like "AddOp" and "Nat") to keep things simple for now.  (In fact, I posted the code for "Nat", which parses a natural number, in my previous blog entry.)  But hopefully you at least get some vague sense of what parser combinators are, based on this quick glimpse at an example.  There is still a lot of mystery, but it’s time to quit writing for today.

Over the next few days I intend to write some more blog entries that delve into the details of monadic parser combinators.  Apart from being an interesting application to show off new features of C# in practice, a monadic parser combinator library demonstrates an effective way to rapidly author parsers, letting you succinctly code at a very high level of abstraction (on par with the grammar of the language being parsed).  The other chief advantage of parser combinators is that they are expressed as a library in a familiar host language (e.g. C#), rather than being expressed in a separate language (with its own toolset) that then generates parsers in a host language (as with such tools as yacc or ANTLR).

If you want to preview some of the material I intend to present, you can check out a talk I gave a few years back about monadic parser combinators in C++ using FC++.  Or check out the Parsec library for parsing in Haskell.  Regardless, stay tuned to this blog space to learn about monadic parsing in C#.

Fun!  :)

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 30 other followers

%d bloggers like this: