Inside F#

Brian's thoughts on F# and .NET

Archive for May, 2010

Dear Proggit: graphs are cool, but I prefer F#, so I graphed the subreddit interconnections with F# and DGML

Posted by Brian on May 26, 2010

Today I saw this graph of subreddit interconnections and I thought, hey, that’s pretty cool!

So of course I had to code my own version with F#.  First off, the results, when starting from the “music” subreddit:

[Image: SubRedditConnections, the subreddit graph starting from “music”]


Each node contains the name of the subreddit and the number of readers.  The font size is proportional to log(numReaders), with a minimum of 12.  The color is determined by community age: the youngest communities are red, and the colors rainbow across to the oldest, which are blue.  And of course the graph arrows are the links to other related subreddits.
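
To make those styling rules concrete, here is a small sketch that restates them as plain F# functions. The names fontSizeFor and colorFor are hypothetical. The thresholds and ARGB colors are copied from the DGML <Style> conditions in the code in this post, where VS2010 actually applies the rules when it renders the graph.

```fsharp
// Hypothetical helpers restating the DGML <Style> rules in plain F#.
// Font size: proportional to log(numReaders), with a floor of 12.
// (The DGML uses Math.Max(12,3*Math.Log(NumReaders)); Math.Log is the natural log, like F#'s log.)
let fontSizeFor (numReaders: int) =
    max 12.0 (3.0 * log (float numReaders))

// Background color (ARGB hex) by community age in months: young is red, old is blue.
let colorFor (months: int) =
    if months > 12 then "#FF6666FF"        // older than a year: blue
    elif months = 12 then "#FF66FF66"      // exactly a year: green
    elif months >= 9 then "#FFFFFF44"      // 9 to 11 months: yellow
    elif months >= 5 then "#FFDDBB66"      // 5 to 8 months: orange-tan
    else "#FFFF6666"                       // under 5 months: red

// A two-year-old community with 100,000 readers
printfn "%.1f %s" (fontSizeFor 100000) (colorFor 24)
```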

All the info is scraped from the reddit sidebars using some hackish Regexes and XML parsing.
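
As an example of the regex half of the scraping, here is the "community age" extraction from visit pulled out into a standalone, testable function. This is a sketch: parseAgeInMonths is a hypothetical name, and it assumes the 2010-era sidebar markup that the post's regexes match (reddit's current markup almost certainly differs).

```fsharp
open System.Text.RegularExpressions

// Hypothetical helper: the sidebar "community age" scrape, assuming markup like
//   <span class="age">a community for 2 years</span>
let parseAgeInMonths (html: string) =
    let years = Regex.Match(html, @"<span class=""age"">a community for (\d+) years?</span>")
    let months = Regex.Match(html, @"<span class=""age"">a community for (\d+) months?</span>")
    if years.Success then 12 * int (years.Groups.[1].Value)   // "1 year" and "2 years" both match
    elif months.Success then int (months.Groups.[1].Value)
    else 0   // no match, for example a community that is only days old

printfn "%d" (parseAgeInMonths @"<span class=""age"">a community for 2 years</span>")
```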

Of course I used F# async for the non-blocking I/O to grab multiple web pages concurrently so that the program runs fast.  I used an agent and .NET 4.0 concurrent collections to manage mutable updates to state without creating data races.
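
Here is the agent idea in isolation. This is a minimal sketch and not the crawler itself. A MailboxProcessor owns a mutable HashSet and processes one message at a time, so any number of concurrent async workers can ask "is this URL new?" without racing on the set. The names VisitMsg, TryClaim, Count, and visitedAgent are hypothetical.

```fsharp
open System.Collections.Generic

// Hypothetical messages: claim a URL (reply true only the first time), or ask for a count.
type VisitMsg =
    | TryClaim of string * AsyncReplyChannel<bool>
    | Count of AsyncReplyChannel<int>

let visitedAgent =
    MailboxProcessor<VisitMsg>.Start(fun inbox ->
        // Only this loop ever touches the set, and it handles one message at a time.
        let seen = HashSet<string>()
        let rec loop () =
            async {
                let! msg = inbox.Receive()
                match msg with
                | TryClaim (url, reply) -> reply.Reply(seen.Add(url))   // Add returns false for duplicates
                | Count reply -> reply.Reply(seen.Count)
                return! loop ()
            }
        loop ())

// Several concurrent workers race to claim URLs; each distinct URL is claimed exactly once.
let claims =
    [ "music"; "jazz"; "music"; "metal"; "jazz" ]
    |> List.map (fun u -> visitedAgent.PostAndAsyncReply(fun ch -> TryClaim(u, ch)))
    |> Async.Parallel
    |> Async.RunSynchronously

printfn "%d of %d claims succeeded" (claims |> Array.filter id |> Array.length) claims.Length
```

The post's agent uses fire-and-forget Post messages and a ManualResetEvent instead of request/reply. In both cases, though, the mutable state is only touched from inside the agent's loop.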

All told, just about 70 lines of code to scrape reddit for the info and 40 more to generate the DGML that VS2010 then renders into the pretty graph.  I just hacked it together tonight, so the code is maybe not awesome, but it is good enough to share.  Code is below, have fun with it.  F# and DGML are cool!

// reddit uses XHTML, hurrah
open System.Xml.Linq
let XN name = XName.Get(name, "")

// state used by the workers (multi-threaded)
let results = new System.Collections.Concurrent.ConcurrentBag<_>()

// state managed by the agent (which is logically-single-threaded)
let visited = new System.Collections.Generic.HashSet<string>()
let started = new System.Collections.Generic.HashSet<string>()
let allDone = new System.Threading.ManualResetEvent(false)

type Message =
    | EnsureVisited of string
    | FinishedVisiting of string

printfn "Periodically showing number of known remaining links to follow (as progress indicator)..."
let rec agent = MailboxProcessor.Start(fun mbox ->
    let numIters = ref 0
    let rec Loop() =
        async {
            let! msg = mbox.Receive()
            match msg with
            | EnsureVisited url -> 
                if not(visited.Contains(url)) && not(started.Contains(url)) then
                    started.Add(url) |> ignore
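                    // if the download or parse throws, still report the url as finished;
                    // otherwise started.Count never reaches 0 and allDone is never set
                    async {
                        try
                            do! visit url
                        with e ->
                            printfn "\nfailed to visit %s: %s" url e.Message
                            mbox.Post(FinishedVisiting url) }
                    |> Async.Start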
            |  FinishedVisiting url ->
                started.Remove(url) |> ignore
                visited.Add(url) |> ignore
            incr numIters
            if !numIters % 10 = 0 then
                printf "%d " started.Count 
            if started.Count <> 0 then
                return! Loop()
            else
                allDone.Set() |> ignore
        }
    Loop())

and visit reddit = async {
    use wc = new System.Net.WebClient()
    let! xhtml = wc.AsyncDownloadString(new System.Uri("http://www.reddit.com/r/"+reddit))
    let y = System.Text.RegularExpressions.Regex.Match(xhtml, @"<span class=""age"">a community for (\d+) years?</span>")
    let m = System.Text.RegularExpressions.Regex.Match(xhtml, @"<span class=""age"">a community for (\d+) months?</span>")
    let months = if y.Success then 12*int(y.Groups.[1].Value) elif m.Success then int(m.Groups.[1].Value) else 0
    let getTitleBox (xhtml:string) =
        // extremely quick and dirty way to parse out just the bit I want
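        let start = xhtml.IndexOf("<div class=\"titlebox\">")
        if start = -1 then failwith "could not find titlebox"
        // grow the candidate out to the next "</div>" (6 chars) until it parses as well-formed XML
        let isWellFormed endIdx =
            try XElement.Parse(xhtml.Substring(start, endIdx - start + 6)) |> ignore; true
            with _ -> false
        let mutable finish = xhtml.IndexOf("</div>", start)
        while finish <> -1 && not (isWellFormed finish) do
            finish <- xhtml.IndexOf("</div>", finish + 1)
        if finish = -1 then failwith "could not parse"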
        xhtml.Substring(start,finish-start+6)
    let getAttrVal attrName (e:XElement) =
        match e.Attribute(XN attrName) with
        | null -> None
        | a -> Some a.Value
    let xe = XElement.Parse(getTitleBox xhtml)
    let numReaders = xe.Descendants(XN "span")
                     |> Seq.find (fun e -> getAttrVal "class" e = Some "number")
    let urls = 
        xe.Descendants(XN "a") |> Seq.choose (getAttrVal "href") |> Seq.choose (fun u -> 
            let m = System.Text.RegularExpressions.Regex.Match(u, @"^(?:http://(?:www\.)?reddit\.com)?/r/(\w+)/?$")
            if m.Success then Some(m.Groups.[1].Value.ToLowerInvariant()) else None) |> set
    results.Add(reddit, numReaders.Value, urls, months)
    for u in urls do
        agent.Post(EnsureVisited u)
    agent.Post(FinishedVisiting reddit)
}

// kick it off with a starting reddit
agent.Post(EnsureVisited "music")
allDone.WaitOne() |> ignore

// make DGML of the results
let sb = new System.Text.StringBuilder()
let add (s:string) = sb.AppendLine(s) |> ignore
add @"<?xml version=""1.0"" encoding=""utf-8""?>"
add @"<DirectedGraph GraphDirection=""BottomToTop"" Layout=""Sugiyama"" xmlns=""http://schemas.microsoft.com/vs/2009/dgml"">"
add @"  <Nodes>"
for (reddit,numReaders,links,months) in results do
    // &#xA; is an escaped newline, so the label shows the name and the reader count on two lines
    add <| sprintf "    <Node Id=\"%s\" Label=\"%s&#xA;%s\" NumReaders=\"%d\" Months=\"%d\" />"
            reddit reddit numReaders (System.Int32.Parse(numReaders, System.Globalization.NumberStyles.AllowThousands)) months
add @"  </Nodes>"
add @"  <Links>"
for (reddit,numReaders,links, months) in results do
    for l in links do
        add <| sprintf "    <Link Source=\"%s\" Target=\"%s\" />" reddit l
add @"  </Links>"
add @"  <Styles>"
add @"    
    <Style TargetType=""Node"" GroupLabel=""NumReaders"" ValueLabel=""Function"">
      <Condition Expression=""NumReaders &gt; 0"" />
      <Setter Property=""FontSize"" Expression=""Math.Max(12,3*Math.Log(NumReaders))"" />
    </Style>
    <Style TargetType=""Node"" GroupLabel=""Months"" ValueLabel=""Function"">
      <Condition Expression=""Months &gt; 12"" />
      <Setter Property=""Background"" Value=""#FF6666FF"" />
    </Style>
    <Style TargetType=""Node"" GroupLabel=""Months"" ValueLabel=""Function"">
      <Condition Expression=""Months = 12"" />
      <Setter Property=""Background"" Value=""#FF66FF66"" />
    </Style>
    <Style TargetType=""Node"" GroupLabel=""Months"" ValueLabel=""Function"">
      <Condition Expression=""Months &gt;= 9"" />
      <Condition Expression=""Months &lt; 12"" />
      <Setter Property=""Background"" Value=""#FFFFFF44"" />
    </Style>
    <Style TargetType=""Node"" GroupLabel=""Months"" ValueLabel=""Function"">
      <Condition Expression=""Months &gt;= 5"" />
      <Condition Expression=""Months &lt; 9"" />
      <Setter Property=""Background"" Value=""#FFDDBB66"" />
    </Style>
    <Style TargetType=""Node"" GroupLabel=""Months"" ValueLabel=""Function"">
      <Condition Expression=""Months &lt; 5"" />
      <Setter Property=""Background"" Value=""#FFFF6666"" />
    </Style>"
add @"  </Styles>"
add @"</DirectedGraph>"
System.IO.File.WriteAllText(@"graph.dgml", sb.ToString())
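
Building the DGML with a StringBuilder works here because the regex only lets \w+ names through. If you would rather have the XML writer handle escaping, here is a sketch of the same Nodes/Links skeleton built with System.Xml.Linq. Some assumptions apply. toDgml, dgml, and xn are hypothetical names. numReaders is assumed to be parsed to an int already. The <Styles> section is omitted for brevity. The DGML namespace is the one used in the post.

```fsharp
open System.Xml.Linq

// Hypothetical alternative: build the DGML as an XDocument so attribute values
// (including the newline in Label) are escaped by the XML writer.
let dgml = XNamespace.Get "http://schemas.microsoft.com/vs/2009/dgml"
let xn (name: string) = XName.Get name

// Each result is (subreddit, numReaders, linkedSubreddits, months).
let toDgml (results: (string * int * Set<string> * int) list) =
    let nodes =
        [ for (name, readers, _, months) in results ->
            XElement(dgml + "Node",
                     XAttribute(xn "Id", name),
                     XAttribute(xn "Label", sprintf "%s\n%d" name readers),  // two-line label
                     XAttribute(xn "NumReaders", readers),
                     XAttribute(xn "Months", months)) ]
    let links =
        [ for (name, _, targets, _) in results do
            for target in targets ->
                XElement(dgml + "Link", XAttribute(xn "Source", name), XAttribute(xn "Target", target)) ]
    XDocument(
        XElement(dgml + "DirectedGraph",
                 XAttribute(xn "GraphDirection", "BottomToTop"),
                 XAttribute(xn "Layout", "Sugiyama"),
                 XElement(dgml + "Nodes", nodes),
                 XElement(dgml + "Links", links)))

let doc = toDgml [ ("music", 1000, set [ "jazz"; "metal" ], 24); ("jazz", 200, Set.empty, 7) ]
printfn "%s" (doc.ToString())
```

Calling doc.Save "graph.dgml" would then write a file that VS2010 can open, just like the WriteAllText call does for the string version.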


F# for Silverlight 4 available

Posted by Brian on May 17, 2010

Today the final Silverlight 4 Tools for Visual Studio 2010 were released.  These tools include the F# runtime (FSharp.Core.dll) for the Silverlight 4 runtime.  For those who may have previously been held up developing with F# for Silverlight 4, today is the day to get unblocked!

To commemorate the occasion, I made a tiny F# ‘hello world’ Silverlight application in the traditional fashion (a C# app with an F# library).  I’ll walk you through the steps.

(Ensure you have already installed the final version of Silverlight 4 tools for VS2010.)

In VS, go to the ‘New Project’ dialog and select the ‘F# Silverlight Library’ template:

[Screenshot: the ‘F# Silverlight Library’ template in the New Project dialog]

Then when it asks to choose a Silverlight version, pick Silverlight 4:

[Screenshot: choosing Silverlight 4 as the Silverlight version]

Then, for the purposes of this example, I replaced the code in Module1.fs in the new project with this code:

namespace MyFSharp

type MyType() =
    static member FilterOutZs (strs:seq<string>) =
        seq { for s in strs do
                if not(s.StartsWith("Z")) then
                    yield s }

We’ll see how this code gets used shortly.

Next, right click on the solution in Solution Explorer and ‘Add… New Project’ a ‘C# Silverlight Application’:

[Screenshot: adding a ‘C# Silverlight Application’ project]

VS then pops up a dialog about hosting the new Silverlight app; I chose to uncheck the ‘Host the Silverlight application in a new web site’ box.  Once again, be sure that ‘Silverlight 4’ is selected as the Silverlight version in the dialog.

[Screenshot: the hosting dialog, with the web site box unchecked and Silverlight 4 selected]

Next I added the highlighted bit to the MainPage.xaml in the C# app:

[Screenshot: MainPage.xaml, with the added markup highlighted]

and then in the C# code-behind, MainPage.xaml.cs, I had this handler:

private void TheText_MouseEnter(object sender, MouseEventArgs e)
{
    this.TheText.Text =
        string.Join(" ", MyFSharp.MyType.FilterOutZs(
            new[] { "ZZZ", "Hello", "ZZZ", "from", "F#", "ZZZ" }).ToArray());
}

which calls my F# code.  To make this compile, the C# project needs a reference to the F# library: right click on the C# project, choose ‘Add Reference…’, select the ‘Projects’ tab in the dialog, and select the F# library from your solution.
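
Nothing in the library logic depends on Silverlight, so one quick way to sanity-check it is from F# Interactive before wiring up the C# app. This is a sketch that assumes the desktop FSharp.Core. FilterSketch.filterOutZs is a hypothetical stand-in for MyType.FilterOutZs, written with Seq.filter instead of a seq expression; both produce a lazily filtered sequence.

```fsharp
// Hypothetical stand-in for MyType.FilterOutZs, written with Seq.filter.
module FilterSketch =
    let filterOutZs (strs: seq<string>) =
        strs |> Seq.filter (fun s -> not (s.StartsWith("Z", System.StringComparison.Ordinal)))

// Mirrors the C# handler: filter the words, then join the survivors with spaces.
let survivors = FilterSketch.filterOutZs [ "ZZZ"; "Hello"; "ZZZ"; "from"; "F#"; "ZZZ" ] |> Array.ofSeq
let text = System.String.Join(" ", survivors)
printfn "%s" text   // Hello from F#
```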

Right click the C# app in Solution Explorer and select ‘Set as StartUp Project’.  (If I’d created the app first, and then added the library, rather than the other way around, I wouldn’t need this step.)

Now I can press F5 to run it, and I see in my browser:

[Screenshot: the running app in the browser, showing “Hello from F#”]

Not the most enthralling app ever, but it shows that F# is working with Silverlight 4.  Of course you already know how to make more exciting F# Silverlight apps.

Have fun enjoying Silverlight 4 with F#!


More F# screencasts

Posted by Brian on May 10, 2010

My previous two screencasts (‘Getting started with F# in VS2010’ and ‘Editing F# source code in VS2010’) have been moved to MSDN, which means they’re now available in a variety of video formats.  A new screencast (‘Managing F# projects in VS2010’) has also been added, so check it out!  I also hope to do a screencast about using the Visual Studio debugger with F#.  (In the meantime, check out the Visual Studio Tips blog, which has a number of recent tips about breakpoints that are relevant to debugging.)  There’s even an older F# video by former Microsoftie and current Googler Chris Smith linked on the F# videos page – ah nostalgia!

(In addition to the other videos linked from the F# Dev Center, you can also find more videos on F#, like Tomas’ webcast series, or Andre’s videos, or F# for Visualization, or a number of other F# videos you may find via a web search.  I’m glad to see so many people producing F# content!)

What other F# screencast topics would you like to see?  (I’ve been focused on VS2010 UI features that lend themselves to screencasts, as opposed to language features that can be adequately covered by blogging prose, but I am not completely opposed to other ideas.)


Off-topic

Posted by Brian on May 5, 2010

I should probably be doing “real work” now, but don’t feel like it.  And so then I should probably be writing a “real” blog post now, but I don’t feel like that either.  (Though I do have some ‘real’ content in the pipeline; watch this space for another F# screencast soon.)  But I feel the need to write something to get “un-stuck” on the “writing” front, so I’m going to do an atypical, completely self-indulgent blog post where I talk about all the ideas I have sitting in my “to blog” folder, teasing you with great ideas and rants, most of which I’ll never find the time to write or publish.  And I’ll ask for your opinion and then probably ignore it.  Life is not fair!

Despite that awful disclaimer, here you are, still reading, so let’s get to it.

Brian’s random blogging ideas

In absolutely no particular order.  And just a sample of the full list.

Write a prolog unification engine and EDSL in F#.  Well, duh, of course, I did this in a past life, and it’s a simple general component with a number of potential uses.  Oh, and I said this.  Hm.

Encapsulation using classes versus functions.  The classic “how to model classes using closures over mutable state” stuff you might cover in a LISP/Scheme class, but in F#.  Not all that interesting; cool the first time you see it, but then nothing special.

F# Units of Measure.  Something on the topic, of course illustrating things like Smoots and “rad” as an alias for the “1” measure, and whatnot.

Bling.  I need another lifetime to go be a designer and learn all that cool stuff, darn.  But it looks very, very suitable to an F# EDSL.

LazyLists versus IEnumerables.   Discuss the one thing IEnumerables suck at (“tail”) and when and how to apply the PowerPack LazyList type.

Tubes.  Something like Yahoo Pipes, but with an IObservable programming model and a visualization layer and cool combinators and whatnot and it would be cool somehow maybe.  (That’s a concrete design proposal, right?)

What are types?  I’m a fan of static typing, but it’s useful to step way back and talk about types as a set of values, or an interface of operations, closed versus open, automatically separately checkable, abstraction-plateau, puzzle-piece-shaped, design objects, places to hang encapsulation boundaries or invariants or semantics or docs, nominal versus structural and ducks, type system ‘strength’ (too weak –> useless, too strong –> complex/verbose), exceptions, effects, …  That’s a pretty sexy topic.  To me, anyway.

Functional calisthenics.  Like object calisthenics, but for FP rather than OO.  Like maybe no mutables, no classes, no for/while.  No recursion (only map/fold/etc).  Who knows.  I don’t really like the idea, but sometimes it’s fun to push things to ridiculous extremes and see what happens.  (Haskell?)

Comparison/survey of async/parallel technologies on .NET.  Probably out of my scope, alas.

Write a Scheme interpreter in F#.  Had the idea, but then a month later, someone already beat me to that.  Awesome, saves me the time.

Learn a new language every two years.  I guess kinda based on this; the “standard” advice I always hear thrown around is “a new programming language every year”, but I think that is too fast.  This is also a topic-excuse to do my own auto-biographical how-I-got-here-and-learned-all-this-stuff kinda blog entry.

A CSV file reader.  Or more hot uses of the F# “dynamic” (?) operator.

The best tool for the job is rarely the right tool for the job.  A whole rant about the common advice for picking tools, and the real world factors you should use instead.

F# VSIX extensions.  VSIX is hot, the “app store” of Visual Studio.  I will have more to publish on that front soon, I expect.  I wrote an F# VSIX extension, but it was kinda lame, for a rarely used feature (signature files), and it exposed product bugs, so I’m not publishing it.  But a good trial run.  There is still future stuff here, I expect.  (Jared wrote lots of VsVim in F#, how awesome is that?)

Make blog tags.  My blog is not very searchable.  I am also not fond of my blog host.  :)  But it is too unfulfilling right now to do real work to make progress here.

Architecture explorer and DGML.  The high-end VS SKUs have some cool features like Intellitrace (that everyone knows about), but there are cool less-well-known features too.  I wrote a little app to display the “link graph” of my blog in DGML, and it was fun and easy.  DGML has potential.

Rants about languages.  Even C++ has lambda now.  How can anyone bring themselves to use C or Java by day, without hanging themselves that night?  A language without lambda (and generics) is like a day without sunshine.  (Which is like night.)  Gaaaah!

Observations about “good enough”.  Back in school when I was neck-deep in C++, I really loved the factoring of the STL data structures and algorithms.  And then after being all awash with various iterator categories like RandomAccessIterator, I learned .NET, where for the most part you use IEnumerable (which essentially waters down the whole abstraction to “forward-only”)… and I was like, how will I be able to live like this?  And as it turns out, I can count on my fingers (and maybe just one hand, or perhaps even zero) the number of times I’ve “missed” the expressiveness of STL.  IEnumerable is the sweet spot, the 95% case, and there are always things like IList and arrays you can fall back on, and well, it’s amazing what you can gain from ‘simplicity’ when you trade off a little on the absolute ‘expressiveness’ scale.  This is one of many cases where “the good is the enemy of the best”, or rather, “the best is the enemy of the good”, comes into play.

How to teach/learn X.  Or what to teach/learn.  In (practically every level of) school I learned almost nothing useful (for work or life).  Ok, that’s too strong, but like, in ten years of college and grad school I learned almost nothing about threading or server-side scaling, or practical programming languages, or source control, or how to work on teams, or build systems, or the Joel Test, or a bunch of other important stuff for my day job.  (I did learn big-oh, and basic abstraction, and gain lots of experience programming and using various languages… it wasn’t all for naught.)  It’s very hard to teach what’s important, or a good way to do things, to people who don’t have experience (prioritizing, or doing things badly), because they can’t appreciate what’s important/good.  I dunno.
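
To make the “tail” complaint from the LazyList idea concrete: with a plain seq, each Seq.tail is just a view that skips one more element. So walking a sequence head-by-tail re-runs the source from the beginning at every step. Here is a small sketch that counts how many times the source produces an element. sumViaTail is a hypothetical name. Seq.cache is used only to show the contrast; it is not the PowerPack LazyList type.

```fsharp
// Count how many times the source sequence actually produces an element.
let evaluations = ref 0
let source =
    seq {
        for i in 1 .. 5 do
            evaluations.Value <- evaluations.Value + 1
            yield i
    }

// Head/tail recursion over a seq: every level re-enumerates from the front.
let rec sumViaTail (s: seq<int>) =
    if Seq.isEmpty s then 0
    else Seq.head s + sumViaTail (Seq.tail s)

let total = sumViaTail source
let uncachedEvaluations = evaluations.Value
printfn "plain seq: sum = %d, source elements produced = %d" total uncachedEvaluations

// Seq.cache memoizes the source, so each element is produced only once, but every
// level still walks the cache from the front (the part a real LazyList tail avoids).
evaluations.Value <- 0
let cachedTotal = sumViaTail (Seq.cache source)
printfn "cached: sum = %d, source elements produced = %d" cachedTotal evaluations.Value
```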

Uhm…

Probably other stuff too.

See, this whole blog entry is going nowhere!

What would you like to see me write about?

Tough luck, comments are off right now because I’m trying to fight off the comment-spam-bots with the inadequate tools of my blog host.  Sigh.  Reddit was having problems today too!  The whole internet is broken – how will you be able to express that someone (like me) is WRONG!?!?  How frustrating!

(The comments will probably get turned on again soon and then you can tell me how my one-sentence summary opinions/rants above are wrong, or vote for your favorite topics.  Not that it will affect anything.  I try not to write a blog on a topic unless I’m feeling passionate about it, and “votes” are unlikely to light that fire in me.  Completely unfair to you!  I gave you the disclaimer at the beginning of the blog entry, so don’t say I didn’t warn you.  :P  )

Let’s all hope this got my blog-writing juices flowing so I can churn out some ‘real’ content in the near future.
