Inside F#

Brian's thoughts on F# and .NET

An RSS Dashboard in F#, part four (speech agents)

Posted by Brian on February 9, 2010

Last time I covered exposing RSS feeds as IObservables, and we created the basic infrastructure for getting new information from the web published as an IObservable event stream.  Today we’ll switch gears entirely to cover the next technology piece of the app: synthesizing speech so that titles can be read aloud over the speakers.  We’ll encounter one important design consideration, which is neatly handled using asynchronous agents.

Making your computer talk

One of the terrific things about .NET is that it seems like no matter what you want to do, there’s a library for that.  :)  When I was originally designing what I wanted my RSS Dashboard to do, I was apprehensive about the speech part, because I had no idea what it entailed.  I was thrilled and relieved to discover that I could simply add a reference to System.Speech, and then write just two lines of code:

let ss = new System.Speech.Synthesis.SpeechSynthesizer()
ss.Speak("Hello world!")

and get my machine to speak to me.  Hurray!

Of course, it’s not perfect; try things like

ss.Speak("How do I make a GUI in F# using WPF and XAML?")

and discover it sounds like “How do I make a gweh in eff number sign using WPF and ex-ay-em-el”.  So after some experience listening to a bunch of question titles from StackOverflow, I wrote a function to “wash” the text before I hand it to the speech synthesizer.  For example,

let SpeechWash (text:string) = 
    text.Replace("GUI", "gooey")
        .Replace("#", " sharp")
        .Replace("XAML", "zah-mull")

cleans up the previous sentence pretty nicely so that

ss.Speak("How do I make a GUI in F# using WPF and XAML?" |> SpeechWash)

is understandable.  We’ll return to this at the end of today’s blog entry.
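
As the list of replacements grows, one way to keep it manageable is to hold the pairs in a list and fold over them.  This is only a sketch of an alternative layout, not the code the dashboard uses; the names replacements and speechWashFromTable are made up for illustration.

```fsharp
// Hypothetical table-driven variant of SpeechWash (names are illustrative).
// Each pair is (text to find, what the synthesizer should say instead).
let replacements =
    [ "GUI", "gooey"
      "#", " sharp"
      "XAML", "zah-mull" ]

// Apply every replacement in order, threading the string through the fold.
let speechWashFromTable (text: string) =
    replacements
    |> List.fold (fun (acc: string) (find: string, say: string) -> acc.Replace(find, say)) text

let washed = speechWashFromTable "How do I make a GUI in F# using WPF and XAML?"
printfn "%s" washed
// prints: How do I make a gooey in F sharp using WPF and zah-mull?
```

The behavior is the same as the chained Replace calls; the only difference is that adding a new pronunciation becomes a one-line change to the table.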

A blocking operation

Of course if things were just that simple, this would be a really short blog post.  There’s one important consideration for using this API in the context of my dashboard app.  I’m going to be receiving events about new items on the RSS feed, and need to draw updates to the UI and speak some titles aloud.  But the Speak() call is a blocking operation – if I call it from the UI thread, the UI thread will be blocked while it is speaking.  We can demonstrate this in a simple console application:

let ss = new System.Speech.Synthesis.SpeechSynthesizer()
ss.Speak("Until I finish saying this, I will not print what's below")
printfn "Done, press a key to quit"
System.Console.ReadKey() |> ignore

Until the computer finishes speaking, it does not reach the “printfn” line.  So if we call Speak() on the UI thread, we will end up blocking the UI for many seconds, which is a big no-no.  So we need to speak from a background thread, so that we can keep the UI ‘live’ at all times.

A first try at avoiding blocking the UI

There are various ways to do work on background threads, but a number of them will solve one problem by creating another.  For example, ThreadPool.QueueUserWorkItem (QUWI) is an API to schedule work on a background thread, and we could use it like this:

let ss = new System.Speech.Synthesis.SpeechSynthesizer()
System.Threading.ThreadPool.QueueUserWorkItem(fun _ -> 
    ss.Speak("Now I can talk and continue on")
    ) |> ignore
printfn "Done, press a key to quit"
System.Console.ReadKey() |> ignore

If you run that code, the computer starts to speak the sentence, but the QUWI call does not block and so the console UI thread continues on and prints the message while the voice is just starting to speak.

Problem solved?  Apparently… until you consider that the dashboard app will be seeing lots of new feed items, and if it uses this strategy to speak each item title as it arrives, we’ll run into the issue that the app is ‘talking over’ itself.  For example, if I run the same code as above but with two QUWI calls to speak multiple sentences:

let ss = new System.Speech.Synthesis.SpeechSynthesizer()
System.Threading.ThreadPool.QueueUserWorkItem(fun _ -> 
    ss.Speak("Now I can talk and continue on")
    ) |> ignore
System.Threading.ThreadPool.QueueUserWorkItem(fun _ -> 
    ss.Speak("But saying multiple things at once is a problem")
    ) |> ignore
printfn "Done, press a key to quit"
System.Console.ReadKey() |> ignore

Now the computer is talking over itself, speaking two different sentences at the same time from two different background threads.

What we really need is a way to have a non-blocking call where we can pass a string to speak, but if there is already something else being spoken, the strings get queued up so they are spoken one by one.  That is, I want an API with a function like EventuallySpeak(someString), where calling the function returns immediately (non-blocking), and the string is either spoken right now (if the voice is not saying anything else), or else queued up to be spoken once the current voice work is finished.

It just so happens this is a perfect job for an asynchronous agent.

Let your agent do the talking

An agent-based architecture models a system as a set of independent autonomous processes (called “agents” or “actors”) that communicate via asynchronous message-passing.  Erlang is perhaps the archetypal programming language and runtime here.  Rather than digress into a long discussion of this style of programming and what its strengths are, I’ll stick to just describing how agents in F# work and what they can do for you.

Here’s an approximate mental model for an F# agent.  Imagine you have some background thread in your app that’s entirely devoted to a particular agent.  What an agent does is this: it runs a continual loop where it waits for a message to arrive in its in-box, and when a message arrives, it carries out some action based on that message.  Then it waits for the next message.  That’s it.  Agents just sit in a loop: wait for a message, do something with it.  The message can be any piece of data.  That is, agents are generic in the type of data they accept as messages, and a particular agent instance expects a particular data type for a message.

For my app, I need an agent that accepts a string as a message, and speaks the string.  Then my posited EventuallySpeak(someString) function can be implemented by just sending an asynchronous message to that agent.

To create an agent in F#, we use the MailboxProcessor type.  Agent boilerplate code looks like this:

let agent = new MailboxProcessor<T>(fun inbox -> 
    let rec Loop() = async {
        let! msg = inbox.Receive() 
        // do something based on the message
        return! Loop()}  // return! (rather than do!) keeps the recursive call a tail call
    Loop())
agent.Start()    

That is, we new up a MailboxProcessor<T>, where T is the type of message this agent can receive.  The MailboxProcessor constructor is passed a lambda that takes the ‘inbox’ (conceptually, the message queue managed by the MailboxProcessor, from which we can Receive() messages; in fact, just a reference to the MailboxProcessor object itself), and returns an arbitrary async computation that defines what the agent will do.  As I just described, a typical agent sits in a loop: “get a message, do something with it” – and so that’s what the code template above does.
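
To make the template concrete without involving any speakers, here is a small sketch of an agent whose message type is a discriminated union.  The CounterMsg type and the counter agent are invented for this illustration and are not part of the dashboard; the point is only to show T filled in with a real type, and state carried as an argument to the loop.

```fsharp
// Hypothetical message type: either add to the running total, or ask for it.
type CounterMsg =
    | Add of int
    | Get of AsyncReplyChannel<int>

let counter = new MailboxProcessor<CounterMsg>(fun inbox ->
    // The running total is passed along as the loop's argument.
    let rec Loop total = async {
        let! msg = inbox.Receive()
        match msg with
        | Add n -> return! Loop (total + n)
        | Get channel ->
            channel.Reply total
            return! Loop total }
    Loop 0)
counter.Start()

counter.Post(Add 1)      // fire-and-forget
counter.Post(Add 2)
// PostAndReply sends a message carrying a reply channel and waits for the answer.
let total = counter.PostAndReply(fun channel -> Get channel)
printfn "total = %d" total
// prints: total = 3
```

Messages are handled one at a time in the order they were posted, which is why the Get request sees both additions.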

For my “speech agent” for the RSS Dashboard app, I need an agent that accepts strings and speaks them:

let ss = new System.Speech.Synthesis.SpeechSynthesizer()
let speechAgent = new MailboxProcessor<string>(fun inbox -> 
    let rec Loop() = async {
        let! text = inbox.Receive() 
        ss.Speak(text)
        return! Loop()}        
    Loop())
speechAgent.Start()    

Simple, right?  Now I can define my EventuallySpeak function thusly:

let EventuallySpeak(text) =
    speechAgent.Post(text)

Calling Post to a MailboxProcessor just asynchronously sends the message (fire-and-forget), which the agent will eventually receive and process.

This means I can now write

EventuallySpeak("Now I can talk and continue on")
EventuallySpeak("and saying multiple things is no longer a problem")
printfn "Done, press a key to quit"
System.Console.ReadKey() |> ignore

and achieve success.  I can call the EventuallySpeak function whenever I like, it doesn’t block the thread I’m calling from, and it queues up items to speak so the voice doesn’t talk over itself.
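
If you want to see the queueing behavior without listening for it, the sketch below swaps the synthesizer for a stand-in.  Everything named here (fakeSpeak, fakeSpeechAgent, the 300 ms delay) is an assumption made for the demo: fakeSpeak just sleeps and then records the text, which is enough to show that Post returns right away and that items are handled one by one in order.

```fsharp
open System.Diagnostics
open System.Threading

// Stand-in for ss.Speak: takes a while, then records what was "said".
let spoken = System.Collections.Generic.List<string>()
let remaining = new CountdownEvent(2)
let fakeSpeak (text: string) =
    Thread.Sleep 300
    lock spoken (fun () -> spoken.Add text)
    remaining.Signal() |> ignore

let fakeSpeechAgent = new MailboxProcessor<string>(fun inbox ->
    let rec Loop() = async {
        let! text = inbox.Receive()
        fakeSpeak text
        return! Loop()}
    Loop())
fakeSpeechAgent.Start()

let timer = Stopwatch.StartNew()
fakeSpeechAgent.Post "first"
fakeSpeechAgent.Post "second"
let postMillis = timer.ElapsedMilliseconds  // measured before either item has been "spoken"

remaining.Wait()  // block here only so the demo can inspect the result
let order = lock spoken (fun () -> List.ofSeq spoken)
printfn "posting took %d ms; spoken order: %A" postMillis order
```

On my understanding of MailboxProcessor, the two Post calls should take far less time than a single fakeSpeak, and the recorded order is always "first" then "second", because the one agent loop finishes each item before receiving the next.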

Some technical details

Earlier I described an approximate mental model for an agent: a background thread entirely devoted to spinning in the little loop that defines your agent code.  This makes for a simple mental model, but if that were the actual implementation it would be quite wasteful.  Threads in .NET are expensive (they take up a lot of memory, and you’ll typically use at most tens of threads in an app), so you don’t want to create a thread for every agent you need, for a couple of reasons.  First, most agents spend a lot of time sitting idle: in my RSS Dashboard app, if I only poll the RSS feeds every hour, then my speechAgent will probably spend 59 minutes idle, followed by 1 minute of activity reading new item titles, each hour.  Second, an agent-based architecture may involve thousands of agents (e.g. in an ant colony simulation, you could have an agent to model each and every individual ant).  So the one-thread-per-agent model is a non-starter in many cases.

Fortunately, F# async comes to the rescue here.  A MailboxProcessor runs an async loop on the .NET thread pool.  Async means that when an agent is idle (waiting to receive a message), it is not consuming a thread.  There’s just a callback saved (costing very little memory compared to a thread), so that when a message arrives that callback will wake up and start running.  And when it does start running, it will grab an arbitrary thread from the thread pool.  Thus many agents can share just a handful of thread pool threads.

The implications are that agents provide immense scalability, but at the same time have the conceptual simplicity of being logically-single-threaded entities.  You can reason about agent code as though it were single-threaded, a constantly running loop which at any point in time has a single ‘program counter’ pointing at a next line of code to run, that may periodically ‘block’ while waiting for external events.  But the implementation is much more efficient.  That’s what makes the F# async programming model so powerful.
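
As a rough illustration of that scalability, the sketch below starts ten thousand agents in one process.  The makeAgent helper and the choice of using the reply channel itself as the message type are my own inventions for the demo; each agent simply remembers a number and replies with it when asked.

```fsharp
// Hypothetical helper: an agent that replies with its own number on request.
// The message type is just the reply channel, since there is nothing else to send.
let makeAgent (number: int) =
    MailboxProcessor<AsyncReplyChannel<int>>.Start(fun inbox ->
        let rec Loop() = async {
            let! reply = inbox.Receive()
            reply.Reply number
            return! Loop()}
        Loop())

// 10,000 agents, all idle in Receive() until someone posts to them.
let agents = List.init 10000 (fun i -> makeAgent (i + 1))

// Ask each agent for its number and add up the answers.
let total = agents |> List.sumBy (fun agent -> agent.PostAndReply(fun channel -> channel))
printfn "%d agents, replies sum to %d" agents.Length total
// prints: 10000 agents, replies sum to 50005000
```

With one dedicated thread per agent this would be impractical; here the idle agents are just saved callbacks, and only the agent currently handling a request borrows a thread.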

Summing up

Here’s all the code for today:

// Based on experience listening to the speech synthesizer trying to speak titles of StackOverflow questions
let SpeechWash (text:string) = 
    text.Replace(".", "dot ")
        .Replace("SO", "stack overflow")
        .Replace("IDE", "i-dee,eee")
        .Replace("GUI", "gooey")
        .Replace(" vs", " versus")  // before lowercasing, avoid e.g. VS2008
        .ToLowerInvariant()
        .Replace("#", " sharp")
        .Replace(" api", " ay pee i")
        .Replace("sql", "see-quill")
        .Replace("msbuild", "MS build")
        .Replace("utf", "u-t-f")
        .Replace("xaml", "zah-mull")
        .Replace("()", "")

let ss = new System.Speech.Synthesis.SpeechSynthesizer()

// A mailbox processor is perfect for the speech task
//  - we want to queue up things to speak without blocking the UI
//  - we want it to be logically-single-threaded (so it doesn't 'talk over itself')
let speechAgent = new MailboxProcessor<string>(fun inbox -> 
    let rec Loop() = async {
        let! text = inbox.Receive() 
        ss.Speak(text |> SpeechWash)
        return! Loop()}        
    Loop())
speechAgent.Start()    

let EventuallySpeak(text) =
    speechAgent.Post(text)

The biggest portion is just dealing with properly pronouncing the myriad acronyms and idiosyncratic words we developers constantly use.  After that, the speech agent is just 10 lines of code that neatly solves the problems of not blocking the UI thread while speaking, as well as not speaking multiple things at once.  An agent is a useful implementation strategy to provide a simple programming model over a singleton fixed resource (like your computer speakers) to ensure that you use it properly.
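
One detail in SpeechWash worth a second look is the comment about replacing " vs" before lowercasing.  The two tiny functions below are stripped-down stand-ins (the names washVsFirst and lowerFirst are made up here) that isolate just those two steps, to show why the order matters.

```fsharp
// Same two steps as in SpeechWash, in the order the real function uses.
let washVsFirst (text: string) =
    text.Replace(" vs", " versus").ToLowerInvariant()

// The same steps in the opposite order, for comparison.
let lowerFirst (text: string) =
    text.ToLowerInvariant().Replace(" vs", " versus")

let title = "Using VS2008 vs Eclipse"
printfn "%s" (washVsFirst title)
// prints: using vs2008 versus eclipse
printfn "%s" (lowerFirst title)
// prints: using versus2008 versus eclipse
```

Because Replace is case-sensitive, doing the " vs" replacement while "VS2008" is still uppercase leaves the product name alone; lowercasing first turns it into a match.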

Next time

For the next entry in the series, I need to get information about what links I’ve visited in my browser.  COM interop and JavaScript code ahead!
