BoozBot’s second appearance was at the Eyebeam Mixer this past Saturday. Although I wasn’t there to witness it, due to a previously scheduled and essential retreat into nature, the stories I have heard so far have been amazing. People really seem to enjoy talking to BoozBot, and the process of programming him has led to all kinds of interesting questions and challenges.
Steve Lambert now joins the BoozBot Operator Roster along with Scott V. from Oakland. Steve sent this chat transcript, which is a bit hard to read because it is only one side of the conversation, but funny nonetheless.
David Jimison, my collaborator on the project, also posted his thoughts about BoozBot and the evening.
How He Works
My role in the creation of BoozBot was mostly on the software side, with Dave doing most of the building. The main questions that interested me were:
- How can I make software that contains its own personality, even though the operator might change?
- How can I make a text interface that is extremely easy to use, and minimizes the disadvantage of having to type in responses in a real-time conversation?
- Where is the sweet spot between automation and human intelligence?
- And, one of my favorite new questions: are there any general principles for writing funny software?
So I thought I would give a little explanation of how he currently works, and how I answered each of these questions.
First of all, for the sake of simplicity, we use puppet terminology to talk about BoozBot. For instance, the person who is controlling BoozBot is the puppeteer, BoozBot is the puppet, and the software that runs on the laptop attached to BoozBot is the puppet software. We decided pretty early on that we wanted to try to minimize the requirements for the puppeteer by not requiring any additional software beyond a simple Skype client, so all of the functionality for BoozBot exists on the puppet side. This made for some interesting interface challenges.
The technical requirements for the puppet software were that it be able to:
- receive Skype video calls from a puppeteer
- receive Skype messages from that puppeteer and interpret them
- read text aloud using a text-to-speech engine, and fire viseme events to trigger mouth shapes
- send serial commands to an Arduino board in response to commands from the puppeteer
- take photographs from “eyes” and send them to Flickr
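The post doesn't document the wire format between the laptop and the Arduino, so the following is only a sketch of what "send serial commands" could look like under a made-up protocol: one ASCII opcode letter, an optional integer argument, and a newline. The opcode letters and the `encode_command`/`send_command` helpers are hypothetical; any binary stream with `write` and `flush` (such as a pyserial `Serial` object) could sit behind `port`.

```python
import io
from typing import BinaryIO, Optional

# Hypothetical opcode table: NOT BoozBot's real protocol, just an illustration.
OPCODES = {"pour": "P", "blink": "B", "turn": "T"}


def encode_command(name: str, arg: Optional[int] = None) -> bytes:
    """Encode a command as ASCII: opcode letter, optional integer, newline."""
    if name not in OPCODES:
        raise ValueError(f"unknown hardware command: {name}")
    payload = OPCODES[name] + ("" if arg is None else str(arg))
    return (payload + "\n").encode("ascii")


def send_command(port: BinaryIO, name: str, arg: Optional[int] = None) -> None:
    """Write one newline-terminated frame to a binary stream and flush it."""
    port.write(encode_command(name, arg))
    port.flush()


if __name__ == "__main__":
    fake_port = io.BytesIO()  # stands in for the real serial port
    send_command(fake_port, "pour", 3)
    print(fake_port.getvalue())  # b'P3\n'
```

A newline terminator keeps the Arduino side simple: it can buffer characters until it sees `\n` and then act on one complete command.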
After much research and deliberation, I reluctantly decided to go with C# for this first version of the software. At first, I was determined to use an open language: I wanted to develop BoozBot in openFrameworks as a Mac-only application, using the Carbon Speech API and Skype.framework with the wonderful iVox voice fonts. It ended up being way too difficult. I couldn’t figure out how to create C++ wrappers for the Objective-C code, and I didn’t have the time to really dive in and figure it out. C# just makes stuff so damned easy, and I had already bought a bunch of AT&T NaturalVoices voice fonts, so I eventually caved in. In the end, though, we decided that Microsoft Mike, one of the default Microsoft voice fonts, was perfect for our needs, since BoozBot was supposed to be kind of low tech.
For the 1.0 version, I plan on rewriting the software in openFrameworks or Java, depending on a little additional research.
The full list of resources I used:
- Visual C# 2008 Express Edition
- Skype4COM
- EmguCV (although the features aren’t yet active)
- DirectShowNET, a C# wrapper for DirectShow
- Microsoft Speech API
- Flickr.NET with help from the Coding4Fun tutorial
- ForecastXML service from the Weather Underground API
The following is the list of commands the puppeteer can send to BoozBot. To connect, the puppeteer signs onto Skype and simply calls the ‘boozbot’ user. BoozBot automatically answers the call (if the puppeteer is on his friends list) and starts sending the puppeteer a video feed. The puppeteer can then send BoozBot any text or command he wishes. The commands work just like IRC: anything that starts with a ‘/’ is considered a command, and everything else is dialog.
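As a rough illustration of that IRC-style rule, here is a minimal Python sketch (not the actual C# interpreter) that splits an incoming chat message into either dialog or a command plus arguments. The alias table mirrors the short forms in the command list; the `Parsed` type and `parse_message` name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Short forms taken from the command list (/l, /s, /b, /m).
ALIASES = {"l": "list", "s": "shutup", "b": "blink", "m": "macro"}


@dataclass
class Parsed:
    kind: str                   # "command" or "dialog"
    name: Optional[str] = None  # canonical command name, if a command
    args: List[str] = field(default_factory=list)
    text: str = ""              # dialog to read aloud, if dialog


def parse_message(message: str) -> Parsed:
    stripped = message.strip()
    if not stripped.startswith("/"):
        # Anything without a leading slash is spoken as-is.
        return Parsed(kind="dialog", text=stripped)
    # A bare "/" yields an empty command name for the dispatcher to reject.
    parts = stripped[1:].split() or [""]
    name = parts[0].lower()
    return Parsed(kind="command", name=ALIASES.get(name, name), args=parts[1:])
```

For example, `parse_message("/pour 3")` gives a `pour` command with the argument `"3"`, while `parse_message("Hello there!")` is treated as dialog.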
- Any text sent to BoozBot that does not begin with a slash (‘/’) will be treated as dialog and read aloud.
- /list, /l Prints out a list of available commands
- /p Puts BoozBot into “processing mode” (a whirring, ticking sound starts and BoozBot says “transmission received. processing”). Send it immediately after a customer finishes talking and expects an answer; it buys you time to type in a response.
- /pour [drink#] Without the drink#, this command sends you a list of the available drinks and then waits for you to pick one. If you do provide the drink#, it pours the specified drink immediately.
- /shutup, /s Stops whatever BoozBot is saying.
- /repeat Repeats the last thing BoozBot said.
- /blink, /b [optional blink#] Without the blink#, this command sends you a list of the blinks that BoozBot can do and then waits for you to pick one. If you do provide the blink#, it executes the blink immediately.*
- /smile Makes BoozBot smile.*
- /frown Makes BoozBot frown.*
- /voice [optional voice#] Without the voice#, this command sends you a list of the available voices and then waits for you to pick one. If you do provide the voice#, it switches to the voice immediately.
- /macro, /m [macro#] Without the macro#, this command sends you a list of the available macros and then waits for you to pick one. If you do provide the macro#, it performs the macro immediately.*
- /time – Makes BoozBot read out the time.*
- /pic – Grabs a still from the video stream and sends it to BoozBot’s Flickr account.*
- /weather – Makes BoozBot read out the weather forecast for New York.*
- /turn [degrees] Tells BoozBot to turn his head.†
- /sports – Reads some sports scores for the local team.†
- /quote – Makes BoozBot read out a random wise quotation.†
- /parrot – Repeats the audio that has been recorded since BoozBot last finished speaking.†
- /remember [name] Remembers someone’s name for use in later automatic speech.†
* These commands will be called automatically if no command is sent to BoozBot for more than 45 seconds.
† Not yet implemented. Included in this document to build excitement.
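The 45-second rule for the starred commands could be handled by a small idle timer. The sketch below is a guess at the mechanism, not the puppet software's code: it picks uniformly at random among the starred commands (an assumption), and the `IdleScheduler` class, its injectable clock, and the `run` callback are all hypothetical.

```python
import random
import time
from typing import Callable, Optional

IDLE_SECONDS = 45.0
# Commands marked * in the list above.
AUTO_COMMANDS = ["blink", "smile", "frown", "macro", "time", "pic", "weather"]


class IdleScheduler:
    def __init__(self, run: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None) -> None:
        self.run = run        # callback that executes a command by name
        self.clock = clock    # injectable so the timer can be tested
        self.rng = rng if rng is not None else random.Random()
        self.last_activity = clock()

    def note_command(self) -> None:
        """Call whenever the puppeteer sends something."""
        self.last_activity = self.clock()

    def tick(self) -> Optional[str]:
        """Call periodically; fires one auto command after 45 idle seconds."""
        if self.clock() - self.last_activity > IDLE_SECONDS:
            name = self.rng.choice(AUTO_COMMANDS)
            self.run(name)
            self.last_activity = self.clock()  # restart the idle window
            return name
        return None
```

Resetting the window after each automatic command keeps BoozBot from firing a burst of blinks and weather reports the moment the puppeteer goes quiet.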
These features are also covered in the BoozBot (beta) handbook. The macros were a big breakthrough in terms of developing a character using the software. The repetition of phrases made BoozBot seem more robot-like, which I think made people less sure that there was a real person controlling him.
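Macros, like /pour, /blink and /voice, follow a "list, then wait for a pick" pattern. Here is a minimal Python sketch of that pattern as a tiny state machine. The macro phrases are invented placeholders (the real list lives in the puppet software), and `MenuState`, `reply` and `perform` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

# Placeholder macro phrases, invented for illustration only.
MENUS: Dict[str, List[str]] = {
    "macro": ["Greetings, human.", "Does not compute.", "Please enjoy your beverage."],
}


class MenuState:
    def __init__(self, reply: Callable[[str], None],
                 perform: Callable[[str, str], None]) -> None:
        self.reply = reply          # sends chat text back to the puppeteer
        self.perform = perform      # carries out (menu name, chosen item)
        self.pending: Optional[str] = None  # menu waiting for a number

    def command(self, menu: str, arg: Optional[str] = None) -> None:
        """Handle /macro (list and wait) or /macro 2 (act immediately)."""
        if arg is None:
            lines = [f"{i}: {item}" for i, item in enumerate(MENUS[menu], 1)]
            self.reply("\n".join(lines))
            self.pending = menu
        else:
            self._choose(menu, arg)

    def maybe_pick(self, text: str) -> bool:
        """If a menu is pending, treat a bare number as the pick."""
        choice = text.strip()
        if self.pending is None or not choice.isdigit():
            return False
        menu, self.pending = self.pending, None
        self._choose(menu, choice)
        return True

    def _choose(self, menu: str, arg: str) -> None:
        items = MENUS[menu]
        if arg.isdigit() and 1 <= int(arg) <= len(items):
            self.perform(menu, items[int(arg) - 1])
        else:
            self.reply(f"No {menu} #{arg}.")
```

Because a pending pick only consumes a bare number, ordinary dialog typed while a menu is open still gets read aloud as usual.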
Programming the puppet software was just a matter of wiring up the Skype API, the Microsoft Speech API and some webcam stuff, learning a very little bit about serial ports, and then writing a small command interpreter. The hardest part by far was figuring out the DirectShow webcam stuff. I ended up using a program called WebCam Splitter by Very Soft to split the signal coming in from the webcam, so that I could grab frames while Skype still had access to the video feed and the puppeteer could still see his customers.
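Conceptually, the splitter is a fan-out: one camera source, several consumers, each seeing every frame. The Python sketch below illustrates only that idea; it says nothing about how WebCam Splitter itself works. One consumer stands in for Skype's feed, and another keeps just the latest frame so that something like /pic can grab a still on demand. Both class names are hypothetical.

```python
from typing import Callable, List, Optional


class FrameSplitter:
    """Delivers every frame from one source to all subscribers."""

    def __init__(self) -> None:
        self.consumers: List[Callable[[bytes], None]] = []

    def subscribe(self, consumer: Callable[[bytes], None]) -> None:
        self.consumers.append(consumer)

    def push(self, frame: bytes) -> None:
        for consumer in self.consumers:  # every consumer gets every frame
            consumer(frame)


class LatestFrame:
    """Keeps only the most recent frame, for grabbing stills."""

    def __init__(self) -> None:
        self.frame: Optional[bytes] = None

    def __call__(self, frame: bytes) -> None:
        self.frame = frame

    def grab(self) -> Optional[bytes]:
        return self.frame
```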
For some reason, I never could get OpenCV to recognize the “virtual webcam” created by WebCam Splitter, which is why I had to resort to DirectShow to get a list of the video capture devices. The examples on the DirectShow .NET SourceForge page show you how to do just about anything you want, with the exception of accessing the individual frames coming from a webcam. As a result, I never got EmguCV/OpenCV playing nicely with my webcam.
Source Code
Here is the Subversion repository for the puppet software. It comes with all of the DLLs, so I *think* the only thing you’d have to install is Visual C# Express.
http://svn.digitalsituations.com/boozbot/
Future Plans
- “most commonly used commands” command, which will query a remote database for the most frequently used dialog or commands by any operator
- “recent commands”, so that operators have access to their last X commands for easy retrieval
- more complicated macros that include movement/blinking/etc, and text replacement macros, so that the operators can plug words into existing macros.
- add the ABSML parser as an additional web service so that this puppet software can be used for James Chimpton, just like the weather or news commands
- Eventually, we’d like to have a family of BoozBot-like puppets in different bars. Although that is some time in the future, we will prepare for it by building features that react to a face entering the frame by pinging any registered puppeteers who are logged onto Skype and inviting them to control BoozBot. We will also build in some kind of payment system so that puppeteers are paid for their time.
- Make BoozBot more self-contained – smaller hardware incorporated into his body. Soon we will be sending out a request for an intern to help us with building the new body.
- Fake breathalyzer
- more daemon/automatic activities
- fake face/ID scanner
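Here is a rough sketch of three of the planned features: "recent commands" (the last X entries), "most commonly used commands", and text-replacement macros. The plan calls for a remote database shared by all operators; an in-memory `Counter` stands in for it here, and `CommandHistory` and `fill_macro` are hypothetical names. `string.Template` is one possible way to plug words, such as a name remembered with /remember, into existing macros.

```python
from collections import Counter, deque
from string import Template
from typing import Deque, List, Tuple


class CommandHistory:
    def __init__(self, recent_size: int = 10) -> None:
        self.recent: Deque[str] = deque(maxlen=recent_size)  # oldest drop off
        self.counts: Counter = Counter()  # stand-in for the remote database

    def record(self, message: str) -> None:
        self.recent.append(message)
        self.counts[message] += 1

    def last(self, n: int) -> List[str]:
        """The n most recent entries, newest first."""
        return list(self.recent)[::-1][:n]

    def most_common(self, n: int) -> List[Tuple[str, int]]:
        return self.counts.most_common(n)


def fill_macro(template: str, **words: str) -> str:
    """Plug words into a macro; unknown $placeholders are left untouched."""
    return Template(template).safe_substitute(**words)
```

For example, `fill_macro("Nice to meet you, $name.", name="Steve")` produces `"Nice to meet you, Steve."`.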