DuckDuckGo chat bot: im@ddg.gg

 

DuckDuckGo now has a chat bot at im@ddg.gg (or im@duckduckgo.com) that will respond to your instant messages with Zero-click Info, search results, and real-time topic summaries.


The following is a guest post by Dhruv Matani (@dhruvbird), who is a software developer at Directi and who made the new DuckDuckGo chat bot.


What is a chat bot?

A chat bot is like a friend on your roster (contact list) who is actually a computer and can respond to any messages you send it (or send messages of its own accord). Examples of chat bots include bots that:

  1. Periodically alert users of the score during a game of cricket (todays-special@appspot.com).
  2. Message the user whenever a commit is made (github).
  3. Post any message you send it to your Twitter and Facebook accounts (List of twitter bots).
  4. Try to answer your question by asking other users (aardvark).
  5. Get the latest news (only pt_BR available).
  6. Perform web searches, bookmark pages, do calculations, etc. (clisearch).
  7. Show you the current weather (defunct).

You can find other XMPP bots on these pages.


What does this chat bot do?

im@ddg.gg is one such chat bot: she can do DuckDuckGo searches for you from your chat account. Just send this bot your query and she will do a search and send the results back to you - all while you are using your favourite chat client. You will get not just the web results, but also the 0-click results and definitive answers to your questions (if any). This can be useful when you quickly want to look up the meaning of a word, find some quick fact about a monument you don't know of, compute the md5 of a string, and so on.

The bot can also give you real-time topic summaries for many broad keywords. These summaries are the result of going out to the top links in real time and looking for relevant paragraphs that match your particular search term.

Though the idea has been tried before, the proliferation of XMPP has made bots a lot more accessible and chat accounts more plentiful. Beyond that, DuckDuckGo draws on a large set of data sources to provide answers, so the bot offers a single window into a lot of searchable data.


Why did I even venture into it?

I wrote this chat bot because a few of my friends have BlackBerry accounts with only mail and chat (no internet) enabled, and they wanted to be able to just search the web. They said that it would be enough for them to see just the information returned in the result snippet, as opposed to the result page. Being on a mobile also meant that they couldn't effectively scan complete web pages, so a condensed version of the information would definitely be better. As it turns out, a lot more people with such requirements exist. In fact, one of my friends mentioned that he used this bot to look up the definition of a word while he was watching a movie!

You can try it now by adding im@ddg.gg to your roster on any jabber network (jabber.org, pandion.im, livejournal.com, gmail.com, etc.).


Technical stuff

We used ejabberd to serve the ddg.gg domain over jabber. The im@ddg.gg bot itself is a standard jabber client written in JavaScript that runs on node.js.

Ejabberd was chosen because it is one of the best jabber servers out there, is highly configurable and is fairly easy to set up.

Node.js was chosen as the platform for the bot; using JavaScript as the language was just a side effect of that decision. Even though I am a Python fanboy, I chose node.js not because of the language but because it was so well suited to the task at hand.

Node.js is ideal for such routing/switching interactions, where it takes data from one end, pumps it to the other, and does the same in reverse. The idea of everything being asynchronous (across protocols - XMPP, HTTP, SMTP and others) appealed to me a lot, and I had a working jabber bot (no frills, but it did the job right and performed well) in under 100 lines of code on node.js. I would like to thank astro for being helpful when I ran into issues with node-xmpp.
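To illustrate the "take data from one end, pump it to the other" pattern, here is a minimal sketch of the message-handling core of such a bot. It is not the bot's actual source: plain objects stand in for XMPP stanzas, and the `search` and `send` callbacks are hypothetical placeholders for the HTTP request to the search backend and for the XMPP library's send call.

```javascript
// Hypothetical sketch: plain objects stand in for XMPP stanzas, and the
// `search` / `send` callbacks stand in for the HTTP and XMPP layers.

// Build the reply to one incoming message by swapping sender and recipient.
function buildReply(incoming, text) {
  return { from: incoming.to, to: incoming.from, type: 'chat', body: text };
}

// Turn a list of { title, url } results into a short plain-text answer.
function formatResults(results) {
  if (results.length === 0) {
    return 'No results found.';
  }
  return results.map((r, i) => `${i + 1}. ${r.title} - ${r.url}`).join('\n');
}

// Handle one message: skip anything that is not a non-empty chat message,
// otherwise run the search asynchronously and send the reply when it is done.
// Resolves to true if a reply was sent, false if the message was ignored.
function handleMessage(incoming, search, send) {
  const query = (incoming.body || '').trim();
  if (incoming.type !== 'chat' || query === '') {
    return Promise.resolve(false);
  }
  return search(query)
    .then((results) => formatResults(results))
    .catch(() => 'Sorry, the search failed. Please try again.')
    .then((text) => {
      send(buildReply(incoming, text));
      return true;
    });
}
```

Because nothing here blocks while a search is in flight, one process can keep many conversations going at once, which is the property that makes this style a good fit for a bot.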

We had the option of making the bot either a jabber client or a jabber component, and we went with the former only because of its simplicity. As a client, you are assured that you are one user, whereas as a component you need to be aware of which user you are pretending to be at any point in time. The obvious drawback of a client is that most external jabber servers limit how much data a client can send or receive if it connects directly to their network. I would think that such restrictions don't exist (or are very generous) for components - if the jabber server does in fact allow components to connect to it. However, since the bot connects to our own jabber server, we can always be generous when shaping traffic to it. Besides, the most common argument I have read against running a bot as a jabber client is roster size, and for this use case the roster size doesn't matter.
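The difference in bookkeeping between the two approaches can be sketched roughly as follows. This is only an illustration: the helper names and the `bots.example.org` component domain used in the usage note are hypothetical, and stanzas are again represented as plain objects.

```javascript
// Hypothetical sketch; the helper names are illustrative only.
const BOT_JID = 'im@ddg.gg';

// Domain part of a bare or full JID ("user@domain/resource" -> "domain").
function jidDomain(jid) {
  const withoutResource = jid.split('/')[0];
  const at = withoutResource.indexOf('@');
  return at === -1 ? withoutResource : withoutResource.slice(at + 1);
}

// A client is logged in as exactly one user, so every reply comes from that user.
function clientReplyFrom() {
  return BOT_JID;
}

// A component receives stanzas for every address on its domain, so it has to
// answer as whichever address the sender wrote to.
function componentReplyFrom(incoming, componentDomain) {
  if (jidDomain(incoming.to) !== componentDomain) {
    throw new Error(`stanza for ${incoming.to} is not addressed to ${componentDomain}`);
  }
  return incoming.to;
}
```

For example, `componentReplyFrom({ to: 'weather@bots.example.org' }, 'bots.example.org')` returns `'weather@bots.example.org'`, while `clientReplyFrom()` always returns the single bot address.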


Server federation

We had a tough time figuring out the exact rules as far as Google's XMPP server was concerned. Google runs a service called Google Apps that you can set up on your own domain; once you do that, you can chat with users on any other Apps or Gmail account. The catch is that XMPP requires you to add an SRV record to every domain if you want the jabber servers serving those domains to talk to each other. However, since Gmail and Google-hosted domains are all served by the same jabber server (talk.google.com), the SRV record is not necessary for users on these domains to chat with each other. As a result, anyone served by talk.google.com can NOT chat with people on other servers unless their domain's XMPP SRV server record is set to talk.google.com, which is required for server federation.
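One way to check whether a domain has such a record is to query DNS for the `_xmpp-server._tcp` service name that XMPP uses for server-to-server connections. The sketch below is an illustration, not something from our setup: the function names are made up, the resolver is passed in as a parameter so the logic can be tried without network access, and the ordering is a simplification of real SRV selection.

```javascript
// Hypothetical helpers for inspecting a domain's XMPP federation SRV record.

// DNS name that XMPP servers query for server-to-server federation.
function federationSrvName(domain) {
  return `_xmpp-server._tcp.${domain}`;
}

// Order SRV records roughly the way a connecting server would try them:
// lowest priority value first. Simplification: weight is used as a plain
// tie-breaker (highest first) instead of weighted random selection.
function orderSrvRecords(records) {
  return records
    .slice()
    .sort((a, b) => a.priority - b.priority || b.weight - a.weight);
}

// Resolve to true if any SRV record for `domain` points at `expectedHost`.
// `resolveSrv` is injected: it takes a DNS name and returns a promise of
// records shaped like { name, port, priority, weight }.
function federatesVia(domain, expectedHost, resolveSrv) {
  return resolveSrv(federationSrvName(domain))
    .then((records) => orderSrvRecords(records).some((r) => r.name === expectedHost))
    .catch(() => false);
}
```

On node.js, the resolver argument can be `(name) => require('dns').promises.resolveSrv(name)`, which returns records in the shape assumed above.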

Hence, we initially decided to run 2 bots, one on ddg.gg (ejabberd) and one on talk.google.com, both serving the address im@ddg.gg. However, even Gmail accounts started talking to the bot hosted on talk.google.com (Gmail seems to have some internal condition that detects that the target domain is Apps-hosted and bypasses the SRV lookup - even if an XMPP SRV server record for that domain exists). This is undesirable since Google limits the amount of traffic that a single user can generate, and the bot would most certainly cross that limit very easily (since it is a jabber client and not a jabber component). I'm not sure if talk.google.com allows jabber components (if anyone knows whether this is possible, please do let me know). Hence, we got rid of the Apps-hosted bot, and now there is just one bot serving im@ddg.gg (on the domain ddg.gg, which is served by the ejabberd instance).

As of today, an Apps account can talk to the bot ONLY if its domain has the XMPP SRV server record set to talk.google.com (for server federation).

Additionally, talk.google.com seems to disregard vCards that the bot sends to the server. The Gmail browser client also doesn't show the image that the bot sets in its vCard.

Other jabber servers didn't give us too much grief since there wasn't anything non-standard about them (that we encountered).


Yegg was very helpful in this whole exercise, so a big thanks to him! :-) If you have any ideas for improvement, please include them in the comments below.

References: http://www.quora.com/What-is-the-best-way-to-build-a-fast-scalable-instant-messaging-IM-bot/
