Recently in Duck Duck Go Category

Hack Hack Go

 

I want to make Duck Duck Go a better search engine for programmers like me. If you're a programmer, I'd appreciate your feedback and ideas.

Duck Duck Go is intended to be a general purpose search engine and that isn't going to change. Our user base certainly reflects this purpose: it is quite varied on every metric I've tried to measure.

Yet there are certain search niches like casual research where Duck Duck Go really excels. I'd like programming to be one of those areas.

To that end, here's what I've got so far.

  • A general search engine. The good news here is I know a lot of programmers who use it as their primary search engine. It works and (at least some) people really like it. I'm always willing to add new features whose absence is preventing people from switching. Currently on that list are some maps and images.

  • Zero-click Info. There are red boxes above links on some searches with info you can get without clicking, i.e. on-site. We have a lot of info that is specific to programming topics. Of course we have Wikipedia, e.g. Dijkstra's algorithm. But I've also added software sources, i.e. github, freshmeat, download.com, versiontracker, and sourceforge.

  • Category pages. I've mined sources to create useful topic lists for browsing/learning, e.g. Search Algorithms.

  • Disambiguation pages. I've created pages to help you isolate programming topics in common query terms, e.g. cookie links to HTTP cookie, which has results more geared toward that meaning. There are also programming specific disambiguation pages, e.g. nearest neighbor.

  • Crowd-sourced links. I also mine links from crowd-sourced sites, e.g. coroutine.

  • Wikipedia paragraphs. I've deep-indexed Wikipedia at the paragraph level. You don't have to match a topic nearly exactly anymore to get some Zero-click Info, e.g. python switch statement. This is way more than a regular search index, as it is sub-section/section/title aware and uses some NLP for relevancy. I hope to make that matching algorithm even more sophisticated over time.

  • Bang. There are a few hundred !x shortcuts that can be used, e.g. !cpan Net::DNS
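To make the paragraph-level idea from the list above concrete, here is a minimal sketch in Python. It is not the actual matching algorithm, which isn't described in detail here. It only assumes that each paragraph is stored with its article title and section name, and that query terms found in the title or section count for more than terms found in the body. The field names, weights, and sample data are all made up for illustration.

```python
import re
from collections import namedtuple

# One indexed unit: a paragraph plus the context it sits in.
Paragraph = namedtuple("Paragraph", "title section text")

def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query, para, w_title=3.0, w_section=2.0, w_text=1.0):
    """Weighted count of query terms found in title, section, and body.

    The weights are illustrative guesses, not tuned values.
    """
    terms = tokenize(query)
    return (w_title * len(terms & tokenize(para.title))
            + w_section * len(terms & tokenize(para.section))
            + w_text * len(terms & tokenize(para.text)))

def best_paragraph(query, paragraphs):
    """Return the highest-scoring paragraph, or None if nothing matches."""
    ranked = max(paragraphs, key=lambda p: score(query, p), default=None)
    if ranked is None or score(query, ranked) == 0:
        return None
    return ranked

# Made-up sample data for illustration only.
CORPUS = [
    Paragraph("Python syntax and semantics", "Control flow",
              "Python has no switch statement; chains of if and elif are used instead."),
    Paragraph("Python (genus)", "Description",
              "Pythons are large nonvenomous snakes found in Africa, Asia and Australia."),
]

hit = best_paragraph("python switch statement", CORPUS)
print(hit.section if hit else "no match")
```

With this toy corpus, the query "python switch statement" picks the programming paragraph over the snake one, even though neither title matches the query exactly. A real index would need stemming, phrase matching, and much better relevance signals.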

Here's what I'm thinking of doing.

  • O'Reilly Paragraphs. I think it would be awesome if I could index all O'Reilly books at the paragraph level, like I've done for Wikipedia. This content is well-written, encyclopedia-like, is largely in paragraph form, and has surrounding contextual information (section titles, etc.) that will make the relevance matching excellent. Problem is, I don't know anyone at O'Reilly. I think it's a win-win because it can link right to their Safari product or individual book pages. And I don't think it would cannibalize Safari because you're getting people in a very different context (when searching). Anyway, I thought I'd start by writing them an email. I did that and haven't heard back yet.

  • More topic sources. I'm going to add man/info pages, so you can type in a command and get a description. I could also do packages for distributions/languages in a similar manner if people think that would be useful to them. I've explored indexing these at the paragraph level, but the content doesn't seem to work well for that purpose. Other, more general sources, like Amazon product descriptions, may be incidentally useful to programmers. I'd love your thoughts here.

  • Bang documentation. The current bang commands aren't documented. I'll document them as well as add more that are useful to programmers. Any you want?

  • Zero-click Info by IM. I'm thinking of making a chatbot that will respond to you via IM with Zero-click Info (and links). So you send it a search query and we'll send you back a description along with a few links. Would you use that?

  • API integration. I wrote the Perl binding for Wolfram Alpha. I'm exploring ways to use it to integrate good WA content. I'm open to using other APIs, but I'd strongly prefer to get dumps instead so I can ensure speed. Another one I'd like to integrate for programmers is ErrorHelp.com (previously bug.gd).
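As a rough illustration of the bang idea mentioned above, here is a small Python sketch of how a !x shortcut could be parsed and documented from a single table. It is only a sketch under stated assumptions: the table contents, the function names, and the example.com URLs are hypothetical placeholders, not the real shortcut list or real endpoints.

```python
from urllib.parse import quote_plus

# Hypothetical bang table: name -> (description, URL template).
# The example.com URLs are placeholders, not real endpoints.
BANGS = {
    "cpan": ("Search CPAN modules", "https://example.com/cpan?q={query}"),
    "man": ("Look up a man page", "https://example.com/man?q={query}"),
}

def parse_bang(raw):
    """Split '!name rest of query' into (name, query); name is None if absent."""
    raw = raw.strip()
    if raw.startswith("!"):
        name, _, rest = raw[1:].partition(" ")
        if name:
            return name.lower(), rest.strip()
    return None, raw

def resolve(raw):
    """Return a redirect URL for a known bang, else None (fall back to normal search)."""
    name, query = parse_bang(raw)
    if name in BANGS:
        return BANGS[name][1].format(query=quote_plus(query))
    return None

def bang_help():
    """Render a plain-text listing generated from the same table."""
    return "\n".join(f"!{name}  {desc}" for name, (desc, _) in sorted(BANGS.items()))

print(resolve("!cpan Net::DNS"))
print(bang_help())
```

Keeping the descriptions in the same table as the URL templates means the documentation can be generated from the table instead of being maintained by hand, so adding a shortcut documents it automatically.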

That's where I'm at right now. If you're a programmer, my questions for you are:

  1. Do you find the above compelling?

  2. Do you have any particular feedback/ideas?

Feel free to comment below, on HN, on reddit, or email me directly.

Duck Duck Go in Philly Inquirer

 
I'm really proud to be in the Philly Inquirer today in both the offline (business, pg3) and online versions (Philly Deal$ blog).

Thank you to Joseph DiStefano at the Inquirer for the nice write-up and to Charles Knight of The Next Web, Search for mentioning us to him.

Here's the excerpt:

Knight is also attracted to simplified search engines like Gabriel Weinberg's Valley Forge-based DuckDuckGo.com. "It's Google Light," says Weinberg. "They strip out all the garbage - video, ads. And it's intelligent. You search for 'wolf,' it'll ask, 'What wolf do you mean?' and list some choices."

DuckDuckGo.com is the brainchild of Weinberg, a twentysomething graduate of MIT who sold his Web site, NamesDatabase, to Classmates Online Inc. in 2006, and retired to raise his child and invest in new companies with his wife, a GlaxoSmithKline P.L.C. statistician, in 2006.

"Around M.I.T., we had a lot of people starting companies," he said. "We started this group, Hackathon."

His Philly chapter "is growing slowly over time," with help from people at the LiquidHub consulting group, among others. They meet every month, sometimes in an office at Cira Centre, sometimes at the Bear Rock Cafe in King of Prussia. "There's random people making sites," he explained. "We try to put them together."


For the record, I have a few minor corrections...
  • I can't take credit for the "Google Light" quote, which must have been from Charles. I usually say "better results and less garbage."
  • I'm thirty and I'm not retired; in fact, quite the opposite :)
  • The Hackathon group is unique to Philly. I started it upon moving to the area trying to reconstruct some of the entrepreneurial spirit I felt around MIT. I'm actually at our monthly hackathon right now!

Duck Duck Go on CNN Home Page!

 
Duck Duck Go is on the top of the CNN home page!  Well, our logo is :). Sadly, there is no link to us in the story, but I'm not complaining at all.  Seriously.  

I couldn't be more happy and proud. None of my projects have ever been on CNN, let alone in the lead featured article. I'm even more proud because the other 5 companies pictured all have ridiculous funding. While they have over $150M in combined funding, we are bootstrapped.

Believe it or not, even without the link, we're still getting a decent amount of traffic. It's funny: the Go of Duck Duck Go is not in the picture, so we're getting a lot of requests from people searching "duck duck" in Google and CNN.

Anyway, here's the story link for when it is off the home page. And here's a snapshot that documents this was real!

Duck Duck Go Architecture

 
I often get asked what Duck Duck Go "runs on."  This post basically answers that question by outlining the major moving parts that serve queries, i.e. its architecture.  I'll detail in another post what, in particular, makes it fast, i.e. tunables and other specifics.

Caveat: this architecture was designed for maximum query speed for our initial soft launch.  While also somewhat designed for eventual scalability, we don't have that much traffic yet (though we are growing at a nice clip).  So don't take this as advice like you might get at High Scalability.  It's really just for your amusement.  However, my last startup did have some scale (relatively speaking of course) so I know a bit about what I'm doing...

  1. DNS served by DNS Made Easy.  I used to serve it myself via djbdns, but DNS Made Easy is faster, makes it easier for me to deal with fail-over, and is cheap.

  2. All requests come into nginx. I used to use two instances of Apache, one for dynamic requests and one for static files.  But nginx is faster, uses less memory, and is more stable.

  3. If a static file, nginx serves it directly, e.g. the home page.  It's really good at that.

  4. Otherwise, nginx checks my memcached store.  I hadn't used memcached before this, and find it a big win.

  5. If not in memcached store, nginx proxies to FastCGI processes that are running in the background.  I hadn't used FastCGI before this, as I had always used mod_perl with Apache.

  6. The FastCGI processes are managed by daemontools (as is memcached).  At first I was worried about stability in these processes, but it hasn't proved to be an issue yet.

  7. Internally, the FastCGI scripts are written in Perl and run by the FCGI::Engine Perl module.

  8. The Perl scripts access a PostgreSQL database (when needed) to retrieve our zero-click information, among other things.

  9. The whole thing runs on FreeBSD.

  10. For fail-over and scalability purposes, I have EC2 images that replicate the above except that they run on Ubuntu (since, at the time, FreeBSD wasn't available).

  11. All of our site icons and zero-click info images are hosted on S3.

  12. We also reference some external YUI JS files.
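The lookup order in steps 3 through 8 can be sketched as a tiny Python simulation. This is not the real setup (that is nginx, memcached, and Perl over FastCGI); plain dictionaries and a function stand in for each layer, and all names here are hypothetical. One assumption is labeled in the code: the list above doesn't say how entries get into memcached, so the sketch simply stores each backend result after generating it.

```python
STATIC_FILES = {"/": "<html>home page</html>"}   # served directly (step 3)
CACHE = {}                                        # stand-in for memcached (step 4)

def backend(path):
    """Stand-in for the FastCGI/Perl layer that builds a result page (steps 5-8)."""
    return f"<html>results for {path}</html>"

def handle(path):
    """Return (source, body), trying the cheapest layer first."""
    if path in STATIC_FILES:
        return "static", STATIC_FILES[path]
    if path in CACHE:
        return "cache", CACHE[path]
    body = backend(path)
    CACHE[path] = body   # assumption: the backend result is cached for next time
    return "backend", body

for p in ["/", "/?q=perl", "/?q=perl"]:
    print(p, handle(p)[0])
```

The second request for the same query is answered from the cache without touching the backend, which is the point of putting the cache check in front of the FastCGI processes.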

Any questions?  

Also, I'd love any feedback on this architecture.  I'm always looking for ways to speed it up!

Update: additional comments can be found here.

About

   

I'm a solo founder of a new search engine and an angel investor. There is more about me on my home page.