January 2011 Archives

Using external APIs to improve search

One of my original premises for starting DuckDuckGo was that for every query there is usually a vertical search engine somewhere that answers that query better than a general search engine. I still believe this premise to be true, and I'd like to do whatever I can to get you direct information from that optimal vertical search engine whenever possible.

Our bang syntax gets you to those other search engines directly, e.g. !cpan Net::Server. But in that case you have to know what vertical engine you want beforehand. You often don't, if for no other reason than that you don't know hundreds of vertical search engines even exist.

As such, I think it is more powerful to put a snippet of zero-click info right on the results page. Our Stack Overflow integration is a good example of this effort; it uses an index on our side derived from their Creative Commons dump.

However, a dump (and the resulting server-side index) is often not possible. Reasons vary from fast-changing info to licensing to deep processing that needs to happen on the fly. When this happens, an external call needs to be made.

I've taken this external concept the furthest with our Wolfram Alpha integration, which will generate a lot of instant answers for you on DuckDuckGo. I think it has worked well, and so lately I've been working on integrating a lot more external APIs from other vertical search engines. Here are the latest integrations (still works in progress of course). 

Qwerly (example search: yegg). When Twitter profiles appear in the results, we use Qwerly's API to find other profiles for that particular person. You can click on the icons to go directly to those profiles. In the future, I'd like to expand this to other domains, as they add those into their API.

Numote (example search: glee). When you search for a TV show, we grab air time and episode information from the Numote API. You can click on the episodes to be taken to episode summaries. In the future, I'd like to add a link to watch the show online (if available).


Fanvibe (example search: flyers). When you search for a sports team, we get schedule and score info (including in-progress games) from the Fanvibe API. You can click on a game to be taken to engaging chatter about that game. In the future, I'd like to expand this to college teams and other leagues.

SeatGeek (example search: chemical romance). When you search for a band, we get upcoming show info (including secondary ticket price info) from the SeatGeek API. You can click on a show to be taken to a seating chart and other ticket and venue info about that show. DuckDuckGo will get a commission if you subsequently purchase. In the future, I'd like to expand these listings to be location aware and also to show for sporting events.

Amazon (example search: modern perl book). When you search for something shopping-related, e.g. a book, we get product information from the Amazon Product Advertising API. You can click to Amazon for more info, and we've deep linked to other useful places on Amazon and to a WorldCat library lookup (for books). As with SeatGeek, DuckDuckGo will get a commission if you subsequently purchase from Amazon. In the future, I'd like to expand the links for other products to link to manuals and other useful stuff.

There are several other integrations in various stages of development. I'd really appreciate suggestions for improvement as well as other information/services to integrate.

You'll notice that four out of five of these integrations are with startups. I really like working with startups not only because I'm one, but because they're often doing innovative things with data and because they're flexible such that we can produce the best possible search integration.

Using external APIs is not without its problems, however. Because these are supposed to display on top of the results, there are timing issues, and if they come in after the fact, things can jump. I've been working on ways to mitigate this problem by leaving appropriate space and setting various timeouts.
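To make the timeout idea concrete, here is a minimal sketch in Python. It is not our actual implementation: the function names, the fake API calls and the 0.2 second budget are all made up for illustration. The point is only that each external call gets a fixed time budget, and anything that misses it is dropped rather than being allowed to push the results around later.

```python
import asyncio

async def fetch_with_timeout(fetch, timeout_s, fallback=None):
    """Await fetch(), but give up after timeout_s seconds and return fallback."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Too slow: show nothing instead of making the page jump later.
        return fallback

# Hypothetical stand-ins for external API calls (not real endpoints).
async def fast_api():
    await asyncio.sleep(0.01)
    return {"source": "fast", "answer": "placeholder"}

async def slow_api():
    await asyncio.sleep(1.0)
    return {"source": "slow", "answer": "placeholder"}

async def gather_answers():
    # Run both calls concurrently, each with the same (assumed) time budget.
    return await asyncio.gather(
        fetch_with_timeout(fast_api, 0.2),
        fetch_with_timeout(slow_api, 0.2),
    )

fast_result, slow_result = asyncio.run(gather_answers())
```

In this sketch the fast call returns its data and the slow call comes back as None, so the page would render only the answer that arrived in time.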

That said, I think it is definitely worth it. I think these types of integrations really improve the search experience, and it is where search is headed: more zero-click info, more of the time. The key, though, is that it always has to be highly relevant, i.e. false positives and so-so info must be kept to a negligible amount.

Search leakage is not FUD. Google et al., please fix it.


Lately I've been accused by some of spreading fear, uncertainty and doubt (FUD) by trying to let people know their search terms are being leaked to the sites they click on. I hope to address those concerns in this post.

For those of you who have no idea what I'm talking about: when you click on a link on the Internet, where you clicked from gets automatically sent to the site you clicked on (most of the time). 

For example, if you're on yahoo.com and you click to a story at the New York Times, your browser will send to nytimes.com some information that you came from yahoo.com -- namely, the Web address of the page you were just on. This info is called the Referrer.

At issue here is that sometimes the Referrer contains personal information. In particular, when you use most search engines, your search terms are included in the Referrer. That is, when you search on Google/Bing/etc., and you click on a link, your search terms are sent to the site you clicked on. This search leakage doesn't happen at DuckDuckGo.
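To see how little work it takes for a site to read your search terms, here is a rough sketch in Python. The example URL is made up, and the assumption that the terms live in a "q" query parameter is mine (it is a common convention, but each search engine picks its own).

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referrer(referrer, param="q"):
    """Return the search terms embedded in a Referrer URL, or None if absent.

    Assumes the terms are carried in a query parameter named by `param`.
    """
    query = urlsplit(referrer).query
    values = parse_qs(query).get(param, [])
    return values[0] if values else None

# Hypothetical Referrer a site might receive after a search click.
example = "http://search.example.com/search?q=gout+symptoms&hl=en"
terms = search_terms_from_referrer(example)
no_terms = search_terms_from_referrer("http://search.example.com/")
```

Here terms comes out as "gout symptoms", while a Referrer with no query string yields None.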

Now, let's take the FUD arguments in turn.

One site having one of my search terms is irrelevant. That may generally be the case, but unfortunately, tens of millions of sites run ads from just a handful of ad networks. Those ad networks can aggregate your search terms and piece together a large percentage of your search history. 

So the question then becomes do you care if third parties (not associated with your search engine and not bound by its privacy policy) have a significant % of your search history? If you don't care about that, then you probably don't care about Referrers. 

It's not Google's fault. Your browser sends that stuff. That's true, but Google et al. could easily fix it. It is a technically trivial fix. In fact, Google did it for a bit when they switched to using Ajax.

So the question then becomes if you're a company that cares about user privacy and can easily stop third parties from piecing together your users' search histories, why wouldn't you do it?

In other words, I find this FUD argument to be a straw man argument. While you can fault the browser or the Internet, that doesn't mean someone who is able shouldn't come in and fix it.
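As a rough illustration of why an Ajax-style page can have this effect: browsers leave the fragment (everything after the #) out of the Referrer they send. So if the search terms live in the fragment instead of the query string, they never reach the site you click on. The Python sketch below only approximates that browser behavior, the URLs are made up, and I'm not claiming this is exactly what any particular search engine did.

```python
from urllib.parse import urldefrag, urlsplit

def referrer_for(page_url):
    """Approximate the Referrer a browser sends: the page URL minus its fragment."""
    return urldefrag(page_url).url

# Hypothetical result-page URLs for the same search.
classic = "http://search.example.com/search?q=gout"
ajax_style = "http://search.example.com/#q=gout"

classic_referrer = referrer_for(classic)
ajax_referrer = referrer_for(ajax_style)

# True only if the Referrer still carries a query string.
classic_leaks = bool(urlsplit(classic_referrer).query)
ajax_leaks = bool(urlsplit(ajax_referrer).query)
```

With the classic URL the Referrer still contains q=gout; with the fragment-based URL it is just the bare search engine address.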

It would hurt SEO. The only reason I've heard to not prevent search leakage is that marketers use Referrer info to do better search engine optimization (SEO).

But the information doesn't have to disappear, just the current mechanism of transferring the information in a personally identifiable way. Google et al. could provide sites with the information in an anonymous fashion. At that point, I think the only thing marketers couldn't do would be to dynamically serve you different pages based on your personal search terms.
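Here is one way "anonymous" could look in practice, sketched in Python. This is purely illustrative: the log format, the function name and the minimum-count threshold are my assumptions, not anything a search engine has published. Instead of handing each site a per-click Referrer, the search engine would hand over counts per search term, and suppress rare terms that might identify one person.

```python
from collections import Counter

def aggregate_terms(click_log, min_count=2):
    """Build a per-site report of search term counts.

    click_log is an iterable of (site, terms) pairs with no user identifiers.
    Terms seen fewer than min_count times for a site are left out.
    """
    counts = Counter(click_log)
    report = {}
    for (site, terms), n in counts.items():
        if n >= min_count:
            report.setdefault(site, {})[terms] = n
    return report

# Hypothetical click log: which site was clicked for which search.
log = [
    ("example.org", "gout"),
    ("example.org", "gout"),
    ("example.org", "my rare personal query"),
    ("example.net", "modern perl book"),
    ("example.net", "modern perl book"),
    ("example.net", "modern perl book"),
]
report = aggregate_terms(log)
```

Marketers would still see that "gout" sent two visitors to example.org, but the one-off query never shows up, and no single click is tied to a single search.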

So the question then becomes is that trade-off worth it? 

Google Webmaster Tools (GWT) doesn't provide that full information. Matt Cutts wants me to stop saying GWT can solve this marketer problem because while GWT provides a lot of information, it does not currently provide all the terms people search for to get to your site. That's true; sorry Matt. 

But the key word is currently. There is no reason I can see why it couldn't provide a more comprehensive view into this data. 

Google provides ways to opt out. The only thing I know of that somewhat protects you from Referrers is Google's encrypted version, which doesn't protect you fully (because https->https traffic still sends Referrer headers).

Most people have no idea that the encrypted version is related to this problem, or that it even exists. Furthermore, you still can't just type in https://google.com/ to get there (you have to add the www.).

But all that is beside the point, because you shouldn't have to opt out of this search leakage in the first place. Your search results won't suffer -- Google still has your history.

Therefore, it should be the default. Matt says SSL can't be the default because of latency, but that is another straw man argument IMHO. You don't need SSL to solve this problem as evidenced by their Ajax incident and DuckDuckGo.

You're just attacking Google when Bing et al. do it too. I want everyone to solve this issue and I've tried to put "et al." in this post a lot. However, the reality is Google is synonymous with search. Despite what search market share #s say (I still don't grok them), pretty much everyone I talk to about search talks about Google. 

In any case -- Bing, Yahoo, etc. -- if you're listening, please solve this issue at your search engines too.

To summarize, here's my basic argument:

1) Search engines say they care about user privacy.

2) They are currently allowing third parties to aggregate user search history by not blocking the browser from sending search terms in the Referrer header.

3) There is an easy fix.

So why isn't the fix a no-brainer?

Here is a representative example of feedback emails I get on this subject. I got this user's permission to share.

I just replaced Google with DuckDuckGo as my default search engine. I'm VERY tired of having advertisers jump all over me everytime I do a search for, well, anything.

For example: watching THE TUDORS on iTunes, one of the characters had gout. I wanted to know if gout was a recognized disease during the time of the Tudors. So I Googled "gout", and checked out the wikipedia entry on the subject. Turns out it was in fact a recognized disease at the time (although they had no idea what caused it). I don't have the disease. I don't personally know anyone who does. I certainly don't have any need for medications that treat gout. But now I'm constantly bombarded with ads for all kinds of drugs intended to treat it. 

All I did was get currious, just once, about a disease suffered by a TV character on a show I like to watch, and now every advertiser on the planet is apparently convinced that either I, or someone I know, has gout, and they're not about to pass up even the most minuscule chance of selling me something.

Here's the official response Google gave to Wired:

"It's unfortunate that DuckDuckGo is preying on people's fears and offering incomplete information in order to garner attention," a company spokeswoman said in an e-mailed statement.

"For example, it is inaccurate to say that Google uses sensitive health-related terms to target ads on affiliated web pages."

"All search engines and websites use referrer terms as part of the architecture of the web, but we recognize our responsibility to protect the data that users entrust to us and we give them meaningful choices to protect their privacy."

The meaningful choice here would be to drop the personal information from the Referrer. 

Finally, I'm not alone in this call to action. Christopher Soghoian, who previously worked at the FTC and had been a Google intern, filed an FTC complaint in October of last year on this very subject. Here's his post on it and the associated WSJ post.

On not hiring


Hiring is hard. Not hiring can seem even harder, but often isn't.

At my last company we went from entrance to exit without hiring one employee. I'm now three years into DuckDuckGo, and still haven't hired.

Needless to say, I'm an outlier. So don't take what I say about hiring too seriously, but perhaps I have something useful to say on not hiring.

Most angel pitches I get seem to suggest the use of funds will go to salary, both to the founders and to immediate new hires. The assumption is of course that these new hires will move the company forward, faster.

Yet every time I see that pitch, I look at my own experience and question this assumption. I'd much rather see initial use of funds around figuring out distribution, i.e. testing out different traction verticals. And then once one or more customer acquisition channels are flowing, then hire.

So why do people want to hire so early?

We need to build x, y and z, ASAP. Before you've figured out distribution? What evidence do you have that x, y and z, once built, will make customer acquisition any easier?

Beta customers saying they want things isn't enough. There isn't a good reason to add x, y and z to your product, i.e. complexity, unless you really know it will propel you faster to new customers. Yes, you can never really know, but I see a lot of people who certainly don't know. 

I understand their position, however. They like engineering; they like working on hard problems; they like the idea of running a team. That's great, but it doesn't make it the right business decision.

We need a real designer because we suck at design. Have you really tried yet? Really? Usually not.

I'm not the world's best designer by any means, nor am I "classically trained" in design as they say, but I like my results, and I bring in the big guns when needed as freelancers. I'm sure you can do the same for at least your initial versions.

It just takes time and effort. Yes, that is time and effort you may not want to spend, but do you need to hire a full-time position because you're insecure/lazy/etc.? No.

Instead, lean on the powers of incremental improvement and the Pareto principle (80/20 rule). Spend time each week looking at specific parts of your design, and iterate on them. It will get better if you put in the time. And then for finishing touches, e.g. nicer images (the last 20%), outsource via 99designs/freelancers/etc.

A corollary to this one is user experience (UX), i.e. interaction design. I agree this is super important. I also still think the founders should be doing it. Again, iterate, based on real feedback from users, and then bring in consultants and tools to give you ideas and polish.

We have too much to do. Any startup can easily grow to fill 100% of your time. That doesn't mean you're spending your time on the right things, or that hiring someone new and filling 100% of their time will increase outcome potential for your startup.

In addition, there are three main problems with hiring.

The wrong person can negatively impact your startup. There are horror stories, but the more run-of-the-mill case is that they're just mediocre or don't have a true startup mentality. Their presence can make your company more mediocre, and that is not good.

People also tend to underestimate the time it will require post-hiring and post-ramp-up to manage your hire(s). You've just added lots of meetings and other communication channels. Hiring takes a lot of time, both before and after. Your employee will not be inside your head.

And finally, hiring takes money. It increases your burn rate significantly. Companies before product/market fit, i.e. traction, need to stay around long enough until they get it. That can take a lot of time, like years. There are countless cases where companies folded only to miss their moment and see other companies rise up where they might have done so.

One approach I like that some of my portfolio companies are taking is to tie hiring decision points to traction milestones, e.g. once we hit $xK/month in revenue we'll do our next hire. 

The nice things about this approach are that it allows you to a) manage the burn rate issue and b) take a long time to plan your hire. The latter allows you to make sure you're getting the right person in the right position and that they will have a positive impact on the startup. 

Early on it is not entirely clear what that right position will turn out to be. You have a lot of short term needs, but that doesn't mean they should turn into full time positions.

* Photo is courtesy of The American.

Philly Open Angel Forum is now accepting applications

I'm proud to announce that Open Angel Forum (OAF) is coming to Philly on March 16 (in about two months). We're now accepting applications for startups looking to raise angel rounds in that time-frame. Here's a short URL to share: http://ye.gg/oaf. The applications are powered by WizeHive, a local Philly startup.

For the unaware, OAF is a group started by Jason Calacanis to serve as an alternative to the traditional angel pitch events that either cost startups money or take up too much of their time because of the slow group processes often involved to approve investments. Here's the actual mission statement: 

The Open Angel Forum (OAF) is dedicated to providing entrepreneurs with access to the angel investor community based solely on merit (and without fees). Additionally, we strive to build collaboration between angel investors and to inspire high-net worth individuals to become angels.

Morgan Lewis (a law firm that has a great local startup practice) has been gracious enough to host the event for free. Philly's startup scene is heating up, but we've had great angel investors for a while. We should have 20-30 active, independent Internet/software startup angels in the room.

You do not need to be from or associated with Philly to apply, though I do hope we get some good applications from the area. As you might have guessed from the header of my blog, one of my goals is to help the startup scene flourish here.

If you're a local angel and I haven't talked to you yet about it, please let me know.

Update: deadline for applications is midnight (EST), 2/18.

Update 2: you do not need to be associated with Philly to apply. In fact, we're looking for great startups from all over, especially from under-represented places like MD, NC, VA, etc.

Update 3: check out the list of angels planning on attending.