November 2010 Archives

Address books and social graphs

Lately I've been noticing a new viral strategy popping up, and I'm not quite sure what to make of it. Here's how it works. You upload your Google contacts to the site, perhaps to find other people you know or as part of some other functionality. (Google works better than other sites because of the way Gmail implicitly adds a contact for everyone you correspond with.)

Then I come to the site through a referral email. Without my entering anything, the landing page can be tuned to my social connections based on where my email address appears in my contacts' address books. That is, the service has saved all the previously uploaded contact lists so it can build a behind-the-scenes social graph to use in converting me.
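To make the mechanics concrete, here is a minimal Python sketch of that behind-the-scenes lookup. It assumes the site keeps every upload as a mapping from the uploading member to the emails in their address book; the function names and the example.com addresses are hypothetical, not taken from any particular site.

```python
from collections import defaultdict


def build_reverse_index(address_books):
    """Map each contact email to the set of members whose uploads contain it."""
    index = defaultdict(set)
    for uploader, contacts in address_books.items():
        for email in contacts:
            # Normalize so "Carol@..." and "carol@..." count as the same person.
            index[email.strip().lower()].add(uploader)
    return index


def known_connections(index, visitor_email):
    """Members who have the visitor's email in an uploaded address book."""
    return sorted(index.get(visitor_email.strip().lower(), set()))


# Hypothetical uploads: member -> emails found in their address book.
address_books = {
    "alice@example.com": ["bob@example.com", "Carol@example.com"],
    "dave@example.com": ["carol@example.com"],
}
index = build_reverse_index(address_books)

# Carol arrives via a referral email; the landing page can already name
# the members who know her, before she has entered anything.
print(known_connections(index, "carol@example.com"))
```

The point is that the visitor never uploads anything: the graph edge comes entirely from other people's uploads.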

Of course when the site is bigger they can use their inherent social graph for most of this logic. But when they're just getting started (or entering new networks), this address book component can really increase their viral coefficient.

This technique isn't definitively bad. You could imagine a big disclosure on the front end (when uploading). On the other side (me coming to the site), it is improving my experience by putting the site into the context of people I know.

But it can get creepy too. Even with seemingly valid disclosures, I doubt people realize that their info will be used outside their own account. Also, it can lead to interesting "people you may know" suggestions, sort of the equivalent on Twitter of people your friends follow that also follow you, but with whom you otherwise have no shared connections. Those recommendations always leave me wondering, how did they make that connection?

Taking that a step further, the site could continually recommend that I invite people who aren't currently on the site, without my uploading my address book. That is, they know this person is not yet a member of the site and they also know their name and email based on previous uploads. So they could just present the person's name to me, get my authorization, and then kick off the referral email on my behalf without my entering anything.
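A rough sketch of how such invite suggestions might be generated is below. The selection rule (suggest non-members who appear in the same uploaded address book as me) is purely my assumption for illustration, as are the function name and the sample data; a real site could pick candidates in many other ways.

```python
def invite_suggestions(address_books, members, me):
    """Non-members who share at least one uploaded address book with `me`.

    address_books: uploader email -> {contact email: contact name}
    members: set of lowercase emails that already have accounts
    The co-occurrence rule is an assumed heuristic, not a known algorithm.
    """
    me = me.lower()
    suggestions = {}
    for contacts in address_books.values():
        emails = {email.lower() for email in contacts}
        if me not in emails:
            continue  # this upload says nothing about my connections
        for email, name in contacts.items():
            email = email.lower()
            if email != me and email not in members:
                # Name and email are already known from someone else's upload.
                suggestions[email] = name
    return suggestions


# Hypothetical data.
address_books = {
    "alice@example.com": {
        "me@example.com": "Me",
        "erin@example.com": "Erin",
        "dave@example.com": "Dave",
    },
    "dave@example.com": {"frank@example.com": "Frank"},
}
members = {"alice@example.com", "dave@example.com", "me@example.com"}

print(invite_suggestions(address_books, members, "me@example.com"))
```

Here Erin gets suggested because she sits next to me in Alice's upload, even though I never gave the site a single contact.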

DuckDuckGo/blekko search partnership


I'm happy to announce DuckDuckGo (the search engine I run) is partnering with blekko (another search engine startup). I'm sure the partnership will evolve over time, but right now we're using some of their auto-firing slashtags and they're using some of our Zero-click Info.

On our end, we're using their technology to improve results in really spammy categories like health. So now if you search for cure for headaches the top links will be from more trustworthy sources. We're also using their APIs to offer sort by date functionality, which has been an oft-requested feature since launch. The former (slashtag stuff) will happen automatically; the latter (date stuff) you can get to through the drop down next to the search box (select Sort by Date) or by putting !date after your query, or sort:date.

On their end, blekko is currently using some of our Zero-click Info to offer snippets on Web sites. For example, if you search for DuckDuckGo on blekko, you can now click 'info' underneath many of the link titles, and get a pop-up with info about that site.

The long-term goal of DuckDuckGo is to get as many people as possible to want to use it as their primary search engine because of a great search user experience. We use a lot of APIs to do that, and now blekko is one of them. They're doing some cool things and I'm glad we can work together on search innovation.

Startup altruism


There is a lot of altruism in the startup community. It's more than just karma, although I believe in startup karma too.

Offer HN was startup altruism. It's people contributing on It's experienced entrepreneurs taking meetings with first-timers just to help them and to open up their networks so that other people can help them as well.

You can try to find the quid pro quo in anything, and if you try hard you can usually come up with some convoluted way to explain nice things away. The whole concept of altruism suffers from this line of reasoning.

But that is largely conspiracy theory, at least in this restricted air space. All the altruism I see really seems like altruism to me. That's the Occam's razor answer and that's what I believe.

This aspect of the startup community is one of the reasons I love to be in it. Yet when you get outside of this small world, it can fade away fast. That reality is disheartening and often jarring, at least for me.

Matt Cutts recently wrote a post entitled What would you do if you were CEO of Google? On that thread was this comment:

I'm sorry Matt, but I think this question is kind of greasy (although I highly doubt that was the intention).

My issue is this:
If someone suggests a truly remarkable idea, something which Google takes and adapts into a full-scale offering, Google (one of the largest companies in the world) is simply poaching this idea from the thread.

While I believe your intent is the true progress of the internet (through Google... and thus Google monetization) you're creating the scenario where you (Google) can profit greatly off someone else's idea, without any commitment or even mention of compensation.

As we all know, a single idea can be worth a large fortune in the online world. It's greasy to take other's ideas, simply because you can, and the people donating the idea might not fully understand the scale or brilliance of what they have come up with.

Now Google isn't a startup. I also happen to hold the contrarian belief that some rare ideas are actually quite valuable. But, even with those caveats in mind, this comment really bothers me.

Maybe it is that I see it occasionally in the startup world too, e.g. people offering ideas and in the same breath mentioning eventual compensation after you execute them. Or maybe it is that Matt is clearly part of the startup community and it feels wrong even though he has Google behind him.

Bottom line though is that type of comment is just not productive. In this particular context, Google can extract massive value (and even do things others can't) because they have Google resources. A random commenter is not likely to be able to get even close to execution on the ideas mentioned on that thread.

That's the rational side. The larger picture is Chris Dixon's notion of builders vs extractors. Offering ideas altruistically gets to the essence of building. Talking about compensation for a few-paragraph comment gets to the essence of extracting.

I guess what really bothers me is the naïveté of it all. If you had invented facebook, you would have invented facebook.

How-to not log personally identifiable information


DuckDuckGo doesn't log personally identifiable information (PII). We simply don't save it.

Sometimes I get asked how to implement this privacy policy. It's pretty simple, but I wanted to explicitly spell it out in hope that others can more easily adopt similar practices.

The basic procedure is to go to everywhere you log stuff, and then drop all the PII where you see it being logged.  This procedure will probably amount to you dropping IP addresses and user agent strings from your Web server logs. For most Web apps, that's often all there is to it.

I use nginx (pronounced engine-x). Here's the default log format for nginx:

    log_format  main  '$remote_addr [$time_local] "$request" '
                      '$status $request_time $body_bytes_sent "$http_referer" '
                      '"$http_user_agent"';

The $remote_addr variable is the IP address and the $http_user_agent variable is the user agent, which can also uniquely identify people. You could just remove them, but that might break other log processing software.

Instead, you can just replace them. Here's what I do:

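        # set and if go in a server (or location) block
        set $user_agent '';
        if ($http_user_agent ~* [\+\(]http) {
          set $user_agent 'Bot';
        }

        # log_format goes in the http block
        log_format  main  '- [$time_local] "$request" '
                          '$status $request_time $body_bytes_sent "$http_referer" '
                          '"$user_agent"';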
These changes have two effects. First, they will print - for everyone's IP address. Second, $http_user_agent becomes the $user_agent variable, which is blank for everyone but bots, which get logged as 'Bot'. I do that so I can exclude Bot traffic from reports. If you really wanted some user agent information you could simplify it to FF for Firefox, etc.
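If you post-process logs or handle requests in application code, the same coarsening can be sketched in Python. This is only an illustration: the bot test mirrors the regex in the nginx config above, while every label other than 'Bot' and 'FF' (and the substrings used to detect browsers) is my own assumption.

```python
import re

# Same idea as the nginx condition: crawlers usually advertise a URL
# in their user agent, written as "+http..." or "(http...".
BOT_PATTERN = re.compile(r"[+(]http", re.IGNORECASE)

# Assumed coarse labels. Order matters: Chrome user agents also contain
# the word "Safari", so Chrome has to be checked first.
COARSE_LABELS = [
    ("Firefox", "FF"),
    ("Chrome", "CH"),
    ("Safari", "SF"),
    ("MSIE", "IE"),
]


def coarse_user_agent(user_agent):
    """Reduce a full user agent string to 'Bot', a short browser label, or ''."""
    if BOT_PATTERN.search(user_agent):
        return "Bot"
    for needle, label in COARSE_LABELS:
        if needle in user_agent:
            return label
    return ""  # unknown agents stay blank, as in the nginx version


print(coarse_user_agent(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
))
print(coarse_user_agent(
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"
))
```

A handful of coarse labels is enough for browser-share reports while being far less identifying than the full string.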

For Apache, it looks pretty similar, i.e.

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

becomes

    LogFormat "- %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"-\"" combined

Then you're going to want to double check your application logs to make sure you're not writing IP addresses to them either. I honestly haven't used a lot of the modern frameworks, so I can't easily say whether this happens by default or not. 
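As one example of what that check could turn into, here is a minimal sketch for an app that uses Python's standard logging module. The filter class name is hypothetical, and the pattern only catches dotted IPv4 addresses (it would also blank out four-part version numbers and does nothing for IPv6), so treat it as a starting point rather than a guarantee.

```python
import logging
import re

# Dotted-quad IPv4 only; IPv6 and other identifiers are not handled here.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class DropIPFilter(logging.Filter):
    """Replace anything that looks like an IPv4 address with '-'."""

    def filter(self, record):
        # Format the message first so addresses passed as arguments are
        # scrubbed too, then clear args so it is not formatted twice.
        record.msg = IPV4.sub("-", record.getMessage())
        record.args = ()
        return True  # keep the record, just without the address


handler = logging.StreamHandler()
handler.addFilter(DropIPFilter())

app_log = logging.getLogger("myapp")  # hypothetical logger name
app_log.addHandler(handler)
app_log.setLevel(logging.INFO)
app_log.propagate = False

app_log.info("search request from %s", "192.168.1.10")
```

Attaching the filter to the handler means every record written through it gets scrubbed, no matter which part of the app logged it.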

Yes, it really could be as easy as changing one line in one file. Note that doing so doesn't prevent you from using Google Analytics. DuckDuckGo doesn't use it, but I think dropping PII from your logs is a step in the right direction regardless of whether you additionally use external analytics software. (I still am able to use awstats to produce reports like this.)

Also note that even if you don't want to commit to this forever, you can still do it today and start logging sometime in the future when the need arises. You don't even have to change your privacy policy as you'll be doing something more private anyway.

If you have some form of accounts, it is obvious that you may necessarily store some PII. However, that doesn't mean you have to store any for the random Web surfer who hits your site.

Help me start a FOSS Tithing movement

A tithe is a voluntary tax, usually paid yearly to a religious organization. I'd like to adopt this concept for free and open source software (FOSS), which in many ways is like a religion.

Please help me start a FOSS tithing movement. I've set up to keep track of company pledges and amounts donated. I also set up a Google group for discussion. 

I'll go first: DuckDuckGo hereby pledges to tithe each year to free and open source software projects. I hope to keep this up for many years to come.

A DDG tithe based on net income would essentially be non-existent this year, because our net income is too. So for this year (and next) I'm basing the pledge on gross revenue instead of net income.

It's still relatively small this year since we haven't made that much revenue yet, but it's non-zero! Actually, it should be about $1,500 based on about $15K in revenue. If you're wondering how we made any money :), here's the breakdown:
  • A few hundred dollars off of context ads from experiments at the beginning of the year.
  • ~$9K from sponsorship banners.
  • ~$5K from Amazon affiliate sales, which is currently the only way we're making money. This happens primarily by using our !amazon (!a) bang.

Nevertheless, our traffic has quadrupled this year, and I of course hope to see significant growth next year. So, in turn, I hope our pledge amount goes up significantly.
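For anyone working out their own pledge, the arithmetic is tiny. The sketch below assumes a 10% rate (which is what $1,500 on about $15K implies) and plugs in $500 as a stand-in for "a few hundred dollars"; both the function name and that $500 figure are illustrative assumptions.

```python
def tithe(base_amount, rate_percent=10):
    """Pledge amount for a given base (gross revenue here, net income later)."""
    return base_amount * rate_percent / 100


# Approximate revenue breakdown from the list above.
revenue = {
    "context_ads": 500,            # assumed value for "a few hundred dollars"
    "sponsorship_banners": 9_000,
    "amazon_affiliate": 5_000,
}

total = sum(revenue.values())
print(total, tithe(total))   # itemized estimate
print(tithe(15_000))         # the rounded figure used in the post
```

Swapping the base from net income to gross revenue only changes what you pass in, which is why other companies can easily set their own requirements.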

In terms of allocation, I'd like to split it 50/50 with half directed by the company and half directed by the DDG user community. For the company side, I'd like to donate to projects that DDG actually uses extensively, e.g. nginx, FreeBSD, PostgreSQL, Perl, etc., perhaps to specific sub-projects like finally bringing FreeBSD to AWS. For the user side, it's wide open, i.e. up to you. Feel free to nominate projects or offer other allocation ideas.

I'm really hoping more companies will join me. Of course they'd be free to make their own requirements and allocation decisions, e.g. 10% of employee time vs income or allocations towards open source bounties, etc. I'll report back with pledge and donation amounts after the end of the year.

Code icebergs


A lot of good products have features that appear somewhat trivial to replicate, but in reality would be quite complex to do so. I call these features code icebergs because they expose what a casual observer or competitor imagines is a weekend hackathon, but underneath there is a humongous mass of necessarily complicated code that makes everything work as seamlessly as it appears.

In my experience, the iceberg part of a code iceberg often involves handling a lot of edge cases. These edge cases are sometimes actually created by making the user interface simpler, e.g. fewer or free-form input fields.

At my current startup, DuckDuckGo, a good example is the seemingly straightforward task of taking Wikipedia and turning it into good Zero-click Info to display against queries.  At first blush it's trivial--I mean come on, the Wikipedia dumps output something called abstract.xml with a description of "extracted abstracts for Yahoo."

Yet when you get into it and start exposing it to real users, you surface all those edge cases. That dump in particular is actually completely unusable IMHO and I ended up discarding it within a few days of discovering it. It chokes on lots of things. 

Wikipedia has templates, disambiguation pages, initial warnings and infoboxes, redirects, malformed/complicated sentences, etc. etc., all of which you want to deal with if you don't want glaring errors. And then once you're in there, you might as well start capturing more good stuff like related topics, categories, the right images, good external links, etc. etc. And what about updating it in real time? It starts to really add up.
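To give a flavor of just the tip of this iceberg, here is a toy Python sketch that handles three of those cases: redirects, (nested) templates, and link markup. It is not DuckDuckGo's code; the function names and the sample wikitext are made up, and the naive sentence splitter would trip over abbreviations like "U.S.", which is exactly the kind of edge case that keeps piling up.

```python
import re

REDIRECT = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)
# [[target|label]] -> label, [[target]] -> target
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]")


def strip_templates(text):
    """Remove {{...}} templates, including nested ones, by tracking depth."""
    out, depth, i = [], 0, 0
    while i < len(text):
        if text.startswith("{{", i):
            depth += 1
            i += 2
        elif text.startswith("}}", i) and depth:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


def abstract(wikitext):
    """Return ('redirect', target) or ('abstract', first sentence)."""
    match = REDIRECT.match(wikitext)
    if match:
        return ("redirect", match.group(1).strip())
    text = strip_templates(wikitext)
    text = LINK.sub(r"\1", text)
    text = text.replace("'''", "").replace("''", "")  # bold / italic markup
    text = " ".join(text.split())                     # collapse whitespace
    first_sentence = re.split(r"(?<=\.)\s", text, maxsplit=1)[0]
    return ("abstract", first_sentence)


sample = (
    "{{Infobox thing|name={{nowrap|Example}}}}\n"
    "'''Example''' is a [[placeholder name|placeholder]]. It has more text."
)
print(abstract(sample))
print(abstract("#REDIRECT [[Example]]"))
```

Even this much ignores disambiguation pages, warning boxes, references, tables and malformed markup, so the weekend-hackathon estimate falls apart quickly.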

I like code icebergs. They're really a marvel to look at when you can see the whole picture. They also lure competitors in, who often get sunk (at least initially) not understanding the scope of the problem. They're good barriers to entry, fuel in build vs buy decisions, and the underpinnings of good UX.

Update: good comments on HN.

Are you in a startup career path or are you one and done?


Sometimes people's first startups are successful. More power to them. I've been pretty lucky, but not that lucky.

I had a bunch of startup failures before success. But that was OK because I was in a startup career path, which enabled me to think a bit more long-term.

In a startup career path, failure becomes experience for the next startup, which unfortunately will probably also fail. Repeat until success. And then repeat some more.

Of course many people never get a hit, and that is the harsh reality of the startup career. However, if you approach your startup life from the one and done mentality, I think your chances are much lower, if for no other reason than you have fewer chances.

Yet there is more to thinking of startups as a career than rationalizing failure. If you consider you're going to be doing this line of work for 20 years or more, it makes sense to invest in skills and relationships that may pay off later.

If you don't know much tech stuff, maybe you should take the time to learn some now. If you don't know any investors or members of the startup press corps, maybe you should get out there and start meeting them. In the one and done mentality, by contrast, you can't be distracted by that long-term stuff because this is your only chance and every hour needs to be spent on the critical path.

There are dangers in thinking this way though. For one, it becomes easier to give up and move on to the next project because you know there will be next projects. I've made that mistake. 

You can also get too distracted by learning new stuff and never actually get anything done. Luckily, I haven't made that one. You have to strike a good balance between critical path and long-term stuff. I try to use the long-term stuff as fuel to avoid burnout.

Perhaps most importantly though, thinking about startups as a career makes it easier to really commit. It's too easy to half-ass it if you are going to do one and be done with it. There are just too many fall-backs, and you can fall into traps that kill your startup from the inside.

If you're on the startup career path though, this is it. You become a real member of the small startup community, and are, at least in my book, immediately respected for drawing that line in the sand.

Update: some good comments on HN.