June 2011 Archives

The real Filter Bubble debate

 

If you're not familiar with the concept of The Filter Bubble, check out this infographic we put out a few days ago about how it applies to search engines.

Currently, much of the debate seems to be whether segregating results based on personal information is good or not. I think that is the wrong debate and a false dichotomy. It can be good, e.g. isolating movie listings based on your zip code. And it can be bad, e.g. limiting the display of certain political viewpoints based on your search and click history. 

From my perspective, the real debate is over a) which personal signals should be used; b) what controls we (as users) should have over how our personal signals are used; and c) how results that arise from the use of our personal signals should be presented. Personal signals are fundamentally different from other signals because as soon as you introduce them, different people start getting different results.

The central point of the Filter Bubble argument is that showing different people different results has consequences. By definition, you are segregating, grouping and then promoting results based on personal information, which necessitates less diversity in the result set since other results have to get demoted in the process. Of course you can introduce counter-measures to increase diversity, but that is just mitigating the degree to which it is happening. Consequences that follow from less diversity are things like increasing partisanship and decreasing exposure to alternative viewpoints. 

My view is that when it comes to search engines in particular, the use of personal information should be as explicit and transparent as possible, with active user involvement in creating their profiles and fine-grained control over how they are used.  Personalization is not a black and white feature. It doesn't have to be on or off. It isn't even one-dimensional.  At a minimum users should know which factors are being used and at best they should be able to choose which factors are being used, to what degree and in what contexts.
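To make "which factors, to what degree and in what contexts" concrete, here is a minimal sketch of what such user-controlled settings might look like. Everything in it is hypothetical: the class names, the signal names (`zip_code`, `click_history`) and the context labels are made up for illustration and do not describe how any search engine actually stores preferences.

```python
from dataclasses import dataclass, field


@dataclass
class SignalPreference:
    """User's choice for one personal signal (hypothetical structure)."""
    enabled: bool = False                        # off unless the user opts in
    weight: float = 1.0                          # degree of influence, 0.0 to 1.0
    contexts: set = field(default_factory=set)   # contexts where it may be used


@dataclass
class PersonalizationProfile:
    """All of a user's explicit signal choices, keyed by signal name."""
    signals: dict = field(default_factory=dict)

    def weight_for(self, signal: str, context: str) -> float:
        """Return how strongly a signal may influence results in a context."""
        pref = self.signals.get(signal)
        if pref is None or not pref.enabled or context not in pref.contexts:
            return 0.0
        return pref.weight

    def explain(self, context: str) -> dict:
        """List the signals (and weights) that are active in a context."""
        active = {}
        for name in self.signals:
            weight = self.weight_for(name, context)
            if weight > 0:
                active[name] = weight
        return active


# Example: zip code is allowed for local listings, click history is never used.
profile = PersonalizationProfile(signals={
    "zip_code": SignalPreference(enabled=True, weight=1.0,
                                 contexts={"movies", "restaurants"}),
    "click_history": SignalPreference(enabled=False),
})
```

The `explain` method is the transparency half of the idea: it lets the user see which signals affected a given type of search, while `weight_for` is the control half.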

If you do not do that, and instead rely on implicit inference from passive data collection (searches, clicks, etc.), then the search engine is just left to "guess" at your personal profile. And that's why the examples from The Filter Bubble seem creepy to a lot of people. It seems like the search engine algorithm has inferred political affiliation, job, etc. without being explicitly told by the user.

This is not a conspiracy to segregate people and I'm the farthest from a conspiracy theorist you'll probably find. It's just a natural consequence of algorithms that cluster people. 

The questions then become 1) are they clustering me correctly; and 2) even if they are, do I want the fact that I belong to this cluster to influence my results for this particular search or type of search?

Some people may want restaurant recommendations based on the implicit guess of their race and income class. Whether you care about that sort of thing largely determines where you come down on the debate. If you don't care at all, then you probably don't care if you ever know that your results are different from other people's and how they differ based on your personal information.

On the other hand, if you do care, then you might want to know how and why a result based on your personal information got in front of you. You might also want to have much more fine-grained control over how particular personal signals are used (akin to privacy settings).

In other words, some people prefer to self-segregate and are interested in any and all forms of "personalization." And some people would prefer that segregation not occur without explicit user choice.

Please note I'm not disputing that showing people different results may result in "better results" for people. I agree that there is no universal best result for all queries. 

What I'm saying is that you can get to that better result for a particular person in a number of ways. And I think that when it comes to search engines in particular, personal signals should be dealt with delicately and with active engagement from the user. Implicit inference and explicit user involvement are two paths to the same thing, but the latter involves vastly more user choice and control.

How do you act on all that product feedback?

 

Over the past few years, I've become totally convinced that being as close to your users as possible (via minimum viable products, feedback mechanisms, etc.) greatly increases your chances for product success.

However, once you have a decent-sized user base you immediately run into the problem of what to do with all that feedback. The suggestions you're getting quickly outpace your ability to act on them, and clearly you shouldn't act on all of them anyway.

I don't have the silver bullet answer, of course, but I'd like to suggest that the Get Satisfaction model of counting up the suggestions with the most votes and acting on them in that order may not be your best move in all cases. Consider the following:

  • Really small changes/tweaks can make a big difference. At my last company we ran significance tests on single word copy changes, and some led to dramatic percentage increases in funnel outcomes.  

  • Small changes that make a big difference are often non-obvious/unintuitive. That is, many iterations are often required (hill climbing) before you really nail something.

  • Adding features can lead to interface clutter. 

  • Product polish has non-linear effects. If you absolutely nail a product experience (polish) it can go word-of-mouth viral (induce non-linear effects).
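As an illustration of the significance testing mentioned in the first bullet, here is a minimal sketch of a two-proportion z-test comparing the conversion rates of two copy variants. The post does not say which test was actually used, so the choice of test is an assumption, and the function name and the sample numbers are invented for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and visitors for the original copy.
    conv_b / n_b: conversions and visitors for the changed copy.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 200 of 5000 visitors convert on the old copy (4.0%),
# 260 of 5000 convert on the new copy (5.2%).
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be noise, which is what lets you trust that a single-word change really moved the funnel.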

In other words, you are often faced with a real choice given your resource constraints: you can go in the direction of being a jack of all trades, master of none (delivering many minimum viable features people are asking for); or, you can go in the direction of the master of one (or few). 

Both are certainly viable strategies and have led to many successes. And I suspect that there are certain situations where one or the other makes more sense.

However, my hunch is that many startups fall into the former category (jack of all trades) almost accidentally because they don't have the will/vision/stubbornness/whatever to buckle down and do the latter. That is, they are not making an explicit choice, which may ultimately not be in their best interest.

The reason is that delivering features people ask for is the path of least resistance. Not delivering them requires you to essentially ignore (or at least gracefully put off) huge obvious feature requests and focus diligently on stuff that seems much smaller, and to the untrained eye, perhaps trivial.

And that's the key. Are these small things really trivial or are they part of a larger product vision where you end up with a truly polished product? It's often hard to tell, and sometimes really a probabilistic bet. You really never know if you can nail a product experience until you do.

It's a counter-intuitive strategy. It often involves working on features that no one even notices but that make their experience smoother, or on a series of "advanced" features that only 5% of your users will use, but a different 5% for each feature (meaning that almost everyone adopting the product has a smooth experience).

It's also counter-intuitive because it seems harder to defend from other companies. You're not adding more features to a feature chart. But what's not easily understood is that your small changes are actually hard to copy, because you've made a ton of small decisions that others won't implement the same way, and so the copy-cat will end up with very different funnel results.

Bringing this all back to product feedback, you shouldn't ignore that one-off request/comment just because it is one-off. It may be a real piece of the puzzle.

On weird botnet traffic

 
Botnets keep sending DuckDuckGo weirder and weirder traffic, and frankly I don't get it. For a while now I've seen a lot of requests like these:


I suppose those forms make some sense. I presume they are looking for sites running exploitable software, and so they set up automated queries to search engines to find new sites. 

However, what doesn't make sense is sending the same query hundreds of thousands of times a day from each machine. Someone presumably took the time to carefully construct these queries, given that they generally appear to be in the right form. And yet the queries return the same results a tenth of a second later, so why would you keep repeating them? A computer will pop up and just start hammering on the query. If I unblock it even days later, it is still doing it.

But that's not the weirdest behavior. For the past several weeks I've been getting tons of the exact same request:


These requests come in slower per machine but from a much greater number of machines. I honestly don't understand the point of them at all. Does anyone out there?

As you may know, DuckDuckGo does not save IPs (here's how). So if you're wondering how we go about blocking them, it happens all at the firewall level, which is dissociated from query data. If we didn't block the most egregious botnet machines and abusers, our machines would almost instantly be under water.

This discussion now makes me wonder if other search engines include this errant traffic in their query counts. We work hard to keep these requests completely out because they would overwhelm our real direct query numbers and therefore distort our perception of progress. We also separate out API requests for the same reason, which now also makes me wonder whether everyone else is doing that too.

On anonymous feedback

 
I just received another angry accusatory email that unfortunately I can do nothing about because it was anonymous. I'd love to get to the bottom of the accusation and correct any bugs if they exist, but I can't because there was not enough information provided to do so.

I get a few of these a week, and a lot more (non-angry, non-accusatory) anonymous feedback via the DuckDuckGo feedback page and on the DuckDuckGo forum. We accept anonymous feedback both to maximize the amount of feedback we're receiving and also because it is in the spirit of our privacy policy.

For general suggestions and minor bugs, e.g. spelling corrections, anonymous feedback is fine. The problem is that for anything complex we generally need to correspond with the poster to figure out what is happening.

Many bugs involve the specifics of your browser, such as what extensions you have installed, what DuckDuckGo settings you have turned on/off and in many cases what query you performed. I've lost count of how many anonymous emails I've received where I don't have enough information to do anything worthwhile in response.

I understand the need to vent and that you may not want to associate your identity with your tone, or otherwise reveal your identity at all. However, if you're taking the time to write up this problem, presumably you want it resolved (maybe not). Well, if you do, I need a way to contact you, which could be a throw-away email account.

I would suggest providing as much specificity as possible in the original message, but there always seems to be something missing. And I usually really would like to reply in any case, if only to let you know when it is fixed or provide you with an extended explanation.

About

I'm the founder of DuckDuckGo and an angel investor.