Duck Duck Go Searches Are Now Externally Anonymous

 

Duck Duck Go searches have been internally anonymous for a while. IP addresses are not stored. Neither are full user agents.[1]

When you search at Duck Duck Go, there is no way for me to know who you are, or tie your searches together. For more info, check out the privacy policy. It's short--206 words; compare that to Google's 2,137 words.

When you search on Google, not only is your info stored, but also when you click on a link, your search terms are passed on to that site via the Referer header. A lot of sites use this information to tailor content and advertising to you specifically. Your searches also show up in analytics tools, which people use for SEO and other tracking purposes. This information leakage creates legitimate privacy concerns.
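To make the leak concrete, here is a minimal Python sketch of what a receiving site could do with the Referer header it gets. The function name is made up for illustration, and it assumes the search engine puts the query in a `q` URL parameter, as Google does.

```python
from urllib.parse import urlsplit, parse_qs


def search_terms_from_referer(referer, param="q"):
    """Return the search terms found in a Referer URL, or None if absent."""
    # The query string is everything after '?' in the referring URL.
    query = urlsplit(referer).query
    # parse_qs decodes percent-escapes and '+' and returns lists of values.
    values = parse_qs(query).get(param)
    return values[0] if values else None


if __name__ == "__main__":
    print(search_terms_from_referer(
        "http://www.google.com/search?q=private+search&hl=en"))
```

Any site, ad network, or analytics script that sees the header can run the equivalent of this on every incoming click.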

If you use the encrypted version of Duck Duck Go, the Referer header is not sent as per the HTTP standard. However, not everyone wants to use the encrypted version because it is slower to initially connect.

As of today, the Referer header is also not sent when you use the normal http version of Duck Duck Go. In other words, your search terms are not leaked to the sites that you click on, regardless of whether you use the encrypted version or not.

Referer headers are sent by your browser (client side) automatically, so I can't control them from my servers.[2] As a result, I'm currently using a meta refresh to force a client-side redirect and, if meta redirects are turned off, a JS location.replace from that redirect page.[3]
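As a rough illustration (not the actual page Duck Duck Go serves), a redirect page of this kind could be generated like this. It is a Python sketch, and the function name and exact markup are assumptions. The meta refresh does the redirect, and the script calls location.replace for clients that ignore it.

```python
import json
from html import escape


def build_redirect_page(target_url):
    """Build a tiny HTML page that sends the browser on to target_url."""
    # Escape the URL for use inside a double-quoted HTML attribute.
    attr_url = escape(target_url, quote=True)
    # Encode the URL as a JS string literal; escaping '<' keeps a stray
    # "</script>" in the URL from closing the script element early.
    js_url = json.dumps(target_url).replace("<", "\\u003c")
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="0; url={attr_url}">'
        "</head><body>"
        f"<script>window.location.replace({js_url});</script>"
        "</body></html>"
    )


if __name__ == "__main__":
    print(build_redirect_page("http://example.com/?a=1&b=2"))
```

Because the intermediate page's own URL does not need to contain the search terms, whatever the browser reports as the referring page no longer reveals them.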

After a lot of testing, I've determined there is negligible slowdown with this process due to the way I've implemented it. (In fact, most search engines make a background request for click tracking already--I don't.) The meta page is served straight from memory via the nginx echo module. As you're already in a keepalive state, this happens essentially instantly.

However, I realize that some people may want to turn this off, e.g. if you want your search term leaked. In conjunction with this release, I've also added a redirect setting to do just that. You can also force it one way or another using URL parameters.

A related issue was brought to my attention a few months ago on reddit, which has to do with the use of images served from Amazon's S3 service. When those images (mainly favicons, but sometimes in the 0-click box) are requested, your browser sends Amazon the Referer header, which includes your search. In response, I made four changes to address this issue.
  1. I added a setting to use POST requests, which solves the issue completely.
  2. I added a setting to disable favicons.
  3. I added a setting to disable the 0-click box.
  4. I started making all calls to S3 over https, so the headers would not be sent in plain text.
There are two issues with these changes. First, they all impact usability. POST requests break the back button and URL copying; you may want to see the images; and https slows things down a bit. Second, a couple of days ago it was pointed out that despite these changes, the Referer header is still sent to Amazon, albeit encrypted, and they could be storing it.

I decided to ask Amazon about their logging policy. They have a setting called Server Access Logging, which I do not have turned on, and so their logging policy in this case was unclear. Apparently they do log even if Server Access Logging is turned off.

All of the information exposed via Server Access Logs is in our internal logs - including referrer strings.

There were a lot of good suggestions on Hacker News on how to address this issue, but they all similarly impacted usability in one way or another.

I have now solved this problem by setting up a reverse proxy between me and S3. This costs me more bandwidth and server resources, but it is worth it to solve the privacy problem for you. Additionally, it actually improves usability because a) I set up a cache on my end and b) I can now turn off https to S3.

Furthermore, it is even more private than simply dropping the Referer string. Since you are no longer making the request on your side, your IP address isn't being sent to them at all. I can also explicitly set the Referer string (using the nginx more headers module), which I set to 'http://duckduckgo.com/'.
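The header handling of such a proxy can be sketched in Python as follows. This is not the nginx configuration I actually run: the bucket hostname, the function name, and the allow-list of forwarded headers are all hypothetical, and only the fixed Referer value comes from this post.

```python
from urllib.request import Request

FIXED_REFERER = "http://duckduckgo.com/"  # value quoted in the post
UPSTREAM = "http://example-bucket.s3.amazonaws.com"  # hypothetical bucket host
# Hypothetical allow-list: only headers needed to fetch and revalidate images.
ALLOWED = {"accept", "if-modified-since", "if-none-match"}


def build_upstream_request(path, client_headers):
    """Build the request the proxy sends to S3 on the user's behalf."""
    # Drop everything not on the allow-list, including the user's own
    # Referer (which carries the search terms) and any cookies.
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() in ALLOWED
    }
    # Replace the Referer with a fixed value that says nothing about the search.
    forwarded["Referer"] = FIXED_REFERER
    return Request(UPSTREAM + path, headers=forwarded)


if __name__ == "__main__":
    req = build_upstream_request(
        "/favicon.png",
        {"Referer": "http://duckduckgo.com/?q=secret", "Accept": "image/png"},
    )
    print(req.full_url, req.header_items())
```

The request to S3 originates from the proxy, so Amazon sees the proxy's IP address and the fixed Referer rather than anything tied to the person searching.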

I welcome feedback on these new processes. As they are new, I'm sure there are bugs to work out and further optimizations to work in. I already have a few in mind myself.


[1] Actually, user agents currently are not stored at all. In the future, however, I may compress user agent strings to short codes, e.g. FF for Firefox. For reference, the current nginx log_format is as follows.

log_format  main  '127.0.0.2 [$time_local] "$request" $status $request_time $body_bytes_sent "$http_referer"';

[2] As I noted, if you click on an http link from an https page, the Referer is not supposed to be sent. However, if you have the server redirect you from an https link to an http link, clients will pass the Referer header through. Annoying!

[3] Note that this is client-side, so if you have a client that doesn't behave, it may not work. I've tested it on most modern browsers, including Chrome, Safari (including iPhone/iPad), Konqueror, Firefox, Avant, Opera, and IE (including IE6).

 

If you have comments, hit me up on Twitter.
I'm the Founder & CEO of DuckDuckGo, the search engine that doesn't track you. I'm also the co-author of Traction, the book that helps you get customer growth. More about me.