
Google Web spam

 
Yesterday, @mims wrote this post on "content-mills," which prompted this discussion on HN about Web spam. Many of the comments are by moultano, who is on Google's search quality team. This particular comment really drew my attention:

I doubt you'll find MFA spam to be better on DDG than on Google, but please, if you see a query where they are beating us. Send it over. :) I can guarantee you that I'll get a lot of eyes looking at it.

At DDG, I mainly crawl looking for these types of spam domains. On my last crawl, I identified about 37.8M domains as spam in the com/net/org/biz/info/us TLDs. I found Web sites at another 61.3M domains; the rest timed out. So roughly 40% (37.8M of 99.1M) of the domains where I found a site were spam.

I just took a random sample of those spam domains and checked them against Google's index. All of this code, as well as the sample and results, is now on GitHub.

First I checked against Google's Web site directly, using the site: syntax, but their bot detection quickly shut me down; I was able to check 589 domains before that happened. The results are here. The second column is the # of results reported in the index. For example, you can verify the first one with this query.

Of those I checked, 302 (51%) came up with at least one result, i.e. they are in Google's index in some form. Extrapolating, that means roughly half of my spam domains are in Google's index, or about 19M domains.

Once I was shut off, I moved to Google's search API to process the full 10K sample. Interestingly, though, the API apparently returns very different results. For example, compare the Web and API results for the same query: the Web shows 1 result, whereas the API shows none.

Weird. I carried on anyway. Of the full 10K sample, I found 719 (7%) in Google's API index. Extrapolated to the full list, that would be ~3M spam domains in the index.
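The extrapolations above are simple proportions: the share of a sample found in the index, scaled to the 37.8M spam domains. Here is a small sketch that reproduces them (`extrapolate` is an illustrative name, not part of the published code). It treats each sample as representative and ignores sampling error, as well as the fact that site: queries and the API disagree.

```perl
use strict;
use warnings;

# Scale a sample hit rate up to a population.
# Returns (fraction of the sample that hit, estimated count in the population).
sub extrapolate {
    my ($hits, $sampled, $population) = @_;
    die "empty sample\n" unless $sampled > 0;
    my $fraction = $hits / $sampled;
    return ($fraction, $fraction * $population);
}

my $spam = 37.8e6;    # domains flagged as spam on the last crawl
my $ok   = 61.3e6;    # other domains where a Web site was found

printf "spam share of domains with sites: %.1f%%\n",
    100 * $spam / ($spam + $ok);

for my $check (['site: queries', 302, 589], ['search API', 719, 10_000]) {
    my ($label, $hits, $n) = @$check;
    my ($f, $est) = extrapolate($hits, $n, $spam);
    printf "%-13s %4d/%-6d = %4.1f%% -> ~%.1fM spam domains indexed\n",
        $label, $hits, $n, 100 * $f, $est / 1e6;
}
```

Running it prints 38.1%, then 51.3% (~19.4M) and 7.2% (~2.7M), which match the rounded figures in the post.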

In any case, these #s are pretty conservative estimates because a) I'm only covering about half the domain space (missing all the country TLDs except .us), and b) I know I still have a lot of false negatives (please send me them when you see them).

On the other side, the way I do the identification, there are minimal false positives at the time of identification. However, sites flip between spam and non-spam all the time, and since a crawl takes me a while, there are certainly a few false positives in there.

There are also genuine false positives, and if you see those, please report them as well. I did nothing to hide those from view here, so you can see for yourself in the results.

Of course, this says nothing about how often these domains appear in the rankings. I tried to find a modern equivalent of Metaspy to get some random queries, but I couldn't find such a service. Nevertheless, about half of the spam domains are not in the index, which raises the question: why the difference?

If people have lots of links from Google results saved, I'd be happy to run them against my list.

Weird eHow Web spam

 
fnboelwein.com redirects to http://www.ehow.com/apply-card-credit-online/. So does bankofelgin.com.

If you actually go to the link, you get this message at the top:

Hi There! bankofelgin.com isn't available, but you're still in a good place -- ehow.com. We think we might have what you're looking for.

Doubt it.

Both domains have 64.74.223.39 as an A record, which is different from the redirect IP. And both have proxied WHOIS records. However, they all have the same Server headers:

Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET

So I'm inclined to believe these domains are actually powered by eHow, which isn't too surprising since eHow is owned by Demand Media.
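To run the same header comparison over many more domains, something like the following could work. It is a sketch, not the tooling behind these observations: it uses Perl's core HTTP::Tiny to fetch each domain without following redirects, then groups domains by their Server and X-Powered-By headers. `fetch_headers`, `fingerprint` and `group_by_fingerprint` are hypothetical names. IIS 6.0 plus ASP.NET is a common stack, so a shared fingerprint is a lead to check further (A records, WHOIS), not proof of common ownership.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Fetch a domain's front page WITHOUT following redirects and return its
# response headers (HTTP::Tiny lower-cases header names).
sub fetch_headers {
    my ($domain) = @_;
    my $http = HTTP::Tiny->new(max_redirect => 0, timeout => 10);
    my $res  = $http->get("http://$domain/");
    return $res->{headers} || {};
}

# A crude fingerprint of the serving stack: Server + X-Powered-By.
# A repeated header arrives as an array ref, so join its values.
sub fingerprint {
    my ($headers) = @_;
    my @parts;
    for my $name (qw(server x-powered-by)) {
        my $v = $headers->{$name};
        $v = join(', ', @$v) if ref $v eq 'ARRAY';
        push @parts, defined $v ? $v : '-';
    }
    return join ' | ', @parts;
}

# Group domains whose headers produce the same fingerprint.
sub group_by_fingerprint {
    my ($headers_by_domain) = @_;
    my %groups;
    for my $domain (sort keys %$headers_by_domain) {
        my $fp = fingerprint($headers_by_domain->{$domain});
        push @{ $groups{$fp} }, $domain;
    }
    return \%groups;
}

# Usage: perl fingerprint.pl fnboelwein.com bankofelgin.com www.ehow.com
if (@ARGV) {
    my %headers = map { $_ => fetch_headers($_) } @ARGV;
    my $groups  = group_by_fingerprint(\%headers);
    print "$_: @{ $groups->{$_} }\n" for sort keys %$groups;
}
```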

I wonder how many of these domains are out there? I found these two because they happen to both be part of my spam/parked domains training set.

I'm starting crawl #18 to detect and weed out such domains from DuckDuckGo. Each time I start a crawl I make sure I have no existing false positives or negatives in my training set.

Interestingly, every time, a lot of domains flip from parked to unparked and vice versa. These two fell into the false negative category since I wasn't labeling these pages as spam. Maybe I should...

Top linked domains from Facebook pages

 
I've been messing around with Facebook pages for an upcoming DuckDuckGo integration, and I came across some data that seemed interesting enough to share. These are the top linked domains from Facebook pages.

  1. myspace.com (269588)
  2. twitter.com (97669)
  3. youtube.com (54238)
  4. facebook.com (50234)
  5. flickr.com (12541)
  6. en.wikipedia.org (10578)
  7. reverbnation.com (10144)
  8. fotolog.com (7840)
  9. imdb.com (5250)
  10. purevolume.com (5217)
  11. last.fm (3740)
  12. linkedin.com (3730)
  13. soundcloud.com (3713)
  14. ilike.com (2652)
  15. cdbaby.com (2377)
  16. apps.facebook.com (2213)
  17. it.wikipedia.org (2019)
  18. sites.google.com (1984)
  19. etsy.com (1977)
  20. vimeo.com (1941)
  21. bebo.com (1939)
  22. sonicbids.com (1923)
  23. modelmayhem.com (1780)
  24. profile.myspace.com (1692)
  25. wix.com (1633)
  26. amazon.com (1613)
  27. soundclick.com (1401)
  28. tinyurl.com (1364)
  29. fr.wikipedia.org (1348)
  30. cafepress.com (1304)
  31. bandzone.cz (1279)
  32. freewebs.com (1264)
  33. es.wikipedia.org (1236)
  34. google.com (1195)
  35. itunes.apple.com (1113)
  36. zazzle.com (1004)
  37. dailymotion.com (996)
  38. friendster.com (982)
  39. imeem.com (901)
  40. bit.ly (895)
  41. profiles.friendster.com (893)
  42. new.facebook.com (872)
  43. virb.com (728)
  44. yelp.com (707)
  45. groups.yahoo.com (686)
  46. picasaweb.google.com (673)
  47. web.me.com (669)
  48. metroflog.com (657)
  49. geocities.com (633)
  50. bbc.co.uk (584)

Specifically, this list aggregates domains extracted from links within the 'Website' sections of Facebook pages. For example, on the DuckDuckGo Facebook page there is a link to the home page (duckduckgo.com) and to the DDG Twitter stream (twitter.com) within that section. Each of those domains would get one point in the aggregated list. If duckduckgo.com had appeared twice, it would still get just one point.
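The one-point-per-page counting rule can be sketched as follows. This assumes each page's 'Website' section has already been parsed into a list of URLs; `domain_of`, `count_domains` and `top_domains` are illustrative names, and stripping a leading "www." is an assumption based on the lists showing bare domains (myspace.com) alongside real subdomains (profile.myspace.com).

```perl
use strict;
use warnings;

# Lower-cased host of a URL, with a leading "www." dropped (an assumption:
# the lists show "myspace.com" but keep subdomains like
# "profile.myspace.com"). Returns undef if no host can be found.
sub domain_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^\s*(?:[a-z][a-z0-9+.-]*://)?([^/?#:\s]+)}i
        or return undef;
    $host = lc $host;
    $host =~ s/^www\.//;
    return $host;
}

# Each page contributes at most one point per domain, no matter how
# many of its 'Website' links point there.
sub count_domains {
    my @pages = @_;    # each element: array ref of one page's URLs
    my %count;
    for my $urls (@pages) {
        my %seen;
        for my $url (@$urls) {
            my $domain = domain_of($url);
            next unless defined $domain;
            $count{$domain}++ unless $seen{$domain}++;
        }
    }
    return \%count;
}

# Top $n domains by count, ties broken alphabetically.
sub top_domains {
    my ($count, $n) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
    return @sorted > $n ? @sorted[0 .. $n - 1] : @sorted;
}
```

With the DuckDuckGo example, a page linking to http://duckduckgo.com and http://www.duckduckgo.com/about still adds only one point for duckduckgo.com.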

Of course, real people took the time to link to these domains to promote their online Web presences, so it was interesting to me what they chose in the aggregate. This data confirms the anecdotal pattern I keep seeing of people promoting their FB and Twitter accounts together. I was also intrigued by how high MySpace was; I suppose a lot of bands still use it and/or haven't updated their old FB pages.

There were a few sites I actually hadn't heard of, e.g. some of the music stuff, wix, modelmayhem & virb. Not that I should hear of every site, but these must have a lot of traction already to be that high in these lists. 

If you just look at "high quality" FB pages (custom urls, no default images, etc.), you get a similar but slightly different list & ordering.

  1. twitter.com (32597)
  2. myspace.com (28125)
  3. youtube.com (11511)
  4. facebook.com (11007)
  5. flickr.com (3225)
  6. reverbnation.com (1413)
  7. linkedin.com (1267)
  8. en.wikipedia.org (1016)
  9. ilike.com (787)
  10. last.fm (763)
  11. purevolume.com (685)
  12. soundcloud.com (681)
  13. vimeo.com (608)
  14. imdb.com (483)
  15. apps.facebook.com (411)
  16. bebo.com (389)
  17. cdbaby.com (376)
  18. sonicbids.com (372)
  19. bit.ly (337)
  20. tinyurl.com (279)
  21. itunes.apple.com (273)
  22. google.com (264)
  23. fotolog.com (263)
  24. friendfeed.com (259)
  25. nscs.org (242)
  26. imeem.com (241)
  27. etsy.com (212)
  28. it.wikipedia.org (209)
  29. modelmayhem.com (200)
  30. itunes.com (188)
  31. amazon.com (178)
  32. cafepress.com (171)
  33. delicious.com (154)
  34. yelp.com (153)
  35. zazzle.com (153)
  36. dailymotion.com (146)
  37. virb.com (133)
  38. ustream.tv (111)
  39. soundclick.com (100)
  40. bbc.co.uk (99)
  41. legacyrecordings.com (97)
  42. friendster.com (93)
  43. blogtalkradio.com (92)
  44. digg.com (92)
  45. formspring.me (90)
  46. picasaweb.google.com (90)
  47. lululemon.com (90)
  48. woodstock.com (86)
  49. groups.yahoo.com (86)
  50. de.wikipedia.org (86)

Here are the top types (counted for pages with at least some info on them).

  1. Musician (421154)
  2. Other Business (417494)
  3. Other Public Figure (232868)
  4. Professional Service (140365)
  5. Non-Profit (129352)
  6. Website (106490)
  7. Products (95604)
  8. Education (76957)
  9. Store (64733)
  10. Visual Artist (61575)
  11. Club (61043)
  12. Restaurant (56447)
  13. Health and Beauty (51983)
  14. Sports / Athletics (49001)
  15. Fashion (46189)
  16. Food and Beverage (43737)
  17. Communications (36180)
  18. Athlete (30520)
  19. Religious Center (29811)
  20. Technology Product / Service (28728)
  21. Actor (28101)
  22. Hotel / Lodging (27358)
  23. Sports Team (27322)
  24. Online Store (26261)
  25. Film (25190)
  26. Religious Organization (25093)
  27. Writer (25082)
  28. Bar (24957)
  29. Politician (24139)
  30. Consumer Product (22421)
  31. Comedian (22359)
  32. Real Estate (22296)
  33. Technology and Telecommunications Service (21176)
  34. Model (20944)
  35. Event Planning Service (20671)
  36. TV Show (18739)
  37. Museum / Attraction (17960)
  38. Game (16729)
  39. Travel (16097)
  40. Pets (15511)
  41. Retail (15123)
  42. Travel Service (15018)
  43. Automotive (14530)
  44. Cafe (14237)
  45. Government (13601)
  46. Medical Service (12909)
  47. Automotive Dealer / Vehicle Service (9729)
  48. Home Living (9227)
  49. Home Service (8247)
  50. Library / Public Building (7086)

Note that this crawl was completed before the whole Open Graph/Like thing, so these were all "real" pages. I'm currently crawling all the new stuff and working on ways to "keep it real," so to speak.

My Gmail is fast again

 
After my super-slow Gmail post was picked up on HN and by the NYT, Google reached out to me. I gave them my username, and 39 hours later my account was back to normal.

I got approval from the person who communicated with me to share the following snippet of our conversation.

"The team is still looking into your account slowness, but it initially appears that the problem is isolated to a small subset of Gmail users...They are still investigating the root cause of the slowness but in the meantime have moved your account to a different set of servers, which should help."

Gmail has become unusably slow

 
When I switched to Gmail in 2004, I believed the hype. Never delete a message again--no need. We have tons of space, and you can search it all really fast like Google.

That time has passed. Gmail has gotten slower and slower for me, and as of the last few weeks it has become unusably slow. Before you ask, yes, I've tried it across lots of browsers and computers.

It can take 20 seconds to switch labels, and even longer to search for something. But here's the worst part--it takes just as long to send a simple message!?! Why? What does sending have to do with anything?

It's become the bottleneck in my day, and I don't know what to do about it. And I'm not alone.

A few days ago I decided to start taking action. First I emailed support. OK, first, I tried to email support. 

Have you ever tried to email Google support? It's almost impossible to find the contact form. Here's the support home page. I dare you to find out where to report this slowness issue.

You get to this page on slowness. After going through the wizard, you click on 'report your issue' at the bottom, and it takes you here. Wait, that's not a contact form, and you can't get to one from that page! Anyway, here is a contact form; I found it going through another problem wizard.

Needless to say, I haven't heard a response :)

Next step: I disabled chat and Buzz and tried the older versions of Gmail. No luck. Then I disabled all labs, after which I perceived a very modest improvement, but it was still unusable.

Next I removed most of my labels. I have four now (down from 32). This seemed to help a bit as well, but still not much.

So this morning I went drastic. I deleted all my contacts and started deleting mail. Ridiculous, huh? That totally breaks the original selling point of Gmail, but like I said, I'm at my wits' end here.

Deleting stuff has resulted in the biggest improvement so far, but it's still slow. Perhaps a bit better than unusable now, but still terrible.

You are currently using 4247 MB (56%) of your 7459 MB.

In a last-ditch effort, I bought some extra storage from Google, thinking maybe I'd get some kind of premium level of service. So far, no.

Google's been recently launching lots of cloud products, most recently a storage product to compete with Amazon's S3.

In other words, they obviously have the resources to make Gmail fast. So what's the deal? They must know about the slowness. The only reasonable explanation is that they are consciously under-resourcing it. Again, why?


Update: there are also a lot of good comments on HN.

Update 2: after a bunch of testing with my account, I'm confident at least my slowness involves something around having more than 4GB of mail. I deleted a lot of messages and got down to 3.6GB. It was then relatively fast again. I then sent myself a 25MB file (the limit) repeatedly until I got back up to 4GB. Right after 4GB, it got slow. Go figure.

Update 3: Google reached out to me and "fixed" my account. Here is what they said.

A FB ad targeted at one person (my wife)

 
[Image: the Facebook ad]
The other day I gave a presentation with Steve Welch on the use of social media in politics. Steve was walking through (live) the process of creating a Facebook ad.

He started targeting the ad by location and interest, and the number of potential people he was reaching began decreasing on screen (Facebook tells you dynamically). Then I got to thinking--could you target an ad at literally one person? 

In theory, it wouldn't seem that difficult, given you can target by a lot of different things: interests, school, location, workplace, etc. If you could actually see their profile it would be super easy, but I don't think you even need that much. With just basic facts about them, e.g. from their LinkedIn profile, you could target an ad narrowly enough to reach essentially just him/her.

So of course the next step was to actually try it.

My wife goes on Facebook a lot to look at pictures and stuff. She's also mentioned many times how she actually likes the ads because they're targeted pretty well at her interests. So I thought she'd be the perfect subject.
[Image: the ad's targeting settings and estimated reach]

First I made the ad (above). Toast is a name we sometimes call our son Eli. Yes, I even messed up grammar in the title of my ad...

Then I started targeting. It proved just as easy as I thought it would be. First I targeted literally just her by using the settings on the right plus her major and gender. Btw, I couldn't get FB to report a number lower than 20 people (although it does say "fewer than").

But when I logged into her account, it wasn't showing on top all the time.

So I increased my CPM bid. But it still wasn't showing on top all the time (though by then it at least showed occasionally). Side note: you can see a bunch of ads by going to the ad board page.

Then I backed off a bit (to the targeting on the right) so I'd be included as well. I logged in and found the ad on my account and "Liked" it as well as clicked on it. My thought was maybe their ad system would then perceive it as a better ad and show it more. This seemed to work, but it is hard to tell whether that really had the effect or not. In any case, it started showing up a lot more. Not all the time, but when it did show up it would stay on top for a number of page views.

Then I waited until that night and subtly prodded her to check out Facebook. We went through old photos of Eli, and the ad was just sitting there on the right on many of the pages, but she didn't notice!

I saw firsthand why CTR is so low on FB. I steered us towards the album with the picture I used for the ad, and there literally was the big version of the picture and then the ad on the right (below).

And then she noticed it. 

She immediately got what was going on; she looked at me and we burst out laughing pretty hard for a while.


[Image: the photo in the album, with the ad alongside it]
This was of course all in good fun, but I also think there could be some good business cases for this technique :)

Twitter RT Test Results

 
Test: I asked @duckduckgo followers to RT this tweet.

[Image: the @duckduckgo tweet]

I also RTd it from @yegg (my personal account) with slightly different text.

[Image: the @yegg retweet]

Hypothesis: I wasn't sure what to expect, but figured I would get a bunch of RTs because my followers seem pretty solid (not spam, auto-follows or other nonsense). After that, I thought maybe I'd get some 2nd-level RTs. I wasn't even holding out hope it would go viral, and of course it didn't.

Results: I tallied up the RTs using Twitter search. The @duckduckgo tweet was RTd by 18 people using Twitter's RT system. The @yegg tweet was RTd by 6 people. Then there were 11 people who RTd it on their own, 4 of whom got RTd 1 time each. This totals 39 RTs.

For the most part, these people aren't spammy either, i.e. their accounts look real, with real followers. Counting them up, they had 4,406 followers (avg 126, min 6, max 408). I threw out one outlier who had 22,150 followers but was also following 24,308.
 
As you can see from the tweet, I linked to a special URL, dukgo.com, that hardly anyone uses, and so I used it to track clicks. All told, 73 clicks. So that's ~2 clicks per RT--not too good.

I also considered RTs/followers. @duckduckgo has 848 followers, so that's 3.4% who RTd it. At the second level you have the 4,406 followers tied to those RTs plus my 911 followers, for 5,317 followers. At 10 RTs, that's about 0.19%. So you can see the drop-off.
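The ratios in this test can be reproduced directly from the counts above. The split into levels is my reading of the post, not something it spells out: the first level is the 18 Twitter-system RTs plus the 11 manual RTs against @duckduckgo's 848 followers, and the second level is the 6 @yegg RTs plus the 4 RTs of manual RTs against 4,406 + 911 followers.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Percentage of an audience that retweeted.
sub rt_rate {
    my ($rts, $audience) = @_;
    return 100 * $rts / $audience;
}

# Counts from the post: Twitter-system RTs of @duckduckgo, of @yegg,
# manual RTs, and RTs of those manual RTs.
my %rts = (duckduckgo => 18, yegg => 6, manual => 11, manual_rts => 4);
my $total_rts = sum(values %rts);    # 39
my $clicks    = 73;                  # clicks on the dukgo.com link

# Grouping into levels is an interpretation of the post, not stated in it.
my $first  = rt_rate($rts{duckduckgo} + $rts{manual}, 848);
my $second = rt_rate($rts{yegg} + $rts{manual_rts}, 4_406 + 911);

printf "clicks per RT: %.2f\n", $clicks / $total_rts;
printf "first level:   %.2f%% of 848 followers\n", $first;
printf "second level:  %.2f%% of %d followers\n", $second, 4_406 + 911;
```

It prints 1.87 clicks per RT, 3.42% at the first level and 0.19% at the second, roughly an 18x drop between levels.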

Here are my takeaways:

  • You need a lot of followers. I'd say you'd need two orders of magnitude more, i.e. 100K real followers, to make this at all worthwhile. Then we're talking on the order of 10K clicks.

  • To go viral, you'd need RTs by important people. I'm really grateful for all the RT support, but no one had a ton of meaningful followers. I think you'd need that celebrity push to get it out there, which may kick off other celebrity RTs.

  • To go viral, you'd probably need more compelling content. Of course this test was business related, but you'd probably need it tied to either more of a fad/news story or have more of a hook, e.g. a super interesting Web page on the other side.

  • Viral coefficient is not 0. There were second level RTs. If the content also lent itself to RTing, i.e. it was a game or something that involved tweeting, you might be able to bring that up and keep the chain going.

I also tried Fiverr, a service where people say what they'll do for $5. I spent $15 on 3 people who seemed legit and said they would retweet to all of their x thousand followers. Only one has done it so far, and that yielded an RT by 1 person (not counted above). So I'm guessing that is not going to be a good advertising channel.

What I installed (and uninstalled) on my new computer

 
I just bought a new desktop. Right before it arrived I was listening to Chris Dixon on Mixergy opine that Skype probably couldn't happen today because people don't trust downloads anymore. Somewhat tangentially, it got me thinking that people probably download less software now as well because of the ascendancy of the cloud, and that I probably wouldn't need to install much software on this new computer.

I'll let you decide whether that was the case or not. I wrote down everything I installed (and uninstalled), in order.

  1. Windows updates. I had to go through a few rounds of rebooting to get them all installed. That's pretty amazing (and annoying) since Windows 7 just came out, but whatever.

  2. Google Chrome. My Web browser of choice (at the moment). I love how it syncs my bookmarks now too, which is one of the main reasons I installed it first.

  3. Adobe Acrobat Reader. First thing I did was check my email and someone sent me a PDF. Really, you can't pre-install a PDF reader? There might be a better one to install, but I just went with what I know.

  4. Adobe Flash Player. Next thing I did was go to a Web site that required Flash...

  5. Skype. I use Skype all the time, especially to enable video chat between my son and my parents.

  6. Vodburner. This is a Skype add-on that I pay for to help me record Skype video chats for my traction interviews. While I was installing Skype I figured I might as well get this set up too.

  7. Nvidia GeForce GTX 260 drivers. I have two 28" HannsG monitors (HG281) with a native resolution of 1920x1200. Yet they were rendering at 1920x1080 and everything was blurry. First it took me a long time to figure out the resolution was wrong. Then it was really annoying to fix because it wouldn't let me set a custom resolution. So I went to the Nvidia site, which sent me to the HP site, and their download said it wasn't compatible with my computer. So I went back to the Nvidia site and found the latest drivers. After installing them, everything worked fine. Side note--now Windows wants to install an "update" of the old drivers. I told it to "hide" that update :).

  8. PuTTY. Once the text was clear, I wanted to check something on my servers. I use PuTTY for that.

  9. Sonos Desktop Controller. This whole time I was listening to Pandora over my Sonos system. A sucky song came on and it was clear it was also too loud. So I installed the controller that lets me control the music in the house from my desktop.

  10. PGP Desktop. I keep my passwords and other important docs on an encrypted virtual drive that this software mounts as a regular drive. I needed my passwords for Facebook, Twitter, etc. (I use random passwords), so this was next.

  11. Adobe Illustrator CS2. I have a folder of software to install that contains this and PartitionMagic, since I purchased both as downloads a long time ago. I saw it next to my PGP folder, so I went ahead and installed it next, as I know I'll need it soon enough.

  12. iTunes. Eli (my son) watches videos through it, and it syncs my iPad and iPod Touch, which I use for development.

  13. Firefox. I debug stuff in Firefox, and got an email about a bug, so I decided to download it next.

  14. ForecastFox, Web Developer, Firebug, YSlow. These are the Firefox add-ons I use regularly. While I was installing Firefox I thought I'd go ahead and add these.

  15. Safari. I use it just for testing. But while I was doing Firefox I thought it would be good to just do now.

  16. Opera. Same story.

  17. Uninstall Norton Stuff. Ugh, I hate this stuff and wish I had an option not to have it pre-installed in the first place.

  18. ClamAV. This is my replacement for the virus part of Norton. For the other parts, I'm fine with the pre-installed Windows Firewall and Windows Defender. I have a Smoothwall setup in my house for more firewall protection.

  19. WinSCP. This is the other piece of software I use to routinely connect with my servers (for transferring files). I needed to transfer an image, so this was next.

  20. QuickBooks 2008. I went down to the basement to get my CDs. This was one of them. I use it to do company accounting.

  21. Adobe Photoshop Elements 5.0. On CD. I use it to do image manipulation. Great deal actually--I've found I've never needed more than this "elements" version.

  22. Picasa. I manage my photos in Picasa. Photoshop made me think of it.

  23. Adobe Premiere Elements 2.0. On CD. I use it to edit video sometimes, though I don't recommend this program. I just don't have a good alternative at the moment.

  24. VMware Server. I use it for development. I have a FreeBSD image that mimics my servers. I wanted to fix some bugs, so this was next.

  25. Uninstall Microsoft Works, Office Home and Student Trial, PowerPoint Viewer, Compatibility Pack for the 2007 Office System. I wanted to install Office 2010 (from BizSpark), but it wouldn't let me install the x64 version without first uninstalling all remnants of 32-bit versions (this stuff). I find this odd since I have an x64 version of Windows--so why would they pre-install 32-bit versions?

  26. CutePDF. While the uninstalling was going on, I needed to PDF something (I save receipts this way).

  27. WinRAR. Someone emailed me a gzipped file, and this is my decompressor of choice.

  28. Microsoft Office 2010. Once that other office crap finished uninstalling, I installed this.

  29. Gmail notifier. Alerts me of new emails.

  30. Gmail notifier https patch. Come on--are you ever going to update the notifier to include this natively? I use https gmail and it doesn't work with notifier without this patch.

  31. VNC. I use this to connect to my desktop from my laptop (usually to get a password). When I went back to my laptop I noticed it was missing :).

I could be an outlier, but that sure seems like a lot of software to me! All in all it was ~20 downloads and 3 CDs. If I weren't a developer at all, I think I'd still have done ~10 of them.

My personal URL shortener, ye.gg

 
To my surprise and delight, I noticed yesterday that the domain ye.gg was available, and I quickly gobbled it up. .gg is the country code for Guernsey, one of the Channel Islands. It wasn't cheap (GBP 88.00, ~$135 USD), but it's worth it to me!

Side note: in this process I found Domainr, which helps you find short domains.

Unlike with GoDaddy et al., it took ~24 hours for the .gg domain to be set up in DNS. So while I was waiting yesterday, I searched for a provider to run my URL shortener. The two providers I found that seemed like they might work were bit.ly Pro and awe.sm, which TechCrunch apparently uses.

I was quickly accepted into the free beta of bit.ly Pro (thanks!), but it has two limitations that prevent me from using it. First, they won't redirect ye.gg/ (with no shortcode) to my Web site. Second, they share the hashspace with everyone else, meaning I can't make ye.gg/1 ye.gg/2 etc. because they're already taken by regular bit.ly users.

Awe.sm looks cool, but they're in closed beta or are charging $99/mo. I'm only going to be making a few short URLs a month (for blog posts), so that price seemed way too steep. I emailed asking if I could get in on the beta, but haven't heard back. I can't blame them for not turning it around in minutes, but I'm itching to get this thing up!

So I decided to roll my own thing for now--the most basic thing I could come up with in a few minutes. Here's what I did.

  • Pointed the DNS to my server that runs this Web site (via DNS Made Easy).

  • Cooked up this small Perl package.
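package yegg;

use strict;
use warnings;
use nginx;

# Return the target URL stored in /usr/local/ye.gg/<shortcode>,
# or 0 if there isn't one (nginx treats "0" as false).
sub is_rewrite {
    my $r = shift;

    # $r->uri includes the leading slash, e.g. "/angel"; strip it.
    my $code = $r->uri || '';
    $code =~ s{^/}{};

    # Only allow simple shortcodes; this also rules out "../" tricks.
    return 0 if $code eq '' || $code =~ /[^0-9A-Za-z_-]/;

    my $rewrite = 0;
    my $file = "/usr/local/ye.gg/$code";
    if (-f $file && open(my $in, '<', $file)) {
        my $line = <$in>;
        close($in);
        if (defined $line) {
            chomp($line);
            $rewrite = $line if length $line;
        }
    }

    return $rewrite;
}

1;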

This is intended to run within nginx (my Web server), using the embedded Perl module. All it does is look for the existence of a file matching the URL in the /usr/local/ye.gg/ directory. If found, it opens the file and returns the URL within it. So if I want to make http://ye.gg/angel work I just create the file '/usr/local/ye.gg/angel' and put 'http://www.gabrielweinberg.com/angel.html' in it.

  • Added this code to nginx conf.
    perl_require "/usr/local/etc/nginx/yegg.pm";
    perl_set  $rewrite  '
sub {
  my $r = shift;
  # is_rewrite() returns 0 when there is no match; nginx treats "" and "0" as false.
  return yegg::is_rewrite($r) || "";
}
';

This just uses the above package and puts its result into the $rewrite variable. So when a request comes in, nginx sets that variable by running the function I defined in the package (is_rewrite).

  • Added more code to my nginx conf.
    server {
        server_name  ye.gg *.ye.gg;

        if ($rewrite) {
          rewrite ^. $rewrite permanent;
        }

        location / {
          rewrite ^(.*) http://www.gabrielweinberg.com permanent;
        }
    }

This says if $rewrite exists (there is a URL to go to), redirect to it. Otherwise, always redirect to my home page.

And that's it--it works! One issue with this setup that I couldn't immediately solve is that it checks for the file's existence on every request, regardless of whether it is a ye.gg request or one for another domain. That is, perl_require and perl_set don't seem to operate within server blocks. Not sure why. Anyway, I'll leave that for another day unless anyone has any insight.
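Creating each file by hand is all this setup needs. For numbered links like ye.gg/1 and ye.gg/2 (the part bit.ly Pro couldn't provide), a small helper could pick the next free number. This is a hypothetical sketch: `add_short_url` is my name, not part of the setup above, and it assumes the same one-URL-per-file layout in /usr/local/ye.gg/. Opening with O_EXCL means it fails rather than overwriting an existing shortcode.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Spec;
use List::Util qw(max);

# Hypothetical helper: save $url under the next unused numeric shortcode
# in $dir (one URL per file, as in /usr/local/ye.gg/) and return the code.
sub add_short_url {
    my ($dir, $url) = @_;
    die "not an http(s) URL: $url\n" unless $url =~ m{\Ahttps?://\S+\z};

    # Existing numeric shortcodes, i.e. files named "1", "2", ...
    opendir(my $dh, $dir) or die "can't read $dir: $!\n";
    my @numeric = grep { /\A[0-9]+\z/ } readdir($dh);
    closedir($dh);
    my $code = 1 + max(0, @numeric);

    # O_EXCL makes the create fail rather than overwrite an existing file.
    my $file = File::Spec->catfile($dir, $code);
    sysopen(my $out, $file, O_WRONLY | O_CREAT | O_EXCL)
        or die "can't create $file: $!\n";
    print {$out} "$url\n";
    close($out) or die "can't close $file: $!\n";
    return $code;
}

# Usage: perl add_short_url.pl /usr/local/ye.gg http://www.gabrielweinberg.com/angel.html
if (@ARGV == 2) {
    my $code = add_short_url(@ARGV);
    print "http://ye.gg/$code\n";
}
```

Named shortcodes like "angel" are simply ignored when picking the next number, so both styles can share the directory.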

Do people subscribe to blogs less now? My blog's #s.

 
Maybe I have rose-colored glasses on, but I remember it being easier to get blog subscribers (a few years ago). Right now I'm getting ~0.5% conversion, extrapolating from these FeedBurner and Google Analytics numbers.

[Images: FeedBurner subscriber graph and Google Analytics visits graph]
That is, 10K visits for a blog post yields about 50 new FeedBurner subscribers. The sharp increase at the beginning of the year correlates to my increased post frequency.

My sense is that more frequent posts not only draw more visitors per unit time, but also keep the blog more present in people's minds, making them more likely to subscribe. From Apr 2008 to Jan 2010 I had 48,097 new visitors, and I've had 82,733 new visitors since Jan 1 of this year, about 1.7x as many. But my FeedBurner #s have more than doubled over that period.
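The rates in these two paragraphs work out as follows, using only the numbers quoted in the post (`conversion_pct` is an illustrative name):

```perl
use strict;
use warnings;

# New subscribers per visit, as a percentage.
sub conversion_pct {
    my ($new_subscribers, $visits) = @_;
    return 100 * $new_subscribers / $visits;
}

# ~50 new FeedBurner subscribers per 10K visits.
my $conversion = conversion_pct(50, 10_000);

# New visitors since Jan 1 vs. Apr 2008 through Jan 2010.
my $visitor_growth = 82_733 / 48_097;

printf "conversion: %.2f%%\n", $conversion;
printf "visitor growth: %.2fx\n", $visitor_growth;
```

That gives 0.50% and about 1.72x, so subscribers (more than 2x) grew faster than new visitors, which is consistent with the post-frequency explanation.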

Here's the data from the past 30 days.

[Images: FeedBurner and Google Analytics graphs for the past 30 days]

What I find interesting is that the major posts did not spike FeedBurner in a similar way; it's still a steady increase. My guess is that to get a major spike, you need someone major recommending your blog in a post like this.

Yet getting on a list like that seems sort of random. I think you have to be out there putting out good content regularly so that when someone does make a list like that, they think of you.

Over the whole period, these posts have been the biggest.

[Image: Google Analytics table of the top posts]
The first column is unique page views. If you sum the %s (taking out the home page), these top 9 posts (out of 107) make up 52%.

Here's where all this traffic comes from.

[Image: Google Analytics traffic sources]

Thank you Hacker News and reddit! Without you, my blog #s would be pretty pathetic.

The Google traffic goes almost entirely to one post I wrote on Skype high-quality video, which seems to capture a lot of people searching about that. I find that a bit odd, since I remember getting a lot more random organic traffic.

With all this in mind, do people subscribe to blogs less now? My hunch is yes and it is due mainly to a few factors.

  1. The rise of social link sharing has removed the most compelling reason to subscribe to blogs, i.e. the worry that you will miss something awesome. The argument is that if it is that awesome, someone will share it with you. I don't think this is quite true, however. As someone who subscribes to a lot of blogs, I find that at least half of the good content I see never shows up on those services.

  2. Remember when RSS readers were hot? Well, now they're not. The business models never really seemed to pan out, and I think that deflated a lot of the interest (and in turn innovation) in the ecosystem. Related to that, they never really broke into the mainstream as a lot of people thought they would.

  3. The Twitter fan relationship. A lot of people seem to opt to follow on Twitter instead of subscribing via RSS, to the extent that some people completely ignore RSS in favor of Twitter. On Twitter, you get more than straight links, so maybe that is part of the appeal. Again, though, I disagree. I often find I just want the posts and don't want to miss anything. It's really easy to get behind on Twitter, and all the UIs make it too easy to just give up on old tweets.
