Recently in Programming Category

Perl submissions for Google Summer of Code

 

DuckDuckGo is mainly written in Perl. If you're a student and looking for something cool to do this summer, consider messing around with Perl. That's how I originally got into it! 

This is a guest post by Mark Keating, who is the Managing Director of Shadowcat Systems Limited and a Director/Secretary of the Enlightened Perl Organisation. He regularly writes on Perl Blog and Work Blog.  

The Perl Foundation (TPF) has once again been accepted as a mentoring organisation for the Google Summer of Code (Official Site), an annual initiative to encourage student participation in open source software and programs.

The efforts on behalf of the Perl Foundation this year are being co-ordinated by Florian Ragwitz (Rafl) and there is a dedicated channel (#soc-help) on irc.perl.org where help and advice can be obtained.

Students can suggest any topic to work on, but for those who wish to try something new or are looking for general guidance, there is a list on the EPO Wiki that outlines suggestions and ideas from a number of Perl projects. Once you have a project idea, speak to the other participants in #soc-help and we will advise you on your application and match you to a mentor.

The program is accepting submissions from students and mentors until Friday 8th April, so now is the time to get involved. The Perl community welcomes students who are new to its ranks alongside those who are already members.

Working around Android's screen.width bug

 
This Android bug has been causing me trouble lately. When fetching a cached version of a page, Android sometimes sets the screen width to that of the previous page instead of the page you are on. This has the annoying side effect that your @media max-device-width CSS is completely ignored.

So when people first go to DuckDuckGo on their Android phone, it formats fine via the media CSS block targeted at mobile devices. Then they click on a link and click back, and all of a sudden it is no longer mobile formatted. Or they bookmark the homepage and then open it again, and the homepage isn't mobile formatted. 

I made a small demo to see the bug in action. Just visit it on your Android phone. It prints the screen.width variable out at the top--should be under 700. Then click the wiki link and wait for Wikipedia to fully load. Then click back to the page (you might have to do it twice to go through their mobile redirect). Now the screen.width says something different. Ugh.

I had been trying to find a workaround off and on for a while, to no avail, until @amonti came up with something that works. You can (currently) just target later versions of the Android browser via CSS through a -webkit-min-device-pixel-ratio:1.5 media block, e.g.

@media only screen and (-webkit-min-device-pixel-ratio:1.5) { /* mobile rules go here */ }

I say currently because presumably other browsers may implement this feature. But in the meantime (while we wait for an Android fix) this workaround actually seems to work.

Usability issues with adding search engines to Web browsers

 

I get a lot of feedback around adding DuckDuckGo (a search engine) to users' Web browsers. I thought I would synthesize that feedback in hope that these usability issues might be addressed.

ieaddons.png
If you have a search engine for your site (and most big sites do), you want people to be able to easily use it in their Web browsers. In Internet Explorer (IE) 7, Microsoft introduced a native JavaScript function called AddSearchProvider that lets users add a search engine to their search bar, and optionally set it as their default engine. The dialog box looks like this.


addtosearchprovider1.png

That means that you can put a link on your site that users can click on, which will pop the above dialog. And the dialog itself is pretty simple and straightforward. If I were to really nitpick, I'd say just delete the search suggestions line if it is not available (instead of graying it out). I also like how Firefox's equivalent says "Start using it right away" instead of "Make this my default search provider" because it is more in line with what the user is thinking.

Microsoft also created and still provides another function called IsSearchProviderInstalled that allows you to dynamically check whether your search engine was already added. You can then use this knowledge to hide the add link on your site (if the user already added it).

In conjunction, these two functions produce pretty great usability when adding a search engine to IE. Really my only complaint is that IE doesn't let you edit the search URL string after the fact, which is something more advanced users want so they can do things like force language preferences. To be perfect, I'd add that functionality along with the ability to submit POST requests.

Nevertheless, IE has by far the fewest usability issues with this process. Unfortunately, the functions Microsoft created don't work that well outside of IE.
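To make the two IE functions concrete, here is a minimal sketch of how a page might use them together. It assumes the browser object is passed in as `ext` (in a real page that would be window.external), and the function names, state labels and URLs are placeholders of mine. The 0/1/2 return values are what IE documents for IsSearchProviderInstalled; other browsers may return something else or throw.

```javascript
// Sketch only: `ext` stands in for the browser's window.external object.
// Function names, state labels and URLs here are illustrative assumptions.
function searchProviderState(ext, siteUrl) {
  if (!ext || typeof ext.IsSearchProviderInstalled !== "function") {
    return "unknown"; // the browser does not expose the check at all
  }
  try {
    // IE documents 0 = not installed, 1 = installed, 2 = installed and default.
    const result = Number(ext.IsSearchProviderInstalled(siteUrl));
    if (result === 2) return "default";
    if (result === 1) return "installed";
    return "missing";
  } catch (err) {
    return "unknown"; // the call itself failed, so we cannot tell
  }
}

// Only hide the add link when we know the engine is already there.
function shouldShowAddLink(ext, siteUrl) {
  const state = searchProviderState(ext, siteUrl);
  return state === "missing" || state === "unknown";
}

// Pop the add dialog if the browser supports it; report whether we tried.
function addSearchProvider(ext, descriptionUrl) {
  if (ext && typeof ext.AddSearchProvider === "function") {
    ext.AddSearchProvider(descriptionUrl);
    return true;
  }
  return false;
}
```

In a page you would call something like shouldShowAddLink(window.external, "https://example.com") on load and wire addSearchProvider to the link's click handler.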


DuckDuckGoBar.png
On the other extreme is Safari, which is the absolute worst. Not only do they not implement the mentioned functions or equivalents, but you can't even add a search provider at all in the browser preferences! 

There is only one relevant dialog box, which currently gives you only three choices: Google, Yahoo! and Bing. Apparently these search engine choices are hard-wired into the browser itself, and the only way to change them without installing additional software is to hack the binary.

addtosearchprovider2.png

Additional software solutions aren't that great either. I was looking forward to the recent inclusion of extensions in Safari 5, but they didn't open up the search bar in the extensions API! Victor Quinn made a DuckDuckGo extension (it's open source btw), but it has to add a toolbar to provide search functionality.

addtosearchprovider3.png

Many people understandably hate the bar taking up all this extra space. The other current alternative is to install Glims, which is a plugin that does a lot of things including tweaking the search box. It is a bit of overkill though if you just want to make that one change. Perhaps the best solution right now is to make one plugin dedicated to this purpose like Inquisitor, but that is a lot of custom work to solve this one issue for Safari users.


edacflbhpmcimdanpfcibgafeknkgpia.png
Chrome causes me the second most complaints by far. It's also the most frustrating to me because it is so close to being good, and yet still so far. You'll see what I mean.

There is a specification called OpenSearch that allows you to add some meta tags to your site that describe the search engine associated with it. For DuckDuckGo, those meta tags look like this:

<link title="DuckDuckGo" type="application/opensearchdescription+xml" rel="search" href="/opensearch.xml">
<link title="DuckDuckGo (SSL)" type="application/opensearchdescription+xml" rel="search" href="/opensearch_ssl.xml">

These have the effect on IE and Firefox of letting the user know via the search box that they have options to potentially add. Here's what that looks like:

addtosearchprovider4.png
addtosearchprovider5.png

On Chrome, however, there is no equivalent drop down because the address bar and the search bar are one and the same--a really cool feature that I truly like btw. What they've chosen to do with opensearch is add any search engine that comes up in meta tags automatically to a list of possible providers you could use as the default. The first issue with this process (easily fixable) is that they add new engines to the bottom of the list.

addtosearchprovider6.png

The problem is that when people go looking to add a search engine they immediately have trouble finding the one they're trying to add. In particular, they generally expect the one they're looking for to be at the top, especially since it usually corresponds with being on that site at that moment. Trouble is that the location of the engine is in the exact opposite place you'd expect, i.e. at the bottom. 

And since Chrome adds every site that has an opensearch plugin, and lots of sites have them, it can be a very long list. Furthermore, if you decide to change providers but originally visited the site you want a while ago (often the case), it's probably somewhere in the middle of the list. I've gotten many reports that people just couldn't find it when it turned out to be "hidden" in the middle.

Second, Chrome provides a dialog box (see below) to add a new search engine from a link via the AddSearchProvider function, like I described for IE above. Trouble is, if the engine is already in that mega-list, AddSearchProvider silently fails and refuses to pop the dialog. Given that you usually use opensearch meta tags to surface the above pictured functionality for IE & Firefox, this dialog box can basically never show up, rendering it completely useless.

addtosearchprovider7.png

Even so, it suffers from another problem, namely that there is no option (unlike IE/Firefox/Opera) to also make it the default search engine. So if you somehow manage to get the dialog, you're left wondering what happened and why it isn't working in the search bar. What happened is that it got added to the bottom of that long list, and to make it the default you'll have to open that list in your preferences, scroll to the bottom, and click 'Make Default'.

As if that wasn't enough annoyance, IsSearchProviderInstalled is also essentially rendered useless. Since again everyone generally uses opensearch, and Chrome uses it to add providers automatically, search providers are always "installed" and so this function pretty much always returns true. This behavior means that you can't easily tell if a user added your engine or not and thus you are forced to keep showing 'Add to Chrome' links everywhere, even if the user already made it their default engine in Chrome. (Note sites often use cookies to save this info, which will work at least until the cookies are cleared.)
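The cookie fallback mentioned above could look roughly like this. It is only a sketch: the cookie name is a hypothetical one I picked, and the helpers take and return plain strings so they work with document.cookie in a browser.

```javascript
// Hypothetical cookie name; any name your site does not already use would do.
const ADDED_COOKIE = "search_added";

// Parse a raw cookie string ("a=1; b=2") into a plain object.
function parseCookies(cookieString) {
  const jar = {};
  for (const part of (cookieString || "").split(";")) {
    const index = part.indexOf("=");
    if (index === -1) continue;
    const name = part.slice(0, index).trim();
    if (name) jar[name] = part.slice(index + 1).trim();
  }
  return jar;
}

// True if the user has already clicked the add link in this browser.
function alreadyAdded(cookieString) {
  return parseCookies(cookieString)[ADDED_COOKIE] === "1";
}

// Build the cookie to set once the user clicks the add link.
function addedCookie(days) {
  const maxAge = days * 24 * 60 * 60;
  return ADDED_COOKIE + "=1; max-age=" + maxAge + "; path=/";
}
```

In a page, you would set document.cookie = addedCookie(365) in the link's click handler and hide the link whenever alreadyAdded(document.cookie) is true. As noted, this only lasts until the cookies are cleared.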


addons.opera.png
Opera causes problems too. My main issue with Opera is that they don't implement AddSearchProvider or IsSearchProviderInstalled or use opensearch at all, which means the user is forced to do everything manually.

They do have a path to do it, both in the preferences and via a right-click shortcut, but both cause a lot of confusion. The easiest way to find the preferences is by clicking the down arrow in the search bar and then 'Manage search engines...'. Simple enough, though it is annoying that they don't just populate choices automatically from the meta tags, like IE and Firefox (pictured above).

addtosearchprovider8.png 

There is then an 'Add...' button that gives you this dialog box.

addtosearchprovider9.png

The trouble with this dialog is all the empty boxes. An average user has no idea what to type in any of these boxes, and no example text is given whatsoever! So instead, the path that I now send people on is to right-click the search box and select 'Create Search...' like this:

addtosearchprovider10.png

This has the effect of at least partially populating the dialog box like this:

addtosearchprovider11.png

This still has a bunch of problems. First of all, you can't just click OK -- it is literally grayed out. At least let the user click OK and then tell them what they're doing wrong! It turns out you have to fill in a 'Keyword' to proceed, e.g. 'd'. I do give them extra points for adding a POST option though, which no one else does.


addons.mozilla.png
Which brings me to Firefox...  They're second behind IE in doing it right, but they still have two issues that cause significant complaints. They do get the basic process right though, which is great.

You can use AddSearchProvider to pop this dialog box and easily set an engine as the current search provider.

addtosearchprovider12.png 

This is arguably the best dialog box in the whole lot. It's simple, straightforward, and it works!

The issues in Firefox are more around the edges. First, the IsSearchProviderInstalled function is not implemented, which means you cannot easily tell if you've already installed an engine or not, which means you can't stop showing the 'Add to Firefox' links. Sound familiar?

The second issue annoys advanced users. Just like in IE, you cannot actually edit the search string for the search engine once added (or while being added). But what if you want to use URL params to customize the engine? Sorry, the only way to do it is to create a new opensearch plugin, which is why you start getting all of these.

***

In short, no one does this process perfectly. I compiled feedback on these issues in the hope that this part of the browser user experience gets fixed in all these great pieces of software. The ideal process is pretty simple actually:

  • Make the AddSearchProvider and IsSearchProviderInstalled functions work as one expects them to, i.e. they should not fail silently, always return the same true/false value, etc. If you do have a concept of a default engine, let IsSearchProviderInstalled see that too, or add another function to query that boolean value.

  • Make the dialog box that results from AddSearchProvider allow you to a) make it the default/current search engine; and b) change the url string via an advanced section (that offers useful help text).

  • Use the well-established opensearch meta tags appropriately, i.e. to suggest engines to add (as opposed to ignoring them or adding them automatically).

  • Make it obvious how to change providers and edit them after they have already been added. Executing AddSearchProvider could pop an edit dialog, for example.

How-to not log personally identifiable information

 
funny-pictures-cat-shreds-paper.jpg

DuckDuckGo doesn't log personally identifiable information (PII). We simply don't save it.

Sometimes I get asked how to implement this privacy policy. It's pretty simple, but I wanted to explicitly spell it out in hope that others can more easily adopt similar practices.

The basic procedure is to go to everywhere you log stuff, and then drop all the PII where you see it being logged.  This procedure will probably amount to you dropping IP addresses and user agent strings from your Web server logs. For most Web apps, that's often all there is to it.

I use nginx (pronounced engine-x). Here's the default log format for nginx:

    log_format  main  '$remote_addr [$time_local] "$request" '
                      '$status $request_time $body_bytes_sent "$http_referer" '
                      '"$http_user_agent"';

The $remote_addr variable is the IP address and the $http_user_agent variable is the user agent, which can also uniquely identify people. You could just remove them, but that might break other log processing software.

Instead, you can just replace them. Here's what I do:

        set $user_agent '';
        if ($http_user_agent ~* [\+\(]http) {
          set $user_agent 'Bot';
        }

        log_format  main  '127.0.0.2 [$time_local] "$request" '
                          '$status $request_time $body_bytes_sent "$http_referer" '
                          '"$user_agent"';

These changes have two effects. First, they will print 127.0.0.2 for everyone's IP address. Second, $http_user_agent becomes the $user_agent variable, which is blank for everyone but bots, which get logged as 'Bot'. I do that so I can exclude Bot traffic from reports. If you really wanted some user agent information you could simplify it to FF for Firefox, etc.
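If you did want to keep a coarse browser label, as suggested above, the mapping could look something like this sketch. It is written as a standalone function rather than nginx config, it reuses the same bot test as the nginx rule, and every label other than 'Bot' and 'FF' is my own assumption.

```javascript
// Mirrors the nginx rule above: "+http" or "(http" in the agent marks a bot.
const BOT_PATTERN = /[+(]http/i;

// Collapse a raw user agent into a coarse, non-identifying label.
// Labels other than 'Bot' and 'FF' are illustrative assumptions.
function simplifyUserAgent(userAgent) {
  const ua = userAgent || "";
  if (BOT_PATTERN.test(ua)) return "Bot";
  if (/Firefox\//.test(ua)) return "FF";
  if (/MSIE /.test(ua)) return "IE";
  // Chrome agents also contain "Safari/", so Chrome must be checked first.
  if (/Chrome\//.test(ua)) return "Chrome";
  if (/Safari\//.test(ua)) return "Safari";
  return ""; // anything else stays blank, as in the config above
}
```

The point is that only the label ever reaches the log, never the full string.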

For Apache, it looks pretty similar, i.e.

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

becomes

    LogFormat "127.0.0.2 %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"-\"" combined

Then you're going to want to double check your application logs to make sure you're not writing IP addresses to them either. I honestly haven't used a lot of the modern frameworks, so I can't easily say whether this happens by default or not. 
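For application logs, one rough way to be safe is to pass every line through a scrubber before it is written. Here is a minimal sketch, assuming IPv4 addresses only. The function name is hypothetical, and the pattern will also rewrite anything else that looks like a dotted quad (a four-part version number, say), so treat it as a starting point.

```javascript
// Matches dotted-quad IPv4 addresses (each octet 0-255). IPv6 is not handled.
const OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
const IPV4_PATTERN = new RegExp("\\b" + OCTET + "(?:\\." + OCTET + "){3}\\b", "g");

// Replace every IPv4 address in a log line with a fixed placeholder.
function scrubIps(line, placeholder = "127.0.0.2") {
  return line.replace(IPV4_PATTERN, placeholder);
}
```

For example, scrubIps("203.0.113.7 GET /?q=test") gives "127.0.0.2 GET /?q=test", which matches what the Web server logs above now contain.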

Yes, it really could be as easy as changing one line in one file. Note that doing so doesn't prevent you from using Google Analytics. DuckDuckGo doesn't use it, but I think dropping PII from your logs is a step in the right direction regardless of whether you additionally use external analytics software. (I still am able to use awstats to produce reports like this.)

Also note that even if you don't want to commit to this forever, you can still do it today and start logging sometime in the future when the need arises. You don't even have to change your privacy policy as you'll be doing something more private anyway.

If you have some form of accounts, it is obvious that you may necessarily store some PII. However, that doesn't mean you have to store any for the random Web surfer who hits your site.

Code icebergs

 
iceberg-756070.jpg

A lot of good products have features that appear somewhat trivial to replicate, but in reality would be quite complex to do so. I call these features code icebergs because they expose what a casual observer or competitor imagines is a weekend hackathon, but underneath there is a humongous mass of necessarily complicated code that makes everything work as seamlessly as it appears.

In my experience, the iceberg part of a code iceberg often involves handling a lot of edge cases. These edge cases are sometimes actually created by making the user interface simpler, e.g. fewer or free-form input fields.

At my current startup, DuckDuckGo, a good example is the seemingly straightforward task of taking Wikipedia and turning it into good Zero-click Info to display against queries.  At first blush it's trivial--I mean come on, the Wikipedia dumps output something called abstract.xml with a description of "extracted abstracts for Yahoo."

Yet when you get into it and start exposing it to real users, you surface all those edge cases. That dump in particular is actually completely unusable IMHO and I ended up discarding it within a few days of discovering it. It chokes on lots of things. 

Wikipedia has templates, disambiguation pages, initial warnings and infoboxes, redirects, malformed/complicated sentences, etc. etc., all of which you want to deal with if you don't want glaring errors. And then once you're in there, you might as well start capturing more good stuff like related topics, categories, the right images, good external links, etc. etc. And what about updating it in real time? It starts to really add up.

I like code icebergs. They're really a marvel to look at when you can see the whole picture. They also lure competitors in, who often get sunk (at least initially) not understanding the scope of the problem. They're good barriers to entry, fuel in build vs buy decisions, and the underpinnings of good UX.

Update: good comments on HN.

Productive Programming

 
BIG disclaimer: I'm not formally trained in computer science (aside from two classes at MIT in 2000) and I haven't worked closely with that many programmers or teams (maybe 10 or so).

I've gotten to be a lot more productive as a programmer since I started casually 15+ years ago and seriously about 10 years ago. By productive, I mean shipping something that actually works in as short a time as possible. 

Lately I've been thinking about the origin of all this productivity. I think it all comes down to doing a few things well. Say you want some code that does X.

  1. Tools. Often the quickest way to do X (especially if X is complicated) is to grab something off-the-shelf. My go-to repository is CPAN, or, at an even higher level, some Apache project that is essentially a black box to me. Once I reach the limitations of that tool, then I might jettison it, but when the goal is to ship quickly, nothing beats using existing code.

  2. Simple Algorithms. If I have to do it myself, I initially try to accomplish X in the simplest way possible by breaking it down into trivial steps. I know there is a famous programming quote that pertains to this process, but I can't find it right now--someone in the comments help me out please :)

    This breakdown process has certainly become easier over time for me as more tools/data structures/design patterns/etc. pop into my head immediately just from having used them a lot. But the real key is to push off all complexity until later. Get something working, then add to it incrementally.

    Pushing off complexity is not as easy as it sounds. A lot of things that seem essential are really not. What I do is make a critical path walk through of what is absolutely needed to accomplish X. Anything that improves X but is not on that critical path, I write down on a list to revisit later.

    When you do the initial breakdown, it is quite possible that some of those (still non-trivial) steps can be accomplished by Tools, i.e. you can cobble together some stuff of your own with other people's stuff to get X working quickly. Just because one tool doesn't do X outright, doesn't mean you should do everything yourself.

  3. Debugging. I've seen people waste the most time here, literally aimlessly changing lines of code hoping it will solve their problems. When approaching debugging, I first find out where the problem is, i.e. the exact line of code or config file or whatever that is causing the problem. That should not take long, and before you ask anyone else about it you should know exactly where the problem lies.

    If it is my code and I've been working on it recently, I usually have a pretty good mental model for how it all works and can get to the relevant line almost immediately. If it is not my code, e.g. an open source tool or sysadmin thing, I usually make educated guesses and then essentially do a binary search printing variables out as I go until I see where it messed up.

  4. References. If you are stuck in either Algorithms or Debugging, it is likely people have talked about your problem on the Internet already, e.g. on HN, StackOverflow, a mailing list, tutorial, FAQ or in a book. Knowing how to use references is key to doing things quickly. I can't tell you how many people I've tried to help who needed to RTFM.

    Before you ask anyone, you should look online. If you're really stuck, ask someone on IRC or post your own question to an online forum. Then work on something else while you give people time to respond. When you're really stuck, there is no reason to spin your wheels endlessly. I've also found sleeping on it often magically produces a solution in the morning.
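The binary search I mention under Debugging can be sketched in a few lines. This is an illustration under a strong assumption: once a value goes wrong at some step, it stays wrong at every later step. The pipeline and function names are made up for the example.

```javascript
// Find the first index in [0, count) where isBad(index) is true, assuming
// every step before it is good and every step from it onward is bad.
// Returns count if no step is bad.
function firstBadStep(count, isBad) {
  let low = 0;
  let high = count;
  while (low < high) {
    const mid = Math.floor((low + high) / 2);
    if (isBad(mid)) {
      high = mid; // the problem is at mid or earlier
    } else {
      low = mid + 1; // everything up to mid is fine
    }
  }
  return low;
}

// Hypothetical pipeline: the third step introduces the bug (NaN).
const steps = [
  (x) => x + 1,
  (x) => x * 2,
  (x) => x / "oops",
  (x) => x - 3,
];

// Run steps 0..n and report whether the value has gone wrong by then.
function badAfter(n) {
  let value = 1;
  for (let i = 0; i <= n; i++) value = steps[i](value);
  return Number.isNaN(value);
}

const culprit = firstBadStep(steps.length, badAfter); // 2, the third step
```

In practice the "check" is usually me printing a variable and eyeballing it, but the halving is the same.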

Those are the main things that I think contribute to my programming productivity. Other than those, there are some largely idiosyncratic habits I've formed that certainly help me but I'm not sure how useful they would be to you:

  • I run code constantly. Literally every time I make a change, I re-run stuff to see if it improved/is working. This process makes it so I usually only have one bug at a time to deal with. It's part of how I work through that whole simple algorithm thing above.

  • I put # For debugging lines in everywhere when debugging and then I leave them there so when I come back I can easily uncomment them and get some decent debugging output for that particular component.

  • I try not to be clever and comment anything that looks complicated with a short explanation right by that piece. I used to try to be a lot more clever, but it never worked out well.

  • I'm not an early adopter of new tools/languages unless they seem to fit X exactly. I'm still using Perl with no framework :). But I have started using nginx, memcached, etc.

Update: more good comments on HN here, including this one from xentronium: "There were awful lot of tasks I tried to achieve via pure coding and rewriting / rewiring blocks here and there. All they actually needed was some analysis on paper." I have had the same experience.

Update2: this post is re-printed in the Software Developer's Journal.

Hack Hack Go

 

iostat.png

I want to make Duck Duck Go a better search engine for programmers like me. If you're a programmer, I'd appreciate your feedback and ideas.

Duck Duck Go is intended to be a general purpose search engine and that isn't going to change. Our user base certainly reflects this purpose, i.e. is quite varied on every metric I've tried to measure.

Yet there are certain search niches like casual research where Duck Duck Go really excels. I'd like programming to be one of those areas.

To that end, here's what I've got so far.

  • A general search engine. The good news here is I know a lot of programmers who use it as their primary search engine. It works and (at least some) people really like it. I'm always willing to add new features whose absence is preventing people from switching. Currently on that list are some maps and images.

  • Zero-click Info. There are red boxes above links on some searches with info you can get without clicking, i.e. on-site. We have a lot of info that is specific to programming topics. Of course we have Wikipedia, e.g. Dijkstra's algorithm. But I've also added software sources, i.e. github, freshmeat, download.com, versiontracker, and sourceforge.

  • Category pages. I've mined sources to create useful topic lists for browsing/learning, e.g. Search Algorithms.

  • Disambiguation pages. I've created pages to help you isolate programming topics in common query terms, e.g. cookie links to HTTP cookie, which has results more geared toward that meaning. There are also programming specific disambiguation pages, e.g. nearest neighbor.

  • Crowd-sourced links. I also mine links from crowd-sourced sites, e.g. coroutine.

  • Wikipedia paragraphs. I've deep-indexed Wikipedia at the paragraph level. You don't have to match a topic nearly exactly anymore to get some Zero-click Info, e.g. python switch statement. This is way more than a regular search index, as it is sub-section/section/title aware and uses some NLP for relevancy. I hope to make that matching algorithm even more sophisticated over time.

  • Bang. There are a few hundred !x shortcuts that can be used, e.g. !cpan Net::DNS

Here's what I'm thinking of doing.

  • O'Reilly Paragraphs. I think it would be awesome if I could index all O'Reilly books at the paragraph level, like I've done for Wikipedia. This content is well-written, encyclopedia-like, is largely in paragraph form, and has surrounding contextual information (section titles, etc.) that will make the relevance matching excellent. Problem is, I don't know anyone at O'Reilly. I think it's a win-win because it can link right to their Safari product or individual book pages. And I don't think it would cannibalise Safari because you're getting people in a very different context (when searching). Anyway, I thought I'd start by writing them an email. I did that and haven't heard back yet.

  • More topic sources. I'm going to add man/info pages, so you can type in a command and get a description. I could also do packages for distributions/languages in a similar manner if people think that would be useful to them. I've explored indexing these at the paragraph level, but the content doesn't seem to work well for that purpose. Other, more general sources, like Amazon product descriptions, may be incidentally useful to programmers. I'd love your thoughts here.

  • Bang documentation. The current bang commands aren't documented. I'll document them as well as add more that are useful to programmers. Any you want?

  • Zero-click Info by IM. I'm thinking of making a chatbot that will respond to you via IM with Zero-click Info (and links). So you send it a search query and we'll send you back a description along with a few links. Would you use that?

  • API integration. I wrote the Perl binding for Wolfram Alpha. I'm exploring ways to use it to integrate good WA content. I'm open to using other APIs, but I'd strongly prefer to get dumps instead so I can ensure speed. Another one I'd like to integrate for programmers is ErrorHelp.com (previously bug.gd).

That's where I'm at right now. If you're a programmer, my questions for you are:

  1. Do you find the above compelling?

  2. Do you have any particular feedback/ideas?

Feel free to comment below, on HN, on reddit, or email me directly.

Things about Web Images I Just Learned

 
I thought I knew everything you needed to know about Web images.  But, of course, I didn't. Here's what I just learned when launching the new icon bar on the homepage of Duck Duck Go. We wanted it to function sort of like the Apple dashboard (and on the Web like Schmedley's bottom bar).

  • img{-ms-interpolation-mode:bicubic}. Short version: if you resize images dynamically, they will look bad on IE unless you put this in your CSS.

    Longer version:  We ended up using the YUI Animation Library to do the animation.  But no matter how we did it using 1 image, it always looked terrible on IE.  Even if we used an image exactly as big as the big size, and did the smaller image exactly half of the bigger size (which should be easy to resize), it still looked bad.

    So then we tried using two images, which sort-of worked, but had its own issues.  Sometimes it would slow down the animation. It used almost double the image size and requests (a big no-no), and the actual resizing still looked bad (as opposed to the endpoints)!

    This was unacceptable, so I decided to dig deeper on the Web about this issue.  It turns out modern browsers use Bicubic interpolation to resize images and make them look good in the process. For whatever reason, IE7+ has decided to turn it off by default. I'm guessing this is because it takes some processing power, but it renders resized images looking terrible so I personally don't think this is a good trade off.  Anyway, if you add that above CSS to your page, IE7+ will use this method and your images will look good. I suppose I never hit this before because usually you shouldn't be resizing images dynamically. But there are cases where you want to do it...

    Unfortunately, it still doesn't work for IE6, on which you need to use the good ol' AlphaImageLoader (sizingMethod='scale') if you want to support that browser.

  • Photoshop/Illustrator's 'Save for Web...' does not fully optimize. Perhaps my versions of Photoshop and Illustrator are too old, but I suspect this is still the case with the newer versions. I pretty much used these blind, assuming they were optimizing correctly. And don't get me wrong, it does a decent job, but it's just not the best. Instead, run your images through Yahoo!'s smush.it site.

  • If you really do not need PNG-24, use PNG-8. PNG-8 is really a better GIF. But it is limited in color palette and transparency with respect to PNG-24. That being said, often you don't need the difference, especially for things like icons. When you can, use PNG-8 because you'll get much smaller file sizes.

    That being said, you might think you need PNG-24 when you really don't. I did. I had these icons made that had full transparency. I knew, however, they were going to be on a white background, so I really didn't need all the transparency. Yet when I tried to save it as PNG-8, it just looked bad. The colors were all off. So it made me think that I needed PNG-24, but in reality it was Photoshop's optimization stuff that was being poor. In their defense, I wasn't helping them out by setting the white background ahead of time, which leads me to:

  • If you want to save a PNG-24 image as PNG-8, put in the background first. Once I made a white background layer, Photoshop then did a great job of saving it as PNG-8. And in fact, I could reduce the file size even more by using even fewer than 256 colors. Of course, I still had to run it through smush.it.

  • CSS sprites may reduce your page load time (and image size further). CSS sprites are a way to group your images into one big file and then show each one separately via CSS. There is a useful Web site to help you make them at csssprites.com. I couldn't figure out how to use it with my resizing requirements, but in the general case it should at least be tried, especially for icons whose color palettes are similar. You get a win in image size. But you get a bigger win in reducing HTTP requests.

  • Custom icons are not that expensive. We got $40 custom icons and $10 recolored icons from iconshock.com. We talked to other icon design firms as well, and prices were similar. Full disclosure: we created more than 3 icons (7), so we got a bit of a bulk discount. I did have a bad experience with iconeden.com, however, so I'd stay away from them.

For more image optimization tips, check out Yahoo!'s presentation.

Update: additional comments can be found here.

A Harsh CSS Environment for Testing Widgets

 
Embedded widgets can face harsh CSS environments, but usually not this harsh:

#harsh * {
border: thin dotted #00FF00 !important;
display: block !important;
margin: 20px !important;
outline: 1px dotted red !important;
padding: 20px !important;

background: #00ff00 !important;
cursor: move !important;

clear: both !important;
float: left !important;
height: 0 !important;
max-height: 0 !important;
max-width: 0 !important;
min-height: 100px !important;
min-width: 100px !important;
visibility: hidden !important;
width: 0 !important;

bottom: 100px !important;
clip: rect(100px, 50px, 100px, 50px) !important;
left: 100px !important;
overflow: visible !important;
position: absolute !important;
right: 100px !important;
top: 100px !important;
vertical-align: sub !important;
z-index: 100 !important;

color: red !important;
direction: rtl !important;
font: oblique small-caps 900 20px/50px arial !important;
font-size-adjust: .01 !important;
font-stretch: ultra-expanded !important;
letter-spacing: 20px !important;
list-style: decimal inside !important;
text-align: right !important;
text-decoration: blink !important;
text-indent: 100px !important;
text-shadow: #000 30px 30px !important;
text-transform: uppercase !important;
unicode-bidi: embed;
white-space: pre !important;
word-spacing: 20px !important;

border-collapse: separate !important;
border-spacing: 30px !important;
caption-side: bottom !important;
empty-cells: show !important;
table-layout: fixed !important;
}

If your widget looks OK inside <div id="harsh"></div>, then it will probably look OK anywhere.  I made this HTML example (view source) for easy testing.

Why does this matter? Suppose a site has a black background and white text, and your widget sets a white background but no text color. Your widget inherits the site's white text, so none of it would show.

To deal with a harsh environment, you need some armor:

<style type="text/css">
#armor, #armor * {
border: none !important;
display: block !important;
margin: 0 !important;
outline: none !important;
padding: 0 !important;

background: #fff !important;
cursor: auto !important;

clear: none !important;
float: none !important;
height: auto !important;
max-height: none !important;
max-width: none !important;
min-height: 0 !important;
min-width: 0 !important;
visibility: visible !important;
width: auto !important;

bottom: auto !important;
clip: auto !important;
left: auto !important;
overflow: auto !important;
position: relative !important;
right: auto !important;
top: auto !important;
vertical-align: top !important;
z-index: 1 !important;

color: #000 !important;
direction: ltr !important;
font: normal normal normal 11px/14px tahoma,sans-serif !important;
font-size-adjust: none !important;
font-stretch: normal !important;
letter-spacing: normal !important;
list-style: none !important;
text-align: left !important;
text-decoration: none !important;
text-indent: 0 !important;
text-shadow: none !important;
text-transform: none !important;
unicode-bidi: normal;
white-space: normal !important;
word-spacing: normal !important;

border-collapse: collapse !important;
border-spacing: 0 !important;
caption-side: left !important;
empty-cells: hide !important;
table-layout: auto !important;
}
</style>

If you wrap your widget in <div id="armor"></div>, it should work OK. I made another HTML example (view source) for testing this armor.

I tested #armor cross browser using my test systems and browsershots.org. Of course, there are most likely still bugs, so please tell me about them!

To develop #harsh, I used the w3schools CSS Reference, which you can also use to decide whether to change any of the properties in #armor or to apply more after it.

To apply additional styling after #armor, use ids instead of classes (id="" with # selectors, not class="" with . selectors), because a particularly harsh #id * rule will override your classes. Of course, if you aren't that paranoid, you could back off the * in #armor and use classes instead.

You could also just use inline styling, i.e. style="". There may also be a better way to do it that I just haven't thought of yet. If you know of one, do tell...

Speeding up Perl Regular Expressions using Regexp::List

 
I spent the last 24 hours optimizing the Web crawler for the Parked Domains Project.  The previous bottleneck was obviously CPU.  After a bunch of profiling and benchmarking, I determined that a particular block of Perl regexp was causing most of the problem.

I was already compiling what I could (using /o and qr//).  I was also already trying to run the patterns I expected to match most often, and fastest, first, as well as trying to anchor as much as possible (i.e. using /^ and $/ and just using long literal strings).  And I always use clustering (?:...) instead of capturing (...), where appropriate.

What I didn't do, however, was mess with alternations, e.g. cat|dog|bird.  Disclaimer: there isn't a be-all and end-all of regexp optimizations, and what works in one situation may not work in another. It depends entirely on your regexp and what you are throwing at it.

Alternation is usually slow in Perl because the engine has to backtrack when trying each alternative.  It's much faster to give Perl a character sieve up front, e.g. (?=[cdb]), and then factor out common prefixes and suffixes.  The problem is that when you have a ton of alternatives, doing all this by hand is a pain and it decreases readability to almost zero.  Which is why I had avoided it to date...
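To make the sieve-and-factor idea concrete, here is a minimal sketch using only core Perl. The word list and the variable names ($plain, $factored) are made up for illustration and are not from the crawler. It writes the same set of alternatives twice, once as a plain alternation and once by hand with a first-character lookahead and factored prefixes, then checks that both accept and reject the same strings.

```perl
use strict;
use warnings;

# Hypothetical word list, chosen only for illustration.
my @words = qw(cat car cab dog dot bird bind);

# Plain alternation: every alternative is spelled out in full.
my $plain = qr/^(?:cat|car|cab|dog|dot|bird|bind)$/;

# Hand-optimized form: a lookahead "sieve" on the first character,
# then the common prefixes (and one suffix) factored out.
my $factored = qr/^(?=[cdb])(?:ca[trb]|do[gt]|bi[rn]d)$/;

# Both patterns should accept and reject exactly the same strings.
for my $s (@words, qw(cow fish bin do)) {
    my $m1 = $s =~ $plain    ? 1 : 0;
    my $m2 = $s =~ $factored ? 1 : 0;
    die "patterns disagree on '$s'\n" unless $m1 == $m2;
}
print "plain and factored patterns agree\n";
```

Even with seven short words the factored pattern is already harder to read than the plain one. Whether it is actually faster depends on your Perl version and your input, so it is worth timing both on your own data, for example with cmpthese from the core Benchmark module.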

Enter Regexp::List.  I've used this module before, but never as extensively and I never benchmarked it either.  It does all of this stuff automatically.  Not only did my regexp speed increase by about 5x, but my readability increased as well!  

I really didn't think that such a simple change would make such a difference.  The reason for the readability increase, btw, is that I now put all the alternatives in an array and then give that to the module, e.g.:

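use Regexp::List;

# Keep the alternatives in a plain list, one per line, for readability.
my @regexp = (
  'cat',
  'dog',
  'bird',
);

# Build one optimized, case-insensitive regexp from the whole list.
my $regexp = Regexp::List->new;
my $qr = $regexp->set(modifiers=>'i')->list2re(@regexp);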
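As a rough usage sketch, the compiled regexp can then be applied like any other qr// object. This assumes Regexp::List is installed from CPAN (it is not a core module). The @lines array is hypothetical sample input standing in for crawled text, and the module calls are the same ones used above.

```perl
use strict;
use warnings;
use Regexp::List;   # CPAN module, not part of core Perl

my @alternatives = ('cat', 'dog', 'bird');

# Same calls as in the post: one case-insensitive regexp for the whole list.
my $qr = Regexp::List->new->set(modifiers => 'i')->list2re(@alternatives);

# Hypothetical input lines standing in for crawled page text.
my @lines = ('I have a Dog', 'no pets here', 'BIRD watching');

# Print only the lines that contain one of the alternatives.
for my $line (@lines) {
    print "$line\n" if $line =~ $qr;
}

# Print the generated pattern to see how the module rewrote the list.
print "pattern: $qr\n";
```

Printing the pattern is a quick way to check what the module generated before benchmarking it against your original alternation.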

