After optimizing my Web crawler for the Parked Domains Project, I started crawling so fast that the log process for my DNS server was eating up 20% of one of the CPUs (and wasting a lot of I/O as well).
I run a local DNS cache using djbdns on each crawling server, which also handles all local DNS queries. My dnscache is supervised by daemontools, and if you are familiar with this world, you already know I was using multilog for logging.
The logging for dnscache is very extensive, and it is basically useless when you are not debugging. So it makes perfect sense that it was taking up so much CPU and I/O.
I'm writing this post to correct the Internet. When you search for how to turn off logging for dnscache or multilog, you get a lot of instructions telling you to replace your log/run file with this:
exec setuidgid daemon multilog -*
That will eliminate the I/O, but it only recovers about half of the CPU utilization (at least in my case). The problem is that your system is still piping the log from the main process to multilog; multilog just isn't writing it anywhere.
What you really want to do is stop the logging at its source. To do so, don't mess with your log/run file at all. Instead, change this line in your actual run file from
exec 2>&1
to
exec 1>/dev/null 2>&1
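If you want to convince yourself of what that redirection does before touching a live service, here is a minimal sketch you can run in any POSIX shell. The `noisy` function is a hypothetical stand-in for a daemon that writes its log to stderr; it is not part of djbdns, and the command substitution plays the role of the pipe to multilog.

```shell
#!/bin/sh
# Hypothetical stand-in for a daemon that writes log lines to stderr.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Original run-file line: stderr is merged into stdout, so the log
# lines travel down the pipe (here, the command substitution).
with_logging=$( (exec 2>&1; noisy) )

# Modified line: stdout is pointed at /dev/null first, then stderr is
# pointed at the same place, so nothing reaches the pipe at all.
without_logging=$( (exec 1>/dev/null 2>&1; noisy) )

printf 'with logging:    [%s]\n' "$with_logging"
printf 'without logging: [%s]\n' "$without_logging"
```

The order of the two redirections matters: `1>/dev/null` has to come first so that `2>&1` copies a descriptor that already points at /dev/null.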
Now nothing will go to multilog at all. Of course, you need to restart or HUP the log and run processes for the change to take effect.
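For the restart, something like the following should work with the standard daemontools tools. This is only a sketch: the `/service/dnscache` path is an assumption (use wherever your service directory is actually linked), and the guard just makes the script harmless on a machine without daemontools.

```shell
#!/bin/sh
# Assumed service directory; adjust to match your own setup.
SERVICE=/service/dnscache

if command -v svc >/dev/null 2>&1 && [ -d "$SERVICE" ]; then
  # svc -t sends TERM; supervise then starts the service again,
  # which picks up the edited run file.
  svc -t "$SERVICE"
  # svstat reports whether the service is up and for how long.
  svstat "$SERVICE"
  status=restarted
else
  echo "daemontools or $SERVICE not found; nothing restarted" >&2
  status=skipped
fi
```

Afterwards, the multilog process should sit idle, since no data is arriving on its pipe.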
