Self-managed background processing with Beanstalkd and Supervisor

At my present job, I have the opportunity to work with some interesting technologies that I haven't used before. One of them is Beanstalkd, a fast and simple general-purpose work queue. It fits nicely with the Supervisor process control system, and together they make a powerful solution for managing background jobs. I warmly recommend the combination if your application needs any background processing!

For putting jobs in the queue and communicating with Beanstalkd in general, the Pheanstalk PHP client is the obvious choice. But Pheanstalk doesn't solve the worker part of the story, that is, the logic for processing queued jobs. And it shouldn't, because its responsibility is the Beanstalkd protocol itself. It's up to the developer to come up with a worker implementation that meets the application's needs. And that's where things can get serious.

Challenges

Textbook worker examples are fairly trivial:

$queueClient = new Pheanstalk('127.0.0.1');

$queueClient->watch('mytube');

while (true) {
    // Timeout of 0: return immediately if no job is ready
    $job = $queueClient->reserve(0);

    if ($job) {
        //process job

        $queueClient->delete($job);
    }
}

Still, there are some caveats that you should pay attention to.

Idling

The little script above does the job, but left unmonitored it can put a heavy load on your server, and that's just what happened to me:

[Image: Beanstalkd process]

And that was nothing, because within a few minutes CPU usage jumped to ~90%. The problem arises when there are no jobs in the queue and the script starts continuously hammering a dry beanstalkd with the reserve command. The solution is to pause the worker whenever the queue is empty:

if (!$job) { //empty queue?
    sleep(3);
    continue;
}

//... otherwise, process job

This simple fix prevents the CPU from maxing out when the queue is empty.
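
Put together, the polling loop with the idle pause might look like the sketch below. To keep it runnable without a beanstalkd server, the queue calls are passed in as callables. The names runWorkerLoop, $reserve, $process, $pause and $maxIterations are hypothetical and used only for illustration; they are not part of Pheanstalk.

```php
<?php
/**
 * Poll for jobs, pausing whenever the queue is empty.
 *
 * $reserve returns a job, or false/null when nothing is ready;
 * $process handles one job; $pause is called when the queue is empty.
 * Returns the number of processed jobs.
 */
function runWorkerLoop(callable $reserve, callable $process, callable $pause, int $maxIterations): int
{
    $processed = 0;

    for ($i = 0; $i < $maxIterations; $i++) {
        $job = $reserve();

        if (!$job) { // empty queue: back off instead of hammering the server
            $pause();
            continue;
        }

        $process($job);
        $processed++;
    }

    return $processed;
}

// Usage with an in-memory stand-in for the queue (an assumption for this sketch).
$queue = ['job-1', 'job-2'];
$pauses = 0;
$handled = [];

$count = runWorkerLoop(
    function () use (&$queue) { return array_shift($queue); }, // null once the array is empty
    function ($job) use (&$handled) { $handled[] = $job; },
    function () use (&$pauses) { $pauses++; },                 // a real worker would sleep(3) here
    5
);
```

In a real worker, $reserve would wrap $queueClient->reserve(0), $process would end with $queueClient->delete($job), and $pause would call sleep(3), as in the snippets above.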

Overtired worker

In theory, workers are meant to run continuously, but in practice that can lead to some unexpected issues. One that happened to me was hitting MySQL's 'wait_timeout' after my worker had been running continuously for 8 hours, which is the default value of that MySQL setting.

Workers, like those in real life, have to take a break, so you need a way to restart them periodically. This brings us to Supervisor, which provides a nice mechanism for managing workers: start, stop, restart, multiple instances of the same worker, etc. Here's an example of a Supervisor config file:

[program:notificationsWorker]
command=/usr/bin/php /home/ubuntu/app/bin/notifications_worker.php
process_name=%(program_name)s.%(process_num)s
numprocs=5
directory=/tmp
stdout_logfile=/var/log/supervisor/%(program_name)s.%(process_num)s.stdout.log
autostart=true
autorestart=true
user=ubuntu
exitcodes=0
stopsignal=KILL

This setup starts 5 workers in parallel (numprocs=5) and restarts them automatically whenever they exit (autorestart=true). For autorestart to serve as a periodic restart, the worker itself has to exit once it reaches a time or memory usage limit:

$startTime = time();

while (true) {
    //...

    if (time() - $startTime >= 60) { //time limit of 1 minute reached?
        die;
    }
}
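
The same check can cover memory usage as well as time. Here is a minimal sketch, assuming limits of 60 seconds and 64 MB chosen purely for illustration. The shouldStop helper and its parameters are hypothetical names, not part of Pheanstalk or Supervisor.

```php
<?php
/**
 * Decide whether the worker should exit so that Supervisor can restart it.
 * Returns true once either the time limit or the memory limit is reached.
 */
function shouldStop(int $startTime, int $now, int $memoryBytes, int $maxSeconds, int $maxBytes): bool
{
    $tooOld = ($now - $startTime) >= $maxSeconds;
    $tooBig = $memoryBytes >= $maxBytes;

    return $tooOld || $tooBig;
}

// Inside the worker loop, after each job (both limits are assumptions):
$startTime = time();
$stop = shouldStop($startTime, time(), memory_get_usage(true), 60, 64 * 1024 * 1024);
// if ($stop) { exit(0); } // clean exit; autorestart=true brings the worker back
```

Running the check between jobs rather than in the middle of one means the worker does not exit while it still holds a reserved job.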

The result is a fully automated, self-managed background processing implementation.

Final solution

After I assembled all the pieces, I created a reusable worker class that I'm sharing with you via this Gist:

