Capitalizing on CPU Cycles with Condor Clustering

Many of you have probably heard of SETI@Home or Folding@Home, two projects that use the otherwise wasted CPU cycles of millions of machines to meet their massive data-processing and computational needs. SETI@Home searches radio telescope data for signs of intelligent extraterrestrial life, and Folding@Home simulates protein folding for medical research. Both projects distribute free client software to volunteers over the Internet. The client monitors activity on the computer and does project work only when the computer's owner is not using the machine. By pooling the donated idle time of thousands or millions of desktops, these projects achieve computational feats they could otherwise never hope to accomplish.

What if I told you that your business could do the same thing? It can, with a freely available clustering solution called "Condor."

What Condor Is

Condor is a free program produced by the University of Wisconsin that installs on your Windows, Linux, or Macintosh desktops and servers. It turns those machines, and their billions of otherwise unused CPU cycles, into your own centrally managed "High Throughput Computing" platform to which you can submit your programs for processing. Condor finds a free host to run each job and lets you know when the job is finished. It can even stop a job in its tracks and move it to another machine if the current one becomes busy.
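To make "find a free host, and move the job when the host gets busy" more concrete, here is a toy simulation in Python. It only illustrates the idea and is not Condor's actual matchmaking algorithm or API: the `Machine` and `Pool` classes are hypothetical, and the sketch assumes an evicted job simply restarts from scratch on its new host.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    owner_active: bool = False  # True while the desktop's owner is using it
    job: Optional[str] = None   # job currently running here, if any


class Pool:
    """Toy scheduler: idle desktops pick up queued jobs and give them back
    when their owners return."""

    def __init__(self, machines):
        self.machines = {m.name: m for m in machines}
        self.queue = deque()

    def submit(self, job: str) -> None:
        self.queue.append(job)
        self.schedule()

    def schedule(self) -> None:
        # Hand jobs from the front of the queue to machines that are idle and free.
        for m in self.machines.values():
            if not self.queue:
                break
            if not m.owner_active and m.job is None:
                m.job = self.queue.popleft()

    def owner_returns(self, name: str) -> None:
        # The owner is back (a shutdown would be handled the same way): evict
        # the job, put it at the front of the queue, and place it elsewhere.
        # In this toy model the evicted job restarts from scratch.
        m = self.machines[name]
        m.owner_active = True
        if m.job is not None:
            self.queue.appendleft(m.job)
            m.job = None
        self.schedule()

    def placement(self) -> dict:
        return {m.name: m.job for m in self.machines.values()}


if __name__ == "__main__":
    pool = Pool([Machine("desk1"), Machine("desk2"), Machine("desk3")])
    pool.submit("job-A")
    pool.submit("job-B")
    print(pool.placement())  # {'desk1': 'job-A', 'desk2': 'job-B', 'desk3': None}
    pool.owner_returns("desk1")
    print(pool.placement())  # {'desk1': None, 'desk2': 'job-B', 'desk3': 'job-A'}
```

When the owner of `desk1` comes back, `job-A` is evicted and lands on the idle `desk3`, while `job-B` keeps running undisturbed.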

By using all of your company's desktops as a production cluster, you not only get a better return on investment on those desktops, you also need fewer expensive servers, along with less of the power consumption, air conditioning, and maintenance that go with them. Because Condor assigns jobs only to machines that are currently available, it doesn't matter if someone shuts down their desktop for the night or if a desktop is broken; Condor simply finds somewhere else to run those jobs. And remember: nothing stops you from making those big, expensive servers nodes in the cluster as well. Condor simply extends your processing capacity beyond those few servers.

What Condor Isn't

Condor, at its heart, is an extremely powerful job distribution manager, something like AutoSys on steroids. As such, it works primarily at the level of job submission. You can't install Condor and expect it to behave like a single huge machine with 300 CPUs and a few terabytes of memory, and it will not automatically split a single resource-hungry application across multiple machines. What Condor can do is give your nightly batch run more cycles, so its tens or hundreds of individual jobs finish sooner.
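In Condor, a batch of jobs is described in a submit description file and handed to the scheduler with the `condor_submit` command. As a minimal sketch, the Python below builds a submit file that queues one instance of a program per chunk of a nightly run. The executable `process_chunk.sh`, the chunk count, and the output file names are hypothetical. The `$(Process)` macro expands to 0, 1, 2, and so on for each queued instance, which is how each job can learn which chunk it owns.

```python
def make_submit_file(executable: str, num_jobs: int, log: str = "batch.log") -> str:
    """Return a Condor submit description that queues num_jobs copies of executable.

    Each copy receives its own process number, $(Process), as its only argument
    and writes stdout and stderr to per-job files.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = out.$(Process)",
        "error      = err.$(Process)",
        f"log        = {log}",
        f"queue {num_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Save this output as, say, nightly.sub and submit it with: condor_submit nightly.sub
    print(make_submit_file("process_chunk.sh", 100), end="")
```

This assumes the execute machines share a filesystem with the submit machine; without one, you would also need Condor's file-transfer settings (such as `should_transfer_files`) in the submit file.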

What are the implications of this?

First, you can't take a resource-intensive monolithic application and expect Condor to magically make it run faster. That application will still be assigned to a single host in the cluster and will run with only that host's resources. If your application can't be split into independent "jobs" that can be submitted to the cluster, Condor is the wrong model for you. This means programs like your Oracle database will still need big servers to get their work done. However, you may be able to re-architect your processing stream to use many smaller databases or flat files, eliminating the need for a single monolithic database server.
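Re-architecting usually means partitioning the input so each job reads only its own slice. One simple approach, sketched below, is to hash each account ID to one of N flat files. The CSV layout with an `account_id` column and the `accounts.N.csv` file names are hypothetical. Because the hash is deterministic, all records for the same account land in the same shard, so per-account processing never has to span jobs.

```python
import csv
import zlib
from contextlib import ExitStack
from pathlib import Path


def shard_for(account_id: str, num_shards: int) -> int:
    # crc32 gives the same answer on every machine and every run; Python's
    # built-in hash() of a str is randomized per process, so it is unsuitable.
    return zlib.crc32(account_id.encode("utf-8")) % num_shards


def split_accounts(src: Path, out_dir: Path, num_shards: int) -> list:
    """Split a CSV that has an 'account_id' column into num_shards flat files.

    All rows for a given account land in the same shard file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"accounts.{i}.csv" for i in range(num_shards)]
    with ExitStack() as stack:  # closes the input and every shard file on exit
        reader = csv.DictReader(stack.enter_context(src.open(newline="")))
        writers = []
        for path in paths:
            fh = stack.enter_context(path.open("w", newline=""))
            writer = csv.DictWriter(fh, fieldnames=reader.fieldnames)
            writer.writeheader()
            writers.append(writer)
        for row in reader:
            writers[shard_for(row["account_id"], num_shards)].writerow(row)
    return paths
```

Each Condor job can then be told, through its command-line argument, which shard file to process.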

Another implication comes with any increase in parallelism: the bottleneck can easily shift to your database server or other shared resources. Which resource becomes the bottleneck depends on the workload and system architecture, but the network, databases, and file servers may all suddenly carry a much higher load because more of your processing runs concurrently. Be sure to account for these shared resources when you evaluate Condor and dynamic job distribution for your project.
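A quick back-of-the-envelope check helps here: each shared resource can absorb only so many concurrent jobs before it saturates, and the smallest of those limits caps your useful parallelism no matter how many desktops are idle. The sketch below does that arithmetic; the capacities and per-job loads in the example are made-up numbers for illustration, not measurements.

```python
import math


def max_useful_jobs(capacity: float, load_per_job: float) -> int:
    """How many concurrent jobs a resource can serve before it saturates.

    capacity and load_per_job must use the same unit (e.g. queries/second).
    """
    if load_per_job <= 0:
        raise ValueError("load_per_job must be positive")
    return math.floor(capacity / load_per_job)


def bottleneck(resources: dict) -> tuple:
    """Given {name: (capacity, load_per_job)}, return (name, limit) for the
    resource that runs out first."""
    limits = {name: max_useful_jobs(cap, load) for name, (cap, load) in resources.items()}
    worst = min(limits, key=limits.get)
    return worst, limits[worst]


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    example = {
        "database": (2000, 40),   # queries/second
        "file_server": (400, 5),  # MB/second
        "network": (1250, 5),     # MB/second (roughly a 10 Gb/s link)
    }
    print(bottleneck(example))  # ('database', 50)
```

With these numbers, the 51st concurrent job would push the database past capacity, so adding desktops beyond about 50 would not make the batch finish any sooner.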

Some Examples

Okay, so what does this all boil down to? What types of workloads would benefit from Condor, and what wouldn't?

Good candidates include graphics rendering, protein folding, and any other job whose data can be broken up and processed by parallel instances of the program. Think account-based transaction processing, reporting, or invoice generation. Think nightly batch cycles.
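On the worker side, each instance of the program only needs to know which slice of the data is its own. A common pattern, shown here as a hypothetical sketch, is to pass the job's index and the total number of jobs on the command line and take every Nth record. The script name `make_invoices.py` and the `generate_invoice` function are placeholders for whatever per-account work your batch actually does.

```python
import sys


def my_slice(records, job_index: int, num_jobs: int):
    """Return the records this job owns: every num_jobs-th item starting at job_index.

    Across job_index = 0 .. num_jobs-1 the slices cover every record exactly once.
    """
    if not 0 <= job_index < num_jobs:
        raise ValueError("job_index must satisfy 0 <= job_index < num_jobs")
    return records[job_index::num_jobs]


def generate_invoice(account: str) -> str:
    # Hypothetical placeholder for the real per-account work.
    return f"invoice for {account}"


def main(argv) -> None:
    # Usage: make_invoices.py JOB_INDEX NUM_JOBS (defaults: one job does everything)
    job_index = int(argv[1]) if len(argv) > 1 else 0
    num_jobs = int(argv[2]) if len(argv) > 2 else 1
    accounts = [f"ACCT{n:04d}" for n in range(10)]  # stand-in for the real input
    for account in my_slice(accounts, job_index, num_jobs):
        print(generate_invoice(account))


if __name__ == "__main__":
    main(sys.argv)
```

Run as `python make_invoices.py 1 4`, it handles ACCT0001, ACCT0005, and ACCT0009. In a Condor submit file, the matching line for a four-job batch might be `arguments = $(Process) 4`. Striding through the records, rather than cutting them into contiguous blocks, keeps the jobs roughly balanced when records cost about the same to process.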

What wouldn't benefit? Anything that needs the massive resources of a server-class machine will typically have to stay on one. So will anything that must run at a known location so outside clients can reach it, and anything that conceptually runs forever (that is, a service that stays up handling requests, as opposed to a program that runs to completion and exits). That means software like database servers, web servers, and DNS servers is not a good candidate for Condor-style clustering.

Contact us

If you think Condor clustering might be right for you, please don't hesitate to contact us for a free consultation. Of course, Condor is free software, so you can try it out for yourself by downloading it from the University of Wisconsin's project page at http://www.cs.wisc.edu/condor/. Enjoy!