The first piece of software I posted to the Software section is a new Perl library I wrote to make working with Perl's ithreads easier.
Perl's ithreads make interpreter threads easy to manage, but the framework is very general. The MultiThread.pm library builds on this generic threading framework to make common concurrency paradigms a little easier. While there are several concurrency models, MultiThread currently implements two: the Worker Pool and the Pipeline.
The Worker Pool takes a user-defined subroutine and starts several copies of it. The programmer then places work requests onto a single request queue. These requests are fed into the designated subroutine as quickly as possible. Because each copy of the subroutine runs in its own thread, the workers can run on separate CPU cores and take advantage of today's multi-core machines. As each request is serviced, the results are put onto a response queue to be read by the requester.
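The Worker Pool idea can be sketched with nothing but Perl's core threads and Thread::Queue modules. This is not MultiThread.pm's actual interface, which isn't shown in this post; the do_work subroutine, the pool size of 4, and the use of undef as a stop signal are all assumptions made for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical user-defined subroutine: squares a number.
sub do_work {
    my ($n) = @_;
    return $n * $n;
}

my $WORKERS   = 4;                      # pool size, arbitrary for this sketch
my $requests  = Thread::Queue->new();   # single shared request queue
my $responses = Thread::Queue->new();   # results come back here

# Every worker thread runs the same loop around the same subroutine.
sub worker_loop {
    while (defined(my $item = $requests->dequeue())) {
        # Return the input with the result so the requester can match them up.
        $responses->enqueue([ $item, do_work($item) ]);
    }
    return;
}

my @pool = map { threads->create(\&worker_loop) } 1 .. $WORKERS;

# Queue the work, then one undef per worker as a stop signal (an assumption
# of this sketch, not necessarily how MultiThread.pm shuts down).
$requests->enqueue($_) for 1 .. 10;
$requests->enqueue(undef) for 1 .. $WORKERS;

$_->join() for @pool;

# Completion order is not guaranteed, so key the results by input value.
my %result;
while (defined(my $pair = $responses->dequeue_nb())) {
    $result{ $pair->[0] } = $pair->[1];
}

print "$_ => $result{$_}\n" for sort { $a <=> $b } keys %result;
```

Note that the responses may arrive in any order, which is why the sketch sends each input back alongside its result.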
Pipelines use threads in a slightly different way. Where Worker Pools run several threads calling the same subroutine, Pipelines use one thread per step in the process. This is useful if your problem breaks down into several major processing steps, some of which may complete faster than others.
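In order to move all steps along as quickly as possible, each step is run in its own thread. As a step finishes a request, the results are passed on to the next step in another thread and the just-finished thread picks up a new work request. For example:
Step 1: Load a row into a database.
Step 2: Run a stored procedure for that row.
Step 3: Mark the final status and return a response to the user.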
Using a Pipeline, each of the above steps would run in its own execution thread. In our example, Step 1 loads a row into a database. As soon as the row is loaded, a request is sent to Step 2 to run the stored procedure for that row. For the first row, Step 2 kicks off immediately, and Step 1 goes back to loading another row.
If Step 1 can load 50 rows in the time it takes Step 2 to run a single procedure call, then Step 2 will have 50 new requests sitting on its queue by the time it finishes what it was doing. This is because Step 1 does not wait for Step 2 to complete its run before loading new rows. Step 1 will finish its work long before Step 2 does, but that just means that Step 2 will start and complete each stored procedure call as soon as possible.
As Step 2 completes, its responses are fed into Step 3, which marks the final status and returns a response to the user via the response queue.
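The three-step flow described above can be sketched the same way, again using only the core threads and Thread::Queue modules rather than MultiThread.pm's own interface. The step bodies here (load_row, run_procedure, mark_status) are hypothetical stand-ins that just tag a string; no real database is involved, and the undef shutdown signal is an assumption of this sketch.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# One queue feeds each step; the last queue is the response queue.
my $to_load   = Thread::Queue->new();
my $to_proc   = Thread::Queue->new();
my $to_status = Thread::Queue->new();
my $responses = Thread::Queue->new();

# Hypothetical step bodies standing in for the real database work.
sub load_row      { my ($row) = @_; return "$row:loaded" }
sub run_procedure { my ($row) = @_; return "$row:processed" }
sub mark_status   { my ($row) = @_; return "$row:done" }

# Generic stage: read from $in, apply $step, hand the result to $out.
# A stage never waits for the next stage; it just queues and moves on.
sub run_stage {
    my ($in, $out, $step) = @_;
    while (defined(my $item = $in->dequeue())) {
        $out->enqueue($step->($item));
    }
    $out->enqueue(undef);    # pass the stop signal down the line
    return;
}

# Exactly one thread per step.
my @stages = (
    threads->create(\&run_stage, $to_load,   $to_proc,   \&load_row),
    threads->create(\&run_stage, $to_proc,   $to_status, \&run_procedure),
    threads->create(\&run_stage, $to_status, $responses, \&mark_status),
);

# Feed the first step, then signal that no more rows are coming.
$to_load->enqueue("row$_") for 1 .. 5;
$to_load->enqueue(undef);

# Read responses until the stop signal comes out of the last step.
my @results;
while (defined(my $r = $responses->dequeue())) {
    push @results, $r;
}
$_->join() for @stages;

print "$_\n" for @results;
```

Because each step has exactly one thread reading its queue in order, rows come out of the last step in the same order they went into the first.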
The chief difference between a Pipeline and a Worker Pool is that a Pipeline will never be doing the same thing in parallel. In other words, only a single Step 1 will ever be running at one time. Only a single Step 2 will ever be running at one time. However, Step 1 and Step 2 will almost certainly be running in parallel with each other. By contrast, the threads in a Worker Pool could be doing any part of their processing in parallel, so the logic and required resources need to be absolutely independent.
Using a Pipeline especially helps in cases where there would be resource contention in a Worker Pool model. Since each Pipeline thread is doing something different, you can confine each contended resource to a single step, so that only one thread ever uses it. That way the contention for that resource is eliminated. Managing this in Worker Pools would be much more difficult.
Worker Pools are easy to set up, intuitive to work with, and can dramatically speed up the processing of independent work items. Pipelines take a little more thought in the design phase, but can yield superior results in cases where resource contention is an issue.
In the end, which one you use depends on your application profile. Most of the time the decision isn't a matter of right or wrong, but it is still wise to thoroughly analyze your concurrency requirements, and maybe even try both models out under real-world loads.