PBS "Service Units" Proposal for ITC's Condo Clusters

Katherine Holcomb
July 2006

The major complaint from most users of purchased nodes is that their jobs often wait a long time before starting to run. Although they do get in somewhat faster than other users, most do not recognize this, and in any case what they want is not nodes per se but rapid turnaround. Basically, they want time that they can use more or less on demand.

I propose a method that I believe would alleviate this problem without too much complication. It requires PBS accounting and multiple (at least two) queues, but it should be fairly straightforward to implement. There is at least one PBS accounting package (MyPBS) that can credit and debit accounts. It comes from the University of Maine and is free software. Its unit is the Service Unit (SU), which is the unit used at the national supercomputing centers. Information about this package is at http://my-pbs.sourceforge.net/. It appears to have more features than my simple scheme actually requires, but it would certainly support it.

In my scheme, users would purchase service units. (They could get physical nodes at the end of three years if they wanted, but that's not essential.) One SU would correspond to one cpu-hour. Users with these accounts would be allowed to access a queue with considerably higher priority and, probably, a longer time limit. Users who do not purchase units would be placed into a pool, probably with a relatively short time limit so that priority jobs would be able to get in. I don't know whether this pool would have to be represented as an account of its own, but it doesn't matter as long as particular groups can access the "fast" queue. MyPBS can reject jobs that are not permitted to use the "fast" queue, so non-paying users would not be able to "steal" time. However, no user would be required to purchase time in order to use the clusters.
Also, since the medium of exchange is money, no projects would be reviewed and no subjective allocation would be performed.

Some concrete examples might make this clearer. Suppose we have a cluster with 100 dual nodes and an expected lifetime of 3 years. This cluster would have a total of 5,256,000 cpu-hours available over its lifetime (100 nodes * 2 cpus * 24 hrs * 365 days * 3 years, ignoring leap years). Since the cluster will not be available 100% of the time, let's use a working figure of 4,750,000 cpu-hours (about 90% uptime). All of those cpu-hours would be placed into the "pool" as general service units (SU). (PBS does not allow a cluster to be overbooked, so it should not matter that we will resell some of those hours. Alternatively, we could debit the "pool" as service units are purchased; I don't think it would matter much either way.) Non-paying users would have access to a queue with a default time limit of 24 hours and a maximum of 48 hours.

Users could purchase a "node" by buying 52,560 service units (24 hrs * 365 days * 3 years * 2 cpus). (The charge per service unit would be adjusted so that the price of a node takes into account that it is not up 100% of the time.) Users who "purchase a node" would have access to the "fast" queue. This queue would have 2 or 3 times the priority of the "general pool" queue, would be restricted to paying users, and would have a default time limit of 48 hours and a maximum of, perhaps, our current limit of 168 hours. This way the purchasers would feel they are getting something for their money, but users who are unable to pay would not be blocked.

We could even have an "express" queue with a very high priority, but which might charge 2 SUs per cpu-hour. Users could thus burn their SUs any way they wanted. They could also use the general pool if they wanted to conserve their SUs for some other purpose.
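The arithmetic in the example above can be checked with a short Python sketch. The function and variable names here are illustrative only; the figures (100 dual nodes, 3 years, 4,750,000 working cpu-hours) are the ones used in the example.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years, as in the example


def cpu_hours(nodes, cpus_per_node, years):
    """Total cpu-hours delivered by `nodes` nodes over `years` years."""
    return nodes * cpus_per_node * HOURS_PER_YEAR * years


cluster_total = cpu_hours(nodes=100, cpus_per_node=2, years=3)
su_per_node = cpu_hours(nodes=1, cpus_per_node=2, years=3)

working_total = 4_750_000  # working figure from the example
uptime_fraction = working_total / cluster_total

print(f"cluster lifetime cpu-hours: {cluster_total:,}")
print(f"SUs equivalent to one node: {su_per_node:,}")
print(f"implied uptime: {uptime_fraction:.1%}")
```

The working figure of 4,750,000 cpu-hours corresponds to an uptime of roughly 90%.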
It would probably still be advisable to have a "testing" queue that was open to all, with a limit of 30 to 60 minutes, on four dedicated nodes.
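To summarize the queues proposed here, the following Python sketch models the charging rules as a small table plus a function that debits an account. It is only an illustration of the policy, not MyPBS or PBS code: the names Queue, QUEUES, and submit are hypothetical, the 168-hour limit on the "express" queue is an assumption (no limit is given for it above), the "testing" limit is taken as the upper end of the 30 to 60 minute range, and charging for the requested time at submission is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    paid_only: bool         # restricted to accounts holding SUs
    su_per_cpu_hour: float  # SUs debited per cpu-hour
    max_hours: float        # maximum time limit in hours


QUEUES = {
    "general": Queue(paid_only=False, su_per_cpu_hour=0, max_hours=48),
    "fast": Queue(paid_only=True, su_per_cpu_hour=1, max_hours=168),
    # No time limit is stated for "express"; 168 hours is an assumption.
    "express": Queue(paid_only=True, su_per_cpu_hour=2, max_hours=168),
    "testing": Queue(paid_only=False, su_per_cpu_hour=0, max_hours=1),
}


def submit(queue_name, cpus, hours, balance):
    """Return the SU balance left after the job, or raise if it is rejected."""
    queue = QUEUES[queue_name]
    if hours > queue.max_hours:
        raise ValueError(f"{hours} h exceeds the {queue_name} limit")
    cost = cpus * hours * queue.su_per_cpu_hour
    if queue.paid_only and cost > balance:
        raise PermissionError(f"{cost} SUs needed, {balance} available")
    return balance - cost
```

For example, a 2-cpu, 10-hour job submitted with a balance of 100 SUs would leave 80 SUs in the "fast" queue and 60 SUs in the "express" queue, while the same job in the general pool would leave the balance untouched.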