Minutes of the ITC-Research Computing Group Meeting
June 28, 1999 at 10 AM in Astronomy 117

Members: Dawn, Dee, Ed, Hamp, Jim J., Mark S., Robin, Sue Ellen, Tim S., Tim T., Tom S., Stan
Conveners: Ed Hall
In Attendance: Dawn, Dee, Ed, Hamp, Robin, Tim S., Stan, Tom S., Mark S., and our special guest attendee Andrew Grimshaw.
Recorder: Dawn.
This webpage by Dawn.

Agenda for the meeting. The main item on the agenda of this meeting is a proposal by Andrew Grimshaw regarding PBS and the Centurion cluster.

Background: As part of his investigation into PBS, Ed discovered that Andrew Grimshaw (CS) had installed the package on the Centurion cluster. Ed contacted Andrew and discussed our ideas for installing PBS on the new orange and teal clusters, which led Andrew to make a proposal to our group.

Proposal by Andrew Grimshaw
Andrew started the meeting with a quick slide show on the Centurion cluster. The basic message: he has a large cluster (256 machines) that is very fast. Currently PBS is running on 64 Alpha nodes, and there are plans to expand to 64 Intel nodes (the Centurion cluster is composed of Alpha and Intel boxes, all running Linux).

Andrew would like to move the Centurion cluster into more of a production mode. His proposal is that ITC take over management of the PBS queuing system, and in return Andrew would "open up" the Centurion cluster to UVa researchers.

A question was asked about the availability of compilers on the cluster. In the past the Alpha machines used the GNU compilers, and the Fortran compiler was often unsatisfactory. Recently a beta version of the DEC Fortran compiler was installed on the Alphas, which has been a significant improvement. There has been some thought of replacing Linux with Digital UNIX on the Alphas. The PGI Fortran compilers are installed on the Intel boxes.

The next question was: "What will be the user interface to the Centurion cluster?" There are currently two ways to use the Centurion cluster: one is through the PBS system and the second is via Legion. PBS would require that users get an account on the Centurion cluster. With Legion, there are two ways that one would typically access the cluster. If you are compiling and running your own program, the only requirements are that you are on a machine where vanet is "running" and that you run the Legion setup scripts. If you require software that has been compiled for Legion (e.g., Amber), you would need to use Legion via an account on the Centurion cluster.

Hamp reminded us all of the problems we are encountering with Linux and 32-bit ids. Hopefully, if the number of users migrated to the Centurion cluster is small, this shouldn't be too much of an issue (Hamp has put aside some uids in anticipation of situations such as these). However, it is an issue that is still looming on the horizon.

Tim S. asked for more information about our responsibility regarding the hardware and OS on these machines. Hardware should not be an issue: all machines are under 3-year warranties and Andrew's group will oversee hardware issues. Additionally, Andrew does not foresee needing our assistance with software upgrades. However, because of the requirements of the Legion project, Andrew is likely to install the latest patches, tweak the kernel, etc. ITC will need to coordinate with his group regarding any changes he is making to the machines, downtimes, etc.

At this point Andrew left, and we continued the meeting.

Parallel jobs on the SP2
We digressed a moment to discuss the deployment of parallel jobs on the newly configured SP2. It doesn't appear that many users (in particular Engineering folks) are running parallel jobs as of yet. Tim S. reiterated our need to reach out and be proactive in our recruitment of users for the parallel queues.

There was a brief discussion of which applications can run in parallel (as a way to increase parallel usage of the SP2).
(Tom S) Dyna3d and Abaqus can be run in parallel.
(Ed) The new version of Mathematica may offer a way to run in parallel.
(Hamp) It's possible that ANSYS and GCG can run in parallel.

Discussion on Andrew's proposal.
The group viewed Andrew's proposal favorably. There are still many issues left to be resolved, however. These include (but are not limited to): what exactly is the extent of our responsibilities, with whom (and how) we will coordinate downtimes, and the number of users we will be able to migrate.

For now, we feel that we need to gain a little more experience with PBS on our end (and on our clusters) before we commit to supporting PBS on the Centurion cluster. Ed will contact Andrew and let him know that we are interested but need to gain more experience before we move forward.

Meeting adjourned at 11 AM. Next meeting: July 26 at 10:00 AM.



by Dawn Adelsberger-Mangan