Minutes of the ITC-Research Computing Management Meeting

September 26, 2005 at 10:30AM

 (in ITC-2015 Ivy Road, Room 220)

 

Members: Alice, Hamp, Jim, Terry, Tim S., Tom S., Tim T.

Attending: Alice, Hamp, Jim, Terry, Tim S., Tom S., Tim T.

Chair: Tim T. Recorder: Alice Howard

 

I. Corrections to minutes from last meeting on May 23, 2005?  None noted.

 

II. Ongoing Discussion Topics:

 

1.  Cluster projects:

·    “Condo” Cluster: (cedar.itc) 126 nodes, dual-CPU Opterons (1.8 GHz), 2MB memory, 80GB hard drive. Will have its own disk storage array, like the new ones for birch and aspen. Will operate as a 32-bit cluster for at least the first year.

o         “Condo Cluster” program details preserved at:

http://www.itc.virginia.edu/research/itc-clusters/itc-linux-cluster-purchase-dec04.pdf

o        Data grid gateway server is not yet set up, but we can go ahead and set it up the same way as on the test cluster, and need not worry about any license issues.

o        All nodes of the cluster are powered on now – at any given time, it is likely that a few will be down.  The A/C work should start during the first week in October. 

o        Have decided to replace all the power distribution units (PDUs) in the cluster, so will have to idle one frame (of 32 compute nodes) at a time and schedule downtime for moving the head and storage nodes.  Tim T. will give advance notice of the downtime.

 

2.  Proposal for Linux cluster support as ‘for-fee’ service.

·    Will implement this service as a program with fees (based on an hourly rate) – and will also continue to do “odd jobs” on a time-and-materials basis.

·    Need guidelines for how to handle support for researchers who don't purchase cluster support but expect assistance and guidance from ITC for their cluster.

3.  Linux license server, "linux.license.virginia.edu".  OK.

 

4.  New “disk wedge” leasing rates.  Ability to get additional disk storage space is a CRITICAL issue for researchers.

·    Have 10 people/projects waiting for an answer on this one.  Will finalize rates for a 3-tiered structure; this needs some of Jim's time and then goes through the process.  Will be offering disk in gigabyte chunks.

 

5.  Updated timeline and priority order of all other projects (blue.unix upgrade, others?)

·    The blue.unix upgrade has been an unmitigated disaster so far: having trouble even powering on some units due to a problem with a firmware upgrade.  Have fixes for 2 software issues discovered on holmes under AIX 5.3.  There is also a noticeably slow response when logging in to holmes, which needs to be checked out.

 

6.  Acquiring monitoring tools, as discussed in March, especially MPI Link Checker:

·    Systems will order the MPI Link Checker.

 

7.  Dropping “umenu” on ITC Unix systems (e.g. blue.unix)

·    Still have lots of users using umenu, but not sure for what: they could be going directly to Unix, or they could be using the menu options.  Will start logging which commands are used.  It could take quite a bit of work to make this change, but it might be worth the effort.

 

III. New Business:

 

8.  IBM Deep Computing Symposium October 12-14, San Jose, CA.  Not clear that any staff should attend this one.

 

9.  Condo-Cluster II – kicking off next round of acquisition.  What to keep and what to change?

·    http://www.itc.virginia.edu/research/itc-clusters/cluster-purchase.html

·    Likely that we want to get quotes at the end of December, so need to start publicizing in October.  May want to consider dual-core chips if the costs come down substantially; it is a timing issue.

·    We learned a lot from doing the first condo-cluster – especially about the mechanics of doing the purchase.

·    This new condo-cluster will replace Aspen, and we will take care of space (in the manned room in Carruthers), power, and A/C up front.  (Jim reports there is some really interesting new research on ways to manage the heat.)

 

10.  Should we have a checklist and process for cluster rollouts and upgrades, covering what is tested and checked before release?  Examples from cedar’s rollout:  the change from pico to nano; “cavalier” IPs blocked; libraries, especially MKL libraries; NIS changes.

·    OK to have Katherine and Bill work on this.

11.  Birch maintenance contract expires 12/23/05 – will renew.

 

12.  Questions/requests from CS faculty (CS414 and another course) for a Linux instructional cluster/workstations or a Unix system.

·    One course is looking for places where students can install a Linux overlay (for Nachos?) in their home directories; we are helping them get through it this semester.

·    But in future, should we be offering a Linux lab instead of what we have?

 

13.  For future discussion:  is there a future for Irix/SGIs?  All the applications that we know of now run on other systems (e.g. Linux).  ACHS has had the most Irix/SGI users and they do not really see the need to continue supporting this platform.

 

14.  Also for future discussion:  what, or what more, can we be doing toward the recruitment of 10 NAS-level faculty?

 

Next scheduled meeting of whole Standing Committee group:  Monday, October 31st, 10:30 AM in ITC-2015 Ivy Road, Room #102.

 

Next scheduled meeting of ITC ResComp Management team:  Monday, Nov 28th, 10:30AM in ITC-2015 Ivy Road, Room #220.