Minutes for the ITC-Research Computing Management Meeting

April 6, 2006 at 8:30AM

(2nd floor conference room, 2015 Ivy Rd)

 

 

Members: Alice, Hamp, Jim, Terry, Tim S., Tom S., Tim T.

 

Attending:  Alice, Hamp, Jim, Tim S., Tom S., Tim T.

Chair: Tim T. Recorder: Alice Howard

I.  Corrections to minutes from last management meeting on February 2, 2006?  Not reviewed.

 

II. Ongoing Discussion Topics:

 

·    “Condo” Cluster #2 – review of quotes:

o      http://www.itc.virginia.edu/research/itc-clusters/cluster-purchase.html

o      Specs for the compute nodes included:  dual-processor, single-core AMD Opteron 246 (2.0 GHz); an equivalent Intel Xeon based on published SPEC Floating Point CPU2000 benchmarks; or faster – each with 3 GB RAM, one 36 GB or larger ATA, SATA, SCSI, or SAS disk drive, Gigabit Ethernet, and a 3-year warranty.

o      Quotes were received from IBM, HP, Sun, and Dell – with Dell offering the lowest quote at $1,170 per node.

o      So how many to order?  One big consideration is the amount of power and cooling that is (or will be) available in the Carruthers machine room – this cluster is likely to bring us right to the edge of the maximum amount available.  We also need to figure out how much all the other related equipment (switches, racks, etc.) is going to cost – and then figure out how much of the budget is left for the nodes.

o      Should anyone want to buy into this cluster after our purchase, it looks like we will have nodes to sell.  Also other people at UVa might want to piggyback on this quote when ordering their own servers.

o      We will start off using 32 bit configuration.

·   Cedar Cluster:

o      Trying a test queue to see how it works – if it causes problems, etc.

·   Aspen Cluster:

o      Likely we’ll get rid of Aspen once the new cluster is up and running – might keep a few nodes to make a test cluster for the Research Support group to use.

·   Birch Cluster:

o      What to do with Birch as it ages?  How important is Myrinet? After some discussion, it was concluded that we should keep it running – and also that we should try to assess whether Myrinet provides any actual value to users’ jobs.

·   What about jobs/users with BIG memory requirements?

o      Currently our IBM SMP AIX system serves this need – but it is becoming outdated and AIX is no longer an important factor (especially as jobs can move over to the new Blue now).  We could replace this with a small Linux cluster with a large amount of memory per node (e.g. 16 GB/node).

·   Grid computing – Ed is back to working with Karpovich on grid computing in labs; Hamp needs to ping the AVAKI guy about the gateway.

 

The following topics from previous meetings/agendas were not discussed on 04/06/06:

·   Cedar Cluster:

o      At December meeting, Hamp said Steve would set up Grimshaw’s AVAKI gateway when cluster was down on 12/20 for maintenance.  Has it been set up?

·   Aspen & Birch cluster:  Any issues or concerns?

o      Has the intermittent power failure on Birch nodes been isolated/rectified?

·   Alias/script for “passwd” command on all Linux clusters that gives message to users to change password on blue.unix has been done.  Thanks!

·   Proposal for Linux cluster support as ‘for-fee’ service.  At December meeting, Hamp said he’d translate current hourly Unix support rates into table/information showing cost for supporting cluster of X size for Y years.

·   New “disk wedge” leasing rates.  Have new disk rates discussed in December meeting been approved?

o      $29 per GB per year for the first 10 GB, plus $15 per GB per year for GB 11-100, plus $13 per GB per year for anything > 100 GB.

·   Updated Timeline and priority order of all other projects (blue.unix upgrade, others?).

·   Acquiring monitoring tools as discussed in March, especially MPI Link Checker (65 to 128 nodes is $4250 less 20% = $3400). 

·   Andrew Grimshaw’s request for Linux front-end to campus grid project – December meeting:  Hamp will follow up on this.

·   Update on CSS Research Computing support staff and services relocating to Brown Science & Engineering Library and Alderman Library fourth floor in summer 2006. 

·   Any news on UVa’s participation in responding to NSF’s Request for Proposals for next generation/iteration of Supercomputing Centers?

 

·   Andrew Grimshaw has mentioned to both Ed & Katherine that he’s interested in having ITC take over Centurion 64-bit cluster.  Ed suggested he email Jim/TimS. and TimT. with this request.  Anyone heard from him?

o      No one has heard from him – but we’d be interested – should be similar to cedar except that it’s 64-bit.  We should ask Andrew if he’s interested – also need to find out where he wants it housed (currently it’s in Small).

·   Make cedar 64-bit?  We could, if there is a need or an advantage.

·   Mitch Rosen’s IT – Infrastructure Supporting Research Task Force.  (Maillist is itrtf@virginia.edu)

o      Andrew Grimshaw’s concerns that it should address needs of top 2-5% power users/researchers.

·   UCIT sub-groups:  RCS staff serving on each.  Nancy K. on “Teaching and Student Experience”, TimT. on “Social Sciences & Humanities” and Ed on “Science & Engineering”.

·   Katherine and Ed exploring way in PBS to have multiple queues and usage units (would need accounting turned on).  Cluster purchasers would get X units based on number of nodes purchased.  Anyone can use the general queue, and it does not consume any “units”.  The “Priority” queue requires 1 unit per cpu hour or node or whatever.  This system is similar to ones used by Supercomputing Centers.  Dr Reynolds asked whether we could implement this type of system when he attended the CSS-RCS staff meeting in December.

o      Currently condo purchasers still do not have a good way to see if they are getting what they paid for – we need to do something to take care of this issue, but did not have time at this meeting to figure out what solution to pursue.

·   Should we undertake transition from clear text passwords to Eservices for Windows connections to HSM & Longtmp?  (Per emails from Steve L./Mark S. in late December?)
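
For reference, the tiered “disk wedge” rates listed above ($29, $15, and $13 per GB per year) can be worked through with a short calculation. This is only a sketch: it assumes the tiers are marginal (the first 10 GB at $29, GB 11 through 100 at $15, everything beyond 100 GB at $13), which is one reading of the rates as recorded, and the function name annual_disk_cost is made up for illustration. The rates themselves are the proposed ones from December and may not have been approved.

```python
# Proposed tiers as recorded in the minutes:
# (upper bound of the tier in GB, dollars per GB per year).
# Assumption: tiers are marginal, i.e. each rate applies only to
# the gigabytes that fall inside that tier.
TIERS = [(10, 29), (100, 15), (float("inf"), 13)]


def annual_disk_cost(gb):
    """Return the yearly lease cost in dollars for `gb` gigabytes."""
    if gb < 0:
        raise ValueError("gb must be non-negative")
    cost = 0
    lower = 0  # lower bound (in GB) of the current tier
    for upper, rate in TIERS:
        if gb <= lower:
            break  # nothing left to bill in this or later tiers
        billable = min(gb, upper) - lower
        cost += billable * rate
        lower = upper
    return cost


if __name__ == "__main__":
    for size in (10, 100, 150):
        print(size, "GB:", annual_disk_cost(size), "dollars per year")
```

Under this reading, 150 GB would cost 10 x $29 + 90 x $15 + 50 x $13 = $2,290 per year.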

 

 

Next Scheduled Meeting of whole Standing Committee group: Monday, April 24, 2006, 10:30 AM in ITC-2015 Ivy Road, Room #102. 

 

Next Scheduled Meeting of ITC ResComp Management team: Monday, May 22, 2006, 10:30 AM in ITC-2015 Ivy Road, Room #220.