Minutes of the ITC-Research Computing Management Meeting
(in
Members: Alice, Hamp, Jim, Terry, Tim S., Tom S., Tim T.
Attending: Alice, Hamp, Jim, Terry, Tim S., Tom S., Tim T.
Chair: Tim T. Recorder: Alice Howard
II. Ongoing Discussion Topics:
1. Cluster projects:
· “Condo” Cluster (cedar.itc): 126 nodes, dual-CPU Opterons (1.8 GHz), 2MB memory, 80GB hard drive.
Will have its own disk storage array, like the new ones with birch and aspen.
Will operate as a 32-bit cluster for at least the first year.
o “Condo Cluster” program details preserved at:
http://www.itc.virginia.edu/research/itc-clusters/itc-linux-cluster-purchase-dec04.pdf
o Data grid gateway server is not yet set up, but we can go ahead and set it up the same
as on the test cluster, and need not worry about any license issues.
o All nodes of the cluster are powered on now; at any given time, it is likely that a few
will be down. The A/C work should start during the first week in October.
o Have decided to replace all the power distribution units (PDUs) in the cluster, so will
have to idle one frame (of 32 compute nodes) at a time and schedule downtime for moving
the head and storage nodes. Tim T. will give advance notice of the downtime.
2. Proposal for Linux cluster support as ‘for-fee’ service.
· Will implement this service as a program with fees (based on an
hourly rate) – and will also continue to do “odd jobs” on a time-and-materials
basis.
· Need guidelines for how to handle support for researchers who
don't purchase cluster support but expect assistance and guidance from ITC for
their cluster.
3. Linux license server, "linux.license.virginia.edu". OK.
4. New “disk wedge” leasing rates.
Ability to get additional disk storage space is a CRITICAL issue for
researchers.
· Have 10 people/projects wanting an answer on this one. Will finalize rates for a 3-tiered
structure; this needs some of Jim's time and then goes through the process. Will be offering
disk in gigabyte chunks.
5. Updated timeline and priority order of all other projects (blue.unix upgrade, others?)
· Blue.unix has been an unmitigated disaster so far: having trouble even powering on
some units due to a problem with a firmware upgrade. Have fixes for 2 software issues
discovered on holmes under AIX 5.3. Also, there is a noticeably slow response while
logging in to holmes, which needs to be checked out.
6. Acquiring monitoring tools, as discussed in March, especially MPI Link Checker:
· Systems will order the MPI Link Checker.
7. Dropping “umenu” on ITC Unix systems (e.g. blue.unix)
· Still have lots of users using umenu, but not sure for what: they could be going directly
to unix, or they could be using the menu options. Will start logging which commands are used.
It could take quite a bit of work to make this change, but it might be worth the effort.
III. New Business:
8. IBM Deep Computing Symposium October 12-14,
9. Condo-Cluster II: kicking off next round of acquisition. What to keep and what to change?
· http://www.itc.virginia.edu/research/itc-clusters/cluster-purchase.html
· Likely that we want to do quotes at the end of December, so need to publicize starting
in October. May want to consider dual-core chips if the costs come down substantially;
it is a timing issue.
· We learned a lot from doing the first condo-cluster – especially
about the mechanics of doing the purchase.
· This new condo-cluster will replace
10. Should we have a checklist and process for cluster rollouts and upgrades? For example,
what is tested and checked before release? Things like those on cedar’s rollout: change from
pico to nano; “cavalier” IPs blocked; libraries, esp. MKL libraries,
· OK to have Katherine and Bill work on this.
11. Birch maintenance contract expires
12. Questions/requests from CS faculty (CS414 and another course) for Linux instructional
cluster/workstations or a Unix system.
· One course is looking for places where students can install a Linux overlay (for Nachos?)
in their home directories; we are helping them get through it this semester.
· But, in future, should we be offering a Linux lab instead of what we have?
13. For future discussion: is there a future for Irix/SGIs? All the applications that we
know of now run on other systems (e.g. Linux). ACHS has had the most Irix/SGI users, and
they do not really see the need to continue supporting this platform.
14. Also for future discussion: what, or what more, can we be doing towards recruitment
of 10 NAS-level faculty?
Next scheduled meeting of whole Standing Committee group: Monday, October 31st,
Next scheduled meeting of ITC ResComp Management team: Monday, Nov 28th,