Minutes for the ITC-Research Computing Management Meeting
September 30, 2006 at 10:30AM
(2015 Ivy Rd, 2nd floor Conference Room)
Members: Alice, Hamp, Jim, Joe S., Mark, Robin, Terry, Tim S., Tom S., Tim T.
Attending: Alice, Jim, Joe S., Mark, Tang, Terry, Tim S., Tom S., Tim T.
Chair: Tim T. Recorder: Alice Howard
I. Corrections to minutes from last meeting on August 28, 2006? No.
II. Ongoing Discussion Topics:
· “Condo” Cluster #2 – aka Dogwood:
o Update on configuration and rollout per Jim’s email of September 17: All the nodes are up and Katherine and Ed are load testing. An announcement will go out this week that Dogwood will be available for general research use starting next week.
o Any word from Mitch or SEAS faculty that wanted nodes or Infiniband cluster? Have not heard anything.
o Waiting until the HS faculty gets here to find out if there is interest in purchasing 8-16 of the existing nodes.
o Steven de Wakker (EnviSci faculty) is here and still interested in buying 4-8 nodes – he has ETF funds for this year but not sure how much yet.
· Cedar Cluster:
o OS update schedule of October 10-12 still good. Cedar users will be able to use Dogwood cluster during the down time.
o Avaki gateway setup – new software version released yet? Not heard anything yet.
o Can we set a timeline for Cedar becoming a 64-bit cluster? Discuss this with Bill P. once Dogwood is up. Winter Break would be the best opportunity; otherwise we’re into summer and needing to set up the next new cluster. We are getting a growing number of applications and users for it. Will need to set up test 64-bit nodes and see.
· Aspen & Birch Clusters: (schedule of OS update to Birch will be based on Dogwood rollout)
o Aspen retirement will be ASAP after the Cedar update (Oct 10-12) – will announce Aspen retirement for October 16. Aspen users will be able to migrate to the Dogwood cluster.
o OS update to Birch October 17-19?
o Other issues/concerns?
· Oak Cluster: Did the round-robin login get enabled on July 5? (Tim T. will check with Chip.)
· Teal Cluster: Any issues or concerns?
· Andrew Grimshaw’s request for Linux front-end to campus grid project – still waiting on next version of AVAKI.
o However Marty Humphrey has almost finished a grid Windows client for GLOBUS that does direct file transfers and Katherine is experimenting with it.
o Hope to have our clusters supporting GLOBUS gatekeeper from the start for Marty’s client.
· Proposal for Linux cluster support as ‘for-fee’ service. At December meeting, Hamp said he’d translate current hourly Unix support rates into table/information showing cost for supporting cluster of X size for Y years.
· Update on conversion to authenticate to Eservices via Kerberos for Unix logins/connections to blue.unix, HSM, & Longtmp?
o This is in the second phase of the NetBadge CDP.
o This transition will involve a major user education issue – especially for blue.unix users.
o No timeline for this project yet – it has to wait until the password synchronization web site is up.
o We still have some remaining places/systems where the only option is to authenticate with a clear text password (e.g. ftp to blue)
· Mitch Rosen’s IT – Infrastructure Supporting Research Task Force. (https://www1.seas.virginia.edu/itrtf). (Maillist is itrtf@virginia.edu) Final draft is circulating for input. Katherine is doing the final editing. Ed did most of the appendices that detail comparisons to other institutions.
o CI-Team NSF proposal funded! Had first team meeting last Friday (9/22/06).
o UVa’s participation in ORNL’s response to NSF RFP for next generation/iteration of Supercomputing Centers: our partnership’s preliminary proposal was approved; the full proposal is due on 2/2/07.
· “Priority” access for participants in condo clusters: Katherine’s proposal for PBS “Service Units.”
o Discussed other alternatives. Need more information.
· Update on CSS Research Computing support staff and services relocating to Brown Science & Engineering Library and Alderman Library fourth floor. Things going well.
o Grand Opening/Open House in Scholars’ Lab on October 20, 3-5pm. Karin and James are invited speakers.
o Details about the move and space available at: http://www.itc.virginia.edu/research/rcsc-moves.html
· Mitch Rosen & John Hawley’s concerns about ITC Research Computing support degradation vis-à-vis move to Brown Library.
o Meeting with them went well. No further word of follow-up to date.
· Provost’s “10 Year Plan for BOV” – The Library and IT section includes recommendations for a Center for Digital Humanities and a Center for Advanced Computational Support.
III. New Business
· Jim and Hamp are discussing ways to make Longtmp bigger by using the NETAPP for it and the HSM.
· Jim gave an update on the new HSM coming on line. The current one has a capacity of a couple of terabytes; the new one is 50 TB, though not all of this is for research storage space. Currently planning on dedicating about 5 TB to research storage and will migrate current data to the new one.
· Jim gave some background about the power outage problems in the Carruthers machine room on Sept 1. The problems are being analyzed and seem to involve unanticipated interactions between the generator (25 years old) and the UPS (10 years old), which were not designed to work together. In the short term, if we have another power outage, the Dogwood cluster will be shut down immediately to reduce power needs. See the email he sent about it below.
Next Scheduled Meeting of ITC ResComp Management team: Monday, October 30, 2006, 10:30 AM in ITC-2015 Ivy Road, Room #220.
Please send suggestions, additions, corrections to: Tim Tolson or Alice Howard
Date: Sun, 17 Sep 2006 11:52:23
From: "James A. Jokl" <jaj@virginia.edu>
To: itc-rescomp@virginia.edu
Subject: Carruthers Hall Power
Everyone,
It looks like there are not likely to be any quick fixes for the
power problems in Carruthers Hall. While the equipment ratings are
more than sufficient for the load we have in Carruthers now (even
with Dogwood powered on there should have still been some
additional capacity available), the generator can not reliably
handle the current load with Dogwood powered on. Several
possibilities are being investigated. A likely problem scenario
involves interactions between our generator and the main UPS unit.
All of the power equipment in Carruthers Hall is very old.
We are ready to restore power to Dogwood on Monday. The
operators have procedures in place to immediately power Dogwood
down (via the circuit breakers) as soon as we cut over to
generator. They will also be turning off Dogwood's air
conditioning and some other air conditioning units at the same
time. We'll be working over the next several weeks to automate the
shutdown of the air conditioning and Dogwood as soon as we cut over
to generator.
As part of all of this work we need to do the migration of Aspen
users to Dogwood as quickly as possible. We really need Aspen's
power back for other systems. I'm assuming that a Linux cluster to
Linux cluster migration can happen very quickly. If anyone knows
of a reason why we can't make this happen quickly, please email.
Note that it is not clear that our research computing support
will take the power hit long term. However, it is something that
we can easily control power-wise now so it is at least the initial
victim. We'll need to have discussions with other service owners
and then decide on a longer-term strategy (assuming that there is
no low-cost cure for the power problems).
Jim
