Notes for the
ITC-Research Computing Standing Committee Meeting
May 11, 2009 at 10:45 AM
2015 Ivy Road, 1st Floor Conference Room
Members: Alice, Andrew, Hamp,
Jim, Joe S., Mark, Mike, Robin, Terry, Tim S., Tom and members of Systems,
Research Computing and ACHS as desired
Attending: Alice, Andrew,
Hamp, Jim, Joe S., Mark, Robin, Terry, Tim S., Tom, Katherine Holcomb, Ed Hall,
Kathy Gerber, Dale Castle
Chair: Tom
Recorder: Alice
I. Corrections to minutes
from last meeting on April 13, 2009?
II. Old Business:
- Update on UVACSE APB and related. Queuing
policy / documentation
- New cluster node installation
- Elder will be down (from 5/12) for the ROCKS5
upgrade and the addition of new nodes to the cluster – and back up
by 5/15. Assuming all goes
well, Cedar and Dogwood will get the same treatment the following week.
- Moving Research Computing pages to UVACSE; web
page maintenance; other "transition" questions?
- Even though Andrew Steele is gone, there are
some possibilities for getting some web page maintenance
accomplished: identify the
deadwood in the Research Computing pages, and maybe ITC's Customer
Communications group would do the pruning; Andrew G. could hire some
students for a 'project' if we can settle on what needs to be done. Research computing support web
pages are spread over ITC's web (for 'the basics'), the Library (Scholars
lab, Research computing lab), and UVACSE. Tom is looking at multiple-site search options from
Google, and could talk to Tom Skalak about a 'home' for all this
information.
- Nothing has happened with respect to determining
which Research packages are 'basic research' and which are 'HPC'. Need to look at ITC's research
software downloads and see if the contact info and links to document
locations are correct.
- Storage needs assessment and strategy (e.g.,
Lustre?)
- CS Department experienced an electric outage
that brought Lustre down – but Lustre coped as the nodes came back
up again – so the experience was good.
- So what do we want to do? Hamp and his group could talk to
Scott about the CS experience – and then we could try it out. There was some discussion about
users' exceeding capacity and the impact of taking nodes up/down –
CS uses resource-based routing and users have to specify how much memory
they require. Hamp suggested
that we tie all our storage nodes together before trying this on our
compute nodes.
- With memory so cheap now, Andrew suggested that
we buy 2 GB per node – Hamp will price memory and we will look at
it. UVACSE could add a drive
to each node and we could add them in one at a time. We could do a better Lustre
experiment on more nodes.
- PBS Renewal (pricing?)
- The last time Katherine checked, the renewal
pricing was the same as last time (~$11,000) – Katherine will go
back and get a quote. So,
should we continue to fund this for another year? And/or try to find an alternative
(Sun Grid Engine/SGE, Torque/Maui)?
We need resource-based routing and pre-emption – it would be
nice if it had some 'accounting'.
Let's look at and experiment with SGE. If we are going to switch, need to figure this out
soon.
- Cray benchmarking
- Cray is up and running in CS – can be
experimented with if you have a CS account.
- Service Unit allocation
- Andrew, Mike, and Murielle meeting about this
next week.
III. New Business:
- Funding for future clusters
- As we get feelers from some
researchers/prospective-new-faculty and engage with departments about
continuing to add 'condo cluster' nodes, what kind of options could there
be with respect to form-of-payment, trade-offs for contributions to a
shared pool of resources, hardware (e.g. not one-of-a-kind), service
contracts, physical location and physical access, etc.?
- Terabyte storage
- MPICH / OpenMPI
- Facilitator for June 8 meeting
- Katherine volunteered to facilitate
- For that meeting we will include:
- Hamp's report on SGE
- Experiment with Lustre on 4-8 drives
- Any other Items?
- Andrew mentioned a new NSF ARI (Academic
Research Infrastructure) program to use Stimulus $$ for research
infrastructure at universities (e.g. networking, power/generators,
cooling, etc.) – have to make a science case that the
infrastructure is in support of science research.
IV. Adjourn by 11:45 and next
meeting
- Next Scheduled Meeting of ITC ResComp Standing
Committee: June 8, 2009 in ITC-2015 Ivy Road, Room #102 (First Floor
Conference Room)
Please send suggestions, additions, corrections to: Tom or Alice