Minutes of the
ITC-Research Computing Standing Committee
(new location: ITC-2015 Ivy Road, Room 102)
Meeting on
Members:
Alice, Bill, Dawn, Ed, Hamp, Kathy G., Katherine H., Jim, Mark S.,
Martha, Michael, Robin, Sue Ellen, Steve, Terry, Tim S., Joe Simard, Tom S.
Attending: Alice, Bill, Dawn, Ed, Hamp, Kathy G., Katherine H., Jim,
Martha, Robin, Tim S., Joe Simard, Tom S.
·
Under II.3. Update on
II. Ongoing Discussion Topics:
1. VPN and IP filtering for FlexLM license daemons.
· DONE – Hamp noted that the software product ERDAS has a license manager that only runs on
Solaris, so it’s being put on “solaris.license.virginia.edu”.
· Katherine reminded us that the newest version of ANSYS has a license manager that only runs
on 64-bit AIX, so for now we’ll continue to run its older license manager on aix.license.
2. Update on delay due to Watson problems – may move back to crick?
· Watson’s problems are solved.
· Can proceed with plans to split crick – put an SGI (crick) behind the VPN firewall
for HIPAA use/compliance – need to identify the user community that needs
access as some may need a token from us.
· Also check if crick still has the license manager for Sybyl – if so, need to move it.
3. Teal cluster.
· Can proceed with proposal: PBSPro upgrade. Take the 2 processors removed from
crick and put them into one teal node; use Craylink to hook them together. If the
other 4 CPUs are compatible, link them as well. Define a “teal-login” machine –
block interactive login on all but teal-login.
4. DNS allocations in 10.x.x.x range update:
· Had a meeting and circulated 2 documents (a general description of IP addresses,
both public and private, and a description of private address space only).
·
Going to publicize this through the LSPs and the Research Computing Newsletter.
· Need to keep the documentation somewhere as a “permanent” announcement and keep
it restricted to on-grounds access (128.143.x.x).
· Unix Systems will need to re-allocate the Aspen & Birch cluster node addresses when
the clusters are next rebuilt, to comply with this new policy.
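As an illustration of the two address ranges mentioned in this item, the following is a minimal sketch only. It assumes that the private range is 10.0.0.0/8 and that "on-grounds" means 128.143.0.0/16, as the x.x notation above suggests; the function name classify_address is hypothetical and not part of any ITC tool.

```python
import ipaddress

# Assumed ranges, read off the x.x notation used in these minutes.
PRIVATE_RANGE = ipaddress.ip_network("10.0.0.0/8")         # 10.x.x.x
ON_GROUNDS_RANGE = ipaddress.ip_network("128.143.0.0/16")  # 128.143.x.x


def classify_address(addr: str) -> str:
    """Label an IPv4 address as 'private', 'on-grounds', or 'other'."""
    ip = ipaddress.ip_address(addr)
    if ip in PRIVATE_RANGE:
        return "private"
    if ip in ON_GROUNDS_RANGE:
        return "on-grounds"
    return "other"
```

A check of this kind could be used, for example, to confirm that rebuilt cluster nodes ended up with addresses in the private range.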
5. Status of getting a 64-bit frontend of the SMP (mp0.itc)?
· Can have one with older technology (an H70, 2 processor) from our old dump machine,
since we are getting a new dump machine from SEAS – needs to be scheduled.
6. Timeline for installation and testing of the 2.6 kernel upgrade on the Aspen & Birch
clusters? End of July?
· No ROCKS yet.
· The next
· Hera @ RCSC could be upgraded to Fedora Core 2 and could serve as a test platform –
Fedora Core did come out.
· Upgrade to the Intel compiler 8.0 will be done at the time of
the OS upgrade.
· Next version of IMSL will be compiled with the Intel compiler, version 7.0 (maybe 7.1),
rather than the PGI compiler. Therefore we have to be sure to retain the rpms even if
Intel doesn’t keep them available on their Web site.
· We can retain the two 7.0 licenses as long as we wish and use them concurrently with
our 8.0 licenses, since the compilers are purchased rather than “leased.” However, the
Intel representative was not able to change our 7.0 licenses to work on aix.license.
Therefore we will need to keep an Intel compiler license manager running on jeeves.
7. Update on SEAS SUR grant clusters – Hamp, Ed, Katherine, and Tim met with Jeff Chisolm, Sean Whipkey and Mitch
on
· The IBM engineer is doing another rebuild/reinstall – they have to use RH Enterprise 3.
· Ed & Katherine are waiting on the IBM engineer to finish the rebuild and then can
help Mitch with testing.
8. Linux distribution discussion. Concern of Mitch and other researchers. Unilateral
announcement in early April.
· Will put this in the next newsletter for better publicity. We might be able to offer an update service
for
III. New Topics:
9. Unix Systems is considering offering Linux Cluster support as a for-fee
service – this is a change from a previous statement – add this to the next
meeting agenda.
10. /longtemp is over-subscribed; can we get more disk space?
· Need to start nagging oldest/largest users to move off – Hamp could write a
cleanup utility that makes it automatic.
· Getting some larger decommissioned disk arrays from Alderman and could use these.
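As a rough idea of what a cleanup utility for this item could start from, here is a minimal, non-destructive sketch. It only lists files and deletes nothing. The function name find_stale_files and the use of modification time as the measure of age are assumptions for illustration; the minutes do not specify how the utility would work.

```python
import os
import time


def find_stale_files(root, max_age_days, now=None):
    """Return (path, size_bytes, age_days) for files not modified in
    max_age_days, largest first. Nothing is deleted."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime < cutoff:
                age_days = (now - st.st_mtime) / 86400
                stale.append((path, st.st_size, age_days))
    # Largest files first, so the biggest candidates are seen at the top.
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale
```

A report built from find_stale_files("/longtemp", 30) could be mailed to the owners before anything is removed; the 30-day threshold is only an example value.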
11. Discussion about whether we should implement
a two queue system in PBS on
· Two concerns: trying to improve throughput and increase turnover in nodes by having
a “short” and a “long” queue, and needing a way for users to do interactive debugging.
· Could use PBS as is and encourage users to estimate needed time and checkpoint their jobs.
· An advantage of the current single queue is that all nodes on each cluster are the
same and equal, so it is less critical if one fails or needs to be removed for maintenance.
· For users who need testing/debugging, could use Totalview on the frontend rather than
have an express or test queue.
· Decided to get some stats on the use of requested time vs. actual time, then send
email to educate users about wall time, and not implement a two queue system for now.
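The requested vs. actual time stats could be gathered along the following lines. This is a sketch under stated assumptions: it assumes job-end records that carry Resource_List.walltime and resources_used.walltime fields in HH:MM:SS form, as in PBS accounting logs, and the actual log format on our clusters should be checked first. The function name walltime_usage is hypothetical.

```python
import re

# Assumed field names, modeled on PBS accounting job-end records.
REQUESTED = re.compile(r"Resource_List\.walltime=(\d+):(\d{2}):(\d{2})")
USED = re.compile(r"resources_used\.walltime=(\d+):(\d{2}):(\d{2})")


def _seconds(match):
    """Convert an HH:MM:SS regex match to seconds."""
    hours, minutes, seconds = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def walltime_usage(lines):
    """Return (requested_s, used_s, fraction_used) for each record that
    contains both a requested and a used wall time."""
    rows = []
    for line in lines:
        req = REQUESTED.search(line)
        used = USED.search(line)
        if not (req and used):
            continue  # not a job-end record, or a field is missing
        req_s, used_s = _seconds(req), _seconds(used)
        if req_s == 0:
            continue  # avoid dividing by zero on malformed requests
        rows.append((req_s, used_s, used_s / req_s))
    return rows
```

A low fraction across many jobs would suggest that users are over-estimating wall time, which is the point the planned email would address.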
12. Birch crashes in April (head node down 4/10-11, 4/29 & 5/1) were not reported to
postnews, and the downtimes were not announced.
· Have a new kernel and it’s better now – do need to post downtimes and unplanned
events for the front-end.
13. Discussion of meeting with IBM on
· There was some good discussion and it was good to get the UVa folks together. It was
mostly the services side of IBM.
14. Discussion of Common Solutions Group meeting at the Boar’s Head in early
May. Presentations at www.stonesoup.org/Meetings/0405/redux.pres.
· Research computing support is a common challenge – some universities manage with
little central support while others have large central facilities and staff.
· We do not do a good job of attracting corporate or grant funding.
· Other universities are trying a “condominium” approach to clustering. Is this
something we could or should try here? Would have to demonstrate to researchers that
it would be worthwhile for them – and there is a big spectrum of what works for
different research groups. In some models researchers keep ownership of their own
nodes – in others, they contribute their nodes to a larger cluster. Some use a
three-year cycle with an annual purchase scheme – need compatibility of hardware for
nodes. Would have to market this to researchers and get their buy-in.
15. Hamp reported that
16. Next version of PBS may have a web frontend for users, which would be
good to implement.
Next meeting is scheduled for Monday, June 28 -- but with
multiple anticipated absences it is likely to be postponed to Monday, July 26.