Notes for the
ITC-Research Computing Standing Committee Meeting
March 9, 2009 at 10:45 AM
2015 Ivy Road, 1st Floor Conference Room
Members: Alice, Andrew, Hamp, Jim, Joe S., Mark, Mike,
Robin, Terry, Tim S., Tom and members of Systems, Research Computing and ACHS
as desired
Attending:
Alice, Andrew, Hamp, Jim, Joe S., Mark, Mike, Robin, Tom, Ed Hall,
Katherine Holcomb, Kathy Gerber
Chair: Tom
Recorder: Alice
I. Corrections to minutes from last meeting on February 9, 2009?
II. Old Business:
- Update on UVACSE APB and related. Queuing policy / documentation
- There was some discussion about whether the new queuing policy had been implemented or documented.
- There was also discussion about the possibility of having a single queue with resource routing; who might work on this?
- Andrew wants the new policy in place (announced, documented, implemented) by the time the new equipment comes. Could this get done by the next Town Hall on 03/20/09? Jim and Andrew will work this out with Hamp and Bill P.
- The new policy could be announced on the new UVACSE web site, and we could redirect the high-performance-computing information on ITCWeb so that it is re-branded as UVACSE resources.
- So what would it take to get the new queuing policy implemented? What are the potential impacts on users, and how much help might they need? Katherine will head a sub-group (including Bill P.) that will come up with a configuration proposal and a list of challenges that would arise, then set goals and a migration plan, in the next week or so.
- Storage needs assessment and strategy (e.g., Lustre?)
- The CS department is using Lustre; they currently have 40 TB of storage and are adding more. It has been effective for CS so far. There is some concern over the impact on the network, so they are keeping network stats for a while. If this experiment seems successful, it could be tried more widely at UVA.
- Andrew recommends: (1) waiting and seeing how Lustre works for the CS department (and Michael Shirts); (2) then trying an internal experiment. (Note: Lustre drives need to be network-addressable and running under Linux; they are not backed up.)
- Meanwhile we could add an extra drive to each new node; Andrew, Jim, and Hamp to look into this.
- Cluster node purchase (Update from Hamp)
- Hamp reported that 26 nodes have been received (8 for Joe Mychaleckyj, 16 for Michael Shirts); the InfiniBand equipment came last week. The current version of Rocks will be installed and tried first, moving to the next higher version of Rocks if needed.
- PBS Renewal (pricing?)
- Katherine will check on pricing.
- There was some discussion about the pros/cons of staying with PBS (costly on an annual basis) vs. going to Torque (may have problems with throughput and robustness).
- Leave this on the agenda and we will reconsider it; some cost-sharing for PBS might be an option.
- VPN Solution for 64-bit Windows (esp. remote Matlab use)
- We are slowly moving towards a solution; meanwhile an ssh tunnel does work, and Mark is updating the documentation.
- Compiler usage (Hamp)
- Save for next meeting's agenda. We are looking for PGI compiler usage data so we can evaluate whether we need to keep the PGI compilers in addition to the Intel compilers.
III. New Business:
- Cluster usage reporting to users (Andrew)
- UVACSE's new hire will merge and sort what Tom sent before the next Town Hall meeting.
- There was some discussion about using the PBS logs to produce regular usage reports. Hamp already has a MySQL database, created by processing the PBS logs, that can be used for this purpose. Hamp will email Andrew the schema, plus a username/password for access to the database.
- Ownership of /common directories (Katherine/Andrew)
- Katherine, Ed, and UVACSE staff will have access through a UVACSE account, and we will remove the RCSC account (but keep the USERV account, which is used for various other purposes).
- Any other Items?
- Remember the UVACSE Town Meeting on March 20, 3-5pm, in MEC 205; the short course schedule is coming out too.
- The CS department got a Cray (4 nodes with InfiniBand); UVACSE is doing some benchmarking.
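Note on usage reporting from PBS logs: the minutes do not describe how the PBS logs are processed, so the following is only a minimal sketch of one way to total walltime per user from accounting records. It assumes the usual PBS/Torque accounting layout of `date time;record type;job id;key=value ...`, with job-end records marked `E`. The function names, the sample records, and the user names are all hypothetical and are not taken from Hamp's database or schema.

```python
from collections import defaultdict


def parse_hms(text):
    """Convert an HH:MM:SS walltime string into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def usage_by_user(lines):
    """Sum walltime (in seconds) per user over job-end ('E') records."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.strip().split(";", 3)
        # Skip malformed lines and anything that is not a job-end record.
        if len(parts) != 4 or parts[1] != "E":
            continue
        # The message text is a space-separated list of key=value pairs.
        fields = dict(
            item.split("=", 1) for item in parts[3].split() if "=" in item
        )
        user = fields.get("user")
        walltime = fields.get("resources_used.walltime")
        if user and walltime:
            totals[user] += parse_hms(walltime)
    return dict(totals)


# Invented sample records, for illustration only.
SAMPLE = [
    "03/09/2009 10:00:00;E;101.head;user=user1 queue=batch resources_used.walltime=01:30:00",
    "03/09/2009 11:00:00;S;102.head;user=user2 queue=batch",
    "03/09/2009 12:00:00;E;102.head;user=user2 queue=batch resources_used.walltime=00:45:00",
    "03/09/2009 13:00:00;E;103.head;user=user1 queue=batch resources_used.walltime=02:00:00",
]

if __name__ == "__main__":
    for user, seconds in sorted(usage_by_user(SAMPLE).items()):
        print(f"{user}: {seconds / 3600:.2f} hours")
```

A regular report could run something like this over each day's accounting file, or the same totals could be computed with a query against the existing database once the schema is shared.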
IV. Action Items:
- The sub-committee chaired by Katherine (with Hamp, Bill P., and Ed) will look into implementing the new queuing policy (including configuration, challenges, and a migration plan), and also investigate the possibility of implementing a single queue with resource routing.
- Katherine will check on PBS renewal pricing.
- As part of the Lustre experiment, Andrew, Jim, and Hamp will look into adding an extra drive to each new node.
- Hamp will look into PGI compiler usage.
- Hamp will email Andrew the schema for the MySQL database of PBS log info and give Andrew access to this database.
- Add UVACSE account and remove RCSC account.
V. Adjourn by 11:45 and next meeting
- Next Scheduled Meeting of ITC ResComp Standing Committee: April 13, 2009 in ITC-2015 Ivy Road, Room #102 (First Floor Conference Room)
Please send suggestions, additions, corrections to: Tom or Alice