Minutes for the ITC-Research Computing Management Meeting
January 22, 2007 at 10:30AM
2015 Ivy Road, 2nd Floor Conference Room
Members: Alice, Hamp, Jim, Joe S., Mark, Mike, Robin, Terry, Tim S., Tom S., Tim T.
Attending: Alice, Hamp, Jim, Joe S., Mark, Mike, Robin, Terry, Tom S., Tim T.
Chair: Tim T. Recorder: Alice Howard
I. Corrections to minutes from last meeting on December 18, 2006? None.
II. Old Business:
- We are planning a public input forum on our proposed expenditures and have
set a meeting for January 30 from 10:00 to 11:30 AM in the Newcomb Hall
Commonwealth Room to present our situation and our proposal for
allocating $150K (or possibly a bit less) in hardware funds. Mike will
moderate the forum; we’ll spend a few minutes on the current environment,
then move on to the options that have been considered and our
recommendations.
- Use Nortel switches on dogwood, rather than spend some of the $150K on the
best possible switches; the difference is between better and best. Could it
be characterized as 80% vs. 100%?
- Not renew the Birch maintenance contract for another year; instead we may
have to let nodes die as equipment fails if parts are not available (e.g.,
we can't find these older Myrinet cards anywhere). Expect at most 2 or 3
nodes to have such failures.
- This maximizes the allocation available to buy a 24- to 36-node InfiniBand
cluster (for fine-grained parallel) to replace birch. (Note: there was some
discussion about whether or not we have users who would really use a
high-speed interconnect, and we could get more nodes and memory without
the InfiniBand.)
- PLUS 1 or 2 very large memory computers (not clusters): 4-way CPU with
32 GB memory, running Linux, to replace "athena". Requires a 64-bit
kernel. (Hamp estimated that a 2-way box with 8 GB would cost around
$3,200/node; 16 GB/node about $4,500.) Intel CPUs are slightly more
expensive than comparable AMD.
- Two proposals for "buy-in" to the cluster, both the new one and the
existing dogwood. One gives priority to running jobs anytime you want by
dedicating X purchased nodes to the exclusive use of the researcher. The
other gives priority queue access to the entire cluster. A draft of these
two proposals is detailed at:
http://holmes.itc.virginia.edu/ResNotes/cluster-purchase.html
- Jim & Hamp, at the December 18 meeting, were going to look into a rate for
buying "priority" time on the dogwood cluster. There was some
discussion about what “priority” might actually mean, and about whether
or not we are trying to encourage this service. The technology is easy,
but there could be issues for users (e.g., the need to code with
checkpoints so that jobs could be restarted if pre-empted). “Priority”
could also mean next-in-queue.
- Or we could allow users to “buy” nodes for some period of time in advance,
i.e., reserve a week ahead so others would not be allowed to start jobs
on them. This has appeal.
- And we could also have a pre-emptive queue for “quick” jobs or jobs that
could get yanked.
- Tom S.: consider VMware; there is a 10% performance hit, but it might help
solve the checkpoint issue. Hamp thinks a 10% performance hit is pretty
steep given how maxed out the clusters are on usage.
- Mark: Have a “pre-emptible” queue.
- Tim T. will sketch these up; Jim and Hamp will have a go at figuring a
rate/charge.
- Tom S.: Genomics might want to buy in to the cluster, but an
InfiniBand-type cluster is not appropriate for them. They need large-memory
computers.
- Linux Clusters: Birch, Cedar, Dogwood - any issues, concerns or upcoming
updates/changes?
- Status of cedar test cluster switching to 64-bit OS? Later this week.
- All Nortel switches installed on dogwood now? Yes.
- Any update on the conversion to authenticate to Eservices via Kerberos for
Unix logins/connections to blue.unix, HSM & Longtmp? (AKA Phase 2 of
Netbadge CDP).
- HSM: moving the data will be a long, slow migration (probably a couple of
months). We might have to suspend service briefly to finish the
change-over. There will be a Samba client that does Eservices login.
- Longtmp: has been moved to a network appliance and is using Eservices.
- This transition will involve a major user education issue – especially for
blue.unix users. We are doing a pilot with Eservices authentication on
the Student Council server and will learn from this experience.
- No timeline for this project yet – it has to wait until after the “Unified
Password” web site is up.
- ftp to blue.unix is being phased out; the ITC-Service Transitions CDP is
working on a schedule.
- Infrastructure Supporting Research Task Force
(https://www1.seas.virginia.edu/itrtf/). (Maillist is itrtf@virginia.edu)
- CI-TEAM NSF proposal funded. The course has a time (MWF, 9-10) and course
number (CS494 Computational Mathematics) and is fully enrolled (40+
undergrads, mostly 3rd & 4th year, a mix of E-School disciplines, not all
CS).
- UVa's participation in ORNL's response to the NSF RFP for the next
generation/iteration of Supercomputing Center. The full proposal is due on
2/2/07. Any ITC participation or input? Mike will contact Mitch Rosen, who
is coordinating.
- Update on longtmp and HSM upgrades? See notes above. (HSM scheduled for
spring07 semester)
- Longtmp switched to NetApp - more disk space now. What should the snapshot
frequency be? Hamp will check.
- SURA survey request: Tim T. will contact Phil about pulling together a
group, possibly by email. No news.
- At the Dec 14 meeting, we agreed that Tim T. would contact Elec.
Engineering about dropping the 8 Ultra60s in E225 and ITC no longer having
a presence in E225 after the Spring 2007 semester. We are willing to
transfer the Ultra60s to Elec. Engineering if they want them. Will put one
or two each of Linux, Sun and Macintosh workstations in the ResComp Lab in
Clark. No news.
- EnvSci request to get nodes from the Aspen cluster. Agreed 12/14 to have
Tim T. meet with the EnvSci faculty coordinator of LSP making the request
to see if there are other ways we can meet the needs generating this
request.
- They mostly just want to experiment and learn.
III. New Business:
- Jim and/or Tim S. - Report on StoneSoup.org meeting in early January. Mike
was there as well.
- There was a survey at the last meeting about how different schools do
support; see www.stonesoup.org for results.
- Other schools are also having Data Center problems. Indiana is providing a
massive data storage system known as the data capacitor.
- Mike - report on new Computational Science Advisory Council: this group
will be working on implementation issues from the Task Force report from
Spring06. It will stay at the strategic level, so it is not the kind of
advisory group for ITC.
- Any other items? Tom S. reported that there is a new program in Public
Health Genomics with 18 faculty spread over 3 locations.
IV. Adjourn by 12:00 and next meeting
- Next Scheduled Meeting of ITC ResComp Standing Committee: Monday,
February 26th, 10:30 AM in ITC-2015 Ivy Road, Room #220.
Please send suggestions, additions, corrections to: Tim Tolson or Alice Howard
