Draft
Date: Thu, 20 Sep 2001 08:55 AM
Folks,
In an effort to get this process completed I have taken the HPC report
that Tim Sigmon sent us a couple weeks ago and have done the
following:
I added an opening cover letter to Bob Reynolds
I rearranged things a bit to move the main recommendations to the
front and to put the details and background information into
appendices.
I made a few changes here and there to try to accommodate the points of
view expressed in the emails reacting to the earlier draft.
The new version is appended.
I welcome comments and corrections, but am bound and determined to
bring this to completion ASAP.
-John
--------------------------------------------------------------------
To: Robert E. Reynolds, M.D.
Vice President and Chief Information Officer
Dear Dr. Reynolds:
This document is the report of the ad hoc committee on High
Performance Computing (HPC). This committee was created as a
consequence of the March 2001 Research Computing Task Force Report
recommendation to "convene an ad hoc panel of technical experts and
involved researchers to identify and acquire appropriate successors to
the IBM SP, one of which should include a parallel system."
Our committee met several times. Discussions ranged from the appropriate
goals for HPC at the University to the detailed costs and capabilities
of specific hardware platforms.
Although the committee did not make cost its primary consideration, we
nevertheless attempted to make realistic recommendations that could be
accommodated within current budgets.
As per the committee's charge, we focused on recommendations relating
to hardware acquisition. It should be noted that the ability to
promote HPC at the University, as well as to develop our knowledge and
collaborations with other research centers, including the national
supercomputing centers, goes well beyond hardware. It includes the
staff to support, maintain, and allocate resources on the hardware, to
provide essential services and expertise, and to promote interactions
among research faculty and students that would lead to the most
productive use of the facilities. This underscores the close coupling
between our hardware recommendations (which would provide capabilities)
and the support center recommendations of the RCTF and the ad hoc
Advanced Computational Support Center committee (which provide guidance
on the most effective use of those hardware capabilities).
Finally, we note that the charge of the HPC committee, to assess the
hardware and configuration of the HPC environment at UVA, will need to
be undertaken on a regular basis by a research computing oversight
committee composed of faculty and ITC staff.
Yours Truly,
John Hawley (Astronomy, Committee Chair)
Tim Sigmon (ITC, Committee Facilitator)
Hamp Carruth (ITC)
Ed Hall (ITC)
Katherine Holcomb (CS)
Mark McKeown (ITC)
Matt Neurock (Chem-Eng)
Bill Pearson (Biochem & Mol. Genetics)
Mitch Rosen (Engineering)
Steve Stern (Economics)
Tim Tolson (ITC)
REPORT OF THE AD HOC COMMITTEE ON HIGH PERFORMANCE COMPUTING
1. Goals of ITC's High Performance Computing
----------------------------------------------
The committee identified the following general goals for High
Performance Computing at the University of Virginia:
* Maintain a level of institutional computational capability that is
consistent with peer institutions.
* Provide platforms for high performance computing in support of the
research activities of UVa faculty that require more resources than
can be accommodated with a single user's workstation.
Current University HPC facilities and their limitations are outlined
in Appendix A.
* Identify and maintain an upgrade path that is consistent with national
trends in High Performance Computing, and which would be scalable in
the event that new funding opportunities occur. Representative
systems at the National Centers are described in Appendix B.
* Provide appropriate platforms for development and testing of
applications that may later migrate to national supercomputer centers
for production work. The systems ought to be powerful enough to
demonstrate capability and increase the likelihood for success in
applying for time at national supercomputer centers.
* Provide the appropriate facilities that will permit researchers,
including both graduate and undergraduate students, to gain necessary
experience with parallel systems and parallel programming paradigms.
* Provide platforms for the training of and development by ITC staff in the
administration and uses of highly parallel systems.
2. Current User Requirements
---------------------------------
The committee began its deliberations with a consideration of the
general requirements for researchers using HPC. The following
specific hardware needs were identified:
* A system for developing and running large distributed memory parallel
codes.
* A system for developing and running shared memory parallel codes.
* A system for running many serial jobs. The performance of the system will
be defined by how fast serial jobs can be executed and by the number of
jobs a user can have running concurrently.
* A system for running large memory jobs. Currently the largest memory
available in machines provided by ITC is 1GB.
* Systems that meet platform specific requirements; e.g., certain research
tools are tied to specific platforms or operating systems.
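As a rough illustration of the serial-job requirement above, the
following Python sketch estimates the wall-clock time for a batch of
equal-length serial jobs as a function of how many jobs can run
concurrently. The function name, job counts, and run times are
hypothetical values chosen for illustration; they are not committee data.

```python
import math


def batch_wall_time(n_jobs, hours_per_job, concurrent_slots):
    """Estimate wall-clock hours to finish n_jobs equal-length serial jobs.

    Assumes every job takes hours_per_job and that at most
    concurrent_slots jobs run at once, so jobs complete in "waves".
    """
    if concurrent_slots < 1:
        raise ValueError("concurrent_slots must be at least 1")
    waves = math.ceil(n_jobs / concurrent_slots)
    return waves * hours_per_job


if __name__ == "__main__":
    # Hypothetical workload: 100 jobs of 2 hours each.
    for slots in (1, 8, 32):
        hours = batch_wall_time(100, 2.0, slots)
        print(f"{slots:3d} concurrent jobs -> {hours:6.1f} hours")
```

Under this simple model, both faster CPUs (a smaller hours_per_job) and
a larger number of concurrent jobs per user reduce the total time.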
When considering specific hardware acquisitions, these requirements can
be re-expressed as the following questions:
* How many CPUs ought to be provided for a parallel distributed memory
environment?
* How fast should the interconnect be in such a system?
* How many CPUs can ITC reasonably dedicate to a user who needs a
shared memory environment and how much memory can they access?
* What performance levels should centralized ITC hardware provide compared to
the current state-of-the-art machines?
* What is the appropriate level of resources (CPUs, memory, etc.)
that ITC and UVa should supply to researchers, in comparison to what
is provided by the national supercomputer centers? To what extent
should we require cutting-edge research at UVa to be beholden to
outside facilities? How will these resources be increased
year-to-year to match the growing demand for memory by researchers?
3. Committee Recommendations
----------------------------
The committee concluded that ITC has not invested sufficient
resources into high performance computing for a university of UVa's
stature. It was pointed out that many small research groups at UVa
have HPC resources that are superior to what ITC provides to the
University as a whole. This is not because those groups are so far
ahead, but because the University as a whole has lagged so far behind.
Ideally many of the identified immediate requirements for HPC could be
minimally satisfied by a large SMP machine with 32 or more CPUs, and a
Beowulf system with at least 128 nodes. However, budget considerations
appear to make such a solution unrealistic at present. Instead, the
HPC committee recommends that ITC pursue a two-pronged approach:
acquire a Beowulf cluster, which will provide a new, powerful, and
cost-effective platform, while upgrading certain existing research
computing systems to satisfy the needs for SMP systems and to maintain
consistency with existing software requirements.
Further details of the committee's deliberations are given in Appendix C.
Beowulf Cluster
---------------
ITC should acquire a Beowulf cluster consisting of 96 to 128 nodes with
either single or dual processors. As much of the cost of such a system
is in the interprocessor communication, for greatest cost-effectiveness
the system should be divided into two parts. One part would have a fast
interconnect, such as Gigabit Ethernet, Dolphin, or Myrinet, that would
be used for running/developing parallel codes with high communication
requirements. The second part would use less costly 100Mb switched
Ethernet interconnect and would serve as a substantial hardware pool
for serial jobs and parallel jobs that are not communication
intensive.
All nodes of the cluster need not have the same capabilities. For
example, there should be at least 512MB per CPU across the system, but
some of the nodes should have at least 1024MB per CPU to meet
large memory requirements. Making good use of such a system will
require careful administration and allocation of its resources.
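To give a sense of scale, the following Python sketch totals the memory
implied by the per-CPU figures above for the range of cluster sizes
under consideration. The function name and the number of large memory
nodes in the last example are hypothetical; the committee did not
specify how many nodes should carry 1024MB per CPU.

```python
def aggregate_memory_gb(nodes, cpus_per_node, mb_per_cpu=512,
                        large_nodes=0, large_mb_per_cpu=1024):
    """Return total cluster memory in GB.

    nodes            total number of nodes in the cluster
    cpus_per_node    1 (single) or 2 (dual) processors per node
    mb_per_cpu       memory per CPU on standard nodes (512MB minimum)
    large_nodes      hypothetical count of large memory nodes
    large_mb_per_cpu memory per CPU on large memory nodes
    """
    if large_nodes > nodes:
        raise ValueError("large_nodes cannot exceed nodes")
    standard_nodes = nodes - large_nodes
    total_mb = (standard_nodes * mb_per_cpu
                + large_nodes * large_mb_per_cpu) * cpus_per_node
    return total_mb / 1024


if __name__ == "__main__":
    for nodes in (96, 128):
        for cpus in (1, 2):
            gb = aggregate_memory_gb(nodes, cpus)
            print(f"{nodes} nodes x {cpus} CPU(s): {gb:.0f} GB minimum")
    # Hypothetical mix: 16 of 128 dual-CPU nodes with 1024MB per CPU.
    print(aggregate_memory_gb(128, 2, large_nodes=16), "GB with 16 large nodes")
```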
Future upgrades to this system are possible by replacing CPUs with
faster versions, increasing the memory per processor, increasing the
number of nodes, and/or upgrading the network. The optimal path will
become clear in response to researchers' evolving requirements. There
should be an expectation of routine investment in the Beowulf cluster
over the course of its lifetime.
This recommendation includes considerable flexibility in system
design. A final specific and detailed hardware recommendation can only
be made when a firm acquisition budget has been established.
Rationale for Beowulf
---------------------
A Beowulf system is consistent with the national trend toward
distributed memory programming for parallel processing. The Beowulf
cluster will provide a place for developing and running large
distributed memory parallel codes while also serving as a stepping
stone to running jobs at the national centers. It also provides a very
cost-effective and appropriate environment for users with serial jobs,
potentially giving them access to many fast CPUs for running a large
number of jobs concurrently. To facilitate the use of the Beowulf
cluster, ITC should be prepared to assist users in porting their code
as well as administering the system (see Advanced Computational Support
Center Committee recommendations).
Upgrading Existing Systems
--------------------------
The second HPC committee recommendation is to upgrade the existing SGI
and Sun research computing systems to maintain an SMP capability and
consistency with existing software.
The O200s in the Unixlab would be upgraded to comprise 2 machines with
4 CPUs and 4GB of memory each and be configured as SMP platforms.
Other upgrades to be considered include increasing the memory on some
of the SGI O2s, adding a second CPU to the Ultra 60s to make them dual
processor machines for limited shared memory programming, and increasing
the memory on some of the Ultra 60s for large memory jobs.
We note that this stop-gap solution will only be effective with
improved management and administration of these resources. Priority on
these machines must be given to users who will make use of their unique
capabilities, e.g., those who need to run shared memory code or who
have large memory requirements. This will be possible because the
Beowulf cluster will provide a resource for researchers without these
specific needs. Users who successfully develop shared memory codes on
these machines should have a straightforward development path to the
NCSA SGI Origin 2000 machines, which have up to 256 CPUs.
It is recognized that the upgraded SGI O200s do not represent an ideal
solution for a shared memory programming environment. However, it is
viewed as a good compromise considering the cost of a new 12-16 CPU SMP
machine, the currently limited number of users who need a shared memory
programming environment, and the fact that the future of parallel
computing is recognized to be distributed memory programming. If the
upgraded O200s do prove successful and demand grows for a shared memory
programming environment, ITC can consider a future acquisition of
a larger suitable SMP machine.
APPENDIX A:
Existing Research Computing Systems at UVa and their limitations
-------------------------------------------------------------------
1) ITC's IBM SP:
20 nodes with 160MHz Power2 CPUs and 512MB
2 nodes with 120MHz Power2 CPUs and 512MB
2 nodes with 120MHz Power2 CPUs and 1024MB
The SP2 only provides a limited parallel programming environment due
to limited resources (a maximum of 8 CPUs per job) and competition
from serial users. IBM's current Power3 CPUs run at 450MHz and the next
generation Power4 CPUs will run at 1GHz.
2) ITC's SGI clusters:
7 O2s with 195MHz R10000 CPUs and 128MB
3 O2s with 300MHz R12000 CPUs and 256MB
3 O200s with dual 180MHz R10000 CPUs and 1GB
The O2s with 128MB do not have enough memory to be useful for a lot of
research computing. The O200s are heavily used to run serial codes
mainly because of their large memory. Current SGI CPUs run at 500MHz.
3) ITC's Sun clusters:
15 Ultra 60s with 450MHz Ultra II CPUs and 512MB
6 Ultra 10s with 333MHz Ultra IIi CPUs and 512MB
12 Ultra 10s with 440MHz Ultra IIi CPUs and 512MB
6 Netras with 500MHz Ultra IIe CPUs and 1024MB
The Sun Ultra II CPU has now been superseded by the Ultra III running
at up to 900MHz. People find the Sun Unixlab machines convenient to log
onto and run a number of jobs concurrently.
4) ACHS Beowulf cluster: 21-node cluster with 42 processors and a four CPU
SGI O200 dedicated to health sciences research.
5) Various non-ITC clusters: The largest in this category is CS's Centurion
which is an aging collection of commodity PCs based on both Intel and
Alpha CPUs.
Limitations of current ITC research computing systems:
- CPUs are slow and outdated.
- Some machines have a relatively small amount of memory; the current memory
limit of 1GB may not be enough for some researchers in the near future.
- No large scale parallel environment.
APPENDIX B:
What are some representative systems available nationwide?
-------------------------------------------------------------
National Supercomputer centers (NCSA and NPACI) provide some of the
largest computers in the world for researchers at US universities. Some
of the flagship systems are:
Distributed Terascale Facility: A 13.6 teraflop distributed Linux
cluster located at SDSC, NCSA, Argonne National Laboratory and the
California Institute of Technology.
http://www.npaci.edu/teragrid/index.html
IBM Blue Horizon (NPACI): 1152 IBM Power 3 processors arranged 8 per node,
4GB shared memory per node. Operating system: AIX.
http://www.npaci.edu/BlueHorizon/
SGI Origin 2000 (NCSA): Total 1520 processors in a variety of shared
memory configurations, from 48 to 256 processors with 14 to 128 GB memory.
http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/Origin2000/
Pittsburgh Terascale machine: Currently 256 Compaq Alpha CPUs arranged
4 per node with a Quadrics interconnection network.
http://www.psc.edu/machines/tcs/
The National Centers and affiliated centers also provide a number of smaller
machines including Vector Supercomputers (Cray MTA), SMPs (SUN E10000, HP
V2500), distributed memory machines (Cray T3E, IBM SP2) and Beowulf
clusters (e.g., Los Lobos Linux cluster at Univ. of New Mexico).
Generally these systems can be used with a message passing (e.g., MPI, PVM)
or threaded (e.g., OpenMP) paradigm, or with a combination of the two (hybrid).
Threads operate only within shared memory nodes, and strictly threaded programs
are only considered "supercomputer" scale on machines such as the SGI
Origin which permit up to 256 shared memory CPUs. Threaded programming is
easier to implement (albeit with less scalability). Message passing is required
in all cases to go outside a shared memory domain, but programming with it is
more complex than with OpenMP. User applications at NCSA are split 50/50 between OpenMP
and MPI, according to last year's user survey. Hybrid programming (threaded
within a node, message passing between nodes) is a possibility, although the
complexity is greater than either paradigm alone. Consultants at SDSC report
that in no case have they observed an effective speedup using a hybrid
approach versus message passing alone. All these conclusions are, of course,
application dependent.
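The difference between the two paradigms can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: real
HPC codes would typically use OpenMP or MPI from C or Fortran, whereas
here Python threads and a queue stand in for shared memory workers and
for nodes exchanging messages. The function names are hypothetical.

```python
import queue
import threading


def threaded_sum(data, n_workers):
    """Shared memory style: workers update one shared total under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # shared state must be protected explicitly
            total += partial

    threads = [threading.Thread(target=worker, args=(data[i::n_workers],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(data, n_workers):
    """Message passing style: workers keep private data and send results.

    Each worker sees only its own chunk and communicates solely by
    sending a (rank, partial_sum) message to the collector.
    """
    inbox = queue.Queue()

    def worker(rank, chunk):
        inbox.put((rank, sum(chunk)))  # explicit "send" to the collector

    threads = [threading.Thread(target=worker, args=(r, data[r::n_workers]))
               for r in range(n_workers)]
    for t in threads:
        t.start()
    # The collector performs one explicit "receive" per worker.
    result = sum(inbox.get()[1] for _ in range(n_workers))
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    numbers = list(range(1000))
    print(threaded_sum(numbers, 4), message_passing_sum(numbers, 4))
```

In the threaded version the data decomposition is implicit and only the
shared total needs protection; in the message passing version every
exchange of data is written out explicitly, which is the source of the
added programming complexity noted above.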
The trend in national supercomputers is towards CLUMPs (Clusters of
Multi-Processor systems) requiring some form of distributed memory programming
to utilize the full power of the system. Acquiring access to small amounts of
time (10,000 hours) on these machines for testing and developing codes is
relatively easy; for larger time allocations researchers are required to
provide more detailed proposals. To use the National Supercomputer Centers,
researchers must be prepared to use or develop parallel code.
APPENDIX C:
Details of the solutions considered by the HPC committee
-----------------------------------------------------------
The following solutions (and combinations thereof) were considered by
the HPC committee: Beowulf cluster, large SMP machine, upgrades to
current systems. Each of these is detailed in the following
sections.
Beowulf
-------
The HPC committee considered a Beowulf system with between 64 and 128
nodes with either one or two CPUs per node.
Pro:
- Can be used for distributed memory parallel jobs.
- Can use expensive fast interconnect or cheap 100Mb switched Ethernet.
- Can use high performance CPUs such as P4 or Athlon 4.
- Can be used as a work farm to run lots of serial jobs.
- Best CPU/memory per dollar ("bang for the buck")
- Easily expandable.
Con:
- More difficult to administer compared to an SMP machine.
- Linux less mature compared to other operating systems.
- Does not support SMP programming beyond two CPUs.
- Availability of 3rd party programs less certain.
- Maximum memory limit of 4GB per node (using P4 or Athlon 4 CPUs)
Large SMP
---------
Within the bounds of cost, the HPC committee considered a 12-16 CPU
machine with 12-16GB of memory such as an SGI Origin 3000 or a Sun
Fire 4800.
Pro:
- Supports either distributed or shared memory programming.
- Easy to administer.
- Mature operating system.
- Can support very large memory ( >4GB ).
- Commercial packages work "out-of-the-box".
Con:
- Very expensive.
- Low capacity with a limited number of CPUs.
- Does not encourage parallel development.
- Not consistent with national trends in high performance computing.
Upgrading Current Systems:
--------------------------
Possible upgrades to current ITC systems could include:
1) Increasing the memory on the SGI O2s and O200s.
2) Upgrading the CPUs in the SGI O2s and O200s.
3) Connecting two of the O200s together to form a 4 CPU SMP machine,
acquiring a fourth O200 to create a second 4 CPU SMP machine, and dedicating
both to users who wish to develop and run shared memory codes.
4) Adding a second CPU to the Ultra 60s and increasing their memory.
Pro:
- Relatively cheap way to provide a modest shared memory programming
environment.
- Provides two 4 CPU machines with 4GB each to serve as SMP platforms.
- Continues ITC's support of SGI, IBM AIX and Sun systems.
Con:
- Upgraded machines may not match the performance of new machines.
- Limit of 4 CPUs for shared memory codes.
- Limit of 4GB for large memory jobs