Minutes for the ITC-Research Computing Standing Committee Meeting

June 27, 2005 at 10:30AM

 (in ITC-2015 Ivy Road, Room 102)

 

 

Members: Alice, Bill, Dawn, Ed, Hamp, Kathy G., Katherine H., Jim, Mark S., Martha, Robin, Sue Ellen, Steve, Terry, Tim S., Joe S., Tom S.

 

Attending:  Alice, Bill, Dawn, Ed, Hamp, Kathy G., Katherine H., Martha, Steve, Tang, Terry, Tim S.

Chair: Tim T. Recorder: Alice Howard

I. Corrections to minutes from last meeting on April 25, 2005?  - None.

 

II. Ongoing Discussion Topics:

 

1.  Cluster projects:

·    “Condo” Cluster (aka cedar): Will operate as a 32-bit cluster for at least the first year. (Note: we need to advertise this as having 125 compute nodes, because 1 node must be reserved for an AVAKI gateway.)

o        Need a firm “go-live” date and as smooth and problem-free a rollout as possible; we need to make a very favorable impression if we want to encourage further participation in the condo-cluster model. Picked July 6 as the rollout date, even though some power and A/C issues will still be ongoing.

o        Have four research groups participating: Leo Zhigilei (lz2n), 8 nodes with ETF funds; Jeff Shabanowitz/Dina Bai (Chemistry), 8 nodes; Phil Parrish/Rich Gregory, 4 nodes; Matt Neurock, 4 nodes.

o        Outstanding issues:

§         Insufficient power and A/C in the machine room: Facilities Management will be adding more but does not have a timeframe yet.  Meanwhile, we are experimenting with how many nodes we can have up (~88-96) and will roll out to users even if not all 125 nodes are up.

§         Mitch Rosen wants to buy 10 nodes from this cluster – will go ahead with this and subtract the 10 from the general pool of nodes.

§         Bill thinks that the NPI link-checker software could be quite useful for our clusters, but it is not free.

§         Bill thinks he can suppress the RSA warnings.

§         The problem with /state/partition1/ going missing was likely due to some configuration work that was going on at the time; we think it is stable now.

o        PBSPro: priority queuing for paying participants and usage reporting.

·    Teak Cluster:

o        Right now, the only users on it are running Gaussian; if we get Gaussian for Linux, we might not need teak anymore.  The goal is to maintain a “rolling” set of 3 Linux clusters at any given time.

·    Oak Cluster:

o        Hamp will see if it is possible to add the modules software that would make it seamless for IMSL and Java users to move from aspen to oak and make it easier for users to access different versions of the Solaris compilers. 

·    Aspen & Birch clusters: Timeline for the OS and kernel upgrades to bring both clusters into line with cedar?

o        Will do this upgrade after the new cluster is settled in (note: cedar is at 2.4). Aim for the first week in August and schedule 2 days of downtime.

2.  Additional /longtmp disk space is installed, bringing /longtmp to 64GB (450% increase), and the two HSM cache directories have been expanded to 64GB. We have capacity to increase /longtmp again.

·    Hamp and Tim will collaborate on a script to automate checking usage and reminding users.

3.  Linux license server: a cluster of three hosts (lm1., lm2., lm3.license).

·    About half of the research software licenses have been migrated off of aix.license onto the Linux license server.  Will continue to move the rest as they need to be updated.  Having a problem with ESRI.

·    Although we recommend using a VPN whenever possible, it would help users doing ssh tunneling if we could find a way to designate the master – Hamp will look into it.

 

4.  blue.unix: upgrade to new 64-bit nodes and a new version of the OS. The ITC-Transitions CDP is tracking this as well.  Timeline?

·    holmes is being upgraded to AIX 5.3 today.  Then, within the coming week, we need to get people using it to see what breaks. We won’t have our old load-balancing program, but we do have some alternatives.

·    Going from 16 nodes to 3 nodes for blue.unix.

 

III. New Business:

 

5. Ganglia web pages:  Need to add cedar.itc.

·    It might be possible to separate the teak and oak clusters from the general research computing cluster so we can see these nodes easily – Steve will look at it.

 

6. Should we increase the CPU limit on blue.unix since the new nodes will be much faster?

·    Since the new nodes are faster and use of blue.unix has declined, users should get a lot more done in an hour of CPU time, so we decided to leave the CPU limit as is for the time being and see what happens. We could relax it later.

 

 

Next Scheduled Meeting of this group: Monday, August 29th, 10:30 AM in ITC-2015 Ivy Road, Room #102.

 

Next Scheduled Meeting of ITC ResComp Management team: Monday, July 25th, 10:30 AM in ITC-2015 Ivy Road, Room #220.

 
