January 25
University of Tennessee
Innovative Computing Laboratory
Computer Science Department
Jack Dongarra
Windows Cluster
Project
· Detailed description of project exists
People
Jack Dongarra
George Bosilca
Dave Cronk
Graham Fagg
Julien Langou
Piotr Luszczek
Projects:
· Numerical Linear Algebra Algorithms and Software
o LAPACK, ScaLAPACK, ATLAS
o Self Adapting Numerical Software (SANS) Effort
o Generic Code Optimization
o LAPACK For Clusters – easy access to clusters
· Heterogeneous Distributed Computing
o NetSolve, FT-MPI, Open-MPI
· Performance Evaluation
o PAPI, HPC Challenge, Top500
· Software Repositories
o Netlib
· Project Plan
· Has a clear timeline and a description of expected outcomes
LAPACK
• Used by Matlab, Mathematica, Numeric Python,…
• Tuned versions provided by vendors (AMD, Apple, Compaq, Cray, Fujitsu, Hewlett-Packard, Hitachi, IBM, Intel, MathWorks, NAG, NEC, PGI, Sun, Visual Numerics) and by Microsoft; also packaged in most Linux distributions and related environments (Fedora, Debian, Cygwin, ...).
• Ongoing work: performance, accuracy, extended precision, ease of use
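To make concrete what sits behind a typical LAPACK call, here is a minimal pure-Python sketch of solving a dense system Ax = b by Gaussian elimination with partial pivoting, the approach that LAPACK's DGESV driver is built on. It is an illustration only and is not LAPACK code: the function name `solve_dense` is hypothetical, and none of the blocking or tuning that the vendor libraries provide is attempted.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a: list of n rows (each a list of n numbers); b: list of n numbers.
    Inputs are copied, not modified. Raises ValueError if a is singular.
    """
    n = len(a)
    # Work on an augmented copy [a | b].
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

For example, `solve_dense([[2, 1], [1, 3]], [3, 5])` returns values close to 0.8 and 1.4. In practice the clients listed above hand this work to a tuned LAPACK instead of interpreted loops.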
ScaLAPACK
• Parallel implementation of LAPACK that scales on parallel hardware from tens to hundreds to thousands of processors
• Ongoing work: match the functionality of the current LAPACK
• Ongoing work: target new architectures and new parallel environments, for example a port to the Microsoft HPC cluster solution
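ScaLAPACK spreads a matrix over a two-dimensional grid of processes using a block-cyclic layout, which is what lets the same routines scale across process counts. The sketch below shows the index arithmetic of that layout in plain Python. The function names `owner` and `local_index` are hypothetical, indices are 0-based, and the first block is assumed to live on process (0, 0); it does not call ScaLAPACK itself.

```python
def owner(i, j, mb, nb, prow, pcol):
    """Return the (process row, process column) that owns global entry (i, j).

    mb x nb is the block size; prow x pcol is the process grid.
    Assumes 0-based indices and that block (0, 0) sits on process (0, 0).
    """
    return (i // mb) % prow, (j // nb) % pcol


def local_index(g, blk, nprocs):
    """Map a 0-based global index to the 0-based local index on its owner.

    g // (blk * nprocs) counts the full cycles of blocks that precede g,
    and g % blk is the offset inside the block.
    """
    return (g // (blk * nprocs)) * blk + g % blk
```

With 2 x 2 blocks on a 2 x 3 process grid, entry (2, 2) belongs to process (1, 1), while entry (5, 7) wraps around the grid back to process (0, 0).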
LAPACK for Clusters (LFC)
• Makes most ScaLAPACK functionality available from serial clients (Matlab, Python, Mathematica)
FT-MPI and Open-MPI
· Defines the behavior of MPI in the event that a failure occurs at the process level.
· FT-MPI is based on MPI 1.3 (plus some MPI 2 features), with a fault-tolerant model similar to that of PVM.
o Complete reimplementation, not based on other implementations.
· Gives the application the opportunity to recover from a process failure.
· A regular, non-fault-tolerant MPI program will run using FT-MPI.
· What FT-MPI does not do:
o Recover user data (e.g. automatic checkpointing)
o Provide transparent fault-tolerance
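Because recovering user data is left to the application, a fault-tolerant program usually saves its own state and restores it once a failure has been reported. The sketch below simulates that pattern in plain Python with no MPI library involved. `ProcessFailure`, `run_with_recovery`, and the injected failure schedule are hypothetical names made up for illustration; they are not part of the FT-MPI API.

```python
import copy


class ProcessFailure(Exception):
    """Stands in for a process-failure error reported by the runtime (hypothetical)."""


def run_with_recovery(n_steps, fail_at, checkpoint_every=3):
    """Sum 0..n_steps-1 while surviving simulated failures.

    fail_at: set of step numbers at which one failure is injected.
    Returns (result, number_of_recoveries).
    """
    state = {"step": 0, "total": 0}
    saved = copy.deepcopy(state)      # application-level checkpoint
    pending = set(fail_at)
    recoveries = 0
    while state["step"] < n_steps:
        try:
            step = state["step"]
            if step in pending:
                pending.discard(step)
                raise ProcessFailure(f"failure at step {step}")
            state["total"] += step
            state["step"] = step + 1
            if state["step"] % checkpoint_every == 0:
                saved = copy.deepcopy(state)
        except ProcessFailure:
            # The runtime only reports the failure; restoring user data
            # is the application's job, so roll back to the checkpoint.
            state = copy.deepcopy(saved)
            recoveries += 1
    return state["total"], recoveries
```

Calling `run_with_recovery(10, {4, 8})` returns `(45, 2)`: the sum is correct even though two failures forced the loop to redo work from the last checkpoint.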
Performance evaluation tools
· Performance Application Programming Interface (PAPI)
o A portable library to access hardware counters found on processors
o Provides a standardized list of performance metrics
· KOJAK (joint with Felix Wolf)
o Software package for the automatic performance analysis of parallel applications
§ Message passing and multi-threading (MPI and/or OpenMP)
§ Parallel performance
§ CPU and memory performance
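Raw hardware counters become useful once they are combined into derived metrics. The sketch below shows that step in plain Python, using the names of three standard PAPI preset events as dictionary keys. The function `derived_metrics` and the sample counter values are hypothetical; the code does not read real counters or call the PAPI library.

```python
def derived_metrics(counters, elapsed_seconds):
    """Compute simple derived metrics from raw counter readings.

    counters: dict keyed by PAPI preset event name.
    elapsed_seconds: wall-clock time of the measured region.
    """
    instructions = counters["PAPI_TOT_INS"]   # total instructions completed
    cycles = counters["PAPI_TOT_CYC"]         # total cycles
    fp_ops = counters["PAPI_FP_OPS"]          # floating-point operations
    return {
        "ipc": instructions / cycles,
        "mflops": fp_ops / elapsed_seconds / 1e6,
    }


# Made-up readings for a one-second region, for illustration only.
sample = {
    "PAPI_TOT_INS": 3_600_000_000,
    "PAPI_TOT_CYC": 1_800_000_000,
    "PAPI_FP_OPS": 900_000_000,
}
metrics = derived_metrics(sample, elapsed_seconds=1.0)
```

With these sample values the region retires 2 instructions per cycle and sustains about 900 MFLOP/s.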
Posters for Related Projects
Hardware Configuration
Team HPC
Dual Core 4GB AMD Opterons
Team HPC Turnkey Beowulf-Class Supercomputer
26 4GB AMD Opteron DC Compute Nodes, 1 Head Node
| CPU Manufacturer | AMD |
| CPU Model | Opteron 265 |
| CPU Speed | 1.8 GHz |
| Number of nodes | 26 |
| Number of cores | 2 per CPU |
| Interconnect(s) | InfiniBand, Myrinet, GigE |
26 Compute Nodes
| Item Description | QTY |
| Supermicro H8DCE Motherboard | 26 |
| 3U Chassis w/ 350W PS with PCI-E riser & Slide Rails | 26 |
| AMD Opteron 265 1.8 GHz with Heatsink | 52 |
| 4GB PC3200 Registered/ECC DDR (1GB x 4 total memory) | 104 |
| 80GB 7200rpm SATA 8 MB cache HDD | 26 |
| ATI Rage on board | 26 |
| Dual Gigabit Ethernet Integrated on board | 26 |
| One Year Standard Warranty | 26 |
| Opteron Linux Installed and Tested | 26 |
| Built, Tested & Configured | 26 |
| Torque, Kick-Start Utility & Web-Based Monitoring Software | |
Head Node (4GB per node)
| Item Description | QTY |
| Supermicro H8DCE Motherboard | 1 |
| 3U Chassis w/ PS and Slide Rails | 1 |
| AMD Opteron 265 1.8 GHz with Heatsink and Fan | 2 |
| 4GB PC3200 Registered/ECC DDR (1GB x 4 total memory) | 4 |
| DVD Combo Drive | 1 |
| ATI Rage on board | 1 |
| Dual Gigabit Ethernet Integrated on board | 1 |
| 42U APC Rack Enclosure with perforated doors, sides and levelers | 2 |
| APC Masterswitch 3 Phase 208 | 2 |
| Wiring Harness | 1 |
| 1U All in one KB, VIDEO and MOUSE | 1 |
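The aggregate size of the compute partition follows from the parts list above: 52 CPUs across 26 nodes, two cores per CPU, and four 1GB memory modules per node. The short Python sketch below does that arithmetic. The function name `cluster_totals` is hypothetical, and the peak rate relies on an assumption that is not stated in the list: two double-precision floating-point operations per core per cycle for this Opteron generation. The head node is not counted.

```python
def cluster_totals(nodes, cpus_per_node, cores_per_cpu, clock_ghz,
                   dimms_per_node, dimm_gb, flops_per_cycle):
    """Return aggregate core count, memory, and theoretical peak rate."""
    cores = nodes * cpus_per_node * cores_per_cpu
    return {
        "cores": cores,
        "memory_gb": nodes * dimms_per_node * dimm_gb,
        # GHz x flops/cycle gives GFLOP/s per core.
        "peak_gflops": cores * clock_ghz * flops_per_cycle,
    }


totals = cluster_totals(
    nodes=26,
    cpus_per_node=2,      # 52 Opteron 265 CPUs / 26 nodes
    cores_per_cpu=2,      # dual core
    clock_ghz=1.8,
    dimms_per_node=4,     # 104 modules / 26 nodes
    dimm_gb=1,
    flops_per_cycle=2,    # ASSUMPTION, not taken from the parts list
)
```

Under these assumptions the compute nodes provide 104 cores, 104 GB of memory, and a theoretical peak of roughly 374 GFLOP/s.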