Parallel Computer Projects for Fall 2010 & Spring 2011 - Larry Wittie
10aug10
Please send email to lw @ ic . sunysb . edu or find me in the 1308 CSB NetLab.
We have several master's (CSE523/524), independent study (CSE593), and PhD
projects.
We have built a Debian Linux-based parallel computer simulation grid from 8
(mainly dual-processor) Intel Xeon servers. Each machine has one or two Xeon
processors, 1 to 4 Gigabytes (GB) of memory, and 0.5 to 1.5 Terabytes (TB) of
local disk. The grid is used for simulations of large-scale shared-cache
multiprocessor systems. It uses the same equipment as the 100+ node CEWITT
grid. Our small grid is accessible only within the NetLab. It offers research
students hands-on, systems-level access and Linux experience that are much
harder to arrange on larger, publicly shared production systems.
Large Parallel Computer Shared Cache Simulator
Our parallel computer simulations run on Simics, the industrial-strength
functional architectural simulation system originally from Virtutech
(www.virtutech.com/our-tech/white-papers.pl), on individual Linux processors.
Simics is now owned by Wind River (www.windriver.com/universities/).
For detailed cache usage and timing information, we also run Gems, the
Wisconsin architecture group's cache and parallel processor simulation system,
which is a front-end for Simics and a standard tool for computer architecture
papers. We need students to code shared cache objects in C++ for Gems and
Simics, to code a remote central memory controller in C++, and to add
intra-Simics process ports to Simics so that multiple servers can cooperate in
simultaneous simulation of large parallel computer systems.
4 or 5 MS projects
Bridging Deep Multicomputer Memory Latencies
Current distributed-shared-memory systems that use network-wide coherence for
individual processor caches, or that use multi-threaded architectures, are
limited to a few hundred or at most a thousand processors. This project is
exploring the use of shared caches, serialized-update cache coherence methods
such as our own “eagersharing” protocol, and thread support hierarchies to
design practical shared-memory systems with tens of thousands of processors.
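To make the idea of a serialized-update coherence method concrete, here is a minimal toy sketch in Python. It is not the eagersharing protocol, whose details are not given in this description, and it is not Gems or Simics code. It only illustrates the generic technique: every write passes through a single ordering point, which stamps it with a sequence number and pushes the new value to all attached caches, so later reads hit locally. The class and method names (UpdateSerializer, ToyCache, apply_update) are hypothetical.

```python
class UpdateSerializer:
    """Hypothetical single ordering point for all writes (write-update)."""

    def __init__(self):
        self.caches = []      # every cache that receives updates
        self.memory = {}      # backing store: address -> value
        self.next_seq = 0     # global order of writes

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, addr, value):
        # Writes are handled one at a time, so they get one global order.
        seq = self.next_seq
        self.next_seq += 1
        self.memory[addr] = value
        # Push the new value to every cache instead of invalidating copies.
        for cache in self.caches:
            cache.apply_update(seq, addr, value)
        return seq


class ToyCache:
    """Hypothetical per-processor (or shared) cache that accepts updates."""

    def __init__(self, serializer):
        self.lines = {}       # address -> most recent value seen
        self.last_seq = -1    # sequence number of the last update applied
        self.serializer = serializer
        serializer.attach(self)

    def apply_update(self, seq, addr, value):
        # Updates must arrive in the serializer's order.
        assert seq > self.last_seq
        self.last_seq = seq
        self.lines[addr] = value

    def read(self, addr):
        # Local hit if an update was pushed; otherwise fall back to memory.
        if addr in self.lines:
            return self.lines[addr]
        return self.serializer.memory.get(addr)

    def write(self, addr, value):
        return self.serializer.write(addr, value)


if __name__ == "__main__":
    serializer = UpdateSerializer()
    cache_a = ToyCache(serializer)
    cache_b = ToyCache(serializer)
    cache_a.write(0x10, 1)
    cache_b.write(0x10, 2)
    # Both caches observe the same final value without a remote read.
    print(cache_a.read(0x10), cache_b.read(0x10))
```

The sketch ignores timing, capacity, and network delay entirely. Its only point is that pushing ordered updates lets readers avoid a round trip to distant memory.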
The key problem is that for Petaflops (10^15 floating-point operations per
second) systems built from thousands of 10 to 100 Gigahertz (GHz) processors,
memory systems are physically so large that signal propagation delays between
massive main memories and individual processors are 1,000 to 10,000 processor
cycle times.
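The cycle counts above follow from simple arithmetic: a delay of about 100 nanoseconds is 1,000 cycles at 10 GHz and 10,000 cycles at 100 GHz. The short Python sketch below works this through. The signal speed (0.2 meters per nanosecond, roughly two thirds of the speed of light) and the 20 meter path length are illustrative assumptions, not measurements from any particular machine, and the function names are hypothetical.

```python
def propagation_delay_ns(distance_m, signal_speed_m_per_ns=0.2):
    """One-way signal delay in nanoseconds over distance_m meters.

    The default speed of 0.2 m/ns is an assumed figure for illustration.
    """
    return distance_m / signal_speed_m_per_ns


def latency_in_cycles(latency_ns, clock_ghz):
    """Processor cycles that elapse during latency_ns at clock_ghz.

    A clock of f GHz completes f cycles per nanosecond.
    """
    return latency_ns * clock_ghz


if __name__ == "__main__":
    delay_ns = propagation_delay_ns(20.0)  # assumed 20 m path to memory
    for clock_ghz in (10.0, 100.0):
        cycles = latency_in_cycles(delay_ns, clock_ghz)
        print(f"{clock_ghz:.0f} GHz: about {cycles:.0f} cycles "
              f"for a {delay_ns:.0f} ns delay")
```

Under these assumptions, even a one-way trip of a few tens of meters costs thousands of cycles, before any switching or memory access time is added.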
Techniques that solve the latency problem for
petaflops multiprocessors will also help solve memory bandwidth and latency
limitations for modern multi-core computer-networks-on-a-chip and for future
worldwide global memory addressing that will become feasible in computers with
128-bit memory addresses.
Simulations will use the Gems-Simics system
and other techniques to demonstrate the effectiveness of the latency bridging
methods.
1 or 2 CS PhD students, in addition to the one ECE PhD student already working
on the project