Parallel Computer Projects for Fall 2010 & Spring 2011 - Larry Wittie   10aug10

Please send email to lw @ ic . sunysb . edu or find me in the 1308 CSB NetLab.

We have several master's (CSE523/524), independent study (CSE593), and PhD projects available.

 

The NetLab Grid

We have built a Debian Linux-based parallel computer simulation grid from 8 (mainly dual-processor) Intel Xeon servers. Each machine has one or two Xeon processors, 1 Gigabyte (GB) to 4 GB of memory, and 0.5 to 1.5 Terabytes (TB) of local disk.  The grid is used for simulations of large-scale shared-cache multiprocessor systems.  It is based on the same equipment as the 100+ node CEWITT grid.  Our small grid is accessible only within the NetLab.  It offers research students hands-on, systems-level access and Linux experience that is much harder to arrange on larger, publicly shared production systems.

 

Large Parallel Computer Shared Cache Simulator

Our parallel computer simulations run on Simics, an industrial-strength functional architectural simulation system originally from Virtutech (www.virtutech.com/our-tech/white-papers.pl), on individual Linux processors. Simics is now owned by Wind River (www.windriver.com/universities/). For detailed cache usage and timing information, we also run Gems, the Wisconsin architecture group's cache and parallel processor simulation system, which is a front-end for Simics and a standard tool for computer architecture papers. We need students to code shared cache objects in C++ for Gems and Simics, to code a remote central memory controller in C++, and to add inter-process ports to Simics so that Simics processes on multiple servers can cooperate in the simultaneous simulation of large parallel computer systems.

4 or 5 MS projects

 

 

Bridging Deep Multicomputer Memory Latencies

Current distributed-shared-memory systems that use network-wide coherence for individual processor caches, or that use multi-threaded architectures, are limited to a few hundred or at most a thousand processors.  This project is exploring the use of shared caches, serialized-update cache coherence methods such as our own “eagersharing” protocol, and thread support hierarchies to design practical shared-memory systems with tens of thousands of processors.

The key problem is that for Petaflops (10^15 floating-point operations per second) systems built from thousands of 10 to 100 Gigahertz (GHz) processors, memory systems are physically so large that signal propagation delays between massive main memories and individual processors are 1,000 to 10,000 processor cycle times.
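A rough back-of-envelope check can make the scale of these delays concrete. The sketch below is illustrative only: the 10-meter one-way distance and the signal speed of two-thirds the speed of light are assumptions chosen for the example, not figures from the project, and the function name round_trip_cycles is hypothetical. It counts pure propagation delay and ignores switching, queuing, and memory access time.

```python
# Back-of-envelope estimate of memory round-trip propagation delay,
# expressed in processor cycles. All inputs are illustrative assumptions.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def round_trip_cycles(one_way_distance_m, clock_hz, velocity_factor=2.0 / 3.0):
    """Return the round-trip signal propagation delay in processor cycles.

    velocity_factor is the assumed signal speed as a fraction of the
    speed of light (2/3 is a common rough figure for cables).
    """
    if one_way_distance_m < 0 or clock_hz <= 0 or not 0 < velocity_factor <= 1:
        raise ValueError("invalid parameter")
    signal_speed = SPEED_OF_LIGHT_M_PER_S * velocity_factor
    round_trip_seconds = 2.0 * one_way_distance_m / signal_speed
    return round_trip_seconds * clock_hz


if __name__ == "__main__":
    distance_m = 10.0  # assumed one-way distance between processor and memory
    for clock_ghz in (10.0, 100.0):
        cycles = round_trip_cycles(distance_m, clock_ghz * 1e9)
        print(f"{distance_m:.0f} m at {clock_ghz:.0f} GHz: about {cycles:.0f} cycles")
```

Under these assumptions, a 10-meter separation costs about 1,000 cycles at 10 GHz and about 10,000 cycles at 100 GHz, which matches the range quoted above before any switching or memory access time is added.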

Techniques that solve the latency problem for petaflops multiprocessors will also help solve memory bandwidth and latency limitations for modern multi-core computer-networks-on-a-chip and for future worldwide global memory addressing that will become feasible in computers with 128-bit memory addresses.

Simulations will use the Gems-Simics system and other techniques to demonstrate the effectiveness of the latency bridging methods.

1 or 2 CS PhD students, in addition to the one ECE PhD student already working on the project