

# DarkHorse: A Proposed Peta(FL)OPS Architecture

## Steve Poole

Los Alamos National Laboratory

Salishan Conference on High Speed Computing, April 18-22, 2005

LA-UR-05-2745




UNCLASSIFIED



### **Advanced Architecture Team**

- LANL
  - Dave DuBois
  - Andy DuBois
  - Steve Poole
  - Chris Kemper





### **Some History**

- First basic ideas in 1997/1998
- HMM/GA Application (Kestrel, Sequence Alignment Modeling)
- Switch Application (SanNetworks, memory technology)
- 3D FPGA
- Potential Seismic Application (FD, RTM, A/E Modeling, XON)
- Specialized Search/Sort Problem (DB Problem)
- Started @ LANL 2001
  - 3D FPGA
  - 3D CAM
- Early processor disclosures in 2002





### Advanced Architectures Project

Processor & Memory Subsystems

- Computer industry collaborations
  - Understand and influence product roadmaps
- Semiconductor industry collaborations
  - 3D semiconductor stacking
- Co-processor technologies
  - FPGA accelerators
  - Graphics/Network processor accelerators



#### Dark Horse

Determine the feasibility of developing a petaflop (PF) system in the ~FY08 time frame that is:

- based potentially on a variety of microprocessors,
- computationally efficient for LANL algorithms, and
- straightforward to program.

Balanced

First Principle Applications & Algorithms

- Minimizing time to solution for LANL computational workloads
- Adapt algorithms to different architectures
- Develop new algorithms that take maximum advantage of computer architectures
- Programming model(s)






### **Elements of an ASC Simulation Code**



Time-evolving, coupled multi-physics simulations.








### Some Unclassified <u>Testbed</u> Codes

| <u>Code</u>         | Association/Support |
|---------------------|---------------------|
| (S/R)AGE            | Crestone Project    |
| MCNP                | Eolus Project       |
| PARTISN             | Sn Transport        |
| SWEEP3D             | Sn sweep strategy   |
| TRUCHAS & TELLURIDE | Telluride Project   |

#### They are:

- **<u>Representative</u>** of computer science issues
- <u>NOT</u> parts of our classified codes
- used for unclassified applications
- used for methods & CS testing and R&D
- often export controlled


















## 8 SPUs

SMP on a chip

1 PowerPC

L2 (512 kB)

- 256 GFlops (SP total)
- 256 kB Local Store
- Coherent DMA
- High-B/W Memory
  - 25.6 GB/s (data)

## Configurable I/O interface

- Up to 35 GB/s out
- Up to 25 GB/s in
- Coherent interface or I/O
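
The peak figures on this slide allow a rough machine-balance estimate, which ties into the "It is the I/O, NOT FLOPS" point in the conclusions. The following is a minimal sketch, not a measurement. It assumes that 256 GFlops is the single-precision total across all 8 SPUs, and that the memory and I/O figures are peak rates. The variable names are illustrative and do not come from the talk.

```python
# Back-of-envelope balance figures from the slide's peak numbers.
# Assumption: 256 GFlops is the single-precision total over all 8 SPUs.
N_SPUS = 8
PEAK_SP_GFLOPS = 256.0   # GFlop/s, single precision, whole chip
MEM_BW_GBS = 25.6        # GB/s, memory data bandwidth
IO_OUT_GBS = 35.0        # GB/s, configurable I/O, outbound upper bound
IO_IN_GBS = 25.0         # GB/s, configurable I/O, inbound upper bound

# Peak single-precision rate available to each SPU.
gflops_per_spu = PEAK_SP_GFLOPS / N_SPUS

# Bytes of memory traffic available per peak floating-point operation.
mem_bytes_per_flop = MEM_BW_GBS / PEAK_SP_GFLOPS

# Combined I/O bandwidth and the corresponding bytes per flop.
io_total_gbs = IO_OUT_GBS + IO_IN_GBS
io_bytes_per_flop = io_total_gbs / PEAK_SP_GFLOPS

print(f"Peak per SPU:        {gflops_per_spu:.1f} GFlop/s")
print(f"Memory bytes/flop:   {mem_bytes_per_flop:.3f}")
print(f"Total I/O bandwidth: {io_total_gbs:.1f} GB/s")
print(f"I/O bytes/flop:      {io_bytes_per_flop:.3f}")
```

Under these assumptions the chip can move only a fraction of a byte per peak flop, so a code that streams its operands from memory would be limited by bandwidth well before it reaches peak arithmetic throughput.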





### **Stacking Multiple Thin Layers**

Repeat - One Wafer at a Time





Additional memory layers to be stacked

BASE








### **Stacking Process**

### Three wafers successfully aligned and stacked










### **Chip Stacking**








CCN

COMPUTING COMMUNICATIONS AND NETWORKING DIVISION



### **Bufferless Optical Crossbar**





- **OSMOSIS** will integrate to be cost competitive with conventional OEO
  - Today ~\$50K/port → ~\$1.5k/port for commercial
- Provisioning in 16 port increments
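
The per-port cost figures and the 16-port provisioning step can be combined into a simple sizing estimate. This is a minimal sketch under stated assumptions: that a buyer pays for whole 16-port increments, and that the approximate per-port prices on the slide apply uniformly. The function names `provisioned_ports` and `fabric_cost` are hypothetical and exist only for this illustration.

```python
import math

PORT_INCREMENT = 16     # ports per provisioning step (from the slide)
COST_TODAY = 50_000     # USD per port, approximate, today
COST_TARGET = 1_500     # USD per port, approximate, commercial target


def provisioned_ports(needed: int) -> int:
    """Round a port requirement up to the next multiple of PORT_INCREMENT."""
    if needed <= 0:
        return 0
    return math.ceil(needed / PORT_INCREMENT) * PORT_INCREMENT


def fabric_cost(needed: int, cost_per_port: int) -> int:
    """Cost of the provisioned ports, assuming whole increments are paid for."""
    return provisioned_ports(needed) * cost_per_port


if __name__ == "__main__":
    for needed in (10, 50, 64):
        ports = provisioned_ports(needed)
        print(
            f"{needed:3d} ports needed -> {ports:3d} provisioned: "
            f"${fabric_cost(needed, COST_TODAY):,} today, "
            f"${fabric_cost(needed, COST_TARGET):,} at target"
        )
```

The sketch ignores chassis, optics, and control-plane costs, none of which are quoted on the slide.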




### **10 Terabit/sec form factor**



- 1.28 Gigapackets/sec in 64 port switch module
- Cell-oriented error correction supports 10<sup>-21</sup> BER
- Goal: 10 Tbit/sec in a single stage module @ first commercial release
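
The packet rate and BER figures above can be turned into per-port numbers with a little arithmetic. This is a minimal sketch. It assumes each port runs at 40 Gb/s, a rate taken from the "40 Gb/s data packet" caption in the backup slides, and not stated on this slide as the port speed. It also assumes the BER applies to the aggregate bit stream. The variable names are illustrative.

```python
# Figures from the slide.
PACKETS_PER_SEC = 1.28e9   # packets/s through the 64-port module
PORTS = 64
BER = 1e-21                # bit error rate after cell-oriented correction

# Assumption (not stated on this slide): 40 Gb/s per port.
LINE_RATE_BPS = 40e9

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Packet rate and cell slot duration seen by a single port.
per_port_pps = PACKETS_PER_SEC / PORTS
cell_time_ns = 1e9 / per_port_pps

# Bits that fit in one cell slot at the assumed line rate.
bits_per_cell_slot = LINE_RATE_BPS / per_port_pps

# Aggregate bandwidth of the module at the assumed line rate.
aggregate_bps = PORTS * LINE_RATE_BPS

# Mean time between bit errors across the whole module.
seconds_between_errors = 1.0 / (BER * aggregate_bps)
years_between_errors = seconds_between_errors / SECONDS_PER_YEAR

print(f"Per-port packet rate: {per_port_pps / 1e6:.0f} Mpackets/s")
print(f"Cell slot time:       {cell_time_ns:.0f} ns")
print(f"Bits per cell slot:   {bits_per_cell_slot:.0f}")
print(f"Aggregate bandwidth:  {aggregate_bps / 1e12:.2f} Tbit/s")
print(f"Mean time between bit errors: {years_between_errors:.1f} years")
```

If the actual port rate differs from the assumed 40 Gb/s, the cell size and aggregate bandwidth scale linearly with it, while the per-port packet rate and cell slot time do not change.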





### **Development Switch**





As of this month (Test Vehicle/Prototype)






#### InfiniBand Roadmap





### **Conclusions**

DarkHorse pushed many design envelopes:

- It is the I/O, NOT FLOPS
- 3D Memories
  - Self Healing
- 3D Stacking (S/MOC)
- 3D FPGA/CAM Designs
- Optical Interconnects
  - Networking (OSMOSIS)
  - Chip-to-Chip (ZRL)
- Interconnects
  - 12X-QDR InfiniBand
  - 32X-ODR InfiniBand (Future)



### **Conclusions (cont)**

- 3D Memories will improve Power/Performance
  - Non-DRAM
- Currently modeling codes against DH design
  - Some new algorithms (Sparse)
  - Potentially new language approaches
  - Future HW/SW designs
- The design is feasible
  - Most of the sub-components exist
- We have started the design of the follow-on







### **Special Thanks (DH)**

- LANL
  - Gary Grider
  - Karl-Heinz Winkler
  - John Morrison
  - James Peery
  - **\***Ken Koch
  - Rich Graham
  - Mike Boorman
- SNL
  - Bill Camp
  - Jim Tompkins
  - Matt Leininger
- Mellanox
- IBM
  - (ZRL, POK, TJW, ARL, STIDC)
- Corning
- Many others...







# **Backup (Movie)**








### Data packet bit streams/eye diagrams



40 Gb/s data packet (5 ns/div)





Closer look at bit stream





### Fast integrated optical fiber and color selector

