Ultrascale Systems Research Center Software

USRC Software

USRC developers contribute to a variety of open source projects.

fsstats

A Python script that collects statistics about a file system hierarchy. It is used in several of the collections available under Data Sources. Credit for this tool goes to Marc Unangst, Panasas, DOE, SciDAC-PDSI, and CMU.

Download fsstats via FTP
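The kind of tree walk fsstats performs can be sketched in a few lines of Python. This is not the fsstats source: the function name `collect_fs_stats`, the counters it keeps, and the power-of-two size buckets are assumptions chosen only to illustrate the idea.

```python
import os
import tempfile
from collections import Counter


def collect_fs_stats(root):
    """Walk the tree under `root`, counting directories, files, and bytes.

    Hypothetical sketch: the counters and size buckets are illustrative,
    not fsstats' actual output format.
    """
    stats = {"files": 0, "dirs": 0, "total_bytes": 0}
    size_histogram = Counter()  # bucket -> number of files in that bucket
    for dirpath, _dirnames, filenames in os.walk(root):
        stats["dirs"] += 1
        for name in filenames:
            try:
                # lstat: report the link itself rather than following symlinks
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable during the walk; skip it
            stats["files"] += 1
            stats["total_bytes"] += size
            bucket = 1  # smallest power of two >= size (empty files land in bucket 1)
            while bucket < size:
                bucket *= 2
            size_histogram[bucket] += 1
    return stats, size_histogram


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        for rel, nbytes in [("a.txt", 10), ("sub/b.bin", 100), ("sub/c.bin", 0)]:
            with open(os.path.join(tmp, rel), "wb") as f:
                f.write(b"x" * nbytes)
        stats, hist = collect_fs_stats(tmp)
        print(stats)  # {'files': 3, 'dirs': 2, 'total_bytes': 110}
        print(sorted(hist.items()))  # [(1, 1), (16, 1), (128, 1)]
```

Histograms of file sizes are a natural summary for a hierarchy because sizes span many orders of magnitude; the bucket scheme here is one simple choice among many.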

Parallel Fine-grained Soft Error Fault Injector (P-FSEFI)

PFSEFI is a software fault injector that uses a virtual machine (VM) back end to inject emulated faults into running parallel applications. Users have advanced control over the injected faults, including support for a complex fault model. PFSEFI includes Docker support to ease installation and deployment.

  • PFSEFI on GitHub
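The simplest emulated soft error is a single flipped bit in a stored value. The sketch below models only that effect on a 64-bit floating-point number; it does not reproduce PFSEFI's VM-based mechanism, which injects faults into running processes, and the helpers `flip_bit` and `inject_random_fault` are hypothetical.

```python
import random
import struct


def flip_bit(value, bit):
    """Return `value` (a float64) with one bit of its IEEE-754 encoding flipped.

    Illustrative model of a single-bit soft error; not PFSEFI's API.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 64)")
    # Reinterpret the double's 8 bytes as an unsigned 64-bit integer.
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    # XOR toggles exactly one bit, then reinterpret the bytes as a double again.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def inject_random_fault(values, rng):
    """Flip one random bit in one randomly chosen element of a copy of `values`."""
    faulty = list(values)  # leave the caller's data untouched
    index = rng.randrange(len(faulty))
    bit = rng.randrange(64)
    faulty[index] = flip_bit(faulty[index], bit)
    return faulty, index, bit


if __name__ == "__main__":
    print(flip_bit(1.0, 63))  # -1.0 (sign bit)
    print(flip_bit(1.0, 52))  # 0.5 (lowest exponent bit)
```

Where the bit lands matters: a flip in the low mantissa bits barely changes the value, while a flip in the exponent or sign can change it drastically, which is one reason fault injectors let users control where faults go.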

MarFS

MarFS provides a scalable near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc.) as a scalable data component.

  • MarFS on GitHub
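The split between a metadata component and a data component can be illustrated with a toy model. In this simplified sketch, which is an assumption rather than a description of MarFS internals, each file is a small stub in a POSIX directory tree that records the key of an object held in a separate store (a Python dict stands in for the object store). The class `ToyNearPosixFS` and its methods are hypothetical.

```python
import hashlib
import os
import tempfile


class ToyNearPosixFS:
    """Hypothetical sketch: metadata in a POSIX tree, contents in an object store."""

    def __init__(self, metadata_root):
        self.metadata_root = metadata_root  # POSIX directory tree for metadata
        self.object_store = {}  # stand-in for an object store: key -> bytes

    def _meta_path(self, path):
        return os.path.join(self.metadata_root, path.lstrip("/"))

    def write(self, path, data):
        key = hashlib.sha256(path.encode() + data).hexdigest()
        self.object_store[key] = data  # data component
        meta = self._meta_path(path)
        os.makedirs(os.path.dirname(meta), exist_ok=True)
        with open(meta, "w") as f:  # metadata component: a stub holding the object key
            f.write(key)
        # Note: overwriting a path leaves the old object behind; a real system
        # would need to garbage-collect it.

    def read(self, path):
        with open(self._meta_path(path)) as f:
            key = f.read()
        return self.object_store[key]

    def listdir(self, path="/"):
        # Listings read only the metadata tree and never touch the data store.
        return sorted(os.listdir(self._meta_path(path)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as meta_root:
        fs = ToyNearPosixFS(meta_root)
        fs.write("/proj/run1.dat", b"hello")
        print(fs.listdir("/proj"))  # ['run1.dat']
        print(fs.read("/proj/run1.dat"))  # b'hello'
```

Because namespace operations such as listing a directory only touch the metadata tree, the two components can be sized and scaled independently in a design like this.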

GUFI

The Grand Unified File Index (GUFI) takes a new, hierarchical approach to storing file metadata, allowing rapid parallel searches across many internal databases.

  • GUFI on GitHub
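Searching many small metadata databases in parallel can be sketched with Python's built-in `sqlite3` module and a thread pool. The one-database-per-directory layout, the `entries` table schema, and the helper names `build_index` and `parallel_search` are assumptions for illustration, not GUFI's actual schema or tools; a real index would also be built by walking the directory hierarchy rather than from a prepared dict.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def build_index(tree):
    """Build one in-memory SQLite database per directory.

    `tree` maps a directory path to a list of (name, size) entries.
    """
    index = {}
    for directory, entries in tree.items():
        # check_same_thread=False lets a worker thread use a connection created
        # here; each connection is still used by only one task at a time.
        db = sqlite3.connect(":memory:", check_same_thread=False)
        db.execute("CREATE TABLE entries (name TEXT, size INTEGER)")
        db.executemany("INSERT INTO entries VALUES (?, ?)", entries)
        db.commit()
        index[directory] = db
    return index


def parallel_search(index, min_size, workers=4):
    """Run the same query against every per-directory database concurrently."""

    def query(item):
        directory, db = item
        rows = db.execute(
            "SELECT name, size FROM entries WHERE size >= ?", (min_size,)
        ).fetchall()
        return [(f"{directory}/{name}", size) for name, size in rows]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_directory = pool.map(query, index.items())
        return sorted(hit for hits in per_directory for hit in hits)


if __name__ == "__main__":
    tree = {"/home/a": [("x.dat", 10), ("y.dat", 5000)], "/home/b": [("z.dat", 7000)]}
    print(parallel_search(build_index(tree), min_size=1000))
```

Keeping each database small means every query is cheap and the work splits naturally across workers, which is the intuition behind indexing metadata hierarchically.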

Charliecloud

Charliecloud provides user-defined software stacks (UDSS) for high-performance computing (HPC) centers.

  • Charliecloud on GitHub

TensorFI

TensorFI is a TensorFlow Fault Injector (FI) for machine learning applications that enables users to explore the resiliency of machine learning applications to soft errors.

  • TensorFI on GitHub
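A resilience study of the kind TensorFI supports comes down to comparing fault-free ("golden") predictions against predictions made with an injected fault. The sketch below does this for a tiny hand-written linear classifier in plain Python rather than TensorFlow; the fault model (overwrite one randomly chosen weight with a random value) and the names `predict` and `misprediction_rate` are assumptions, not TensorFI's configuration or API.

```python
import random


def predict(weights, x):
    """Tiny linear classifier: index of the largest class score."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def misprediction_rate(weights, inputs, trials, rng):
    """Estimate how often a single injected fault changes the prediction.

    Assumed fault model: one randomly chosen weight is replaced by a random
    value in [-10, 10]. One fault is injected per trial.
    """
    golden = [predict(weights, x) for x in inputs]  # fault-free reference outputs
    changed = 0
    for _ in range(trials):
        faulty = [list(row) for row in weights]  # fresh copy: trials are independent
        r = rng.randrange(len(faulty))
        c = rng.randrange(len(faulty[r]))
        faulty[r][c] = rng.uniform(-10.0, 10.0)
        i = rng.randrange(len(inputs))
        if predict(faulty, inputs[i]) != golden[i]:
            changed += 1
    return changed / trials


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]
    inputs = [[1.0, 0.0], [0.0, 1.0]]
    print(misprediction_rate(weights, inputs, trials=1000, rng=random.Random(0)))
```

Comparing against golden outputs makes the measurement independent of whether the fault-free model is itself accurate: only changes caused by the fault are counted.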

PFTool

PFTool (Parallel File Tool) can stat, copy, and compare files in parallel. It is optimized for HPC workloads and uses MPI for message passing.

  • PFTool on GitHub
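A parallel compare can be sketched as a pool of workers that each check one pair of files, first by size (a cheap stat) and then chunk by chunk. PFTool distributes work with MPI; this sketch swaps in a Python thread pool for brevity, and `CHUNK`, `files_equal`, and `parallel_compare` are hypothetical names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB read size; an arbitrary choice for this sketch


def files_equal(pair):
    """Compare two files: sizes first, then contents chunk by chunk."""
    src, dst = pair
    try:
        if os.stat(src).st_size != os.stat(dst).st_size:
            return False  # different sizes: no need to read any data
    except FileNotFoundError:
        return False  # a missing file counts as a mismatch
    with open(src, "rb") as a, open(dst, "rb") as b:
        while True:
            block_a, block_b = a.read(CHUNK), b.read(CHUNK)
            if block_a != block_b:
                return False
            if not block_a:  # both files exhausted without a difference
                return True


def parallel_compare(pairs, workers=8):
    """Compare a list of (src, dst) pairs concurrently; return the mismatched pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(files_equal, pairs))
    return [pair for pair, same in zip(pairs, results) if not same]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, same, other = (os.path.join(tmp, n) for n in ("src", "same", "other"))
        for path, data in ((src, b"abc"), (same, b"abc"), (other, b"abd")):
            with open(path, "wb") as f:
                f.write(data)
        print(parallel_compare([(src, same), (src, other)]))  # only (src, other) is reported
```

Checking sizes before reading contents lets most mismatches be detected from metadata alone, which matters when the files are large.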

Kraken

Kraken is a distributed state engine that can maintain state across a large set of computers. It was designed to provide full-lifecycle maintenance of HPC compute clusters, from cold boot to ongoing system state maintenance and automation.

  • Kraken on GitHub
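One common way to structure a state engine is a reconciliation loop: compare each node's observed state with its desired state and issue the next action that moves it closer. The sketch below shows that pattern in a single process; it is not Kraken's implementation, and the node states, actions, and names (`TRANSITIONS`, `plan`, `reconcile`) are hypothetical. Distributing state across many machines, which is Kraken's focus, is left out.

```python
from collections import deque

# Hypothetical node lifecycle: state -> {next state: action that reaches it}.
TRANSITIONS = {
    "off": {"booting": "power_on"},
    "booting": {"ready": "start_services", "off": "power_off"},
    "ready": {"off": "power_off"},
}


def plan(observed, desired):
    """Shortest list of (action, resulting_state) steps from observed to desired (BFS)."""
    queue = deque([(observed, [])])
    seen = {observed}
    while queue:
        state, steps = queue.popleft()
        if state == desired:
            return steps
        for nxt, action in TRANSITIONS.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(action, nxt)]))
    raise ValueError(f"no path from {observed!r} to {desired!r}")


def reconcile(observed, desired):
    """One pass of the loop: the next (action, resulting_state) for every drifted node."""
    actions = {}
    for node, want in desired.items():
        current = observed.get(node, "off")  # assumption: unknown nodes are treated as off
        if current != want:
            actions[node] = plan(current, want)[0]
    return actions


if __name__ == "__main__":
    observed = {"n1": "off", "n2": "ready"}
    desired = {"n1": "ready", "n2": "ready"}
    while actions := reconcile(observed, desired):
        for node, (action, new_state) in actions.items():
            print(node, action)
            observed[node] = new_state  # stand-in for actually performing the action
```

Issuing only the next step per pass, rather than a whole plan, lets the loop re-check reality after every action, so a node that fails or drifts mid-sequence is simply re-planned on the next pass.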