Installation
This guide covers how to build and install Cocoa on your system.
Prerequisites
Before building Cocoa, ensure you have the following dependencies installed:
Compilers
Cocoa requires a C++20-compatible compiler. The minimum versions are dictated by Kokkos 5.0 (included with Trilinos 17), which sets stricter requirements than C++20 support alone.
| Compiler | Minimum Version | Notes |
|---|---|---|
| GCC | 10.4.0 | Recommended for CPU and as CUDA host compiler |
| Clang (CPU) | 14.0.0 | For CPU-only builds |
| Clang (CUDA host) | 15.0.0 | When used as nvcc host compiler |
| NVIDIA nvcc | 12.2 | Requires CUDA Toolkit 12.2+ |
| Intel icpx (CPU) | 2022.0.0 | Intel oneAPI DPC++/C++ Compiler |
| Intel icpx (SYCL) | 2024.2.1 | For SYCL backend builds |
| ROCm (HIPCC) | 6.2.0 | For AMD GPU builds |
| NVIDIA HPC SDK (NVC++) | 22.3 | Alternative to nvcc for NVIDIA GPUs |
Note
These requirements are set by Kokkos 5.0. See the Kokkos Requirements documentation for the most up-to-date information.
Build System
CMake 3.23 or later
GNU Make or Ninja build system
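As a quick pre-flight check, the minimums listed above can be compared against what is installed. The following is a minimal sketch and not part of Cocoa: the helper names `version_ge` and `check_min` are hypothetical, and it assumes a `sort` that supports `-V` (as in GNU coreutils). It covers only CMake and GCC; other compilers would need their own version query.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is at least version B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME MINIMUM FOUND: print a one-line verdict for one tool.
# FOUND is the detected version, or empty when the tool is missing.
check_min() {
    if [ -z "$3" ]; then
        echo "$1: not found (need >= $2)"
    elif version_ge "$3" "$2"; then
        echo "$1: $3 (ok, >= $2)"
    else
        echo "$1: $3 (too old, need >= $2)"
    fi
}

# The first line of `cmake --version` looks like "cmake version 3.28.3".
cmake_ver=$(cmake --version 2>/dev/null |
    sed -n '1s/.* \([0-9][0-9.]*\).*/\1/p') || cmake_ver=""
# Passing both flags makes older and newer GCC releases print the full version.
gcc_ver=$(gcc -dumpfullversion -dumpversion 2>/dev/null) || gcc_ver=""

check_min "CMake" 3.23 "$cmake_ver"
check_min "GCC" 10.4.0 "$gcc_ver"
```

A "not found" line only means the tool is not on `PATH` in the current shell; on systems that use environment modules, load the relevant module first.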
Required Libraries
The following libraries must be pre-installed on your system:
Trilinos 17.0 or later (with Kokkos, KokkosKernels, Tpetra, Belos, Ifpack2, Zoltan2 enabled)
NetCDF-C (4.9.3+ recommended; for mesh and output I/O)
HDF5 (development headers required; installed automatically as a NetCDF-C dependency)
Warning
NetCDF-C versions prior to 4.9.3 have a bug (#2674) that causes spurious HDF5 error messages on stderr when reading variables. Ubuntu 24.04 ships NetCDF-C 4.9.2; if using that distribution, build NetCDF-C 4.9.3+ from source.
ParMETIS (for mesh partitioning, required via Zoltan2 for MPI builds)
Note
Trilinos 17.0+ is required because it bundles Kokkos 5.0+, which provides APIs that Cocoa depends on.
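The NetCDF-C threshold from the warning above can be checked before configuring. This is a minimal sketch, assuming `nc-config` (the configuration helper installed with NetCDF-C) is on `PATH` and that `sort -V` is available; the function name `netcdf_needs_upgrade` is hypothetical.

```shell
#!/bin/sh
# netcdf_needs_upgrade VERSION: succeed when VERSION is older than 4.9.3,
# the first release recommended in the warning above.
netcdf_needs_upgrade() {
    [ "$1" != "4.9.3" ] &&
        [ "$(printf '%s\n4.9.3\n' "$1" | sort -V | head -n 1)" = "$1" ]
}

# `nc-config --version` prints a line such as "netCDF 4.9.2".
nc_ver=$(nc-config --version 2>/dev/null |
    sed -n '1s/[^0-9]*\([0-9][0-9.]*\).*/\1/p') || nc_ver=""

if [ -z "$nc_ver" ]; then
    echo "nc-config not found; cannot determine the NetCDF-C version"
elif netcdf_needs_upgrade "$nc_ver"; then
    echo "NetCDF-C $nc_ver is older than 4.9.3; consider building 4.9.3+ from source"
else
    echo "NetCDF-C $nc_ver is 4.9.3 or newer"
fi
```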
Automatically Fetched Dependencies
The following dependencies are automatically downloaded and built via CPM during CMake configuration:
yaml-cpp (configuration file parsing)
spdlog (logging)
fmt (string formatting)
Catch2 (unit testing)
Optional Dependencies
CUDA Toolkit (for NVIDIA GPU support, required for Trilinos CUDA build)
ROCm (for AMD GPU support, required for Trilinos HIP build)
MPI (for distributed computing, if Trilinos was built with MPI)
Building from Source
Clone the Repository
git clone https://github.com/zcobell/cocoa.git
cd cocoa
Configure with CMake
Basic Build:
mkdir build && cd build
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DNETCDF_DIR=/path/to/netcdf-c \
-DTrilinos_DIR=/path/to/trilinos
With custom install prefix:
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/path/to/install \
-DNETCDF_DIR=/path/to/netcdf-c \
-DTrilinos_DIR=/path/to/trilinos
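A wrong Trilinos path is easier to diagnose before CMake runs. The sketch below wraps the configure commands above in a hypothetical helper, `configure_cocoa`. It assumes the Trilinos directory is the one that contains `TrilinosConfig.cmake`, uses CMake's `-S`/`-B` form from the repository root instead of `mkdir build && cd build`, and introduces a `DRY_RUN` variable (also hypothetical) that prints the command instead of running it.

```shell
#!/bin/sh
# configure_cocoa NETCDF_DIR TRILINOS_DIR [INSTALL_PREFIX]
# Run from the repository root. Set DRY_RUN=1 to print the cmake command only.
configure_cocoa() {
    netcdf_dir=$1
    trilinos_dir=$2
    prefix=${3:-}

    # A config-mode package lookup needs this file inside Trilinos_DIR.
    if [ ! -f "$trilinos_dir/TrilinosConfig.cmake" ]; then
        echo "error: TrilinosConfig.cmake not found in $trilinos_dir" >&2
        return 1
    fi

    # Rebuild the argument list as this function's positional parameters.
    set -- -S . -B build -DCMAKE_BUILD_TYPE=Release \
        "-DNETCDF_DIR=$netcdf_dir" "-DTrilinos_DIR=$trilinos_dir"
    if [ -n "$prefix" ]; then
        set -- "$@" "-DCMAKE_INSTALL_PREFIX=$prefix"
    fi

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "cmake $*"
    else
        cmake "$@"
    fi
}
```

For example, `DRY_RUN=1 configure_cocoa /path/to/netcdf-c /path/to/trilinos /path/to/install` prints the full command for review. The build directory is still `build`, so the compile, install, and test commands later in this guide are run from there.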
CMake Options
| Option | Description | Default |
|---|---|---|
| | Hint path for CMake’s | (auto-detected; required if not in system paths) |
| | Path to the Trilinos CMake config directory (e.g., | (auto-detected; required if not in system paths) |
| | Combined execution space and MPI configuration. Available options depend on the Trilinos build. Examples: | |
| | Build type. Options: | |
| | Installation directory for | |
| | Build the unit test suite (requires Catch2, fetched automatically). | |
| | Enable strict compiler warnings, sanitizers, cppcheck, and hardening. Automatically enabled when building as the top-level project. | |
| | CUDA memory space. | |
| | Use the Thrust stream-compaction fast path for the wet/dry index lists (advanced). See Thrust Stream-Compaction Fast Path below. Requires a Thrust/CCCL installation for non-CUDA backends. | |
| | Print elapsed wall-clock time in screen log output. | |
Floating-Point Precision
Cocoa computes in double precision (FP64). There is no build-time precision option: to reduce GPU memory traffic, a fixed set of bandwidth-sensitive fields is stored as float and promoted to double on read, while all arithmetic stays in double. See Numerical Methods for the list of mixed-precision fields and the rationale.
Thrust Stream-Compaction Fast Path
Each step, the wet/dry solver rebuilds compacted lists of the currently wet
elements and nodes – a stream compaction (copy_if over an index range).
Cocoa has two implementations of this, selected at build time by
cocoa_USE_THRUST:
Thrust fast path (cocoa_USE_THRUST=ON): thrust::copy_if. On CUDA this is a single-pass CUB DeviceSelect on the Kokkos execution-space stream, a gain measured at roughly 1–2% of total simulation time on a V100 relative to the fallback. On a host backend it runs through the Thrust device system (for example, OpenMP).
Portable fallback (cocoa_USE_THRUST=OFF): a fused Kokkos parallel_scan.
Both paths use the same selection predicate and produce identical results.
The option defaults to ON for CUDA backends, where Thrust ships with the
CUDA Toolkit and no extra setup is needed. It is OFF by default for all
other backends and is an advanced option.
To enable the fast path on a non-CUDA backend (for example, OpenMP), you must provide a Thrust installation, typically via NVIDIA CCCL, and point CMake at it:
cmake .. \
-Dcocoa_BACKEND=OPENMP+MPI \
-Dcocoa_USE_THRUST=ON \
-DCCCL_DIR=/path/to/cccl/lib/cmake/cccl
# or, for a standalone Thrust:
# -DThrust_DIR=/path/to/thrust/lib/cmake/thrust
Cocoa configures the host Thrust device system to match the selected backend
(OpenMP or Serial). If cocoa_USE_THRUST=ON is requested but no Thrust/CCCL
installation is found, or the backend is not CUDA/OpenMP/Serial, configuration
fails with an explanatory error. To turn the fast path off (including on a CUDA
build), pass -Dcocoa_USE_THRUST=OFF.
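The default and failure rules described above can be summarized as a small decision function. This is only an illustration of the stated behavior, not Cocoa's CMake logic: the function name `thrust_setting` is hypothetical, and apart from `OPENMP+MPI` the backend names used here (values beginning with `CUDA` or `SERIAL`, and `HIP` as an example of an unsupported backend) are assumptions about how backends are spelled.

```shell
#!/bin/sh
# thrust_setting BACKEND [REQUESTED]
# Print the effective cocoa_USE_THRUST value for BACKEND. REQUESTED is the
# value passed by the user (ON or OFF), or empty to take the default.
thrust_setting() {
    backend=$1
    requested=${2:-}

    # Classify the backend by its execution space (names are assumptions).
    case $backend in
        CUDA*) kind=cuda ;;
        OPENMP*|SERIAL*) kind=host ;;
        *) kind=other ;;
    esac

    if [ -z "$requested" ]; then
        # Default: ON for CUDA, OFF for every other backend.
        if [ "$kind" = cuda ]; then echo ON; else echo OFF; fi
    elif [ "$requested" = ON ] && [ "$kind" = other ]; then
        # The fast path is only available on CUDA, OpenMP, and Serial.
        echo "error: cocoa_USE_THRUST=ON is not supported for $backend" >&2
        return 1
    else
        echo "$requested"
    fi
}

thrust_setting OPENMP+MPI      # default for a non-CUDA backend
thrust_setting OPENMP+MPI ON   # explicit opt-in; also needs Thrust/CCCL
```

The sketch does not model the second failure case, a missing Thrust/CCCL installation, which is detected only at configure time.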
Compile
make -j$(nproc)
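The command above assumes GNU Make and a system that provides `nproc`. Where either assumption may not hold (for example, when CMake was configured for Ninja), a portable job count combined with `cmake --build` is one alternative. A minimal sketch; the function name `cpu_count` is hypothetical.

```shell
#!/bin/sh
# cpu_count: print the number of online CPUs, falling back to 1.
cpu_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif n=$(getconf _NPROCESSORS_ONLN 2>/dev/null); then
        echo "$n"
    elif n=$(sysctl -n hw.ncpu 2>/dev/null); then
        echo "$n"
    else
        echo 1
    fi
}

# From the build directory, either generator can then be driven with:
#   cmake --build . --parallel "$(cpu_count)"
echo "parallel jobs: $(cpu_count)"
```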
Install
make install
Verifying the Installation
Run the test suite to verify your installation:
ctest --output-on-failure