============
Installation
============

This guide covers how to build and install Cocoa on your system.

Prerequisites
-------------

Before building Cocoa, ensure you have the following dependencies installed:

Compilers
^^^^^^^^^

Cocoa requires a C++20-compatible compiler. The minimum versions are dictated by
Kokkos 5.0 (included with Trilinos 17), which sets stricter requirements than
C++20 support alone.

.. list-table:: Minimum Compiler Requirements
   :header-rows: 1
   :widths: 30 20 50

   * - Compiler
     - Minimum Version
     - Notes
   * - GCC
     - 10.4.0
     - Recommended for CPU and as CUDA host compiler
   * - Clang (CPU)
     - 14.0.0
     - For CPU-only builds
   * - Clang (CUDA host)
     - 15.0.0
     - When used as nvcc host compiler
   * - NVIDIA nvcc
     - 12.2
     - Requires CUDA Toolkit 12.2+
   * - Intel icpx (CPU)
     - 2022.0.0
     - Intel oneAPI DPC++/C++ Compiler
   * - Intel icpx (SYCL)
     - 2024.2.1
     - For SYCL backend builds
   * - ROCm (HIPCC)
     - 6.2.0
     - For AMD GPU builds
   * - NVIDIA HPC SDK (NVC++)
     - 22.3
     - Alternative to nvcc for NVIDIA GPUs

.. note::

   These requirements are set by Kokkos 5.0. See the
   `Kokkos Requirements <https://kokkos.org/kokkos-core-wiki/get-started/requirements.html>`_
   documentation for the most up-to-date information.
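
An installed toolchain can be compared against the table with a small version
check. The following is a minimal sketch and not part of Cocoa: ``version_ge``
is a hypothetical helper name, it assumes a ``sort`` that supports ``-V``
(GNU coreutils), and ``gcc -dumpfullversion`` assumes GCC 7 or later.

.. code-block:: bash

   # version_ge A B: succeed when version A is greater than or equal to B.
   # Hypothetical helper for illustration; relies on `sort -V` (version sort).
   version_ge() {
       [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
   }

   # Fall back to "0" when gcc is missing so the comparison fails cleanly.
   gcc_version="$(gcc -dumpfullversion 2>/dev/null || echo 0)"

   if version_ge "$gcc_version" 10.4.0; then
       echo "GCC ${gcc_version} meets the 10.4.0 minimum"
   else
       echo "GCC ${gcc_version} is below the 10.4.0 minimum (or gcc was not found)"
   fi

The same helper can be pointed at any other row of the table by substituting
the tool's reported version and the listed minimum.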

Build System
^^^^^^^^^^^^

- CMake 3.23 or later
- GNU Make or Ninja build system

Required Libraries
^^^^^^^^^^^^^^^^^^

The following libraries must be pre-installed on your system:

- **Trilinos 17.0 or later** (with Kokkos, KokkosKernels, Tpetra, Belos, Ifpack2, Zoltan2 enabled)
- **NetCDF-C** (4.9.3+ recommended; for mesh and output I/O)
- **HDF5** (development headers required; installed automatically as a NetCDF-C dependency)
- **ParMETIS** (for mesh partitioning, required via Zoltan2 for MPI builds)

.. warning::

   NetCDF-C versions prior to 4.9.3 have a bug (`#2674 <https://github.com/Unidata/netcdf-c/issues/2674>`_)
   that causes spurious HDF5 error messages on stderr when reading variables.
   Ubuntu 24.04 ships NetCDF-C 4.9.2; if using that distribution, build
   NetCDF-C 4.9.3+ from source.
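
To see whether an existing NetCDF-C installation is affected by the 4.9.3
caveat, the ``nc-config`` utility that ships with NetCDF-C can be queried. This
is a minimal sketch under the assumption that ``nc-config`` is on ``PATH`` and
that ``sort -V`` is available; ``netcdf_has_fix`` is a hypothetical helper name,
not something Cocoa provides.

.. code-block:: bash

   # netcdf_has_fix VERSION: succeed when VERSION is 4.9.3 or newer.
   # Hypothetical helper for illustration; relies on `sort -V` (version sort).
   netcdf_has_fix() {
       [ "$(printf '%s\n4.9.3\n' "$1" | sort -V | head -n 1)" = "4.9.3" ]
   }

   if command -v nc-config >/dev/null 2>&1; then
       # `nc-config --version` prints a line such as "netCDF 4.9.2".
       nc_version="$(nc-config --version | awk '{print $2}')"
       echo "NetCDF-C ${nc_version} installed under $(nc-config --prefix)"
       if ! netcdf_has_fix "$nc_version"; then
           echo "Older than 4.9.3: expect spurious HDF5 messages on stderr"
       fi
   else
       echo "nc-config not found; pass -DNETCDF_DIR explicitly"
   fi

The prefix printed by ``nc-config --prefix`` is a reasonable candidate value
for ``NETCDF_DIR`` when CMake does not find NetCDF-C on its own.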

.. note::

   Trilinos 17.0+ is required because it bundles Kokkos 5.0 or later, which
   provides APIs that Cocoa depends on.

Automatically Fetched Dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following dependencies are automatically downloaded and built via
`CPM <https://github.com/cpm-cmake/CPM.cmake>`_ during CMake configuration:

- yaml-cpp (configuration file parsing)
- spdlog (logging)
- fmt (string formatting)
- Catch2 (unit testing)

Optional Dependencies
^^^^^^^^^^^^^^^^^^^^^

- **CUDA Toolkit** (for NVIDIA GPU support, required for Trilinos CUDA build)
- **ROCm** (for AMD GPU support, required for Trilinos HIP build)
- **MPI** (for distributed computing, if Trilinos was built with MPI)

Building from Source
--------------------

Clone the Repository
^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   git clone https://github.com/zcobell/cocoa.git
   cd cocoa

Configure with CMake
^^^^^^^^^^^^^^^^^^^^

**Basic Build**:

.. code-block:: bash

   mkdir build && cd build
   cmake .. \
       -DCMAKE_BUILD_TYPE=Release \
       -DNETCDF_DIR=/path/to/netcdf-c \
       -DTrilinos_DIR=/path/to/trilinos/lib/cmake/Trilinos

**With custom install prefix**:

.. code-block:: bash

   cmake .. \
       -DCMAKE_BUILD_TYPE=Release \
       -DCMAKE_INSTALL_PREFIX=/path/to/install \
       -DNETCDF_DIR=/path/to/netcdf-c \
       -DTrilinos_DIR=/path/to/trilinos/lib/cmake/Trilinos

CMake Options
^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 28 47 25

   * - Option
     - Description
     - Default
   * - ``NETCDF_DIR``
     - Hint path for CMake's ``FindNetCDF`` module. Point to the NetCDF-C
       installation prefix.
     - (auto-detected; required if not in system paths)
   * - ``Trilinos_DIR``
     - Path to the Trilinos CMake config directory (e.g.,
       ``<prefix>/lib/cmake/Trilinos``).
     - (auto-detected; required if not in system paths)
   * - ``cocoa_BACKEND``
     - Combined execution space and MPI configuration. Available options
       depend on the Trilinos build. Examples: ``CUDA+MPI``, ``CUDA``,
       ``HIP+MPI``, ``OPENMP+MPI``, ``OPENMP``, ``SERIAL+MPI``, ``SERIAL``.
     - ``DEFAULT`` -- auto-selects the best available backend from Trilinos
       (prefers GPU over CPU, MPI over non-MPI)
   * - ``CMAKE_BUILD_TYPE``
     - Build type. Options: ``Release``, ``Debug``, ``RelWithDebInfo``,
       ``MinSizeRel``.
     - ``RelWithDebInfo`` (if not specified)
   * - ``CMAKE_INSTALL_PREFIX``
     - Installation directory for ``make install``.
     - ``/usr/local``
   * - ``BUILD_TESTING``
     - Build the unit test suite (requires Catch2, fetched automatically).
     - ``OFF``
   * - ``cocoa_MAINTAINER_MODE``
     - Enable strict compiler warnings, sanitizers, cppcheck, and hardening.
       Automatically enabled when building as the top-level project.
     - ``OFF`` (``ON`` when top-level project)
   * - ``cocoa_CUDA_MEMORY_SPACE``
     - CUDA memory space. ``CUDA`` for device memory, ``CUDAUVM`` for
       unified virtual memory. Only applies to CUDA backends.
     - ``CUDA``
   * - ``cocoa_USE_THRUST``
     - Use the Thrust stream-compaction fast path for the wet/dry index
       lists (advanced). See :ref:`thrust-fast-path` below. Requires a
       Thrust/CCCL installation for non-CUDA backends.
     - ``ON`` for CUDA, ``OFF`` otherwise
   * - ``cocoa_PRINT_LOG_TIME``
     - Print elapsed wall-clock time in screen log output.
     - ``OFF``
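
As an illustration of how several of these options combine, the following
sketch configures a CUDA build with MPI and unified virtual memory. The paths
are placeholders, and ``CUDA+MPI`` is only expected to be accepted when the
Trilinos installation itself was built with CUDA and MPI.

.. code-block:: bash

   # Run from the build directory; all paths are placeholders.
   cmake .. \
       -DCMAKE_BUILD_TYPE=Release \
       -Dcocoa_BACKEND=CUDA+MPI \
       -Dcocoa_CUDA_MEMORY_SPACE=CUDAUVM \
       -Dcocoa_PRINT_LOG_TIME=ON \
       -DNETCDF_DIR=/path/to/netcdf-c \
       -DTrilinos_DIR=/path/to/trilinos/lib/cmake/Trilinos

Leaving ``cocoa_BACKEND`` unset keeps the ``DEFAULT`` behaviour described in
the table, which is usually the simpler choice when only one Trilinos build is
installed.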

Floating-Point Precision
^^^^^^^^^^^^^^^^^^^^^^^^

Cocoa computes in double precision (FP64), and there is no build-time precision
option. To reduce GPU memory traffic, a fixed set of bandwidth-sensitive
fields is *stored* as float and promoted to double on read; all arithmetic
stays in double. See :doc:`/theory/numerical_methods` for the list of
mixed-precision fields and the rationale.

.. _thrust-fast-path:

Thrust Stream-Compaction Fast Path
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each step, the wet/dry solver rebuilds compacted lists of the currently wet
elements and nodes -- a stream compaction (``copy_if`` over an index range).
Cocoa has two implementations of this, selected at build time by
``cocoa_USE_THRUST``:

- **Thrust fast path** (``cocoa_USE_THRUST=ON``): ``thrust::copy_if``. On CUDA
  this is a single-pass CUB ``DeviceSelect`` on the Kokkos execution-space
  stream, which in measurements on a V100 saved roughly 1--2% of total
  simulation time relative to the fallback. On a host backend it runs through
  the Thrust device system (for example, OpenMP).
- **Portable fallback** (``cocoa_USE_THRUST=OFF``): a fused Kokkos
  ``parallel_scan``.

Both paths use the same selection predicate and produce identical results.

The option defaults to ``ON`` for CUDA backends, where Thrust ships with the
CUDA Toolkit and no extra setup is needed. It is ``OFF`` by default for all
other backends and is an *advanced* option.

To enable the fast path on a non-CUDA (for example, OpenMP) backend you must
provide a `Thrust <https://nvidia.github.io/cccl/thrust/>`_ installation,
typically via `NVIDIA CCCL <https://github.com/NVIDIA/cccl>`_, and point CMake
at it:

.. code-block:: bash

   cmake .. \
       -Dcocoa_BACKEND=OPENMP+MPI \
       -Dcocoa_USE_THRUST=ON \
       -DCCCL_DIR=/path/to/cccl/lib/cmake/cccl
       # or, for a standalone Thrust:
       # -DThrust_DIR=/path/to/thrust/lib/cmake/thrust

Cocoa configures the host Thrust device system to match the selected backend
(OpenMP or Serial). If ``cocoa_USE_THRUST=ON`` is requested but no Thrust/CCCL
installation is found, or the backend is not CUDA/OpenMP/Serial, configuration
fails with an explanatory error. To turn the fast path off (including on a CUDA
build), pass ``-Dcocoa_USE_THRUST=OFF``.

Compile
^^^^^^^

.. code-block:: bash

   make -j$(nproc)

Install
^^^^^^^

.. code-block:: bash

   make install
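
The ``make`` commands above apply when CMake generated Makefiles. Since Ninja
is also listed as a supported build system, a generator-independent sketch of
the same two steps, using CMake's own ``--build`` and ``--install`` driver
modes, might look like this:

.. code-block:: bash

   # Run from the build directory; works with Makefile and Ninja generators.
   cmake --build . --parallel "$(nproc)"

   # Installs into CMAKE_INSTALL_PREFIX (/usr/local unless overridden).
   cmake --install .

To use Ninja, add ``-G Ninja`` to the initial ``cmake ..`` configure command in
a fresh build directory.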

Verifying the Installation
--------------------------

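Run the test suite from the build directory to verify your build. The tests are
only built when Cocoa is configured with ``-DBUILD_TESTING=ON`` (the option
defaults to ``OFF``):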

.. code-block:: bash

   ctest --output-on-failure
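
If the build directory was configured without tests, it can be reconfigured in
place. A minimal sketch, assuming the commands are run from the existing build
directory so that previously set options are kept in the CMake cache:

.. code-block:: bash

   # Enable the unit tests, rebuild, then run them.
   cmake .. -DBUILD_TESTING=ON
   cmake --build . --parallel "$(nproc)"
   ctest --output-on-failure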