==========
Containers
==========

Cocoa is distributed as pre-built Docker images covering a range of GPU and CPU
architectures. Using an image is the easiest way to get started, since nothing
has to be compiled from source. The same images also run under
Singularity/Apptainer on HPC systems
(see `Running with Singularity / Apptainer`_ below).

.. note::

   Cocoa computes in double precision; a few bandwidth-sensitive fields are
   stored in single precision internally (see :doc:`../theory/numerical_methods`).
   This is built in and requires no configuration, so there is a single build
   per architecture rather than separate precision variants.

Image Families and Naming
-------------------------

Cocoa images are split into three families by accelerator toolchain, identified
by the tag suffix. ``<tag>`` is the Cocoa release (for example ``latest`` or a
version such as ``1.0``):

.. list-table::
   :header-rows: 1
   :widths: 30 25 45

   * - Image
     - Hardware
     - Base
   * - ``cocoaorg/cocoa:<tag>-cpu``
     - CPU only (x86-64 and ARM64)
     - Ubuntu
   * - ``cocoaorg/cocoa:<tag>-cuda``
     - NVIDIA GPUs (CUDA)
     - ``nvidia/cuda`` (Ubuntu)
   * - ``cocoaorg/cocoa:<tag>-rocm``
     - AMD GPUs (ROCm/HIP)
     - ``rocm/dev-ubuntu``

The families are kept separate because the CUDA and ROCm toolkits are mutually
exclusive at the hardware level and each is large; a host is either NVIDIA or
AMD, so a combined image would only add bloat. Splitting the CPU build out of
the GPU images also keeps CPU and CI users from pulling a multi-gigabyte GPU
toolkit they will never use.

The ``-cpu`` image is a single **multi-architecture** tag: it carries both
``linux/amd64`` and ``linux/arm64`` variants behind one name, and ``docker
pull`` automatically selects the one matching your host. The ARM64 variant runs
natively on Apple Silicon Macs through Docker Desktop, as well as on AWS
Graviton, Ampere Altra, and NVIDIA Grace.

The ``-cuda`` and ``-rocm`` images are **GPU-only**: they do not include a CPU
fallback build. Use the ``-cpu`` image to run on a node without a supported GPU.

The Trilinos base images follow the same convention:
``cocoaorg/trilinos_base:<tag>-cpu``, ``-cuda``, and ``-rocm``.
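
Because the naming scheme is regular, scripts can compose image references
instead of hard-coding them. The following is a minimal sketch of that idea;
``cocoa_image`` is a hypothetical helper name and is not shipped with Cocoa.

.. code-block:: bash

   # Hypothetical helper (not part of Cocoa): compose an image reference
   # from a release tag and a family suffix, following the table above.
   # The optional third argument selects the repository (default: cocoa).
   cocoa_image() {
       local tag="$1" family="$2" repo="${3:-cocoa}"
       case "$family" in
           cpu|cuda|rocm) ;;
           *) echo "unknown family: $family (expected cpu, cuda, or rocm)" >&2
              return 1 ;;
       esac
       printf 'cocoaorg/%s:%s-%s\n' "$repo" "$tag" "$family"
   }

For example, ``cocoa_image latest cuda`` prints ``cocoaorg/cocoa:latest-cuda``,
and ``cocoa_image 1.0 cpu trilinos_base`` prints
``cocoaorg/trilinos_base:1.0-cpu``.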

Supported Architectures
-----------------------

**CPU** (``-cpu`` image, x86-64 and ARM64):

.. list-table::
   :header-rows: 1
   :widths: 20 30 35

   * - Architecture
     - Backend
     - ISA tuning
   * - ``serial``
     - Single-threaded
     - Haswell (x86-64) / Neoverse-N1 (ARM64)
   * - ``openmp``
     - Multi-threaded
     - Haswell (x86-64) / Neoverse-N1 (ARM64)

**NVIDIA GPUs** (``-cuda`` image):

.. list-table::
   :header-rows: 1
   :widths: 20 30 30

   * - Architecture
     - Hardware
     - Compute Capability
   * - ``volta70``
     - NVIDIA Volta (V100)
     - 7.0
   * - ``turing75``
     - NVIDIA Turing (T4)
     - 7.5
   * - ``ampere80``
     - NVIDIA Ampere (A100)
     - 8.0
   * - ``ampere86``
     - NVIDIA Ampere (A10)
     - 8.6
   * - ``ada89``
     - NVIDIA Ada (L40S)
     - 8.9
   * - ``hopper90``
     - NVIDIA Hopper (H100)
     - 9.0
   * - ``blackwell100``
     - NVIDIA Blackwell (B100)
     - 10.0

**AMD GPUs** (``-rocm`` image):

.. list-table::
   :header-rows: 1
   :widths: 20 30 30

   * - Architecture
     - Hardware
     - GFX ISA
   * - ``mi300``
     - AMD Instinct MI300X / MI300A
     - gfx942
   * - ``mi200``
     - AMD Instinct MI210 / MI250 / MI250X
     - gfx90a

Within an image, the variant name is simply the architecture name, e.g.,
``ampere80``, ``mi300``, or ``openmp``.
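
On NVIDIA hosts, the variant name can be looked up from the compute capability
in the table above. The sketch below encodes that table as a shell function;
``cuda_variant_for`` is a hypothetical name, and the commented ``nvidia-smi``
query assumes a driver recent enough to support the ``compute_cap`` field.

.. code-block:: bash

   # Hypothetical helper: map an NVIDIA compute capability (e.g. "8.0")
   # to the variant name listed in the NVIDIA table above.
   cuda_variant_for() {
       case "$1" in
           7.0)  echo volta70 ;;
           7.5)  echo turing75 ;;
           8.0)  echo ampere80 ;;
           8.6)  echo ampere86 ;;
           8.9)  echo ada89 ;;
           9.0)  echo hopper90 ;;
           10.0) echo blackwell100 ;;
           *)    echo "no Cocoa variant listed for compute capability $1" >&2
                 return 1 ;;
       esac
   }

   # Possible use on a GPU host (assumption: nvidia-smi supports compute_cap):
   #   cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
   #   cuda_variant_for "$cc"

Capabilities that are not in the table are rejected rather than mapped to a
neighbouring variant, since this page does not state which builds are
compatible across capabilities.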

Running the Container
---------------------

**CPU (x86-64 or ARM64, auto-selected):**

.. code-block:: bash

   docker run -it -v "$(pwd):/workspace" cocoaorg/cocoa:latest-cpu

**NVIDIA GPU (requires the NVIDIA Container Toolkit):**

.. code-block:: bash

   docker run -it --gpus all -v "$(pwd):/workspace" cocoaorg/cocoa:latest-cuda

**AMD GPU (requires the ROCm kernel driver on the host):**

.. code-block:: bash

   docker run -it --device=/dev/kfd --device=/dev/dri \
       --group-add video --security-opt seccomp=unconfined \
       -v "$(pwd):/workspace" cocoaorg/cocoa:latest-rocm

The ``-v`` flag mounts your current directory at ``/workspace``, the
container's working directory, so Cocoa can access your mesh and configuration
files.
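
Which of the three commands applies depends on the host. A script can make a
rough choice by probing for the tools and device nodes each family relies on.
This is only a sketch: ``detect_cocoa_family`` is a hypothetical name, and the
probes are heuristics. In particular, they do not confirm that the NVIDIA
Container Toolkit is installed.

.. code-block:: bash

   # Hypothetical helper: guess the image family for the current host.
   # Prints one of: cuda, rocm, cpu.
   detect_cocoa_family() {
       if command -v nvidia-smi >/dev/null 2>&1 \
               && nvidia-smi -L >/dev/null 2>&1; then
           # nvidia-smi exists and can list at least the driver's GPUs
           echo cuda
       elif [ -e /dev/kfd ] && [ -d /dev/dri ]; then
           # device nodes exposed by the ROCm kernel driver
           echo rocm
       else
           echo cpu
       fi
   }

The result can be used as the tag suffix, for example
``cocoaorg/cocoa:latest-$(detect_cocoa_family)``. The GPU families still need
the device flags shown above.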

Selecting a Variant
-------------------

Each image bundles the variants for its own family and selects a sensible
default: the ``-cpu`` image defaults to ``serial``, while the GPU images default
to a representative architecture (``ampere80`` for ``-cuda``, ``mi200`` for
``-rocm``). Use the ``select_cocoa`` command to switch at runtime:

.. code-block:: bash

   # List the variants available in this image
   source select_cocoa --help

   # NVIDIA: select the A100 backend (in the -cuda image)
   source select_cocoa ampere80

   # AMD: select the MI300 backend (in the -rocm image)
   source select_cocoa mi300

   # CPU: select the multi-threaded build (in the -cpu image)
   source select_cocoa openmp

   # Verify selection
   which cocoa

A variant exists only in the image for its family -- for example ``ampere80`` is
present only in ``-cuda`` and ``mi300`` only in ``-rocm``. The selection
persists for the duration of the shell session. To set it at launch, pass the
``COCOA_ARCH`` environment variable:

.. code-block:: bash

   # Run on an A100
   docker run -it --gpus all -e COCOA_ARCH=ampere80 \
       -v "$(pwd):/workspace" cocoaorg/cocoa:latest-cuda
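
Since a variant exists only in its own family's image, a launch script may want
to check the pairing before starting a container. The sketch below derives the
family from the variant name using the architecture tables on this page;
``family_of_variant`` is a hypothetical helper, not a Cocoa command.

.. code-block:: bash

   # Hypothetical helper: report which image family carries a variant,
   # based on the architecture tables in this document.
   family_of_variant() {
       case "$1" in
           serial|openmp)
               echo cpu ;;
           volta70|turing75|ampere80|ampere86|ada89|hopper90|blackwell100)
               echo cuda ;;
           mi200|mi300)
               echo rocm ;;
           *)
               echo "unknown variant: $1" >&2
               return 1 ;;
       esac
   }

A script could then refuse to pass ``COCOA_ARCH=mi300`` to the ``-cuda`` image,
or build the image name as
``cocoaorg/cocoa:latest-$(family_of_variant "$arch")``.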

Running a Simulation
--------------------

Once inside the container with the appropriate architecture selected:

.. code-block:: bash

   cocoa -i your_config.yaml

An example simulation is included in the container at ``/opt/cocoa/examples``:

.. code-block:: bash

   cp -r /opt/cocoa/examples/* .
   source select_cocoa serial
   cocoa -i simple.yaml

See :doc:`quick_start` for details on configuration files and expected output.

Converting ADCIRC Meshes
------------------------

The container includes the ``cocoa_mesh_tools.py`` utility for converting
ADCIRC model files to Cocoa's NetCDF mesh format. Python 3, ``netCDF4``, and
``numpy`` are pre-installed.

**Basic mesh conversion (fort.14 only):**

.. code-block:: bash

   python3 /opt/cocoa/utils/cocoa_mesh_tools.py from_adcirc \
       --mesh fort.14 \
       --output mesh.nc

**With nodal attributes (fort.13):**

.. code-block:: bash

   python3 /opt/cocoa/utils/cocoa_mesh_tools.py from_adcirc \
       --mesh fort.14 \
       --attributes fort.13 \
       --output mesh.nc

**With self-attraction and loading (fort.24):**

.. code-block:: bash

   python3 /opt/cocoa/utils/cocoa_mesh_tools.py from_adcirc \
       --mesh fort.14 \
       --attributes fort.13 \
       --sal fort.24 \
       --output mesh.nc

.. list-table:: Conversion Script Options
   :header-rows: 1
   :widths: 25 15 60

   * - Flag
     - Required
     - Description
   * - ``--mesh``
     - Yes
     - Path to ADCIRC fort.14 mesh file
   * - ``--output``
     - Yes
     - Path for output NetCDF file
   * - ``--attributes``
     - No
     - Path to ADCIRC fort.13 nodal attributes file
   * - ``--sal``
     - No
     - Path to self-attraction/loading file (fort.24 ASCII or NetCDF)

See :doc:`../user_guide/mesh_preparation` for details on the NetCDF mesh format
and supported nodal attributes.
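
The three invocations above differ only in which optional inputs are present.
A small wrapper can assemble the argument list from whatever files exist in a
directory. This is a sketch under assumptions: ``convert_adcirc_dir`` and the
``DRY_RUN`` variable are hypothetical, and the inputs are assumed to use the
standard names ``fort.14``, ``fort.13``, and ``fort.24``.

.. code-block:: bash

   # Hypothetical wrapper: build the from_adcirc command for a directory,
   # adding --attributes and --sal only when fort.13 / fort.24 exist.
   # With DRY_RUN=1 the command is printed instead of executed.
   convert_adcirc_dir() {
       local dir="${1:-.}" out="${2:-mesh.nc}"
       if [ ! -f "$dir/fort.14" ]; then
           echo "missing required mesh file: $dir/fort.14" >&2
           return 1
       fi
       # Reuse the positional parameters as the command's argument list.
       set -- python3 /opt/cocoa/utils/cocoa_mesh_tools.py from_adcirc \
           --mesh "$dir/fort.14"
       if [ -f "$dir/fort.13" ]; then set -- "$@" --attributes "$dir/fort.13"; fi
       if [ -f "$dir/fort.24" ]; then set -- "$@" --sal "$dir/fort.24"; fi
       set -- "$@" --output "$out"
       if [ "${DRY_RUN:-0}" = 1 ]; then
           printf '%s\n' "$*"
       else
           "$@"
       fi
   }

Running ``DRY_RUN=1 convert_adcirc_dir /data/meshes`` first shows the exact
command before any file is written.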

Mounting Data Volumes
---------------------

Mount your simulation directory into the container so input files are
accessible and output files persist after the container exits:

.. code-block:: bash

   # Mount a single directory
   docker run -it --gpus all \
       -v /path/to/simulation:/workspace \
       cocoaorg/cocoa:latest-cuda

   # Mount input and output separately
   docker run -it --gpus all \
       -v /path/to/meshes:/data/meshes:ro \
       -v /path/to/output:/workspace \
       cocoaorg/cocoa:latest-cuda

.. tip::

   Use ``:ro`` (read-only) for input data mounts to prevent accidental
   modification of source files.

Non-Interactive Execution
-------------------------

Run a simulation without entering the container interactively:

.. code-block:: bash

   docker run --gpus all \
       -v "$(pwd):/workspace" \
       -e COCOA_ARCH=ampere80 \
       cocoaorg/cocoa:latest-cuda \
       cocoa -i config.yaml
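
The same pattern extends to running several configurations back to back. The
sketch below assumes the configuration files sit in the current directory and
end in ``.yaml``. ``run_cocoa_batch`` and the ``DOCKER`` override variable are
hypothetical. ``--rm`` is the standard Docker flag that removes each container
after it exits.

.. code-block:: bash

   # Hypothetical helper: run each configuration in its own container and
   # keep one log file per run. Usage: run_cocoa_batch <arch> <config>...
   # Set DOCKER=echo to print the commands into the logs instead of running.
   run_cocoa_batch() {
       local arch="$1" cfg status=0
       shift
       for cfg in "$@"; do
           ${DOCKER:-docker} run --rm --gpus all \
               -v "$PWD:/workspace" \
               -e COCOA_ARCH="$arch" \
               cocoaorg/cocoa:latest-cuda \
               cocoa -i "$cfg" > "${cfg%.yaml}.log" 2>&1 || {
                   echo "run failed for $cfg (see ${cfg%.yaml}.log)" >&2
                   status=1
               }
       done
       # Non-zero if any run failed, so schedulers and CI can detect it.
       return "$status"
   }

For example, ``run_cocoa_batch ampere80 tide.yaml surge.yaml`` writes
``tide.log`` and ``surge.log`` next to the inputs.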

Running with Singularity / Apptainer
------------------------------------

Many HPC clusters use Singularity (or its successor Apptainer) instead of
Docker, since it runs unprivileged and integrates cleanly with schedulers such
as SLURM. Singularity can pull and convert the same images directly from Docker
Hub -- no separate build or image format is required. The commands below use
``singularity``; substitute ``apptainer`` if that is what your site provides
(the two are command-line compatible).

**1. Pull and convert the image from Docker Hub.** Singularity fetches the
Docker image and converts it into a single ``.sif`` file:

.. code-block:: bash

   singularity pull cocoa-cuda.sif docker://cocoaorg/cocoa:latest-cuda

.. note::

   The GPU images are several gigabytes, and Singularity unpacks them into a
   temporary directory before assembling the ``.sif``. If ``/tmp`` is small (a
   common default on login nodes), point the cache and temporary directories at
   a filesystem with sufficient free space *before* pulling (under Apptainer,
   the variables are named ``APPTAINER_CACHEDIR`` and ``APPTAINER_TMPDIR``):

   .. code-block:: bash

      export SINGULARITY_CACHEDIR=/path/to/scratch/singularity_cache
      export SINGULARITY_TMPDIR=/path/to/scratch/singularity_tmp
      mkdir -p "$SINGULARITY_CACHEDIR" "$SINGULARITY_TMPDIR"

**2. Confirm the GPU is visible.** The ``--nv`` flag passes the host NVIDIA
driver into the container (use ``--rocm`` for the ``-rocm`` image). The quickest
check is to run ``nvidia-smi`` inside the container:

.. code-block:: bash

   singularity exec --nv cocoa-cuda.sif nvidia-smi

If this prints your GPU's statistics, the card was passed through successfully.

**3. Select the architecture and confirm Cocoa runs.** As with the Docker
images, ``COCOA_ARCH`` selects the GPU variant. Choose the name matching your
hardware from the architecture table above:

.. code-block:: bash

   singularity run --nv --env COCOA_ARCH=ampere80 cocoa-cuda.sif cocoa --version

You should see the version banner, for example:

.. code-block:: text

   cocoa <version>
   Cocoa - Coastal and Ocean Circulation on Accelerators
   (c) 2026 Zach Cobell

.. important::

   Setting ``COCOA_ARCH`` matters: each image bundles builds for many GPUs, and
   this variable selects the one matching your card. Use ``singularity run``
   (not ``exec``) when relying on it -- the container's entrypoint translates
   ``COCOA_ARCH`` into the correct ``PATH`` only on ``run``. Under
   ``singularity exec`` or ``singularity shell`` the entrypoint does not run, so
   select the variant explicitly instead with ``source select_cocoa ampere80``.

**4. Launch a clean session with your data mounted.** HPC login shells often
inject their own environment -- module systems, an activated Conda ``base``, a
custom ``LD_LIBRARY_PATH`` -- and because Singularity inherits the host
environment and bind-mounts your home directory by default, these can leak into
the container and shadow its Python and libraries. Start the container isolated
from the host environment and mount your working directory to ``/workspace``:

.. code-block:: bash

   singularity run -c --cleanenv --nv \
       --env COCOA_ARCH=ampere80 \
       --bind "$PWD:/workspace" --pwd /workspace \
       cocoa-cuda.sif

* ``-c`` (``--contain``) keeps the host home directory and ``/tmp`` out of the
  container, and ``--cleanenv`` drops the host environment variables, so the
  container uses its own Python, libraries, and ``PATH``.
* ``--bind "$PWD:/workspace"`` mounts your current directory (any path works)
  so inputs are visible and outputs persist after the container exits.
* ``--pwd /workspace`` starts you in that directory.

From inside this session you can convert an ADCIRC mesh and run a model exactly
as with the Docker image:

.. code-block:: bash

   python3 /opt/cocoa/utils/cocoa_mesh_tools.py from_adcirc \
       --mesh fort.14 --attributes fort.13 --output mesh.nc

See `Converting ADCIRC Meshes`_ above for the full set of conversion options,
and :doc:`quick_start` for configuring the YAML input file. Small example
problems are included in the container under ``/opt/cocoa/examples``.
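
For batch use under SLURM, the clean-session command can be placed in a job
script. The snippet below writes a sketch of such a script to
``cocoa_job.sbatch``. Every ``#SBATCH`` value is a placeholder, and the script
assumes a single GPU on one node, a ``config.yaml`` in the submission
directory, and ``cocoa-cuda.sif`` alongside it. Adjust these for your site.

.. code-block:: bash

   # Write a sketch job script; all #SBATCH values are placeholders.
   cat > cocoa_job.sbatch <<'EOF'
   #!/bin/bash
   #SBATCH --job-name=cocoa
   #SBATCH --nodes=1
   #SBATCH --ntasks=1
   #SBATCH --gres=gpu:1
   #SBATCH --time=01:00:00
   #SBATCH --output=cocoa_%j.log

   # "run" (not "exec") so the entrypoint applies COCOA_ARCH.
   singularity run -c --cleanenv --nv \
       --env COCOA_ARCH=ampere80 \
       --bind "$PWD:/workspace" --pwd /workspace \
       cocoa-cuda.sif cocoa -i config.yaml
   EOF

Submit it with ``sbatch cocoa_job.sbatch``. Clusters differ in how GPUs are
requested (``--gres``, ``--gpus``, partitions, accounts), so check your site's
documentation for the right directives.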

Building the Container
----------------------

Each family is built in two stages: first the Trilinos base image, then the
Cocoa image on top of it. Within each image directory, the three families share
a single build context and differ only in the Dockerfile used
(``Dockerfile.cpu``, ``Dockerfile.cuda``, ``Dockerfile.rocm``), so the build
scripts and entrypoints are not duplicated. Substitute the family suffix
throughout.

**1. Build the Trilinos base image** (example: CUDA):

.. code-block:: bash

   cd containers/base_trilinos_container
   docker build -f Dockerfile.cuda -t cocoaorg/trilinos_base:latest-cuda .

**2. Build the Cocoa image:**

.. code-block:: bash

   cd containers/cocoa_container
   DOCKER_BUILDKIT=1 docker build --ssh default \
       -f Dockerfile.cuda -t cocoaorg/cocoa:latest-cuda .

The ``--ssh default`` flag forwards your SSH agent for private repository
access during the build. Ensure your SSH agent is running with the appropriate
key loaded (``ssh-add``).

The ``-cpu`` family is multi-architecture. Build and push both platform
variants under one tag with ``buildx``:

.. code-block:: bash

   cd containers/cocoa_container
   docker buildx build --platform linux/amd64,linux/arm64 \
       -f Dockerfile.cpu -t cocoaorg/cocoa:latest-cpu --push .
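
The two GPU build stages differ between families only in the suffix, so the
substitution can be scripted. The sketch below prints the commands for review
rather than running them. ``print_build_steps`` is a hypothetical helper, and
it assumes the ``containers/`` directory sits under the directory the commands
are run from.

.. code-block:: bash

   # Hypothetical helper: print both build stages for one GPU family.
   # Each stage runs in a subshell so its "cd" does not affect the next.
   print_build_steps() {
       local family="$1"
       case "$family" in
           cuda|rocm) ;;
           *) echo "expected cuda or rocm (the -cpu family uses buildx)" >&2
              return 1 ;;
       esac
       printf '%s\n' \
           "(cd containers/base_trilinos_container && docker build -f Dockerfile.$family -t cocoaorg/trilinos_base:latest-$family .)" \
           "(cd containers/cocoa_container && DOCKER_BUILDKIT=1 docker build --ssh default -f Dockerfile.$family -t cocoaorg/cocoa:latest-$family .)"
   }

After checking the output of ``print_build_steps rocm``, the commands can be
executed with ``print_build_steps rocm | bash``.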

On a cluster the build is driven by the SLURM batch scripts in
``containers/slurm/``. ``submit_all.sh`` queues every family, making each Cocoa
image depend on its Trilinos base; see those scripts for the exact ``buildx``
invocation and push steps.

.. note::

   Building a GPU family compiles Trilinos and Cocoa once per architecture in
   that family, which is resource-intensive and may take several hours. The
   AMD (``-rocm``) images are currently validated by compilation; runtime
   validation on AMD hardware is ongoing. The ARM64 half of the ``-cpu`` build
   runs natively on an ARM64 builder or, more slowly, under ``qemu`` emulation.