Architecture
This document describes the software architecture of Cocoa.
High-Level Overview
Cocoa follows a modular architecture with clear separation of concerns, built on the Kokkos [Edwards2014] [Trott2022] performance portability framework within the Trilinos [Heroux2005] ecosystem:
Fig. 31 High-level architecture showing the simulation pipeline
Source Library Dependencies
The source libraries have a clear dependency hierarchy:
Fig. 32 Library dependencies (solid = internal, dashed = external)
Data Container Hierarchy
ModelFields is the centralized data container passed through the simulation:
Fig. 33 ModelFields data container hierarchy
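A minimal sketch of the aggregation pattern, assuming plain `std::vector` storage in place of Kokkos views. The sub-container names mirror the files under `data/`, but every member (`elevation`, `depth`, `node_wet`, ...) and the `update_wet_dry` stage are hypothetical illustrations, not Cocoa's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only: std::vector stands in for Kokkos views, and every member
// name below is hypothetical rather than taken from Cocoa's headers.
struct HydrodynamicState {
    std::vector<double> elevation;   // free-surface elevation per local node
    std::vector<double> velocity_u;  // depth-averaged velocity components
    std::vector<double> velocity_v;
};

struct PhysicsData {
    std::vector<double> depth;       // still-water depth per local node
};

struct WetDryData {
    std::vector<int> node_wet;       // 1 = wet, 0 = dry
};

// One container owns all per-node state; stages receive it by reference,
// so they all share a single copy of each array.
struct ModelFields {
    ModelFields(std::size_t num_local_nodes, double initial_depth)
        : hydro{std::vector<double>(num_local_nodes, 0.0),
                std::vector<double>(num_local_nodes, 0.0),
                std::vector<double>(num_local_nodes, 0.0)},
          physics{std::vector<double>(num_local_nodes, initial_depth)},
          wetdry{std::vector<int>(num_local_nodes, 1)} {}

    HydrodynamicState hydro;
    PhysicsData physics;
    WetDryData wetdry;
};

// Example stage (hypothetical): marks a node dry when its total water
// depth (elevation + still-water depth) drops below min_depth.
void update_wet_dry(ModelFields& fields, double min_depth) {
    const auto& eta = fields.hydro.elevation;
    const auto& h = fields.physics.depth;
    assert(eta.size() == h.size());
    for (std::size_t i = 0; i < eta.size(); ++i) {
        fields.wetdry.node_wet[i] = (eta[i] + h[i] >= min_depth) ? 1 : 0;
    }
}
```

A stage such as `update_wet_dry(fields, 0.1)` reads and writes through the same `ModelFields` object, so adding a field does not change the signature of every stage that passes the container along.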
Directory Structure
src/
├── cocoa/ # Main application
│ ├── cocoa.cpp # Entry point
│ └── CommandLineArgs.hpp # CLI argument parsing
│
├── cocoa_datetime/ # Date/time library
│ ├── DateTime.hpp # Date/time representation
│ └── TimeDelta.hpp # Time duration
│
├── cocoa_compute_types/ # Small math value types
│ ├── Vec2.hpp # 2D vector
│ ├── Mat2.hpp # 2x2 matrix
│ ├── Mat3.hpp # 3x3 matrix
│ └── MathUtils.hpp # Math helpers
│
├── cocoa_constants/ # Shared constants and scalar/index types
│ ├── Constants.hpp # Physical constants
│ ├── Defaults.hpp # Default parameter values
│ ├── Thresholds.hpp # Centralized numerical thresholds
│ └── Ordinals.hpp # Scalar/index types
│
├── cocoa_io/ # I/O utilities library
│ ├── ConfigurationReader.hpp # YAML config parsing
│ ├── Logger.hpp # Logging facade
│ ├── NetcdfReader.hpp # NetCDF input
│ ├── NetcdfWriter.hpp # NetCDF output
│ └── NetcdfCommon.hpp # Shared NetCDF types
│
├── cocoa_meteo/ # Meteorological I/O library
│ ├── core/ # Base classes and data types
│ │ ├── MeteoReaderBase.hpp # Abstract reader interface
│ │ ├── MeteoReaderConcept.hpp # C++20 concept for readers
│ │ ├── MeteoReaderConfig.hpp # Format-agnostic configuration
│ │ ├── MeteoFormat.hpp # Format enum
│ │ ├── MeteoGrid.hpp # Regular/irregular grid types
│ │ ├── MeteoField.hpp # 2D field container
│ │ ├── MeteoFieldSnapshot.hpp # Time snapshot bundle
│ │ └── CfTimeUtils.hpp # CF time unit parsing
│ ├── readers/ # Format-specific readers
│ │ ├── cf/CfNetcdfReader.hpp
│ │ ├── owi_ascii/OwiAsciiReader.hpp
│ │ └── owi_netcdf/OwiNetcdfReader.hpp
│ ├── interpolation/ # Spatial interpolation
│ │ ├── MeteoSpatialInterpolator.hpp
│ │ ├── MultiDomainInterpolator.hpp
│ │ ├── FieldInterpolation.hpp
│ │ └── InterpolationWeight.hpp
│ └── MeteoReaderFactory.hpp # Reader instantiation
│
└── cocoa_kernel/ # Core computational library
├── core/ # Domain-meaningful primitives
│ ├── Execution.hpp # Execution space selection
│ ├── KokkosProfileRegion.hpp # Performance profiling
│ ├── ModelConfiguration.hpp # Configuration struct
│ ├── NarrowCast.hpp # Checked numeric casts
│ ├── config/ # Per-domain config sub-structs
│ │ ├── FlowBoundaryConfig.hpp
│ │ ├── ForcingConfig.hpp
│ │ ├── GwceConfig.hpp
│ │ ├── HotStartConfig.hpp
│ │ ├── MeshConfig.hpp
│ │ ├── ModelConfigurationFactory.hpp # YAML -> config (ex-io/)
│ │ ├── OutputConfig.hpp
│ │ ├── PhysicsConfig.hpp
│ │ ├── SimulationConfig.hpp
│ │ └── TidalConfig.hpp
│ └── types/ # Generic infrastructure types
│ ├── KokkosAliases.hpp # View type aliases
│ ├── LinearAlgebraTypes.hpp # Trilinos type aliases
│ ├── Precision.hpp # Float/double storage precision
│ ├── RingBuffer.hpp # Ring buffer type
│ └── TemporalField.hpp # Multi-level temporal storage
│
├── data/ # Field data structures
│ ├── ModelFields.hpp # Master data container
│ ├── HydrodynamicState.hpp # Elevation/velocity state
│ ├── PhysicsData.hpp # Physics parameters
│ └── WetDryData.hpp # Wet/dry algorithm state
│
├── geometry/ # Mesh and geometry
│ ├── Mesh.hpp # Top-level mesh: HostMesh + DeviceMesh + dist context
│ ├── HostMesh.hpp # Host-side topology (nodes, elements, boundaries)
│ ├── DeviceMesh.hpp # Device-side FE cache (gradients, areas, CSR)
│ ├── Element.hpp # Triangle element
│ ├── Node.hpp # Mesh node
│ ├── Point.hpp # 2D/3D point
│ ├── NeighborTable.hpp # Mesh connectivity
│ ├── BasisFunctions.hpp # FE shape functions
│ ├── GlobalMeshHandler.hpp # Global mesh operations
│ ├── CoordinateRotation.hpp # Coordinate rotation
│ ├── RotationData.hpp # Rotation matrices
│ ├── VelocityTransform.hpp # Velocity transformations
│ ├── ProjectionScaleFactor.hpp # Map projection scaling
│ ├── ProjectionTransformer.hpp # Coordinate transforms
│ ├── NodalAttribute.hpp # Nodal attribute type
│ ├── NodalAttributeData.hpp # Attribute storage
│ ├── NodalAttributeRegistry.hpp # Attribute registry
│ ├── partition/ # Mesh partitioning (ex-top-level)
│ │ ├── MeshPartitioner.hpp # Zoltan2 partitioning
│ │ ├── PartitionCache.hpp # Cached partitions
│ │ └── PartitionInfo.hpp # Partition metadata
│ └── boundaries/ # Boundary data structures
│ ├── BoundaryData.hpp
│ ├── BoundaryRawData.hpp
│ ├── BoundaryType.hpp
│ └── BoundaryView.hpp
│
├── simulation/ # Simulation control
│ ├── Simulation.hpp # Main simulation driver
│ ├── TimeStepper.hpp # Time stepping logic
│ ├── TimestepLogger.hpp # Per-step status logging (ex-io/)
│ └── Diagnostics.hpp # Solution monitoring
│
├── numeric/ # Numerical algorithms
│ ├── continuity/ # GWCE solver
│ │ ├── GwceSolver.hpp # Solver interface
│ │ ├── GwceVectorAssembler.hpp # RHS assembly
│ │ ├── GwceVectorAssemblyKernels.hpp
│ │ ├── GwcePreprocessingKernels.hpp
│ │ ├── GwceCommonKernels.hpp # Shared GWCE kernels
│ │ ├── OpenBoundaryCoefficients.hpp
│ │ ├── NodePositionSorter.hpp
│ │ │
│ │ ├── consistent/ # Implicit solver
│ │ │ ├── GwceSolverConsistent.hpp
│ │ │ ├── GwceMatrixAssemblerConsistent.hpp
│ │ │ ├── GwceMatrixAssemblyConsistentKernels.hpp
│ │ │ ├── ConjugateGradientSolver.hpp
│ │ │ └── JacobiPreconditioner.hpp
│ │ │
│ │ └── lumped/ # Explicit solver
│ │ ├── GwceSolverLumped.hpp
│ │ ├── GwceMatrixAssemblerLumped.hpp
│ │ ├── GwceMatrixAssemblyLumpedKernels.hpp
│ │ └── GwceSolverLumpedKernels.hpp
│ │
│ ├── momentum/ # Momentum solver
│ │ ├── MomentumSolver.hpp
│ │ ├── MomentumSolveKernels.hpp
│ │ └── MomentumRhsKernels.hpp
│ │
│ └── wetdry/ # Wet/dry algorithm
│ ├── WetDry.hpp
│ └── WetDryKernels.hpp
│
├── forcing/ # Boundary and body forcing
│ ├── ForcingManager.hpp # Forcing orchestration
│ ├── RampFunction.hpp # Time ramping
│ ├── meteorological/ # Meteorological forcing
│ │ ├── MeteoForcingProvider.hpp # Ring buffer + interpolation
│ │ ├── MeteoForcingConfig.hpp # Configuration
│ │ ├── MetSnapshot.hpp # Device-side snapshot slot
│ │ └── DragLaw.hpp # Wind drag formulations
│ └── tide/ # Tidal forcing
│ ├── boundary/ # Tidal boundary conditions
│ │ ├── TideBoundaryForcing.hpp
│ │ └── TideBoundaryConstituent.hpp
│ └── potential/ # Tide potential forcing
│ ├── TidePotentialInterface.hpp
│ ├── astronomical/ # Astronomical tide potential
│ │ ├── AstronomicalTidePotential.hpp
│ │ ├── AstronomicalTidePotentialAdapter.hpp
│ │ ├── AstronomicParameters.hpp
│ │ ├── AstronomicConstants.hpp
│ │ ├── MoonPosition.hpp
│ │ ├── SunPosition.hpp
│ │ ├── MoonSunPositionCalculator.hpp
│ │ └── SiderealTime.hpp
│ └── harmonics/ # Harmonic tide potential
│ ├── TidePotentialHarmonics.hpp
│ └── TidePotentialConstituent.hpp
│
├── physics/ # Physics kernels
│ ├── BottomFriction.hpp # Manning's friction
│ ├── LateralStress.hpp # Lateral stress tensor
│ ├── InlineLateralStress.hpp # Inline stress computation
│ ├── PressureGradient.hpp # Pressure gradient terms
│ ├── NonConservativeAdvection.hpp # Advection terms
│ └── Smagorinsky.hpp # Smagorinsky turbulence
│
├── distributed/ # MPI distributed computing
│ ├── DistributedContext.hpp # MPI context management
│ ├── DistributedConfig.hpp # Configuration
│ ├── CommunicationManager.hpp # Communication orchestration
│ ├── GhostExchange.hpp # Ghost node exchange
│ └── MapFactory.hpp # Tpetra map creation
│
└── io/ # Kernel I/O (split by direction)
├── async/ # Background writer thread
│ ├── OutputWriterThread.hpp # Bounded writer thread pool
│ └── OutputSnapshot.hpp # Recycled write buffers
├── checkpoint/ # Restart/hot-start
│ ├── HotStartReader.hpp
│ └── HotStartWriter.hpp
├── input/ # Read-side
│ ├── MeshReader.hpp
│ ├── MeshGather.hpp
│ ├── NodalAttributeReader.hpp
│ ├── NodalAttributeInitializer.hpp
│ └── SalDataReader.hpp
└── output/ # Write-side
├── OutputManager.hpp
├── OutputFile.hpp
├── OutputVariable.hpp
├── OutputVariableRegistry.hpp
├── NetcdfFileSetup.hpp # Shared UGRID file setup
├── DistributedIO.hpp # Parallel field gather (ex-distributed/)
└── DistributedPartitionWriter.hpp # Partition output (ex-distributed/)
Namespace Organization
Fig. 34 Namespace hierarchy within the Cocoa project
Meteorological Forcing Pipeline
The meteorological forcing subsystem moves atmospheric data from files on
disk through a staged pipeline into device-resident Kokkos views consumed by
the GWCE and momentum kernels. The cocoa_meteo library handles file
reading and spatial interpolation on the host; the cocoa_kernel forcing
subsystem manages temporal buffering and device transfer.
Fig. 35 Meteorological forcing data pipeline
Pipeline Stages
File Read (host). A format-specific reader (CfNetcdfReader, OwiAsciiReader, or OwiNetcdfReader) reads one snapshot from disk and returns a MeteoFieldSnapshot containing the pressure, wind_u, and wind_v fields on the meteorological grid.
Spatial Interpolation (host). The MultiDomainInterpolator interpolates from the meteorological grid(s) to the local mesh nodes using precomputed bilinear weights. For multi-domain formats, the innermost valid domain takes priority. The result is three host-side arrays (pressure, wind_u, wind_v) of length num_local_nodes.
Host-to-Device Transfer. The interpolated node arrays are uploaded into a device-resident MetSnapshot slot via Kokkos::deep_copy. Each slot holds three Scalar1DView Kokkos views.
Temporal Interpolation (device). At each simulation timestep, the GWCE and momentum kernels read two adjacent snapshots from the ring buffer and blend them using the interpolation weight \(\alpha\) (see Temporal Interpolation).
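The spatial step can be sketched as a one-time stencil search followed by a cheap per-snapshot weighted sum. This sketch assumes each domain is a regular, row-major grid and that the domain list is ordered innermost first; `GridDomain`, `BilinearWeight`, `select_stencil`, and `apply_stencil` are hypothetical names, not the `MultiDomainInterpolator` API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical regular grid: origin (x0, y0), spacing (dx, dy), and
// nx * ny values stored row-major (flat index = j * nx + i).
struct GridDomain {
    double x0, y0, dx, dy;
    std::size_t nx, ny;
};

// Precomputed stencil for one mesh node: four grid indices and weights.
struct BilinearWeight {
    std::array<std::size_t, 4> index;
    std::array<double, 4> weight;
};

// Stencil plus the index of the domain it belongs to.
struct NodeStencil {
    std::size_t domain;
    BilinearWeight w;
};

// Builds the stencil for point (x, y); returns nullopt when the point lies
// outside this grid so the caller can try the next (outer) domain.
std::optional<BilinearWeight> make_weight(const GridDomain& g, double x, double y) {
    assert(g.nx >= 2 && g.ny >= 2);
    const double fx = (x - g.x0) / g.dx;
    const double fy = (y - g.y0) / g.dy;
    if (fx < 0.0 || fy < 0.0 || fx > double(g.nx - 1) || fy > double(g.ny - 1)) {
        return std::nullopt;
    }
    // Clamp so points on the last row/column still use a valid cell.
    const std::size_t i = std::min(static_cast<std::size_t>(fx), g.nx - 2);
    const std::size_t j = std::min(static_cast<std::size_t>(fy), g.ny - 2);
    const double tx = fx - double(i);
    const double ty = fy - double(j);
    BilinearWeight w{};
    w.index = {j * g.nx + i, j * g.nx + i + 1, (j + 1) * g.nx + i, (j + 1) * g.nx + i + 1};
    w.weight = {(1 - tx) * (1 - ty), tx * (1 - ty), (1 - tx) * ty, tx * ty};
    return w;
}

// Domains are ordered innermost first, so the first domain containing the node wins.
std::optional<NodeStencil> select_stencil(const std::vector<GridDomain>& domains,
                                          double x, double y) {
    for (std::size_t d = 0; d < domains.size(); ++d) {
        if (auto w = make_weight(domains[d], x, y)) return NodeStencil{d, *w};
    }
    return std::nullopt;
}

// Per-snapshot work: a four-term weighted sum, no searching.
double apply_stencil(const BilinearWeight& w, const std::vector<double>& field) {
    double v = 0.0;
    for (std::size_t k = 0; k < 4; ++k) v += w.weight[k] * field[w.index[k]];
    return v;
}
```

Because the stencils depend only on the mesh and the meteorological grids, they can be computed once at initialization; only `apply_stencil` runs for each new snapshot.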
Ring Buffer
The MeteoForcingProvider maintains a fixed-size circular buffer of 8
MetSnapshot slots on the device. Each slot holds three Kokkos views
(pressure, wind_u, wind_v) with one value per local mesh node. The slots are
allocated once at initialization and reused throughout the simulation.
At any given simulation time, two adjacent slots form the active bracket: the snapshots immediately before and after the current time. The temporal interpolation weight \(\alpha\) selects the blend between them. The remaining slots are pre-filled with upcoming snapshots so that advancing the bracket never requires a synchronous file read followed by a device transfer.
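A host-side sketch of the blend, assuming \(\alpha\) is a linear weight in time, \(\alpha = (t - t_0)/(t_1 - t_0)\), between the bracketing snapshot times \(t_0\) and \(t_1\). The definition Cocoa actually uses is the one in the Temporal Interpolation section; `interpolation_weight` and `blend` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed definition: alpha = (t - t0) / (t1 - t0), so alpha = 0 at the
// earlier snapshot and alpha = 1 at the later one.
double interpolation_weight(double t, double t0, double t1) {
    assert(t1 > t0 && t >= t0 && t <= t1);
    return (t - t0) / (t1 - t0);
}

// Per-node blend of the two bracketing snapshots. In Cocoa this happens on
// the device inside the GWCE and momentum kernels; here it is a host loop.
void blend(const std::vector<double>& before, const std::vector<double>& after,
           double alpha, std::vector<double>& out) {
    assert(before.size() == after.size());
    out.resize(before.size());
    for (std::size_t i = 0; i < before.size(); ++i) {
        out[i] = (1.0 - alpha) * before[i] + alpha * after[i];
    }
}
```

For hourly snapshots at \(t_0 = 0\) s and \(t_1 = 3600\) s, a model time of 900 s gives \(\alpha = 0.25\), so each node value is 75% of the earlier snapshot plus 25% of the later one.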
When the simulation time passes the end of the current bracket:
The oldest slot is recycled (the ring buffer pops its front).
The next unread snapshot is read from disk, spatially interpolated, and uploaded into the newly available slot.
The bracket indices advance by one.
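The rotation steps above can be sketched as a fixed array of 8 slots with a moving head index. `Slot`, `Loader`, and `MetRingBuffer` are hypothetical names; the loader callback stands in for the file read, spatial interpolation, and `Kokkos::deep_copy` upload, and for simplicity it refills the freed slot synchronously.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// One snapshot slot; in Cocoa this would hold device views, here a host vector.
struct Slot {
    double time = 0.0;
    std::vector<double> pressure;  // one value per local node
};

// Hypothetical stand-in for read + spatial interpolation + upload:
// writes snapshot number k into `slot`, reusing the slot's storage.
using Loader = std::function<void(std::size_t k, Slot& slot)>;

class MetRingBuffer {
public:
    static constexpr std::size_t kSlots = 8;

    MetRingBuffer(std::size_t num_nodes, Loader load) : load_(std::move(load)) {
        assert(load_);
        for (auto& s : slots_) s.pressure.assign(num_nodes, 0.0);      // allocate once
        for (std::size_t k = 0; k < kSlots; ++k) load_(k, slots_[k]);  // pre-fill
        next_snapshot_ = kSlots;
    }

    // The active bracket: the oldest slot and the one after it.
    const Slot& before() const { return slots_[head_]; }
    const Slot& after() const { return slots_[(head_ + 1) % kSlots]; }

    // Advances the bracket until time t falls inside it.
    void advance_to(double t) {
        while (t > after().time) {
            load_(next_snapshot_++, slots_[head_]);  // recycle the oldest slot as the newest
            head_ = (head_ + 1) % kSlots;            // bracket moves forward by one
        }
    }

private:
    std::array<Slot, kSlots> slots_{};
    Loader load_;
    std::size_t head_ = 0;
    std::size_t next_snapshot_ = 0;
};
```

The recycled slot becomes the newest entry, so its contents are not read until the bracket has advanced six more times; that slack is what lets the refill run ahead of the kernels that consume the bracket.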
This design provides two benefits:
Amortized transfer cost. Because upcoming snapshots are pre-loaded into free slots, the GPU is never stalled waiting for a single deep_copy at bracket boundaries.
No redundant work. Once a snapshot is spatially interpolated and uploaded, it remains device-resident until the buffer rotates past it.
Meteorological Input Distribution
Rank 0 reads the meteorological files and broadcasts the spatially
interpolated node arrays to all compute ranks via MPI_Bcast, pre-reading
the next snapshot on a background task so file access overlaps computation.
Each compute rank maintains its own ring buffer of device-side data for its
local mesh partition.
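The overlap between file access and computation on rank 0 can be sketched with `std::async`: while snapshot \(k\) is being consumed, snapshot \(k+1\) is already being read on another thread. `read_snapshot` and `consume_all` are hypothetical, and the `MPI_Bcast` step is left as a comment so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for reading and spatially interpolating snapshot k.
std::vector<double> read_snapshot(std::size_t k, std::size_t num_nodes) {
    return std::vector<double>(num_nodes, static_cast<double>(k));
}

// Rank-0 loop: snapshot k is consumed while snapshot k + 1 is read
// on a background thread, so file access overlaps the work on k.
double consume_all(std::size_t num_snapshots, std::size_t num_nodes) {
    double checksum = 0.0;
    if (num_snapshots == 0) return checksum;
    auto pending = std::async(std::launch::async, read_snapshot, std::size_t{0}, num_nodes);
    for (std::size_t k = 0; k < num_snapshots; ++k) {
        std::vector<double> current = pending.get();  // blocks only if the read is still running
        assert(current.size() == num_nodes);
        if (k + 1 < num_snapshots) {
            // Start the next read before working on the current snapshot.
            pending = std::async(std::launch::async, read_snapshot, k + 1, num_nodes);
        }
        // Placeholder for MPI_Bcast to the compute ranks and the per-step work.
        for (double v : current) checksum += v;
    }
    return checksum;
}
```

With this structure the wait in `pending.get()` is only as long as the part of the read that did not fit behind the previous snapshot's work.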