Benchmarking
Cocoa includes a Google Benchmark suite for tracking performance regressions in individual computational components. Unlike the end-to-end simulation benchmarks in Performance, these micro-benchmarks isolate specific kernels and solver stages to identify the source of any regression.
Building
Benchmarks are gated behind the cocoa_ENABLE_BENCHMARKS CMake option:
cmake .. -Dcocoa_ENABLE_BENCHMARKS=ON
make -j8 cocoa_benchmarks
This produces a single executable at bench/cocoa_benchmarks, relative to the build directory.
Running Benchmarks
# Run all benchmarks
./bench/cocoa_benchmarks
# Filter by component name (regex)
./bench/cocoa_benchmarks --benchmark_filter="GwceLumped"
./bench/cocoa_benchmarks --benchmark_filter="MomentumSolver"
./bench/cocoa_benchmarks --benchmark_filter="WetDry"
./bench/cocoa_benchmarks --benchmark_filter="Friction"
# Filter by mesh size (the Args parameter in the benchmark name)
./bench/cocoa_benchmarks --benchmark_filter=".*2000$" # ~2k nodes
./bench/cocoa_benchmarks --benchmark_filter=".*20000$" # ~20k nodes
./bench/cocoa_benchmarks --benchmark_filter=".*200000$" # ~200k nodes
./bench/cocoa_benchmarks --benchmark_filter=".*2000000$" # ~2M nodes
./bench/cocoa_benchmarks --benchmark_filter=".*20000000$" # ~20M nodes
# Combine component and mesh size filters
./bench/cocoa_benchmarks --benchmark_filter="GwceLumped.*20000$"
# Control minimum measurement time (default is 0.5s); older Google Benchmark releases take a bare number of seconds without the s suffix
./bench/cocoa_benchmarks --benchmark_min_time=5s
# JSON output for regression tracking or CI
./bench/cocoa_benchmarks --benchmark_out=results.json --benchmark_out_format=json
# List available benchmarks without running them
./bench/cocoa_benchmarks --benchmark_list_tests
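The filter patterns above are matched anywhere in the benchmark name, which is why the mesh-size filters anchor on the end of the name with `$`. The following Python sketch illustrates which names a pattern would select. The benchmark names are hypothetical placeholders in the Fixture/Name/Arg form (the real ones come from --benchmark_list_tests), and Python's `re.search` is used only as an approximation of the regex engine Google Benchmark is built with.

```python
import re

# Hypothetical benchmark names, for illustration only.
# Use --benchmark_list_tests to see the names actually registered.
NAMES = [
    "GwceLumped/Assemble/2000",
    "GwceLumped/Assemble/20000",
    "MomentumSolver/Full/2000",
    "MomentumSolver/Full/200000",
]


def select(pattern, names=NAMES):
    """Return the names that contain a match for pattern (search, not full match)."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


if __name__ == "__main__":
    for pattern in ("GwceLumped", ".*2000$", "GwceLumped.*20000$"):
        print(pattern, "->", select(pattern))
```

Because the size patterns end in `$`, `.*2000$` selects only the ~2k-node variants and does not also pick up names ending in 20000 or 200000.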
Available Benchmarks
Each benchmark is registered at five mesh sizes: 2000 (~2k nodes),
20000 (~20k nodes), 200000 (~200k nodes), 2000000 (~2M nodes),
and 20000000 (~20M nodes). The mesh size appears as the Args parameter
in the benchmark name.
| Benchmark | Component | Counters |
|---|---|---|
| | GWCE RHS vector assembly (element scatter + gather) | elements/s |
| | GWCE lumped (diagonal) matrix assembly | elements/s |
| | GWCE lumped diagonal solve phase | DOF/s |
| | GWCE consistent sparse matrix assembly | elements/s |
| | CG solver (Belos) | DOF/s |
| | CG solver (Tpetra single reduction) | DOF/s |
| | CG solver (Tpetra pipelined) | DOF/s |
| | Momentum element-parallel RHS scatter + normalize | elements/s |
| | Momentum per-node: normalize, wind stress, velocity contribution | DOF/s |
| | Momentum per-node 2x2 Cramer’s rule solve + land BC | DOF/s |
| | Full momentum solve (element + nodal assembly + 2x2 + flux) | DOF/s |
| | Manning bottom friction kernel | nodes/s |
| | Full time step (friction + GWCE + momentum + wet/dry) | DOF/s |
| | Wet/dry algorithm with partially wet domain | elements/s |
Scaling Analysis
The bench/analyze_scaling.py script classifies benchmarks as
linear-scaling or cache/memory-bandwidth limited based on how throughput
changes across mesh sizes.
# Run benchmarks with JSON output
./bench/cocoa_benchmarks --benchmark_out=results.json --benchmark_out_format=json
# Print scaling table
python3 bench/analyze_scaling.py results.json
# Print table + save scaling plot
python3 bench/analyze_scaling.py results.json --plot scaling.png
The table shows throughput at each mesh size, together with a classification and an efficiency ratio. The plot has two panels: absolute throughput (log-log) and normalized scaling efficiency, with threshold lines at 85% (linear) and 50% (moderate falloff).
Classification criteria, based on the ratio of throughput at the largest size to peak throughput:
linear (>= 85%): Throughput stays near its peak as the mesh grows, so run time grows linearly with problem size
moderate falloff (>= 50% and < 85%): Some cache/bandwidth pressure
cache/BW limited (< 50%): Throughput degrades significantly at large sizes
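The thresholds above can be expressed as a small function. This is a minimal sketch of the stated rule, not the code in bench/analyze_scaling.py; the function name `classify_scaling` and the dictionary input format are assumptions made for illustration.

```python
def classify_scaling(throughput_by_size):
    """Classify one benchmark from its throughput at each mesh size.

    throughput_by_size maps mesh size (node count) to throughput
    (e.g. elements/s). Returns (label, ratio), where ratio is the
    throughput at the largest size divided by the peak throughput.
    """
    if not throughput_by_size:
        raise ValueError("need at least one measurement")
    largest_size = max(throughput_by_size)
    peak = max(throughput_by_size.values())
    ratio = throughput_by_size[largest_size] / peak
    if ratio >= 0.85:
        label = "linear"
    elif ratio >= 0.50:
        label = "moderate falloff"
    else:
        label = "cache/BW limited"
    return label, ratio


if __name__ == "__main__":
    # Made-up throughput numbers, for illustration only.
    sample = {2000: 1.0e8, 20000: 1.1e8, 200000: 9.0e7, 2000000: 6.0e7}
    print(classify_scaling(sample))
```

A kernel whose throughput peaks at a small mesh and then drops once the working set no longer fits in cache ends up in one of the two lower classes, depending on how far it falls.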
Benchmark Mesh Generation
Benchmarks use programmatically generated rectangular channel meshes, requiring
no file I/O or external dependencies. The BenchMeshFactory in
bench/BenchMeshFactory.hpp creates triangulated rectangular grids by
splitting each quad cell into two triangles with alternating diagonals.
Grid dimensions are computed from the target node count with approximately 2:1
aspect ratio. The number of nodes is (nx+1) * (ny+1) and the number of
elements is nx * ny * 2.
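The node and element formulas can be checked with a short Python sketch. It is an illustration of the arithmetic only, not a port of BenchMeshFactory: the function names, the rounding used to pick nx and ny, and the checkerboard pattern chosen for the alternating diagonals are all assumptions.

```python
import math


def grid_dims(target_nodes, aspect=2.0):
    """Pick cell counts (nx, ny) so that (nx+1)*(ny+1) is close to
    target_nodes and nx is roughly aspect times ny (assumed rounding)."""
    ny = max(1, round(math.sqrt(target_nodes / aspect)) - 1)
    nx = max(1, round(target_nodes / (ny + 1)) - 1)
    return nx, ny


def mesh_counts(nx, ny):
    """Return (nodes, elements) for an nx-by-ny grid of split quads."""
    return (nx + 1) * (ny + 1), nx * ny * 2


def triangulate(nx, ny):
    """Split each quad cell into two triangles, flipping the diagonal
    in a checkerboard pattern (one possible reading of 'alternating')."""
    def node(i, j):
        return j * (nx + 1) + i  # row-major node numbering

    triangles = []
    for j in range(ny):
        for i in range(nx):
            a, b = node(i, j), node(i + 1, j)
            c, d = node(i + 1, j + 1), node(i, j + 1)
            if (i + j) % 2 == 0:
                triangles += [(a, b, c), (a, c, d)]  # diagonal a-c
            else:
                triangles += [(a, b, d), (b, c, d)]  # diagonal b-d
    return triangles


if __name__ == "__main__":
    nx, ny = grid_dims(20000)
    print(nx, ny, mesh_counts(nx, ny))
```

For a target of 20000 nodes this sketch picks a 199 x 99 cell grid, which gives exactly 200 * 100 = 20000 nodes and 199 * 99 * 2 = 39402 elements. Other targets land close to, but not exactly on, the requested node count.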
Mesh properties:
Coordinates: Gulf of Mexico region (approximately 89°W, 29°N) with Mercator projection
Bathymetry: Uniform flat bottom at 10 m depth (the wet/dry benchmark overrides this with a sloped plane to create a partially wet domain)
Boundaries: All four edges are land boundaries (no open or flow boundaries)
Caching: Meshes are built once per size and cached for the duration of the process, so all benchmarks sharing the same mesh size reuse the same data
Regression Tracking
Use JSON output to compare results across commits:
# Before changes
./bench/cocoa_benchmarks --benchmark_out=before.json --benchmark_out_format=json
# After changes
./bench/cocoa_benchmarks --benchmark_out=after.json --benchmark_out_format=json
# Compare (requires google-benchmark's compare.py tool)
python3 <benchmark-src>/tools/compare.py benchmarks before.json after.json
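Where compare.py is not available, the two JSON files can also be compared with a few lines of Python. The sketch below relies only on the `benchmarks` list and its `name`, `real_time`, and `run_type` fields as written by Google Benchmark's JSON reporter. The helper names and the 5% threshold are assumptions, and it assumes both files report times in the same unit.

```python
import json


def times_by_name(report):
    """Map benchmark name -> real_time, skipping aggregate rows.

    If a name occurs more than once (e.g. with repetitions), the last
    entry wins; this sketch does not average repetitions.
    """
    return {
        entry["name"]: entry["real_time"]
        for entry in report["benchmarks"]
        if entry.get("run_type", "iteration") == "iteration"
    }


def relative_changes(before, after):
    """Return (after - before) / before for names present in both runs."""
    return {
        name: (after[name] - before[name]) / before[name]
        for name in before
        if name in after and before[name] > 0
    }


def regressions(before, after, threshold=0.05):
    """Names whose real_time grew by more than threshold (assumed 5%)."""
    changes = relative_changes(before, after)
    return sorted(name for name, change in changes.items() if change > threshold)


def load(path):
    """Read one Google Benchmark JSON output file."""
    with open(path) as handle:
        return times_by_name(json.load(handle))


if __name__ == "__main__":
    before, after = load("before.json"), load("after.json")
    for name, change in sorted(relative_changes(before, after).items()):
        print(f"{name}: {change:+.1%}")
```

Unlike compare.py, this does no statistical testing, so small changes should be confirmed with a longer --benchmark_min_time before being treated as regressions.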
Adding New Benchmarks
To add a new benchmark:
Create a new .cpp file in bench/ (e.g., BenchMyKernel.cpp)
Include BenchFixtures.hpp for the base fixture
Define a fixture class inheriting from CocoaBench
Use the BENCHMARK_DEFINE_F and BENCHMARK_REGISTER_F macros
Add the file to bench/CMakeLists.txt
Example:
#include <benchmark/benchmark.h>
#include "BenchFixtures.hpp"
namespace Cocoa::Bench {

class MyBench : public CocoaBench {};

BENCHMARK_DEFINE_F(MyBench, MyKernel)(benchmark::State& state) {
    for (auto _ : state) {
        // Call the kernel being benchmarked
        my_kernel(fields(), config());
    }
    set_element_counters(state);
}

BENCHMARK_REGISTER_F(MyBench, MyKernel)
    ->Apply(apply_mesh_sizes)
    ->Unit(benchmark::kMillisecond);

}  // namespace Cocoa::Bench
If your benchmark needs custom setup (e.g., specialized solver configuration or
modified bathymetry), override SetUp in your fixture class. Remember to add
the using declarations to avoid hiding the base class overloads:
class MyBench : public CocoaBench {
public:
    using CocoaBench::SetUp;
    using CocoaBench::TearDown;

    void SetUp(const benchmark::State& state) override {
        CocoaBench::SetUp(state);
        // Custom setup here
    }
};
Source Organization
bench/
+-- CMakeLists.txt Build rules
+-- BenchMain.cpp Custom main (Tpetra + Google Benchmark init)
+-- BenchMeshFactory.hpp Programmatic mesh generation
+-- BenchFixtures.hpp Shared base fixture (mesh/fields/config pipeline)
+-- BenchGwceLhsAssembly.cpp GWCE RHS/matrix assembly and lumped solve
+-- BenchGwceSolver.cpp CG linear solver variants
+-- BenchMomentumSolver.cpp Momentum solver phases (element, nodal, 2x2, full)
+-- BenchWetDry.cpp Wet/dry algorithm
+-- BenchPhysics.cpp Friction kernels
+-- BenchTimeStep.cpp Full time step
+-- analyze_scaling.py Scaling analysis and plotting script