Trilinos Integration

Cocoa uses Trilinos for distributed linear algebra and iterative solvers. This document describes the specific type aliases and usage patterns.

Trilinos Type Aliases

Cocoa defines Trilinos types in core/types/LinearAlgebraTypes.hpp to ensure consistency with the Kokkos execution space:

LinearAlgebraTypes

namespace Cocoa::Types {

struct LinearAlgebraTypes {
  // Node type wraps Kokkos execution space for Tpetra
  using ExecutionNode =
      Tpetra::KokkosCompat::KokkosDeviceWrapperNode<CocoaExecutionSpace>;

  // Core Tpetra types parameterized by ordinal types and node
  using Map = Tpetra::Map<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
  using Vector = Tpetra::Vector<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
  using MultiVector = Tpetra::MultiVector<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
  using Import = Tpetra::Import<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
  using Export = Tpetra::Export<LocalOrdinal, GlobalOrdinal, ExecutionNode>;

  // Finite element matrix types
  using FECrsGraph = Tpetra::FECrsGraph<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
  using FECrsMatrix = Tpetra::FECrsMatrix<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;

  // Operator interface for solvers
  using Operator = Tpetra::Operator<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
};

}

SolverTypes

namespace Cocoa::Types {

struct SolverTypes {
  // Belos template parameters
  using SC = Scalar;
  using MV = LinearAlgebraTypes::MultiVector;
  using OP = LinearAlgebraTypes::Operator;

  // Ifpack2 preconditioner
  using PreconditionerType = Ifpack2::Preconditioner<
      LinearAlgebraTypes::FECrsMatrix::scalar_type,
      LinearAlgebraTypes::FECrsMatrix::local_ordinal_type,
      LinearAlgebraTypes::FECrsMatrix::global_ordinal_type,
      LinearAlgebraTypes::FECrsMatrix::node_type>;

  // Belos solver types
  using LinearProblem = Belos::LinearProblem<SC, MV, OP>;
  using SolverManager = Belos::SolverManager<SC, MV, OP>;
};

}

GWCE Matrix Assembly

Cocoa uses Tpetra::FECrsMatrix for finite element assembly, which provides efficient element-wise accumulation.

Graph Construction

The sparsity pattern is built from mesh connectivity:

// Create map (one-to-one in serial)
auto map = Teuchos::rcp(new LinearAlgebraTypes::Map(
    num_nodes, 0, comm));

// Create FE graph with estimated entries per row
auto graph = Teuchos::rcp(new LinearAlgebraTypes::FECrsGraph(
    map, max_entries_per_row));

// Insert connectivity for each element
for (int elem = 0; elem < num_elements; ++elem) {
  auto nodes = get_element_nodes(elem);
  for (int i = 0; i < 3; ++i) {
    graph->insertGlobalIndices(nodes[i], nodes);
  }
}
graph->fillComplete();

Matrix Assembly Pattern

From GwceMatrixAssemblyConsistentKernels.hpp:

// Begin assembly phase
matrix->beginAssembly();

// Get local matrix for device access
auto local_matrix = matrix->getLocalMatrixDevice();

Kokkos::parallel_for("GWCE Matrix Assembly",
    Types::DefaultRangePolicy(0, num_elements),
    KOKKOS_LAMBDA(const int elem_idx) {
      // Compute elemental matrix...
      const auto vals = compute_stiffness_values(/* ... */);

      // Insert using sorted column indices
      local_matrix.sumIntoValues(
          row_node,
          sorted_col_nodes.data(),
          3,
          vals.data(),
          true,   // sorted columns
          true);  // atomic
    });

// End assembly phase
matrix->endAssembly();

Boundary Condition Application

Dirichlet BCs are applied by modifying matrix rows:

// For each BC node
Kokkos::parallel_for("Zero Dirichlet Rows",
    Policy(0, num_bc_nodes),
    KOKKOS_LAMBDA(int bc_idx) {
      auto row_view = local_matrix.row(bc_node);
      for (int j = 0; j < row_view.length; ++j) {
        if (row_view.colidx(j) == bc_node) {
          row_view.value(j) = elevation_penalty;  // Diagonal
        } else {
          row_view.value(j) = 0.0;  // Off-diagonal
        }
      }
    });

Linear Solver Configuration

Cocoa solves the GWCE system with a Jacobi-preconditioned Conjugate Gradient solver, configured in ConjugateGradientSolver.hpp. The default implementation is the Tpetra-native pipelined CG (CgSolverType::TpetraCgPipeline). Two other implementations are selectable via CgSolverType: TpetraSingleReduce (Tpetra-native single-reduce CG) and BelosCG (standard Belos CG).

Solver Configuration

The SolverConfig struct controls solver behavior:

struct SolverConfig {
  CgSolverType solver_type{CgSolverType::TpetraCgPipeline};  // CG implementation
  double tolerance{1e-5};            // Convergence tolerance
  int max_iterations{50};            // Maximum CG iterations
  int verbosity{0};                  // 0=silent, 1=summary, 2=detailed
  ResidualScaling residual_scaling{ResidualScaling::NormOfInitialResidual};  // Convergence scaling
};

Residual Scaling Options

The ResidualScaling enum controls how residual norms are scaled for convergence testing:

enum class ResidualScaling {
  NormOfInitialResidual,  // ||r|| / ||r_0|| < tol (default)
  NormOfPrecInitRes,      // ||M^{-1}r|| / ||M^{-1}r_0|| < tol
  None                    // ||r|| < tol (absolute)
};

NormOfInitialResidual is the Cocoa default, providing iteration counts comparable to ADCIRC’s ITPACK JCG solver at similar tolerance values.

Solver Setup

// Create linear problem
auto problem = Teuchos::rcp(new SolverTypes::LinearProblem(
    matrix, solution, rhs));

// Configure solver parameters
auto params = Teuchos::rcp(new Teuchos::ParameterList());
params->set("Maximum Iterations", config.max_iterations);  // Default: 50
params->set("Convergence Tolerance", config.tolerance);    // Default: 1e-5
params->set("Implicit Residual Scaling",
            "Norm of Initial Residual");                   // Default scaling

// Verbosity: base level always includes Errors and Warnings
int verbosity = Belos::Errors | Belos::Warnings;
if (config.verbosity >= 1) {
  verbosity |= Belos::FinalSummary;
}
if (config.verbosity >= 2) {
  verbosity |= Belos::IterationDetails;
  params->set("Output Frequency", 1);
}
params->set("Verbosity", verbosity);

// Create CG solver
Belos::SolverFactory<SC, MV, OP> factory;
auto solver = factory.create("CG", params);

Jacobi Preconditioner

From JacobiPreconditioner.hpp:

// Create Ifpack2 preconditioner via factory instance
Ifpack2::Factory factory;
auto prec = factory.create<MatrixType>("RELAXATION", matrix);

// Configure as Jacobi
Teuchos::ParameterList prec_params;
prec_params.set("relaxation: type", "Jacobi");
prec_params.set("relaxation: sweeps", 1);
prec_params.set("relaxation: damping factor", 1.0);
prec->setParameters(prec_params);

// Initialize sparsity pattern
prec->initialize();

// Compute values (must be called when matrix changes)
prec->compute();

// Set on linear problem
problem->setRightPrec(prec);

Solving

// Set up the problem
problem->setProblem();

// Solve
auto result = solver->solve();

if (result != Belos::Converged) {
  // Handle non-convergence
}

Vector Operations

Tpetra vectors integrate with Kokkos views for device-side access:

// Create vector
auto vec = Teuchos::rcp(new LinearAlgebraTypes::Vector(map));

// Get device view for parallel operations
auto view = vec->getLocalViewDevice(Tpetra::Access::ReadWrite);

// Use in Kokkos kernel
Kokkos::parallel_for("Set values",
    Types::DefaultRangePolicy(0, num_nodes),
    KOKKOS_LAMBDA(int i) {
      view(i, 0) = initial_value(i);
    });

Built-in Vector Operations

Tpetra provides efficient built-in operations that should be preferred over custom Kokkos kernels when applicable. These handle MPI communication automatically and are optimized for the target architecture.

Dot Product:

// Compute sum of squares (for RMS calculations)
const auto sum_sq = vec->dot(*vec);  // = sum(vec[i]^2)

// Elevation penalty = RMS of diagonal
const auto ep = std::sqrt(sum_sq / num_nodes);

Reciprocal:

// Element-wise reciprocal: result[i] = 1/input[i]
result->reciprocal(*input);

Element-wise Multiply:

// result = scalarThis * result + scalarAB * A * B (element-wise)
result->elementWiseMultiply(scalarAB, A, B, scalarThis);

// Example: result = A * B
result->elementWiseMultiply(1.0, A, B, 0.0);

Lumped Diagonal Solve Pattern:

The lumped GWCE solver uses these built-ins for efficient diagonal solve:

// Solve: solution = rhs / diagonal
// Using two Tpetra operations instead of custom kernel:
solution->reciprocal(*diagonal);                        // solution = 1/diagonal
solution->elementWiseMultiply(1.0, *rhs, *solution, 0.0); // solution = rhs * solution

This pattern:

Eliminates custom kernel boilerplate
Handles edge cases (e.g., division safety) consistently
Enables future MPI parallelism without code changes

Norms:

// L2 norm
const auto l2_norm = vec->norm2();

// L1 norm (sum of absolute values)
const auto l1_norm = vec->norm1();

// Infinity norm (max absolute value)
const auto inf_norm = vec->normInf();

Elevation Penalty Computation

The elevation penalty (EP) maintains numerical conditioning for BC rows:

Types::Scalar compute_elevation_penalty(
    const Teuchos::RCP<FECrsMatrix>& matrix,
    int num_nodes) {
  auto local_matrix = matrix->getLocalMatrixDevice();

  Types::Scalar sum_sq = 0.0;
  Kokkos::parallel_reduce("Compute diagonal RMS",
      Types::DefaultRangePolicy(0, num_nodes),
      KOKKOS_LAMBDA(int row, Types::Scalar& local_sum) {
        auto row_view = local_matrix.row(row);
        for (int j = 0; j < row_view.length; ++j) {
          if (row_view.colidx(j) == row) {
            local_sum += row_view.value(j) * row_view.value(j);
            break;
          }
        }
      },
      sum_sq);

  return std::sqrt(sum_sq / num_nodes);
}

Implementation Files

core/types/LinearAlgebraTypes.hpp: Type aliases
numeric/continuity/GwceSolver.hpp: Main GWCE solver class
numeric/continuity/consistent/ConjugateGradientSolver.hpp: Belos CG wrapper
numeric/continuity/consistent/JacobiPreconditioner.hpp: Ifpack2 Jacobi wrapper
numeric/continuity/consistent/GwceMatrixAssemblyConsistentKernels.hpp: Matrix assembly kernels