Trilinos Integration
Cocoa uses Trilinos for distributed linear algebra and iterative solvers. This document describes the specific type aliases and usage patterns.
Trilinos Type Aliases
Cocoa defines Trilinos types in core/types/LinearAlgebraTypes.hpp to ensure
consistency with the Kokkos execution space:
LinearAlgebraTypes
namespace Cocoa::Types {
struct LinearAlgebraTypes {
// Node type wraps Kokkos execution space for Tpetra
using ExecutionNode =
Tpetra::KokkosCompat::KokkosDeviceWrapperNode<CocoaExecutionSpace>;
// Core Tpetra types parameterized by ordinal types and node
using Map = Tpetra::Map<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
using Vector = Tpetra::Vector<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
using MultiVector = Tpetra::MultiVector<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
using Import = Tpetra::Import<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
using Export = Tpetra::Export<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
// Finite element matrix types
using FECrsGraph = Tpetra::FECrsGraph<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
using FECrsMatrix = Tpetra::FECrsMatrix<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
// Operator interface for solvers
using Operator = Tpetra::Operator<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
};
}
SolverTypes
namespace Cocoa::Types {
struct SolverTypes {
// Belos template parameters
using SC = Scalar;
using MV = LinearAlgebraTypes::MultiVector;
using OP = LinearAlgebraTypes::Operator;
// Ifpack2 preconditioner
using PreconditionerType = Ifpack2::Preconditioner<
LinearAlgebraTypes::FECrsMatrix::scalar_type,
LinearAlgebraTypes::FECrsMatrix::local_ordinal_type,
LinearAlgebraTypes::FECrsMatrix::global_ordinal_type,
LinearAlgebraTypes::FECrsMatrix::node_type>;
// Belos solver types
using LinearProblem = Belos::LinearProblem<SC, MV, OP>;
using SolverManager = Belos::SolverManager<SC, MV, OP>;
};
}
GWCE Matrix Assembly
Cocoa uses Tpetra::FECrsMatrix for finite element assembly, which provides
efficient element-wise accumulation.
Graph Construction
The sparsity pattern is built from mesh connectivity:
// Create map (one-to-one in serial)
auto map = Teuchos::rcp(new LinearAlgebraTypes::Map(
num_nodes, 0, comm));
// Create FE graph with estimated entries per row
auto graph = Teuchos::rcp(new LinearAlgebraTypes::FECrsGraph(
map, max_entries_per_row));
// Insert connectivity for each element
for (int elem = 0; elem < num_elements; ++elem) {
auto nodes = get_element_nodes(elem);
for (int i = 0; i < 3; ++i) {
graph->insertGlobalIndices(nodes[i], nodes);
}
}
graph->fillComplete();
Matrix Assembly Pattern
From GwceMatrixAssemblyConsistentKernels.hpp:
// Begin assembly phase
matrix->beginAssembly();
// Get local matrix for device access
auto local_matrix = matrix->getLocalMatrixDevice();
Kokkos::parallel_for("GWCE Matrix Assembly",
Types::DefaultRangePolicy(0, num_elements),
KOKKOS_LAMBDA(const int elem_idx) {
// Compute elemental matrix...
const auto vals = compute_stiffness_values(/* ... */);
// Insert using sorted column indices
local_matrix.sumIntoValues(
row_node,
sorted_col_nodes.data(),
3,
vals.data(),
true, // sorted columns
true); // atomic
});
// End assembly phase
matrix->endAssembly();
Boundary Condition Application
Dirichlet BCs are applied by modifying matrix rows:
// For each BC node
Kokkos::parallel_for("Zero Dirichlet Rows",
Policy(0, num_bc_nodes),
KOKKOS_LAMBDA(int bc_idx) {
auto row_view = local_matrix.row(bc_node);
for (int j = 0; j < row_view.length; ++j) {
if (row_view.colidx(j) == bc_node) {
row_view.value(j) = elevation_penalty; // Diagonal
} else {
row_view.value(j) = 0.0; // Off-diagonal
}
}
});
Linear Solver Configuration
Cocoa solves the GWCE system with a Jacobi-preconditioned Conjugate Gradient
solver, configured in ConjugateGradientSolver.hpp. The default
implementation is the Tpetra-native pipelined CG (CgSolverType::TpetraCgPipeline).
Two other implementations are selectable via CgSolverType:
TpetraSingleReduce (Tpetra-native single-reduce CG) and BelosCG
(standard Belos CG).
Solver Configuration
The SolverConfig struct controls solver behavior:
struct SolverConfig {
CgSolverType solver_type{CgSolverType::TpetraCgPipeline}; // CG implementation
double tolerance{1e-5}; // Convergence tolerance
int max_iterations{50}; // Maximum CG iterations
int verbosity{0}; // 0=silent, 1=summary, 2=detailed
ResidualScaling residual_scaling{ResidualScaling::NormOfInitialResidual}; // Convergence scaling
};
Residual Scaling Options
The ResidualScaling enum controls how residual norms are scaled for
convergence testing:
enum class ResidualScaling {
NormOfInitialResidual, // ||r|| / ||r_0|| < tol (default)
NormOfPrecInitRes, // ||M^{-1}r|| / ||M^{-1}r_0|| < tol
None // ||r|| < tol (absolute)
};
NormOfInitialResidual is the Cocoa default, providing iteration counts comparable to ADCIRC’s ITPACK JCG solver at similar tolerance values.
Solver Setup
// Create linear problem
auto problem = Teuchos::rcp(new SolverTypes::LinearProblem(
matrix, solution, rhs));
// Configure solver parameters
auto params = Teuchos::rcp(new Teuchos::ParameterList());
params->set("Maximum Iterations", config.max_iterations); // Default: 50
params->set("Convergence Tolerance", config.tolerance); // Default: 1e-5
params->set("Implicit Residual Scaling",
"Norm of Initial Residual"); // Default scaling
// Verbosity: base level always includes Errors and Warnings
int verbosity = Belos::Errors | Belos::Warnings;
if (config.verbosity >= 1) {
verbosity |= Belos::FinalSummary;
}
if (config.verbosity >= 2) {
verbosity |= Belos::IterationDetails;
params->set("Output Frequency", 1);
}
params->set("Verbosity", verbosity);
// Create CG solver
Belos::SolverFactory<SC, MV, OP> factory;
auto solver = factory.create("CG", params);
Jacobi Preconditioner
From JacobiPreconditioner.hpp:
// Create Ifpack2 preconditioner via factory instance
Ifpack2::Factory factory;
auto prec = factory.create<MatrixType>("RELAXATION", matrix);
// Configure as Jacobi
Teuchos::ParameterList prec_params;
prec_params.set("relaxation: type", "Jacobi");
prec_params.set("relaxation: sweeps", 1);
prec_params.set("relaxation: damping factor", 1.0);
prec->setParameters(prec_params);
// Initialize sparsity pattern
prec->initialize();
// Compute values (must be called when matrix changes)
prec->compute();
// Set on linear problem
problem->setRightPrec(prec);
Solving
// Set up the problem
problem->setProblem();
// Solve
auto result = solver->solve();
if (result != Belos::Converged) {
// Handle non-convergence
}
Vector Operations
Tpetra vectors integrate with Kokkos views for device-side access:
// Create vector
auto vec = Teuchos::rcp(new LinearAlgebraTypes::Vector(map));
// Get device view for parallel operations
auto view = vec->getLocalViewDevice(Tpetra::Access::ReadWrite);
// Use in Kokkos kernel
Kokkos::parallel_for("Set values",
Types::DefaultRangePolicy(0, num_nodes),
KOKKOS_LAMBDA(int i) {
view(i, 0) = initial_value(i);
});
Built-in Vector Operations
Tpetra provides efficient built-in operations that should be preferred over custom Kokkos kernels when applicable. These handle MPI communication automatically and are optimized for the target architecture.
Dot Product:
// Compute sum of squares (for RMS calculations)
const auto sum_sq = vec->dot(*vec); // = sum(vec[i]^2)
// Elevation penalty = RMS of diagonal
const auto ep = std::sqrt(sum_sq / num_nodes);
Reciprocal:
// Element-wise reciprocal: result[i] = 1/input[i]
result->reciprocal(*input);
Element-wise Multiply:
// result = scalarThis * result + scalarAB * A * B (element-wise)
result->elementWiseMultiply(scalarAB, A, B, scalarThis);
// Example: result = A * B
result->elementWiseMultiply(1.0, A, B, 0.0);
Lumped Diagonal Solve Pattern:
The lumped GWCE solver uses these built-ins for efficient diagonal solve:
// Solve: solution = rhs / diagonal
// Using two Tpetra operations instead of custom kernel:
solution->reciprocal(*diagonal); // solution = 1/diagonal
solution->elementWiseMultiply(1.0, *rhs, *solution, 0.0); // solution = rhs * solution
This pattern:
Eliminates custom kernel boilerplate
Handles edge cases (e.g., division safety) consistently
Enables future MPI parallelism without code changes
Norms:
// L2 norm
const auto l2_norm = vec->norm2();
// L1 norm (sum of absolute values)
const auto l1_norm = vec->norm1();
// Infinity norm (max absolute value)
const auto inf_norm = vec->normInf();
Elevation Penalty Computation
The elevation penalty (EP) maintains numerical conditioning for BC rows:
Types::Scalar compute_elevation_penalty(
const Teuchos::RCP<FECrsMatrix>& matrix,
int num_nodes) {
auto local_matrix = matrix->getLocalMatrixDevice();
Types::Scalar sum_sq = 0.0;
Kokkos::parallel_reduce("Compute diagonal RMS",
Types::DefaultRangePolicy(0, num_nodes),
KOKKOS_LAMBDA(int row, Types::Scalar& local_sum) {
auto row_view = local_matrix.row(row);
for (int j = 0; j < row_view.length; ++j) {
if (row_view.colidx(j) == row) {
local_sum += row_view.value(j) * row_view.value(j);
break;
}
}
},
sum_sq);
return std::sqrt(sum_sq / num_nodes);
}
Implementation Files
core/types/LinearAlgebraTypes.hpp: Type aliasesnumeric/continuity/GwceSolver.hpp: Main GWCE solver classnumeric/continuity/consistent/ConjugateGradientSolver.hpp: Belos CG wrappernumeric/continuity/consistent/JacobiPreconditioner.hpp: Ifpack2 Jacobi wrappernumeric/continuity/consistent/GwceMatrixAssemblyConsistentKernels.hpp: Matrix assembly kernels