====================
Trilinos Integration
====================

Cocoa uses Trilinos for distributed linear algebra and iterative solvers.
This document describes the specific type aliases and usage patterns.

Trilinos Type Aliases
---------------------

Cocoa defines Trilinos types in ``core/types/LinearAlgebraTypes.hpp`` to ensure
consistency with the Kokkos execution space:

LinearAlgebraTypes
^^^^^^^^^^^^^^^^^^

.. code-block:: cpp

   namespace Cocoa::Types {

   struct LinearAlgebraTypes {
     // Node type wraps Kokkos execution space for Tpetra
     using ExecutionNode =
         Tpetra::KokkosCompat::KokkosDeviceWrapperNode<CocoaExecutionSpace>;

     // Core Tpetra types parameterized by ordinal types and node
     using Map = Tpetra::Map<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
     using Vector = Tpetra::Vector<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
     using MultiVector = Tpetra::MultiVector<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
     using Import = Tpetra::Import<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
     using Export = Tpetra::Export<LocalOrdinal, GlobalOrdinal, ExecutionNode>;

     // Finite element matrix types
     using FECrsGraph = Tpetra::FECrsGraph<LocalOrdinal, GlobalOrdinal, ExecutionNode>;
     using FECrsMatrix = Tpetra::FECrsMatrix<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;

     // Operator interface for solvers
     using Operator = Tpetra::Operator<Scalar, LocalOrdinal, GlobalOrdinal, ExecutionNode>;
   };

   }

SolverTypes
^^^^^^^^^^^

.. code-block:: cpp

   namespace Cocoa::Types {

   struct SolverTypes {
     // Belos template parameters
     using SC = Scalar;
     using MV = LinearAlgebraTypes::MultiVector;
     using OP = LinearAlgebraTypes::Operator;

     // Ifpack2 preconditioner
     using PreconditionerType = Ifpack2::Preconditioner<
         LinearAlgebraTypes::FECrsMatrix::scalar_type,
         LinearAlgebraTypes::FECrsMatrix::local_ordinal_type,
         LinearAlgebraTypes::FECrsMatrix::global_ordinal_type,
         LinearAlgebraTypes::FECrsMatrix::node_type>;

     // Belos solver types
     using LinearProblem = Belos::LinearProblem<SC, MV, OP>;
     using SolverManager = Belos::SolverManager<SC, MV, OP>;
   };

   }

GWCE Matrix Assembly
--------------------

Cocoa uses ``Tpetra::FECrsMatrix`` for finite element assembly, which provides
efficient element-wise accumulation.

Graph Construction
^^^^^^^^^^^^^^^^^^

The sparsity pattern is built from mesh connectivity:

.. code-block:: cpp

   // Create map (one-to-one in serial)
   auto map = Teuchos::rcp(new LinearAlgebraTypes::Map(
       num_nodes, 0, comm));

   // Create FE graph with estimated entries per row
   auto graph = Teuchos::rcp(new LinearAlgebraTypes::FECrsGraph(
       map, max_entries_per_row));

   // Insert connectivity for each element
   for (int elem = 0; elem < num_elements; ++elem) {
     auto nodes = get_element_nodes(elem);
     for (int i = 0; i < 3; ++i) {
       graph->insertGlobalIndices(nodes[i], nodes);
     }
   }
   graph->fillComplete();

Matrix Assembly Pattern
^^^^^^^^^^^^^^^^^^^^^^^

From ``GwceMatrixAssemblyConsistentKernels.hpp``:

.. code-block:: cpp

   // Begin assembly phase
   matrix->beginAssembly();

   // Get local matrix for device access
   auto local_matrix = matrix->getLocalMatrixDevice();

   Kokkos::parallel_for("GWCE Matrix Assembly",
       Types::DefaultRangePolicy(0, num_elements),
       KOKKOS_LAMBDA(const int elem_idx) {
         // Compute elemental matrix...
         const auto vals = compute_stiffness_values(/* ... */);

         // Insert using sorted column indices
         local_matrix.sumIntoValues(
             row_node,
             sorted_col_nodes.data(),
             3,
             vals.data(),
             true,   // sorted columns
             true);  // atomic
       });

   // End assembly phase
   matrix->endAssembly();

Boundary Condition Application
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dirichlet BCs are applied by modifying matrix rows:

.. code-block:: cpp

   // For each BC node
   Kokkos::parallel_for("Zero Dirichlet Rows",
       Policy(0, num_bc_nodes),
       KOKKOS_LAMBDA(int bc_idx) {
         auto row_view = local_matrix.row(bc_node);
         for (int j = 0; j < row_view.length; ++j) {
           if (row_view.colidx(j) == bc_node) {
             row_view.value(j) = elevation_penalty;  // Diagonal
           } else {
             row_view.value(j) = 0.0;  // Off-diagonal
           }
         }
       });

Linear Solver Configuration
---------------------------

Cocoa solves the GWCE system with a Jacobi-preconditioned Conjugate Gradient
solver, configured in ``ConjugateGradientSolver.hpp``. The default
implementation is the Tpetra-native pipelined CG (``CgSolverType::TpetraCgPipeline``).
Two other implementations are selectable via ``CgSolverType``:
``TpetraSingleReduce`` (Tpetra-native single-reduce CG) and ``BelosCG``
(standard Belos CG).

Solver Configuration
^^^^^^^^^^^^^^^^^^^^

The ``SolverConfig`` struct controls solver behavior:

.. code-block:: cpp

   struct SolverConfig {
     CgSolverType solver_type{CgSolverType::TpetraCgPipeline};  // CG implementation
     double tolerance{1e-5};            // Convergence tolerance
     int max_iterations{50};            // Maximum CG iterations
     int verbosity{0};                  // 0=silent, 1=summary, 2=detailed
     ResidualScaling residual_scaling{ResidualScaling::NormOfInitialResidual};  // Convergence scaling
   };

Residual Scaling Options
^^^^^^^^^^^^^^^^^^^^^^^^

The ``ResidualScaling`` enum controls how residual norms are scaled for
convergence testing:

.. code-block:: cpp

   enum class ResidualScaling {
     NormOfInitialResidual,  // ||r|| / ||r_0|| < tol (default)
     NormOfPrecInitRes,      // ||M^{-1}r|| / ||M^{-1}r_0|| < tol
     None                    // ||r|| < tol (absolute)
   };

**NormOfInitialResidual** is the Cocoa default, providing iteration counts
comparable to ADCIRC's ITPACK JCG solver at similar tolerance values.

Solver Setup
^^^^^^^^^^^^

.. code-block:: cpp

   // Create linear problem
   auto problem = Teuchos::rcp(new SolverTypes::LinearProblem(
       matrix, solution, rhs));

   // Configure solver parameters
   auto params = Teuchos::rcp(new Teuchos::ParameterList());
   params->set("Maximum Iterations", config.max_iterations);  // Default: 50
   params->set("Convergence Tolerance", config.tolerance);    // Default: 1e-5
   params->set("Implicit Residual Scaling",
               "Norm of Initial Residual");                   // Default scaling

   // Verbosity: base level always includes Errors and Warnings
   int verbosity = Belos::Errors | Belos::Warnings;
   if (config.verbosity >= 1) {
     verbosity |= Belos::FinalSummary;
   }
   if (config.verbosity >= 2) {
     verbosity |= Belos::IterationDetails;
     params->set("Output Frequency", 1);
   }
   params->set("Verbosity", verbosity);

   // Create CG solver
   Belos::SolverFactory<SC, MV, OP> factory;
   auto solver = factory.create("CG", params);

Jacobi Preconditioner
^^^^^^^^^^^^^^^^^^^^^

From ``JacobiPreconditioner.hpp``:

.. code-block:: cpp

   // Create Ifpack2 preconditioner via factory instance
   Ifpack2::Factory factory;
   auto prec = factory.create<MatrixType>("RELAXATION", matrix);

   // Configure as Jacobi
   Teuchos::ParameterList prec_params;
   prec_params.set("relaxation: type", "Jacobi");
   prec_params.set("relaxation: sweeps", 1);
   prec_params.set("relaxation: damping factor", 1.0);
   prec->setParameters(prec_params);

   // Initialize sparsity pattern
   prec->initialize();

   // Compute values (must be called when matrix changes)
   prec->compute();

   // Set on linear problem
   problem->setRightPrec(prec);

Solving
^^^^^^^

.. code-block:: cpp

   // Set up the problem
   problem->setProblem();

   // Solve
   auto result = solver->solve();

   if (result != Belos::Converged) {
     // Handle non-convergence
   }

Vector Operations
-----------------

Tpetra vectors integrate with Kokkos views for device-side access:

.. code-block:: cpp

   // Create vector
   auto vec = Teuchos::rcp(new LinearAlgebraTypes::Vector(map));

   // Get device view for parallel operations
   auto view = vec->getLocalViewDevice(Tpetra::Access::ReadWrite);

   // Use in Kokkos kernel
   Kokkos::parallel_for("Set values",
       Types::DefaultRangePolicy(0, num_nodes),
       KOKKOS_LAMBDA(int i) {
         view(i, 0) = initial_value(i);
       });

Built-in Vector Operations
^^^^^^^^^^^^^^^^^^^^^^^^^^

Tpetra provides efficient built-in operations that should be preferred over
custom Kokkos kernels when applicable. These handle MPI communication
automatically and are optimized for the target architecture.

**Dot Product:**

.. code-block:: cpp

   // Compute sum of squares (for RMS calculations)
   const auto sum_sq = vec->dot(*vec);  // = sum(vec[i]^2)

   // Elevation penalty = RMS of diagonal
   const auto ep = std::sqrt(sum_sq / num_nodes);

**Reciprocal:**

.. code-block:: cpp

   // Element-wise reciprocal: result[i] = 1/input[i]
   result->reciprocal(*input);

**Element-wise Multiply:**

.. code-block:: cpp

   // result = scalarThis * result + scalarAB * A * B (element-wise)
   result->elementWiseMultiply(scalarAB, A, B, scalarThis);

   // Example: result = A * B
   result->elementWiseMultiply(1.0, A, B, 0.0);

**Lumped Diagonal Solve Pattern:**

The lumped GWCE solver uses these built-ins for efficient diagonal solve:

.. code-block:: cpp

   // Solve: solution = rhs / diagonal
   // Using two Tpetra operations instead of custom kernel:
   solution->reciprocal(*diagonal);                        // solution = 1/diagonal
   solution->elementWiseMultiply(1.0, *rhs, *solution, 0.0); // solution = rhs * solution

This pattern:

- Eliminates custom kernel boilerplate
- Handles edge cases (e.g., division safety) consistently
- Enables future MPI parallelism without code changes

**Norms:**

.. code-block:: cpp

   // L2 norm
   const auto l2_norm = vec->norm2();

   // L1 norm (sum of absolute values)
   const auto l1_norm = vec->norm1();

   // Infinity norm (max absolute value)
   const auto inf_norm = vec->normInf();

Elevation Penalty Computation
-----------------------------

The elevation penalty (EP) maintains numerical conditioning for BC rows:

.. code-block:: cpp

   Types::Scalar compute_elevation_penalty(
       const Teuchos::RCP<FECrsMatrix>& matrix,
       int num_nodes) {
     auto local_matrix = matrix->getLocalMatrixDevice();

     Types::Scalar sum_sq = 0.0;
     Kokkos::parallel_reduce("Compute diagonal RMS",
         Types::DefaultRangePolicy(0, num_nodes),
         KOKKOS_LAMBDA(int row, Types::Scalar& local_sum) {
           auto row_view = local_matrix.row(row);
           for (int j = 0; j < row_view.length; ++j) {
             if (row_view.colidx(j) == row) {
               local_sum += row_view.value(j) * row_view.value(j);
               break;
             }
           }
         },
         sum_sq);

     return std::sqrt(sum_sq / num_nodes);
   }

Implementation Files
--------------------

- ``core/types/LinearAlgebraTypes.hpp``: Type aliases
- ``numeric/continuity/GwceSolver.hpp``: Main GWCE solver class
- ``numeric/continuity/consistent/ConjugateGradientSolver.hpp``: Belos CG wrapper
- ``numeric/continuity/consistent/JacobiPreconditioner.hpp``: Ifpack2 Jacobi wrapper
- ``numeric/continuity/consistent/GwceMatrixAssemblyConsistentKernels.hpp``: Matrix assembly kernels