monolish 0.14.2
MONOlithic LInear equation Solvers for Highly-parallel architecture
GPU device acceleration

Introduction

The following four classes have the computable attribute:

These classes support computing on the GPU and provide five functions for GPU programming: send(), recv(), device_free(), get_device_mem_stat(), and nonfree_recv().

When libmonolish_cpu.so is linked, send() and recv() do nothing, so the same source code can be used for both CPU and GPU builds.

When libmonolish_gpu.so is linked, these functions enable data communication with the GPU.

An object is mapped to GPU memory by calling its send() function. Once its data has been transferred to the GPU, the object behaves differently: operations on individual elements of vectors and matrices are no longer possible.

Whether the data has been transferred to the GPU can be obtained by the get_device_mem_stat() function.

The data mapped to the GPU is released from the GPU by recv() or device_free().

Most functions run on either the CPU or the GPU, depending on the value returned by get_device_mem_stat(). The copy constructor is a special case: because it copies an entire instance of a class, it copies both the CPU data and the GPU data.

For developers, there is a nonfree_recv() function that receives data from the GPU without freeing the GPU memory. However, in the current version, there is no way to explicitly change the status of GPU memory, so it is not useful for most users.
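The lifecycle described above can be sketched with a small stand-in class. This is an illustrative model only: MockVector, its at() and scale() members, and the choice to throw on element access are assumptions made for the example, not monolish code. Only the names send(), recv(), device_free(), and get_device_mem_stat() come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a "computable" class. It only models the
// host/device state described in the text; it is not monolish code.
class MockVector {
public:
  explicit MockVector(std::size_t n, double value = 0.0) : host_(n, value) {}

  // Map the data to the (simulated) device.
  void send() {
    device_ = host_;
    on_device_ = true;
  }

  // Copy the data back to the host, then release the device copy.
  void recv() {
    if (!on_device_) return;
    host_ = device_;
    device_free();
  }

  // Release the device copy without copying it back.
  void device_free() {
    device_.clear();
    on_device_ = false;
  }

  bool get_device_mem_stat() const { return on_device_; }

  // Element access is only allowed while the data lives on the host.
  double &at(std::size_t i) {
    if (on_device_)
      throw std::runtime_error("data is on the device; call recv() first");
    assert(i < host_.size());
    return host_[i];
  }

  // A whole-vector operation that runs wherever the data currently lives.
  void scale(double a) {
    std::vector<double> &data = on_device_ ? device_ : host_;
    for (double &v : data) v *= a;
  }

private:
  std::vector<double> host_;
  std::vector<double> device_;
  bool on_device_ = false;
};
```

In this model the implicitly generated copy constructor copies both host_ and device_, which mirrors the statement that a copy duplicates both the CPU and the GPU data.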

GPU programs using monolish are implemented with the following flow in mind.

  1. Generate the data on the CPU.
  2. Transfer the data from the CPU to the GPU.
  3. Run the computation on the GPU.
  4. Receive the results from the GPU back to the CPU.

To keep transfer costs low, call send() and recv() as few times as possible: ideally once before and once after the GPU computation, not around each operation.
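The effect of this flow on the number of transfers can be made concrete with a small counting sketch. TransferLog, transfer_every_step(), and transfer_once() are hypothetical names introduced for illustration; they model only how often data crosses between CPU and GPU, not what monolish does internally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical transfer counter; it models cost only, not monolish itself.
struct TransferLog {
  std::size_t sends = 0;
  std::size_t recvs = 0;
  std::size_t total() const { return sends + recvs; }
};

// Anti-pattern: transfer the data around every single GPU operation.
inline TransferLog transfer_every_step(std::size_t steps) {
  TransferLog log;
  for (std::size_t i = 0; i < steps; ++i) {
    ++log.sends; // stands for send()
    // ... one GPU computation would run here ...
    ++log.recvs; // stands for recv()
  }
  return log;
}

// Recommended flow: send once, run every step on the GPU, receive once.
inline TransferLog transfer_once(std::size_t steps) {
  TransferLog log;
  ++log.sends; // stands for send()
  for (std::size_t i = 0; i < steps; ++i) {
    // ... all GPU computations run here without transfers ...
  }
  ++log.recvs; // stands for recv()
  return log;
}
```

For 100 steps the first pattern records 200 transfers and the second records 2, independent of the amount of work done on the device.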

Compute innerproduct on GPU

A simple inner product program for GPU is shown below:

#include "monolish_blas.hpp"
#include <iostream>

int main() {
  // Output log if you need
  // monolish::util::set_log_level(3);
  // monolish::util::set_log_filename("./monolish_test_log.txt");

  size_t N = 100;

  // x = {1,1,...,1}, length N
  monolish::vector<double> x(N, 1.0);

  // Random vector length N with random values in the range 1.0 to 2.0
  monolish::vector<double> y(N, 1.0, 2.0);

  // send data to GPU
  monolish::util::send(x, y);

  // compute innerproduct
  double ans = monolish::blas::dot(x, y);

  std::cout << ans << std::endl;

  return 0;
}

This sample code can be found in /sample/blas/innerproduct/.

This program can be compiled with the following command.

g++ -O3 innerproduct.cpp -o innerproduct_gpu.out -lmonolish_gpu

Run it with the following command.

./innerproduct_gpu.out

A description of this program is given below:

  • Each class has send() and recv() functions.
  • monolish::util::send() is a convenience utility function that accepts a variable number of arguments.
  • Scalar values are automatically synchronized between the CPU and GPU.
  • The BLAS and VML functions in monolish automatically call the GPU implementations when they receive data that has already been sent.
  • When libmonolish_cpu.so is linked, send() and recv() do nothing, so the same source code can be used for both CPU and GPU builds.
  • In this program, x and y do not need to be received back to the CPU, so their memory management is left to the automatic release performed by the destructor.
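As a host-side sanity check for the program above, the same inner product can be computed with the C++ standard library alone. The sketch below assumes nothing about monolish: dot_reference() and random_vector() are hypothetical helpers, and the seed and the use of std::mt19937 are arbitrary choices for the example. Because x is all ones and every element of y lies between 1.0 and 2.0, the result must lie between N and 2N.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Reference inner product computed on the host.
inline double dot_reference(const std::vector<double> &x,
                            const std::vector<double> &y) {
  assert(x.size() == y.size());
  return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}

// Vector of n values drawn uniformly from [lo, hi).
// The default seed is an arbitrary choice made for this example.
inline std::vector<double> random_vector(std::size_t n, double lo, double hi,
                                         unsigned seed = 42) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(lo, hi);
  std::vector<double> v(n);
  for (double &e : v) e = dist(gen);
  return v;
}
```

Calling dot_reference(std::vector<double>(100, 1.0), random_vector(100, 1.0, 2.0)) gives a value in the range 100 to 200, which is a quick plausibility bound for the value printed by the GPU program.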

For a more advanced example, sample programs that implement the CG method using monolish::blas and monolish::vml can be found in /sample/blas/cg_impl/.

Solve Ax=b on GPU

The following sample program solves the linear system Ax=b on the GPU using the conjugate gradient (CG) method with a Jacobi preconditioner.

This program requires only two lines of changes from the CPU program.

#include "monolish_equation.hpp"
#include <iostream>

int main() {
  // Output log if you need
  // monolish::util::set_log_level(3);
  // monolish::util::set_log_filename("./monolish_test_log.txt");

  monolish::matrix::COO<double> A_COO("sample.mtx"); // Input from file

  // Edit the matrix as needed //
  // Execute A_COO.sort() after editing the matrix //

  monolish::matrix::CRS<double> A(
      A_COO); // Create CRS format and convert from COO format

  // Length A.row()
  // Random vector length A.row() with random values in the range 1.0 to 2.0
  monolish::vector<double> x(A.get_row(), 1.0, 2.0);
  monolish::vector<double> b(A.get_row(), 1.0, 2.0);

  // Send A, x, and b to GPU
  monolish::util::send(A, x, b);

  // Create CG class
  monolish::equation::CG<monolish::matrix::CRS<double>, double> solver;

  // create jacobi preconditioner
  monolish::equation::Jacobi<monolish::matrix::CRS<double>, double> precond;

  // Set preconditioner to CG solver
  solver.set_create_precond(precond);
  solver.set_apply_precond(precond);

  // Set solver options
  solver.set_tol(1.0e-12);
  solver.set_maxiter(A.get_row());

  // if you need residual history
  // solver.set_print_rhistory(true);
  // solver.set_rhistory_filename("./a.txt");

  // Solve Ax=b by CG with jacobi
  if (monolish::util::solver_check(solver.solve(A, x, b))) {
    return 1;
  }

  // Recv x from GPU
  monolish::util::recv(x);

  // Show answer
  x.print_all();

  return 0;
}

This sample code can be found in /sample/equation/cg/.

  • After creating the vectors and the matrix A, send the data to the GPU using the monolish::util::send() function.
  • Because x is needed as output, explicitly receive it back to the CPU using the recv() function. This also releases the memory of x on the GPU.
  • The CPU/GPU memory of A, A_COO, and b is released by the destructor at the end of the function.
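To make clear what a Jacobi-preconditioned CG solve computes, here is a minimal host-only sketch of the textbook algorithm on a CRS matrix. It is not monolish's implementation: CrsMatrix, matvec(), dot(), and pcg_jacobi() are hypothetical names, the relative-residual stopping rule is an assumption, and the matrix is assumed to be symmetric positive definite with a nonzero diagonal.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal CRS (compressed row storage) matrix; a hypothetical helper type.
struct CrsMatrix {
  std::size_t n;                    // number of rows and columns
  std::vector<std::size_t> row_ptr; // size n + 1
  std::vector<std::size_t> col_ind; // column index of each stored value
  std::vector<double> val;          // stored values
};

inline std::vector<double> matvec(const CrsMatrix &A,
                                  const std::vector<double> &x) {
  std::vector<double> y(A.n, 0.0);
  for (std::size_t i = 0; i < A.n; ++i)
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      y[i] += A.val[k] * x[A.col_ind[k]];
  return y;
}

inline double dot(const std::vector<double> &a, const std::vector<double> &b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Jacobi-preconditioned CG. Returns the number of iterations used,
// or -1 if the tolerance is not reached within maxiter iterations.
inline int pcg_jacobi(const CrsMatrix &A, std::vector<double> &x,
                      const std::vector<double> &b, double tol,
                      std::size_t maxiter) {
  assert(x.size() == A.n && b.size() == A.n);

  // Jacobi preconditioner: M^{-1} = diag(A)^{-1}.
  std::vector<double> inv_diag(A.n, 1.0);
  for (std::size_t i = 0; i < A.n; ++i)
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      if (A.col_ind[k] == i) inv_diag[i] = 1.0 / A.val[k];

  // Initial residual r = b - A x, preconditioned residual z, direction p.
  std::vector<double> r = b;
  const std::vector<double> Ax = matvec(A, x);
  for (std::size_t i = 0; i < A.n; ++i) r[i] -= Ax[i];
  std::vector<double> z(A.n);
  for (std::size_t i = 0; i < A.n; ++i) z[i] = inv_diag[i] * r[i];
  std::vector<double> p = z;
  double rz = dot(r, z);
  const double bnorm = std::sqrt(dot(b, b));

  for (std::size_t iter = 0; iter < maxiter; ++iter) {
    if (std::sqrt(dot(r, r)) <= tol * bnorm) return static_cast<int>(iter);
    const std::vector<double> q = matvec(A, p);
    const double alpha = rz / dot(p, q);
    for (std::size_t i = 0; i < A.n; ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * q[i];
      z[i] = inv_diag[i] * r[i];
    }
    const double rz_new = dot(r, z);
    const double beta = rz_new / rz;
    for (std::size_t i = 0; i < A.n; ++i) p[i] = z[i] + beta * p[i];
    rz = rz_new;
  }
  // Final check in case the last iteration reached the tolerance.
  return std::sqrt(dot(r, r)) <= tol * bnorm ? static_cast<int>(maxiter) : -1;
}
```

For example, with the 3x3 tridiagonal matrix that has 2 on the diagonal and -1 on the off-diagonals, b = {1, 0, 1}, and a zero initial guess, the sketch converges to x = {1, 1, 1}. Every loop in the iteration is a vector update, dot product, or sparse matrix-vector product, which is why the whole solve can stay on the GPU between a single send and a single receive.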

A sample program for a templated linear equation solver can be found at sample/equation/templated_solver.

Environment variable

  • LIBOMPTARGET_DEBUG= [1 or 0] : Output debug information on OpenMP Offloading at runtime
  • CUDA_VISIBLE_DEVICES= [Device num.] : Specify GPU device number