Work packages

The project is composed of four different tasks:

Task 1 : Parallel direction preserving preconditioners for large sparse systems of equations.
Task 2 : Graph partitioning and reordering techniques.
Task 3 : Preconditioners on heterogeneous architectures using GPUs.
Task 4: Validation in real applications.

Task 1 focuses on two classes of preconditioners and their multiplicative or additive combination. The first class consists of direction preserving preconditioners and their implementation in block approximate factorizations and two-level domain decomposition methods. The second class consists of incomplete LU factorizations. This task assumes that the grids are irregular and requires the preconditioners to be suitable for parallel execution. Task 2 studies the graph partitioning problems associated with the preconditioners developed in Task 1. In particular, it takes into account the irregularity of the grids as well as the influence of the reordering on the quality of the preconditioner. Task 3 explores the use of GPUs for sparse iterative solvers. Task 4 validates the methods proposed in the first three tasks on real applications. A more detailed description of each task follows.

Task 1 : Direction preserving preconditioners for large sparse systems of equations

We develop preconditioners that are suitable for parallel computing and for matrices arising from scalar equations or systems of PDEs on structured or unstructured grids. The starting point is a set of methods that have already been successfully tested on matrices arising from the discretization of scalar equations on structured grids.

Most existing preconditioners scale poorly with both the problem size and the number of processors used: the number of iterations increases significantly, and this is due to a small number of low frequency modes. To deal with this problem, PETALh develops direction preserving preconditioners, which have the property of being identical to the input matrix on a given set of vectors; that is, the preconditioner M and the input matrix A satisfy M t = A t for each vector t in the set. A judicious choice of these filtering vectors makes it possible to deal efficiently with the low frequency modes that hinder the convergence of iterative methods.
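As a minimal illustration of this property (a sketch, not the block factorization developed in the project), the simplest matrix that agrees with the input matrix A on a single vector t is the diagonal matrix M = diag(A t / t). The helper names, the small test matrix and the choice of t below are assumptions made for the example.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def diagonal_filtering_preconditioner(A, t):
    """Return the diagonal d of M = diag(d), chosen so that M t = A t.

    Assumes every entry of t and of A t is nonzero, so that M is invertible.
    """
    At = matvec(A, t)
    return [At_i / t_i for At_i, t_i in zip(At, t)]


# Hypothetical test matrix: the shifted 1D Laplacian tridiag(-1, 2.5, -1).
n = 5
A = [[2.5 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]

# Assumed filtering vector: the constant vector, a smooth low frequency vector.
t = [1.0] * n

d = diagonal_filtering_preconditioner(A, t)
Mt = [d_i * t_i for d_i, t_i in zip(d, t)]  # action of M on t
At = matvec(A, t)                           # action of A on t
```

With this choice of t, the diagonal d holds the row sums of A, and Mt coincides with At entry by entry, which is exactly the direction preserving property for that one vector.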

The efficiency of the direction preserving preconditioners can be increased in several applications by combining them with classical ILU preconditioners, because the two classes are complementary. The ILU factorization damps the high frequency modes of the original matrix very efficiently, but it often has no effect on the low frequency modes. The direction preserving preconditioner, on the other hand, can be tuned so that it efficiently damps the low frequency modes of the original matrix. As a result, the combined approach leads to an efficient preconditioner that deals with both low and high frequencies.
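One standard way to combine two preconditioners M1 and M2 multiplicatively is M_c^{-1} = M1^{-1} + M2^{-1} - M2^{-1} A M1^{-1}, so that the error propagation operators multiply: I - M_c^{-1} A = (I - M2^{-1} A)(I - M1^{-1} A). The source does not state which formula PETALh uses, so the sketch below is only an assumption-laden illustration. Both preconditioners are diagonal stand-ins (Jacobi in place of ILU, a row-sum diagonal in place of the direction preserving preconditioner), and all names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def apply_combined(A, apply_m1_inv, apply_m2_inv, r):
    """Apply M_c^{-1} r = M1^{-1} r + M2^{-1} (r - A M1^{-1} r)."""
    y = apply_m1_inv(r)                      # first sweep with M1
    Ay = matvec(A, y)
    residual = [r_i - ay_i for r_i, ay_i in zip(r, Ay)]
    correction = apply_m2_inv(residual)      # second sweep on what is left
    return [y_i + c_i for y_i, c_i in zip(y, correction)]


# Hypothetical test matrix: the shifted 1D Laplacian tridiag(-1, 2.5, -1).
n = 4
A = [[2.5 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]

# Stand-in for ILU: Jacobi, i.e. the diagonal of A.
m1_diag = [A[i][i] for i in range(n)]
# Stand-in for the direction preserving preconditioner: the row sums of A,
# which reproduce A on the constant vector.
m2_diag = [sum(row) for row in A]


def apply_m1_inv(r):
    return [r_i / d_i for r_i, d_i in zip(r, m1_diag)]


def apply_m2_inv(r):
    return [r_i / d_i for r_i, d_i in zip(r, m2_diag)]


r = [1.0, 0.0, 0.0, 1.0]
z = apply_combined(A, apply_m1_inv, apply_m2_inv, r)
```

In a Krylov solver, apply_combined would be called once per iteration in place of a single preconditioner solve; the two inner solves can be swapped to change which class of modes is treated first.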

An important issue is the suitability of these preconditioners for parallelism. The parallel algorithms developed in this task explicitly deal with the hierarchical models of peta-/exa-scale machines. To this end, the design considers a hierarchical parallelism, in which the first level is obtained from a graph partitioning approach. Inside each partition (or domain), a second level of parallelism is then exploited, which takes into account that each multicore processor can take advantage of thread level parallelism. This second level consists of performing in parallel the operations associated with each partition, exploiting fine-grained parallelism and taking into account the sparsity of each partition.

Task 2 : Graph partitioning and reordering techniques

A major goal of this task is to study reordering techniques that take into account the numerical values of the input matrix. It has been observed in practice that the ordering of the unknowns has an important impact on the quality of an incomplete LU factorization. For example, a natural ordering leads to a more efficient ILU preconditioner than a nested dissection ordering. However, a nested dissection ordering is very important for parallelism, and hence some compromises need to be made. The following table displays the effect of the reordering and partitioning on a small CEA test case. The left columns (iteration, fill-in) show results obtained using a domain decomposition computed from a nested dissection reordering. The right columns (iteration, fill-in) show results obtained using a domain decomposition computed from vertex graph partitioning, reconstruction of a narrow overlap, and RCM reordering inside each domain.

Domain part	Iteration (nested dissection)	Fill-in (nested dissection)	Iteration (graph partitioning + RCM)	Fill-in (graph partitioning + RCM)
1	200	3.34	20	4.37
2	200	3.36	200	4.62
4	200	4.32	42	5.5
8	200	4.88	200	5.6
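The RCM (reverse Cuthill-McKee) reordering mentioned above can be sketched as follows. This is a minimal pure-Python version operating on an adjacency dictionary; the function names and the five-node path graph are hypothetical, and a production code would rely on a sparse matrix library instead.

```python
from collections import deque


def rcm_order(adj):
    """Reverse Cuthill-McKee ordering of an undirected graph.

    adj maps each node to the set of its neighbours. The result lists the
    nodes in their new order (position in the list = new index).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    visited = set()
    order = []
    # Start each connected component from a node of minimal degree.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting neighbours by increasing degree.
            for w in sorted(adj[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # the "reverse" in reverse Cuthill-McKee
    return order


def bandwidth(adj, order):
    """Largest index distance between two adjacent nodes under an ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in adj for w in adj[v]), default=0)


# Hypothetical example: a path graph 0-3-1-4-2 with scrambled labels.
edges = [(0, 3), (3, 1), (1, 4), (4, 2)]
adj = {v: set() for v in range(5)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

natural = list(range(5))
reordered = rcm_order(adj)
bw_before = bandwidth(adj, natural)
bw_after = bandwidth(adj, reordered)
```

On this small graph the bandwidth drops from 3 under the natural labelling to 1 after reordering. A narrower band generally limits where fill-in can appear in a factorization, which is one reason RCM is applied inside each domain.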

Task 3 : Preconditioners on heterogeneous architectures using GPUs

In addition to the preconditioners developed in Task 1, we also study the parallelization of the building blocks of the preconditioners. These building blocks are classic linear algebra operations such as sparse matrix-vector multiplication, sparse matrix-matrix multiplication, and direct factorizations. We study parallel algorithms for these operations that minimize communication and are therefore adapted to future machines, on which the cost of communication increases relative to the cost of computation.
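As an illustration of the first of these building blocks, the sketch below computes a sparse matrix-vector product with the matrix stored in compressed sparse row (CSR) format. The storage format, the function name and the 3x3 example are assumptions made for the example; the source does not say which format the project uses.

```python
def csr_matvec(indptr, indices, data, x):
    """Compute y = A x for a matrix A stored in CSR format.

    indptr[i]:indptr[i + 1] delimits the entries of row i inside the
    parallel arrays indices (column numbers) and data (values).
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 matrix:
#   [4 0 1]
#   [0 3 0]
#   [2 0 5]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_matvec(indptr, indices, data, x)  # [7.0, 6.0, 17.0]
```

Each row is processed independently of the others, so the outer loop is the natural place to distribute work, for instance one row per GPU thread in a simple kernel. The irregular reads of x through indices are where most of the data movement occurs.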

Task 4: Validation in real applications

The preconditioning techniques developed are generic and can be applied to many scientific applications. In this project they will be validated on several complex numerical simulations. One numerical simulation on which we focus is the simulation of compositional multiphase Darcy flow in heterogeneous porous media, with different types of applications from IFP and CEA:

simulation of reservoir models: the compositional triphase Darcy flow simulator is a key tool to predict the production of a reservoir and optimize the location of wells. For example, today nearly all major reservoir development decisions are based at least partially on simulation results.
simulation of basin models: compositional multiphase Darcy flow models are used to simulate the migration of oil and gas phases at geological space and time scales. The flow equations are coupled with models accounting for basin compaction, temperature evolution and for the cracking of the source rock into hydrocarbon components. Such models are used at the exploration stage to predict the location of the reservoirs as well as the quality and quantity of oil trapped therein.
simulation of geological CO2 underground storage: the compositional multiphase Darcy model is coupled with the chemical reactions between the aqueous phase and the minerals. This makes it possible to model the physical processes occurring during the injection phase and to study the long term stability of the storage.
simulation of underground nuclear waste disposal: single-phase Darcy flow models coupled with reactive diffusive transport of contaminants are used to study and demonstrate the safety of the storage facilities. Two-phase gas-water Darcy flow models are also used to take into account the generation and migration of the gas phase, which is partially miscible in the water phase.

Also, the preconditioning techniques developed in this project will be applied to two-phase thermal-hydraulics problems in nuclear power plant cores. These studies are made with two primary goals: to study the design of fourth generation nuclear reactors, and to perform safety analysis.

The CEA will also apply the techniques to material studies: the degradation of cementitious materials is studied at the CEA to understand the behaviour of materials subject to strong constraints in nuclear power plants and the behaviour of concrete in nuclear waste storage facilities. Such studies are not restricted to the nuclear energy domain and are of interest to the community of researchers working on the behaviour of building materials.

Another simulation on which we focus is radiation transport, which is important in a large number of scientific areas such as confined fusion, astrophysics, nuclear reactor systems, forest fires, and weather prediction. In particular, for nuclear reactor simulations as studied at Argonne, the integro-differential Boltzmann equation for neutral particle transport requires the treatment of seven independent variables: three in space, two in angle, one in energy, and one in time. Nuclear reactor simulations are therefore among the most memory and computationally intensive in the scientific community.
