I detailed how to write a basic pseudo-spectral solver in 2D in python in my previous post. In this one, I will talk about how to make it faster using [Numba] and [pyfftw].
As can be seen in my speed benchmark for matrix multiplication, numba is indeed very fast.
As a starting point for more complex numerical simulations, I propose to write a simple two dimensional pseudospectral Navier-Stokes solver in python. Since the motivation here is to write the simplest such solver, so that we can understand different steps, this will actually be a serial solver. A serial solver can be run on a single machine with multithreading handled by numpy routines. We can also develop a numba version where multithreding can be done explicitly using numba routines. However the simpler numpy version is better for understanding what exactly is going on.
As we have discussed in an earlier post, there is a compact representation of the Fourier coefficients that provides what is actually needed for computing a convolution. This can be used to reduce the number of degrees of freedom of the problem and guarantee hermitian symmetry
However going back and forth between the two may be complicated. We can achieve this in two different ways.
The primary computation in a pseudo-spectral code is the convolution. Here we develop a python class that can compute
parallel convolutions with arbitrary multiplier functions as described earlier using mpi4py-fft.
mpi4py-fft is a parallel fft library for python which uses serial fftw (so not the fftw3_mpi directly) via mpi4py in order to achieve two and three dimensional parallel ffts. It is well documented and therefore we do not need to introduce how it is used here.
It is based on the idea that we can compute 2D ffts by computing 1d ffts in y direction followed by a transpose (with mpi alltoall) followed by computing 1d transforms in the x direction. We can finally transpose the result again or not depending on if we can live with a transposed result.
In various numerical problems, the issue of matrix/vector multiplications arises. This can be implemented in different ways in python. Here we will compare, numpy einsum, numpy multiply, numba and c implementations and try to find out which is faster given a particular configuration and how each scales with number of nodes with MPI.
Even though there are many different ODE solvers in python. It seems that a simple parallel adaptive time step solver that can use mpi is lacking. One either has to use a huge framework, or fallback to fixed time step solvers.
CVODE is a an ODE solver developed by LLNL as part of the SUNDIALS SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers. It is c library that uses various vector formats among which is the NVECTOR_PARALLEL module that can be used for writing an mpi based ode solver.
mpi4py is the standard framework for using mpi under python that provides all kinds of functionality. Here loosely following the DistArray class which is part of the mpi4py-fft, we develop a minimalistic distributed array class. Basic idea is that we can subclass the ndarray class into a distributed array class that contains:
the local data.
information about what part of the global array this data represents.
Instead of going around and adding every different case, we will try to generalize the method for calling the convolution functions
so that we can use it from python more or less directly. In fact we can probably even write multiplier functions in python, implemented in numba so that they can be fast enough.
Instead we will write the multiplier function in C++ but call some generalized routines for creating the convolution class and performing the convolution using this function.
Nominally, “2D arrays” on C, whose elements we can access using the notation F[i][j] are in fact not really arrays, but pointers of pointers. In other words each F[i] is a pointer to a basic type so that (F[i])[j] gives the actual basic element.
To be fair, this is rarely used to describe 2D arrays, even in C. But it can sometimes be used to describe a “collection of arrays”. Under python, on the other hand, a collection of arrays would be naturally described by a list of (say) numpy arrays. So it would be interesting to pass a list of numpy arrays to a c function that uses pointers of pointers.
Say we want to implement a pseudo-spectral solver for 2D Navier-Stokes + passive scalar problem. The problem have
two nonlinearities that we need to compute using convolutions. We need to figure out how to do that in a compact
way using multiplier functions. We also need to understand the form of the wave-number array so that we can use them to compute derivatives.
FFTW++ can be used to compute multiple convolutions that are added together. Looking at the code and in particular
convolution.cc, we find that this can be achieved by using what seems to be called multiplier functions (or classes).
We have already used one of these earlier, which is the basic multbinary function, which computes F[0][j] *= F[1][j] with two arrays.
In the notation used by fftw, the two parameters mx and my determine roughly the independent degrees of freedom, minus the hermitian symmetry of the y=0 axis as discussed in an earlier blog post.
However exactly how many elements the arrays that we use will have depends on the parameters xcompact and ycompact. In practice if xcompact, ycompact are true the array size for HConvolution2 will be *2mx-1** and my whereas if xcompact and ycompact are false, the array sizes will be 2*mx and my+1.
As discussed in my earlier posts FFTW++ is an implicit dealiasing library for turbulence simulations.
In this post we will see how to define the wave-vector arrays that correspond to the convolutions that fftw++ computes and how to compute dervatives
of a 2D field (density, vorticity or stream function) using ffts.
FFTW++ is useful for turbulence simulations. In order to use it in more than one dimensions, we need to understand how it handles convolutions.
Here we will try to discover how it computes convolutions, in particular centered Hermitian symmetric convolutions in two dimensions by writing
equivalent code which basically computes the same convolutions from the same input and output as in the example files of fftw++.
FFTW++ is an implicit dealiasing library for turbulence simulations. It provides a C++ interface to FFTW and a number of
convolution classes and routines. It uses multithreading as well as MPI based parallelization. However, unfortunately it is poorly documented. So in this post
I will report my attempts at trying to understand how it actually works, what it actually computes and what are the different options we have etc. Obviously I don’t
claim to have any kind of mastery of its use and therefore the information I provide can be erroneous. If you find that this is the case, let me know and I will try to correct.
There are many cases where a C/C++ library wants to call a user specified function,
and it is easy to find yourself in a situation where your python code wants to call such a library.
There are also situations where you may want to simply be able to go back and forth between c and python
for various reasons, like you want to plot something, which is much easier in python but do some number crunching
which may be better done in C or fortran.