Manual

From openpipeflow.org
Revision as of 01:05, 8 August 2014 by Apwillis (talk | contribs)
Jump to navigation Jump to search

$ \newcommand{\bnabla}{\vec{\nabla}} \newcommand{\Rey}{Re} \def\vechat#1{\hat{\vec{#1}}} \def\mat#1{#1} $

Getting started

Overview of files

  • Makefile will require modification for your compiler and libraries (see #Compiling_libraries). Sample commands for other compilers can be found near the top of the file.
  • parallel.h Ensure _Np is set to 1 if you do not have MPI. This file contains macros for parallelisation that are only invoked if the number of processes Np is greater than 1.
  • program/parameters.f90, where you’ll find parameters. See #Parameters
  • utils/ contains a number of utilities. Almost anything can be done in a util, both post-processing and analysing data at runtime. There should be no need to alter the core code. See #Making_utils
  • Matlab/, a few scripts. See Matlab/Readme.txt .

Parameters

./parallel.h:
_Np

Number of processors (set 1 for serial)
./program/parameters.f90:


i_N Number of radial points $n\in[1,N]$
i_K Maximum k (axial), $k\in(-K,\,K)$
i_M Maximum m (azimuthal), $m\in[0,\,M)$
i_Mp Azimuthal periodicity, i.e. $m=0,M_p,2M_p,\dots,(M-1)M_p$
(set =1 for no symmetry assumption)
d_Re Reynolds number $Re$ or $Rm_m$
d_alpha Axial wavenumber $\alpha=2\pi/L_z$
b_const_flux Enforce constant flux $U_b=\frac1{2}$.
i_save_rate1 Save frequency for snapshot data files
i_save_rate2 Save frequency for time-series data
i_maxtstep Maximum number of timesteps (no limit if set $<0$)
d_cpuhours Maximum number of cpu hours
d_time Start time (taken from state.cdf.in if set $<0$)
d_timestep Fixed timestep (typically 0.01 or dynamically controlled if set $<0$)
d_dterr Maximum corrector norm, $\|f_{corr}\|$
d_courant Courant number $\mathrm{C}$
d_implicit Implicitness $c$

Input files

State files are stored in the NetCDF data format and can be transferred across different architectures safely. The program main.out runs with the compiled parameters (see main.info) but will load states of other truncations. Copy any state file e.g. state0018.cdf.dat to state.cdf.in It will interpolate if necessary.

ll

state.cdf.in:
$t$ & – Start time. Overridden by d_time if d_time$\ge$0
$\Delta t$ & – Timestep. Ignored, see parameter d_timestep.
$N, M_p, r_n$ & – Number of radial points, azimuthal periodicity,
& $\quad$ radial values of the input state.
& – If the radial points differ the fields are
& $\quad$ interpolated onto the new points.
$u_r,\,u_\theta,\, u_z$ & – Field vectors.
& – If $K\ne\,$K or $M\ne\,$M, harmonics are truncated or
& $\quad$ zeros appended.

Output

Snapshot data

Data saved every i_save_rate1 timesteps:

   state????.cdf.dat
   vel_spectrum.dat

All output is sent to the current directory, and state files are numbered 0000, 0001, 0002,…. Each file can be renamed state.cdf.in should a restart be necessary. To list times $t$ for each saved state file,
grep state OUT
The spectrum files are overwritten each save as they are retrievable from the state data. To verify sufficient truncation, a quick profile of the energy spectrum can be plotted with
gnuplot> set log
gnuplot> plot 'vel_spectrum.dat' w lp


Time-series data

Data saved every i_save_rate2 timesteps:

tim_step.dat $t$, $\Delta t$, $\Delta t_{\|f\|}$, $\Delta t_{CFL}$ current and limiting step sizes
vel_energy.dat $t$, $E$ energies and viscous dissipation
vel_friction.dat $t$, $U_b$ or $\beta$, $\langle u_z(r=0)\rangle_z$, $u_\tau$ bulk velocites, pressure, friction vel.


Compiling libraries

Take a look at Makefile. The compiler and flags will probably need editing. There are suggested flags for many compilers at the top of this file.

Building the main code is successful if running make produces no errors, but this requires the necessary libraries to be present - LAPACK, netCDF and FFTW. Often it is necessary to build LAPACK and NetCDF with the compiler and compile flags that will be used for the main simulation code.

The default procedure for building a package (FFTW3, netCDF) is

tar -xvvzf package.tar.gz
cd package/
[set environment variables if necessary]
./configure --prefix=<path>
make
make install

FFTW3. This usually requires no special treatment. Install with your package manager or build with the default settings.

LAPACK. If using gfortran you might be able to use the binary version supplied for your linux distribution. Otherwise, edit the file make.inc that comes with LAPACK, setting the fortran compiler and flags to those you plan to use. Type ‘make’. Once finished, copy the following binaries into your library path (see Makefile LIBS -L$<$path$>$/lib/)
cp lapack.a <path>/lib/liblapack.a
cp blas.a <path>/lib/libblas.a

netCDF. If using gfortran you might be able to use the supplied binary version for your linux distribution. Otherwise, typical environment variables required to build netCDF are
CXX=""
FC=/opt/intel/fc/10.1.018/bin/ifort
FFLAGS="-O3 -mcmodel=medium"
export CXX FC FFLAGS
After the ‘make install’, ensure that files netcdf.mod and typesizes.mod appear in your include path (see Makefile COMPFLAGS -I$<$path$>$/include/). Several versions can be found at http://www.unidata.ucar.edu/downloads/netcdf/current/index.jsp. Version 4.1.3 is relatively straight forward to install, the above flags should be sufficient.

Installation is trickier for more recent versions [2013-12-11 currently netcdf-4.3.0.tar.gz]. First
./configure --disable-netcdf-4 --prefix-<path>
which disables HDF5 support (not currently required in code). Fortran is no longer bundled, so get netcdf-fortran-4.2.tar.gz or a more recent version from here
http://www.unidata.ucar.edu/downloads/netcdf/index.jsp
Build with the above flags, and in addition
CPPFLAGS=-I<path>/include
export CPPFLAGS
Add the link flag -lnetcdff before -lnetcdf in the Makefile.

Compiling and running

For serial use set _Np to 1 in parallel.h. Otherwise MPI is required and the compile must be updated in the Makefile. First set parameters in program/parameters.f90. Next type

> make
> make install  

The second command creates the directory install/ and a text file main.info, which is a record of settings at compile time. Next an initial condition state.cdf.in is needed,

> mv install ~/runs/job0001
> cd ~/runs/job0001/
> cp .../state0100.cdf.dat state.cdf.in 
> nohup ./main.out > OUT 2> OUT.err &

Press enter again to see if main.out stopped. If it has stopped, check OUT.err or OUT for clues why. If it appears to be running, wait a minute or two then

> rm RUNNING

This will terminate the job cleanly.

NOTE: Any output state, e.g. state0012.cdf.dat can be copied to state.cdf.in to be used as an initial condition. If resolutions do not match, they are automatically interpolated or truncated. This is how I generate almost all initial conditions, by taking a state from a run with similar parameters. If there is a mismatch in Mp, use utils/changeMp.f90.

Monitoring a run

Immediately after starting a job, it’s a good idea to check for any warnings

> less OUT

To find out number of timesteps completed, or for possible diagnosis of an early exit,

> tail OUT

The code outputs timeseries data and snapshot data, the latter has a 4-digit number e.g. state0012.cdf.dat.

To see when the in the run each state was saved,

> head -n 1 vel_spec* | less

I often monitor progress with tail vel_energy.dat or

> gnuplot
> plot 'vel_energy.dat' w l

Use rm RUNNING to end the job.

Making utils

The core of the code in program/ rarely needs to be changed. Almost anything can be done by creating a utility; there are many examples in utils/

In Makefile, set UTIL = utilname (omitting the .f90 extension), then type 'make util' which builds utilname.out.

Spatial representation

Finite differences — $r$

A function $p(r)$ is represented by the values $p_n=p(r_n)$, where $p(r)$ is evaluated on the $N$ radial points $r_n$.

Differentiation

Taylor expansions about central point $x_0$, $k$ neighbouring points each side

$\label{eq:taylorexp} \begin{array}{lclrrrr}

  f(x_{-k}) &=& f(x_0)
  &+& (x_{-k}-x_0)\,f'(x_0) &+& \frac{(x_{-k}-x_0)^2}{2!}\,f(x_0)+\dots
  \\ & \vdots & \\
  f(x_k) &=& f(x_0)
  &+& (x_k-x_0)\,f'(x_0) &+& \frac{(x_k-x_0)^2}{2!}\,f(x_0)+\dots

\end{array}$

The expansions \ref{eq:taylorexp} can be written

$\vec{f}=\mat{A}\,\vec{df}, \quad

  \vec{df}=\mat{A}^{-1}\,\vec{f}, \quad
  \vec{f}=\left[\begin{array}{c}f(x_{-k})\\
          \vdots\\ f(x_k)\end{array}\right], \quad
  \vec{df}=\left[\begin{array}{c}
          f(x_0)\\ f'(x_0)\\ f(x_0)\\ \vdots \end{array}\right]$

Derivatives are calculated using weights from the appropriate row of $\mat{A}^{-1}$.

Integration

Integrating \ref{eq:taylorexp} the indefinite integral may be approximated about $x_0$

$\begin{eqnarray*}

   \int f(x) \, {\mathrm d}x & = &
   \begin{array}{c}(x-x_0)\,f(x_0) + \frac{(x-x_0)^2}{2!}\,f'(x_0) +
   \frac{(x-x_0)^3}{3!}\,f(x_0) + \dots \end{array}
   \\
   & = &
   [\begin{array}{lll}(x-x_0) & \frac{(x-x_0)^2}{2!} &
   \dots \end{array}] \,\, \mat{A}^{-1}\vec{f}\end{eqnarray*}$

The total integral is approximated by

$ \frac1{2} \sum_n \int_{x_{n-1}}^{x_{n+1}} f(x) \, {\mathrm d}x$

Matrix representation of linear operators

Linear equations can be written in the matrix form

$\mat{L} \,\vec{p} = \mat{M}\, \vec{q} + \vec{s}.$

$\mat{L}$ and $\mat{M}$ are $N\times N$ matrices representing linear operators. If $\vec{q}$ and $\vec{s}$ are known then the right-hand side is simple to evaluate. Consider the equation

$\label{eq:matmod}

  \mat{L}\, \vec{p} = \vec{q},$

where $\vec{p}$ is unknown. If the matrix operators are formed by linear combinations of the weights in #Differentiation, using only $k$ neighbouring points to each side, they are banded. Boundary conditions are formed in the same way and are placed in the end rows. The equation is solved by forward-backward substitution, using the banded LU-factorisation of $\mat{L}$.

figure=LU.eps, scale=0.65

Fourier representaion — $\theta$, $z$

Expansion of variables

$A(\theta,z) =

     \sum_{km} A_{km} {\mathrm e}^{{\mathrm i} (\alpha kz + m_0 m\theta)}$

$A$ real implies $A_{km} = A_{-k,-m}^*$ (coefficients are conjugate-symmetric). It is sufficient to keep only $m\ge 0$, and for $m=0$ keep $k\ge 0$.

Orthogonality

Exponentials

$\int_0^{2\pi} {\mathrm e}^{{\mathrm i}(m+n)} \, {\mathrm d}\theta

     = 2\pi \, \delta_{m,-n}\, ,
     \qquad
     \int_0^{\frac{2\pi}{\alpha}} {\mathrm e}^{{\mathrm i}\alpha(k+j)} \, {\mathrm d}z 
     = \frac{2\pi}{\alpha} \, \delta_{k,-j}$

Energy integral

$E \, = \,

     {\textstyle \frac1{2}} \int \vec{A} \cdot \vec{A} \, {\mathrm d}V 
     \, = \,
     \frac{2\pi^2}{\alpha} \, \sum_{km}
     \int |\vec{A}_{km}|^2 \, r\,{\mathrm d}r$

$E = \sum_{m=0}^{\infty} E_m,

  \qquad
     E_m =
     \left\{
        \begin{array}{ll}
           \displaystyle
           \frac{2\pi^2}{\alpha}\int\vec{A}_{00}^2 \, r \, {\mathrm d}r
           + \frac{4\pi^2}{\alpha} \sum_{k>0}
           \int|\vec{A}_{k0}^2| \, r \, {\mathrm d}r,
           & m=0, \\[15pt]
           \displaystyle
           \frac{4\pi^2}{\alpha} \sum_k 
           \int |\vec{A}_{km}|^2 \, r \, {\mathrm d}r,
           & m>0 .
        \end{array}
     \right.$

Volume integral

$E \, = \,

     {\textstyle \frac1{2}} \int A \, {\mathrm d}V 
     \, = \,
     \frac{4\pi^2}{\alpha} \, \int A_{00} \, r\,{\mathrm d}r$

Enforced conditions at the axis

The geometry enforces conditions on each variable at the axis when expanded over Fourier modes in $\theta$. Given a vector A, each mode for $A_z$ is even in $r$ if $m$ is even, and is odd if $m$ is odd. Each mode for $A_r$ and $A_\theta$ is even in $r$ if $m$ is odd, and is odd if $m$ is even.

Model

Governing equations

Non-dimensionalisation / scales

For laminar HPF $U_{cl} = 2\,U_b$.
length $R$,
velocity, fixed flux $2\, U_b$,
velocity, fixed pressure $U_{cl}$ of HPF.

Dimensionless parameters

Reynolds number, fixed flux, $Re_m = 2 U_b R / \nu$
Reynolds number, fixed pressure, $Re = U_{cl} R / \nu$
$1+\beta = Re / Re_m$
$Re_\tau=u_\tau R/\nu = (2\,Re_m\,(1+\beta))^\frac{1}{2}=(2\,Re)^\frac{1}{2}$

Evolution equations

Fixed flux,

$ (\partial_{t} + \vec{u}\cdot\bnabla) \vec{u}

  = -\bnabla \hat{p} + \frac{4}{\Rey_m}\,(1+\beta)\vechat{z}
   + \frac{1}{\Rey_m}\bnabla^2 \vec{u} $

Fixed pressure

$ (\partial_{t} + \vec{u}\cdot\bnabla) \vec{u}

  = -\bnabla \hat{p} + \frac{4}{\Rey}\vechat{z}
   + \frac{1}{\Rey}\bnabla^2 \vec{u} $

Let $\vec{u}=W(r)\vechat{z}+\vec{u}'$. Using the scaling above, $W(r) = 1-r^2$. Then

$ (\partial_{t} - \frac{1}{\Rey_m}\bnabla^2)\,\vec{u}'

  = \vec{u}' \wedge (\bnabla \wedge\vec{u}') 
  - \frac{\mathrm{d}W}{\mathrm{d}r}\,u'_z \vechat{z}
  - W\,\partial_{z}\vec{u}' + \frac{4\,\beta}{\Rey_m}\vechat{z} - \bnabla\hat{p}' \, . $

Decoupling the equations

The equations for $u_r$ and $u_\theta$ are coupled in the Laplacian. They can be separated in a Fourier decompositon by considering

$u_\pm = u_r \pm \mathrm{i} \, u_\theta,$

for which the $\pm$ are considered respectively. Original variables are easily recovered

$u_r = \frac{1}{2} ( u_+ + u_-),

  \qquad
  u_\theta = -\,\frac{\mathrm{i}}{2}(u_+ - u_- ) .$

Governing equations are then

$\begin{eqnarray*}

  (\partial_{t} - \nabla^2_\pm)\, u_\pm
  & = &  N_\pm - (\bnabla p)_\pm , \\
  (\partial_{t} - \nabla^2 )\, u_z
  & = &  N_z - (\bnabla p)_z ,\end{eqnarray*}$

where

$\nabla^2_\pm = \nabla^2 - \frac{1}{r^2}

  \pm \frac{\mathrm{i}}{r^2}\partial_{\theta}$


PPE-formulation with correct boundary conditions

Write the time-discretised Navier–Stokes equations in the form

$\label{eq:NSdisc}

        \left\{\begin{array}{rcl}
    \mat{X}\, \vec{u}^{q+1} & = & \mat{Y}\, \vec{u}^q 
                + \vec{N}^{q+\frac{1}{2}} - \bnabla p \, , \\
                \nabla^2 p & = & \bnabla\cdot(\mat{Y}\, \vec{u}^q 
                + \vec{N}^{q+\frac{1}{2}}) ,
        \end{array}\right.$

where $q$ denotes time $t_q$, which is sixth order in $r$ for $\vec{u}^{q+1}$ and second order for $p$, where the solenoidal condition is not explicitly imposed. Symmetry conditions provide the conditions at the axis. The difficulty is in imposing the remaining four — this system should be inverted, in principle, simultaneously for $p$ and $\vec{u}^{q+1}$ with boundary conditions $\vec{u}^{q+1}=\vec{0}$ and $\bnabla\cdot\vec{u}^{q+1}=0$ on $r=R$ (Rempfer 2006). In practice it would be preferable to invert for $p$ first then for $\vec{u}^{q+1}$, but the boundary conditions to not involve $p$ directly.

Note that the $\mat{Y}\,\vec{u}^q$ term has been included in the right-hand side of the pressure-Poisson equation, the divergence of which should be small. Assume that pressure boundary condition is known: the right-hand side of the Navier–Stokes equation is then projected onto the space of solenoidal functions though $p$ and hence after inversion, $\vec{u}^{q+1}$ will be solenoidal.

Consider the ‘bulk’ solution, $\{\bar{\vec{u}},\bar{p}\}$, obtained from solution of the following:

$\label{eq:NSbulk}

        \left\{\begin{array}{rcl}
    \mat{X}\, \bar{\vec{u}} & = & \mat{Y}\, \vec{u}^q 
                + \vec{N}^{q+\frac{1}{2}} - \bnabla \bar{p} \, , \\
                \nabla^2 \bar{p} & = & \bnabla\cdot(\mat{Y}\, \vec{u}^q 
                + \vec{N}^{q+\frac{1}{2}}) ,
        \end{array}\right.$

with boundary conditions $\bar{\vec{u}}=\vec{0}$ and $\partial_{r}\bar{p}=0$. Introduce the following systems:

$\label{eq:NSp0}

        \left\{\begin{array}{rcl}
         \vec{u}' & = & -\bnabla p' \, , \\
                \nabla^2 p' & = & 0,
        \end{array}\right.$

with boundary condition $\partial_{r}p'=1$ on $r=R$, and

$\label{eq:NSu0}

        \left\{\begin{array}{rcl}
    \mat{X}\, \vec{u}' & = & \vec{0},
        \end{array}\right.$

with boundary conditions $u'_+=1$, $u'_-=1$, $u'_z=\mathrm{i}$ on $r=R$. The system \ref{eq:NSp0} provides a linearly independent function $\vec{u}'_j$ that may be added to $\bar{\vec{u}}$ without affecting the right-hand side in \ref{eq:NSbulk}, but altering (to correct) the boundary condition applied. (Note that $\bnabla^2(\bnabla p')=\bnabla(\nabla^2 p')-\bnabla \wedge\bnabla \wedge\bnabla p'=0$.) Similarly the system \ref{eq:NSu0} provides a further three functions. The superposition

$\label{eq:usuperpos}

  \vec{u}^{q+1} = \bar{\vec{u}} + \sum_{j=1}^4 a_j\, \vec{u}'_j \,$

may be formed in order to satisfy the four boundary conditions, $\vec{u}^{q+1}=\vec{0}$ and $\bnabla\cdot\vec{u}^{q+1}=0$ on $r=R$. Substituting \ref{eq:usuperpos} into the boundary conditions, they may be written

$\mat{A}\,\vec{a} = -\vec{g}(\bar{\vec{u}}) ,$

where $\mat{A}=\mat{A}(\vec{g}(\vec{u}'))$ is a 4$\times$4 matrix. The appropriate coefficients required to satisfy the boundary conditions are thereby recovered from solution of this small system for $\vec{a}$. The error in the boundary conditions $g_j(\vec{u}^{q+1})$ using the influence-matrix technique is at the level of the machine epsilon, typically order 1e-18.

The functions $u'_j(r)$, the matrix $\mat{A}$ and its inverse may all be precomputed. The boundary conditions for $\vec{u}'$ have been chosen so that that $u'_\pm$ are pure real, $u'_z$ is pure imaginary, and $\mat{A}$ is real. For each timestep, this application of the influence matrix technique requires only evaluation of the deviation from the boundary condition, multiplication by a 4$\times$4 real matrix, and the addition of only two functions to each component of $\vec{u}$, each either pure real or pure imaginary. Compared to the evaluation of nonlinear terms, the computational overhead is negligible.

Predictor-corrector

The model equation for each Fourier mode is

$\label{eq:harmmod}

  (\partial_{t} - \nabla^2) f = N,$

where nonlinear terms have been evaluated on each radial point by the transform method, and is solved as in #Matrix_representation_of_linear_operators. The predictor at time $t_q$, with Euler nonlinear terms and implicitness $c$

$\frac{f_1^{q+1}-f^q}{\Delta t}

  - \left( c\nabla^2 f_1^{q+1} + (1-c)\nabla^2 f^q \right)
  = N^q ,$

$\left( \frac1{\Delta t} - c\nabla^2 \right) f_1^{q+1}

  = \left( \frac1{\Delta t} + (1-c)\nabla^2 \right) f^q + N^q .$

Corrector iterations are

$\left( \frac1{\Delta t} - c\nabla^2 \right) f_{j+1}^{q+1}

  = \left( \frac1{\Delta t} + (1-c)\nabla^2 \right) f^q
  + c\,N_j^{q+1} + (1-c)\, N^q ,$

or equivalently, for the correction $f_{corr} = f_{j+1}^{q+1} - f_j^{q+1}$

$\left( \frac1{\Delta t} - c\nabla^2 \right) f_{corr}

  = c\,N_j^{q+1} - c\, N_{j-1}^{q+1} ,$

where $j=1,2,\dots$ and $N_0^{q+1}=N^q$. The size of the correction $\|f_{corr}\|$ must reduce at each iteration. For $c=\frac1{2}$ the scheme is second order such that $\|f_{corr}\| \sim \Delta t^2$.

Timestep control

$\Delta t = \mathrm{C} \, \min(\Delta \, / \, |\vec{v}| ) ,

  \qquad 
  0<\mathrm{C}<1,$

where $\mathrm{C}$ is the Courant number.

The timestep should also be small enough such that the corrector norm $\|f_{corr}\|$ is satisfactorily small. This may be important for integrating initial transients, but ought not to be the limiting factor in general.

The code

Naming conventions

Parameters

Parameters defined within parameters.f90 are given a prefix indicating the data type. For example:

$N$ i_N integer
$\Delta t$ d_timestep double
fixed flux? b_fixed_flux boolean

Note that the naming convention frees the variable name for dummy variables, e.g.:

   integer :: n
   do n = 1, i_N
      ...

Public variables

By default variables declared within a module are accessible to any other module or subroutine that uses the module. To ensure that it is clear where a public variable is declared, variables are given a prefix. Examples:

$\Delta t$ tim_dt timestep.f90
$u_r$ vel_r velocity.f90

Ordering the Fourier modes

$A = \sum_{km} A_{km} {\mathrm e}^{{\mathrm i}(\alpha kz + m_0 m\theta)},$

Coefficients are stored as in the following loop, where rather than two indices k and m, the single index nh labels the modes:

   nh = -1
   do m = 0, M-1
      m_ = m*Mp
      do k = -(K-1), (K-1)
         if(m==0 .and. k<0) cycle
         nh = nh+1
         ...

NOTE: this loop, and its parallel equivalent, are predefinded macros in parallel.h.

Data types

The data types below consist of logical data groups to facilitate the passing of data between functions. For clarity, in the following text they are defined as though we were working on a single processor. See Parallel regarding subtle differences in the definitions due to parallelisation.

coll, spec — storage of coefficients

The collocated data type (coll) is the principle type used for mode-independent operations.

   type coll
      double precision     :: Re(i_N, 0:i_H1)
      double precision     :: Im(i_N, 0:i_H1)
   end type coll

The value Re(n,nh) is the real part of the coefficient for the nh$^{\mathrm{th}}$ harmonic at $r_n$. H1$=24$ for the example in #Ordering_of_the_Fourier_modes and N is the number of radial points.

The spectral type (spec) is the data type used for operations independent at each radial point. It is principally a transitory data format between the coll and phys types.

   type spec
      double precision     :: Re(0:i_H1, i_N)
      double precision     :: Im(0:i_H1, i_N)
   end type spec

Conversions between the coll and spec types involve only a transpose, var_coll2spec() and var_spec2coll().

phys — real space data

The type (phys) is used for data in real space.

   type phys
      double precision     :: Re(0:i_Z-1, 0:i_Th-1, i_N)
   end type phys

The element Re(k,m,n) refers to the value at $z_k$, $\theta_m$ and $r_n$.

mesh, lumesh — storage of banded matrices

The finite-difference stencil has finite width (Finite_Differences) and so matrix operations involve matrices that are banded. The type (mesh), defined in meshs.f90, is used for matrices on the radial mesh involving $kl$ points to either side of each node.

   type mesh
      double precision :: M(2*i_KL+1, i_N)
   end type mesh

For a matrix A, the element M(KL+1+n-j, j) = A(n,j). See the man page for the lapack routine dgbtrf. The LU-factorisation of a banded matrix is also banded:

   type lumesh
      integer          :: ipiv(i_N)
      double precision :: M(3*i_KL+1, i_N)
   end type lumesh

The element M(2*KL+1+n-j, j) = A(n,j). Pivoting information is stored along with the matrix.

rdom — the radial domain

The type (rdom) contains information the radial domain. The public variable mes_D is declared in meshs.f90. Data includes the number of radial points N; powers of $r$, $r_n^p$=r(n,p); weights for integration $r{\mathrm d}r$; matrices for taking the $p^{\mathrm{th}}$ derivative dr(p);…

   type rdom
      integer          :: N
      double precision :: r(i_N,i_rpowmin:i_rpowmax)
      double precision :: intrdr(i_N)
      type (mesh)      :: dr(i_KL)
   end type rdom

Modifications for the parallel implementation

The code has been parallelised using the Message Passing Interface (MPI). Sections of the code with special MPI function calls are separated from the serial sections by preprocessor directives. The number of processors is given by the parameter _Np set in parallel.h. The rank of the local processor is given by mpi_rnk declared in mpi.f90, and mpi_sze$=$_Np.

Data type changes for the model equation (see equations \ref{eq:matmod}, \ref{eq:harmmod}) are as follows:

figure=model.eps, scale=0.75

Data is split over the spherical harmonics for the linear parts of the code — evaluating curls, gradients and matrix inversions for the timestepping. These linear operations do not couple modes. Here all radial points for a particular mode are located on the same processor, separate modes may be located on separate processors. In the definition of type (coll) the parameter i_H1 is replaced by i_pH1 = (_Np+i_H1)/_Np-1. The variable var_H has elements to determine which harmonics are on which processor. The elements pH0 and pH1 are the zeroth index and one-minus the number of harmonics on the local processor. The corresponding values for all processors are stored in the arrays pH0_(rank) and pH1_(rank). For a variable of type (coll), valid values for the harmonic index in Re(n,nh) are nh=0..pH1, which refer to the indices (#Ordering_of_the_Fourier_modes) $nh$=pH0..pH0+pH1.

Data is split radially when calculating Fourier transforms and when evaluating products in real space. In the definitions of type (spec) and type (phys) the parameter i_N is replaced by i_pN = (_Np+i_N-1)/_Np. The location of the radial subdomain on the local processor is given by additional elements of the variable mes_D. The elements pNi and pN are the location of the first (inner) radial point and the number of points. The values for all processors are stored in the arrays pNi_(rank) and pN_(rank). For a variable of type (spec), valid values for the radial index in Re(nh,n) are n=1..pN which refer to the indices of $r_n$, where $n$=pNi..pNi+pN-1.

The bulk of communication between processors occurs during the data transposes in the functions var_coll2spec() and var_spec2coll(). For _Np processors the data is transfered in _Np$-1$ steps. At each step each processor sends and receives a block of data, the source and destination are selected systematically as set in the following loop:

   do step = 1, mpi_sze-1
      dest  = modulo(mpi_rnk+step, mpi_sze)
      src   = modulo(mpi_rnk-step+mpi_sze, mpi_sze) 
      ...

figure=transp.eps, scale=0.65

The sketch is for the case _Np$=4$. The size of each block is approximately i_pN$\times($i_pH1$+1)$, and the data within each block must also be transposed.


Differential operators in cylindrical–polar coordinates

[app:diffops]

$(\bnabla f)_r = \partial_{r}f,

     \quad
     (\bnabla f)_\theta = \frac1{r}\, \partial_{\theta} f,
     \quad
     (\bnabla f)_z = \partial_{z} f .$

$\nabla^2 f = (\frac1{r}\,partial_{r} + \partial_{rr} ) f

     +\frac1{r^2}\,\partial_{\theta\theta} f + \partial_{zz} f .$

$(\nabla^2 \vec{A})_r =

     \nabla^2 A_r - \frac{2}{r^2}\,\partial_{\theta}A_\theta 
     - \frac{A_r}{r^2} ,$

$(\nabla^2 \vec{A})_\theta =

     \nabla^2 A_\theta + \frac{2}{r^2}\,\partial_{\theta}A_r 
     - \frac{A_\theta}{r^2} ,$

$(\nabla^2 \vec{A})_z =

     \nabla^2 A_z .$

$\bnabla \cdot \vec{A} = (\frac1{r} + \partial_{r}) A_r

     + \frac1{r}\,\partial_{\theta}A_\theta + \partial_{z} A_z .$

$(\bnabla \wedge \vec{A})_r =

     \frac1{r}\,\partial_{\theta} A_z - \partial_{z} A_\theta ,$

$(\bnabla \wedge \vec{A})_\theta =

     \partial_{z} A_r - \partial_{r} A_z ,$

$(\bnabla \wedge \vec{A})_z =

     (\frac1{r}+\partial_{r}) A_\theta - \frac1{r}\,\partial_{\theta} A_r .$

$(\vec{A}\cdot\bnabla\,\vec{B})_r =

     A_r\,\partial_{r}B_r + \frac{A_\theta}{r}\,\partial_{\theta}B_r 
     + A_z\,\partial_{z}B_r - \frac{A_\theta B_\theta}{r} ,$

$(\vec{A}\cdot\bnabla\,\vec{B})_\theta =

     A_r\,\partial_{r}B_\theta + \frac{A_\theta}{r}\,\partial_{\theta}B_\theta 
     + A_z\,\partial_{z}B_\theta + \frac{A_\theta B_r}{r} ,$

$(\vec{A}\cdot\bnabla\,\vec{B})_z =

     A_r\,\partial_{r}B_z + \frac{A_\theta}{r}\,\partial_{\theta}B_z
     + A_z\,\partial_{z}B_z .$

Toroidal-poloidal decomposition, axial:

$\begin{eqnarray*}

     &
     \vec{A} = \bnabla \wedge(\vechat{z}\psi) + \bnabla \wedge\bnabla \wedge(\vechat{z}\phi) ,
     & \\ &
     \psi=\psi(r,\theta,z), \quad
     \phi=\phi(r,\theta,z).
     \nonumber &
  \end{eqnarray*}$

$\nabla^2_h f =

     (\frac1{r}\,\partial_{r}+\partial_{rr}) f + \frac1{r^2}\,\partial_{\theta\theta} f .$

$\vec{A}_r =

     \frac1{r}\,\partial_{\theta} \psi + \partial_{rz} \phi ,$

$\vec{A}_\theta =

     -\partial_{r} \psi + \frac1{r}\,\partial_{\theta z} \phi ,$

$\vec{A}_z =

     - \nabla^2_h \phi .$

Toroidal-poloidal decomposition, radial:

$\begin{eqnarray*}

     &
     \vec{A} = \psi_0\,\vechat{\theta} + \phi_0\,\vechat{z}
     + \bnabla \wedge(\vec{r}\psi) + \bnabla \wedge\bnabla \wedge(\vec{r}\phi) ,
     & \\ &
     \psi_0=\psi_0(r), \quad
     \phi_0=\phi_0(r), \quad
     \psi=\psi(r,\theta,z), \quad
     \phi=\phi(r,\theta,z).
     \nonumber &
  \end{eqnarray*}$

$\nabla^2_c f = \frac1{r^2}\,\partial_{\theta\theta} f + \partial_{zz} f .$

$\vec{A}_r =

     -r \nabla^2_c \phi ,$

$\vec{A}_\theta =

     \psi_0 + r \partial_{z} \psi + \partial_{r\theta} \phi ,$

$\vec{A}_z =

     \phi_0 - \partial_{\theta} \psi + (2+r\partial_{r})\partial_{z} \phi .$