SMP code development making progress

Peter has been making great progress with the SMP client. We haven’t been talking about SMP client development much, but there has been a lot going on behind the scenes. In particular, we’re trying to smooth out the Windows SMP client and to build a client that would run seamlessly on any number of SMP cores (e.g., 8-core, 16-core, etc.).

We’ve also updated the SMP FAQ to include some more info regarding development. I’ll copy that below for those who haven’t seen it yet. It’s important for me to stress that the SMP core has been a real boon to our science (one 4-core machine running SMP is much more scientifically useful than perhaps even 10 regular FAH clients), so we are giving SMP development a high priority.

Why use MPI? Why not threads?
None of our engines are written to be thread-safe or multi-threaded.
The only parallelizable codes (Gromacs and AMBER) both use MPI. Making
Gromacs use only threads for parallelization isn’t possible right now
(we talk with the Gromacs developers frequently on this issue), so MPI
is the only solution.

How well does MPI work?
The short answer is: pretty well on Linux and OSX, and not so well on
Windows. MPI was originally developed on UNIX, so this is not a
surprise (and it’s a great feat in many ways for it to even run on
Windows). The Windows-specific quirks we’re seeing are due to
MPI-Windows interaction, and we’re trying to hunt them down, as well as
trying out other MPI options.

Why lock to four processes?
Gromacs, in all current release versions, separates the code that sets
up a calculation (grompp) from the code that runs it (mdrun), and the
number of SMP processes is decided at setup, not at run time. mdrun is
the code running in the FAH core, so it has to have a fixed number of
SMP processes. However, this issue has now been resolved, and we are
working on a new core (A2) which will allow a variable number of SMP
processes, depending on what’s available in the hardware (e.g., 8
processes on 8-core boxes).
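
To make the setup-versus-run distinction concrete, here is a rough Python sketch. It is not Gromacs or FAH code: the function names (split_atoms, setup_fixed, run_fixed, run_variable) and the simple "split atoms into contiguous chunks" scheme are assumptions made up for illustration. It only shows why a process count baked in at setup cannot follow the hardware, while one chosen at run time can.

```python
import os


def split_atoms(n_atoms, n_procs):
    """Divide atom indices 0..n_atoms-1 into n_procs contiguous, near-equal ranges."""
    base, extra = divmod(n_atoms, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional atom each.
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def setup_fixed(n_atoms, n_procs=4):
    """Hypothetical setup step: the process count is stored in the run input."""
    return {"n_procs": n_procs, "ranges": split_atoms(n_atoms, n_procs)}


def run_fixed(run_input, available_cores):
    """Hypothetical run step: it must use the count chosen at setup,
    no matter how many cores the machine actually has."""
    return len(run_input["ranges"])


def run_variable(n_atoms, available_cores=None):
    """Hypothetical run step that picks the process count when it starts."""
    n_procs = available_cores or os.cpu_count() or 1
    return split_atoms(n_atoms, n_procs)


if __name__ == "__main__":
    work_unit = setup_fixed(1000)                    # prepared for 4 processes
    print("fixed:", run_fixed(work_unit, 8))         # still 4 on an 8-core box
    print("variable:", len(run_variable(1000, 8)))   # 8 on an 8-core box
```

In the fixed scheme an 8-core box still runs four processes, because the split was decided before the work unit ever reached the machine. The variable scheme is the kind of behavior described for the A2 core above, although how A2 actually does it is not covered here.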

Isn’t it needlessly complex to use MPI?
Unfortunately, there aren’t other options right now (see above).

Isn’t MPI really meant for clustering computers together?
Yes and no. It originally started that way, but with
multi-CPU/multi-core boxes, it has become a natural solution there too
(as one can code for MPI and run on both architectures).

Does that mean that FAH could support multi-box clusters?
That’s on our mind, but we want to try to get SMP working smoothly before going too far in that direction.