[ale] high performance computing

Fri Jul 27 16:07:57 EDT 2012

From: "Jeff Layton" <laytonjb at att.net>
To: "Atlanta Linux Enthusiasts" <ale at ale.org>
>
> The follow-up question is whether the FFT's are done locally
> or if they are using an MPI based FFT?
>
> However, I think as a starting point, you'll want compute nodes
> that have reasonably fast processors, lots of cache (as Jim
> pointed out) but you also need tons of memory BW per core.
> FFT's love memory BW!!
>
> If the FFT's themselves are parallelized, then you will definitely
> need InfiniBand. FFT's eat networks for breakfast (in fact there
> was a proposal from John Gustafson at Intel to make a 3D MPI
> FFT the new benchmark for HPC since it pushed systems so
> hard).

I sent the PI your questions.  Here are his answers (somewhat abbreviated 
and w/o personal info).

1. 2D FFT's? 3D FFT's?

Both.  Probably 3D more often than 2D.  But I am working on code right
now that would always be 2D (never 3D).

2. Is the code parallelized via MPI or OpenMP or both?

We have never bothered to explicitly parallelize our code.  We have been 
using the built-in parallelization in calls to FFTW.

3. Is the code written with CUDA?

No.

4. How many cores or processes are used per run?

We need to have the capability to use at least 64 cores per run, maybe 128 
or 256 if possible within our budget.

5. Which compilers do you use or like?

I think ifort is everybody's favorite.  I use the gnu g95 compiler 
sometimes, but I think it produces slower object modules than ifort.

6. How large are the input/output files?

I create 10 GB of output data from a serial run on my desktop iMac
(although it has typically been closer to 1 GB per run since I have limited
disk space here).  So if I had a big parallel run on 64 or more cores, I can
imagine I could be creating 100 GB of output data pretty easily and maybe
even 1 TB or more.
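
For a rough sanity check on those output sizes, here is a small back-of-the-envelope sketch.  It assumes each dump is one full grid of double-precision complex values (16 bytes per point), which is my assumption and not something the PI stated.  The grid edge lengths are made-up examples, and grid_bytes and snapshots_that_fit are just illustrative helper names.

```python
# Rough sizing sketch. The grid sizes are hypothetical examples, and the
# "one dump = one full complex grid" model is an assumption, not the PI's
# actual output format.

BYTES_PER_COMPLEX_DOUBLE = 16  # two 8-byte floats (real + imaginary parts)
GB = 10**9
TB = 10**12


def grid_bytes(n, dims=3):
    """Bytes needed to hold one n^dims grid of double-precision complex values."""
    return BYTES_PER_COMPLEX_DOUBLE * n**dims


def snapshots_that_fit(budget_bytes, n, dims=3):
    """How many full-grid dumps fit in a given amount of disk space."""
    return budget_bytes // grid_bytes(n, dims)


if __name__ == "__main__":
    for n in (256, 512, 1024):  # hypothetical grid edge lengths
        size = grid_bytes(n)
        print(f"{n}^3 grid: {size / GB:6.2f} GB per dump, "
              f"{snapshots_that_fit(100 * GB, n)} dumps in 100 GB, "
              f"{snapshots_that_fit(TB, n)} dumps in 1 TB")
```

Under those assumptions a single 1024^3 dump is already about 17 GB, so only a handful of dumps fit in 100 GB, and the 1 TB figure does not look far-fetched for a long run.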