Hey everyone,
I have built ALPS with OpenMPI and ACML. Running an exact diagonalization on 4 processes in parallel by calling
mpirun -np 4 ~/alps/bin/sparsediag --mpi --write-xml param_test.in.xml
succeeds in the sense that sparsediag prints:
Creating a new simulation: 1 on 4 processes
All processes have been assigned
Starting Lanczos
Finished Lanczos
Checking if Simulation 1 is finished: Finished
Halted Simulation 1
Checkpointing Simulation 1
Finished with all tasks.
However, there seems to be no speedup at all compared with a single-process run. The runtime is almost exactly the same, and when I check "top", only one process appears to be working while the other three use 0% CPU. (All processes were started on the same multicore machine.) What could be the reason? Should I build with different libraries or set specific environment variables?
Some potentially relevant lines from CMakeCache.txt:
ALPS_ENABLE_MPI:BOOL=ON
ALPS_ENABLE_OPENMP:BOOL=OFF
MPIEXEC_MAX_NUMPROCS:STRING=2 (Why? There are certainly more than two processors available.)
MPI_CXX_COMPILER:FILEPATH=/opt/MPI/openmpi-1.5.3/linux/gcc/bin/mpicxx
best regards, Matt
Hi Matt,
In ALPS 2.0, the exact diagonalization codes do not make use of MPI; such features are planned for ALPS 3 or 4. So far, MPI parallelization only helps for the QMC simulations and for large parameter scans, where you have more parameter sets (tasks) than MPI processes and each process works on a different task.
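For the parameter-scan route, here is a sketch (the file name "scan.in" and the parameter values are invented for illustration; the syntax follows the usual ALPS parameter-file convention, where each { } block generates one task):

```
# scan.in -- illustrative only: four tasks, one per chain length L,
# so up to four MPI processes can each diagonalize a different system.
LATTICE="chain lattice"
MODEL="spin"
CONSERVED_QUANTUMNUMBERS="Sz"
J=1
{ L=8 }
{ L=10 }
{ L=12 }
{ L=14 }
```

You would then convert the file and run the scan, e.g.:

```
parameter2xml scan.in
mpirun -np 4 sparsediag --mpi scan.in.xml
```

Note that each individual diagonalization still runs on a single process; the parallelism is only across tasks.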
Matthias
On 27 Dec 2011, at 18:34, Matt Hurst wrote:
[quoted message trimmed]