I'm attempting to run tutorial3a.py from the mc-03-magnetization directory with MPI, on a cluster running Ubuntu Server 12.04 LTS with the SLURM queue manager.
I have run it successfully in serial, both by submitting it to the nodes through SLURM and directly without SLURM.
For the serial run, the relevant line of code inside the .py file is:
res = pyalps.runApplication('dirloop_sse',input_file,Tmin=5,writexml=True)
According to the ALPS tutorial webpage, the simulation can be run in parallel by simply changing the line above to
res = pyalps.runApplication('dirloop_sse',input_file,Tmin=5,writexml=True,MPI=4)
All I have done is append MPI=4 as an extra argument (the relevant part of my modified script is pasted after the output below). This generates the following output when run either with or without SLURM:
parsing task files ...
Creating a new simulation: 1 on 1 processes
Created run 1 locally
All processes have been assigned
NUMBER_OF_WORMS_PER_SWEEP: 39
Checking if Simulation 1 is finished: Finished
Halted Simulation 1
WARNING: Unclosed tag: SIMULATION!
boost::filesystem::rename: No such file or directory: "/home/sgarland/moreTutorials/alps/mc-03-magnetization/outputFolder/output.task1.out.run1.h5.bak", "/home/sgarland/moreTutorials/alps/mc-03-magnetization/outputFolder/output.task1.out.run1.h5"
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
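For completeness, the relevant part of my modified tutorial3a.py looks roughly like this. The parameter list is abbreviated from the tutorial (the real script loops over several field values), and the file prefix is inferred from the output path in the error above; the only change I made is the MPI=4 argument:

import pyalps

# Abbreviated, illustrative parameter set from the mc-03-magnetization tutorial;
# the actual script loops over several values of the field h.
parms = [{
    'LATTICE'        : "chain lattice",
    'MODEL'          : "spin",
    'T'              : 0.08,
    'J'              : 1,
    'THERMALIZATION' : 1000,
    'SWEEPS'         : 20000,
    'L'              : 20,
    'h'              : 0.5,
}]

# Prefix inferred from the output path shown in the error message above.
input_file = pyalps.writeInputFiles('outputFolder/output', parms)

# Works in serial; adding MPI=4 as described on the tutorial page
# produces the MPI_ABORT shown above.
res = pyalps.runApplication('dirloop_sse', input_file, Tmin=5, writexml=True, MPI=4)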
Does anyone have any suggestions regarding this issue? I have tried both specifying and not specifying the number of cores for SLURM to allocate, but haven't had any luck either way.
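In case the allocation matters, this is a minimal sketch of a check I can add at the top of the script to confirm what SLURM actually grants the job (these are standard SLURM environment variables; SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested):

import os

# Print what SLURM has allocated for this job.
for var in ('SLURM_JOB_NODELIST', 'SLURM_NTASKS', 'SLURM_CPUS_PER_TASK'):
    print '%s = %s' % (var, os.environ.get(var))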
Thanks!
Chad Garland